How to fix broken or redirecting links

As I said in my post about why we should fix broken or redirecting links, broken links and redirects aren’t ideal, but we can’t hope to get rid of all of them. As with anything in business, we need to prioritise what will have the biggest impact. We need to find the worst offenders.

By the time we’ve finished this post, we will have found just three changes lego.com could make to their site which could;

  • Make sure that Google sees their UK product pages
  • Fix over 9,000 internal redirecting links.

To do this we use the free notebook I put together to help you find redirect chains and templated redirects. You don’t need to know code to use it, you just need to be comfortable with Screaming Frog – I’ll lay out the step-by-step instructions below.

Running the Redirect Checker Google Colab Notebook
Running my link checker notebook

As mentioned above – we’ll use lego.com as an example of how we can use this information. That’s because it can be difficult to picture how something might work without an example, because they aren’t a client of mine, and because they don’t seem to be blocking crawlers. Plus Lego is cool, so it seemed like a good choice.

lego.com uk homepage

Below I start out by explaining a bit of terminology and what some of the columns mean so if you just want to jump to a specific part of my post, click the table of contents below.

First steps

This will require Screaming Frog. First we run an undirected crawl of lego.com to find all of the pages we can. Then, once done, we export the Redirect Chains and Inlinks reports (the screenshots below show how to get to each).

How to export Screaming Frog Redirects and Canonical Chains report
How to export Screaming Frog Inlinks report

Open up and run my notebook in Google Colab

You don’t need to know how to code to use my notebook. I’ve given step-by-step instructions for people who just want the output without having to change any code (and I’ve copied them below), including gifs for illustration;

We just want results, so we just do the following;

  • Open up the left-hand pane by clicking the arrow in the top-left
  • Click the “Files” tab
  • Click UPLOAD and upload the Screaming Frog Inlinks and Redirect Chains reports (they need to be csv files)
Setting up the Redirects Checker Google Colab Notebook
Open the left-hand pane, click Files and then upload your files
  • When they are uploaded, make sure they are called redirect_and_canonical_chains.csv and all_inlinks.csv respectively. If they aren’t, either rename them here by right-clicking on them or rename them on your desktop and reupload.
  • In the box below, write out the domain of the website. If you’re not sure what that means, the domain of http://www.therobinlord.com/ is just “therobinlord.com”. In this case we write “lego.com”
  • At the top of this window, click “Runtime” then “Run All” (In Jupyter this would be “Cell”, “Run All”)
  • Leave this for a little while to do its thing. It should make an “OH YEAH” sound when it’s done, but your browser might block that, so just check back after a little while. When it’s finished it will save two [Complete] files, which you can find in the same place where you uploaded the files to begin with. If the [Complete] files don’t appear right away, click “Refresh”.
  • Download the files by right-clicking on them and selecting Download.
Make sure your files are called the right thing (you can rename them here if needed) then click “Runtime” and “Run All”.
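
If you’d like to sanity-check your inputs before clicking “Run All”, here is a minimal sketch of the two checks described in the steps above. It is not taken from the notebook: the function names `bare_domain` and `missing_files` are made up for illustration, and only the two file names come from the instructions.

```python
from urllib.parse import urlparse

# File names the notebook expects, as listed in the steps above.
EXPECTED_FILES = {"redirect_and_canonical_chains.csv", "all_inlinks.csv"}


def bare_domain(url_or_domain: str) -> str:
    """Strip the scheme, a leading 'www.' and any path from a URL."""
    text = url_or_domain.strip()
    # urlparse only fills in netloc when a scheme is present, so add one if missing.
    if "://" not in text:
        text = "http://" + text
    host = urlparse(text).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def missing_files(uploaded: list[str]) -> set[str]:
    """Return the expected file names that are not in the uploaded list."""
    return EXPECTED_FILES - set(uploaded)


print(bare_domain("http://www.therobinlord.com/"))
print(missing_files(["all_inlinks.csv"]))
```

The first call shows what “domain” means in the step above, and the second lists any file that still needs uploading or renaming.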

Open the output

When my notebook has finished running, it’ll spit out two files each starting with [Complete]. The sheets are designed to do a lot of the heavy-lifting for you in prioritising your changes.

Terminology

If there’s any part of the terminology below that you don’t understand, I explain it in this post about why we need to fix broken or redirecting internal links. The most important thing to remember is templated links.

A templated link is when we link to one page from lots of other pages. Examples include your navigation menu or footer. When we talk about templated links, they’re usually on pretty much every page of the site, they always have the same anchor text and always link to the same page. If we make a mistake in our templated links, that can have a big impact across our site.
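
To make the idea concrete, here is a rough sketch of how we might spot templated links from a list of inlinks. It assumes each row is a (source page, link URL, anchor) tuple, and the 90% cut-off in `likely_templated` is an arbitrary number I’ve picked for illustration, not something the notebook uses.

```python
from collections import defaultdict


def count_linking_pages(inlinks):
    """Map each (link_url, anchor) pair to the number of distinct source pages."""
    sources = defaultdict(set)
    for source, link_url, anchor in inlinks:
        sources[(link_url, anchor)].add(source)
    return {key: len(pages) for key, pages in sources.items()}


def likely_templated(inlinks, min_share=0.9):
    """Return pairs that appear on at least min_share of all source pages."""
    all_pages = {source for source, _, _ in inlinks}
    counts = count_linking_pages(inlinks)
    threshold = min_share * len(all_pages)  # assumed cut-off, adjust to taste
    return sorted(key for key, n in counts.items() if n >= threshold)


# Hypothetical rows: one link in every page's nav, plus a one-off link.
example = [
    ("/a", "/life", "LEGO LIFE"),
    ("/b", "/life", "LEGO LIFE"),
    ("/c", "/life", "LEGO LIFE"),
    ("/a", "/x", "One-off"),
]
print(likely_templated(example))
```

A link that shows up with the same anchor on nearly every page is almost certainly coming from a shared template, which is why one fix can go a long way.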

The [Complete] redirects and broken links patterns.xlsx is the bigger of the two. It’s split into a few tabs which are all in the same format.

Columns and tabs definitions in the broken links report

What the columns mean

Recommendation – This is the broad change we think we should make.

  • The default is “Update Link” which just means we need to change the on-page link to match the url in the Final Address column.
  • “Update link, fix temp redirect” means we need to change the link but that we’ve also found a redirect chain with a temporary redirect in it. We don’t want temporary redirects so we need to remove that (the other sheet will help with that)
  • “Redirect 404, update link” means that either the page linked to is missing, or it redirects or canonicalises to a page that is missing. We need to redirect the missing page and update the link to an active page.
  • “Fix redirect loop, update link” means that the page linked to is in a redirect loop, or a canonical loop. So we need to change the link to be an active, indexable page, and we need to make it so that those pages don’t redirect or canonicalise in a loop.
  • “Fix redirect chain, update link” means we can probably get away with just updating the link but we should also check the redirect chains report to see how we can get rid of that redirect chain.
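
As a rough illustration of how those labels relate to each other, here is a sketch that picks a recommendation from a few facts about a link. The labels are the ones listed above, but the function name, its parameters and the order of the checks are my assumptions for this example, not the notebook’s actual logic.

```python
def recommend(final_status: int, temp_redirect: bool, loop: bool, hops: int) -> str:
    """Pick a recommendation label for one redirecting or broken link.

    final_status: status code of the last page reached
    temp_redirect: True if any redirect in the chain is temporary
    loop: True if the chain loops back on itself
    hops: number of redirects between the link and the final page
    """
    # Assumed priority: the most severe problem wins.
    if loop:
        return "Fix redirect loop, update link"
    if final_status == 404:
        return "Redirect 404, update link"
    if temp_redirect:
        return "Update link, fix temp redirect"
    if hops > 1:
        return "Fix redirect chain, update link"
    return "Update Link"


print(recommend(final_status=200, temp_redirect=False, loop=False, hops=1))
```

Checking the most serious problems first means a link that is both in a chain and ends on a 404 gets flagged as a 404, which matches how we’d want to prioritise.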

Link URL – This is the page being linked to, i.e. the URL that we need to change.

Anchor – The clickable text of the link on the page. For example, a link might use the anchor text “this is anchor text” while its Link URL is http://www.therobinlord.com/how-to-fix-broken-or-redirecting-links/. We include the Anchor because it makes the link easier to find when we’re looking for it on the page.

Final Address – If we go to the link and follow all of the redirects etc., this is the address we end on. If we’re dealing with a normal redirect chain (anything other than a redirect loop or a 404), this is the address we want to update the link to. If it’s a loop or a 404, this won’t help us that much.

Example Source Page – Another way to make it easier to find the link in question. Because we’re often dealing with templated links, they are being linked-to from multiple pages. We don’t need to check all of the pages to see the change we need to make, but by going to an example page we can see what it looks like and what template we need to change.

Linked-page Status Code – The Status Code of the first page linked to.

Number of Linking Pages – This will help you work out how much you can fix with one change. If there are a high number of linking pages then fixing this will have a bigger impact on your site. This column also helps you see if it’s a templated link. If the number is really high, a lot of pages are linking to this page using the exact same anchor text, so it’s probably a templated link and easier to find.

Temp Redirect in Chain – If true, one of the redirects is temporary. We probably want to prioritise fixing that because it’ll have a bigger impact on how authority is passed around our site.

Redirect Loop – If the link goes to a redirect loop then we need to fix that. It’s pretty much the same as a 404.
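
To show how the Final Address, Temp Redirect in Chain and Redirect Loop columns fit together, here is a small sketch that follows a link through a table of redirects. The `redirects` dictionary and the `resolve` function are hypothetical; the notebook works from the Screaming Frog exports rather than a structure like this.

```python
def resolve(link_url, redirects):
    """Follow a link through a redirect map.

    redirects maps a URL to (next_url, status_code).
    Returns (final_address, temp_redirect_in_chain, redirect_loop).
    """
    seen = {link_url}
    current = link_url
    temp = False
    while current in redirects:
        next_url, status = redirects[current]
        # 302 and 307 are treated as temporary redirects here.
        if status in (302, 307):
            temp = True
        # Landing on a URL we have already visited means we are in a loop.
        if next_url in seen:
            return current, temp, True
        seen.add(next_url)
        current = next_url
    return current, temp, False


chain = {"/a": ("/b", 301), "/b": ("/c", 302)}
loop = {"/x": ("/y", 301), "/y": ("/x", 301)}
print(resolve("/a", chain))
print(resolve("/x", loop))
```

When the loop flag comes back true, the first value is just wherever we stopped, which is why the Final Address isn’t much help for loops.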

Example of Redirect Checker links-to-change output
What the tabs mean

Each of the tabs is dedicated to one of the issues in our Recommendation column.

All internal links – All of our data sorted high-to-low by Number of Linking Pages, so we can see everything and prioritise the issues that come up most often across the site.

All other tabs – Each of the other tabs is one of the types of issues we list in our Recommendations column. This is because we probably want to take swifter action for things like 404s or redirect loops than we do for redirect chains.

Checking lego.com’s links

So, we’ve run the process and opened up the broken and redirecting links sheet for lego.com.

The top rows look like this;

Recommendation | Anchor | Number of Linking Pages
Update link | LEGO® LIFE | 3179
Update link | Find a store | 2120
Update link | Dinosaur Fossils | 2110
Update link | Common Questions | 2110
Update link | Liebherr R 9800 Excavator | 2110
Update link | LEGO Catalogue | 2108
Update link | Minibuilds | 2108
Update link | Gift Cards | 1060
Update link, fix redirect chain | Contact Us | 1059

We can assume a few bits of information from this;

  • Lego’s templated issues top out at 3,179 linking pages – so we’re not talking about millions of links here
  • Most of the top issues are going to be a fairly simple link change – they don’t involve fixing redirect loops or 404s (although we could see that crop up in one of the other lists)
  • A few of these templated issues are part of one specific templated block. We can assume this because the Dinosaur Fossils, Common Questions, and Liebherr R 9800 Excavator links all have exactly the same number of linking pages (2,110), and Find a store is within ten of that (2,120). We’ll check this as part of our next step anyway.
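
The “same number of linking pages” check is easy to do by eye on a short list, but here is a quick sketch of it for longer ones. The anchors and counts are copied from the rows above, and `shared_counts` is just a name I’ve made up for this example.

```python
from collections import defaultdict

# Anchor text and Number of Linking Pages, taken from the rows above.
rows = [
    ("LEGO® LIFE", 3179),
    ("Find a store", 2120),
    ("Dinosaur Fossils", 2110),
    ("Common Questions", 2110),
    ("Liebherr R 9800 Excavator", 2110),
    ("LEGO Catalogue", 2108),
    ("Minibuilds", 2108),
    ("Gift Cards", 1060),
    ("Contact Us", 1059),
]


def shared_counts(rows):
    """Group anchors by linking-page count, keeping counts shared by 2+ anchors."""
    groups = defaultdict(list)
    for anchor, pages in rows:
        groups[pages].append(anchor)
    return {pages: anchors for pages, anchors in groups.items() if len(anchors) > 1}


for pages, anchors in shared_counts(rows).items():
    print(pages, anchors)
```

Anchors that share an identical count are good candidates for living in the same template block, though we’d still confirm that on the example page.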

Checking an individual page – Lego Life

Now that we can see the most frequently occurring errors, we can use the example page and anchor text to find where these links are occurring.

Our first link in the list is https://www.lego.com/life. Apparently a link to this page appears on https://www.lego.com/en-gb with the anchor text “LEGO® LIFE”.

Fixing broken links - Lego UK homepage
https://www.lego.com/en-gb/

The first, easiest place for us to check for this link is the navigation menu or the header. If you look right at the top of the page you can see that “LEGO® LIFE” appears right there next to KIDS ZONE.

If we have a Redirect Path plugin installed we can click on that link and confirm that yep – it does redirect from https://www.lego.com/life ==> https://www.lego.com/en-gb/life

Looking at that redirect, based on the fact that I’m checking from the UK, we can assume that this is Lego trying to manage their international sites. The UK page template links to the US version https://www.lego.com/life but as I’m in the UK when I click that link I get redirected through to https://www.lego.com/en-gb/life

This is interesting. Googlebot crawls mostly from US-based IP addresses, so one of two things is likely happening here;

  1. Because Google crawls from the US it is always redirected to US pages so doesn’t see the UK pages (this would be quite a problem)
  2. Lego have made it so that Google doesn’t get redirected based on its country. In which case Google only sees the link going to the US version of this page and in Google’s eyes, lego.com/en-gb/life doesn’t get any of the benefit here.

Our recommendations based on this information would be;

  1. Make sure Google isn’t being redirected to only US sites
  2. Change this link to point to the proper UK page.
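
If we wanted to find every link like this one, rather than clicking through them, we could flag redirects that do nothing except add a country folder. Here is a minimal sketch under the assumption that country folders always look like “en-gb” (two letters, a hyphen, two letters); `locale_added` is a hypothetical helper, not part of the notebook.

```python
import re
from urllib.parse import urlparse

# Assumed shape of a country/language folder, e.g. "en-gb".
LOCALE = re.compile(r"^[a-z]{2}-[a-z]{2}$")


def locale_added(source_url, final_url):
    """Return the locale folder a redirect added to the path, or None."""
    src = urlparse(source_url)
    dst = urlparse(final_url)
    if src.netloc != dst.netloc:
        return None
    parts = dst.path.strip("/").split("/")
    first = parts[0]
    rest = "/" + "/".join(parts[1:])
    # The redirect only counts if dropping the locale gives the source path back.
    if LOCALE.match(first) and rest.rstrip("/") == src.path.rstrip("/"):
        return first
    return None


print(locale_added("https://www.lego.com/life", "https://www.lego.com/en-gb/life"))
```

Any link this picks up on a UK page is a candidate for the second recommendation above: point it straight at the UK URL.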

Checking an individual page – Liebherr excavator

Another example our script picks out is;

/en-gb/dinosaur-fossils-21320 ==> /en-gb/product/dinosaur-fossils-21320

The link has the anchor text “Dinosaur Fossils”. There are 2,110 pages with this redirecting link, and again the homepage is our example. All of that suggests to us that this might be in the nav again.

Lego Dinosaur Fossils Link

As we thought, Dinosaur Fossils is there in the nav. If we click it, we first go to /en-gb/dinosaur-fossils-21320 then we get redirected to /en-gb/product/dinosaur-fossils-21320

We’ve got one easy recommendation to start with here;

  1. Update the nav so that the Dinosaur Fossils page points to /en-gb/product/dinosaur-fossils-21320

But we can push this a bit further. It looks like the URL is exactly the same except /product/ has been added. That looks like another templated change. It looks like individual products, for instance dinosaur-fossils-21320 used to only be inside the country folder, but they were moved inside the /product/ folder. Funnily enough if we look further down our list, we also have;

/en-gb/liebherr-r-9800-excavator-42100 ==> /en-gb/product/liebherr-r-9800-excavator-42100

Because of the JavaScript Lego has running on the site, you don’t actually see these redirects as a user, even when using something like a redirect checker plugin, but our sheet picks up these patterns. So now our recommendations are;

  1. Update the nav so that the Dinosaur Fossils page points to /en-gb/product/dinosaur-fossils-21320
    1. This one change could fix 2,110 links across the site
  2. Look through the nav for any other individual product pages which aren’t within the /product/ folder – update those links to include /product/
    1. This could fix at least 4,200 links across the site.
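
The /product/ pattern above is the kind of thing we can look for across the whole list. As a rough sketch, assuming we have pairs of (linked path, final path), the hypothetical helpers below find redirects where the only change is one extra folder and count how often each folder turns up.

```python
from collections import Counter


def inserted_segment(source_path, final_path):
    """Return the single path segment a redirect added, or None."""
    src = source_path.strip("/").split("/")
    dst = final_path.strip("/").split("/")
    if len(dst) != len(src) + 1:
        return None
    for i in range(len(dst)):
        # Removing segment i from the destination should give the source back.
        if dst[:i] + dst[i + 1:] == src:
            return dst[i]
    return None


def pattern_counts(redirects):
    """Count how often each inserted segment appears across redirects."""
    segments = (inserted_segment(s, f) for s, f in redirects)
    return Counter(seg for seg in segments if seg is not None)


# The two redirects from the examples above.
found = [
    ("/en-gb/dinosaur-fossils-21320", "/en-gb/product/dinosaur-fossils-21320"),
    ("/en-gb/liebherr-r-9800-excavator-42100", "/en-gb/product/liebherr-r-9800-excavator-42100"),
]
print(pattern_counts(found))
```

If one folder name dominates the counts, that points to a site-wide URL change rather than a handful of one-off redirects.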

Getting the notebook

If you’d like to use the script, access it through the window below or at this link. If you think it’s missing something or you want more information about how it works, I’d love to hear from you.
