If you feel like you already have a good understanding of why you should fix broken or redirecting links on your site and just want to get fixing, go to my post here which shares a free Google Colab notebook which will help identify and prioritise problems for you with some easy and dev-readable Excel files.
Otherwise – strap in. Let’s talk about why having redirecting or broken links on your site is a problem and why you should fix it.
Some terminology that will come in useful later
What are templated links?
In short – lots of links across lots of pages, to the same place. Think about your navigation menu or footer. Templated links are often present on pretty much every page, they always have the same content, they are always linking to the same places. Templated links are very useful when you need a page to be accessible from anywhere on your site but it’s also easy to overlook mistakes that can cause you issues.
What are broken links?
A broken link is any link which points to a page which has been deleted and not redirected. That often means a 404 page, named after the status code 404 meaning “not found”.
What are redirect chains?
A redirect chain is one redirect going to another redirect, and so on. So instead of:
page-a ==> page-b
the redirects go like this:
page-a ==> page-b ==> page-c ==> page-d
Now instead of asking for just one page, we’re having to go through three hops to get to the active page. Redirect chains make the usual redirect problems even worse.
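If you want to see the hop count for yourself, here is a minimal Python sketch that walks a redirect map and counts the hops. The `redirects` dictionary and the `follow_chain` helper are made-up names for illustration; in real life the map would come from a crawl of your own site.

```python
def follow_chain(start, redirects):
    """Return every URL visited from start to the final page.

    Assumes the redirect map contains no loops.
    """
    path = [start]
    while path[-1] in redirects:
        path.append(redirects[path[-1]])
    return path


# Hypothetical redirect map: each key redirects to its value.
redirects = {"page-a": "page-b", "page-b": "page-c", "page-c": "page-d"}

path = follow_chain("page-a", redirects)
hops = len(path) - 1  # one hop per redirect followed
print(" ==> ".join(path), f"({hops} hops)")
```

A page that isn't redirected at all comes back as a path of length one, so zero hops.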
What are redirect loops?
This is like a redirect chain but worse. Instead of going:
page-a ==> page-b ==> page-c ==> page-d
it’d be something like:
page-a ==> page-b ==> page-c ==> page-a ==> page-b ==> page-c ==> page-a
And so on until whatever is trying to access the page just gives up. These make redirect problems even worse than redirect chains do.
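A loop is easy to spot in code: keep track of the URLs you've already visited and stop as soon as you see one twice. This is only a sketch, and `find_loop` and the redirect map are hypothetical names rather than part of any particular tool.

```python
def find_loop(start, redirects):
    """Follow redirects from start; return the URLs in the loop, or None."""
    seen = []
    current = start
    while current in redirects:
        if current in seen:
            # We are back somewhere we have already been: that is the loop.
            return seen[seen.index(current):]
        seen.append(current)
        current = redirects[current]
    return None


# Hypothetical redirect map where page-c points back to page-a.
looping = {"page-a": "page-b", "page-b": "page-c", "page-c": "page-a"}
print(find_loop("page-a", looping))
```

For a normal chain that ends on a live page, the function returns `None`.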
What are 302 redirects?
The standard redirect involves your website responding with status code 301 which means – “this page has been permanently moved”. An alternative is status code 302 which means “this page has been temporarily moved”.
So essentially, a 302 redirect is just a redirect that sends a different, weaker message at the same time.
Don’t be fooled by the terminology – if you are redirecting a page and don’t have imminent plans to change it back (like, within the week), 301 is the way to go. If you use a 302 redirect, search engines like Google aren’t as sure what’s going on. They’re thinking “Sure, you tell me the content is in this new place, but you don’t sound very certain of it, so I’m going to keep an eye on the old page, I’ll probably let it compete with the new page in search results and I definitely won’t treat this as you transferring all of the authority from page-a to page-b.”
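If you have a list of URLs and the status codes they returned, picking out the 302s is a one-liner. The sketch below uses Python's built-in `http.HTTPStatus`; the `crawl` list and the `redirect_kind` helper are invented for illustration.

```python
from http import HTTPStatus


def redirect_kind(status_code):
    """Label a status code as a permanent or temporary redirect."""
    if status_code == HTTPStatus.MOVED_PERMANENTLY:  # 301
        return "permanent"
    if status_code == HTTPStatus.FOUND:  # 302
        return "temporary"
    return "not a 301/302 redirect"


# Hypothetical crawl results: (URL, status code returned).
crawl = [("/old-pricing", 301), ("/summer-sale", 302), ("/about", 200)]

# 302s that have been in place for a while are the ones worth questioning.
flagged = [url for url, code in crawl if redirect_kind(code) == "temporary"]
print(flagged)
```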
Why fix internal redirects
Imagine we’re moving our whole blog. So we’re redirecting mysite.com/blog/* to blog.mysite.com/*.
When we redirect a page, we aren’t actually moving a page. All we’re really doing is deleting the old page, and saying to everything which tries to visit it (person or machine) “don’t look here, look in this other place instead”. We don’t really notice it as people but the machines are doing something like this.
Request: page-a
Response: 301 this page has moved permanently to page-b
Request: page-b
Response: 200 here’s page-b
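The same exchange can be simulated without touching a real server. In this sketch the "site" is just a dictionary, and `SITE`, `fetch` and the `max_hops` limit are all assumptions made for the example.

```python
# Hypothetical in-memory site: URL -> (status code, redirect target or None).
SITE = {
    "page-a": (301, "page-b"),
    "page-b": (200, None),
}


def fetch(url, site, max_hops=10):
    """Return the (url, status) pairs seen while following redirects."""
    trace = []
    for _ in range(max_hops + 1):
        # Anything the site does not know about is treated as a 404.
        status, target = site.get(url, (404, None))
        trace.append((url, status))
        if status not in (301, 302):
            break
        url = target
    return trace


for url, status in fetch("page-a", SITE):
    print(f"Request: {url}")
    print(f"Response: {status}")
```

Every entry in the trace is one request the browser or crawler had to make, so the length of the trace is the real cost of getting to the page.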
The first problem – authority
We often talk about search engines like Google trying to understand the internet in terms of authority – why should this site appear for a search rather than that site, even if both are talking about the same topic?
One early way Google used to judge this is links. Well respected, high-value sites tend to get more links than less respected, low-value sites. If you treat every link on the internet as a kind of a vote of confidence for the page it’s linking to, you start to get an idea of what people think is worthy of attention.
Not only do these votes of confidence help a page rank, they also mean that when that page links out to another page its vote of confidence carries more weight.
It kind of makes sense right? If we trust a page, we trust what it says more too. A page can’t pass on all of the clout that other pages have given it, but most of that authority gets split between all of the pages it links out to.
Pages on your site will have links going to them. Even if they aren’t links from other sites, you will have internal links. That means your pages have some votes of confidence that they can use.
That means that this authority kind of flows around your site. Pages like your homepage pass authority to the higher level pages on your site, then it trickles down to the lower pages, but they link up to other pages so the authority can flow back up to the top.
The problem is, if you redirect a page, all of those votes of confidence aren’t for the new, active page – they are for the old page which doesn’t exist any more. So how does Google interpret this? Let’s use the example above.
When we redirect www.mysite.com/blog/post-1 to blog.mysite.com/post-1 we are essentially replacing all of the content of /blog/post-1 with one giant vote for the blog.mysite page.
As we said, a page can’t give away all of its authority, that’s not how a vote of confidence works, so while we preserve a good amount of the authority that page has built up, it’s still not everything. Some of that authority is still tied up in the old page that’s not doing any good any more.
So, with each unnecessary redirect, we are losing those hard-earned votes of confidence which could help this page rank. What’s more, our new page has less authority to help our other pages rank. It basically throws away some of the votes of confidence we could use across our site – we’re hurting this page specifically and our whole site in general.
How do templated links make this worse?
Imagine we have a site with 500 pages (which would be smaller than most) and each of those pages has a footer link to a redirected page. That means that 500 times across our site we’re giving a vote of confidence to a page that doesn’t exist – every page is losing some of the authority it’s trying to pass on and the whole site is losing 500 times as many votes as it would from a single link.
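To see how quickly one templated link adds up, here is a rough sketch that counts how many pages link to each redirected URL. The 500-page `pages` dictionary and the URLs in it are invented for the example.

```python
from collections import Counter


def count_links_to(targets, pages):
    """Count, for each target URL, how many pages link to it."""
    counts = Counter()
    for links in pages.values():
        for link in set(links):  # count each page once per target
            if link in targets:
                counts[link] += 1
    return counts


# Hypothetical 500-page site where every page shares the same footer link.
pages = {f"/page-{i}": ["/", "/old-contact"] for i in range(500)}
pages["/page-0"].append("/old-offer")  # a single one-off link in body copy

redirected = {"/old-contact", "/old-offer"}
print(count_links_to(redirected, pages).most_common())
```

The footer link shows up on all 500 pages while the one-off link shows up once, which is why fixing the template is the better use of your time.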
How do redirect chains make this worse?
We’ve already said that we lose a bit of authority with one redirect. If we redirect again we lose a little bit more, and a little more again with each hop after that. So a chain loses even more authority unnecessarily.
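The way the loss compounds can be shown with a toy calculation. The 0.85 figure below is purely a placeholder borrowed from the damping factor in the original PageRank paper; it is an assumption for illustration, not a measurement of how any search engine treats redirects.

```python
def authority_after(hops, retained_per_hop):
    """Share of the original authority left after a number of redirect hops."""
    return retained_per_hop ** hops


RETAINED = 0.85  # assumed share kept per hop; illustrative only

for hops in range(4):
    share = authority_after(hops, RETAINED)
    print(f"{hops} hops: {share:.1%} of the original authority")
```

Whatever the real number is, the shape is the same: each extra hop takes another slice off what was left after the previous one.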
The second problem – time
The second and more intuitive problem is that redirects take a little bit more time and resources. Instead of having to just ask for one thing – computers, or Google, are having to ask for it, wait to be told it’s the wrong thing, then ask for another thing and wait to be told that’s the right thing.
That probably seems relatively insignificant but these things stack up quickly. Google is trying to see and understand the whole internet. That’s billions and billions of pages, which means they have to be careful with where they spend their time. If every time Google tries to access a page on your site it has to go through multiple steps, that’s all taking away resources Google could be using on pages you actually care about.
What’s more, when users are trying to use your site, everything is going to seem slower because their computer is having to go through these additional hops. That means users are less likely to do what you want (if you want to know why having a slow site is bad, I touch upon that in this Distilled post).
How do templated links make this worse?
As you’d expect, it means users and robots have to deal with these hops more often.
How do redirect chains make this worse?
With each redirect hop it’s taking more and more time and resources to get to the page a person or machine actually wants to access.
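A back-of-the-envelope sketch of the waiting time: every hop is one extra request before the real page can even start loading. The 100 ms round trip is an assumed figure, and the function ignores download and rendering time.

```python
def time_to_page(hops, round_trip_ms):
    """Waiting time in ms: one request per hop plus one for the final page."""
    return (hops + 1) * round_trip_ms


ROUND_TRIP_MS = 100  # assumed round-trip time per request; illustrative only

for hops in (0, 1, 3):
    print(f"{hops} hops: {time_to_page(hops, ROUND_TRIP_MS)} ms before the page responds")
```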
So do I have to get rid of all redirects?!
The key thing to remember here is redirects are a necessary and expected part of the internet. It’s just not practical to get external websites to update their links whenever we change a page so we need a way to make sure users get to the page we want them to. What’s more, Google remembers old pages it has seen so if we don’t redirect those pages it’ll just keep going back to them.
Why fix broken links?
As we said above – links on our sites are a way for our pages to give a vote of confidence to each other. However, a 404 page doesn’t exist at all. 404 means “not found”, so if we give a vote of confidence to something that doesn’t exist then that vote is pretty much wasted.
Again, kind of makes sense right? If we say to Google – “Hey, this thing is great!” and the thing doesn’t exist any more we’ve just wasted our vote.
Similar to redirects – because of the way all of our pages are giving votes of confidence to all of our other pages – every time we link to a 404 page we’re throwing away votes that could be used across the site. We are limiting the strength of all of our pages by a little bit.
Having lots of links to 404 pages is also a Bad Sign for Google. If a site often links to 404 pages it’s more likely to be a site in disrepair and less likely to be a good user experience. Google doesn’t want to send users to a bad site so we’re less likely to appear in search results.
How do templated broken links make this worse?
More lost votes
As we said, every link to a 404 page is us throwing away a vote. A templated link is often a link that is present on every page of our site. Imagine all of the pages on our site have about 20 links on them. If two of our templated links go to a broken 404 page, that means that we’re throwing away 10% of our possible voting power and we’re reducing our site strength by quite a lot.
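The 10% figure is just the broken links divided by the total links on a page. A tiny sketch, with `wasted_share` as a made-up helper name:

```python
def wasted_share(broken_links, links_per_page):
    """Fraction of a page's links that point at broken pages."""
    return broken_links / links_per_page


# The example from the text: 2 broken templated links out of about 20.
print(f"{wasted_share(2, 20):.0%} of each page's votes are wasted")
```

Because the links are templated, that same share is lost on every page that uses the template, not just one.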
Waylaying enthusiastic users
Even if we take a cue from our favourite dictator and ignore all of those lost votes, even if we say we don’t care about Google’s evaluation of our site, this kind of problem could still cause havoc. Say we want a user to buy our product but they want to find out a bit more about it first. If links in our navigation, say, are going to broken pages, the user doesn’t get the information they want, they don’t trust the product, and they don’t buy.
So do I have to fix all 404 pages on my site?
I mean, that would be nice but I am not saying that having any 404 will be the death of your business. If you were running a physical shop and one of your shelves was broken, that’s not going to kill the store, right? If, on the other hand, you were running a shop and half of your shelves were broken, that’s a problem you’ve got to fix pretty quickly.
Don’t believe people who email you saying you have to fix every single broken link on your site or everything will go up in flames, or who tell you that any links on your site which go to broken pages on other sites could make Google penalise you. These people are trying to sell you something.
As ever – all of this comes down to prioritising what is having the biggest impact. You just need to find the patterns of worst offenders.
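One simple way to find the worst offenders is to rank every redirected or broken URL by how many pages link to it. This is only a sketch of the idea and not the notebook mentioned in this post; `worst_offenders`, `pages` and `statuses` are hypothetical names, and the data is made up.

```python
from collections import defaultdict


def worst_offenders(pages, statuses):
    """Rank non-200 link targets by how many pages link to them."""
    linking_pages = defaultdict(set)
    for page, links in pages.items():
        for link in links:
            # URLs with no recorded status are assumed to be fine.
            if statuses.get(link, 200) != 200:
                linking_pages[link].add(page)
    ranked = sorted(linking_pages.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(url, statuses[url], len(sources)) for url, sources in ranked]


# Hypothetical crawl: which pages link where, and what each URL returned.
pages = {
    "/": ["/shop", "/old-about", "/blog/post-1"],
    "/shop": ["/", "/old-about"],
    "/blog/post-1": ["/", "/old-about", "/gone"],
}
statuses = {"/": 200, "/shop": 200, "/blog/post-1": 200,
            "/old-about": 301, "/gone": 404}

for url, status, count in worst_offenders(pages, statuses):
    print(f"{url} ({status}) is linked from {count} page(s)")
```

Anything linked from nearly every page is almost certainly a templated link, and that's where one fix pays off across the whole site.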
What should I do next?
As I said above – we can’t hope to get rid of all redirects or broken links, the trick is to find the worst offenders.
Check out this post I wrote sharing a free notebook which will help you find redirect chains and templated broken links, and prioritise your fixes for you so you can work with your devs to fix the problem.