Creating a Twitter Bot using Google Cloud Functions

Start with a call to action

They told me to start with the most important thing someone should do after reading this post so, like, buy your mum some flowers. Maybe potted ones rather than cut ones. Turns out flowers and mums are both more important than marketing and Twitter bots.

Twitter bots

Some examples of Twitter bots include productivity tools, for instance the Thread Reader App and Read Later Bot make it easier to manage Twitter content.

They can also be a big part of political marketing (or you might say propaganda). It’s a common understanding that many of the accounts online advocating or criticising political points of view are actually automated mouthpieces. Whether these accounts are real people or not – we now understand that it’s easy to create an army to spread your message.


Good Place Word Clouds

Everything's fine

I am a huge fan of the Good Place so I created these Good Place word clouds specific to each of the core “team cockroach”. Zoom in to find phrases or words you recognise from the show.

I’ll follow up with more detail about how these were created, but in short I used scripts from the show: for each character I found the times where they were involved and grabbed the words around those times. Then I used Andreas Mueller’s awesome word cloud script to generate the word clouds. I did tweak the weightings a bit to get the interesting phrases to show up (thanks very much to @nocontextgoodplace on Twitter for inspiration).


How to fix broken or redirecting links

As I said in my post about why we should fix broken or redirecting links – even though broken links and redirects aren’t ideal, we can’t hope to get rid of all redirects or broken links. As with anything in business, we need to prioritise what will have the biggest impact. We need to find the worst offenders.

By the time we’ve finished this post, we will have found just three changes lego.com could make to their site which could;

  • Make sure that Google sees their UK product pages
  • Fix over 9,000 internal redirecting links.

Why Melt (unpivot) is the most powerful function in Pandas

Pandas is a Python library that lets us do Excel-type-stuff. Well, that’s not really giving it the credit it deserves. Pandas is a Python library which makes Excel-type stuff waaaaaaaaaaaaaaaaaay easier.

You might have seen me speak about how Jupyter Notebooks can make our lives easier as marketers (if not – you’ve clearly been missing out on Distilled Searchlove and you should absolutely buy a ticket). A lot of the examples I use are to do with how using Pandas is much much easier than trying to do the same stuff in Excel.

One function I haven’t been able to talk about on-stage is melt. As I said in the title, melt is kind of like unpivot and it is one of the best functions in Pandas because it lets us easily do things that wouldn’t just be harder in Excel – they would be pretty much impossible for anyone who isn’t a pretty advanced Excel user.
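To make "unpivot" a bit more concrete, here is a minimal plain-Python sketch of the reshaping melt performs. The `melt_rows` helper, the column names and the numbers are all made up for illustration; this is not how Pandas implements it, just the shape of the idea.

```python
def melt_rows(rows, id_vars, var_name="variable", value_name="value"):
    """Turn wide rows (one column per measurement) into long rows."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in id_vars:
                continue  # identifier columns are repeated, not unpivoted
            long_row = {key: row[key] for key in id_vars}
            long_row[var_name] = column
            long_row[value_name] = value
            long_rows.append(long_row)
    return long_rows


# Made-up wide data: one row per page, one column per month of sessions.
wide = [
    {"page": "/home", "jan": 120, "feb": 150},
    {"page": "/blog", "jan": 80, "feb": 95},
]

long_format = melt_rows(wide, id_vars=["page"], var_name="month", value_name="sessions")
for row in long_format:
    print(row)
```

With a real DataFrame, `pandas.melt(df, id_vars=["page"], var_name="month", value_name="sessions")` does the same reshaping in one call (the row order differs, because Pandas works column by column).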


Why should I fix my site links?

Photo by Zdeněk Macháček on Unsplash

If you feel like you already have a good understanding of why you should fix broken or redirecting links on your site and just want to get fixing, go to my post here which shares a free Google Colab notebook which will help identify and prioritise problems for you with some easy and dev-readable Excel files.

Otherwise – strap in. Let’s talk about why having redirecting or broken links on your site is a problem and why you should fix it.

Some terminology that will come in useful later

What are templated links?

In short – lots of links across lots of pages, to the same place. Think about your navigation menu or footer. Templated links are often present on pretty much every page, they always have the same content, they are always linking to the same places. Templated links are very useful when you need a page to be accessible from anywhere on your site but it’s also easy to overlook mistakes that can cause you issues.

What are broken links?

A broken link is any link which points to a page which has been deleted and not redirected. That often means a 404 page, named after the status code 404 meaning “not found”.

What are redirect chains?

One redirect going to another redirect etc. etc. So instead of;

page-a ==> page-b

the redirects go like this;

page-a ==> page-b ==> page-c ==> page-d

Now instead of asking for just one page, we’re having to go through three hops to get to the active page. Redirect chains make the usual redirect problems even worse.

What are redirect loops?

This is like a redirect chain but worse. Instead of going;

page-a ==> page-b ==> page-c ==> page-d

It’d be something like;

page-a ==> page-b ==> page-c ==> page-a ==> page-b ==> page-c ==> page-a

And so on until whatever is trying to access the page just gives up. These make redirect problems even worse than redirect chains do.
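If it helps to see chains and loops as code, here is a minimal sketch that follows a made-up redirect map until it reaches a live page, spots a loop, or gives up. The `follow_redirects` helper and the `max_hops` limit are assumptions for illustration; real browsers and crawlers each have their own limits.

```python
def follow_redirects(redirects, start, max_hops=10):
    """Follow a mapping of old URL -> new URL.

    Returns (path, outcome), where outcome is "ok", "loop" or "too many hops".
    """
    path = [start]
    seen = {start}
    current = start
    while current in redirects:
        if len(path) - 1 >= max_hops:
            return path, "too many hops"
        current = redirects[current]
        path.append(current)
        if current in seen:
            return path, "loop"  # we are back somewhere we have already been
        seen.add(current)
    return path, "ok"


chain = {"page-a": "page-b", "page-b": "page-c", "page-c": "page-d"}
loop = {"page-a": "page-b", "page-b": "page-c", "page-c": "page-a"}

chain_path, chain_outcome = follow_redirects(chain, "page-a")
loop_path, loop_outcome = follow_redirects(loop, "page-a")

print(" ==> ".join(chain_path), f"({len(chain_path) - 1} hops, {chain_outcome})")
print(" ==> ".join(loop_path), f"({loop_outcome})")
```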

What are 302 redirects?

The standard redirect involves your website responding with status code 301 which means – “this page has been permanently moved”. An alternative is status code 302 which means “this page has been temporarily moved”.

So essentially, all a 302 redirect is, is a redirect where you send a different, less strong message at the same time.

Don’t be fooled by the terminology – if you are redirecting a page and don’t have imminent plans to change it back (like, within the week), 301 is the way to go. If you use a 302 redirect things like Google aren’t as sure what’s going on. They’re thinking “Sure, you tell me the content is in this new place, but you don’t sound very certain of it, so I’m going to keep an eye on the old page, I’ll probably let it compete with the new page in search results and I definitely won’t treat this as you transferring all of the authority from page-a to page-b.”

Why fix internal redirects

Imagine we’re moving our whole blog. So we’re redirecting mysite.com/blog/* to blog.mysite.com/*.

When we redirect a page, we aren’t actually moving a page. All we’re really doing is deleting the old page, and saying to everything which tries to visit it (person or machine) “don’t look here, look in this other place instead”. We don’t really notice it as people but the machines are doing something like this.

Request: page-a

Response: 301 this page has moved permanently to page-b

Request: page-b

Response: 200 here’s page-b

The first problem – authority

We often talk about search engines, like Google, trying to understand the internet in terms of authority – why should this site appear for a search rather than that site, even if both are talking about the same topic?

One early way Google used to judge this is links. Well respected, high-value sites tend to get more links than less respected, low-value sites. If you treat every link on the internet as a kind of a vote of confidence for the page it’s linking to, you start to get an idea of what people think is worthy of attention.

Not only do these votes of confidence help a page rank, they also mean that when that page links out to another page its vote of confidence bears more weight.

It kind of makes sense right? If we trust a page, we trust what it says more too. A page can’t pass on all of the clout that other pages have given it, but most of that authority gets split between all of the pages it links out to.

Pages on your site will have links going to them. Even if they aren’t links from other sites, you will have internal links. That means your pages have some votes of confidence that they can use.

That means that this authority kind of flows around your site. Pages like your homepage pass authority to the higher level pages on your site, then it trickles down to the lower pages, but they link up to other pages so the authority can flow back up to the top.
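For the curious, here is a rough sketch of that flow, based on the classic published PageRank idea rather than whatever Google actually runs today. The four-page site, the `spread_authority` helper and the 0.85 "damping" value (the share of authority a page passes on through its links) are all assumptions for illustration.

```python
def spread_authority(links, damping=0.85, rounds=50):
    """Repeatedly share each page's score between the pages it links to.

    Assumes every page has at least one outgoing link and that every
    link target is itself a key in `links`.
    """
    pages = list(links)
    score = {page: 1.0 / len(pages) for page in pages}
    for _ in range(rounds):
        # Every page keeps a small baseline score of its own...
        new_score = {page: (1.0 - damping) / len(pages) for page in pages}
        # ...and the rest is split evenly between the pages it links to.
        for page, targets in links.items():
            share = damping * score[page] / len(targets)
            for target in targets:
                new_score[target] += share
        score = new_score
    return score


# Made-up site: every page links back to the homepage.
site = {
    "home": ["category", "about"],
    "category": ["home", "product"],
    "product": ["home", "category"],
    "about": ["home"],
}

scores = spread_authority(site)
for page, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {value:.3f}")
```

The homepage ends up with the biggest share because everything links to it, and the lower pages get what trickles down from there.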

The problem is, if you redirect a page, all of those votes of confidence aren’t for the new, active page – they are for the old page which doesn’t exist any more. So how does Google interpret this? Let’s use the example above.

When we redirect www.mysite.com/blog/post-1 to blog.mysite.com/post-1 we are essentially replacing all of the content of /blog/post-1 with one giant vote for the blog.mysite page.

As we said, a page can’t give away all of its authority, that’s not how a vote of confidence works, so while we preserve a good amount of the authority that page has built up, it’s still not everything. Some of that authority is still tied up in the old page that’s not doing any good any more.

So, with each unnecessary redirect, we are losing those hard-earned votes of confidence which could help this page rank. What’s more our new page has less authority to help our other pages rank. It basically throws away some of the votes of confidence we could use across our site – we’re hurting this page specifically and our whole site in general.

How do templated links make this worse?

Imagine we have a site with 500 pages (which would be smaller than most) and each of those pages has a footer link to a redirected page. That means that 500 times across our site we’re giving a vote of confidence to a page that doesn’t exist – every page is losing some of the authority it’s trying to pass on and the whole site is losing 500x the votes it would be if we were just talking about one link.

How do redirect chains make this worse?

We’ve already said that we lose a bit of authority with one redirect, if we redirect again we lose a little bit more, another one and it’s a little bit more on the hop after that. So we’re losing even more authority unnecessarily.
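As a back-of-the-envelope sketch of how that compounds: if each hop only passes on a fixed share of a vote, the loss multiplies with every extra hop. The 0.85 figure below is a placeholder I've picked for illustration, not a number Google has published.

```python
def authority_retained(hops, passed_per_hop=0.85):
    """Share of the original vote left after a number of redirect hops."""
    return passed_per_hop ** hops


for hops in range(4):
    print(f"{hops} redirect hop(s): {authority_retained(hops):.1%} of the vote left")
```

Whatever the real share is, the shape is the same: a three-hop chain loses more than three times what a single clean redirect does relative to linking straight to the live page.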

The second problem – time

The second and more intuitive problem is that redirects take a little bit more time and resources. Instead of having to just ask for one thing – computers, or Google, are having to ask for it, wait to be told it’s the wrong thing, then ask for another thing and wait to be told that’s the right thing.

That probably seems relatively insignificant but these things stack up quickly. Google is trying to see and understand the whole internet. That’s billions and billions of pages, which means they have to be careful with where they spend their time. If every time Google tries to access a page on your site, it has to go through multiple steps – that’s all taking away resources Google could be using on pages you actually care about.

What’s more, when users are trying to use your site, everything is going to seem slower because their computer is having to go through these additional hops. Which means users are less likely to do what you want (if you want to know why having a slow site is bad, I touch upon that in this Distilled post).

How do templated links make this worse?

As you’d expect, it means that more often users and robots are having to deal with these hops.

How do redirect chains make this worse?

With each redirect hop it’s taking more and more time and resources to get to the page a person or machine actually wants to access.

So do I have to get rid of all redirects?!

The key thing to remember here is redirects are a necessary and expected part of the internet. It’s just not practical to get external websites to update their links whenever we change a page so we need a way to make sure users get to the page we want them to. What’s more, Google remembers old pages it has seen so if we don’t redirect those pages it’ll just keep going back to them.

Why fix broken links?

As we said above – links on our sites are a way for our pages to give a vote of confidence to each other. However, a 404 page doesn’t exist at all (404 means “not found”), so if we give a vote of confidence to something that doesn’t exist then that vote is pretty much wasted.

Again, kind of makes sense right? If we say to Google – “Hey, this thing is great!” and the thing doesn’t exist any more we’ve just wasted our vote.

Similar to redirects – because of the way all of our pages are giving votes of confidence to all of our other pages – every time we link to a 404 page we’re throwing away votes that could be used across the site. We are limiting the strength of all of our pages by a little bit.

Having lots of links to 404 pages is also a Bad Sign for Google. If a site often links to 404 pages it’s more likely to be a site in disrepair and less likely to be a good user experience. Google doesn’t want to send users to a bad site so we’re less likely to appear in search results.

How do templated broken links make this worse?

More lost votes

As we said, every link to a 404 page is us throwing away a vote. A templated link is often a link that is present on every page of our site. Imagine all of the pages on our site have about 20 links on them. If two of our templated links go to a broken 404 page, that means that we’re throwing away 10% of our possible voting power and we’re reducing our site strength by quite a lot.

Waylaying enthusiastic users

Even if we take a cue from our favourite dictator and ignore all of those lost votes, even if we say we don’t care about Google’s evaluation of our site, this kind of problem could still cause havoc. Say we want a user to buy our product but they want to find out a bit more about it first. If links in our navigation, say, are going to broken pages, the user doesn’t get the information they want, they don’t trust the product, and they don’t buy.

So do I have to fix all 404 pages on my site?

I mean, that would be nice but I am not saying that having any 404 will be the death of your business. If you were running a physical shop and one of your shelves was broken that’s not going to kill the store right? If, on the other hand, you were running a shop and half of your shelves were broken that’s a problem you’ve got to fix pretty quickly.

Don’t believe people who email you saying you have to fix every single broken link on your site or everything will go up in flames, or who tell you that any links on your site which go to broken pages on other sites could make Google penalise you. These people are trying to sell you something.

As ever – all of this comes down to prioritising what is having the biggest impact. You just need to find the patterns of worst offenders.

What should I do next?

As I said above – we can’t hope to get rid of all redirects or broken links, the trick is to find the worst offenders.

Check out this post I wrote sharing a free notebook which will help you find redirect chains and templated broken links, and prioritise your fixes for you so you can work with your devs to fix the problem.

How to Do Change Detection with Screaming Frog and Google Sheets

I made a Google Sheet that does change detection for you based on two Screaming Frog crawls. I’ll tell you why that’s important. 

Two problems frequently come up for SEOs, regardless of if we’re in-house or external.

  1. Knowing when someone else has made key changes to the site
  2. Keeping a record of specific changes we made to the site, and when.

Both can sound trivial, but unnoticed changes to a site can undo months of hard work and, particularly with large e-commerce sites, it’s often necessary to update internal links, on-page text, and external plugins in search of the best possible performance. That doesn’t just go for SEO, it applies just as much to CRO and Dev teams.

Keeping a record of even just our changes can be really time-consuming but without it, we often have to rely on just remembering what we did when, particularly when we see a pattern of changing traffic or rankings and want to know what might have caused it. 

These things are people problems. When we can’t rely on other teams to work with us on their planned changes, that needs to be fixed at a team level. When we don’t have a system for listing the changes we make it’s understandable, particularly for smaller keyword or linking tweaks, but if we compare ourselves to a Dev team for example – a record of changes is exactly the kind of thing we’d expect them to just include in their process. At the end of the day, when we don’t keep track of what we’re doing that’s because we either don’t have the time or don’t have the commitment to a process.

We shouldn’t really be trying to fix people problems with tools. That said, people problems are hard. Sometimes you just need a way of staying on top of things while you fight all the good fights. That’s exactly what this is for. 

This is a way to highlight the changes other teams have made to key pages, so you can quickly take action if needed, and to keep track of what you’ve done in case you need to undo it.

As a completely separate use-case, you can also use this sheet to check for differences between different versions of your site. Say, for the sake of argument, that you need to know the difference between the mobile and desktop versions of your site, or your site with and without JavaScript rendering, or even the differences between your live site and a private developer version you’re about to release. There are tools that offer change detection and cover some of the functions of this sheet, but I really like the flexibility this offers to check for changes between versions as well as over time.

What sites is this good for?

This sheet is for anyone who needs an idea of what is changing on a fairly large number of pages but can’t afford to pay for big, expensive change detection systems. It’ll work its way through around 1,000 key pages. 

That said, 1,000 key pages stretches further than you would think. For many small sites, that’ll more than cover all the pages you care about and even larger eCommerce sites get the vast majority of their ranking potential through a smaller number of category pages. You would be surprised how big a site can get before more than 1,000 category pages are needed.

That 1,000 URL limit is a guideline. This sheet can probably stretch a bit further than that, it’s just going to start taking quite a while for it to process all of the formulas.

So what changes does it detect?

This Google Sheet looks at your “new crawl” and “old crawl” data and gives you tabs for each of the following;

  • Newly found pages – any URL in the new crawl that isn’t in the old crawl
  • Newly lost pages – any URL in the old crawl that isn’t in the new crawl
  • Indexation changes – i.e. Any URL which is now canonicalised or was noindexed
  • Status code changes – i.e. Any URL which was redirected but is now code 200
  • URL-level Title Tag or Meta Description changes
  • URL-level H1 or H2 changes
  • Any keywords that are newly added or missing sitewide.
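To give a feel for the comparisons behind those tabs, here is a rough Python sketch of the same idea. It is not the sheet's actual formulas: the `compare_crawls` helper, the field names and the crawl data are all made up for illustration.

```python
def compare_crawls(old_crawl, new_crawl, fields=("status", "title", "h1")):
    """Compare two crawls, each a dict of URL -> dict of page details."""
    old_urls, new_urls = set(old_crawl), set(new_crawl)
    changes = {
        "newly_found": sorted(new_urls - old_urls),
        "newly_lost": sorted(old_urls - new_urls),
        "field_changes": [],
    }
    # For pages present in both crawls, report any field that differs.
    for url in sorted(old_urls & new_urls):
        for field in fields:
            before = old_crawl[url].get(field)
            after = new_crawl[url].get(field)
            if before != after:
                changes["field_changes"].append((url, field, before, after))
    return changes


# Made-up crawl data.
old_crawl = {
    "/": {"status": 200, "title": "Acme Shoes", "h1": "Welcome"},
    "/boots": {"status": 301, "title": "", "h1": ""},
    "/hats": {"status": 200, "title": "Hats", "h1": "Hats"},
}
new_crawl = {
    "/": {"status": 200, "title": "Acme Shoes and Boots", "h1": "Welcome"},
    "/boots": {"status": 200, "title": "Boots", "h1": "Boots"},
    "/sandals": {"status": 200, "title": "Sandals", "h1": "Sandals"},
}

changes = compare_crawls(old_crawl, new_crawl)
print("Newly found:", changes["newly_found"])
print("Newly lost:", changes["newly_lost"])
for url, field, before, after in changes["field_changes"]:
    print(f"{url}: {field} changed from {before!r} to {after!r}")
```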

What’s that about keyword change detection?

On many sites, we’re targeting keywords in multiple places at a time. Often we would like to have a clear idea of exactly what we’re targeting where but that’s not always possible.

The thing is, as we said, your pages keep changing – you keep changing them. When we update titles, meta descriptions and H1s we’re not checking every page on the site to confirm our keyword coverage. It’s quite easy to miss that we are removing some, middlingly important, keyword from the site completely. 

Thanks to a custom function, the Google sheet splits apart all of your title tags, meta descriptions, and H#s into their component words and finds any that, as of the last crawl, have either been newly added, or removed from the site completely.

It then looks the freshly removed words up against Search Console data to find all the searches you were getting clicks from before, to give you an idea of what you might be missing out on now.

The fact that it’s checking across all your pages means you don’t end up with a bunch of stopwords in the list (stopwords being; it, and, but, then etc.) and you don’t have to worry about branded terms being pulled through either – it’s very unlikely that you’ll completely remove your brand name from all of your title tags and meta descriptions by accident, and if you do that’s probably something you’d want to know about.
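The sitewide keyword check boils down to comparing two sets of words. Here is a minimal Python sketch of that logic; the sheet itself uses a custom Google Sheets function, so treat the helper names, the tokenising rule and the example titles below as assumptions.

```python
import re


def sitewide_words(texts):
    """Collect every distinct lowercase word used across a list of texts."""
    words = set()
    for text in texts:
        words.update(re.findall(r"[a-z0-9']+", text.lower()))
    return words


def keyword_changes(old_texts, new_texts):
    """Return (newly added words, completely removed words) across the site."""
    old_words = sitewide_words(old_texts)
    new_words = sitewide_words(new_texts)
    return sorted(new_words - old_words), sorted(old_words - new_words)


# Made-up title tags from an old and a new crawl.
old_titles = ["Red Shoes and Boots | Acme", "Waterproof Hiking Boots | Acme"]
new_titles = ["Red Shoes and Boots | Acme", "Hiking Boots for Winter | Acme"]

added, removed = keyword_changes(old_titles, new_titles)
print("Newly added:", added)
print("Removed sitewide:", removed)
```

Because "and", "boots" and "acme" still appear somewhere on the site, they never show up in either list, which is the same reason stopwords and brand terms drop out of the sheet.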

How do I use it?

Start by accessing a copy of this Google Sheet so you can edit it. There are step-by-step instructions in the first tab but broadly all you need to do is;

  1. Run a Screaming Frog crawl of all the pages you want to detect changes on
  2. Wait a bit (like a couple of weeks) or crawl the mobile, JavaScript, or dev version right away for comparison
  3. Run another SF crawl of the pages you want to detect changes on
  4. Export the internal_all report for both crawls and paste them into the “old crawl” and “new crawl” tabs respectively
  5. Wait a bit (like 30 minutes)
  6. Check the results tabs for changes
  7. (Optional) Import Search Console data to give “value lost” information for keywords you removed.

How to Check Your Site Speed: 5 Things You Need to Know About the Google User Experience Report

This is a copy of a post at distilled.net and is canonicalised there.

You’ve done your keyword research, your site architecture is clear and easy to navigate, and you’re giving users really obvious signals about how and why they should convert. But for some reason, conversion rates are the lowest they’ve ever been, and your rankings in Google are getting worse and worse.

You have two things in the back of your mind. First, recently a customer told your support team that the site was very slow to load. Second, Google has said that it is using site speed as part of how rankings are calculated.

It’s a common issue, and one of the biggest problems about site speed is it is so hard to prove it’s making the difference. We often have little-to-no power to impact site speed (apart from sacrificing those juicy tracking snippets and all that content we fought so hard to add in the first place). Even worse – some fundamental speed improvements can be a huge undertaking, regardless of the size of your dev team, so you need a really strong case to get changes made.

Sure, Google has the site speed impact calculator which gives an estimate of how much revenue you could be losing for loading more slowly, and if that gives you enough to make your case – great! Crack on. Chances are, though, that isn’t enough. A person could raise all kinds of objections, for instance;

  1. That’s not real-world data
    1. That tool is trying to access the site from one place in the world, our users live elsewhere so it will load faster for them
    2. We have no idea how the tool is trying to load our site, our users are using browsers to access our content, they will see different behaviour
  2. That tool doesn’t know our industry
  3. The site seems pretty fast to me
  4. The ranking/conversion/money problems started over the last few months – there’s no evidence that site speed got worse over that time.

Tools like webpagetest.org are fantastic but are usually constrained to accessing your site from a handful of locations

Pretty much any site speed checker will run into some combination of the above objections. Say we use webpagetest.org (which wouldn’t be a bad choice), when we give it a url, an automated system accesses our site, tests how long it takes to load, and reports to us on that. As I say, not a bad choice but it’s very hard to test accessing our site from everywhere our users are, using the browsers they are using, getting historic data that was recording even when everything was hunky-dory and site speed was far from our minds, and getting comparable data for our competitors.

Or is it?

Enter the Chrome User Experience (CRUX) report

In October 2017 Google released the Chrome User Experience report. The clue is in the name – this is anonymised domain-by-domain, country-by-country site speed data they have been recording through real-life Google Chrome users since October 2017. The data only includes records from Chrome users who have opted into syncing browser history and have usage statistic reporting enabled, however many will have this on by default (see Google post). So this resource offers you real-world data on how fast your site is.

That brings us to the first thing you should know about the CRUX report.

1. What site speed data does the Chrome User Experience report contain?

In the simplest terms, the CRUX report gives recordings of how long it took your webpages to load. But loading isn’t on-off, even if you’re not familiar with web development, you will have noticed that when you ask for a web page, it thinks a bit, some of the content appears, maybe the page shuffles around a bit and eventually everything falls into place.

Example of a graph showing performance for a site across different metrics. Read on to understand the data and why it’s presented this way.

There are loads of reasons that different parts of that process could be slower, which means that getting recordings for different page load milestones can help us work out what needs work.

Google’s Chrome User Experience report gives readings for a few important stages of webpage load. They have given definitions here but I’ve also written some out below;

  • First Input Delay
    • This is more experimental, it’s the length of time between a user clicking a button and the site registering the click
    • If this is slow the user might think the screen is frozen
  • First Paint
    • The first time anything is loaded on the page, if this is slow the user will be left looking at a blank screen
  • First Contentful Paint
    • Similar to first paint, this is the first time any user-visible content is loaded onto the screen (i.e. text or images).
    • As with First Paint, if this is slow the user will be waiting, looking at a blank screen
  • DOM Content Loaded
    • This is when all the html has been loaded. According to Google, it doesn’t include CSS and all images but by-and-large once you reach this point, the page should be usable, it’s quite an important milestone.
    • If this is slow the user will probably be waiting for content to appear on the page, piece by piece.
  • Onload
    • This is the last milestone and potentially a bit misleading. A page hits Onload when all the initial content has finished loading, which could lead you to believe users will be waiting for Onload. However, many web pages can be quite operational, as the Emperor would say, before Onload. Users might not even notice that the page hasn’t reached Onload.
    • To what extent Onload is a factor in Google ranking calculations is another question but in terms of User Experience I would prioritise the milestones before this.

All of that data is broken down by;

  • Domain (called ‘origin’)
  • Country
  • Device – desktop, tablet, mobile (called ‘client’)
  • Connection speed

So for example, you could see data for just visitors to your site, from Korea, on desktop, with a slow connection speed.

2. How can I access the Chrome User Experience report?

There are two main ways you can access Google’s Chrome user site speed data. The way I strongly recommend is getting it out using BigQuery, either by yourself or with the help of a responsible adult.

DO USE BIGQUERY

If you don’t know what BigQuery is, it’s a way of storing and accessing huge sets of data. You will need to use SQL to get the data out but that doesn’t mean you need to be able to write SQL. This tutorial from Paul Calvano is phenomenal and comes with a bunch of copy-paste code you can use to get some results. When you’re using BigQuery, you’ll ask for certain data, for instance, “give me how fast my domain and these two competitors reach First Contentful Paint”. Then you should be able to save that straight to Google Sheets or a csv file to play around with (also well demonstrated by Paul).
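To give a flavour of what one of those requests looks like, here is a small Python sketch that just builds the SQL text for a First Contentful Paint comparison. The table and field names reflect my understanding of the public dataset's layout (check the current schema before relying on them), and the `build_fcp_query` helper and example.com domains are placeholders. It only builds a string, so you would still paste the result into BigQuery yourself.

```python
def build_fcp_query(origins, dataset="all", yyyymm="201809"):
    """Build a query for the First Contentful Paint histogram of some origins.

    `dataset` is assumed to be "all" or a country dataset such as "country_gb".
    Only use this with origins you typed yourself: it does no escaping.
    """
    origin_list = ", ".join(f"'{origin}'" for origin in origins)
    return f"""
SELECT
  origin,
  bin.start AS start_ms,
  SUM(bin.density) AS density
FROM
  `chrome-ux-report.{dataset}.{yyyymm}`,
  UNNEST(first_contentful_paint.histogram.bin) AS bin
WHERE
  origin IN ({origin_list})
GROUP BY
  origin, start_ms
ORDER BY
  origin, start_ms
""".strip()


query = build_fcp_query(
    ["https://www.example.com", "https://www.example.org"],
    dataset="country_gb",
)
print(query)
```

Origins include the protocol, so the http and https versions of a domain are treated as separate entries.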

DO NOT USE THE PREBUILT DATA STUDIO DASHBOARD

The other, easier option, which I actually recommend against is the CRUX Data Studio dashboard. On the surface, this is a fantastic way to get site speed data over time. Unfortunately, there are a couple key gotchas for this dashboard which we need to watch out for. As you can see in the screenshot below, the dashboard will give you a readout of how often your site was Fast, Average, or Slow to reach each loading point. That is actually a pretty effective way to display the data over time for a quick benchmark of performance. One thing to watch out for with Fast, Average, and Slow is that the description of the thresholds for each isn’t quite right.

If you compare the percentages of Fast, Average, and Slow in that report with the data direct from BigQuery they don’t line up. It’s an understandable documentation slip but please don’t use those numbers without checking them. I’ve chatted with the team and submitted a bug report on the GitHub for this tool. I’ve also listed the true definitions below, in case you want to use Google’s report despite the compromises, or use the Fast, Average, Slow categorisations in the reports you create (as I say, it’s a good way to present the data). The link to generate one of these reports is g.co/chromeuxdash.

Another issue is that it uses the “all” dataset – meaning data from every country in the world. That means data from US users is going to be influenced by data from Australian users. It’s an understandable choice given the fact that this report is free, easily generated, and probably took a bunch of time to put together, but it’s taking us further away from that real-world data we were looking for. We can be certain that internet speeds in different countries will vary quite a lot (for instance South Korea is well known for having very fast internet speeds) but also that expectations of performance could vary by country as well. You don’t care if your site speed looks better than your competitor because you’re combining countries in a convenient way, you care if your site is fast enough to make you money. By accessing the report through BigQuery we can select data from just the country we’re interested in and get a more accurate view.

The final big problem with the Data Studio dashboard is it lumps desktop results in with mobile and tablet. That means that even looking at one site over time, it could look like your site speed has taken a major hit one month just because you happened to have more users on a slower connection that month. It doesn’t matter whether desktop users tend to load your pages faster than mobile, or vice versa – if your site speed dashboard can make it look like your site speed is drastically better or worse because you’ve started a Facebook advertising campaign, that’s not a useful dashboard.

The problems get even worse if you’re trying to compare two domains using this dashboard – one might naturally have more mobile traffic than the other, for example. It’s not a direct comparison and could actually be quite misleading. I’ve included a solution to this in the section below, but it will only work if you’re accessing the data with BigQuery.

Wondering why the Data Studio dashboard reports % of Fast, Average, and Slow, rather than just how long it takes your site to reach a certain load point? Read the next section!

3. Why doesn’t the CRUX report give me one number for load times?

This is important – your website does not have one amount of time that it takes to load a page. I’m not talking about the difference between First Paint or DOM Content Loaded, those numbers will of course be different. I’m talking about the differences within each metric every single time someone accesses a page.

It could take 3 seconds for someone in Tallahassee to reach DOM Content Loaded, 2 seconds for someone in London. Then another person in London loads the page on a different connection type, DOM Content Loaded could take 1.5 seconds. Then another person in London loads the page when the server is under more stress, it takes 4 seconds. The amount of time it takes to load a page looks less like this;

Median result from webpagetest.org

And more like this;

Distribution of load times for different page load milestones

That chart shows a distribution of load times. Looking at that graph, you could say that 95% of the time the site reaches DOM Content Loaded in under 8 seconds. Alternatively, you could look at the peak and say it most commonly loads in around 1.7 seconds. You could also, for example, spot a strange peak at around 5 seconds and realise that something is intermittently going wrong, which means the site sometimes takes much longer to load.
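If it helps to see those three readings of a distribution side by side, here is a minimal Python sketch. The histogram values are invented for illustration (they are not from the chart or from any real site), and the function names are my own rather than anything from the CRUX tooling.

```python
# Hypothetical histogram for one milestone:
# (bin start in seconds, share of page loads that fell in that bin).
# These numbers are made up for illustration only.
bins = [
    (0.5, 0.10),
    (1.0, 0.25),
    (1.5, 0.30),
    (2.0, 0.15),
    (3.0, 0.08),
    (5.0, 0.09),
    (8.0, 0.03),
]

def most_common_bin(bins):
    """Return the start of the bin with the largest share (the main peak)."""
    return max(bins, key=lambda b: b[1])[0]

def share_before(bins, seconds):
    """Return the share of loads in bins that start before `seconds`."""
    return sum(share for start, share in bins if start < seconds)

def secondary_peaks(bins):
    """Return starts of bins that beat both neighbours, excluding the main peak."""
    main = most_common_bin(bins)
    peaks = []
    for i in range(1, len(bins) - 1):
        start, share = bins[i]
        if start != main and share > bins[i - 1][1] and share > bins[i + 1][1]:
            peaks.append(start)
    return peaks

print(most_common_bin(bins))    # where the site most commonly lands
print(share_before(bins, 8.0))  # how much of the traffic is under 8 seconds
print(secondary_peaks(bins))    # any odd bumps worth investigating
```

Each function answers a different question about the same data, which is why a single "load time" number loses so much.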

So you see, saying “our site loads in X seconds, it used to load in Y seconds” could be useful when you’re trying to deliver a clear number to someone who doesn’t have time to understand the finer points, but it’s important for you to understand that performance isn’t constant and your site is being judged by what it tends to do, not what it does under sterile testing conditions.

4. What limitations are there in the Chrome User Experience report?

This data is fantastic (in case you hadn’t picked up before, I’m all for it) but there are certain limitations you need to bear in mind.

No raw numbers

The Chrome User Experience report will give us data on any domain contained in the data set. You don’t have to prove you own the site to look it up. That is fantastic data, but it’s also quite understandable that they can’t get away with giving actual numbers. If they did, it would take approximately 2 seconds for an SEO to sum all the numbers together and start getting monthly traffic estimates for all of their competitors.

As a result, all of the data comes as a proportion of the month’s total, expressed as a decimal. A good sense check when you’re working with this data is that all of your categories should add up to 1 (or 100%), unless you’re deliberately ignoring some of the data and know the caveats.
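That sense check is easy to automate. Here is a minimal Python sketch, assuming you have already pulled the shares for one domain and one month into a plain list of decimals; the function name and the tolerance are my own choices, not part of the report.

```python
import math

def shares_sum_to_one(shares, tolerance=1e-6):
    """Return True if the decimal shares add up to 1, allowing for rounding error."""
    return math.isclose(sum(shares), 1.0, abs_tol=tolerance)

# Hypothetical shares for every category of one domain in one month.
complete = [0.10, 0.20, 0.50, 0.20]
# The same data after accidentally filtering out some rows.
filtered = [0.10, 0.20]

print(shares_sum_to_one(complete))
print(shares_sum_to_one(filtered))
```

If the check fails and you didn’t mean to drop anything, a filter or a join has probably removed some rows.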

Domain-level data only

The data available from BigQuery is domain-level only. We can’t break it down page by page, which means we can’t find the individual pages that load particularly slowly. Once you have confirmed you might have a problem, you could use a tool like Sitebulb to test page load times en masse to get an idea of which pages on your site are the worst culprits.

No data at all when there isn’t much data

There will be some sites which don’t appear in some of the territory data sets, or at all. That’s because Google hasn’t added their data to the dataset, potentially because they don’t get enough traffic.

Losing data for the worst load times

This data set is unlikely to be effective at telling you about very long load times. If you send a tool like webpagetest.org to a page on your site, it’ll sit and wait until that page has totally finished loading, then it’ll tell you what happened.

When a user accesses a page on your site, there are all kinds of reasons they might not let it load fully. They might see the button they want early on and click it before much else has loaded, or if the page is taking a very long time they might give up altogether.

This means that the CRUX data is a bit unbalanced – the further we look along the “load time” axis, the less likely it is to include representative data. Fortunately, it’s quite unlikely that your site will return mostly fast load times and then a bunch of very slow ones. If performance is bad, the whole distribution will likely shift towards the bad end of the scale.

The team at Google have confirmed that if a user doesn’t reach a milestone at all (for instance Onload), the reading for that milestone will be thrown out, but they won’t throw out the readings for every milestone in that load. So, for example, if the user clicks away before Onload, Onload won’t be recorded at all, but if they have reached DOM Content Loaded, that will be recorded.

Combining stats for different devices

As I mentioned above, one problem with the CRUX report is that all of the data is reported as a percentage of all requests.

So for instance, it might report that 10% of requests reached First Paint in 0.1 seconds. The problem is that response times are likely to differ between desktop and mobile – different connection speeds, processor power, probably even different content on the page. But desktop and mobile are lumped together for each domain and each month, so a difference in the proportion of mobile users between domains or between months can make site speed look better when it’s actually worse, or vice versa.

This is a problem when we’re accessing the data through BigQuery, as much as it is if we use the auto-generated Data Studio report, but there’s a solution if we’re working with the BigQuery data. This can be a bit of a noodle-boiler so let’s look at a table.

| Device | Response time (seconds) | % of total |
| --- | --- | --- |
| Phone | 0.1 | 10 |
| Desktop | 0.1 | 20 |
| Phone | 0.2 | 50 |
| Desktop | 0.2 | 20 |

In the data above, 10% of total responses were for mobile, and returned a response in 0.1 seconds. 20% of responses were on desktop and returned a response in 0.1 seconds.

If we summed that all together, we would say 30% of the time, our site gave a response in 0.1 seconds. But that’s thrown off by the fact that we’re combining desktop and mobile which will perform differently. Say we decide we are only going to look at desktop responses. If we just remove the mobile data (below), we see that, on desktop, we’re equally likely to give a response at 0.1 and at 0.2 seconds. So actually, for desktop users we have a 50/50 chance. Quite different to the 30% we got when combining the two.

| Device | Response time (seconds) | % of total |
| --- | --- | --- |
| Desktop | 0.1 | 20 |
| Desktop | 0.2 | 20 |


Fortunately, this sense check also provides our solution: we need to calculate each of these percentages as a proportion of the overall volume for that device. While it’s fiddly and a bit mind-bending, it’s quite achievable. Here are the steps;

  1. Get all the data for the domain, for the month, including all devices.
  2. Sum together the total % of responses for each device. If you’re doing this in Excel or Google Sheets, a pivot table will do this for you just fine.
  3. For each row of your original data, divide the % of total by the total for that device, as in the example below.

Percent by device

| Device | % of total |
| --- | --- |
| Desktop | 40 |
| Phone | 60 |

Original data with adjusted volume

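| Device | Response time (seconds) | % of total | Device % (from table above) | Adjusted % of total |
| --- | --- | --- | --- | --- |
| Phone | 0.1 | 10 | 60 | 10% / 60% = 16.7% |
| Desktop | 0.1 | 20 | 40 | 20% / 40% = 50% |
| Phone | 0.2 | 50 | 60 | 50% / 60% = 83.3% |
| Desktop | 0.2 | 20 | 40 | 20% / 40% = 50% |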
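If you’d rather not do this by hand in a spreadsheet, the same three steps can be sketched in a few lines of Python. This is only an illustration using the example figures from the tables above; the tuple layout and the function name are my own assumptions, not anything BigQuery returns in this shape.

```python
from collections import defaultdict

# Step 1: all the data for the domain and month, including all devices.
# Each row is (device, response time in seconds, % of total).
rows = [
    ("Phone", 0.1, 10),
    ("Desktop", 0.1, 20),
    ("Phone", 0.2, 50),
    ("Desktop", 0.2, 20),
]

def adjust_by_device(rows):
    """Re-express each row's % of total as a % of that device's own total."""
    # Step 2: sum the % of total for each device.
    device_totals = defaultdict(float)
    for device, _, pct in rows:
        device_totals[device] += pct

    # Step 3: divide each row's % of total by the total for its device.
    adjusted = []
    for device, seconds, pct in rows:
        adjusted_pct = round(100 * pct / device_totals[device], 1)
        adjusted.append((device, seconds, pct, adjusted_pct))
    return adjusted

for row in adjust_by_device(rows):
    print(row)
```

After the adjustment, the percentages for each device add up to 100 on their own, so you can compare desktop with desktop and phone with phone regardless of how the traffic mix changes.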

5. How should I present Chrome User Experience site speed data?

Because none of the milestones in the Chrome User Experience report have one number as an answer, it can be a challenge to visualise more than a small cross section of the data. Here are some visualisation types that I’ve found useful.

% of responses within “Fast”, “Average”, and “Slow” thresholds

As I mentioned above, the CRUX team have hit on a good way of displaying performance for these milestones over time. The automatic Data Studio dashboard shows the proportion of each metric over time, which gives you a way to see whether a slowdown is a result of being Average or Slow more often, for example. Trying to visualise more than one of the milestones on one graph becomes a bit messy, so I’ve found myself splitting out Fast and Average so I can chart multiple milestones on one graph.

In the graph above, it looks like there isn’t a line for First Paint, but that’s because its data is almost identical to First Contentful Paint.

I’ve also used the Fast, Average, and Slow buckets to compare a few different sites during the same time period, to get a competitive overview.

Comparing competitors’ “Fast” responses by metric

An alternative, which Paul Calvano demonstrates so well, is the histogram. Histograms help you see how distributions break down. The Fast, Average, and Slow bandings can hide some sins, in that movement within those bands will still impact user experience. Histograms can also give you an idea of where you might be falling down in comparison to others or to your past performance, and could help you identify things like inconsistent site performance. It can be difficult to understand a graph with more than a couple of time periods or domains on it at the same time, though.

I’m sure there are many other (perhaps better) ways to display this data so feel free to have a play around. The main thing to bear in mind is that there are so many facets to this data it’s necessary to simplify it in some way, otherwise we just won’t be able to make sense of it on a graph.

What do you think?

Hopefully, this post gives you some ideas about how you could use the Chrome User Experience report to identify whether you should improve your site speed. Do you have any thoughts? Anything you think I’ve missed? Let me know in the comments!

If this has inspired you to dig into your site speed page-by-page, my colleague Meagan Sievers has written a post explaining how to use the Google Page Speed API and Google Sheets to bulk test pages. Happy testing.

Bonus – what are the actual thresholds in the CRUX Data Studio report?

As mentioned above, the thresholds listed in the CRUX Data Studio report aren’t 100% correct. I have submitted a GitHub issue, but here are the updated thresholds.

| Metric | Listed definition | Actual time |
| --- | --- | --- |
| FCP Fast | X < 1 second | X < 1 second |
| FCP Average | 1 < X < 3 | 1 < X < 2.5 |
| FCP Slow | X >= 3 seconds | X >= 2.5 seconds |
| First Paint Fast | X < 1 second | X < 1 second |
| First Paint Average | 1 < X < 3 | 1 < X < 2.5 |
| First Paint Slow | X >= 3 seconds | X >= 2.5 |
| First Input Delay Fast | X < 100 ms | X < 50 ms |
| First Input Delay Average | 100 ms < X < 1 | 50 ms < X < 250 ms |
| First Input Delay Slow | X > 1 | X > 250 ms |
| DOM Content Load Fast | X < 1 | X < 1.5 |
| DOM Content Load Average | 1 < X < 3 | 1.5 < X < 3.5 |
| DOM Content Load Slow | X > 3 | X > 3.5 |
| Onload Fast | X < 1 | X < 2.5 |
| Onload Average | 1 < X < 3 | 2.5 < X < 6.5 |
| Onload Slow | X > 3 | X > 6.5 |
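If you want to apply the “Actual time” thresholds to your own BigQuery data, here is a minimal Python sketch. It rests on a few assumptions of mine: values without a unit in the table are seconds, the First Input Delay values are milliseconds (converted to seconds here), and a reading that lands exactly on a boundary goes into the slower bucket, since the table doesn’t say what happens there. The dictionary and function names are hypothetical.

```python
# (upper limit for Fast, lower limit for Slow), all in seconds.
# Taken from the "Actual time" column above; units are assumed as described.
THRESHOLDS = {
    "First Contentful Paint": (1.0, 2.5),
    "First Paint": (1.0, 2.5),
    "First Input Delay": (0.05, 0.25),  # 50 ms and 250 ms
    "DOM Content Loaded": (1.5, 3.5),
    "Onload": (2.5, 6.5),
}

def classify(metric, seconds):
    """Return "Fast", "Average", or "Slow" for a reading of the given metric."""
    fast_limit, slow_limit = THRESHOLDS[metric]
    if seconds < fast_limit:
        return "Fast"
    # Assumption: a reading exactly on a boundary falls into the slower bucket.
    if seconds < slow_limit:
        return "Average"
    return "Slow"

print(classify("Onload", 2.0))
print(classify("First Input Delay", 0.1))
print(classify("DOM Content Loaded", 4.0))
```

Keeping the thresholds in one place makes it easy to swap them out if the report’s definitions change again.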