Good Place Word Clouds

Everything's fine

I am a huge fan of the Good Place so I created these Good Place word clouds specific to each of the core “team cockroach”. Zoom in to find phrases or words you recognise from the show.

I’ll follow up with more detail about how these were created but I used scripts from the show, for each character I found the times where they were involved and grabbed the words around those times. Then I used Andreas Mueller’s awesome word cloud script to generate the word clouds. I did tweak the weightings a bit to get the interesting phrases to show up (thanks very much to @nocontextgoodplace on Twitter for inspiration).

Continue reading “Good Place Word Clouds”

How to fix broken or redirecting links

As I said in my post about why we should fix broken or redirecting links – even though broken links and redirects aren’t ideal, we can’t hope to get rid of all redirects or broken links, as with anything in business, we need to prioritise what will have the biggest impact. We need to find the worst offenders.

By the time we’ve finished this post, we will have found just three changes lego.com could make to their site which could;

  • Make sure that Google sees their UK product pages
  • Fix over 9,000 internal redirecting links.
Continue reading “How to fix broken or redirecting links”

Why Melt (unpivot) is the most powerful function in Pandas

Pandas is a Python library that lets us do Excel-type-stuff. Well, that’s not really giving it the credit it deserves. Pandas is a Python library which makes Excel-type stuff waaaaaaaaaaaaaaaaaay easier.

You might have seen me speak about how Jupyter Notebooks can make our lives easier as marketers (if not – you’ve clearly been missing out on Distilled Searchlove and you should absolutely buy a ticket). A lot of the examples I use are to do with how using Pandas is much much easier than trying to do the same stuff in Excel.

One function I haven’t been able to talk about on-stage is melt. As I said in the title, melt is kind of like unpivot and it is one of the best functions in Pandas because it lets us easily do things that wouldn’t just be harder in Excel – they would be pretty much impossible for anyone who isn’t a pretty advanced Excel user.

Continue reading “Why Melt (unpivot) is the most powerful function in Pandas”

Why should I fix my site links?

Photo by Zdeněk Macháček on Unsplash

If you feel like you already have a good understanding of why you should fix broken or redirecting links on your site and just want to get fixing, go to my post here which shares a free Google Colab notebook which will help identify and prioritise problems for you with some easy and dev-readable Excel files.

Otherwise – strap in. Let’s talk about why having redirecting or broken links on your site is a problem and why you should fix it.

Some terminology that will come in useful later

What are templated links?

In short – lots of links across lots of pages, to the same place. Think about your navigation menu or footer. Templated links are often present on pretty much every page, they always have the same content, they are always linking to the same places. Templated links are very useful when you need a page to be accessible from anywhere on your site but it’s also easy to overlook mistakes that can cause you issues.

What are broken links?

A broken link is any link which points to a page which has been deleted and not redirected. That often means a 404 page, named after the status code 404 meaning “not found”.

What are redirect chains?

One redirect going to another redirect etc. etc. So instead of;

page-a ==> page-b

the redirects go like this;

page-a ==> page-b ==> page-c ==> page-d

Now instead of asking for just one page, we’re having to go through three hops to get to the active page. Redirect chains make the usual redirect problems even worse.

What are redirect loops?

This is like a redirect chain but worse. Instead of going;

page-a ==> page-b ==> page-c ==> page-d

It’d be something like;

page-a ==> page-b ==> page-c ==> page-a ==> page-b ==> page-c ==> page-a

And so on until whatever is trying to access the page just gives up. These make redirect problems even worse than redirect chains do.

What are 302 redirects?

The standard redirect involves your website responding with status code 301 which means – “this page has been permanently moved”. An alternative is status code 302 which means “this page has been temporarily moved”.

So essentially, all a 302 redirect is, is a redirect where you send a different, less strong message at the same time.

Don’t be fooled by the terminology – if you are redirecting a page and don’t have imminently plans to change it back (like, within the week), 301 is the way to go. If you use a 302 redirect things like Google aren’t as sure what’s going on. They’re thinking “Sure, you tell me the content is in this new place, but you don’t sound very certain of it, so I’m going to keep an eye on the old page, I’ll probably let it compete with the new page in search results and I definitely won’t treat this as you transferring all of the authority from page-a to page-b.”

Why fix internal redirects

Imagine we’re moving our whole blog. So we’re redirecting mysite.com/blog/• to blog.mysite.com/• .

When we redirect a page, we aren’t actually moving a page. All we’re really doing is deleting the old page, and saying to everything which tries to visit it (person or machine) “don’t look here, look in this other place instead”. We don’t really notice it as people but the machines are doing something like this.

Request: page-a

Response: 301 this page has moved permanently to page-b

Request: page-b

Response: 200 here’s page-b

The first problem – authority

We often talk about search engines, like Google trying to understand the internet in terms of authority – why should this site appear for a search, even if that site is talking about the same topic?

One early way Google used to judge this is links. Well respected, high-value sites tend to get more links than less respected, low-value sites. If you treat every link on the internet as a kind of a vote of confidence for the page it’s linking to, you start to get an idea of what people think is worthy of attention.

Not only do these votes of confidence help a page rank, they also mean that when that page links out to another page it’s vote of confidence bears more weight.

It kind of makes sense right? If we trust a page, we trust what it says more too. A page can’t pass on all of the clout that other pages have given it, but most of that authority gets split between all of the pages it links out to.

Pages on your site will have links going to them, even if they aren’t links from other sites, you will have internal links. That means your pages have some votes of confidence that they can use.

That means that this authority kind of flows around your site. Pages like your homepage pass authority to the higher level pages on your site, then it trickles down to the lower pages, but they link up to other pages so the authority can flow back up to the top.

The problem is, if you redirect a page, all of those votes of confidence aren’t for the new, active page – they are for the old page which doesn’t exist any more. So how does Google interpret this? Let’s use the example above.

When we redirect www.mysite.com/blog/post-1 to blog.mysite.com/post-1 we are essentially replacing all of the content of /blog/post-1 with one giant vote for the blog.mysite page.

As we said, a page can’t give away all of its authority, that’s not how a vote of confidence works, so while we preserve a good amount of the authority that page has built up, it’s still not everything. Some of that authority is still tied up in the old page that’s not doing any good any more.

So, with each unnecessary redirect, we are losing those hard-earned votes of confidence which could help this page rank. What’s more our new page has less authority to help our other pages rank. It basically throws away some of the votes of confidence we could use across our site – we’re hurting this page specifically and our whole site in general.

How do templated links make this worse?

Imagine we have a site with 500 pages (which would be smaller than most) and each of those pages has a footer link to a redirected page. That means that 500 times across our site we’re giving a vote of confidence to a page that doesn’t exist – every page is losing some of the authority it’s trying to pass on and the whole site is losing 500x the votes it would be if we were just talking about one link.

How do redirect chains make this worse?

We’ve already said that we lose a bit of authority with one redirect, if we redirect again we lose a little bit more, another one and it’s a little bit more on the hop after that. So we’re losing even more authority unnecessarily.

The second problem – time

The second and more intuitive problem is that redirects take a little bit more time and resources. Instead of having to just ask for one thing – computers, or Google, are having to ask for it, wait to be told it’s the wrong thing, then ask for another thing and wait to be told that’s the right thing.

That probably seems relatively insignificant but these things stack up quickly. Google is trying to see and understand the whole internet. That’s billions and billions of pages, which means they have to be careful with where they spend their time. If every time Google tried to access a page on your site, it has to go through multiple steps – that’s all taking away resources Google could be using on pages you actually care about.

What’s more, when users are trying to use your site, everything is going to seem slower because their computer is having to go through these additional hops. Which means users are less likely to do what you want (if you want to know why having a slow site is bad, I touch upon that in this Distilled post)

How do templated links make this worse?

As you’d expect, it means that more often users and robots are having to deal with these hops.

How do redirect chains make this worse?

With each redirect hop it’s taking more and more time and resources to get to the page a person or machine actually wants to access.

So do I have to get rid of all redirects?!

The key thing to remember here is redirects are a necessary and expected part of the internet. It’s just not practical to get external websites to update their links whenever we change a page so we need a way to make sure users get to the page we want them to. What’s more, Google remembers old pages it has seen so if we don’t redirect those pages it’ll just keep going back to them.

Why fix broken links?

As we said above – links on our sites are a way for our pages to give a vote of confidence to each other. However, a 404 page doesn’t exist at all, 404 means not found so if we give a vote of confidence to something that doesn’t exist then that vote is pretty much wasted.

Again, kind of makes sense right? If we say to Google – “Hey, this thing is great!” and the thing doesn’t exist any more we’ve just wasted our vote.

Similar to redirects – because of the way all of our pages are giving votes of confidence to all of our other pages – every time we link to a 404 page we’re throwing away votes that could be used across the site. We are limiting the strength of all of our pages by a little bit.

Having lots of links to 404 pages is also a Bad Sign for Google. If a site often links to 404 pages it’s more likely to be a site in disrepair and less likely to be a good user experience. Google doesn’t want to send users to a bad site so we’re less likely to appear in search results.

How do templated broken links make this worse?

More lost votes

As we said, every link to a 404 page is us throwing away a vote. A templated link is often a link that is present on every page of our site. Imagine all of the pages on our site have about 20 links on them. If two of our templates links go to a broken 404 page, that means that we’re throwing away 10% of our possible voting power and we’re reducing our site strength by quite a lot.

Waylaying enthusiastic users

Even if we take a cue from our favourite dictator and ignore all of those lost votes, even if we say we don’t care about Google’s evaluation of our site, this kind of problem could still cause havoc. Say we want a user to buy our product but they want to find out a bit more about it first. If links in our navigation, say, are going to broken pages, the user doesn’t get the information they want, they don’t trust the product, and they don’t buy.

So do I have to fix all 404 pages on my site?

I mean, that would be nice but I am not saying that having any 404 will be the death of your business. If you were running a physical shop and one one your shelves was broken that’s not going to kill the store right? If, on the other hand, you were running a shop and half of your shelves are broken that’s a problem you’ve got to fix pretty quickly.

Don’t believe people who email you saying you have to fix every single broken link on your site or everything will go up in flames, or who tell you that any links on your site which go to broken pages on other sites could make Google penalise you. These people are trying to sell you something.

As ever – all of this comes down to prioritising what is having the biggest impact. You just need to find the patterns of worst offenders.

What should I do next?

As I said above – we can’t hope to get rid of all redirects or broken links, the trick is to find the worst offenders.

Check out this post I wrote sharing a free notebook which will help you find redirect chains and templated broken links, and prioritise your fixes for you so you can work with your devs to fix the problem.

How to Do Change Detection with Screaming Frog and Google Sheets

I made a Google Sheet that does change detection for you based on two Screaming Frog crawls. I’ll tell you why that’s important. 

Two problems frequently come up for SEOs, regardless of if we’re in-house or external.

  1. Knowing when someone else has made key changes to the site
  2. Keeping a record of specific changes we made to the site, and when.

Both can sound trivial, but unnoticed changes to a site can undo months of hard work and, particularly with large e-commerce sites, it’s often necessary to update internal links, on-page text, and external plugins in search of the best possible performance. That doesn’t just go for SEO, it applies just as much to CRO and Dev teams.

Keeping a record of even just our changes can be really time-consuming but without it, we often have to rely on just remembering what we did when, particularly when we see a pattern of changing traffic or rankings and want to know what might have caused it. 

These things are people problems. When we can’t rely on other teams to work with us on their planned changes, that needs to be fixed at a team level. When we don’t have a system for listing the changes we make it’s understandable, particularly for smaller keyword or linking tweaks, but if we compare ourselves to a Dev team for example – a record of changes is exactly the kind of thing we’d expect them to just include in their process. At the end of the day, when we don’t keep track of what we doing that’s because we either don’t have the time or don’t have the commitment to a process. 

We shouldn’t really be trying to fix people problems with tools. That said, people problems are hard. Sometimes you just need a way of staying on top of things while you fight all the good fights. That’s exactly what this is for. 

This is a way to highlight the changes other teams have made to key pages, so you can quickly take action if needed, and to keep track of what you’ve done in case you need to undo it.

As a completely separate use-case, you can also use this sheet to check for differences between different versions of your site. Say, for the sake of argument, that you need to know the difference between the mobile and desktop versions of your site, or your site with and without JavaScript rendering, or even the differences between your live site and a private developer version you’re about to release. There are tools that offer change detection and cover some of the functions of this sheet, but I really like the flexibility this offers to check for changes between versions as well as over time.Screaming Frog Change Detection Google Sheet.
Access for free.

  • I’d like to receive content like this via email.

What sites is this good for?

This sheet is for anyone who needs an idea of what is changing on a fairly large number of pages but can’t afford to pay for big, expensive change detection systems. It’ll work its way through around 1,000 key pages. 

That said, 1,000 key pages stretches further than you would think. For many small sites, that’ll more than cover all the pages you care about and even larger eCommerce sites get the vast majority of their ranking potential through a smaller number category pages. You would be surprised how big a site can get before more than 1,000 category pages are needed. 

That 1,000 URL limit is a guideline, this sheet can probably stretch a bit further than that, it’s just going to start taking quite a while for it to process all of the formulas.

So what changes does it detect?

This Google Sheet looks at your “new crawl” and “old crawl” data and gives you tabs for each of the following;

  • Newly found pages – any URL in the new crawl that isn’t in the old crawl
  • Newly lost pages – any URL in the old crawl that isn’t in the new crawl
  • Indexation changes – i.e. Any URL which is now canonicalised or was noindexed
  • Status code changes – i.e. Any URL which was redirected but is now code 200
  • URL-level Title Tag or Meta Description changes
  • URL-level H1 or H2 changes
  • Any keywords that are newly added or missing sitewide.

What’s that about keyword change detection?

On many sites, we’re targeting keywords in multiple places at a time. Often we would like to have a clear idea of exactly what we’re targeting where but that’s not always possible.

The thing is, as we said, your pages keep changing – you keep changing them. When we update titles, meta descriptions and H1s we’re not checking every page on the site to confirm our keyword coverage. It’s quite easy to miss that we are removing some, middlingly important, keyword from the site completely. 

Thanks to a custom function, the Google sheet splits apart all of your title tags, meta descriptions, and H#s into their component words and finds any that, as of the last crawl, have either been newly added, or removed from the site completely.

It then looks the freshly removed words up against Search Console data to find all the searches you were getting clicks from before, to give you an idea of what you might be missing out on now.

The fact that it’s checking across all your pages means you don’t end up with a bunch of stopwords in the list (stopwords being; it, and, but, then etc.) and you don’t have to worry about branded terms being pulled through either – it’s very unlikely that you’ll completely remove your brand name from all of your title tags and meta descriptions by accident, and if you do that’s probably something you’d want to know about.

How do I use it?

Start by accessing a copy of this Google Sheet so you can edit it. There are step-by-step instructions in the first tab but broadly all you need to do is;

  1. Run a Screaming Frog crawl of all the pages you want to detect changes on
  2. Wait a bit (like a couple of weeks) or crawl the mobile, JavaScript, or dev version right away for comparison
  3. Run another SF crawl of the pages you want to detect changes on
  4. Export the internal_all report for both crawls and paste them into the “old crawl” and “new crawl” tabs respectively
  5. Wait a bit (like 30 minutes)
  6. Check the results tabs for changes
  7. (Optional) Import Search Console data to give “value lost” information for keywords you removed.

How to Check Your Site Speed: 5 Things You Need to Know About the Google User Experience Report

This is a copy of a post at distilled.net and is canonicalised there.

You’ve done your keyword research, your site architecture is clear and easy to navigate, and you’re giving users really obvious signals about how and why they should convert. But for some reason, conversion rates are the lowest they’ve ever been, and your rankings in Google are getting worse and worse.

You have two things in the back of your mind. First, recently a customer told your support team that the site was very slow to load. Second, Google has said that it is using site speed as part of how rankings are calculated.

It’s a common issue, and one of the biggest problems about site speed is it is so hard to prove it’s making the difference. We often have little-to-no power to impact site speed (apart from sacrificing those juicy tracking snippets and all that content we fought so hard to add in the first place). Even worse – some fundamental speed improvements can be a huge undertaking, regardless of the size of your dev team, so you need a really strong case to get changes made.

Sure, Google has the site speed impact calculator which gives an estimate of how much revenue you could be losing for loading more slowly, and if that gives you enough to make your case – great! Crack on. Chances are, though, that isn’t enough. A person could raise all kinds of objections, for instance;

  1. That’s not real-world data
    1. That tool is trying to access the site from one place in the world, our users live elsewhere so it will load faster for them
    2. We have no idea how the tool is trying to load our site, our users are using browsers to access our content, they will see different behaviour
  2. That tool doesn’t know our industry
  3. The site seems pretty fast to me
  4. The ranking/conversion/money problems started over the last few months – there’s no evidence that site speed got worse over that time.

Tools like webpagetest.org are fantastic but are usually constrained to accessing your site from a handful of locations

Pretty much any site speed checker will run into some combination of the above objections. Say we use webpagetest.org (which wouldn’t be a bad choice), when we give it a url, an automated system accesses our site tests how long it takes to load, and reports to us on that. As I say, not a bad choice but it’s very hard to to test accessing our site from everywhere our users are, using the browsers they are using, getting historic data that was recording even when everything was hunky-dory and site speed was far from our minds, and getting comparable data for our competitors.

Or is it?

Enter the Chrome User Experience (CRUX) report

In October 2017 Google released the Chrome User Experience report. The clue is in the name – this is anonymised domain-by-domain, country-by-country site speed data they have been recording through real-life Google Chrome users since October 2017. The data only includes records from Chrome users which have opted into syncing browser history, and have usage statistic reporting enabled, however many will have this on by default (see Google post). So this resource offers you real-world data on how fast your site is.

That brings us to the first thing you should know about the CRUX report.

1. What site speed data does the Chrome User Experience report contain?

In the simplest terms, the CRUX report gives recordings of how long it took your webpages to load. But loading isn’t on-off, even if you’re not familiar with web development, you will have noticed that when you ask for a web page, it thinks a bit, some of the content appears, maybe the page shuffles around a bit and eventually everything falls into place.

Example of a graph showing performance for a site across different metrics. Read on to understand the data and why it’s presented this way.

There are loads of reasons that different parts of that process could be slower, which means that getting recordings for different page load milestones can help us work out what needs work.

Google’s Chrome User Experience report gives readings for a few important stages of webpage load. They have given definitions here but I’ve also written some out below;

  • First Input Delay
    • This is more experimental, it’s the length of time between a user clicking a button and the site registering the click
    • If this is slow the user might think the screen is frozen
  • First Paint
    • The first time anything is loaded on the page, if this is slow the user will be left looking at a black screen
  • First Contentful Paint
    • Similar to first paint, this is the first time any user-visible content is loaded onto the screen (i.e. text or images).
    • As with First Paint, if this is slow the user will be waiting, looking at a blank screen
  • DOM Content Loaded
    • This is when all the html has been loaded. According to Google, it doesn’t include CSS and all images but by-and-large once you reach this point, the page should be usable, it’s quite an important milestone.
    • If this is slow the user will probably be waiting for content to appear on the page, piece by piece.
  • Onload
    • This is the last milestone and potentially a bit misleading. A page hits Onload when all the initial content has finished loading, which could lead you to believe users will be waiting for Onload. However, many web pages can be quite operational, as the Emperor would say, before Onload. Users might not even notice that the page hasn’t reached Onload.
    • To what extent Onload is a factor in Google ranking calculations is another question but in terms of User Experience I would prioritise the milestones before this.

All of that data is broken down by;

  • Domain (called ‘origin’)
  • Country
  • Device – desktop, tablet, mobile (called ‘client’)
  • Connection speed

So for example, you could see data for just visitors to your site, from Korea, on desktop, with a slow connection speed.

2. How can I access the Chrome User Experience report?

There are two main ways you can access Google’s Chrome user site speed data. The way I strongly recommend is getting it out using BigQuery, either by yourself or with the help of a responsible adult.

DO USE BIGQUERY

If you don’t know what BigQuery is, it’s a way of storing and accessing huge sets of data. You will need to use SQL to get the data out but that doesn’t mean you need to be able to write SQL. This tutorial from Paul Calvano is phenomenal and comes with a bunch of copy-paste code you can use to get some results. When you’re using BigQuery, you’ll ask for certain data, for instance, “give me how fast my domain and these two competitors reach First Contentful Paint”. Then you should be able to save that straight to Google Sheets or a csv file to play around with (also well demonstrated by Paul).

DO NOT USE THE PREBUILT DATA STUDIO DASHBOARD

The other, easier option, which I actually recommend against is the CRUX Data Studio dashboard. On the surface, this is a fantastic way to get site speed data over time. Unfortunately, there are a couple key gotchas for this dashboard which we need to watch out for. As you can see in the screenshot below, the dashboard will give you a readout of how often your site was Fast, Average, or Slow to reach each loading point. That is actually a pretty effective way to display the data over time for a quick benchmark of performance. One thing to watch out for with Fast, Average, and Slow is that the description of the thresholds for each isn’t quite right.

If you compare the percentages of Fast, Average, and Slow in that report with the data direct from BigQuery they don’t line up. It’s an understandable documentation slip but please don’t use those numbers without checking them. I’ve chatted with the team and submitted a bug report on the Github for this tool . I’ve also listed the true definitions below, in case you want to use Google’s report despite the compromises, or use the Fast, Average, Slow categorisations in the reports you create (as I say, it’s a good way to present the data). The link to generate one of these reports is g.co/chromeuxdash.

Another issue is that it uses the “all” dataset – meaning data from every country in the world. That means data from US users is going to be influenced by data from Australian users. It’s an understandable choice given the fact that this report is free, easily generated, and probably took a bunch of time to put together, but it’s taking us further away from that real-world data we were looking for. We can be certain that internet speeds in different countries will vary quite a lot (for instance South Korea is well known for having very fast internet speeds) but also that expectations of performance could vary by country as well. You don’t care if your site speed looks better than your competitor because you’re combining countries in a convenient way, you care if your site is fast enough to make you money. By accessing the report through BigQuery we can select data from just the country we’re interested in and get a more accurate view.

The final big problem with the Data Studio dashboard is it lumps desktop results in with mobile and tablet. That means that even looking at one site over time, it could look like your site speed has taken a major hit one month just because you happened to have more users on a slower connection that month. It doesn’t matter whether desktop users tend to load your pages faster than mobile, or vice versa – if your site speed dashboard can make it look like your site speed is drastically better or worse because you’ve started a facebook advertising campaign that’s not a useful dashboard.

The problems get even worse if you’re trying to compare two domains using this dashboard – one might naturally have more mobile traffic than the other, for example. It’s not a direct comparison and could actually be quite misleading. I’ve included a solution to this in the section below, but it will only work if you’re accessing the data with BigQuery.

Wondering why the Data Studio dashboard reports % of Fast, Average, and Slow, rather than just how long it takes your site to reach a certain load point? Read the next section!

3. Why doesn’t the CRUX report give me one number for load times?

This is important – your website does not have one amount of time that it takes to load a page. I’m not talking about the difference between First Paint or Dom Content Loaded, those numbers will of course be different. I’m talking about the differences within each metric every single time someone accesses a page.

It could take 3 seconds for someone in Tallahassee to reach Dom Content Loaded, 2 seconds for someone in London. Then another person in London loads the page on a different connection type, Dom Content Loaded could take 1.5 seconds. Then another person in London loads the page when the server is under more stress, it takes 4 seconds. The amount of time it takes to load a page looks less like this;

Median result from webpagetest.org

And more like this;

Distribution of load times for different page load milestones

That chart is showing a distribution of load times. Looking at that graph, you could think 95% of the time, the site is reaching DOM Content Loaded in under 8 seconds. On the other hand you could look at the peak and say it most commonly loads in around 1.7 seconds, but you could, for example see a strange peak at around 5 seconds and realise – something is intermittently going wrong that means sometimes the site takes much longer to load.

So you see saying “our site loads in X seconds, it used to load in Y seconds” could be useful when you’re trying to deliver a clear number to someone who doesn’t have time to understand the finer points, but it’s important for you to understand that performance isn’t constant and your site is being judged by what it tends to do, not what it does under sterile testing conditions.

4. What limitations are there in the Chrome User Experience report?

This data is fantastic (in case you hadn’t picked up before, I’m all for it) but there are certain limitations you need to bear in mind.

No raw numbers

The Chrome User Experience report will give us data on any domain contained in the data set. You don’t have to prove you own the site to look it up. That is fantastic data, but it’s also quite understandable that they can’t get away with giving actual numbers. If they did, it would take approximately 2 seconds for an SEO to sum all the numbers together and start getting monthly traffic estimates for all of their competitors.

As a result, all of the data comes as a percentage of total throughout the month, expressed in decimals. A good sense check when you’re working with this data is that all of your categories should add up to 1 (or 100%) unless you’re deliberately ignoring some of the data and know the caveats.

Domain-level data only

The data available from BigQuery is domain-level only, we can’t break it down page-by-page which does mean we can’t find the individual pages which load particularly slowly. Once you have confirmed you might have a problem, you could use a tool like Sitebulb to test page load times en-masse to get an idea of which pages on your site are the worst culprits.

No data at all when there isn’t much data

There will be some sites which don’t appear in some of the territory data sets, or at all. That’s because Google hasn’t added their data to the dataset, potentially because they don’t get enough traffic.

Losing data for the worst load times

This data set is unlikely to be effective at telling you about very very long load times. If you send a tool like webpagetest.org to a page on your site, it’ll sit and wait until that page has totally finished loading, then it’ll tell you what happened.

When a user accesses a page on your site there are all kinds of reasons they might not let it load fully. They might see the button they want to click early on and click on it before too much happened, if it’s taking a very long time they might give up altogether.

This means that the CRUX data is a bit unbalanced – the further we look along the “load time” axis, the less likely it is it’ll include representative data. Fortunately, it’s quite unlikely your site will be returning mostly fast load times and then a bunch of very slow load times. If performance is bad the whole distribution will likely shift towards the bad end of the scale.

The team at Google have confirmed that if a user doesn’t meet a milestone at all (for instance Onload) the recording for that milestone will be thrown out but they won’t throw out the readings for every milestone in that load. So, for example, if the user clicks away before Onload, Onload won’t be recorded at all, but if they have reached Dom Content Loaded, that will be recorded.

Combining stats for different devices

As I mentioned above – one problem with the CRUX report is all of the reported data is as a percentage of all requests.

So for instance, it might report that 10% of requests reached First Paint in 0.1 seconds. The problem with that is that response times are likely different for desktop and mobile – different connection speeds, processor power, probably even different content on the page. But desktop and mobile are lumped together for each domain and in each month, which means that a difference in the proportion of mobile users between domains or between months can mean that site speed could even look better, when it’s actually worse, or vice versa.

This is a problem when we’re accessing the data through BigQuery, as much as it is if we use the auto-generated Data Studio report, but there’s a solution if we’re working with the BigQuery data. This can be a bit of a noodle-boiler so let’s look at a table.

DeviceResponse time (seconds)% of total
Phone0.110
Desktop0.120
Phone0.250
Desktop0.220

In the data above, 10% of total responses were for mobile, and returned a response in 0.1 seconds. 20% of responses were on desktop and returned a response in 0.1 seconds.

If we summed that all together, we would say 30% of the time, our site gave a response in 0.1 seconds. But that’s thrown off by the fact that we’re combining desktop and mobile which will perform differently. Say we decide we are only going to look at desktop responses. If we just remove the mobile data (below), we see that, on desktop, we’re equally likely to give a response at 0.1 and at 0.2 seconds. So actually, for desktop users we have a 50/50 chance. Quite different to the 30% we got when combining the two.

DeviceResponse time (seconds)% of total
Desktop0.120
Desktop0.220


Fortunately, this sense-check also provides our solution, we need to calculate each of these percentages, as a proportion of the overall volume for that device. While it’s fiddly and a bit mind-bending, it’s quite achievable. Here are the steps;

  1. Get all the data for the domain, for the month, including all devices.
  2. Sum together the total % of responses for each device, if doing this in Excel or Google Sheets, a pivot table will do this for you just fine.
  3. For each row of your original data, divide the % of total, by the total amount for that device, e.g. below

Percent by device

Device% of total
Desktop40
Phone60

Original data with adjusted volume

DeviceResponse time (seconds)% of totalDevice % (from table above)Adjusted % of total
Phone0.1106010% / 60% = 16.7%
Desktop0.1204020% / 40% = 50%
Phone0.2506050% / 60% = 83.3%
Desktop0.2204020% / 40% = 50%

5. How should I present Chrome User Experience site speed data?

Because none of the milestones in the Chrome User Experience report have one number as an answer, it can be a challenge to visualise more than a small cross section of the data. Here are some visualisation types that I’ve found useful.

% of responses within “Fast”, “Average”, and “Slow” thresholds

As I mention above, the CRUX team have hit on a good way of displaying performance for these milestones over time. The automatic Data Studio dashboard shows the proportion of each metric over time, that gives you a way to see if a slowdown is a result of being Average or Slow more often, for example. Trying to visualise more than one of the milestones on one graph becomes a bit messy so I’ve found myself splitting out Fast, and Average so I can chart multiple milestones on one graph.

In the graph above, it looks like there isn’t a line for First Paint but that’s because the data is almost identical for that and First Contentful Paint

I’ve also used the Fast, Average, and Slow buckets to compare a few different sites during the same time period, to get a competitive overview.

Comparing competitors “Fast” responses by metric

An alternative which Paul Calvano demonstrates so well is histograms. This helps you see how distributions break down. The Fast, Average, and Slow bandings can hide some sins in that movement within those bands will still impact user experience. Histograms can also give you an idea of where you might be falling down in comparison to others, or your past performance and could help you identify things like inconsistent site performance. It can be difficult to understand a graph with more than a couple time periods or domains on it at the same time, though.

I’m sure there are many other (perhaps better) ways to display this data so feel free to have a play around. The main thing to bear in mind is that there are so many facets to this data it’s necessary to simplify it in some way, otherwise we just won’t be able to make sense of it on a graph.

What do you think?

Hopefully, this post gives you some ideas about how you could use the Chrome User Experience report to identify whether you should improve your site speed. Do you have any thoughts? Anything you think I’ve missed? Let me know in the comments!

If this has inspired you to dig into your site speed page-by-page, my colleague Meagan Sievers has written a post explaining how to use the Google Page Speed API and Google Sheets to bulk test pages. Happy testing.

Bonus – what are the actual thresholds in the CRUX Data Studio report?

As mentioned above, the thresholds in the CRUX Data Studio report aren’t 100% correct, I have submitted a GitHub issue but here are the updated thresholds.

Listed definitionActual time
FCP FastX <1 secondX < 1 second
FCP Average1 < x < 31 < X< 2.5
FCP SlowX >= 3 secondsX >= 2.5 seconds
FIrst Paint FastX <1 secondX < 1 second
First Paint Average1 < x < 31 < x < 2.5
First Paint SlowX >= 3 secondsX >= 2.5
First Input Delay FastX < 100 milX< 50 mil
First Input Delay Average100 mil < x < 150 mil < x < 250 mil
First Input Delay SlowX > 1X > 250 mil
DOM Content Load FastX < 1X < 1.5
DOM Content Load Average1 < x < 31.5 < x < 3.5
DOM Content Load SlowX > 3X > 3.5
Onload FastX < 1X < 2.5
Onload Average1 < x < 32.5 < x < 6.5
Onload Slowx >3X > 6.5

Tips for social media competitor analysis: Let’s stop talking about follower count

This is a copy of a blog post on distilled.net and is canonicalised there.

This year, Hootsuite announced that 3.196 billion people are now active social media users. That is 42% of all the people on earth. In the UK, that percentage climbs to 66% and it’s 71% in the US. Even with recent data protection scandals, platforms like Facebook, Twitter, Instagram, LinkedIn, Wechat, and Pinterest are a huge part of daily life.

This kind of impressive cut through makes it more likely that we can use social media to find our audience, but that doesn’t mean that everyone on the platform is desperate to hear from us. In reality, when we use social media as businesses we’re competing for what might be a very small, very niche, but very valuable cross-section of a network. This means that whenever we do social media marketing, we need a strategy, and to have a successful social media marketing strategy it’s vital to know how we compare to our competitors, what we’re doing well and what threats we should be worrying about.

Without effective social media competitor analysis we’re working in the dark. Unfortunately, a lot of the time when we compare social media communities we keep coming back to the same metrics which aren’t always as informative as we might like. Fear not! Here’s a guide to find the social media stats which really tell us which competitors to watch out for and why.

What are we trying to achieve with social media?

One of the biggest problems with creating a social media strategy is the subjectivity of social can make it incredibly hard to get solid, reliable performance data that can tell us what to do next. If we want to get actionable information about how we compare to competitors, it’s important for us to start with why we’re on the platforms to begin with (we’ll use these agreed facts in later sections). If we agree that;

  1. The value of a social media competitor analysis is to help us perform better on social
  2. The value of social is to help us achieve the business objectives that we set out in the first place.

Then we can agree that the numbers we look at in a social media competitor analysis must be defined by what we actually need the networks to achieve (even if it takes a while for engagement to become page views).

With that in mind, here are the most common aims I think we try to achieve through social media, ordered roughly from high commitment on the part of our audience, to low. When we are comparing social networks we need to make sure we have an idea of how the numbers we look at can contribute to at least one of the items below (and how efficiently).

  1. Sales (this can include donations or affiliate marketing as well as traditional sales)
  2. Support (event attendance etc. paid event attendance being included in sales)
  3. Site visits (essentially ad sales, visits to websites that don’t run on ads can be considered a step towards a sale)
  4. Impressions/staying front of mind (this is also a prerequisite for each of the above).

Why we should stop talking about raw follower counts

We often hear social media accounts evaluated and compared based on raw follower counts. If we agree we should look at numbers that are defined by our key goals I have some reasons why I don’t think we should talk about follower counts as much as we do.

“Followers” is a static number trying to represent a dynamic situation

When we compare social communities we don’t care how effective they were in 2012. The only reason we care about how effective they were over the last six months is because it’s a better predictor of how much of the available audience attention, and conversions they’ll take up over the next six months. What’s more, as social networks grow, and implement or update sharing algorithms, the goal posts are moving, so what happened a few years ago becomes even less relevant to the present.

Unfortunately, raw follower count includes none of that context, it’s just a pile of people who have expressed an interest at some point. Trying to judge how successful a community will be based on follower count is like trying to guess the weather at the top of a large hill based solely on its height – if it gets really big you can probably guess it’ll be colder or windier, but you’re having to ignore a whole bunch of far more relevant factors.

Follower buying can also really throw off these numbers. If you want to check competitors for follower buying you may be able to find some signs by checking for sudden, unusual changes in follower numbers (see “What we should look at instead”) or try exporting all their followers with a service like Export Tweet and check for a large number of accounts with short lifespans, low follower numbers or matching follower numbers.

An “impression” is required for every other social goal

I’m going to move on to what other numbers we should look at in the next section, but we have to agree that in order for anyone to do anything you want with your content, they have to have come into contact with it in some way.

Because of the nature of social networks we can also agree the number of impressions is unlikely to exactly match the follower number, even in a perfect system – some people who aren’t following will see your content, some people who are following won’t. So we’ve started to decouple “follows” from “impressions” – the most basic unit of social media interaction.

Next we can agree – if an account stops producing effective content, or stops producing content altogether, follower count will make no difference. A page that posts nothing will not have people viewing its nonexistent posts. So follower count isn’t sufficient for impressions and impressions are necessary for any other kind of success.

Depending on the kind of social network, the way in which content spreads though it will change. Which means follower count can be less decisive than other systems in different ways. We’ll look at each format below in isolation, where a network relies on more than one means (for instance hashtags and shares) the effect is compounded rather than cancelled out.

Discovery driven by hashtags

Ignoring other amplification mechanisms (which we’ll discuss below), follower count can be much less relevant in comparison to the ability to cut through hashtags. The end result of either a large, active following or content effectively cutting through a hashtag (or both) will be shown in the engagement metrics on the content itself, we have those numbers, so why rely on follows?

Discovery driven by shares and interaction

The combined followings or networks of everyone who follows you (even at relatively small numbers) can easily outweigh your audience or the audience of your competitors. Engagement or shares (whatever mechanism the platform uses to spread data via users) becomes a better predictor of how far content will reach, and we have those numbers, so why rely on follows?

If you’re interested in analysing your followers or competitor followers to find out how many followers those followers have and compare those numbers, services like Export Tweet will let you export a CSV of all the followers of an account, complete with their account creation date and follower number. Also, if you have to look into raw follower numbers this can be a way of checking for fake followers.

Discovery guided by algorithms

In this case, content won’t be shown to the entire following, the platform will start by showing it to a small subsection to gather data about how successful the post is. A successful post is likely to be seen by most of the following and probably users that don’t follow that account too, a less successful post will not be shown to much more than the testing group. Key feedback the platforms will use to gauge post success is engagement and, as we’ve said, we have those numbers, why rely on follows?  

This particular scenario is interesting because having a very large audience of mostly disengaged followers can actually harm reach – when the platform tests your content with your audience, it’s less lightly to be seen by the engaged subset, early post success metrics are likely to fare worse so the content will look less worthy of being shared more widely by the platform. This can mean that tactics like buying followers, or running short-term competitions just to boost follower count without a strategy for how to continually engage those followers, can backfire.

I’m not saying follower count has no impact at all

A large number of follows does give an advantage, and make it more likely that content is widely seen. The fact is that in most cases, engagement metrics usually tell us if posts were widely seen, so they are a much more accurate way to get a snapshot of current effectiveness. Engagement numbers are also far closer to the business objectives we laid out above so I’ll say again, why rely on follows?

At most I’d only ever want to use follower count to prioritise the first networks to investigate – as far as I’m concerned it isn’t a source of the actionable insights we said we wanted.

What we should look at instead

Engagements

In many ways, engagement-based numbers are the best to look at if we want to put together a fair and informative comparison including accounts we don’t own.

Engagement numbers are publicly visible on almost every social network (ignoring private-message platforms), meaning we aren’t having to work with estimates. What’s more, engagement is content-specific and requires some level of deliberate action on behalf of the user, meaning they can be a much better gauge of how many people have actually seen and absorbed a message, rather than glancing at something flying past their screen at roughly the top speed of a Honda Civic.

What business goal does this relate to?

Impressions. As mentioned above, engagements require the content to be on-screen and for the user to have recognised it at some level. Because engagements are like opt-in impressions, we can judge comparative success at staying front of mind. We could also use it as a sign that our audience is likely to take further action, like visiting our site or attending an event, depending on how you interpret the numbers (as long as it’s consistent). It’s fuzzy, but in a lot of ways less fuzzy than follows (due to removal from actual business goals) and actual impressions (due to lack of data). What’s more, the inaccuracy of this data leans towards only counting users who cared about the content, so it’s something I’m happy to live with.

That being said, when you’re comparing your own community to itself over time (and not worrying about competitors) impressions itself is still a good metric to use – most social platforms will give you that number and it can give you a fuller idea of your funnel (we’ll cover impressions more below).

What numbers should you use?

As with follower change and impressions (which I discuss below), we need to control for varying follower base and posts-per-day. I’d recommend:

  • Engagements per (post*follower) (where you multiply total follower count by total updates posted)
  • Engagements per post
  • Total engagements per post.

The first number should help you compare how well a follower base is being engaged, the second should give an idea of return on investment, and the third is to avoid being totally thrown off by tiny communities which might not actually be moving the needle for business objectives.

It’s worth checking the Facebook and Twitter ad reporting (relatively new additions to each platform) to see if the page is spending money promoting that content.

What tools should you use?

The platforms themselves are an option for gathering engagement numbers, which is one of the reasons this kind of check is ideal. This can be as simple as scrolling through competitor timelines and making notes of what engagement they’ve received. Unfortunately, sometimes this is time-consuming and many platforms take steps to block scraping of elements. However, I’ve found some success with scraping engagement numbers from Facebook and Twitter and I’ve included my selectors in case you do manage to use a tool like Agenty or Artoo.js to help automate this.

Facebook

NumberSharesLikesCommentsAdditional commentsAll visible posts
Selector.UFIShareLink._4arz span.UFICommentActorAndBody.UFIPagerLink._q7o

Twitter

NumberInteractionsAll visible posts
Selectorspan.ProfileTweet-actionCountForPresentationspan._timestamp

Facebook Insights is another great source of information because it’ll give you some direct comparisons between your page and others. It’s not quite the level of granularity we’d like but it’s easy, free, and direct, so gift horses and all that.

NapoleonCat – I don’t work for this company but they have a 14-day free trial and their reports offer exactly the kind of information I’d be looking for, for both managed profiles, and ones you are watching. That includes daily raw engagement numbers, and calculated engagement rate and SII their “Social Interaction Index” which claims to account for differing audience size, allowing direct comparison between communities.

The hitch is that Twitter and Instagram only start collecting information from when you add them to the account, so if you want to collect data over time you’ll need to pay the premium fees. On the other hand, their support team has confirmed that they’re perfectly happy with you upgrading for a month, grabbing the stats you need, removing your payment card for a few months (losing access in the process) and repeating six months later for another snapshot.

Socialblade – offers some engagement rate metrics for platforms like Instagram and Twitter.  It doesn’t require you to log in but the data isn’t over time so your information is only as good as your dedication to recording it. 

Fanpage Karma does an impressive job of trying to give you actionable information about what is engaging. For instance, it’ll give you a scatter chart of engagement for other pages, colour coded by post type. Unfortunately,  anything more than a small number of posts can make that visualisation incredibly noisy and hard to read. The engagement-by-post-type charts are easier to read but sacrifice some of that granularity (honestly I don’t think there is a visualisation that has engagement number and post type over time that isn’t noisy).

It’ll also let you compare multiple pages in the same kind of visualisation where the dots still show number of engagement but are colour coded by page instead of post type, patterns can be a bit easier to divine with that one but the same tension can arise.

If you’re tracking these stats for your own content Twitter analytics and Instagram Insights are great, direct, sources of information. Any profile can view Twitter analytics, but you’ll need an Instagram business profile to look at the Instagram data. At the very least, each can be a quick way of gathering stats about your own contents’ impressions and engagement numbers, so you don’t have to manually collect numbers.

If you have to include a follower metric…

If you have to include a follower metric, I’d advise focusing on something far more representative of recent activity. Rather than total or raw number of follows, we can use recent change in followers.

While I still think this is a bit too close to raw followers for my liking, there’s one important difference – this can give you more of an idea of what’s happening now. A big growth in followers could mean a network is creating better content, it could also mean they’ve recently bought a bunch of followers, either way, we know they’re paying attention.

What business goal does this relate to?

Some people might use this number to correlate with impressions, but as I said we can use other numbers to more accurately track that. This number (along with raw post frequency) is one means of gauging effort put into a social network, and so can inform your idea of how efficient that network is, when you are looking at the other metrics.

These numbers are also likely closer to what senior managers are expecting so they can be a nice way to begin to refocus.

What number should you use?

We need to account for differing community histories, a way to do this is to consider both:

  • Raw followers gained over a recent period
  • Followers gained over a recent period as a proportion of total current followers.

We can use these two numbers to get an idea of how quickly networks are growing at the moment. The ideal would be to graph these numbers over time, that way we can see if follower growth has recently spiked, particularly in comparison to other accounts of similar focus or size.

Once we’ve identified times where an account has achieved significant change in growth, we can start to examine activity around that time.

What tools should you use?

NapoleonCat (I promise I’m not getting paid for this) can give you historic follower growth data for accounts you don’t own, although unfortunately it only reports Twitter follower growth since the point an account starts being monitored (other networks seem to backdate).

Socialblade offers historic follower stats for accounts you don’t own, the first time anyone searches for stats on an account, that account will be added to Socialblade’s watchlist and it’ll start gathering stats from that point. If you’re lucky, someone will already have checked, otherwise you can have a look now and check back later.

Impressions

It can be harder to get a comparison of impressions for content, but it’s one of our most foundational business objectives – a way to stay front of mind and ideally build towards sales. Everything we’ve covered in terms of Follower numbers is a step removed from actual impression numbers so it’s worth comparing actual impression numbers for recent content where we can.

What business goal does this relate to?

Impressions, but as impressions are the minimum bar to clear for all of our other business goals, this can also be considered top of the funnel for other things.

What numbers should you use?

  • Impressions per (post*follower) (where you multiply total follower count by total updates posted)
  • Impressions per post
  • Total impressions per account/all impressions for competitor accounts during that same period

Once you have collected impression numbers from a range of accounts on the same platform which are targeting the same audiences, we can sum them together and compare total impressions per account against total impressions overall to get a very rough share of voice estimate. This number will be heavily impacted by users who view content from one account again and again, but as those users are likely to be the most engaged, it’s a bias we can live with. Again, comparing this over time can give us an idea of trajectory and growth.

Some accounts may try to drive up key metrics by posting a huge number of times a day, there’s definitely a law of diminishing returns so as with engagements I’d also get an average per-post impression number to gauge comparative economy.

As this is post-specific, I would also recommend breaking this numbers down by post type (whether that be “meme”, “blog post”, or “video”) to spot trends in effectiveness.

What tools should you use?

Fanpage Karma again goes out of its way to give you means of slicing this data. Just like with engagement you can show impressions by post type for one Facebook page, or compare multiple at the same time. It can result in the same information overload but I definitely can’t fault the platform for a lack of granularity. Unlike with engagement, the platform will pretty much only give you impression data for Facebook and unfortunately sometimes it’s patchy (see the SEMrush and Moz graph below). 

It’ll also give YouTube view information, as well as giving you a breakdown of video views and interactions based on when the video was posted, it also offers cumulative figures which show how the performance of a video improved over time.

Tweetreach will give estimated reach for hashtags and keywords, by searching for a specific enough phrase, you can get an idea of reach for individual tweets, or a number of related tweets if you’re smart about it.

Content shares

This is specifically people sharing a page of your site on a social network. It may help us flesh out some of the impressions metrics we’ve been dancing around, particularly in terms of content from your site or competitors’ being shared by site visitors rather than an official account.

What business goal does this relate to?

Impressions, site visits generating ad revenue

What numbers should you use?

To control for volume of content created by different sites, I would look at both total number of shares and shares per blog post, for example, during the same time period. It could also be valuable information to sum total follower count of the accounts that shared the content, to weight shares by reach, but that could be a huge task and also opens us up to the problems of follower count.

What tools should you use?

Buzzsumo will let you search for shared content by domain, and will let you dig in to which accounts shared a particular item. It can give a slightly imbalanced picture because it’s just looking for shares of your website content (so don’t expect the figures to include particularly successful social-only content for example) but it’s an excellent tool to get a quick understanding of what content is doing how well, and for who.

Link clicks

This can be difficult information to gather but given its potential value to our business goals it’s worth getting this information where we can.

What business goal does this relate to?

Site visits generating ad revenue, event attendance, sales, depending on where the link is pointing.

In my experience it’s usually much harder to get users to click away from a social media platform than it is to get them to take any action within the same platform. Sharing links can also cause a drop in engagement, often because the primary purpose of the content isn’t to encourage engagement – success with a user often won’t be visible at all on the platform.

What numbers should you use?

  • Clicks per (link post*follower) (where you multiply total follower count by total updates posted)
  • Clicks per link post
  • Total link clicks

What tools should you use?

Understandably this is fairly locked-down, Fanpage Karma again goes out of its way to get you the data you need, and does offer to plot posts against link clicks in one of those scatter graphs we love. I’ve reached out to them for information on how they collect this data, will update when I hear back. As with impression data, click data can sometimes be patchy – the platform seems to miss data consistently across metrics.

Outside of that, the best trick I’ve found is by taking advantage of link shortener tracking. For example, anyone who uses free service Bit.ly to shorten their links can also get access to link click stats over time. The thing is, those stats aren’t password protected, anyone can access them just by copying the Bit.ly link and putting a + sign at the end before following the link.

Here are the stats for a link Donald Trump recently shared in a tweet.

Go forth and analyse

Hopefully, some of the metrics and processes I’ve included above prove helpful when you’re next directing your social media strategy. I would never argue that every single one of these numbers should be included in every competitor analysis, and there are a whole host of over factors to include in determining the efficacy of a community, for instance; does the traffic you send convert in the way you want?

That being said, I think these numbers are a great place to start working out what will make the difference, and will hopefully get us away from that frequent focus on follower numbers. If there are any numbers you think I’ve missed or any tips and tricks you know of that you particularly like, I’d love to hear about them in the comments below.

Visualizing Time: A Project Management How-To Using Google Sheets

This is a copy of a post on Moz and is canonicalised to there.




The short version of this post: Project management is a vital part of our job as marketers, but planning and visualizing projects over time is hard, so I’ve created a set of Google Sheets to make that work easier for you.

I’ve found this system helpful in a number of ways, so I’m sharing my templates here in case it’ll make your day a bit shorter. I’ll start off with a brief overview of what the sheets do, but in the latter section of this post I’ll also go into greater depth about how they work so you can change them to suit your own needs.

If you’d like to skip this post and get straight to the templates, you can access them here (but I’d recommend reading a bit about how they work first):

It’s worth mentioning: I don’t consider these sheets to be the only solution. They are a free solution that I’ve found pretty useful, but I have colleagues who swear by the likes of Smartsheet and Teamwork.

It’s also worth noting that different tools work better or worse with different styles. My aim with these sheets is to have a fairly concrete plan for the next three or four months, then a looser set of ideas for further down the line. When I’m filling out these sheets, I also focus on outcomes rather than processes – that helps cut down the time I spend updating sheets, and makes everything clearer for people to read.

The long version of this post is a lot like the short version above, but I talk more about some principles I try to stick to and how this setup fulfills them (shocker, eh?). As promised, the final section will describe how the sheets work, for anyone who runs into problems or wants to make something of their own.

Contents (for if you just want to jump to a specific section):

The 3 principles (which are about people as much as using the sheets)
An early conclusion
Appendices & instructions
How to add tasks to the list
Splitting tasks across multiple time periods
Working with the Month View tab (Planner and Stakeholder Versions)
How to make the Gantt charts work (and add categories)
How to make the Category-Filterable Forward-Facing Gantt Charts work
How to create the Stakeholder View
How to update the God’s-I Version


The principles (which are about people as much as using the sheets)

Principle 1: We shouldn’t need to store all our information in our heads.

This is a simple one — if we have to regularly understand something complex, particularly if it changes over time, that information has to be on the page. For example, if I’m trying to plan a marketing strategy and I have to constantly look at the information on the screen and then shuffle it around in my head to work out what we have time for month to month, I’m going to lose the thread and, eventually, my mind.

The Planner Version sheet aims to solve this in a few ways. First, you write all the tasks down in the Task View tab, the time period you’re completing them in is on the far left (in my example, it’s the month the task is planned for), and there are other columns like status and category — but initially, it can just be a brain dump of what needs to happen. The idea here is that when you’re first writing everything out, you don’t have to think too much about it — you can easily change the dates and add other information later.

The Month View tab takes the information in the Task List tab and reorders it by the months listed in column A of the Task View (it could be other time periods, as long as it’s consistent).

This way you can look at a time period, see how much resource is left, and read everything you currently have planned (the remaining resource calculation will also take into account recurring tasks you don’t always want to write out, like meetings).

While the Month View tab can help you focus on specific time periods, it doesn’t give you a long-term view of the plan or task dependencies, so we have the two Gantt views. The Gantt View tab contains everything from sixty days ago and into the future, as long as you haven’t just marked the task as “Later.” The Category-Filterable Gantt only focuses on things that are planned for the next six months.

As the name suggests, you can filter this second Gantt to only show specific categories (you label tasks with categories in the Task View tab). This filter is to help with broader trends that are harder to notice — for instance, if the most important part of the project is a social campaign or a site change and you don’t get to it for six months, you may need to make sure everyone is aware of that and agrees. Likewise, if you need to be showing impact but spend most of your time reporting, you may want to change your plan or make sure everyone understands why things are planned that way.

Principle 2: No one knows everything (and they shouldn’t).

If you’re working on a project where you have all the information, then one of two things is likely happening:

  1. You’ve really doubled down on that neuroticism we share
  2. You’re carrying this thing — you should just quit and start your own company selling beads* or something.

We can trust that our clients/bosses have more context than we do about wider plans and pressures. They may know more about wider strategies, that their boss tenses up every time a certain project is mentioned, or that a colleague hasn’t yet announced their resignation. While a Google Sheet is never an acceptable substitute for actual communication, our clients or bosses may also have an idea of where they want the project to go which they haven’t communicated, or which we haven’t understood.

We can also trust that people working on individual tasks have a good idea of whether things are going to be a problem — for instance, if we’re allowing far too little time for a task. We can try to be as informed as possible, but they’re still likely to know something we don’t.

Even if we disagree that certain things should be priorities or issues, having a transparent, shared plan helps us kick off difficult conversations with a shared understanding of what the plan currently is. The less everyone has to reprocess information to understand it (see Principle 1), the more likely we are to weed out problems early.

This is all well and good, but expecting someone to absorb everything about a project is likely to have the opposite effect. We need a source of data that everyone can refer to, without crowding their thoughts or our conversations with things that only we as project managers have to worry about.

That’s why we have the Stakeholder Version of our sheets. When we write everything in the Planner Version, the Planned tab is populated with just the things that are relevant for people who aren’t us (i.e, all the tasks where the status isn’t “unpitched,” “cancelled,” “forgotten,” or blank) with none of the resource or project identifier information.

We never have to fill out the Stakeholder Version sheet — it just grabs that information from the Planned tab using importrange() and creates all the same Gantt charts and monthly views — so we don’t have to worry about different plans showing different information.

*Bees?

Principle 3: I’m going to miss stuff (less is more).

I’ll be honest: I’ve spent a bunch of time in the past putting together tracking systems that I don’t check enough. I keep filling them out but I don’t spend enough time figuring out what’s needed where. If we have a Stakeholder Version which takes out the stuff that is irrelevant to other people, we need the same for us. After all, this isn’t the only thing we’re thinking about, either.

The What-in-God’s-name-have-I-missed Version (God’s-I from now on) pulls in data from all of your individual project management sheets and gives you one place to go to be reminded about all the things you’ve forgotten and messed up. It’s like dinner with your parents in a Google Sheet. You’re welcome.

The three places to check in this version are:

  1. Alerts Dashboard tab, which shows you the numbers of deadlines upcoming or missed, the work you need to budget for or brief, and how much unplanned budget you have per project, per month (where budget could just be internal people-hours, as that is still finite).
  2. Task Issues tab, which gives a filterable view of everything over the next three months (so you can dig in to the alerts you see in step one).
  3. Deadlines This Week tab so you have a quick reminder of what you need to complete soon.

An early conclusion:

Often, when I’m making a point, people tell me they hope I’ll wrap up early. This section is mainly proof of personal growth.

It’s also because everything after this is specific to using, changing, or understanding the project management sheets I’ve shared, so you need only read what follows if you’re interested in how to use the sheets or how I made them (I really do recommend dabbling with some uses of filter() and query(), particularly in conjunction with RegEx formulas).

Aside from that, I hope you find these resources useful. I’ve been getting a lot of value from them as a way to plan with people collaboratively and separate the concept of “project manager” from “person who needs to know all the things,” but I would be really interested in any thoughts you have about how to improve them or anything you think I’ve missed. Feel free to comment below!

Access the template sheets here:

Appendices & instructions

Some general notes

Quick notes on avoiding problems:

  1. Make sure that when you copy the sheets, the sharing permissions for the Planner View is email- or at least organization-based (anyone with access to the Stakeholder View will see the Planner View URL). It’s a good idea to keep the God’s-I Version permissions email-based, too.
  2. Try to follow the existing format of words and numbers as closely as possible when creating new information.
  3. If you want a new row, I’d insert a row, select the one above, copy it down into the new row, then change the information — that way, the formulas in the hidden columns should still work for you.
  4. If you want a new column, it might break one of the query() functions; once you’ve added it, have a quick look for formulas using =query() and consider changing the columns they reference that will have been affected by your change.

Quick notes on fixing problems:

Here’s a list of things to check for if you’ve changed something and it isn’t being reflected in the sheet:

  1. Go through all the tabs in the stakeholder view and unhide any hidden columns
    1. They usually just contain a formula that reformats text so our lookups work. See if any of those are missing or broken.
  2. Try copying the formulas from the row above or next to the cell that isn’t working.
  3. Try removing the =iferror portion of formulas.
    1. A lot of the cells are set up to be blank if they break. It makes it easier to read the sheet, but can make it harder to know whether something is actually empty or just looks empty.
  4. If one sheet isn’t properly pulling through data from another, look for the =importrange() formulas and make sure there is one that matches the URL of the sheet you’re trying to reference and that you’ve given permission for the formula to work — you’ll need to click a button.
    1. Check the Task View tab in the Stakeholder Version and Project URLs tab in the God’s-I Version
  5. Have you just called a task “Part 4” or similar? There is a RegEx formula which will strip that out.
  6. Have you forgotten to give a task a type? If so, the Gantt view will warn you in the Status column.

The query function

The =query() function in Google Sheets is awesome — it makes tons of things tons easier, particularly in terms of automating data manipulation. Most of what these sheets do could be achieved with =query, but I’ve often used =filter (which is also very powerful) because =filter is apparently quicker in Google Sheets and at times these sheets have a lot to process.

RegEx

You shouldn’t need to know any RegEx for this sheet, but it is useful in general. Here the RegEx is mainly used to remove the “Part #” in multi-part tasks (see below) and look for anything that matches multiple options — for instance, when selecting multiple categories in the Category-specific forward-facing Gantt tab (see below). RegEx is only used here in RegExmatch(), RegExextract(), RegExreplace(), or as part of the query function where we say “matches.”

Query/filter and isblank

A lot of the formulas in these sheets are either filter() or query() or are wrapped in =if(isblank() — that’s basically because filter and query functions can fill more cells than just the one you put the formula in. For example, they can fill a whole row, column, or sheet. That means that other cells are calculating or looking up against cells which may or may not be empty, so I’ve added the isblank() check so that the cells don’t break when there isn’t information somewhere, but as you add information you don’t have to do as much copying and pasting of formulas.

Tick boxes

The tick boxes are relatively new in Google Sheets. If you need another one, just copy it from an existing cell or select from the “Insert” menu. Where I’ve used tick boxes, I often have another formula in the sheet which filters rows based on what boxes are ticked, then creates a RegEx based on the values that have a tick next to them.

You don’t need to understand this to use the sheets, but you can see it in the rows I’ve unhidden in the Category-specific forward-facing Gantt tab of the Stakeholder Version sheet.

Quick tip — if you want to change all the boxes to ticked/unticked and don’t want to have to do so one by one, you can copy a ticked or unticked checkbox across all the other cells.

How to add tasks to the list

In the task view, the most important things to include are the task name, time period it’s planned for, cost, and type.

For ease, when creating a new task I recommend inserting a row, copying the row above into it, and then changing the information, that way you know you’re not missing any hidden formulas.

Again, don’t bother changing the Stakeholder Version. Once you’ve added the URL of the Planner Version to the =importrange() function, it will pull automatically from the Planner Version.

Splitting tasks across multiple time periods

You can put more than one thing in the time period for a task, just by separating it with “, “ (comma space). That’s because when we get the full list of months, we join all the individual cells together with “, “ then split them apart by “, “ and then dedupe the list — so multiple months in one cell are treated the same as all the other months.

=unique(transpose(split(JOIN(", ",'Task view'!A:A),", ",0)))

The cost-per-month formula in the Task List tab counts how many commas are present in the month column for that row, then divides the planned cost by that number — meaning the cost is split equally across all of the months listed.

=H2/(len(REGEXREPLACE(A2,"[^\,]*",""))+1)

If you don’t want the task to be completely equally split between different time periods, you can write “Part 1” or “Part 2” next to a task. As long as you write just “Part” and then numbers at the end of the name, that’ll be stripped out in column O of the task list tab so the different parts of a task will be combined into one record in things like the Gantt chart.

=REGEXREPLACE(B2,"Part \d+$","")

Working with the Month View tab (Planner and Stakeholder version)

A few key things are going on in the Month View tab. First, we’re getting all of the time periods we have listed in the Task View.

Because the months don’t always show up in the right format (meaning later filters don’t work), we then use a =text() formula in the hidden column B to make sure the months stay in the format we need.

Then, in the “deliverables” section of this tab, we use the below formula:

=if(not(isblank(A12)), iferror(TRANSPOSE(FILTER('Task view'!B:B,RegExmatch('Task view'!A:A,B12))),""),"")

What we’re doing above is checking if the “month” cell of this row is has anything in it. If there is a month there, we filter the tasks in the Task View to only those that contain that month in the text month column. Then we use the transpose() function to change our filtered tasks from a vertical list to the horizontal list we see in the sheet.

Finally, we use the below formula to filter the costs we’ve listed in the Task View tab, the same way we filtered the task names above. Then we add together all the costs for the month (plus the standing monthly costs) and subtract them from the total amount of time/hours we have to spend. That way we calculate how much we have left to play with, or if we’re running over.

=if(isblank(A12),"",((D12-SUM(FILTER('Task view'!I:I,RegExmatch('Task view'!A:A,B12))))-sum($D$6:$F$8)))

We also pull this value through to our God’s-I Version to see at a glance if we’ve over/under-planned.

How to make the Gantt charts work (and add categories)

Column C in the Task View tab is the category; you also need to fill this out for the Gantt charts to work. I haven’t forced the kind of categories you have to use because each project is different, but it’s worth using consistent categories (down to the capital letter) because we deduplicate the task categories, and that relies on all of the names being consistent.

What’s happening in the Gantt chart is each cell is a combination of a filter and vlookup (the below looks more complicated than it is).

=iferror(if(not(or(isblank($D6),ISBLANK(F$1))),vlookup(filter('Task view'!$C:$C,'Task view'!$O:$O=$D5,REGEXMATCH('Task view'!$A:$A,F$2)),'Status and colour code'!$C:$E,3,0),""),"")

The formula first checks if the task or month cells are blank. If not, it looks in the month cell in its column and cross-references with the task cell in its row. Where the intersection of a month and task matches a task in our Task View (as in the task in that row is taking place during the month in that column), the filter formula will return the category. For those interested, this might also have been achieved with index-match, but filter lets us match with RegEx so we can give multiple matching options and they don’t have to match exactly. Because we split tasks across multiple months, we need to be flexible in our matching.

The reason we check whether the task or month cells are empty, as mentioned above is so we can paste the above formula in all the cells of the Gantt chart and have them fill out as we add more months and tasks, rather than having to copy and paste the formula each time.

When our filter formula returns the specific category of our task, we take that value and run a vlookup in the Status and color code tab. (That’s only necessary so I could set up the conditional formatting for you so it won’t break when you change the specific category names.)

At the moment, the Gantt charts are set up to color-code the first 7 categories, plus a Deadline category if needed. If you want to add more, they’ll show up initially in the Gantt chart as a black block and you’ll need to set up conditional formatting to color-code them.

To add automatic color formatting for more categories, repeat the below process for each of the Gantt chart views in the Planner and Stakeholder sheets:

  1. Select all the cells in the tab
  2. Select “Conditional Formatting” from the Format menu
  3. Find the rule with the black box next to it and make a note of what number it’s currently targeting from
  4. Create a new rule for anything which equals the number in step 3, then set the same color for both the background and text of that rule
  5. Change the rule that’s got a black block next to it to target one number higher

How to make the Category-Filterable Gantt Charts work

This tab uses our old friends, the =filter() and =query() functions. First we use filter to grab the full list of categories from the Status and color code tab we mentioned before:

=FILTER('Gantt view'!A6:B,RegExmatch('Gantt view'!A6:A,".*[a-zA-Z].*"))

Then we put Google Sheets’ shiny new checkboxes next to them (that’ll help us filter our data easily).

Normally we’ll hide row one, but it’s visible to show you a formula that looks at all of the categories and filters them to just those where the tick-box next to them is ticked. If there are none, it returns “(\d|Deadline)” meaning “either a number, or the word Deadline” in RegEx-speak (so anything in our list), because the vertical pipe “|” means “or” and “\d” means “number.”

If there is a tick next to one or more of the categories, the formula will return those things, separated with the “|” that, again, means “or” in RegEx.

=if(countif(C3:C,True)>0,CONCATENATE("(",JOIN("|",FILTER(B3:B,C3:C=True,not(isblank(B3:B)))),")"),"(\d|Deadline)")

Then in cell E3 we have a query formula. The reason we’re using =query and not =filter here is that we need to look for things in more than one column; filter can only really handle one column at a time.

The query function then checks the first six columns of our original Gantt chart, each time looking for any of the category numbers we’ve ticked (what the conditional formatting hides is that the category numbers are in that original Gantt, they’re just the same color as the cell shading). When no tick-boxes are checked, it returns anything that has falls in to any category over the next six planned months. Once we start ticking checkboxes, this will return only the things over the next six planned months that are in one of the categories we’ve selected.

=query('Gantt view'!D1:1056,"Select D, E, F, G, H, I, J where D <> '' and (E matches '"&B1&"' or F matches '"&B1&"' or G matches '"&B1&"' or H matches '"&B1&"' or I matches '"&B1&"' or J matches '"&B1&"')",1)

How to create the Stakeholder View

The Planner Version sheet has a tab called Planned. You don’t need to fill out this tab — it has a query which extracts information from the Task View tab using a =query() function:

=QUERY({'Task view'!A1:F,'Task view'!O1:P},"Select * where not (Col6 contains 'pitched' or Col6 contains 'cancelled' or Col6 = '' or Col6 = 'Forgotten')")

All the formula above is doing is taking the Month, Task, Description, Blocker, Status, Category, and Full task columns, then showing every record where the status isn’t “unpitched,” “cancelled,” “forgotten,” or empty. That gives us a tab with the information we’re ready to share. We could also achieve this with =filter() if we reordered the data in the Task View tab, but this ordering of data is easier to work with, so we just use =query() and select only the columns we want here, combining the ranges horizontally by listing them between {} at the start of this formula.

Then, the Task View tab in our Stakeholder Version sheet file uses =importrange() to target that cleaned list we’ve created. To make sure the Stakeholder Version keeps functioning when you create copies of both of these files, all you need to do is go to the new Planner Version sheet and copy the URL of the page, then go to the Stakeholder Version, find the Task View tab, and update the importrange() formula in cell A1 to have the new URL of your Planner Version sheet. The cell will recalculate, you’ll need to grant permission, then it should work as normal.

How to update the God’s-I Version

This view gives you the following:

A quick look at the total number of tasks in any project which:

  1. Have a deadline within 10 days of now
  2. Have passed a deadline (with the task not completed)
  3. Don’t have a deadline set
  4. Aren’t briefed or aren’t budgeted for the next three months

It’ll also give you a quick look at the amount of unplanned budget per project, per month, to make sure you haven’t forgotten to plan a month and haven’t overplanned a month.

The God’s-I Version works in a similar way to the Stakeholder Version in that it pulls in information using =importrange(), but a key difference is that we want to pull from multiple sheets. Rewriting the formula could get to be a pain, so instead we can generate the formula we need in the Project URLs tab.

The only things you need to do are:

  1. Add the URL of the new Planner View sheet you want to include in the Project URLs tab of the God’s-I Version
  2. Grant permission for this sheet to access that sheet (you can click on the alert that appears in column A)
  3. Copy the value in Cell B1, go to the All Imported Task Views tab and select cell A2, then paste the value into the top bar. It’s important that we don’t paste straight into the cell or the sheet will run the concatenate formula rather than the query formula we’re making.

It’s worth noting that this sheet will have all the information about every project you’re managing. Once it’s set up, you shouldn’t share access to anyone unless you’re happy with them seeing all the budgeting details for each of the sheets.


A late conclusion:

Why are you looking for a conclusion down here? It’s in the middle of the post under the title of An early conclusion,” of course. Have a nice day!

Getting Around the “One Form” Problem in Unbounce

This is a copy of the post on Moz and is canonicalised there.




What is Unbounce?

Unbounce is a well-known and well-regarded landing page creation tool designed to allow HTML novices to create impressive and impactful landing pages, while offering scope for more experienced coders to have a bit more fun.

In this post, I’m going to list some solutions to what I refer to as the “one form” problem of Unbounce, their strengths and weaknesses, and which I personally prefer.

What is the “one form” problem?

As with any system that tries to take complex processes and make them simple to implement, there’s a certain amount of nuance and flexibility that has to be sacrificed.

One limitation is that each landing page on Unbounce can only have one embedded form (there are a few community articles discussing the topic, for instance: 123). While there’s a definite risk of call-to-action fatigue if you bombard your visitors with forms, it’s a reasonable requirement to want to provide easy access to your form at more than one point.

For example, you could lead with a strong call to action and the form at the top of the page, then follow up further down the page when users have had time to absorb more information about your offering. A simple example of this is the below Teambit landing page, which was featured in Hubspot’s 16 of the Best Landing Page Design Examples You Need to See in 2017.

The top of this Teambit page features a simple email collection form

The form is repeated at the bottom of the page once visitors have had a chance to read more.

Potential solutions to the one-form issue

Now that we’ve established the problem, let’s run through some solutions, shall we?

Fortunately, there are a few possible ways to solve this problem, either using built-in Unbounce tools or by adding code through open HTML, CSS, and JavaScript inputs.

It’s worth bearing in mind that one solution is to not have the form on your page at all, and have your call-to-action buttons linking to other pages with forms. This is the approach Unbounce uses in the example below. While that’s a perfectly valid approach, I wouldn’t call it so much a solution to this problem as a completely different format, so I haven’t included it in the list below.

Here Unbounce use two CTAs (the orange buttons), but don’t rely on having the form on the page.

1. Scrolling anchor button

This is potentially the simplest solution, as it’s natively supported by Unbounce:

  1. Create a button further down the page where you would want your second form.
  2. Edit that button, in the “Click Action” section of the right-hand button settings panel, where you would normally put the URL you are linking to
  3. Add in the unique ID code for the box that holds your form (you can find that by editing the box and scrolling to the bottom of the right-hand panel to “Element Metadata”)

Register button

“Click Action” section of right-hand button settings panel

“Element Metadata” section at bottom of right-hand element setting panel

Benefits

Quick and easy to implement, little direct JavaScript or HTML manipulation needed.

Drawbacks

There are far more seamless ways to achieve this from the user perspective. Even with smooth scrolling (see “bonus points” below), the experience can be a little jarring for users, particularly if they want to go back to check information elsewhere on a page.

Bonus points

Just adding that in as-is will mean a pretty jarring experience for users. When they click the button, the page will jump back to your form as though it’s loaded a new page. To help visitors understand what’s going on, add smooth scrolling through JavaScript. Unbounce has how-to information here.

Double bonus

The link anchors work by aligning the top of your screen with the top of the thing you’ve anchored. That can leave it looking like you’ve undershot a bit, because the form is almost falling off the screen. You can solve this simply by putting a tiny, one-pixel-wide box a little bit above the form, with no fill or border, positioning it how you want, and linking to the ID of that box instead, allowing a bit of breathing room above your form.

Without and with the one-pixel-wide box for headroom

2. iFrames

Unbounce allows free <HTML> blocks, which you can use to embed a form from another service or even another Unbounce page that consists of only a form. You’ll need to drag the “Custom HTML” block from the left bar to where you want the form to be and paste in your iFrame code.

The “Custom HTML” block in the left-hand bar

Blank HTML box that pops up

How HTML blocks look in the editor

Benefits

This will allow for multiple forms, for each form to be positioned differently on the page, to function in a different way, and for entries to each form to be tagged differently (which will offer insight on the effectiveness of the page).

This solution will also allow you to make the most of functionality from other services, such as Wufoo (Unbounce has documented the process for that here).

Drawbacks

Having chosen Unbounce as a one-stop-shop for creating landing pages, breaking out of that to use external forms could be considered a step away from the original purpose. This also introduces complications in construction, because you can’t see how the form will look on the page in the editing mode. So your workflow for changes could look like:

  1. Change external form
  2. Review page and see styling issues
  3. Change layout in Unbounce editor
  4. Review page and see that the external form isn’t as readable
  5. Change external form
  6. Etc.

Bonus points

Unbounce can’t track conversions through an iFrame, so even if you use another Unbounce page as the form you draw in, you’re going to be breaking out of Unbounce’s native tracking. They have a script here you can use to fire external tracking hits to track page success more centrally so you get more of a feel for whether individual pages are performing well.

Double bonus

Even if you’re using an identical Unbounce page to pull through the same form functionality twice, tag the form completions differently to give you an idea of whether users are more likely to convert at the top of the page before they get distracted, or lower down when they have had time to absorb the benefits of your offering.

3. Sticky form (always there)

An option that will keep everything on the same page is a sticky form. You can use CSS styling to fix it in place on a screen rather than on a page, then when your visitor scrolls down, the form or CTA will travel with them — always within easy reach.

This simple CSS code will fix a form on the right-hand side of a page for screen widths over 800px (that being where Unbounce switches from Desktop to Mobile design, meaning the positioning needs to be different).

Each ID element below corresponds to a different box which I wanted to move together. You’ll need to change the “lp-pom-box-xxx” below to match the IDs of what you want to move down the page with the user (you can find those IDs in the “Element Metadata” section as described in the Scrolling Anchor Button solution above).

<style>
@media (min-width: 800px) {  
    #lp-pom-box-56{ position:fixed; left:50%; margin-left: 123px; top:25%; margin-top:-70px}  
    #lp-pom-form-59{ position:fixed; left:50%; margin-left: 141px; top:25%; margin-top:60px}  
    #lp-pom-box-54{ position:fixed; left:50%; margin-left: 123px; top:25%; margin-top:50px}}
</style>

Benefits

This allows you to keep tracking within Unbounce. It cuts out a lot of the back and forth of building the form elsewhere and then trying to make that form, within an iFrame, act on your page the way you want it to.

Drawbacks

The problem with this is that users can quickly become blind to a CTA that travels with them, adding some kind of regular attention seeking effect is likely to just annoy them. The solution here is to have your call to action or form obscured during parts of the page, only to reappear at other, more appropriate times (as in the next section).

It can be difficult to see exactly where the form will appear because your CSS changes won’t take effect in the editor preview, but you will be able to see the impact when you save and preview the page.

4. Sticky form (appearing and disappearing)

The simplest way to achieve this is using z-index. In short, the z-index is a way of communicating layers through HTML, an image with a z-index of 2 will be interpreted as closer to the user than a box with a z-index of 1, so when viewing the page it’ll look like the image is in front of the box.

For this method, you’ll need some kind of opaque box in each section of your page. The box can be filled with a color, image, gradient — it doesn’t matter as long as it isn’t transparent. After you’ve put the boxes in place, make a note of their z-index, which you can find in the “Meta Data” section of the right-hand settings bar, the same place that the Element ID is shown.

This box has a z-index of 31, so it’ll cover something with an index of 30

Then use CSS to select the elements you’re moving down the page and set their z-index to a lower number. In the below lines I’ve selected two elements and set their z-index to 30, which means that they’ll be hidden behind the box above, which has a z-index of 31. Again, here you’ll want to replace the IDs that start #lp-pom-box-xxxx with the same IDs you used in the Sticky Form (Always There) solution above.

<style>   
    #lp-pom-box-133{z-index: 30; }  
    #lp-pom-box-135{z-index: 30; }
</style>

When you’re choosing the place where you want your form to be visible again, just remove any items that might obscure the form during that section. It’ll scroll into view.

Benefits

This will allow you to offer a full form for users to fill out, at different points on the page, without having to worry about it becoming wallpaper or whether you can marry up external conversions. Using only CSS will also mean that you don’t have to worry about users with JavaScript turned off (while the bonus points below rely on JavaScript, this will fall back gracefully if JavaScript is turned off).

Drawbacks

Unlike the iFrame method, this won’t allow you to use more than one form format. It also requires a bit more CSS knowledge (and the bonus points will require at least a bit of trial and error with JavaScript).

Bonus points

Use JavaScript to apply and remove CSS classes based on your scrolling position on the page. For example you can create CSS classes like these which make elements fade in and out of view.

CSS:

<style>
@media (min-width: 800px) {    
   /* make the opacity of an element 0 where it has this class */  
       .hide {  
       opacity: 0;
}  
   /* instead of applying an effect immediately, apply it gradually over 0.2 seconds */    .transition {
   -webkit-transition: all 0.2s ease-in-out; 
       -moz-transition: all 0.2s ease-in-out; 
       -o-transition: all 0.2s ease-in-out; 
       transition: all 0.2s ease-in-out;  
    }}
</style>

You could then use this JavaScript to apply the .hide class when user scrolls through certain points, and remove it when they get to the points where you want them to see the form. This can be used for finer-grained control of where the form appears, without having to just cover it up. As before, you’ll need to update the #lp-pom-box-xxx strings to match the IDs in your account.

JavaScript:

<script>
// This script applies the “hide” class, which makes opacity zero, to certain elements when we scroll more than 100 pixels away from the top of the page. Effectively, if we scroll down the page these items will fade away.
$(window).scroll(function() {
    if ($(window).scrollTop() > 100 ){
        $('#lp-pom-box-54').addClass('hide');
        $('#lp-pom-box-228').addClass('hide');  
} 
// This section removes the hide class if we’re less than 500 pixels from the bottom of the page or scroll back up to be less than 100 from the top. This means that those elements will fade back into view when we’re near the bottom of the page or go back to the top.
if ($(document).height() - ($(window).height() + $(window).scrollTop()) < 500 ||
$(window).scrollTop() < 100 ){
    $('#lp-pom-box-54').removeClass('hide');
    $('#lp-pom-box-228').removeClass('hide');
}}
</script>

Double bonus

You could consider using JavaScript to selectively hide or show form fields at different points. That would allow you to show a longer form initially, for example, and a shorter form when it appears the second time, despite it actually being the same form each time.

For this, you’d just add to your .scroll JavaScript function above:

   if ($(document).height() - ($(window).height() + $(window).scrollTop()) < 75){    
// This part hides the “full name” part of the form, moves the submit button up and reduces the size of the box when we scroll down to less than 75 pixels away from the bottom of the page
    $('#container_full_name').addClass('hide');
    $('#lp-pom-box-54').stop().animate({height: "200px"},200);
    $('.lp-pom-button-60-unmoved').animate({top: '-=75'}, 200);
    $('#lp-pom-button-60').removeClass('lp-pom-button-60-unmoved');
    $('#lp-pom-button-60').addClass('lp-pom-button-60-moved');}
    else{
// This part adds the “full name” part back in to the form, moves the submit button back down and increases the size of the box if we scroll back up. 
    $('#container_full_name').removeClass('hide');
    $('#lp-pom-box-54').stop().animate({height: "300px"},200);
    $('.lp-pom-button-60-moved').animate({top: '+=75'}, 200);
    $('#lp-pom-button-60').removeClass('lp-pom-button-60-moved');     
    $('#lp-pom-button-60').addClass('lp-pom-button-60-unmoved');

When scrolling within 75px of the bottom of the page, our JavaScript hides the Full Name field, reduces the size of the box, and moves the button up. This could all happen when the form is hidden from view; I’ve just done it in view to demonstrate.

Conclusion

In the table below I’ve pulled together a quick list of the different solutions and their strengths and weaknesses.

SolutionStrengthsWeaknesses
Scrolling anchor buttonEasy implementation, little coding neededJarring user experience
iFrameMultiple different formsRequires building the form elsewhere and introduces some styling and analytics complexity to workflow
Sticky form (always there)Keeps and design tracking within one Unbounce projectCTA fatigue, using up a lot of page space
Sticky form (appearing and disappearing)The benefits of a sticky form, plus avoiding the CTA fatigue and large space requirementCSS knowledge required, can only use one form

Personally, my favorite has been the Sticky Form (appearing and disappearing) option, to reduce the need to integrate external tools, but if I had to use multiple different forms I could definitely imagine using an iFrame.

Which is your favorite? Have I missed any cool solutions? Feel free to ping me in the comments.

So You Want to Build a Chat Bot – Here’s How (Complete with Code!)

This is a copy of the post at https://moz.com/blog/chat-bot and is canonicalised there.




You’re busy and (depending on effective keyword targeting) you’ve come here looking for something to shave months off the process of learning to produce your own chat bot. If you’re convinced you need this and just want the how-to, skip to “What my bot does.” If you want the background on why you should be building for platforms like Google Home, Alexa, and Facebook Messenger, read on.

Why should I read this?

Do you remember when it wasn’t necessary to have a website? When most boards would scoff at the value of running a Facebook page? Now Gartner is telling us that customers will manage 85% of their relationship with brands without interacting with a human by 2020 and publications like Forbes are saying that chat bots are the cause.

The situation now is the same as every time a new platform develops: if you don’t have something your customers can access, you’re giving that medium to your competition. At the moment, an automated presence on Google Home or Slack may not be central to your strategy, but those who claim ground now could dominate it in the future.

The problem is time. Sure, it’d be ideal to be everywhere all the time, to have your brand active on every platform. But it would also be ideal to catch at least four hours sleep a night or stop covering our keyboards with three-day-old chili con carne as we eat a hasty lunch in between building two of the Next Big Things. This is where you’re fortunate in two ways;

  1. When we develop chat applications, we don’t have to worry about things like a beautiful user interface because it’s all speech or text. That’s not to say you don’t need to worry about user experience, as there are rules (and an art) to designing a good conversational back-and-forth. Amazon is actually offering some hefty prizes for outstanding examples.
  2. I’ve spent the last six months working through the steps from complete ignorance to creating a distributable chat bot and I’m giving you all my workings. In this post I break down each of the levels of complexity, from no-code back-and-forth to managing user credentials and sessions the stretch over days or months. I’m also including full code that you can adapt and pull apart as needed. I’ve commented each portion of the code explaining what it does and linking to resources where necessary.

I’ve written more about the value of Interactive Personal Assistants on the Distilled blog, so this post won’t spend any longer focusing on why you should develop chat bots. Instead, I’ll share everything I’ve learned.

What my built-from-scratch bot does

Ever since I started investigating chat bots, I was particularly interested in finding out the answer to one question: What does it take for someone with little-to-no programming experience to create one of these chat applications from scratch? Fortunately, I have direct access to someone with little-to-no experience (before February, I had no idea what Python was). And so I set about designing my own bot with the following hard conditions:

  1. It had to have some kind of real-world application. It didn’t have to be critical to a business, but it did have to bear basic user needs in mind.
  2. It had to be easily distributable across the immediate intended users, and to have reasonable scope to distribute further (modifications at most, rather than a complete rewrite).
  3. It had to be flexible enough that you, the reader, can take some free code and make your own chat bot.
  4. It had to be possible to adapt the skeleton of the process for much more complex business cases.
  5. It had to be free to run, but could have the option of paying to scale up or make life easier.
  6. It had to send messages confirming when important steps had been completed.

The resulting program is “Vietnambot,” a program that communicates with Slack, the API.AI linguistic processing platform, and Google Sheets, using real-time and asynchronous processing and its own database for storing user credentials.

If that meant nothing to you, don’t worry — I’ll define those things in a bit, and the code I’m providing is obsessively commented with explanation. The thing to remember is it does all of this to write down food orders for our favorite Vietnamese restaurant in a shared Google Sheet, probably saving tens of seconds of Distilled company time every year.

It’s deliberately mundane, but it’s designed to be a template for far more complex interactions. The idea is that whether you want to write a no-code-needed back-and-forth just through API.AI; a simple Python program that receives information, does a thing, and sends a response; or something that breaks out of the limitations of linguistic processing platforms to perform complex interactions in user sessions that can last days, this post should give you some of the puzzle pieces and point you to others.

What is API.AI and what’s it used for?

API.AI is a linguistic processing interface. It can receive text, or speech converted to text, and perform much of the comprehension for you. You can see my Distilled post for more details, but essentially, it takes the phrase “My name is Robin and I want noodles today” and splits it up into components like:

  • Intent: food_request
  • Action: process_food
  • Name: Robin
  • Food: noodles
  • Time: today

This setup means you have some hope of responding to the hundreds of thousands of ways your users could find to say the same thing. It’s your choice whether API.AI receives a message and responds to the user right away, or whether it receives a message from a user, categorizes it and sends it to your application, then waits for your application to respond before sending your application’s response back to the user who made the original request. In its simplest form, the platform has a bunch of one-click integrations and requires absolutely no code.

I’ve listed the possible levels of complexity below, but it’s worth bearing some hard limitations in mind which apply to most of these services. They cannot remember anything outside of a user session, which will automatically end after about 30 minutes, they have to do everything through what are called POST and GET requests (something you can ignore unless you’re using code), and if you do choose to have it ask your application for information before it responds to the user, you have to do everything and respond within five seconds.

What are the other things?

Slack: A text-based messaging platform designed for work (or for distracting people from work).

Google Sheets: We all know this, but just in case, it’s Excel online.

Asynchronous processing: Most of the time, one program can do one thing at a time. Even if it asks another program to do something, it normally just stops and waits for the response. Asynchronous processing is how we ask a question and continue without waiting for the answer, possibly retrieving that answer at a later time.

Database: Again, it’s likely you know this, but if not: it’s Excel that our code will use (different from the Google Sheet).

Heroku: A platform for running code online. (Important to note: I don’t work for Heroku and haven’t been paid by them. I couldn’t say that it’s the best platform, but it can be free and, as of now, it’s the one I’m most familiar with).

How easy is it?

This graph isn’t terribly scientific and it’s from the perspective of someone who’s learning much of this for the first time, so here’s an approximate breakdown:

LabelFunctionalityTime it took me
1You set up the conversation purely through API.AI or similar, no external code needed. For instance, answering set questions about contact details or opening timesHalf an hour to distributable prototype
2A program that receives information from API.AI and uses that information to update the correct cells in a Google Sheet (but can’t remember user names and can’t use the slower Google Sheets integrations)A few weeks to distributable prototype
3A program that remembers user names once they’ve been set and writes them to Google Sheets. Is limited to five seconds processing time by API.AI, so can’t use the slower Google Sheets integrations and may not work reliably when the app has to boot up from sleep because that takes a few seconds of your allocation*A few weeks on top of the last prototype
4A program that remembers user details and manages the connection between API.AI and our chosen platform (in this case, Slack) so it can break out of the five-second processing window.A few weeks more on top of the last prototype (not including the time needed to rewrite existing structures to work with this)

*On the Heroku free plan, when your app hasn’t been used for 30 minutes it goes to sleep. This means that the first time it’s activated it takes a little while to start your process, which can be a problem if you have a short window in which to act. You could get around this by (mis)using a free “uptime monitoring service” which sends a request every so often to keep your app awake. If you choose this method, in order to avoid using all of the Heroku free hours allocation by the end of the month, you’ll need to register your card (no charge, it just gets you extra hours) and only run this application on the account. Alternatively, there are any number of companies happy to take your money to keep your app alive.

For the rest of this post, I’m going to break down each of those key steps and either give an overview of how you could achieve it, or point you in the direction of where you can find that. The code I’m giving you is Python, but as long as you can receive and respond to GET and POST requests, you can do it in pretty much whatever format you wish.


1. Design your conversation

Conversational flow is an art form in itself. Jonathan Seal, strategy director at Mando and member of British Interactive Media Association’s AI thinktank, has given some great talks on the topic. Paul Pangaro has also spoken about conversation as more than interface in multiple mediums.

Your first step is to create a flow chart of the conversation. Write out your ideal conversation, then write out the most likely ways a person might go off track and how you’d deal with them. Then go online, find existing chat bots and do everything you can to break them. Write out the most difficult, obtuse, and nonsensical responses you can. Interact with them like you’re six glasses of wine in and trying to order a lemon engraving kit, interact with them as though you’ve found charges on your card for a lemon engraver you definitely didn’t buy and you are livid, interact with them like you’re a bored teenager. At every point, write down what you tried to do to break them and what the response was, then apply that to your flow. Then get someone else to try to break your flow. Give them no information whatsoever apart from the responses you’ve written down (not even what the bot is designed for), refuse to answer any input you don’t have written down, and see how it goes. David Low, principal evangelist for Amazon Alexa, often describes the value of printing out a script and testing the back-and-forth for a conversation. As well as helping to avoid gaps, it’ll also show you where you’re dumping a huge amount of information on the user.

While “best practices” are still developing for chat bots, a common theme is that it’s not a good idea to pretend your bot is a person. Be upfront that it’s a bot — users will find out anyway. Likewise, it’s incredibly frustrating to open a chat and have no idea what to say. On text platforms, start with a welcome message making it clear you’re a bot and giving examples of things you can do. On platforms like Google Home and Amazon Alexa users will expect a program, but the “things I can do” bit is still important enough that your bot won’t be approved without this opening phase.

I’ve included a sample conversational flow for Vietnambot at the end of this post as one way to approach it, although if you have ideas for alternative conversational structures I’d be interested in reading them in the comments.

A final piece of advice on conversations: The trick here is to find organic ways of controlling the possible inputs and preparing for unexpected inputs. That being said, the Alexa evangelist team provide an example of terrible user experience in which a bank’s app said: “If you want to continue, say nine.” Quite often questions, rather than instructions, are the key.

2. Create a conversation in API.AI

API.AI has quite a lot of documentation explaining how to create programs here, so I won’t go over individual steps.

Key things to understand:

You create agents; each is basically a different program. Agents recognize intents, which are simply ways of triggering a specific response. If someone says the right things at the right time, they meet criteria you have set, fall into an intent, and get a pre-set response.

The right things to say are included in the “User says” section (screenshot below). You set either exact phrases or lists of options as the necessary input. For instance, a user could write “Of course, I’m [any name]” or “Of course, I’m [any temperature].” You could set up one intent for name-is which matches “Of course, I’m [given-name]” and another intent for temperature which matches “Of course, I’m [temperature],” and depending on whether your user writes a name or temperature in that final block you could activate either the “name-is” or “temperature-is” intent.

The “right time” is defined by contextsContexts help define whether an intent will be activated, but are also created by certain intents. I’ve included a screenshot below of an example interaction. In this example, the user says that they would like to go to on holiday. This activates a holiday intent and sets the holiday context you can see in input contexts below. After that, our service will have automatically responded with the question “where would you like to go?” When our user says “The” and then any location, it activates our holiday location intent because it matches both the context, and what the user says. If, on the other hand, the user had initially said “I want to go to the theater,” that might have activated the theater intent which would set a theater context — so when we ask “what area of theaters are you interested in?” and the user says “The [location]” or even just “[location],” we will take them down a completely different path of suggesting theaters rather than hotels in Rome.

The way you can create conversations without ever using external code is by using these contexts. A user might say “What times are you open?”; you could set an open-time-inquiry context. In your response, you could give the times and ask if they want the phone number to contact you. You would then make a yes/no intent which matches the context you have set, so if your user says “Yes” you respond with the number. This could be set up within an hour but gets exponentially more complex when you need to respond to specific parts of the message. For instance, if you have different shop locations and want to give the right phone number without having to write out every possible location they could say in API.AI, you’ll need to integrate with external code (see section three).

Now, there will be times when your users don’t say what you’re expecting. Excluding contexts, there are three very important ways to deal with that:

  1. Almost like keyword research — plan out as many possible variations of saying the same thing as possible, and put them all into the intent
  2. Test, test, test, test, test, test, test, test, test, test, test, test, test, test, test (when launched, every chat bot will have problems. Keep testing, keep updating, keep improving.)
  3. Fallback contexts

Fallback contexts don’t have a user says section, but can be boxed in by contexts. They match anything that has the right context but doesn’t match any of your user says. It could be tempting to use fallback intents as a catch-all. Reasoning along the lines of “This is the only thing they’ll say, so we’ll just treat it the same” is understandable, but it opens up a massive hole in the process. Fallback intents are designed to be a conversational safety net. They operate exactly the same as in a normal conversation. If a person asked what you want in your tea and you responded “I don’t want tea” and that person made a cup of tea, wrote the words “I don’t want tea” on a piece of paper, and put it in, that is not a person you’d want to interact with again. If we are using fallback intents to do anything, we need to preface it with a check. If we had to resort to it in the example above, saying “I think you asked me to add I don’t want tea to your tea. Is that right?” is clunky and robotic, but it’s a big step forward, and you can travel the rest of the way by perfecting other parts of your conversation.

3. Integrating with external code

I used Heroku to build my app . Using this excellent weather webhook example you can actually deploy a bot to Heroku within minutes. I found this example particularly useful as something I could pick apart to make my own call and response program. The weather webhook takes the information and calls a yahoo app, but ignoring that specific functionality you essentially need the following if you’re working in Python:

#start
    req = request.get_json
    print("Request:")
    print(json.dumps(req, indent=4))
#process to do your thing and decide what response should be

    res = processRequest(req)
# Response we should receive from processRequest (you’ll need to write some code called processRequest and make it return the below, the weather webhook example above is a good one).
{
        "speech": “speech we want to send back”,
        "displayText": “display text we want to send back, usually matches speech”,
        "source": "your app name"
    }

# Making our response readable by API.AI and sending it back to the servic

 response = make_response(res)
    response.headers['Content-Type'] = 'application/json'
    return response
# End

As long as you can receive and respond to requests like that (or in the equivalent for languages other than Python), your app and API.AI should both understand each other perfectly — what you do in the interim to change the world or make your response is entirely up to you. The main code I have included is a little different from this because it’s also designed to be the step in-between Slack and API.AI. However, I have heavily commented sections like like process_food and the database interaction processes, with both explanation and reading sources. Those comments should help you make it your own. If you want to repurpose my program to work within that five-second window, I would forget about the file called app.py and aim to copy whole processes from tasks.py, paste them into a program based on the weatherhook example above, and go from there.

Initially I’d recommend trying GSpread to make some changes to a test spreadsheet. That way you’ll get visible feedback on how well your application is running (you’ll need to go through the authorization steps as they are explained here).

4. Using a database

Databases are pretty easy to set up in Heroku. I chose the Postgres add-on (you just need to authenticate your account with a card; it won’t charge you anything and then you just click to install). In the import section of my code I’ve included links to useful resources which helped me figure out how to get the database up and running — for example, this blog post.

I used the Python library Psycopg2 to interact with the database. To steal some examples of using it in code, have a look at the section entitled “synchronous functions” in either the app.py or tasks.py filesOpen_db_connection and close_db_connection do exactly what they say on the tin (open and close the connection with the database). You tell check_database to check a specific column for a specific user and it gives you the value, while update_columns adds a value to specified columns for a certain user record. Where things haven’t worked straightaway, I’ve included links to the pages where I found my solution. One thing to bear in mind is that I’ve used a way of including columns as a variable, which Psycopg2 recommends quite strongly against. I’ve gotten away with it so far because I’m always writing out the specific column names elsewhere — I’m just using that method as a short cut.

5. Processing outside of API.AI’s five-second window

It needs to be said that this step complicates things by no small amount. It also makes it harder to integrate with different applications. Rather than flicking a switch to roll out through API.AI, you have to write the code that interprets authentication and user-specific messages for each platform you’re integrating with. What’s more, spoken-only platforms like Google Home and Amazon Alexa don’t allow for this kind of circumvention of the rules — you have to sit within that 5–8 second window, so this method removes those options. The only reasons you should need to take the integration away from API.AI are:

  • You want to use it to work with a platform that it doesn’t have an integration with. It currently has 14 integrations including Facebook Messenger, Twitter, Slack, and Google Home. It also allows exporting your conversations in an Amazon Alexa-understandable format (Amazon has their own similar interface and a bunch of instructions on how to build a skill — here is an example.
  • You are processing masses of information. I’m talking really large amounts. Some flight comparison sites have had problems fitting within the timeout limit of these platforms, but if you aren’t trying to process every detail for every flight for the next 12 months and it’s taking more than five seconds, it’s probably going to be easier to make your code more efficient than work outside the window. Even if you are, those same flight comparison sites solved the problem by creating a process that regularly checks their full data set and creates a smaller pool of information that’s more quickly accessible.
  • You need to send multiple follow-up messages to your user. When using the API.AI integration it’s pretty much call-and-response; you don’t always get access to things like authorization tokens, which are what some messaging platforms require before you can automatically send messages to one of their users.
  • You’re working with another program that can be quite slow, or there are technical limitations to your setup. This one applies to Vietnambot, I used the GSpread library in my application, which is fantastic but can be slow to pull out bigger chunks of data. What’s more, Heroku can take a little while to start up if you’re not paying.

I could have paid or cut out some of the functionality to avoid needing to manage this part of the process, but that would have failed to meet number 4 in our original conditions: It had to be possible to adapt the skeleton of the process for much more complex business cases. If you decide you’d rather use my program within that five-second window, skip back to section 2 of this post. Otherwise, keep reading.

When we break out of the five-second API.AI window, we have to do a couple of things. First thing is to flip the process on its head.

What we were doing before:

User sends message -> API.AI -> our process -> API.AI -> user

What we need to do now:

User sends message -> our process -> API.AI -> our process -> user

Instead of API.AI waiting while we do our processing, we do some processing, wait for API.AI to categorize the message from us, do a bit more processing, then message the user.

The way this applies to Vietnambot is:

  1. User says “I want [food]”
  2. Slack sends a message to my app on Heroku
  3. My app sends a “swift and confident” 200 response to Slack to prevent it from resending the message. To send the response, my process has to shut down, so before it does that, it activates a secondary process using “tasks.”
  4. The secondary process takes the query text and sends it to API.AI, then gets back the response.
  5. The secondary process checks our database for a user name. If we don’t have one saved, it sends another request to API.AI, putting it in the “we don’t have a name” context, and sends a message to our user asking for their name. That way, when our user responds with their name, API.AI is already primed to interpret it correctly because we’ve set the right context (see section 1 of this post). API.AI tells us that the latest message is a user name and we save it. When we have both the user name and food (whether we’ve just got it from the database or just saved it to the database), Vietnambot adds the order to our sheet, calculates whether we’ve reached the order minimum for that day, and sends a final success message.

6. Integrating with Slack

This won’t be the same as integrating with other messaging services, but it could give some insight into what might be required elsewhere. Slack has two authorization processes; we’ll call one “challenge” and the other “authentication.”

Slack includes instructions for an app lifecycle here, but API.AI actually has excellent instructions for how to set up your app; as a first step, create a simple back-and-forth conversation in API.AI (not your full product), go to integrations, switch on Slack, and run through the steps to set it up. Once that is up and working, you’ll need to change the OAuth URL and the Events URL to be the URL for your app.

Thanks to github user karishay, my app code includes a process for responding to the challenge process (which will tell Slack you’re set up to receive events) and for running through the authentication process, using our established database to save important user tokens. There’s also the option to save them to a Google Sheet if you haven’t got the database established yet. However, be wary of this as anything other than a first step — user tokens give an app a lot of power and have to be guarded carefully.

7. Asynchronous processing

We are running our app using Flask, which is basically a whole bunch of code we can call upon to deal with things like receiving requests for information over the internet. In order to create a secondary worker process I’ve used Redis and Celery. Redis is our “message broker”; it makes makes a list of everything we want our secondary process to do. Celery runs through that list and makes our worker process do those tasks in sequence. Redis is a note left on the fridge telling you to do your washing and take out the bins, while Celery is the housemate that bangs on your bedroom door, note in hand, and makes you do each thing. I’m sure our worker process doesn’t like Celery very much, but it’s really useful for us.

You can find instructions for adding Redis to your app in Heroku here and you can find advice on setting up Celery in Heroku here. Miguel Grinberg’s Using Celery with Flask blog post is also an excellent resource, but using the exact setup he gives results in a clash with our database, so it’s easier to stick with the Heroku version.

Up until this point, we’ve been calling functions in our main app — anything of the form function_name(argument_1, argument_2, argument_3). Now, by putting “tasks.” in front of our function, we’re saying “don’t do this now — hand it to the secondary process.” That’s because we’ve done a few things:

  • We’ve created tasks.py which is the secondary process. Basically it’s just one big, long function that our main code tells to run.
  • In tasks.py we’ve included Celery in our imports and set our app as celery.Celery(), meaning that when we use “app” later we’re essentially saying “this is part of our Celery jobs list” or rather “tasks.py will only do anything when its flatmate Celery comes banging on the door”
  • For every time our main process asks for an asynchronous function by writing tasks.any_function_name(), we have created that function in our secondary program just as we would if it were in the same file. However in our secondary program we’ve prefaced with “@app.task”, another way of saying “Do wash_the_dishes when Celery comes banging the door yelling wash_the_dishes(dishes, water, heat, resentment)”.
  • In our “procfile” (included as a file in my code) we have listed our worker process as –app=tasks.app

All this adds up to the following process:

  1. Main program runs until it hits an asynchronous function
  2. Main program fires off a message to Redis which has a list of work to be done. The main process doesn’t wait, it just runs through everything after it and in our case even shuts down
  3. The Celery part of our worker program goes to Redis and checks for the latest update, it checks what function has been called (because our worker functions are named the same as when our main process called them), it gives our worker all the information to start doing that thing and tells it to get going
  4. Our worker process starts the action it has been told to do, then shuts down.

As with the other topics mentioned here, I’ve included all of this in the code I’ve supplied, along with many of the sources used to gather the information — so feel free to use the processes I have. Also feel free to improve on them; as I said, the value of this investigation was that I am not a coder. Any suggestions for tweaks or improvements to the code are very much welcome.


Conclusion

As I mentioned in the introduction to this post, there’s huge opportunity for individuals and organizations to gain ground by creating conversational interactions for the general public. For the vast majority of cases you could be up and running in a few hours to a few days, depending on how complex you want your interactions to be and how comfortable you are with coding languages. There are some stumbling blocks out there, but hopefully this post and my obsessively annotated code can act as templates and signposts to help get you on your way.

Grab my code at GitHub


Bonus #1: The conversational flow for my chat bot

This is by no means necessarily the best or only way to approach this interaction. This is designed to be as streamlined an interaction as possible, but we’re also working within the restrictions of the platform and the time investment necessary to produce this. Common wisdom is to create the flow of your conversation and then keep testing to perfect, so consider this example layout a step in that process. I’d also recommend putting one of these flow charts together before starting — otherwise you could find yourself having to redo a bunch of work to accommodate a better back-and-forth.

Bonus #2: General things I learned putting this together

As I mentioned above, this has been a project of going from complete ignorance of coding to slightly less ignorance. I am not a professional coder, but I found the following things I picked up to be hugely useful while I was starting out.

  1. Comment everything. You’ll probably see my code is bordering on excessive commenting (anything after a # is a comment). While normally I’m sure someone wouldn’t want to include a bunch of Stack Overflow links in their code, I found notes about what things portions of code were trying to do, and where I got the reasoning from, hugely helpful as I tried to wrap my head around it all.
  2. Print everything. In Python, everything within “print()” will be printed out in the app logs (see the commands tip for reading them in Heroku). While printing each action can mean you fill up a logging window terribly quickly (I started using the Heroku add-on LogDNA towards the end and it’s a huge step up in terms of ease of reading and length of history), often the times my app was falling over was because one specific function wasn’t getting what it needed, or because of another stupid typo. Having a semi-constant stream of actions and outputs logged meant I could find the fault much more quickly. My next step would probably be to introduce a way of easily switching on and off the less necessary print functions.
  3. The following commandsHeroku’s how-to documentation for creating an app and adding code is pretty great, but I found myself using these all the time so thought I’d share (all of the below are written in the command line; type cmd in on Windows or by running Terminal on a Mac):
    1. CD “””[file location]””” – select the file your code is in
    2. “git init” – create a git file to add to
    3. “git add .” – add all of the code in your file into the file that git will put online
    4. “git commit -m “[description of what you’re doing]” “ – save the data in your git file
    5. “heroku git:remote -a [the name of your app]” – select your app as where to put the code
    6. “git push heroku master” – send your code to the app you selected
    7. “heroku ps” – find out whether your app is running or crashed
    8. “heroku logs” – apologize to your other half for going totally unresponsive for the last ten minutes and start the process of working through your printouts to see what has gone wrong
  4. POST requests will always wait for a response. Seems really basic — initially I thought that by just sending a POST request and not telling my application to wait for a response I’d be able to basically hot-potato work around and not worry about having to finish what I was doing. That’s not how it works in general, and it’s more of a symbol of my naivete in programming than anything else.
  5. If something is really difficult, it’s very likely you’re doing it wrong. While I made sure to do pretty much all of the actual work myself (to avoid simply farming it out to the very talented individuals at Distilled), I was lucky enough to get some really valuable advice. The piece of advice above was from Dominic Woodman, and I should have listened to it more. The times when I made least progress were when I was trying to use things the way they shouldn’t be used. Even when I broke through those walls, I later found that someone didn’t want me to use it that way because it would completely fail at a later point. Tactical retreat is an option. (At this point, I should mention he wasn’t the only one to give invaluable advice; Austin, Tom, and Duncan of the Distilled R&D team were a huge help.)