Pandas is a Python library that lets us do Excel-type-stuff. Well, that’s not really giving it the credit it deserves. Pandas is a Python library which makes Excel-type stuff waaaaaaaaaaaaaaaaaay easier.
You might have seen me speak about how Jupyter Notebooks can make our lives easier as marketers (if not – you’ve clearly been missing out on Distilled Searchlove and you should absolutely buy a ticket). A lot of the examples I use are to do with how using Pandas is much much easier than trying to do the same stuff in Excel.
One function I haven’t been able to talk about on-stage is melt. As I said in the title, melt is kind of like unpivot and it is one of the best functions in Pandas because it lets us easily do things that wouldn’t just be harder in Excel – they would be pretty much impossible for anyone who isn’t a pretty advanced Excel user.
When is melt useful?
Think of it like this. Imagine you have a table of people and kinds of expertise;
Person | Expertise 1 | Expertise 2 | Expertise 3 |
Sharon | Tech SEO | Archery | Plumbing |
Jane | Wildlife Taming | Plumbing | Astrophysics |
Bueller | Looking Around | Roller Blading | Dancing |
Imagine you need to find two contractors – someone to fix the plumbing in your house and someone to fix the plumbing in your site (tech SEO). So you want to get a list of all of the people who have tech SEO listed as a skill, and separately find a list of all the people who have plumbing as a skill.
If you were doing this in Excel, you would probably have to ask at least six questions;
- Who has Tech SEO as skill number 1? OR
- Who has Tech SEO as skill number 2? OR
- Who has Tech SEO as skill number 3?
- Who has Plumbing as skill number 1? OR
- Who has Plumbing as skill number 2? OR
- Who has Plumbing as skill number 3?
You could be thinking “ah I’d just copy and paste that around” or “I don’t mind asking six questions” but things get much more complicated when we apply this to a real-world situation.
Imagine we are an e-commerce site. We have crawled a competitor website and we’ve extracted all of the product titles they have listed on their category pages. Maybe they have started selling products with our name on them and we need to know which ones. You could have other reasons – I’ll claim ignorance.
Now we have a table in a similar format but it looks quite different. There are way more columns to deal with.
URL | Product 1 | Product 2 | …. | Product 999 |
page-1 | Red dress | Competitor’s dress | Yellow dress | |
page-50 | Full length jeans | Jean shorts | Competitor’s jeans | |
page-500 | Competitor’s jumper | Cardigan | — |
So now, assuming the brand name is always used in the same way we’re going to have to ask up to 999 questions for every category page on their site. Ain’t nobody got time for that.
Another example case is trying to fix redirect chains. If we use Screaming Frog to crawl our site, the redirect chains report will look something like the below. We have a column showing us where each redirect chain ends up, and all of the individual steps.
If we want to say to our devs – we need to make sure each of these redirects goes straight to the final address page, that’s going to be a massive pain for them – scrolling left to right to find each individual hop – and let’s be fair, it’ll probably be pushed to the bottom of their list.
Final Address | Redirect 1 | Redirect 2 | …. | Redirect 20 |
page-1 | page-a | page-b | page-c | |
page-2 | page-d | page-e | page-f | |
page-3 | page-g | page-h | page-i |
Our problem is we can’t say to Excel – “all of these columns are essentially the same and I want to treat them that way”. This is where Melt comes in.
What does melt do?
With melt we can stack columns on top of each other so we can look at them all at the same time.
With melt – we give the columns we want to stay in the same place, and the columns we want to turn into rows and Pandas handles that for us. Let’s use our first example to see how this works. So we start with a table like this, with the Expertise split across different columns.
Person | Expertise 1 | Expertise 2 | Expertise 3 |
Sharon | Tech SEO | Archery | Plumbing |
Jane | Wildlife Taming | Plumbing | Astrophysics |
Bueller | Looking Around | Roller Blading | Dancing |
We say to melt – the Person column should stay in place, and it should melt everything else. What we get back looks like this;
Person | Variable | Value |
Sharon | Expertise 1 | Tech SEO |
Sharon | Expertise 2 | Archery |
Sharon | Expertise 3 | Plumbing |
Jane | Expertise 1 | Wildlife Taming |
Jane | Expertise 2 | Plumbing |
etc. | etc. | etc. |
Now we only have to ask one question by looking in the values column. Same goes for our product scraping or redirect examples;
Final Address | Variable | Value |
page-1 | Redirect 1 | page-a |
page-1 | Redirect 2 | page-b |
page-1 | Redirect 20 | page-c |
etc. | etc. | etc. |
URL | Variable | Value |
page-1 | Product 1 | Red dress |
page-1 | Product 2 | Competitor’s dress |
page-1 | Product 999 | Yellow dress |
etc. | etc. | etc. |
If you’d like to try using melt, the Pandas documentation is a great source of information. If you’d like to get stuck in with a real-world example can could make a difference to your site today – I also use the redirect example in the free Google Colab code I include in my How to fix broken links post