Data Use, Misuse and Abuse...

Posted by James Dunford Wood 19 Jul 13

  • "The average human will eat one pound of insects in their lifetime."
  • "If you fart constantly for 6 years, 9 months and 23 days you would produce enough gas to explode an atomic bomb."
  • "Britons on average shed 7.6 pounds each before they go on holiday."

Data - you gotta love it! It is all around us. In many cases it is used for fun, like the first two examples above - useless, perhaps, but harmless. We will overlook the process used to get the 'statistic', and the fact that it is almost certainly inaccurate and based on all sorts of assumptions.

The third statistic is much more likely to be accurate and, as I shall explain in a moment, it's also much more pernicious. As Benjamin Disraeli said (as quoted by Mark Twain), "There are three kinds of lies: lies, damned lies, and statistics."

Our love affair with data has been going on for a while now. I first started to notice it in soccer - suddenly the commentators were able to tell us how many yards each player had run in the course of the match, how many completed passes had been made by each player, and how many tackles, with percentage possession for each side. At Wimbledon, meanwhile, we are treated to the percentage of first serves won, or a graph plotting the number of unforced errors during the course of a five-set match.

[Image: Federer Roddick statistics]

This stuff is now embedded in everyday life, almost without us noticing. It's the processing speeds of modern-day computers that have unlocked it all for us.

Unfortunately others have jumped on the bandwagon - people who have things to sell us, who think that if the statistic is surprising enough the press will pick it up and we'll register their brand. Every week, for example, I receive emails from some PR company or other touting spurious survey results, with some totally meaningless data. For example, the latest tells me that 10 million Britons (24%, they add helpfully) go on a pre-holiday diet before they travel, and 10.6 million (25%) increase the amount of exercise they do. The 'startling' conclusion is that Britons on average shed 7.6 pounds each before they reach the beach - and because most of us can't visualise that, they helpfully clarify it: 'equivalent to just over half a stone each.'

This, of course, is a totally useless statistic, and the average means nothing. The fact that some people spend a lot of time losing weight before they go on holiday has no bearing on the rest of us who carry on as normal. This is what I would label data being abused for a commercial end.
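To see why an average like this says so little, here is a minimal sketch in Python. The population below is entirely hypothetical: it is invented so that 24% of people lose weight and the overall mean comes out at 7.6 pounds, and it assumes, purely for illustration, that the headline average was taken over everyone rather than over dieters only. The press release does not say which.

```python
from statistics import mean, median

# Hypothetical population of 100 holidaymakers (pounds lost before the trip).
# These figures are invented to reproduce the headline numbers; they are not survey data.
non_dieters = [0] * 76            # 76% carry on as normal
dieters = [30] * 20 + [40] * 4    # 24% lose a lot
population = non_dieters + dieters


def summarise(losses):
    """Return the mean, the median and the share of people who lost any weight."""
    losing = [x for x in losses if x > 0]
    return {
        "mean_all": mean(losses),
        "median_all": median(losses),
        "share_losing": len(losing) / len(losses),
        "mean_among_losers": mean(losing) if losing else 0.0,
    }


summary = summarise(population)
print(summary)
```

Under these assumptions the mean is 7.6 pounds, yet the median person loses nothing at all, and the people who do diet would each have to lose more than 30 pounds. Either reading of the headline number tells you very little about a typical Briton.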

So data, in and of itself, can be good, meaningless or, in some cases, bad. As often as it reveals a truth, it can be misleading. And it's not just in the way it is generated, it's also in the way we apply it. Because many of us intrinsically trust data, we get lazy and do the bare minimum to qualify or corroborate it. In some cases, it's not only misleading, it can be damaging, especially if the recipient of the data is basing business decisions on it.

Nowhere is this more true than in ecommerce. Brought up as we have been on Google Analytics, a supremely complex tool at the best of times, we restrict ourselves to tracking data at the most superficial level, and more often than not draw the wrong conclusions.

Here are three examples of where looking at data at a superficial level can lead to the wrong answer.

Average Order Value

Online retailers often assume that their best customers are those who spend the most and, ooh, there it is, a nice big stat tracking 'Average Order Values' by channel. So therefore the channel that attracts the biggest basket sizes must be the one to focus on, right? Wrong! Because there are other variables at play here, variables that GA is either unable to track or makes very hard to track. First, you need to see what is in the baskets. Perhaps these so-called big baskets are made up of loads of low cost items like socks, or items where your margin is lowest or, worst of all, items that are likely to be returned. How do you know that? Because somebody has bought four versions of the same dress - either in different sizes or in different colours. Chances are 75% (or 3, to non-statisticians) will be returned. (Incidentally, this is the norm in Germany, where most online purchases are settled post-delivery by invoice - it's perfectly normal for a German online shopper to buy 6 of one item, return 5 and get invoiced for one - they have renamed their sitting rooms their fitting rooms.)

It's also a far more useful indicator of a valuable customer to look at what they have spent on individual items - for example, someone who buys a 50 quid t-shirt will be a far better prospect than one who spends what I would spend - i.e. ten pounds max! But if that's all they have bought, then their £50 'order value' will rank very poorly against the guy who bought all those discount socks...
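The gap between headline order value and what an order is actually worth can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the channel names, the prices, the 50% margin and the `OrderLine` structure are hypothetical, and real return and margin data would have to come from your own order system rather than from GA.

```python
from dataclasses import dataclass


@dataclass
class OrderLine:
    unit_price: float   # price paid for one item, in pounds
    margin_rate: float  # hypothetical share of the price kept as gross margin
    returned: bool      # True if the item was sent back


def order_value(lines):
    """Headline basket size: everything ordered, returns included."""
    return sum(line.unit_price for line in lines)


def kept_margin(lines):
    """Margin on the items the customer actually kept."""
    return sum(line.unit_price * line.margin_rate for line in lines if not line.returned)


def average(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical orders: one channel brings a four-dress basket (three returned),
# the other brings a single 50 pound t-shirt that is kept.
orders_by_channel = {
    "channel_a": [[OrderLine(40.0, 0.5, r) for r in (False, True, True, True)]],
    "channel_b": [[OrderLine(50.0, 0.5, False)]],
}

report = {
    channel: {
        "avg_order_value": average(order_value(o) for o in orders),
        "avg_kept_margin": average(kept_margin(o) for o in orders),
    }
    for channel, orders in orders_by_channel.items()
}
print(report)
```

On these made-up numbers, channel_a wins comfortably on average order value (160 against 50) but loses on margin actually kept (20 against 25), which is the trap described above.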

Tracking Online Ad Campaigns

A more commercially damaging example is in tracking ROI on advertising, where for years advertisers have been working it out on the 'last click' rule - mainly because they had no other way of doing it. The data captured in this way attributes the sale to the source that drove the person MOST RECENTLY to make the purchase, but on further analysis it's obvious that most shoppers visit at least three or four sites before finally purchasing (it's 15 in travel), and they might have seen your products advertised in a number of places. They might also have visited your own site several times via a number of channels, not all of them paid for. And typically Google gets credited with the final sale because we are lazy, and type the brand WE ARE ALREADY LOOKING FOR into the Google search bar.

It's only recently that advertisers have started working out proper 'attribution models' that give proportionate weight to each 'touch point' in the customer journey. Prior to that, advertisers and agencies had to justify their spend with whatever data they could get, and the easiest to get their hands on was the last click business - the wrong data! Google - 95% of whose vast revenues come from their AdWords advertising platform - have made a fortune out of these inefficiencies.
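The difference between the two approaches can be sketched in Python. This is only an illustration: the journeys and channel names are hypothetical, and the equal-weight ('linear') split shown here is just one of the simplest possible attribution models, not a recommendation of how touch points should actually be weighted.

```python
from collections import defaultdict


def last_click(journeys):
    """Give all of each sale's revenue to the final touch point."""
    credit = defaultdict(float)
    for touches, revenue in journeys:
        credit[touches[-1]] += revenue
    return dict(credit)


def linear(journeys):
    """Split each sale's revenue equally across every touch point."""
    credit = defaultdict(float)
    for touches, revenue in journeys:
        share = revenue / len(touches)
        for channel in touches:
            credit[channel] += share
    return dict(credit)


# Hypothetical customer journeys: (ordered touch points, revenue in pounds).
# Both end with the shopper typing the brand name into a search engine.
journeys = [
    (["display", "email", "brand_search"], 90.0),
    (["affiliate", "brand_search"], 60.0),
]

last_click_credit = last_click(journeys)
linear_credit = linear(journeys)
print(last_click_credit)
print(linear_credit)
```

Under last click, brand search is credited with all 150 pounds and the other channels with nothing. Under the equal split it gets 60, with display, email and affiliate getting 30 each, so the channels that introduced the shopper in the first place no longer look worthless.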

The Frosties Conundrum


The last example comes from a well known supermarket. They had two types of cereal on their shelves - a jumbo pack of Frosties, and a smaller packet of Grape-Nuts. However, they were squeezed for space, so had to make a decision on which to remove. So they looked at the data. At first glance, the data looked clear - more people bought the Frosties than the Grape-Nuts by a factor of two, and the revenue uplift was even greater, so the obvious answer would be to remove the Grape-Nuts. Right? Wrong!

When they looked further - no, wait! If anyone can tell me why it was the wrong call, please add your comment below! Or if you know of any other blatant misuses of data, let us know!
