Why Do Metrics Always Lie?

We’ve all come across the phrase “Lies, Damned Lies, & Statistics”, popularised by Mark Twain in the nineteenth century. And we’re used to politicians using metrics and statistics to prove any point they want to. See my previous example of COVID test numbers – or “number theatre”, as Professor Sir David Spiegelhalter calls it. His critique of the metrics used in the UK government’s COVID briefings, delivered to the UK Parliament, makes for sobering reading. We’re right to be sceptical of the metrics we see. But we should avoid moving from scepticism to cynicism. Unfortunately, because we see so many examples of metrics being misused, we can end up mistrusting all of them and not believing anything.

Metrics can tell us real truths about the world. Over 150 years ago, Florence Nightingale used metrics to demonstrate that more British soldiers were dying in the Crimean War from disease than from fighting. Her use of data eventually saved thousands of lives. Richard Doll and Austin Bradford Hill did likewise, demonstrating in 1954 the link between smoking and lung cancer. After all, science relies on data and metrics to prove or disprove theories and to progress.

So we should be sceptical when we see metrics being used – we should especially ask who is presenting them and how impartial they might be. We should use our critical thinking skills and not simply accept them at face value. What question is the metric trying to answer? Spiegelhalter and others argue for five principles for trustworthy evidence communication:

    • Inform, not persuade
    • Offer balance but not false balance
    • Disclose uncertainties
    • State evidence quality
    • Pre-empt misinformation

If everyone using metrics followed these principles, then maybe we would no longer be talking about how metrics lie – but rather about the truths they can reveal.

 

Text: © 2021 Dorricott MPI Ltd. All rights reserved.

Image by D Miller from Pixabay

Training for KPIs – Why Bother?

I was facilitating a brainstorm session recently as part of a discussion on the challenges of using Key Performance Indicators (KPIs). People spend a lot of time deciding on their KPIs and often horse-trading over targets. But we were discussing how people actually use the KPIs. After all, KPIs are not an end in themselves. They are there to serve a purpose: to shed light on processes and performance, and to help move people from the subjective to the objective.

The brainstorm raised lots of good ideas such as:

    • The importance of getting senior level buy-in
    • Regular review of the KPIs
    • Rules on actions to take if there are more than a certain number of “red” KPIs
    • The importance of making sure the definitions are clear

But no-one raised the question of training. I found this intriguing. Do people think that once you have a set of KPIs being reported, everyone somehow automatically knows how to use them? I’m not at all convinced that everyone does. Sometimes, teams spend a whole meeting debating the possible reasons why this month’s KPI value is slightly lower than last month’s. They don’t like the idea that perhaps it’s just noise and is unlikely to be worth the time investigating (until we see an actual trend). I’ve been in meetings where most of the KPIs are red and managers throw up their hands because they don’t know what to do. They just hope next month is better. Followers of this blog know that the Pareto Principle would help here. Or maybe the manager gets frustrated and tells the staff the numbers have to get better… which you can always do by playing with definitions rather than actually improving the underlying processes.
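
One way to tell noise from signal is the process behaviour (XmR) chart that Don Wheeler champions. Here is a minimal sketch in Python, using invented monthly KPI values purely for illustration – a point is only worth a deep dive if it falls outside the natural process limits (or forms a sustained run):

    # Sketch of XmR (individuals and moving range) natural process limits.
    # The values below are hypothetical monthly KPI figures, e.g. the
    # percentage of queries resolved on time.
    values = [87, 91, 89, 85, 90, 88, 92, 86, 89, 90, 84, 88]

    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)

    # 2.66 is the standard XmR scaling constant for individual values
    upper_limit = mean + 2.66 * avg_mr
    lower_limit = mean - 2.66 * avg_mr

    for month, value in enumerate(values, start=1):
        outside = value > upper_limit or value < lower_limit
        print(f"Month {month:2d}: {value}  ->", "investigate" if outside else "routine variation")

    print(f"Mean {mean:.1f}, natural process limits {lower_limit:.1f} to {upper_limit:.1f}")

With numbers like these, every month’s wobble sits comfortably inside the limits – exactly the sort of variation that probably isn’t worth a meeting.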

There are opportunities to learn more about interpreting data – such as books by Tim Harford, Don Wheeler and Davis Balestracci, or workshops such as those at the upcoming MCC vSummit, or even from DMPI – but I wonder whether it’s a case of people not knowing what they don’t know. Interpreting KPI data isn’t easy. It needs critical thinking and careful consideration. We should not accept the adage “Lies, damned lies, and statistics!”

If people are not trained to use and interpret KPIs, should you bother with collecting and reporting KPIs at all?

 

Text: © 2021 Dorricott MPI Ltd. All rights reserved.

Picture: Tumisu, pixabay

When is a test not a test?

First, I hope you are keeping safe in these disorienting times. This is certainly a time none of us will forget.

There have been lots of really interesting examples during this pandemic of the challenge of measurement. We know that science is key to us getting through this with the minimum impact and measurement is fundamental to science. I described a measurement challenge in my last post. Here’s another one that caught my eye. Deceptively simple and yet…

On 2-Apr-2020, the UK Government announced a target of 100,000 COVID-19 tests a day by the end of April. On 30-Apr-2020, they reported 122,347 tests. So they met the target, right? Well, maybe. To quote the great Donald J. Wheeler’s First Principle for Understanding Data: “No data have meaning apart from their context”. So, let’s be sceptical for a moment and see if we can understand what these 122,347 counts actually are. Would it be reasonable to include the following in the total?

    • Tests that didn’t take place – but where there was the capacity to run those tests
    • Tests where a sample was taken but has not yet been reported on as positive or negative
    • The number of swabs taken within a test – so a test requiring two swabs which are both analysed counts as two tests
    • Multiple tests on the same patient
    • Test kits that have been sent out by post on that day but have not yet been returned (and may never be returned)

You might think that including some of these is against the spirit of the target of 100,000 COVID-19 tests a day. Of course, it depends on what question the measurement is trying to answer. Is it the number of people who have received test results? Or is it the number of tests supplied (whether results are in or not)? In fact, you could probably list many different questions – each of which would give a different number. Reporting from the Government doesn’t go into this detail, so we’re not sure what they include in their count. And we’re not really sure what question they are asking.

And these differences aren’t just academic. The 122,347 tests include 40,369 test kits that were sent out by post on 30-Apr-2020 but had not yet been returned. And only 73,191 individual patients were tested – i.e. a significant number of tests were repeat tests on the same patients.
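
As a back-of-the-envelope sketch (in Python, using only the figures quoted above), each question you might ask gives a different headline number:

    # Figures reported for 30-Apr-2020, as quoted above
    tests_reported = 122_347            # the headline count
    kits_posted_not_returned = 40_369   # kits mailed out that day, not yet back
    patients_tested = 73_191            # individual people tested

    # "How many tests, excluding kits still in the post?"
    print(tests_reported - kits_posted_not_returned)   # 81,978 - below the 100,000 target

    # "How many people were tested?"
    print(patients_tested)                              # 73,191 - further below still

Same day, same underlying activity – three very different answers depending on the question.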

So, we should perhaps not take this at face value, and we need to ask a more fundamental question – what is the goal we are trying to achieve? Then we can develop measurements that focus on telling us whether the goal has been achieved. If the goal is to have tests performed for everyone who needs them, then a simple count of the number of tests is not much use on its own.

And is it wise to set an arbitrary target for a measurement that seems of such limited value? To quote Nicola Stonehouse, professor in molecular virology at the University of Leeds: “In terms of 100,000 as a target, I don’t know where that really came from and whether that was a plucked out of thin air target or whether that was based on any logic.” On 6-May-2020, the UK Government announced a target of 200,000 tests a day by the end of May.

Stay safe.

 

Text: © 2020 Dorricott MPI Ltd. All rights reserved.

Picture – The National Guard

Would You Give Me 10 out of 10?

After a recent intercontinental flight, my luggage didn’t turn up on the carousel. Not a great feeling! I was eventually reunited with my bag – more about that in a future post. The airline sent me a survey about the flight and offered a small incentive to complete it. I felt I had something to say and so clicked the button to answer the ‘short’ survey. It went on for page after page asking about the booking process, how I obtained my boarding card, whether my check-in experience was acceptable, on-board entertainment, meals etc. etc. After the first few pages I gave up. And I’m sure I’m not the only one to give up part way through. Why do companies go over the top when asking for feedback? What do they do with all the data?

I’ve come across a number of examples where data from surveys is not really used. At one company, whenever someone resigned, they were asked to complete an exit survey online. I asked HR if I could see the results from the survey, as we were concerned about staff retention and I wondered if it might be a useful source of information. They said they had no summary because no-one had ever analysed the data. No-one ever analyses the data? Asking people to complete a survey and then ignoring their responses is disrespectful of their time – and misleads them. What on earth were they running the survey for? This is an extreme version of a real danger with surveys – doing them without knowing how you plan to use the data. If you don’t know before you run the survey, don’t run it!

Of course, there are also cases where you know the survey data itself is misleading. I heard a story of someone who worked as a bank teller and was asked to make sure every customer completed a paper survey. They had to get at least 10 completed every day. These were then all forwarded to head office to be entered into a system and analysed. The problem was that the customers did not want to complete the surveys – they were all too busy. So what did the bank tellers do? They got their friends and family to complete them so that they met their 10 per day target. I wonder how many hours were spent analysing the data from those surveys, reporting on them, making decisions and implementing changes. When running a survey, be mindful of how you gather the data – using the wrong incentives might lead to very misleading results.

Another way that incentives can skew your data is by tying financial incentives to the results. At Uber (in 2015 at least), you need an average driver score of 4.6 out of 5 to continue as a driver. So if a passenger gives you 4 out of 5 (which they might think of as a reasonable score), you need another two passengers to give you 5 out of 5 to make up for it. And if a passenger gives you a 3, you need another four passengers to give you a 5 to get your average back to 4.6. What behaviour does that drive? Some good, for sure – trying to improve the passenger experience. But could there also be drivers who make it clear to the passenger how their livelihood depends on getting a top mark of 5, as is apparently common in car dealerships? This data set is surely skewed.
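
To see how punishing that arithmetic is, here is a small sketch (in Python) that works out how many perfect scores are needed to pull an average back to the 4.6 cut-off after a single low rating – it assumes the driver starts from that one rating rather than from a long history:

    # How many 5-star ratings are needed after one low rating to reach
    # a running average of at least 4.6?
    def fives_needed(low_rating, threshold=4.6):
        fives = 0
        while (low_rating + 5 * fives) / (1 + fives) < threshold:
            fives += 1
        return fives

    for rating in (4, 3, 2, 1):
        print(f"One rating of {rating}: {fives_needed(rating)} further 5s needed")

A single 1-star review needs nine 5-star rides just to claw back to the threshold.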

It’s easy to come up with questions and set up a survey. But it’s much more difficult to do it well. Here’s a great article on the “10 big mistakes people make when running customer surveys”, along with suggestions on how to analyse your survey data using Excel.

Talking of surveys, please make sure you ‘like’ this post!

 

Text: © 2017 Dorricott MPI Ltd. All rights reserved.

Lies, Damned Lies and Statistics

Change of plan for this post after receiving a mailshot from a local estate agent…

Statistics are at the heart of the scientific method. They help us to prove or disprove hypotheses (to a certain level of confidence) and so make the discussion about facts rather than opinion. They have huge power – both to reveal the truth and to hide it when used wrongly.

When I was 16, I received, as a present, the book “How to Lie with Statistics” by Darrell Huff. OK, so perhaps I was rather an odd teenager, as I thought this book was fantastic. I am pleased to see it is still available from good bookstores. It has stood me in good stead for many years: I always go straight to the graphs in any article I read, and I always wonder what the author is trying to show (or hide). So when a mailshot full of pretty graphs from a local estate agent came through the door recently, I was impressed to see that they were able to demonstrate many of Huff’s observations in one glossy sheet of paper.

The first graph in the mailshot is what Huff calls a “gee whiz graph”. It’s the one below. They state that they have done some “spatial interpolation of property price data, also known as number crunching!” They go on to explain helpfully that “for every 0.25km you live closer to the station, the average property price rose by £2700.” Do you believe them?

Huff describes this use of statistics as “statisticulation”, which I rather like. Of course, what they have done is “suppress zero” on the y-axis without any warning – cutting off 89% of the bar on the left and 98% of the bar on the right. The bar on the left is nine times the height of the bar on the right, even though the numerical difference is just 10%. But, of course, the graph raises many more questions. What sort of average is shown? (See Huff’s “Well chosen average”.) Is the difference statistically significant? (See Huff’s “Much ado about nothing”.) How many properties are included in the figures? Is the mix of properties the same within both radii? And what if I tell you that within 5km of the particular train station they are talking about, there are actually another 9 train stations – including at least one with many more commuter trains stopping regularly? And there is a well-regarded school nearby that many parents are known to want to live near, to increase the chance of their children attending. Could that be a factor?
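
Out of curiosity, here is a rough reconstruction (in Python) of how suppressing the zero manufactures the “gee whiz” effect. The prices are invented purely to match the percentages above – the mailshot’s actual figures are not reproduced here:

    # Hypothetical average prices, chosen only to illustrate the effect
    price_near_station = 445_500   # left-hand bar
    price_further_out = 405_000    # right-hand bar (about 10% lower)
    axis_starts_at = 400_000       # the "suppressed zero"

    true_ratio = price_near_station / price_further_out
    visible_left = price_near_station - axis_starts_at
    visible_right = price_further_out - axis_starts_at
    apparent_ratio = visible_left / visible_right

    print(f"Real difference:      {true_ratio:.2f}x")      # about 1.10x
    print(f"As drawn on the page: {apparent_ratio:.1f}x")  # about 9x

Same data; the only thing that changed is where the axis starts.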

Of course, even if they were able to prove a correlation between distance from the station and property price (which they certainly haven’t with the data above), we know that “correlation does not imply causation”, and Huff describes it in his chapter “Post hoc rides again”. It reminds me of the annual story of how living near Waitrose (a top-end UK supermarket) can increase the value of your home. Could it be that wealthier people tend to shop at high-end supermarkets, and so high-end supermarkets locate where wealthier people live (in more expensive properties)?

Another of the graphs is shown below, along with the text: “People are at lots of different stages of their lives. The largest number of people are Retired which accounts for 20.5% of the total. This is 0.4% lower than the national average.” Is this what you take as the most interesting feature of the graph?

When I look at the graph (I am assuming the data is accurate – they claim it comes from the Office for National Statistics, so I think that’s OK), the tiny difference in the red and grey bars for Retired is not what strikes me. I would say it looks as though this area has more families and Empty Nesters than the average. But, of course, I don’t really know, because I don’t know whether the differences are statistically significant (see Huff’s “Much ado about nothing”). We can be reasonably confident that the larger differences are likely to be significant because the sample is large. But could we really say that there are 0.4% fewer Retired households than the national average? I think it likely this is within the range of error and that we can’t really say whether there is any difference – but I don’t know, of course, because there are no numbers shown for the samples. We only have percentages.

It also starts me wondering how the data is collected. What about a house with grown-up children where one of those grown-up children has had a child (i.e. three generations in the house) – which category does that fall into? And a couple without children – are they a Young Family? What if they are older but not retired? Or a split family where one parent looks after the children one week and the other the next? And how does the Office for National Statistics know what type of family is living in each property? After all, people are moving all the time – not just buying and selling, but also moving in with others or moving out.
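
As a rough check on whether a 0.4 percentage point gap could mean anything, here is a sketch (in Python) of the margin of error on a proportion. The sample size is pure guesswork, since the mailshot gives none:

    import math

    p = 0.205    # proportion "Retired" locally, as quoted
    n = 2_000    # assumed number of households sampled - a guess

    standard_error = math.sqrt(p * (1 - p) / n)
    margin_95 = 1.96 * standard_error

    print(f"95% margin of error: +/- {margin_95 * 100:.1f} percentage points")
    # roughly +/- 1.8 points with n = 2,000

On that assumption, a 0.4 point gap is comfortably within the noise; it would take a sample of tens of thousands of households before a difference that small started to look real.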

You get the point.

Statistics and data can tell us so much. They are the bedrock of the scientific method. But we must always be sceptical and question them. Who is telling us? How do they know? What’s missing? Does it make sense? Or, as Huff puts it, “talk back to a statistic”!

In my next post, I will go back to the DIGR® method of root cause analysis, looking in some more detail at the G of DIGR® – how using process maps can really help everyone involved to Go step by step and start to see where a process might fail.

 

Text © 2017 Dorricott MPI Ltd. All rights reserved.

DIGR® is a registered trademark of Dorricott MPI Ltd.