Carolyn graciously lent me a copy of “How to Lie with Statistics” by Darrell Huff. Despite its 1954 publication date, this book is remarkably relevant today. Below, I explain why the book, despite its high quality, will never achieve its aim, and my suggestion for a substitute.
How to Lie with Statistics is a gentle introduction to deceit with numbers. It is brief, the writing is elegant and light-hearted, and every single one of the lies described in the book is still in widespread use sixty years later. The book includes an informal catalogue of common statistical errors, reserving special scorn for the Precision Bias.
It is a valiant effort to craft an accessible and persuasive introduction to the issues. The author seems to believe that with sufficient widespread education, we can banish misleading numbers. I disagree. The problem is hard, in that the tiny individual payoff will never justify the effort needed to detect and oppose numerical deception. We need an easy way of certifying and enforcing honest data presentation.
The core of Statistics is the comparison of expectations to results. All of the lies present accurate and precise numerical results (technical honesty) but mislead about the appropriate comparable expectation (de facto dishonesty). The situation is complicated by the fact that even professionals frequently have difficulty crafting the proper expectations. Malicious numerists always have plausible deniability.
To put it another way, statisticians have considerable flexibility in methods and presentation. Special interests abuse the flexibility for their own purposes.
There is an analogy to accounting. Accountants have considerable flexibility in methods and presentation of financial results. Accounting is about leveraging that flexibility to avoid taxes. In response to the inevitable plethora of abuses, accountants developed the Generally Accepted Accounting Principles (GAAP), a catalogue of rules to govern the business.
I propose the development of Generally Accepted Numerical Principles. We must formalize the Expectation side of the statistical Expectation-Results dichotomy so that we may call out a liar and impose consequences where necessary.
How might such a system work? I would leave the details to the expert statisticians, but one way would be to develop a formal catalogue of Expectations given specific Results. It might look something like the following (though this is not the formal proposal):
Use of “Average”
A number called an “average” in isolation entails the following assumptions:
- The number presented is an arithmetic mean.
- The sample underlying the average is an unbiased representative of the stated population.
- The population has a normal distribution in the variable.
- The median is within 0.1 standard deviations of the mean.
- p < 0.05.
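To show the flavour of what certification might mean in practice, here is a rough Python sketch (using NumPy and SciPy) of a mechanical check of the distributional assumptions above. The function name, the 0.05 cutoff, and the pass/fail logic are my own illustrative choices, not part of any real GANP, and the unbiased-sample requirement cannot be verified from the numbers alone.

```python
# Illustrative sketch only: a mechanical check of the "average" assumptions above.
# The function name and thresholds are invented for this example.
import numpy as np
from scipy import stats

def average_meets_ganp(sample, alpha=0.05):
    """Return True if a bare "average" claim about this sample looks defensible."""
    sample = np.asarray(sample, dtype=float)
    mean = sample.mean()
    median = np.median(sample)
    std = sample.std(ddof=1)

    # The population has a normal distribution in the variable:
    # D'Agostino-Pearson normality test; failing to reject at alpha is
    # taken here as "consistent with normal".
    _, p_normal = stats.normaltest(sample)
    looks_normal = p_normal >= alpha

    # The median is within 0.1 standard deviations of the mean.
    median_close = abs(mean - median) <= 0.1 * std

    # Note: an unbiased, representative sample cannot be checked from the
    # numbers alone; that part of the rule needs human auditing.
    return looks_normal and median_close

rng = np.random.default_rng(0)
print(average_meets_ganp(rng.normal(50, 10, 500)))   # symmetric data: likely True
print(average_meets_ganp(rng.lognormal(3, 1, 500)))  # skewed data: likely False
```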
N out of M/Percentages
A statement of the form “N out of M practitioners…” entails the following assumptions:
- The sample is an unbiased representative of the stated population
- The population has a normal distribution in the variable
- p < 0.05
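The percentage rule also begs for a sample-size check. Here is a minimal sketch, assuming a plain normal-approximation (Wald) confidence interval; the helper name and the 95% level are illustrative, not part of any real certification scheme.

```python
# Illustrative sketch only: how wide is a reported "N out of M" claim?
# The helper name and the 95% level (z = 1.96) are invented for this example.
import math

def proportion_interval(n_successes, m_total, z=1.96):
    """Normal-approximation (Wald) confidence interval for a reported proportion."""
    p = n_successes / m_total
    margin = z * math.sqrt(p * (1 - p) / m_total)
    return max(0.0, p - margin), min(1.0, p + margin)

# "3 out of 4 practitioners agree" sounds decisive, but the interval is enormous...
print(proportion_interval(3, 4))      # roughly (0.33, 1.0)
# ...while the same proportion from 400 respondents is far tighter.
print(proportion_interval(300, 400))  # roughly (0.71, 0.79)
```

The Wald interval is just the simplest choice here; presumably a real GANP would nail down an exact method so there is no wiggle room.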
Line Graphs
A line graph must:
- Have axes labelled and units included
- Have a y-axis with 0 at the origin and no discontinuities
- Use data points collected with equal sample characteristics
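The graphing rules are the easiest to bake into code. A hypothetical sketch with matplotlib follows; the data, labels, and function name are invented for illustration, and the equal-sample-characteristics rule still has to be enforced upstream of the plot.

```python
# Illustrative sketch only: a line graph drawn to the rules above.
# The data, labels, and function name are invented for this example.
import matplotlib.pyplot as plt

def ganp_line_graph(x, y, xlabel, ylabel, title):
    fig, ax = plt.subplots()
    ax.plot(x, y, marker="o")
    # Axes labelled, with units included in the label text.
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_title(title)
    # The y-axis has 0 at the origin and no discontinuities (no broken axis).
    ax.set_ylim(bottom=0)
    return fig

fig = ganp_line_graph(
    x=[2007, 2008, 2009, 2010, 2011],
    y=[10.2, 10.6, 10.9, 11.1, 11.4],
    xlabel="Year",
    ylabel="Widget sales (millions of units)",
    title="Widget sales, 2007–2011",
)
fig.savefig("ganp_line_graph.png")
```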
Appropriate uses could be given a trustworthy logo or stamp. Publications could be “GANP 2011 certified”, indicating that they obey the rules of the GANP. It would become easy for lay people to know which numbers to trust.
Obviously, the development of such a catalogue would be a monumental task. The organizing committees would be subject to perpetual corruption and interference attempts. The first several iterations of the GANP would permit rampant abuses while loopholes were found and closed. Chaos, confusion and doubt would run amok. During the development of the rules, at least 452,235,239 people would die and more than 1.37 billion would suffer in poverty. Nevertheless, four out of five University of Toronto experts agree: this is a good idea.