Review By Ardian Gill
The Art of Statistics: How to Learn From Data
By David Spiegelhalter
A quick quiz: How many books on statistics have you read that begin with 215 murders? Everyone who answers correctly gets to buy Spiegelhalter’s book. And what is the point of murders in a book of that genre?
It turns out that data on gender, time of day, and location of the crimes described the daily routine of one Dr. Shipman. He had mainly female patients; he made his house calls in the afternoon and of course his route was known. The book calls the result “interocular” or “hits between the eyes.” Spiegelhalter could have gotten by with “obvious” but that wouldn’t be literature, which the book borders on, despite being mathematical. In his introduction he promises to “focus on a range of problems ranging from the expected benefit of different therapies immediately following breast cancer surgery, to why old men have big ears.” Bet you can’t wait for the latter.
In a chapter titled “Getting Things in Proportion,” we are introduced to the art of framing: In essence, positive statements are preferred to negative. A 95% survival rate is better received than a 5% mortality rate for a child’s surgery. And numbers can be scarier than percentages: 10,000 teenagers in London are violent vs. 1% of the teenage population—better still, 99% are nonviolent.
Another means of presentation is odds. Then there are the averages: mean, median, and mode. And don’t forget range and standard deviation. The author uses all of these in presenting the number of lifetime sexual partners reported in a Great Britain survey of people aged 35-44. While the mean and median are higher for males, the range gives the nod to females: 550 vs. 500 for men. We are advised, without explanation, that the standard deviation is an inappropriate measure of spread in this study. “Social acceptability bias” is illustrated by the same survey: some women were connected to a lie detector and some were not, and those connected reported more partners. (They didn’t know the detector was a fake.)
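For readers who like to see the definitions side by side, here is a minimal sketch in Python, with invented numbers rather than the survey’s data, showing how a skewed distribution pulls the mean away from the median and mode:

```python
from statistics import mean, median, mode

# Invented, right-skewed responses (not the survey's actual data): most people
# report small counts, a few report very large ones.
reported_partners = [1, 2, 2, 3, 3, 3, 4, 5, 8, 50]

print("mean:  ", mean(reported_partners))     # 8.1, pulled up by the outlier
print("median:", median(reported_partners))   # 3.0, robust to the outlier
print("mode:  ", mode(reported_partners))     # 3, the most common response
print("range: ", max(reported_partners) - min(reported_partners))  # 49
```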
En passant we are treated to examples of graphical presentation of results: strip charts (representation by dots), box-and-whisker charts (new to me), and bar charts, which he labels histograms.
Each chapter ends with a summary of the previous text—a very useful addition indeed.
The mantra “correlation does not imply causation” is humorously illustrated with “the delightful correlation of .96 between the … consumption of mozzarella cheese and the number of civil engineering doctorates awarded.” A new word for this spurious correlation: apophenia.
And so back to the big ears on old men. Do men with smaller ears die early (a prospective cohort study)? Or one could take a study of old men and try to find earlier evidence of ear size, maybe from photos (a retrospective cohort study). Or take men who died and try to find live men of the same age, etc. (a case-control study).
One of the most intriguing phenomena dealt with is called Simpson’s paradox. The example given is acceptance rates at Cambridge University, where for each of the five science subjects the acceptance rate for women was higher than that for men, yet the overall acceptance rate for women was lower. Go figure: The explanation lies in the subjects chosen by women being the most popular, and therefore the most competitive, resulting in the lower overall rate.
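The paradox is easier to believe with numbers in front of you. Here is a small Python sketch using invented admissions figures, not the actual Cambridge data, that reproduces the pattern:

```python
# Invented admissions figures (not the actual Cambridge data) that reproduce
# Simpson's paradox: women have the higher acceptance rate in each subject,
# yet the lower rate overall, because most women apply to the harder,
# more competitive subject.
#                        (applied, accepted)
women = {"easy subject": (100, 65), "hard subject": (900, 270)}
men = {"easy subject": (900, 540), "hard subject": (100, 25)}

def rate(applied, accepted):
    return accepted / applied

for subject in women:
    print(subject, f"women {rate(*women[subject]):.0%}, men {rate(*men[subject]):.0%}")

overall_women = sum(a for _, a in women.values()) / sum(n for n, _ in women.values())
overall_men = sum(a for _, a in men.values()) / sum(n for n, _ in men.values())
print(f"overall: women {overall_women:.1%}, men {overall_men:.1%}")
```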
Reverse causation: Does being near a supermarket add to a house’s value? No; supermarkets tend to locate in wealthier neighborhoods. Ask whether “Fizzy Drinks Make Teenagers Violent,” as one newspaper headlined, or whether violence works up a thirst. An example closer to home was when a tenant in our co-op suggested we convert to a condo because condos have higher values, ignoring that condos tend to be newer and have more amenities.
Francis Galton, a cousin of Charles Darwin, “had the classic Victorian gentleman scientist’s obsessive interest in collecting data”—including the relative beauty of young women (London the prettiest, Aberdeen the least). He came up with what is now called “regression to the mean” (though he called it “regression to mediocrity”). Example: A tall parent is likely to have shorter offspring, and vice versa.
The British statistician George E. P. Box came up with the aphorism: “All models are wrong, but some are useful.” An example given is the over-reliance on models of mortgage risks, leading to one of our worst financial crises. There are now, however, means of using more data for “extremely complex models such as those labelled as deep learning” that permit the use of all data without reduction.
A considerable amount of text is devoted to predicting the survival rates of Titanic passengers when it sank on its maiden voyage. An illuminating “classification tree” is shown of predicted survival rates by title (“Mr.” or not), third class (yes or no), and family size (five or more, yes or no). Not surprisingly, large families have a very low survival rate (3%), while smaller families come in at 60%. Unfortunately, the text reverses this: the words “do not” are omitted in the reference to belonging to large families. Continuing with the Titanic, we are warned against “over-fitting,” which means using so many categories that we lose the larger numbers that give the estimates credibility.
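To make the tree concrete, here is a rough Python sketch using only the three splits mentioned above. The 3% and 60% figures are the ones quoted in this review; how the splits nest and the remaining leaf values are my own assumptions, purely for illustration:

```python
# A rough sketch (not the book's exact tree) of a classification tree built on
# the three splits the review mentions: title, third class, and family size.
# The 3% and 60% leaves are the figures quoted above; the nesting of the splits
# and the remaining leaf values are assumptions for illustration only.
def predicted_survival(title, third_class, family_size):
    if title == "Mr":
        return 0.10 if third_class else 0.16      # illustrative leaf values
    # not "Mr": women, children, and titled passengers
    if third_class and family_size >= 5:
        return 0.03                                # large families: very low survival
    return 0.60                                    # smaller families: much higher

print(predicted_survival("Mr", third_class=True, family_size=1))     # 0.10
print(predicted_survival("Mrs", third_class=True, family_size=6))    # 0.03
print(predicted_survival("Miss", third_class=False, family_size=2))  # 0.60
```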
A section titled “Challenge of Algorithms” provides useful information on warning signs that an algorithm may be flawed: lack of robustness, not accounting for statistical variability, implicit bias, and lack of transparency.
At the beginning of the chapter titled “Estimates and Intervals,” we read some startling statistics on margin of error: “the U.S. Bureau of Labor Statistics reported … a rise in civilian unemployment of 108,000 … based on a sample of around 60,000 households had a margin of error … of +/- 300,000.” Wow!
“Bootstrapping” is the term applied to resampling, with replacement, from the sample itself. For example, if our sample is 1,000 items, try randomly resampling 10, 50, 100, or 500 of them and see what you get. It is no surprise that the bigger the resample, the less the variation, with the results in all cases clustered about the mean. This is the Central Limit Theorem at work: the distribution of sample means tends toward the normal, and more tightly so as sample size grows. No surprise here.
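A minimal sketch of the idea, using a made-up skewed sample rather than anything from the book: resample with replacement at several sizes and watch the resampled means tighten up.

```python
import random
import statistics

random.seed(1)
sample = [random.expovariate(1 / 30) for _ in range(1000)]   # a made-up, skewed sample

# Bootstrapping: resample (with replacement) from the sample itself, many times,
# and look at how the means of the resamples spread out.
for size in (10, 50, 100, 500):
    boot_means = [statistics.mean(random.choices(sample, k=size)) for _ in range(2000)]
    print(size, round(statistics.mean(boot_means), 1), round(statistics.stdev(boot_means), 2))

# The larger the resample, the more tightly the bootstrap means cluster around
# the sample mean, and the more normal their distribution looks.
```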
Chapter 8 gets to the heart of actuarial practice; i.e., probability. Here we meet our old French friends, Fermat and Pascal, whom one Chevalier de Méré sought out for help in determining which of two games gave him the better chance of winning: four throws of a single die to get a six, or 24 throws of two dice to get a pair of sixes. (Spoiler: 52% for game 1, 49% for game 2.) Thus was born the branch of mathematics called probability. Introducing the Poisson distribution, we are informed that it was originally developed to “represent the pattern of wrongful convictions per year.”
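The arithmetic behind the spoiler is just the complement rule, and a Poisson probability is a one-liner; a short sketch (the poisson_pmf helper is my own illustrative definition, not the book’s):

```python
from math import exp, factorial

# De Mere's two games, via the complement rule.
p_game1 = 1 - (5 / 6) ** 4       # at least one six in 4 throws of one die   ~ 0.518
p_game2 = 1 - (35 / 36) ** 24    # at least one double six in 24 throws      ~ 0.491
print(round(p_game1, 3), round(p_game2, 3))

# An illustrative Poisson probability: the chance of exactly k events when
# lam are expected on average.
def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

print(round(poisson_pmf(0, 0.5), 3))   # ~0.607 chance of no events when 0.5 are expected
```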
And speaking of such, along comes the “prosecutor’s fallacy,” prevalent in court cases involving DNA. Here a forensic expert might allege that “if the accused is innocent, there is only a 1 in a billion chance that they would match the DNA found at the crime scene.” The prosecutor turns this into “given the DNA evidence, there is only a 1 in a billion chance the accused is innocent.”
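The difference between the two statements is Bayes’ theorem with a prior attached. A small sketch with assumed figures (the 10-million-suspect pool is my invention, for illustration only):

```python
# The prosecutor's fallacy in numbers, with assumed figures: a 1-in-a-billion
# match probability for an innocent person and, say, 10 million people who
# could conceivably be the source (the prior is an assumption, for illustration).
p_match_given_innocent = 1e-9
p_match_given_guilty = 1.0
prior_guilt = 1 / 10_000_000

posterior_guilt = (p_match_given_guilty * prior_guilt) / (
    p_match_given_guilty * prior_guilt + p_match_given_innocent * (1 - prior_guilt)
)
print(round(1 - posterior_guilt, 3))   # ~0.01: roughly a 1-in-100 chance of innocence,
                                       # not 1 in a billion
```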
The science of probability is taken apart, broken into pieces, and reassembled in, to me, tedious detail in this chapter: Do we really need to deal with a) long-run frequency probability, b) propensity, or chance, or c) subjective or “personal” probability? It was a relief to proceed to the next chapter, wherein he marries probability and statistics, warning us that it is perhaps the most challenging chapter. If we accept his distinction between two types of uncertainty—say, before and after a coin flip—as aleatory uncertainty (before) and epistemic uncertainty (after), we would have to agree or ask, “Who cares?” Well, we all do, and hence we come to something called the “confidence interval”: the range in which we think the true value lies, typically with 95% certainty.
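A minimal sketch of a 95% confidence interval for a mean, using made-up data and the usual normal approximation (the sample mean plus or minus 1.96 standard errors):

```python
import statistics

# Made-up data; the 95% interval is the sample mean +/- 1.96 standard errors.
data = [12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 11.7, 12.4, 9.5, 11.0]
m = statistics.mean(data)
se = statistics.stdev(data) / len(data) ** 0.5
print(f"95% CI: {m - 1.96 * se:.2f} to {m + 1.96 * se:.2f}")
```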
A chapter on the “null hypothesis” is intriguing in that we are asked to assume something to be the case until proven otherwise, and a six-step process is proposed for testing it. In the process we are introduced to the concept of statistical significance and the chi-squared goodness-of-fit test. Using homicide incidents in the U.K., we are asked whether they follow a Poisson distribution; by comparing the expected counts with the actual ones, we find they do indeed. At this point the 215 or so deaths mentioned at the outset are revisited, with the expected number of deaths now a mere 120.
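Here is a rough sketch of the same goodness-of-fit idea with invented daily homicide counts, not the book’s data; it assumes SciPy is available:

```python
from math import exp, factorial
from scipy.stats import chisquare   # assumes SciPy is installed

# Invented counts (not the book's U.K. homicide data): the number of days on
# which 0, 1, 2, or 3 incidents occurred, compared with what a Poisson
# distribution with the same mean would lead us to expect.
observed = [222, 110, 28, 5]
n_days = sum(observed)
lam = sum(k * c for k, c in enumerate(observed)) / n_days
expected = [n_days * exp(-lam) * lam ** k / factorial(k) for k in range(4)]
expected[-1] += n_days - sum(expected)   # fold the leftover tail into the last bin

stat, p_value = chisquare(observed, expected, ddof=1)   # ddof=1: lam was estimated
print(round(stat, 2), round(p_value, 3))   # a large p-value means the Poisson fits well
```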
Easily made happy, the author refers to “the delightful term ‘the Law of the Iterated Logarithm.’” In short, if you keep testing as data accumulate, you will eventually get a falsely significant result. A response to this problem is “sequential testing,” which adjusts the testing thresholds so that such false alarms are kept in check. The technique was developed in World War II for monitoring assembly lines, where apparent problems often turned out to be false alarms.
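A quick simulation makes the point: test after every new observation (here a z-test on pure noise at the 5% level, my own illustrative setup, not the book’s example) and false “discoveries” arrive far more often than 5% of the time.

```python
import random
import statistics

# Simulated pure noise, tested after every new observation with a z-test at
# the 5% level: the chance of at least one false "discovery" climbs well
# above 5% as the peeking continues.
random.seed(0)
trials, false_alarms = 500, 0
for _ in range(trials):
    data = []
    for _ in range(200):
        data.append(random.gauss(0, 1))
        if len(data) >= 10:
            z = statistics.mean(data) * len(data) ** 0.5   # known sd of 1
            if abs(z) > 1.96:
                false_alarms += 1
                break
print(false_alarms / trials)   # far larger than 0.05, though nothing is there
```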
All in all, a fine, readable and intelligent textbook on statistics that doesn’t feel like a text. Congratulations, Sir Spiegelhalter. (He was knighted for his “services to statistics.”)
ARDIAN GILL, a member of the Academy and a fellow of the Society of Actuaries, is the author of The River Is Mine, a historical novel, and of a recently published children’s book, The Blue Moose. In his past life, Gill was chief actuary at the Mutual Life Insurance Co. of New York, a partner at Tillinghast, and co-founder of Gill and Roeser Inc., reinsurance intermediaries.