I read The Devil Is in the Digits, an analysis of the Iranian voting results, and something doesn’t feel quite right about it. (And it’s not that these two political science student authors are being touted as mathematicians in some of the blogs linking to the story.) Disclaimer: there seem to be lots of reasons to question the vote. I’m not addressing anything but the rigor of this analysis.
Now, I could be wrong about this, because anything past basic probability gives me trouble. I’m not particularly skilled (my lowest math grades were on probability exams; what are the odds of that?), and those feeble skills have atrophied for almost anything beyond simple dice-rolling and poker calculations.
But I do recall that when you multiply probabilities together, it needs to be for independent events. And I question what’s going on here.
We find too many 7s and not enough 5s in the last digit. We expect each digit (0, 1, 2, and so on) to appear at the end of 10 percent of the vote counts. But in Iran’s provincial results, the digit 7 appears 17 percent of the time, and only 4 percent of the results end in the number 5. Two such departures from the average — a spike of 17 percent or more in one digit and a drop to 4 percent or less in another — are extremely unlikely. Fewer than four in a hundred non-fraudulent elections would produce such numbers.
OK, the premise seems fine. You expect each digit to show up 10% of the time, but you can deviate from that and still have a random distribution. But the relationship between the digits is not random — if you have too many 7s, you must have fewer of other numbers! So what I want to know is how they arrived at the four percent result.
Let me illustrate with an example that’s easier to see, and one I can work through: coin tosses. If you toss a coin twice, there are three outcomes: two heads (25% of the time), a head and a tail (50%), and two tails (25%). So while the expected, average result is one head, it only happens half the time. A result of either two heads or two tails isn’t evidence of anything fishy; we don’t have enough trials. But here’s the biggie: what is the probability of getting two heads and no tails? It’s still 25%, because (heads) and (not tails) are not independent results. They have the maximum amount of correlation you can get, and since they aren’t independent, you wouldn’t multiply the probabilities together to find the answer.
I found an analysis someone did using random numbers, and their model simulation gives the odds of a number appearing 5 or fewer times as about 20%, and appearing more than 20 times as 11%. But the odds of both shouldn’t simply be the product of the two, because the results would be correlated in some fashion that’s more involved than the coin-tossing.
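The coin-toss point can be checked by brute force. This is only a small sketch that lists the four equally likely outcomes of two fair tosses and compares the true probability of "two heads and no tails" with what you would get by wrongly multiplying the two probabilities; the helper name prob is made up for the illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over one outcome."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

p_two_heads = prob(lambda o: o.count("H") == 2)
p_no_tails = prob(lambda o: o.count("T") == 0)
p_both = prob(lambda o: o.count("H") == 2 and o.count("T") == 0)

# Multiplying is only valid for independent events, which these are not.
naive_product = p_two_heads * p_no_tails

print(p_two_heads, p_no_tails, p_both, naive_product)  # prints: 1/4 1/4 1/4 1/16
```

The two events are really the same event, so the joint probability stays at 1/4 while the naive product drops to 1/16.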
So I wonder how they arrived at 4%. It’s not at all clear.
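One way to probe the question is a small Monte Carlo run. The sketch below is not the article’s method. It assumes 116 vote counts whose last digits are uniform and independent, and it treats "a spike" as some digit appearing at least 20 times and "a dip" as some digit appearing at most 5 times. Those cutoffs, the trial count, the seed and the function name are all my assumptions. It estimates the chance of a spike, of a dip, and of both at once, so the joint figure can be compared with the product of the other two.

```python
import random

def simulate_last_digits(n_counts=116, high=20, low=5, trials=20000, seed=0):
    """Estimate P(spike), P(dip) and P(spike and dip) for uniform last digits.

    n_counts, high and low are assumed values, not taken from the article.
    """
    rng = random.Random(seed)
    spike = dip = both = 0
    for _ in range(trials):
        tally = [0] * 10
        for digit in rng.choices(range(10), k=n_counts):
            tally[digit] += 1
        has_spike = max(tally) >= high   # some digit is over-represented
        has_dip = min(tally) <= low      # some digit is under-represented
        spike += has_spike
        dip += has_dip
        both += has_spike and has_dip
    return spike / trials, dip / trials, both / trials

p_spike, p_dip, p_both = simulate_last_digits()
print(f"spike: {p_spike:.3f}  dip: {p_dip:.3f}  both: {p_both:.3f}")
print(f"product of the two: {p_spike * p_dip:.3f}")
```

If the joint estimate differs noticeably from the product, that is the correlation between the digit counts showing up; the exact figures depend on the cutoffs chosen.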
The second part of their analysis is of the last two digits, and whether they are adjacent (or identical) numbers or not, e.g. 54 (adjacent) vs 59 (not).
To check for deviations of this type, we examined the pairs of last and second-to-last digits in Iran’s vote counts. On average, if the results had not been manipulated, 70 percent of these pairs should consist of distinct, non-adjacent digits.
Not so in the data from Iran: Only 62 percent of the pairs contain non-adjacent digits.
Aha! They assume that the numbers are perfectly distributed, and we know the last digits are not; I didn’t see any mention of the second-to-last digit. So one has to wonder whether this analysis holds. I can certainly think of some examples where it fails: the second-to-last digits are all 5, and the last digits are all 4, 5 or 6. In that unlikely result, there would be zero pairs that were non-adjacent, rather than 70%. So I have to wonder how far the assumption holds and how badly it fails. And if these odds depend on the distribution, the digits and the pairings are not independent of each other, so multiplying the probabilities won’t give the right answer.
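The dependence on the digit distributions can be made concrete with a short calculation. This sketch assumes the two digits are independent of each other and that 0 and 9 count as adjacent (the wrap-around convention a commenter describes further down); the function names are made up. It computes the probability of a non-adjacent pair for any pair of digit distributions, then applies it to the uniform case and to the all-5s example above.

```python
from fractions import Fraction

def is_non_adjacent(a, b):
    """True if digits a and b are neither identical nor neighbours (0 and 9 wrap)."""
    return (a - b) % 10 not in (0, 1, 9)

def non_adjacent_probability(p_second_to_last, p_last):
    """P(non-adjacent pair), assuming the two digits are independent."""
    return sum(
        p_second_to_last[a] * p_last[b]
        for a in range(10)
        for b in range(10)
        if is_non_adjacent(a, b)
    )

uniform = [Fraction(1, 10)] * 10
all_five = [Fraction(1) if d == 5 else Fraction(0) for d in range(10)]
near_five = [Fraction(1, 3) if d in (4, 5, 6) else Fraction(0) for d in range(10)]

p_uniform = non_adjacent_probability(uniform, uniform)
p_counterexample = non_adjacent_probability(all_five, near_five)
print(p_uniform, p_counterexample)  # prints: 7/10 0
```

Under these assumptions the 70 percent figure comes out whenever at least one of the two digits is uniform and independent of the other, so the number only shifts when both distributions are skewed or the digits are correlated.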
That’s what my gut and some basic probability math, dredged up from the recesses of my brain, tell me. Perhaps someone who does math for a living can confirm that I’m right or tell me that I’m wrong and should stick to my day job. (Or that I’m right and I should still stick to my day job.)
Everybody is banging on thermodynamics and ignoring kinetics. Thermodynamics proposes, kinetics disposes. An enormous number of paper ballots were tallied in a Second World country in a twinkling. The Iran vote was never counted, it was proclaimed. 53:47 plurality, with attention to past voting patterns, and murder of analytical technicians thereafter would have been the proper path.
I’m willing to give the analysis the benefit of the doubt until I have time to work the numbers myself. However, the last sentence of the article is misleading:
“In other words, a bet that the numbers are clean is a one in two-hundred long shot.”
The premise:
“If the election was fair, then these numerical anomalies have a 1:200 chance of occurring.”
Does not imply the converse:
“If these numbers occur, then the election has a 1:200 chance of being fair.”
In order to get the odds that support the second statement, you need to apply Bayes’ theorem, for which you first need to know the probability of a fair election independent of any trailing-digit analysis. Not that I believe this election was fair, but the article seems to be doing a bit of data mining to support its hypothesis. We don’t know how many other numerical trends they looked at before finding these two examples.
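To see how much the answer depends on the missing inputs, here is a minimal sketch of the Bayes calculation. Only the 1-in-200 figure comes from the article. The priors and the probability of seeing these anomalies in a rigged election are placeholder numbers invented for illustration, as is the function name.

```python
from fractions import Fraction

def posterior_fair(prior_fair, p_data_given_fair, p_data_given_fraud):
    """P(fair | data) by Bayes' theorem with two hypotheses: fair or fraud."""
    numerator = prior_fair * p_data_given_fair
    evidence = numerator + (1 - prior_fair) * p_data_given_fraud
    return numerator / evidence

p_data_given_fair = Fraction(1, 200)   # the article's figure
p_data_given_fraud = Fraction(1, 20)   # hypothetical, not from the article

# Hypothetical priors for "the election was fair" before looking at digits.
for prior in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    post = posterior_fair(prior, p_data_given_fair, p_data_given_fraud)
    print(f"prior {float(prior):.2f} -> posterior {float(post):.3f}")
```

With these made-up inputs the posterior moves a great deal as the prior changes, and none of the results has to equal 1 in 200.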
I made a simulation that captures most of their results. There’s a 3.5% chance of the 17%/4% occurrence and a 4.2% chance of the 62% non-adjacent occurrence. These *are* independent and *can* be multiplied. 0 is adjacent to both 9 and 1, so every number is adjacent to 2 numbers, identical to 1, and non-adjacent to 7 (70% probability). Their 4.2% probability for 62% or less non-adjacent is also correct. However, when you multiply these probabilities together, you get 0.15%, not 0.5% … there must be some problem with their simulations (mine give the correct result).
So, the numbers are (almost) correct. The conclusion that this “leaves little room for reasonable doubt,” however, is garbage. Their article has been interpreted (see above) as meaning there’s only a 1 in 200 chance that the election is fair. In fact, it’s trivial to come up with dozens of equivalent patterns in the last digit that are equally rare. For instance, two overly common numbers or two uncommon numbers instead of one of each. Because you can come up with dozens of things with a 3.5% probability, it’s certain that one of them will happen in any string of random numbers. It would, in fact, be far less common to see numbers used exactly 10% of the time (11 or 12 times in 116 random numbers).
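The multiple-patterns argument can be roughed out numerically. This sketch assumes, as a simplification, that the candidate patterns are independent and that each has the same 3.5% chance. Real digit patterns overlap, so the true figure will differ. The function name and the pattern counts tried are arbitrary.

```python
def prob_at_least_one(p, k):
    """Chance that at least one of k independent patterns, each with probability p, shows up."""
    return 1 - (1 - p) ** k

p_single = 0.035  # the 17%/4% figure quoted above
for k in (1, 12, 24, 48, 100):
    print(f"{k:3d} patterns -> {prob_at_least_one(p_single, k):.3f}")
```

Under these assumptions the chance of finding some "rare" pattern climbs quickly with the number of patterns one is willing to count as suspicious.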
The correct null hypothesis to test here is “The last digit is uniformly distributed.” The numbers fail to invalidate that hypothesis.
If you look at the Obama/McCain data that they use, and look at the penultimate digit instead of the last digit, you’ll find 20% 7s and 5% 8s (1.5% probability). If you look at past US Presidential elections you’ll find that a number of them involve “rare” nonadjacent digit frequencies and last digit frequencies. As propaganda, this analysis is great, but it’s entirely numerology. I don’t really see the need to invent false statistical proof that the Iranian election was flawed when there are so many obvious flaws in the first place.
Giving the Devil his due:
So you believe there is a make-believe Devil?
Or do you mean something else?
Just disregard; I’m just joking with you about the devil.