Cuuuut!

“Filming in the lab” is the recent theme at PhD comics, and this one grabs the essence. (Or you can start at the beginning, if you’re the type who needs to do that.)

I’ve been filmed in the lab and interviewed on TV once, and I’ve observed my colleagues being filmed and interviewed. There’s a pattern to it. They sit you down in front of one of your impressive-looking pieces of lab apparatus and ask questions for a while. For every 15 minutes of interview, approximately 5 seconds will make it on air in the final story (by my single data point, at least). Next, they will want some “action” shots of you, which for an atomic physics/optics lab usually means adjusting some mirrors or twiddling a knob on a piece of electronics while looking at an oscilloscope with a serious expression on your face. If there are two of you in the shot, one of you will need to be pointing at the oscilloscope, as if to say, “Here is where the WOW signal would be, if we had a signal. But we don’t, because we can’t run our experiment with these floodlights on.” Obviously “action shot” here does not mean the same thing as in an episode of some detective series; this is no Magnum, Principal Investigator. A third component that is sometimes used is a shot of the interviewee walking down a corridor or sidewalk, so that the reporter can do a voice-over. Alternatively, they will just get shots of the equipment for that segment, especially if it whirs and moves about.

Then they mash it all together and if you’re lucky they won’t have gotten the science horribly wrong.

Giving the Devil His Due

I read The Devil Is in the Digits, an analysis of the Iranian voting results, and something doesn’t feel quite right about it. (And it’s not that these two political science student authors are being touted as mathematicians in some of the blogs linking to the story.) Disclaimer: there seem to be lots of reasons to question the vote. I’m not addressing anything but the rigor of this analysis.

Now, I could be wrong about this, because anything past basic probability gives me trouble. I’m not particularly skilled (my lowest math grades were on probability exams; what are the odds of that?), and those feeble skills have atrophied for almost anything beyond simple dice-rolling and poker calculations.

But I do recall that when you multiply probabilities together, it needs to be for independent events. And I question whether that’s what’s going on here.

We find too many 7s and not enough 5s in the last digit. We expect each digit (0, 1, 2, and so on) to appear at the end of 10 percent of the vote counts. But in Iran’s provincial results, the digit 7 appears 17 percent of the time, and only 4 percent of the results end in the number 5. Two such departures from the average — a spike of 17 percent or more in one digit and a drop to 4 percent or less in another — are extremely unlikely. Fewer than four in a hundred non-fraudulent elections would produce such numbers.

OK, the premise seems fine. You expect each digit to show up 10% of the time, but you can deviate from that and still have a random distribution. But the digit counts are not independent of one another: if you have too many 7s, you must have fewer of the other digits! So what I want to know is how they arrived at the four percent result.
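Here’s a quick way to see that constraint, as a sketch under assumptions I’m supplying: pretend each last digit is drawn independently and uniformly from 0 to 9, simulate many elections, and measure how the number of 7s and the number of 5s move together. The 116 vote counts per election is a hypothetical figure I picked for illustration, not one taken from the article. Because all the digit counts have to add up to the same total, they come out negatively correlated; for a multinomial distribution with every digit at 10%, the textbook correlation between any two digit counts is -(0.1 × 0.1)/(0.1 × 0.9), or about -0.11, and the simulation should land near that.

```python
import random


def count_correlation(n_counts=116, trials=10_000, seed=1):
    """Simulated correlation between the number of 7s and the number of 5s
    among n_counts last digits.

    Hypothetical assumptions: every last digit is independent and uniform
    on 0-9; n_counts=116 is an illustrative choice, not from the article.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same answer
    sevens, fives = [], []
    for _ in range(trials):
        digits = rng.choices(range(10), k=n_counts)
        sevens.append(digits.count(7))
        fives.append(digits.count(5))
    mean7 = sum(sevens) / trials
    mean5 = sum(fives) / trials
    cov = sum((s - mean7) * (f - mean5) for s, f in zip(sevens, fives)) / trials
    var7 = sum((s - mean7) ** 2 for s in sevens) / trials
    var5 = sum((f - mean5) ** 2 for f in fives) / trials
    return cov / (var7 * var5) ** 0.5


# Multinomial formula: corr(N_i, N_j) = -p_i p_j / sqrt(p_i (1 - p_i) p_j (1 - p_j))
theory = -(0.1 * 0.1) / (0.1 * 0.9)
print(f"simulated: {count_correlation():+.3f}  theory: {theory:+.3f}")
```

The exact value depends on the seed and number of trials, but the sign is the point: a surplus of one digit goes hand in hand with a shortage of others.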

Let me illustrate with an example that’s easier to see, and one I can work through: coin tosses. If you toss a coin twice, there are three outcomes: two heads (25% of the time), a head and a tail (50%), and two tails (25%). So while the expected, average result is one head, it only happens half the time; a result of either two heads or two tails isn’t evidence of anything fishy, because we don’t have enough trials. But here’s the biggie: what is the probability of getting two heads and no tails? It’s still 25%, because (two heads) and (no tails) are not independent results. They have the maximum amount of correlation you can get, and since they aren’t independent, you wouldn’t multiply the probabilities together to find the answer; doing so would give you 6.25%, which is wrong.
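To make the coin example concrete, here’s a tiny enumeration (my own illustration, in Python): list the four equally likely outcomes of two tosses and compute exact probabilities with fractions. Two heads and no tails each have probability 1/4, the chance of both is also 1/4, and the naive product is 1/16.

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two fair coin tosses: HH, HT, TH, TT.
OUTCOMES = list(product("HT", repeat=2))


def prob(event):
    """Exact probability of an event, given as a predicate on one outcome."""
    hits = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(hits, len(OUTCOMES))


def two_heads(outcome):
    return outcome.count("H") == 2


def no_tails(outcome):
    return outcome.count("T") == 0


p_heads = prob(two_heads)
p_no_tails = prob(no_tails)
p_both = prob(lambda o: two_heads(o) and no_tails(o))

print("P(two heads) =", p_heads)                   # 1/4
print("P(no tails)  =", p_no_tails)                # 1/4
print("P(both)      =", p_both)                    # 1/4
print("naive product =", p_heads * p_no_tails)     # 1/16
```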

I found an analysis someone did using random numbers, and their simulation gives the odds of a number appearing 5 or fewer times as about 20%, and appearing more than 20 times as 11%. But the odds of both shouldn’t simply be the product of the two, because the results would be correlated in some fashion that’s more involved than the coin-tossing.
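For anyone who wants to poke at this, here’s a rough Monte Carlo sketch of the kind of check I mean. It is not the article’s method, and it isn’t the simulation mentioned above either; the assumptions are mine: last digits independent and uniform, a hypothetical 116 vote counts per election, and thresholds of 5 or fewer (rare) and more than 20 (common). It estimates the chance that some digit is rare, the chance that some digit is common, and the chance that both happen in the same simulated election, so the joint estimate can be compared with the product of the two. The numbers you get will depend on those assumptions.

```python
import random
from collections import Counter


def digit_extremes(n_counts=116, low_max=5, high_min=21, trials=20_000, seed=2):
    """Monte Carlo estimates of
      P(some digit appears <= low_max times),
      P(some digit appears >= high_min times),
      P(both in the same simulated election).

    Hypothetical assumptions: last digits are independent and uniform on 0-9,
    and each election has n_counts vote counts.
    """
    rng = random.Random(seed)
    low = high = both = 0
    for _ in range(trials):
        # Counter returns 0 for digits that never appear.
        counts = Counter(rng.choices(range(10), k=n_counts))
        has_low = any(counts[d] <= low_max for d in range(10))
        has_high = any(counts[d] >= high_min for d in range(10))
        low += has_low
        high += has_high
        # low_max < high_min, so a rare digit and a common digit are always different digits.
        both += has_low and has_high
    return low / trials, high / trials, both / trials


p_low, p_high, p_both = digit_extremes()
print(f"P(rare digit)   ~ {p_low:.3f}")
print(f"P(common digit) ~ {p_high:.3f}")
print(f"P(both)         ~ {p_both:.3f}")
print(f"naive product   = {p_low * p_high:.3f}")
```

Estimating the joint probability directly sidesteps the independence question entirely, which is the whole point: you don’t have to guess how the two events are correlated.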

So I wonder how they arrived at 4%. It’s not at all clear.

The second part of their analysis is of the last two digits, and whether they are adjacent (or identical) numbers or not, e.g. 54 (adjacent) vs 59 (not).

To check for deviations of this type, we examined the pairs of last and second-to-last digits in Iran’s vote counts. On average, if the results had not been manipulated, 70 percent of these pairs should consist of distinct, non-adjacent digits.

Not so in the data from Iran: Only 62 percent of the pairs contain non-adjacent digits.
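The 70 percent figure can be checked by counting, assuming both digits are uniform and independent, so each of the 100 (second-to-last, last) pairs is equally likely. There are 10 identical pairs and 18 adjacent ones, which leaves 72 percent non-adjacent; you get exactly 70 percent only if 9 and 0 are also treated as neighbors. That wraparound convention is my inference about where their number comes from, not something stated in the quote.

```python
from itertools import product


def non_adjacent_share(wrap):
    """Fraction of the 100 equally likely (second-to-last, last) digit pairs
    whose digits are neither identical nor adjacent.

    If wrap is True, 9 and 0 also count as adjacent (an assumed convention).
    """
    def close(a, b):
        diff = abs(a - b)
        if wrap:
            diff = min(diff, 10 - diff)  # treat 9 and 0 as neighbors
        return diff <= 1

    pairs = list(product(range(10), repeat=2))
    return sum(1 for a, b in pairs if not close(a, b)) / len(pairs)


print(non_adjacent_share(wrap=False))  # 0.72
print(non_adjacent_share(wrap=True))   # 0.7
```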

Aha! They assume that the digits are uniformly distributed, and we know the last digits are not; I didn’t see any mention of how the second-to-last digits are distributed. So one has to wonder whether this analysis holds. I can certainly think of examples where it fails: suppose the second-to-last digits are all 5 and the last digits are all 4, 5, or 6. In that unlikely case, there would be zero non-adjacent pairs, rather than 70%. So I have to wonder how far the assumption holds and how badly it fails. And if these odds depend on the distribution, the digit frequencies and the pairings are not independent of each other, so multiplying the probabilities won’t give the right answer.
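Here’s a sketch of how the expected non-adjacent share depends on the digit distributions. The function name and the distributions are hypothetical illustrations, and it still assumes the two digits are independent of each other (yet another assumption that may not hold), with 9 and 0 treated as neighbors. Uniform digits give roughly 70 percent; the degenerate example above gives exactly zero.

```python
def non_adjacent_prob(p_tens, p_ones, wrap=True):
    """P(second-to-last and last digits are neither identical nor adjacent).

    Hypothetical assumptions: the two digits are independent, with
    distributions p_tens and p_ones (each 10 probabilities for digits 0-9);
    if wrap is True, 9 and 0 count as adjacent.
    """
    total = 0.0
    for a in range(10):
        for b in range(10):
            diff = abs(a - b)
            if wrap:
                diff = min(diff, 10 - diff)
            if diff >= 2:  # neither identical nor adjacent
                total += p_tens[a] * p_ones[b]
    return total


uniform = [0.1] * 10
all_fives = [0.0] * 5 + [1.0] + [0.0] * 4         # second-to-last digit always 5
four_five_six = [0.0] * 4 + [1 / 3] * 3 + [0.0] * 3  # last digit 4, 5, or 6

print(non_adjacent_prob(uniform, uniform))          # about 0.70
print(non_adjacent_prob(all_fives, four_five_six))  # 0.0
```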

That’s what my gut and some basic probability math, dredged up from the recesses of my brain, tell me. Perhaps someone who does math for a living can confirm that I’m right or tell me that I’m wrong and should stick to my day job. (Or that I’m right and I should still stick to my day job.)