{"id":221,"date":"2011-02-21T22:24:47","date_gmt":"2011-02-22T03:24:47","guid":{"rendered":"http:\/\/blogs.scienceforums.net\/capn\/?p=221"},"modified":"2011-02-21T22:24:47","modified_gmt":"2011-02-22T03:24:47","slug":"statistical-significance-of-doom","status":"publish","type":"post","link":"http:\/\/blogs.scienceforums.net\/capn\/2011\/02\/21\/statistical-significance-of-doom\/","title":{"rendered":"Statistical Significance of Doom (part 1)"},"content":{"rendered":"<p>I was recently assigned to give a 25-minute presentation on a subject of my choice. After choosing &#8220;scientific dishonesty and fraud,&#8221; I happened upon a paper by John Ioannidis claiming that &#8220;<a href=\"http:\/\/www.plosmedicine.org\/article\/info%3Adoi%2F10.1371%2Fjournal.pmed.0020124\">Most Published Research Findings Are False<\/a>.&#8221;<\/p>\n<p>After skimming through Ioannidis&#8217; paper and reading some of the references, I quickly changed my presentation&#8217;s title to &#8220;Statistical Significance Testing is the Devil&#8217;s Work,&#8221; and dug up some 23 papers on the subject. What follows is a reformatted version of why you should <strong>never<\/strong> trust papers that claim &#8220;statistically significant&#8221; results. It&#8217;s split into several parts for length. (The slides I used are available <a href=\"..\/files\/2011\/02\/Statistical-Significance.pdf\">here<\/a>.)<\/p>\n<p><!--more--><\/p>\n<p>(Note to statisticians: Please do send me comments if I screwed something up. 
I&#8217;d rather not preach falsehoods while preaching against preaching falsehoods.)<\/p>\n<p><strong>Why Statistical Significance Means Everything&#8217;s Wrong<\/strong><\/p>\n<p>(well, not <em>everything<\/em>, but an alarming number of things)<\/p>\n<p>First, let&#8217;s address an important question: What is statistical significance, anyway, and what does it mean?<\/p>\n<p>Well, consider a medical study. Suppose I have a fantastic new cold medication that should make the average cold a day or two shorter. Now, I have to design an experiment to test if my medication works, so I get a bunch of people with colds, and give half of them my magic medication and the other half some sort of placebo. We then follow the group for a week or two until the colds are over and see how long their colds lasted.<\/p>\n<p>However, we all know that colds are never the same length. Sure, a cold may average four days,[ref]A completely arbitrary number I just made up.[\/ref] but there are eight-day colds and two-day colds too. Hence, if I take ten people and see what the average cold length is, it might be 4.2 days, or 3.6, or 2.4, or anywhere in a large range.<\/p>\n<p>This is difficult if I&#8217;m doing an experiment &#8212; there&#8217;s a huge random fluctuation that I have to distinguish from the real effect of the medicine. So, I take a larger sample, and pay a few hundred undergrads $15 for their time. The more people I sample, the more the random fluctuations balance out, and the closer to the &#8220;true&#8221; average my numbers get.<\/p>\n<p>So far, so good. Now, when I evaluate my results, I have to decide if the difference I observed was caused by random fluctuation or by the medication&#8217;s effects. 
That&#8217;s where statistical significance comes in.<\/p>\n<p><strong>Statistical Significance and p-values<\/strong><\/p>\n<p>There&#8217;s a whole variety of statistical tests I won&#8217;t bore you with, but they all test things like &#8220;is the difference between these populations caused by chance?&#8221; and &#8220;are these two variables correlated?&#8221; Many of them give you a &#8220;p-value.&#8221; I asked a graduate student who&#8217;s taken some statistics courses, and he told me this is what a p-value is:<\/p>\n<blockquote><p>the probability that the statistic you just derived happened by chance, essentially<\/p><\/blockquote>\n<p>So if the p-value is less than, say, 0.05, there&#8217;s less than a 5% chance that my results happened because of random fluctuations. Not bad! If p is 0.7, though, there&#8217;s a huge chance my results happened by chance.<\/p>\n<p>Sound good so far?<\/p>\n<p>I hope not, because that&#8217;s all wrong.<\/p>\n<p>Here&#8217;s what p-value <em>really<\/em> means:<\/p>\n<blockquote><p>the probability, <strong>under the assumption of no effect or no difference<\/strong> (the null hypothesis), of obtaining a result equal to or more extreme than what was actually observed[ref]Goodman, Steven N. \u201cToward Evidence-Based Medical Statistics. 1: The P Value Fallacy.\u201d <em>Ann Intern Med<\/em> 130, no. 12 (1999): 995-1004. http:\/\/www.annals.org\/cgi\/content\/abstract\/130\/12\/995.[\/ref] [emphasis added]<\/p><\/blockquote>\n<p>Note the bolded part carefully. When you calculate the p-value, you ask, &#8220;If my medication had no effect, what are the odds I&#8217;d see this result?&#8221; You <strong>assume<\/strong> there was no effect or no significant difference, and then make some calculations.<\/p>\n<p>I hope we can all see now that you can&#8217;t assume the difference occurred by chance alone, and then do some math to calculate the probability the difference occurred by chance alone. 
&#8220;Assuming this happened by chance, what are the odds this happened by chance?&#8221; It&#8217;s nonsensical. And it leads us to some major problems.<\/p>\n<p><strong>False Negatives and Statistical Insignificance<\/strong><\/p>\n<p>A false negative occurs when there is a real difference, but my study misses it. In our example, this would happen when the cold medication works, but somehow my study concludes it doesn&#8217;t. How would that happen?<\/p>\n<p>Well, remember those random fluctuations. If the random fluctuations are bigger than the actual effect of the medication, there&#8217;s almost no way to tell if the medication worked &#8212; so we need to study more people until the random fluctuations balance each other out.<\/p>\n<p>Statisticians came up with a number to describe this problem: statistical power. Statistical power tells us the odds that our study will detect the difference, assuming there really is one. If my sample size is too small, I&#8217;ll never be able to detect a small difference, and my statistical power is too low. If I sample every person in the entire country who has a cold, my statistical power will be excellent.<\/p>\n<p>What, then, would be a good statistical power to aim for when testing medicines? You don&#8217;t want to miss a perfectly good drug, right?<\/p>\n<p>The average statistical power of a medical study is <strong>50%<\/strong>.[ref]Sterne, J. A., and G. Davey Smith. \u201cSifting the evidence-what\u2019s wrong with significance tests?\u201d <em>BMJ (Clinical research ed.)<\/em> 322, no. 7280 (January 2001): 226-31. 
http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?artid=1119478&amp;tool=pmcentrez&amp;rendertype=abstract.[\/ref]<\/p>\n<p>That means the average medical study has a 50% chance of completely missing the effect it&#8217;s looking for, and then concluding there was a &#8220;statistically insignificant&#8221; difference and that the drug is useless.<\/p>\n<p><em>&#8220;Statistically insignificant difference&#8221; does not mean &#8220;no difference.&#8221;<\/em> It simply means you could not <strong>detect<\/strong> the difference. Science news articles often say &#8220;the difference was statistically insignificant, so age could not have been a factor,&#8221; but that is simply <strong>false.<\/strong><\/p>\n<p><strong>Next in part 2: <a href=\"http:\/\/blogs.scienceforums.net\/capn\/2011\/02\/21\/statistical-significance-of-doom-part-2\/\">False positives and multiple comparisons<\/a><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was recently assigned to give a 25-minute presentation on a subject of my choice. 
After choosing &#8220;scientific dishonesty and fraud,&#8221; I happened upon a paper by John Ioannidis claiming that &#8220;Most Published Research Findings Are False.&#8221; After skimming through Ioannidis&#8217; paper and reading some of the references, I quickly changed my presentation&#8217;s title to&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-221","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"http:\/\/blogs.scienceforums.net\/capn\/wp-json\/wp\/v2\/posts\/221","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/blogs.scienceforums.net\/capn\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/blogs.scienceforums.net\/capn\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/blogs.scienceforums.net\/capn\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/blogs.scienceforums.net\/capn\/wp-json\/wp\/v2\/comments?post=221"}],"version-history":[{"count":0,"href":"http:\/\/blogs.scienceforums.net\/capn\/wp-json\/wp\/v2\/posts\/221\/revisions"}],"wp:attachment":[{"href":"http:\/\/blogs.scienceforums.net\/capn\/wp-json\/wp\/v2\/media?parent=221"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/blogs.scienceforums.net\/capn\/wp-json\/wp\/v2\/categories?post=221"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/blogs.scienceforums.net\/capn\/wp-json\/wp\/v2\/tags?post=221"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}