People keep sending me this link to an article by Jonah Lehrer in the New Yorker: The Decline Effect and the Scientific Method, which has the subheadings of “The Truth Wears Off” and “Is there something wrong with the scientific method?” Some of my correspondents sound rather distraught, like they’re concerned that science is breaking down and collapsing; a few, creationists mainly, are crowing over it and telling me they knew we couldn’t know anything all along (but then, how did they know…no, let’s not dive down that rabbit hole).
I read it. I was unimpressed with the overselling of the flaws in the science, but actually quite impressed with the article as an example of psychological manipulation.
The problem described is straightforward: many statistical results from scientific studies that showed great significance early in the analysis are less and less robust in later studies. For instance, a pharmaceutical company may release, with great fanfare, a new drug that showed extremely promising results in clinical trials, only for the drug to show much smaller effects later, when numbers from its use in the general public trickle back. Or a scientific observation of mate choice in swallows may first show a clear preference for symmetry, but as time passes and more species are examined or the same species is re-examined, the effect seems to fade.
This isn’t surprising at all. It’s what we expect, and there are many very good reasons for the shift.
- Regression to the mean: As the number of data points increases, we expect the average values to regress to the true mean…and since often the initial work is done on the basis of promising early results, we expect more data to even out a fortuitously significant early outcome.
- The file drawer effect: Results that are not significant are hard to publish, and end up stashed away in a cabinet. However, as a result becomes established, contrary results become more interesting and publishable.
- Investigator bias: It’s difficult to maintain scientific dispassion. We’d all love to see our hypotheses validated, so we tend to consciously or unconsciously select results that favor our views.
- Commercial bias: Drug companies want to make money. They can make money off a placebo if there is some statistical support for it; there is certainly a bias towards exploiting statistical outliers for profit.
- Population variance: Success in a well-defined subset of the population may lead to a bit of creep: if the drug helps this group with well-defined symptoms, maybe we should try it on this other group with marginal symptoms. And it doesn’t…but those numbers will still be used in estimating its overall efficacy.
- Simple chance: This is a hard one to get across to people, I’ve found. But if the threshold for significance is p=0.05, then about 1 in 20 experiments with a completely useless drug will still exhibit a significant effect.
- Statistical fishing: I hate this one, and I see it all the time. The planned experiment revealed no significant results, so the data is pored over and any significant correlation is seized upon and published as if it was intended. See previous explanation. If the data set is complex enough, you’ll always find a correlation somewhere, purely by chance.
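Several of these explanations are easy to check with a toy simulation. What follows is only a rough sketch under assumptions of my own, not anything taken from Lehrer's article or from a real data set: each "study" draws n measurements from a normal distribution with standard deviation 1, and a result counts as significant when a simple two-sided z-test passes the p=0.05 cutoff. The function names (`run_study`, `false_positive_rate`, `chance_of_any_hit`, `decline_effect`) and the effect size of 0.2 are made up for illustration.

```python
import math
import random

Z_CRIT = 1.96  # two-sided cutoff for p < 0.05 under a normal model (assumed)


def run_study(true_effect, n, rng):
    """Simulate one study; return (observed mean effect, is it significant?)."""
    samples = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    mean = sum(samples) / n
    z = mean * math.sqrt(n)  # standard error is 1/sqrt(n) because sd is 1
    return mean, abs(z) > Z_CRIT


def false_positive_rate(n_studies, n, rng):
    """Simple chance: fraction of studies of a useless drug that look significant."""
    hits = sum(run_study(0.0, n, rng)[1] for _ in range(n_studies))
    return hits / n_studies


def chance_of_any_hit(k, alpha=0.05):
    """Statistical fishing: chance that at least one of k independent
    null comparisons comes out significant."""
    return 1 - (1 - alpha) ** k


def decline_effect(true_effect, n_studies, n, rng):
    """Regression to the mean plus the file drawer: keep only first studies
    that were significant and positive ("published"), then rerun each once.
    Assumes at least one study gets published."""
    published, replications = [], []
    for _ in range(n_studies):
        mean, significant = run_study(true_effect, n, rng)
        if significant and mean > 0:
            published.append(mean)
            replications.append(run_study(true_effect, n, rng)[0])
    return (sum(published) / len(published),
            sum(replications) / len(replications))


if __name__ == "__main__":
    rng = random.Random(0)
    print("false positive rate:", false_positive_rate(5000, 20, rng))
    print("chance of a hit in 20 comparisons:", chance_of_any_hit(20))
    first, repeat = decline_effect(0.2, 5000, 20, rng)
    print("published effect:", first, "replicated effect:", repeat)
```

Under these assumptions, the false positive rate should land close to 1 in 20, twenty independent comparisons give roughly a 64% chance of at least one spurious hit, and the average "published" effect comes out well above the true value of 0.2 while the replications settle back near it. Nothing mysterious is wearing off; the early numbers were selected for being large.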
Here’s the thing about Lehrer’s article: he’s a smart guy, he knows this stuff. He touches on every single one of these explanations, and then some. In fact, the structure of the article is that it is a whole series of explanations of those sorts. Here’s phenomenon 1, and here’s explanation 1 for that result. But here’s phenomenon 2, and explanation 1 doesn’t work…but here’s explanation 2. But now look at phenomenon 3! Explanation 2 doesn’t fit! Oh, but here’s explanation 3. And on and on. It’s all right there, and Lehrer has explained it.
But that’s where the psychological dimension comes into play. Look at the loaded language in the article: scientists are “disturbed,” “depressed,” and “troubled.” The issues are presented as a crisis for all of science; the titles (which I hope were picked by an editor, not Lehrer) emphasize that science isn’t working, when nothing in the article backs that up. The conclusion goes from a reasonable suggestion to complete bullshit.
Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they continue to get cited in the textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can’t bear to let them go. And this is why the decline effect is so troubling. Not because it reveals the human fallibility of science, in which data are tweaked and beliefs shape perceptions. (Such shortcomings aren’t surprising, at least for scientists.) And not because it reveals that many of our most exciting theories are fleeting fads and will soon be rejected. (That idea has been around since Thomas Kuhn.) The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.
I’ve highlighted the part that is true. Yes, science is hard. Especially when you are dealing with extremely complex phenomena with multiple variables, it can be extremely difficult to demonstrate the validity of a hypothesis (I detest the word “prove” in science; proving is not something we do, and we know it. Lehrer should, too). What the decline effect demonstrates, when it occurs, is that just maybe the original hypothesis was wrong. This shouldn’t be disturbing, depressing, or troubling at all, except, as we see in his article, when we have scientists who have an emotional or profit-making attachment to an idea.
That’s all this fuss is really saying. Sometimes hypotheses are shown to be wrong, and sometimes if the support for the hypothesis is built on weak evidence or a highly derived interpretation of a complex data set, it may take a long time for the correct answer to emerge. So? This is not a failure of science, unless you’re somehow expecting instant gratification on everything, or confirmation of every cherished idea.
But those last few sentences, where Lehrer dribbles off into a delusion of subjectivity and essentially throws up his hands and surrenders himself to ignorance, are unjustifiable. Early in any scientific career, one should learn a couple of general rules: science is never about absolute certainty, and the absence of black & white binary results is not evidence against it; you don’t get to choose what you want to believe, but instead only accept provisionally a result; and when you’ve got a positive result, the proper response is not to claim that you’ve proved something, but instead to focus more tightly, scrutinize more strictly, and test, test, test ever more deeply. It’s unfortunate that Lehrer has tainted his story with all that unwarranted breast-beating, because as a summary of why science can be hard to do, and of the institutional flaws in doing science, it’s quite good.
But science works. That’s all that counts. One could whine that we still haven’t “proven” cell theory, but who cares? Cell and molecular biologists have found it a sufficiently robust platform to dive ever deeper into how life works, constantly pushing the boundaries of uncertainty.