False-positive psychology five years later

Joe Simmons, Leif Nelson, and Uri Simonsohn have written a 5-years-later[1] retrospective on their “false-positive psychology” paper. It is for an upcoming issue of Perspectives on Psychological Science dedicated to the most-cited articles from APS publications. A preprint is now available.

It’s a short and snappy read with some surprises and gems. For example, footnote 2 notes that the Journal of Consumer Research declined to adopt their disclosure recommendations because they might “dull … some of the joy scholars may find in their craft.” No, really.

For the youngsters out there, they do a good job of capturing in a sentence a common view of what we now call p-hacking: “Everyone knew it was wrong, but they thought it was wrong the way it’s wrong to jaywalk. We decided to write ‘False-Positive Psychology’ when simulations revealed it was wrong the way it’s wrong to rob a bank.”[2]

The retrospective also contains a review of how the paper has been cited in 3 top psychology journals. About half of the citations are from researchers following the original paper’s recommendations, but typically only a subset of them. The most common citation practice is to justify having barely more than 20 subjects per cell, which they now describe as a “comically low threshold” and take a more nuanced view on.

But to me, the most noteworthy passage was this one because it speaks to institutional pushback on the most straightforward of their recommendations:

Our paper has had some impact. Many psychologists have read it, and it is required reading in at least a few methods courses. And a few journals – most notably, Psychological Science and Social Psychological and Personality Science – have implemented disclosure requirements of the sort that we proposed (Eich, 2014; Vazire, 2015). At the same time, it is worth pointing out that none of the top American Psychological Association journals have implemented disclosure requirements, and that some powerful psychologists (and journal interests) remain hostile to costless, common sense proposals to improve the integrity of our field.

Certainly there are some small refinements you could make to some of the original paper’s disclosure recommendations. For example, Psychological Science requires you to disclose all variables “that were analyzed for this article’s target research question,” not all variables period. Which is probably an okay accommodation for big multivariate studies with lots of measures.[3]

But it is odd to be broadly opposed to disclosing information in scientific publications that other scientists would consider relevant to evaluating the conclusions. And yet I have heard these kinds of objections raised many times. What is lost by saying that researchers have to report all the experimental conditions they ran, or whether data points were excluded and why? Yet here we are in 2017 and you can still get around doing that.


1. Well, five-ish. The paper came out in late 2011.

2. Though I did not have the sense at the time that everyone knew about everything. Rather, knowledge varied: a given person might think that fiddling with covariates was like jaywalking (technically wrong but mostly harmless), that undisclosed dropping of experimental conditions was a serious violation, but be completely oblivious to the perils of optional stopping. And a different person might have had a different constellation of views on the same 3 issues.

3. A counterpoint is that if you make your materials open, then without clogging up the article proper, you allow interested readers to go and see for themselves.

Is there p-hacking in a new breastfeeding study? And is disclosure enough?

There is a new study out about the benefits of breastfeeding on eventual adult IQ, published in The Lancet Global Health. It’s getting lots of news coverage, for example in NPR, BBC, New York Times, and more.

A friend shared a link and asked what I thought of it. So I took a look at the article and came across this (emphasis added):

We based statistical comparisons between categories on tests of heterogeneity and linear trend, and we present the one with the lower p value. We used Stata 13·0 for the analyses. We did four sets of analyses to compare breastfeeding categories in terms of arithmetic means, geometric means, median income, and to exclude participants who were unemployed and therefore had no income.

Yikes. The description of the analyses is frankly a little telegraphic. But unless I’m misreading it, or they did some kind of statistical correction that they forgot to mention, it sounds like they had flexibility in the data analyses (I saw no mention of pre-registration in the analysis plan), they used that flexibility to test multiple comparisons, and they’re openly disclosing that they used p-values for model selection – which is a more technical way of saying they engaged in p-hacking. (They don’t say how they selected among the 4 sets of analyses with different kinds of means etc.; was that based on p-values too?)*

From time to time students ask, Am I allowed to do x statistical thing? And my standard answer is, in the privacy of your office/lab/coffeeshop/etc. you are allowed to do whatever you want! Exploratory data analysis is a good thing. Play with your data and learn from it.** But if you are going to publish the results of your exploration, then disclose. If you did something that could bias your p-values, let readers know and they can make an informed evaluation.***

But that advice assumes that you are talking to a sophisticated reader. When it comes time to talk to the public, via the press, you have a responsibility to explain yourself. “We used a statistical approach that has an increased risk of producing false positives when there is no effect, or overestimating the size of effects when they are real.”

And if that weakens your story too much, well, that’s valid. Your story is weaker. Scientific journals are where experts communicate with other experts, and it could still be interesting enough to publish for that audience, perhaps to motivate a more definitive followup study. But if it’s too weak to go to the public and tell mothers what to do with their bodies… Maybe save the press release for the pre-registered Study 2.


* The study has other potential problems which are pretty much par for the course in these kinds of observational studies. They try to statistically adjust for differences between kids who were breastfed and those who weren’t, but that assumes that you have a complete and precisely measured set of all relevant covariates. Did they? It’s not a testable assumption, though it’s one that experts can make educated guesses at. On the plus side, when they added potentially confounding variables to the models the effects got stronger, not weaker. On the minus side, as Michelle Meyer pointed out on Twitter, they did not measure or adjust for parental IQ, which will definitely be associated with child IQ and for which the covariates they did use (like parental education and income) are only rough proxies.

** Though using p-values to guide your exploratory data analysis isn’t the greatest idea.

*** Some statisticians will no doubt disagree and say you shouldn’t be reporting p-values with known bias. My response is (a) if you want unbiased statistics then you shouldn’t be reading anything that’s gone through pre-publication review, and (b) that’s what got us into this mess in the first place. I’d rather make it acceptable for people to disclose everything, as opposed to creating an expectation and incentive for people to report impossibly clean results.