Is there p-hacking in a new breastfeeding study? And is disclosure enough?

There is a new study out about the benefits of breastfeeding on eventual adult IQ, published in The Lancet Global Health. It’s getting lots of news coverage, for example from NPR, the BBC, the New York Times, and more.

A friend shared a link and asked what I thought of it. So I took a look at the article and came across this (emphasis added):

We based statistical comparisons between categories on tests of heterogeneity and linear trend, and we present the one with the lower p value. We used Stata 13·0 for the analyses. We did four sets of analyses to compare breastfeeding categories in terms of arithmetic means, geometric means, median income, and to exclude participants who were unemployed and therefore had no income.

Yikes. The description of the analyses is frankly a little telegraphic. But unless I’m misreading it, or they did some kind of statistical correction that they forgot to mention, it sounds like they had flexibility in the data analyses (I saw no mention of pre-registration in the analysis plan), they used that flexibility to run multiple tests of each comparison, and they’re openly disclosing that they used p-values for model selection – which is a more technical way of saying they engaged in p-hacking. (They don’t say how they selected among the 4 sets of analyses with different kinds of means etc.; was that based on p-values too?)*
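To get a feel for why picking the lower of two p-values matters, here is a rough simulation sketch. It is not the study's analysis. Everything in it is an assumption made for illustration: five ordered groups of 20 observations each, no true differences between groups, and a known variance of 1 so that both tests have closed-form p-values and the code needs nothing beyond the Python standard library. The function names are made up for this example.

```python
import math
import random
from statistics import fmean


def heterogeneity_p(means, n):
    """Chi-square test (df = k - 1) that all group means are equal.

    Assumes a known within-group standard deviation of 1.
    """
    df = len(means) - 1
    assert df % 2 == 0, "the closed form below only holds for even df"
    grand = fmean(means)
    stat = n * sum((m - grand) ** 2 for m in means)
    half = stat / 2
    # Chi-square survival function for even df, written out in closed form.
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def trend_p(means, n):
    """Two-sided z-test of a linear contrast across the ordered groups.

    Assumes a known within-group standard deviation of 1.
    """
    k = len(means)
    centre = (k - 1) / 2
    weights = [j - centre for j in range(k)]  # e.g. -2, -1, 0, 1, 2
    contrast = sum(w * m for w, m in zip(weights, means))
    z = contrast / math.sqrt(sum(w * w for w in weights) / n)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_sims=4000, k=5, n=20, alpha=0.05, seed=1):
    """Return the share of null datasets each reporting rule calls significant."""
    rng = random.Random(seed)
    hits = {"heterogeneity": 0, "trend": 0, "lower of the two": 0}
    for _ in range(n_sims):
        # The null is true: every group comes from the same N(0, 1) distribution.
        means = [fmean(rng.gauss(0, 1) for _ in range(n)) for _ in range(k)]
        p_het = heterogeneity_p(means, n)
        p_trend = trend_p(means, n)
        hits["heterogeneity"] += p_het < alpha
        hits["trend"] += p_trend < alpha
        hits["lower of the two"] += min(p_het, p_trend) < alpha
    return {name: count / n_sims for name, count in hits.items()}


if __name__ == "__main__":
    for name, rate in simulate().items():
        print(f"{name:>18}: {rate:.3f}")
```

Each test on its own should reject a true null about 5% of the time. Reporting whichever p-value is lower can only match or exceed that, and in this setup it should land noticeably above 5%. The two tests are correlated, so the inflation is smaller than it would be for two independent tests, but it is not zero, and it compounds if the choice among the four sets of analyses was also guided by p-values.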

From time to time students ask, Am I allowed to do x statistical thing? And my standard answer is, in the privacy of your office/lab/coffeeshop/etc. you are allowed to do whatever you want! Exploratory data analysis is a good thing. Play with your data and learn from it.** But if you are going to publish the results of your exploration, then disclose. If you did something that could bias your p-values, let readers know and they can make an informed evaluation.***

But that advice assumes that you are talking to a sophisticated reader. When it comes time to talk to the public, via the press, you have a responsibility to explain yourself. “We used a statistical approach that has an increased risk of producing false positives when there is no effect, or overestimating the size of effects when they are real.”

And if that weakens your story too much, well, that’s valid. Your story is weaker. Scientific journals are where experts communicate with other experts, and it could still be interesting enough to publish for that audience, perhaps to motivate a more definitive followup study. But if it’s too weak to go to the public and tell mothers what to do with their bodies… Maybe save the press release for the pre-registered Study 2.


* The study has other potential problems which are pretty much par for the course in these kinds of observational studies. They try to statistically adjust for differences between kids who were breastfed and those who weren’t, but that assumes that you have a complete and precisely measured set of all relevant covariates. Did they? It’s not a testable assumption, though it’s one that experts can make educated guesses at. On the plus side, when they added potentially confounding variables to the models the effects got stronger, not weaker. On the minus side, as Michelle Meyer pointed out on Twitter, they did not measure or adjust for parental IQ, which will definitely be associated with child IQ and for which the covariates they did use (like parental education and income) are only rough proxies.

** Though using p-values to guide your exploratory data analysis isn’t the greatest idea.

*** Some statisticians will no doubt disagree and say you shouldn’t be reporting p-values with known bias. My response is (a) if you want unbiased statistics then you shouldn’t be reading anything that’s gone through pre-publication review, and (b) that’s what got us into this mess in the first place. I’d rather make it acceptable for people to disclose everything, as opposed to creating an expectation and incentive for people to report impossibly clean results.

3 thoughts on “Is there p-hacking in a new breastfeeding study? And is disclosure enough?”

  1. Good post. I would say it’s acceptable to run multiple analyses and present the lowest p if you also present (somewhere) the other p’s so that readers can judge whether the lowest p is representative.

    Just saying “we present the one with the lowest p” is not useful.

  2. Reblogged this on Think Bigger, Public Health and commented:
    This is not ok, but not at all surprising. A good look at the latest breastfeeding study data. Perhaps the day will come when public health people will stop tolerating bad data around breastfeeding and quit using it to support public policy. A girl can dream, right?
