False-positive psychology five years later

Joe Simmons, Leif Nelson, and Uri Simonsohn have written a 5-years-later[1] retrospective on their “false-positive psychology” paper. It is for an upcoming issue of Perspectives on Psychological Science dedicated to the most-cited articles from APS publications. A preprint is now available.

It’s a short and snappy read with some surprises and gems. For example, footnote 2 notes that the Journal of Consumer Research declined to adopt their disclosure recommendations because they might “dull … some of the joy scholars may find in their craft.” No, really.

For the youngsters out there, they do a good job of capturing in a sentence a common view of what we now call p-hacking: “Everyone knew it was wrong, but they thought it was wrong the way it’s wrong to jaywalk. We decided to write ‘False-Positive Psychology’ when simulations revealed it was wrong the way it’s wrong to rob a bank.”[2]

The retrospective also contains a review of how the paper has been cited in 3 top psychology journals. About half of the citations are from researchers following the original paper’s recommendations, though typically only a subset of them. The most common citation practice is to justify having barely more than 20 subjects per cell, which they now describe as a “comically low threshold” and on which they take a more nuanced view.
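To see why they now call that threshold comically low, here is a minimal back-of-the-envelope sketch (my own illustration, not a calculation from the retrospective) of the power that 20 subjects per cell buys you in a simple two-cell design:

```python
# Rough illustration (not from the paper): statistical power of a two-group
# comparison with 20 subjects per cell, alpha = .05, two-tailed.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):  # Cohen's small / medium / large benchmarks
    power = analysis.power(effect_size=d, nobs1=20, alpha=0.05, ratio=1.0)
    print(f"d = {d}: power = {power:.2f}")

# Per-cell n needed for 80% power to detect a conventionally medium effect
n_needed = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05, ratio=1.0)
print(f"n per cell for 80% power at d = 0.5: {n_needed:.0f}")
```

With 20 per cell you have roughly one-in-three odds of detecting a medium effect (d = 0.5), and you would need around 64 per cell to get to 80% power.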

But to me, the most noteworthy passage was this one because it speaks to institutional pushback on the most straightforward of their recommendations:

Our paper has had some impact. Many psychologists have read it, and it is required reading in at least a few methods courses. And a few journals – most notably, Psychological Science and Social Psychological and Personality Science – have implemented disclosure requirements of the sort that we proposed (Eich, 2014; Vazire, 2015). At the same time, it is worth pointing out that none of the top American Psychological Association journals have implemented disclosure requirements, and that some powerful psychologists (and journal interests) remain hostile to costless, common sense proposals to improve the integrity of our field.

Certainly there are some small refinements you could make to some of the original paper’s disclosure recommendations. For example, Psychological Science requires you to disclose all variables “that were analyzed for this article’s target research question,” not all variables period. Which is probably an okay accommodation for big multivariate studies with lots of measures.[3]

But it is odd to be broadly opposed to disclosing information in scientific publications that other scientists would consider relevant to evaluating the conclusions. And yet I have heard these kinds of objections raised many times. What is lost by saying that researchers have to report all the experimental conditions they ran, or whether data points were excluded and why? Yet here we are in 2017 and you can still get away with not doing that.

 


1. Well, five-ish. The paper came out in late 2011.

2. Though I did not have the sense at the time that everyone knew about everything. Rather, knowledge varied: a given person might think that fiddling with covariates was like jaywalking (technically wrong but mostly harmless), that undisclosed dropping of experimental conditions was a serious violation, but be completely oblivious to the perils of optional stopping. And a different person might have had a different constellation of views on the same 3 issues.

3. A counterpoint is that if you make your materials open, then without clogging up the article proper, you allow interested readers to go and see for themselves.

A null replication in press at Psych Science – anxious attachment and sensitivity to temperature cues

Etienne LeBel writes:

My colleague [Lorne Campbell] and I just got a paper accepted at Psych Science that reports on the outcome of two strict direct replications where we worked very closely with the original author to have all methodological design specifications as similar as those in the original study (and unfortunately did not reproduce the original finding).

We believe this is an important achievement for the “replication movement” because it shows that (a) attitudes are changing at the journal level with regard to rewarding direct replication efforts (to our knowledge this is the first strictly direct replications to be published at a top journal like Psych Science [JPSP eventually published large-scale failed direct replications of Bem’s ESP findings, but this was of course a special case]) and (b) that direct replication endeavors can contribute new knowledge concerning a theoretical idea while maintaining a cordial, non-adversarial atmosphere with the original author. We really want to emphasize this point the most to encourage other researchers to engage in similar direct replication efforts. Science should first and foremost be about the ideas rather than the people behind the ideas; we’re hoping that examples like ours will sensibilize people to a more functional research culture where it is OK and completely normal for ideas to be revised given new evidence.

An important achievement indeed. The original paper was published in Psychological Science too, so it is especially good to see the journal owning the replication attempt. And hats off to LeBel and Campbell for taking this on. Someday direct replications will hopefully be more normal, but in the world we currently live in it takes some gumption to go out and try one.

I also appreciated the very fact-focused and evenhanded tone of the writeup. If I can quibble, I would have ideally liked to see a statistical test contrasting their effect against the original one – testing the hypothesis that the replication result is different from the original result. I am sure it would have been significant, and it would have been preferable to comparing the original paper’s significant rejection of the null against the replication’s non-significant test of the null. But that’s a small thing compared to what a large step forward this is.
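For readers who want to see what that kind of contrast looks like, here is a minimal sketch with made-up numbers (not the actual effects from either paper), assuming both results are expressed as correlations; the standard approach is a z-test on the difference of the Fisher-transformed rs:

```python
# Hypothetical illustration: is the replication effect different from the
# original? The r and n values below are invented, not taken from either paper.
import math
from scipy.stats import norm

def compare_correlations(r1, n1, r2, n2):
    """Two-tailed z-test for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher r-to-z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # SE of the difference
    z = (z1 - z2) / se
    p = 2 * norm.sf(abs(z))
    return z, p

# e.g., original r = .40 with n = 60 vs. replication r = .05 with n = 300
z, p = compare_correlations(0.40, 60, 0.05, 300)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The same logic works for other effect size metrics, as long as each estimate comes with a standard error.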

Now let’s see what happens with all those other null replications of studies about relationships and physical warmth.

Pre-publication peer review can fall short anywhere

The other day I wrote about a recent experience participating in post-publication peer review. Short version: I picked up on some errors in a paper published in PLOS ONE, which led to a correction. In my post I made the following observation:

Is this a mark against pre-publication peer review? Obviously it’s hard to say from one case, but I don’t think it speaks well of PLOS ONE that these errors got through. Especially because PLOS ONE is supposed to emphasize “a high technical standard” and reporting of “sufficient detail” (the reason I noticed the issue with the SDs was because the article did not report effect sizes).

But this doesn’t necessarily make PLOS ONE worse than traditional journals like Psychological Science or JPSP, where similar errors get through all the time and then become almost impossible to correct.

My intention was to discuss pre- and post-publication peer review generally, and I went out of my way to cite evidence that mistakes can happen anywhere. But some comments I’ve seen online have characterized this as a mark against PLOS ONE (and my “I don’t think it speaks well of PLOS ONE” phrasing probably didn’t help). So I would like to note the following:

1. After my blog post went up yesterday, somebody alerted me that the first author of the PLOS ONE paper has posted corrections to 3 other papers on her personal website. The errors are similar to what happened at PLOS ONE. She names authors and years, not full citations, but through a little deduction with her CV it appears that one of the journals is Psychological Science, one of them is the Journal of Personality and Social Psychology, and the third could be either JPSP, Personality and Social Psychology Bulletin, or the Journal of Experimental Social Psychology. So all 3 of the corrected papers were in high-impact journals with a traditional publishing model.

2. Some of the errors might look obvious now. But that is probably boosted by hindsight. It’s important to keep in mind that reviewers are busy people who are almost always working pro bono. And even at its best, the review process is always going to be a probabilistic filter. I certainly don’t check the math on every paper I read or review. I was looking at the PLOS ONE paper with a particular mindset that made me especially attentive to power and effect sizes. Other reviewers with different concerns might well have focused on different things. That doesn’t mean that we should throw up our hands, but in the big picture we need to be realistic about what we can expect of any review process (and design any improvements with that realism in mind).

3. In the end, what makes PLOS ONE different is that their online commenting system makes it possible for many eyes to be involved in a continuous review process — not just 2-3 reviewers and an editor before publication and then we’re done. That seems much smarter about the probabilistic nature of peer review. And PLOS ONE makes it possible to address potential errors quickly and transparently and in a way that is directly linked from the published article. Whereas with the other 3 papers, assuming that those corrections have been formally submitted to the respective journals, it could still be quite a while before they appear in print, and the original versions could be in wide circulation by then.

 

The PoPS replication reports format is a good start

Big news today is that Perspectives on Psychological Science is going to start publishing pre-registered replication reports. The inaugural editors will be Daniel Simons and Alex Holcombe, who have done the serious legwork to make this happen. See the official announcement and blog posts by Ed Yong and Melanie Tannenbaum. (Note: this isn’t the same as the earlier plan I wrote about for Psychological Science to publish replications, but it appears to be related.)

The gist of the plan is that after getting pre-approval from the editors (mainly to filter for important but as-yet unreplicated studies), proposers will create a detailed protocol. The original authors (and maybe other reviewers?) will have a chance to review the protocol. Once it has been approved, the proposer and other interested labs will run the study. Publication will be contingent on carrying out the protocol but not on the results. Collections of replications from multiple labs will be published together as final reports.

I think this is great news. In my ideal world published replications would be more routine, and wouldn’t require all the hoopla of prior review by original authors, multiple independent replications packaged together, etc. etc. In other words, they shouldn’t be extraordinary, and they should be as easy to publish as original research, or easier. I also think every journal should take responsibility for replications of its own original reports (the Pottery Barn rule). BUT… this new format doesn’t preclude any of that from also happening elsewhere. By including all of those extras, PoPS replication reports might function as a first-tier, gold standard of replication. And by doing a lot of things right (such as focusing on effect sizes rather than tallying “successful” and “failed” replications, which is problematic) they might set an example for more mundane replication reports in other outlets.

This won’t solve everything — not by a long shot. We need to change scientific culture (by which I mean institutional incentives) so that replication is a more common and more valued activity. We need funding agencies to see it that way too. In a painful coincidence, news came out today that a cognitive neuroscientist admitted to misconduct in published research. One of the many things that commonplace replications would do would be to catch or prevent fraud. But whenever I’ve asked colleagues who use fMRI whether people in their fields run direct replications, they’ve just laughed at me. There’s little incentive to run them and no money to do it even if you wanted to. All of that needs to change across many areas of science.

But you can’t solve everything at once, and the PoPS initiative is an important step forward.

Psychological Science to publish direct replications (maybe)

Pretty big news. Psychological Science is seriously discussing 3 new reform initiatives. They are outlined in a letter being circulated by Eric Eich, editor of the journal, and they come from a working group that includes top people from APS and several other scientists who have been active in working for reforms.

After reading it through (which I encourage everybody to do), here are my initial takes on the 3 initiatives:

Initiative 1: Create tutorials on power, effect size, and confidence intervals. There’s plenty of stuff out there already, but if PSci creates a good new source and funnels authors to it, it could be a good thing.

Initiative 2: Disclosure statements about the research process (such as how sample size was determined, unreported measures, etc.). This could end up being a good thing, but it will be complicated. Simine Vazire, one of the working group members who is quoted in the proposal, puts it well:

We are essentially asking people to “incriminate” themselves — i.e., reveal information that, in the past, editors have treated as reasons not to publish a paper. If we want authors to be honest, I think they will want some explicit acknowledgement that some degree of messiness (e.g., a null result here and there) will be tolerated and perhaps even treated as evidence that the entire set of findings is even more plausible (a la [Gregory] Francis, [Uli] Schimmack, etc.).

I bet there would be low consensus about what kinds and amounts of messiness are okay, because no one is accustomed to seeing that kind of information on a large scale in other people’s studies. It is also the case that things that are problematic in one subfield may be more reasonable in another. And reviewers and editors who lack the time or local expertise to really judge messiness against merit may fall back on simplistic heuristics rather than thinking things through in a principled way. (Any psychologist who has ever tried to say anything about causation, however tentative and appropriately bounded, in data that was not from a randomized experiment probably knows what that feels like.)

Another basic issue is whether people will be uniformly honest in the disclosure statements. I’d like to believe so, but without a plan for real accountability I’m not sure. If some people can get away with fudging the truth, the honest ones will be at a disadvantage.

Initiative 3: A special submission track for direct replications, with 2 dedicated Associate Editors and a system of pre-registration and prior review of protocols to allow publication decisions to be decoupled from outcomes. A replication section at a journal? If you’ve read my blog before you might guess that I like that idea a lot.

The section would be dedicated to studies previously published in Psychological Science, so in that sense it is in the same spirit as the Pottery Barn Rule. The pre-registration component sounds interesting — by putting a substantial amount of review in place before data are collected, it helps avoid the problem of replications getting suppressed because people don’t like the outcomes.

I feel mixed about another aspect of the proposal, limiting replications to “qualified” scientists. There does need to be some vetting, but my hope is that they will set the bar reasonably low. “This paradigm requires special technical knowledge” can too easily be cover for “only people who share our biases are allowed to study this effect.” My preference would be for a pro-data, pro-transparency philosophy. Make it easy for lots of scientists to run and publish replication studies, and make sure the replication reports include information about the replicating researchers’ expertise and experience with the techniques, methods, etc. Then meta-analysts can code for the replicating lab’s expertise as a moderator variable, and actually test how much expertise matters.
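As a sketch of what that moderator test could look like (all data and variable names here are hypothetical), a fixed-effect meta-regression is just an inverse-variance-weighted regression of the replication effect sizes on the expertise code:

```python
# Hypothetical sketch: does the replicating lab's expertise with the paradigm
# predict the effect size it obtains? Every number below is invented.
import numpy as np
import statsmodels.api as sm

effect_sizes = np.array([0.45, 0.10, 0.38, 0.05, 0.22, 0.12])  # e.g., Cohen's d per lab
variances    = np.array([0.04, 0.02, 0.05, 0.01, 0.03, 0.02])  # sampling variance of each d
expertise    = np.array([1, 0, 1, 0, 1, 0])                    # 1 = prior experience with the method

X = sm.add_constant(expertise)                                # intercept + expertise moderator
fit = sm.WLS(effect_sizes, X, weights=1.0 / variances).fit()  # inverse-variance weights

# The slope on expertise estimates how much the effect size shifts with expertise.
# A real analysis would use a dedicated meta-analysis package (e.g., metafor in R)
# that handles the known error variances and heterogeneity properly; this only
# shows the structure of the moderator test.
print(fit.params)
print(fit.pvalues)
```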

My big-picture take. Retraction Watch just reported yesterday on a study showing that retractions, especially retractions due to misconduct, cause promising scientists to move to other fields and funding agencies to direct dollars elsewhere. Between alleged fraud cases like Stapel, Smeesters, and Sanna, and all the attention going to false-positive psychology and questionable research practices, psychology (and especially social psychology) is almost certainly at risk of a loss of talent and money.

Getting one of psychology’s top journals to make real reforms, with the institutional backing of APS, would go a long way to counteract those negative effects. A replication desk in particular would leapfrog psychology past what a lot of other scientific fields do. Huge credit goes to Eric Eich and everyone else at APS and the working group for trying to make real reforms happen. It stands a real chance of making our science better and improving our credibility.

A Pottery Barn rule for scientific journals

Proposed: Once a journal has published a study, it becomes responsible for publishing direct replications of that study. Publication is subject to editorial review of technical merit but is not dependent on outcome. Replications shall be published as brief reports in an online supplement, linked from the electronic version of the original.

*****

I wrote about this idea a year ago when JPSP refused to publish a paper that failed to replicate one of Daryl Bem’s notorious ESP studies. I discovered, immediately after writing up the blog post, that other people were thinking along similar lines. Since then I have heard versions of the idea come up here and there. And strands of it came up again in David Funder’s post on replication (“[replication] studies should, ideally, be published in the same journal that promulgated the original, misleading conclusion”) and the comments to it. When a lot of people are coming up with similar solutions to a problem, that’s probably a sign of something.

Like a lot of people, I believe that the key to improving our science is through incentives. You can finger-wag about the importance of replication all you want, but if there is nowhere to publish and no benefit for trying, you are not going to change behavior. To a large extent, the incentives for individual researchers are controlled through institutions — established journal publishers, professional societies, granting agencies, etc. So if you want to change researchers’ behavior, target those institutions.

Hence a Pottery Barn rule for journals: once you publish a study, you own its replicability (or at least a significant piece of it).

This would change the incentive structure for researchers and for journals in a few different ways. For researchers, there are currently insufficient incentives to run replications. This would give them a virtually guaranteed outlet for publishing a replication attempt. Such publications should be clearly marked on people’s CVs as brief replication reports (probably by giving the online supplement its own journal name, e.g., Journal of Personality and Social Psychology: Replication Reports). That would make it easier for the academic marketplace (like hiring and promotion committees, etc.) to reach its own valuation of such work.

I would expect that grad students would be big users of this opportunity. Others have proposed that running replications should be a standard part of graduate training (e.g., see Matt Lieberman’s idea). This would make it worth students’ while, but without the organizational overhead of Matt’s proposal. The best 1-2 combo, for grad students and PIs alike, would be to embed a direct replication in a replicate-and-extend study. Then if the “extend” part does not work out, the replication report is a fallback (hopefully with a footnote about the failed extend). And if it does, the new paper is a more cumulative contribution than the shot-in-the-dark papers we often see now.

A system like this would change the incentive structure for original studies too. Researchers would know that whatever they publish is eventually going to be linked to a list of replication attempts and their outcomes. As David pointed out, knowing that others will try to replicate your work — and in this proposal, knowing that reports of those attempts would be linked from your own paper! — would undermine the incentives to use questionable research practices far better than any heavy-handed regulatory response. (And if that list of replication attempts is empty 5 years down the road because nobody thinks it’s worth their while to replicate your stuff? That might say something too.)

What about the changed incentives for journals? One benefit would be that the increased accountability for individual researchers should lead to better quality submissions for journals that adopted this policy. That should be a big plus.

A Pottery Barn policy would also increase accountability for journals. It would become much easier to document a journal’s track record of replicability, which could become a counterweight to the relentless pursuit of impact factors. Such accountability would mean a greater emphasis on evaluating replicability during the review process — e.g., to consider statistical power, to let reviewers look at the raw data and the materials and stimuli, etc.

But sequestering replication reports into an online supplement means that the journal’s main mission can stay intact. So if a journal wants to continue to focus on groundbreaking first reports in its main section, it can continue to do so without fearing that its brand will be diluted (though I predict that it would have to accept a lower replication rate in exchange for its focus on novelty).

Replication reports would generate some editorial overhead, but not nearly as much as original reports. They could be published based directly on an editorial decision, or perhaps with a single peer reviewer. A structured reporting format like the one used at Psych File Drawer would make it easier to evaluate the replication study relative to the original. (I would add a field to describe the researchers’ technical expertise and experience with the methods, since that is a potential factor in explaining differences in results.)

Of course, journals would need an incentive to adopt the Pottery Barn rule in the first place. Competition from outlets like PLoS One (which does not consider importance/novelty in its review criteria) or Psych File Drawer (which only publishes replications) might push the traditional journals in this direction. But ultimately it is up to us scientists. If we cite replication studies, if we demand and use outlets that publish them, and if we speak loudly enough — individually or through our professional organizations — I think the publishers will listen.

Journals can be groundbreaking or definitive, not both

I was recently invited to contribute to Personality and Social Psychology Connections, an online journal of commentary (read: fancy blog) run by SPSP. Don Forsyth is the editor, and the contributors include David Dunning, Harry Reis, Jennifer Crocker, Shige Oishi, Mark Leary, and Scott Allison. My inaugural post is titled “Groundbreaking or definitive? Journals need to pick one.” Excerpt:

Do our top journals need to rethink their missions of publishing research that is both groundbreaking and definitive? And as a part of that, do they — and we scientists — need to reconsider how we engage with the press and the public?…

In some key ways groundbreaking is the opposite of definitive. There is a lot of hard work to be done between scooping that first shovelful of dirt and completing a stable foundation. And the same goes for science (with the crucial difference that in science, you’re much more likely to discover along the way that you’ve started digging on a site that’s impossible to build on). “Definitive” means that there is a sufficient body of evidence to accept some conclusion with a high degree of confidence. And by the time that body of evidence builds up, the idea is no longer groundbreaking.

Read it here.