A null replication in press at Psych Science – anxious attachment and sensitivity to temperature cues

Etienne LeBel writes:

My colleague [Lorne Campbell] and I just got a paper accepted at Psych Science that reports on the outcome of two strict direct replications where we worked very closely with the original author to have all methodological design specifications as similar as those in the original study (and unfortunately did not reproduce the original finding).

We believe this is an important achievement for the “replication movement” because it shows that (a) attitudes are changing at the journal level with regard to rewarding direct replication efforts (to our knowledge this is the first strictly direct replications to be published at a top journal like Psych Science [JPSP eventually published large-scale failed direct replications of Bem's ESP findings, but this was of course a special case]) and (b) that direct replication endeavors can contribute new knowledge concerning a theoretical idea while maintaining a cordial, non-adversarial atmosphere with the original author. We really want to emphasize this point the most to encourage other researchers to engage in similar direct replication efforts. Science should first and foremost be about the ideas rather than the people behind the ideas; we’re hoping that examples like ours will sensibilize people to a more functional research culture where it is OK and completely normal for ideas to be revised given new evidence.

An important achievement indeed. The original paper was published in Psychological Science too, so it is especially good to see the journal owning the replication attempt. And hats off to LeBel and Campbell for taking this on. Someday direct replications will hopefully be more normal, but in the world we currently live in it takes some gumption to go out and try one.

I also appreciated the very fact-focused and evenhanded tone of the writeup. If I can quibble, I would have ideally liked to see a statistical test contrasting their effect against the original one (that is, a test of the hypothesis that the replication result is different from the original result). I am sure it would have been significant, and it would have been preferable to comparing the original paper’s significant rejection of the null against the replications’ non-significant tests against the null. But that’s a small thing compared to what a large step forward this is.
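To make the quibble concrete, here is a minimal sketch of one common way to test whether two independent correlations differ, using Fisher's r-to-z transformation. The function name and the numbers in the usage lines are hypothetical placeholders, not values from the original or replication papers, and a correlation may not be the effect-size metric those papers actually used.

```python
import math


def compare_correlations(r_orig, n_orig, r_rep, n_rep):
    """Two-sided z-test for the difference between two independent correlations.

    Uses Fisher's r-to-z transformation; each transformed correlation has
    approximate sampling variance 1 / (n - 3).
    """
    z_orig = math.atanh(r_orig)  # Fisher z of the original correlation
    z_rep = math.atanh(r_rep)    # Fisher z of the replication correlation
    se_diff = math.sqrt(1.0 / (n_orig - 3) + 1.0 / (n_rep - 3))
    z = (z_orig - z_rep) / se_diff
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical inputs, for illustration only.
z, p = compare_correlations(r_orig=0.40, n_orig=50, r_rep=0.02, n_rep=200)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The point of a test like this is that it asks directly whether the two estimates disagree, rather than inferring disagreement from one test being significant and the other not.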

Now let’s see what happens with all those other null replications of studies about relationships and physical warmth.

Pre-publication peer review can fall short anywhere

The other day I wrote about a recent experience participating in post-publication peer review. Short version: I picked up on some errors in a paper published in PLOS ONE, which led to a correction. In my post I made the following observation:

Is this a mark against pre-publication peer review? Obviously it’s hard to say from one case, but I don’t think it speaks well of PLOS ONE that these errors got through. Especially because PLOS ONE is supposed to emphasize “a high technical standard” and reporting of “sufficient detail” (the reason I noticed the issue with the SDs was because the article did not report effect sizes).

But this doesn’t necessarily make PLOS ONE worse than traditional journals like Psychological Science or JPSP, where similar errors get through all the time and then become almost impossible to correct.

My intention was to discuss pre- and post-publication peer review generally, and I went out of my way to cite evidence that mistakes can happen anywhere. But some comments I’ve seen online have characterized this as a mark against PLOS ONE (and my “I don’t think it speaks well of PLOS ONE” phrasing probably didn’t help). So I would like to note the following:

1. After my blog post went up yesterday, somebody alerted me that the first author of the PLOS ONE paper has posted corrections to 3 other papers on her personal website. The errors are similar to what happened at PLOS ONE. She names authors and years, not full citations, but through a little deduction with her CV it appears that one of the journals is Psychological Science, one of them is the Journal of Personality and Social Psychology, and the third could be either JPSP, Personality and Social Psychology Bulletin, or the Journal of Experimental Social Psychology. So all 3 of the corrected papers were in high-impact journals with a traditional publishing model.

2. Some of the errors might look obvious now. But that is probably boosted by hindsight. It’s important to keep in mind that reviewers are busy people who are almost always working pro bono. And even at its best, the review process is always going to be a probabilistic filter. I certainly don’t check the math on every paper I read or review. I was looking at the PLOS ONE paper with a particular mindset that made me especially attentive to power and effect sizes. Other reviewers with different concerns might well have focused on different things. That doesn’t mean that we should throw up our hands, but in the big picture we need to be realistic about what we can expect of any review process (and design any improvements with that realism in mind).

3. In the end, what makes PLOS ONE different is that their online commenting system makes it possible for many eyes to be involved in a continuous review process — not just 2-3 reviewers and an editor before publication and then we’re done. That seems much smarter about the probabilistic nature of peer review. And PLOS ONE makes it possible to address potential errors quickly and transparently and in a way that is directly linked from the published article. Whereas with the other 3 papers, assuming that those corrections have been formally submitted to the respective journals, it could still be quite a while before they appear in print, and the original versions could be in wide circulation by then.

 

The PoPS replication reports format is a good start

Big news today is that Perspectives on Psychological Science is going to start publishing pre-registered replication reports. The inaugural editors will be Daniel Simons and Alex Holcombe, who have done the serious legwork to make this happen. See the official announcement and blog posts by Ed Yong and Melanie Tannenbaum. (Note: this isn’t the same as the earlier plan I wrote about for Psychological Science to publish replications, but it appears to be related.)

The gist of the plan is that after getting pre-approval from the editors (mainly to filter for important but as-yet unreplicated studies), proposers will create a detailed protocol. The original authors (and maybe other reviewers?) will have a chance to review the protocol. Once it has been approved, the proposer and other interested labs will run the study. Publication will be contingent on carrying out the protocol but not on the results. Collections of replications from multiple labs will be published together as final reports.

I think this is great news. In my ideal world published replications would be more routine, and wouldn’t require all the hoopla of prior review by original authors, multiple independent replications packaged together, etc. etc. In other words, they shouldn’t be extraordinary, and they should be as easy or easier to publish than original research. I also think every journal should take responsibility for replications of its own original reports (the Pottery Barn rule). BUT… this new format doesn’t preclude any of that from also happening elsewhere. By including all of those extras, PoPS replication reports might function as a first-tier, gold standard of replication. And by doing a lot of things right (such as focusing on effect sizes rather than tallying “successful” and “failed” replications, which is problematic) they might set an example for more mundane replication reports in other outlets.

This won’t solve everything — not by a long shot. We need to change scientific culture (by which I mean institutional incentives) so that replication is a more common and more valued activity. We need funding agencies to see it that way too. In a painful coincidence, news came out today that a cognitive neuroscientist admitted to misconduct in published research. One of the many things that commonplace replications would do would be to catch or prevent fraud. But whenever I’ve asked colleagues who use fMRI whether people in their fields run direct replications, they’ve just laughed at me. There’s little incentive to run them and no money to do it even if you wanted to. All of that needs to change across many areas of science.

But you can’t solve everything at once, and the PoPS initiative is an important step forward.

Psychological Science to publish direct replications (maybe)

Pretty big news. Psychological Science is seriously discussing 3 new reform initiatives. They are outlined in a letter being circulated by Eric Eich, editor of the journal, and they come from a working group that includes top people from APS and several other scientists who have been active in working for reforms.

After reading it through (which I encourage everybody to do), here are my initial takes on the 3 initiatives:

Initiative 1: Create tutorials on power, effect size, and confidence intervals. There’s plenty of stuff out there already, but if PSci creates a good new source and funnels authors to it, it could be a good thing.

Initiative 2: Disclosure statements about research process (such as how sample size was determined, unreported measures, etc.) This could end up being a good thing, but it will be complicated. Simine Vazire, one of the working group members who is quoted in the proposal, puts it well:

We are essentially asking people to “incriminate” themselves — i.e., reveal information that, in the past, editors have treated as reasons not to publish a paper. If we want authors to be honest, I think they will want some explicit acknowledgement that some degree of messiness (e.g., a null result here and there) will be tolerated and perhaps even treated as evidence that the entire set of findings is even more plausible (a la [Gregory] Francis, [Uli] Schimmack, etc.).

I bet there would be low consensus about what kinds and amounts of messiness are okay, because no one is accustomed to seeing that kind of information on a large scale in other people’s studies. It is also the case that things that are problematic in one subfield may be more reasonable in another. And reviewers and editors who lack the time or local expertise to really judge messiness against merit may fall back on simplistic heuristics rather than thinking things through in a principled way. (Any psychologist who has ever tried to say anything about causation, however tentative and appropriately bounded, in data that was not from a randomized experiment probably knows what that feels like.)

Another basic issue is whether people will be uniformly honest in the disclosure statements. I’d like to believe so, but without a plan for real accountability I’m not sure. If some people can get away with fudging the truth, the honest ones will be at a disadvantage.

Initiative 3: A special submission track for direct replications, with 2 dedicated Associate Editors and a system of pre-registration and prior review of protocols to allow publication decisions to be decoupled from outcomes. A replication section at a journal? If you’ve read my blog before you might guess that I like that idea a lot.

The section would be dedicated to studies previously published in Psychological Science, so in that sense it is in the same spirit as the Pottery Barn Rule. The pre-registration component sounds interesting — by putting a substantial amount of review in place before data are collected, it helps avoid the problem of replications getting suppressed because people don’t like the outcomes.

I feel mixed about another aspect of the proposal, limiting replications to “qualified” scientists. There does need to be some vetting, but my hope is that they will set the bar reasonably low. “This paradigm requires special technical knowledge” can too easily be cover for “only people who share our biases are allowed to study this effect.” My preference would be for a pro-data, pro-transparency philosophy. Make it easy for lots of scientists to run and publish replication studies, and make sure the replication reports include information about the replicating researchers’ expertise and experience with the techniques, methods, etc. Then meta-analysts can code for the replicating lab’s expertise as a moderator variable, and actually test how much expertise matters.
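As a rough illustration of what testing expertise as a moderator could look like, here is a minimal sketch that pools effect sizes within two expertise groups by inverse-variance weighting and then compares the pooled estimates. It assumes a fixed-effect model and a simple two-level coding of expertise, both of which are simplifying assumptions on my part; a real meta-analysis would more likely use a random-effects meta-regression. All function names and data values are hypothetical.

```python
import math


def pooled(effects):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error.

    `effects` is a list of (estimate, standard_error) pairs, one per lab.
    """
    weights = [1.0 / se ** 2 for _, se in effects]
    total = sum(weights)
    estimate = sum(w * d for w, (d, _) in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)


def moderator_test(group_a, group_b):
    """z-test for the difference between two pooled subgroup estimates."""
    est_a, se_a = pooled(group_a)
    est_b, se_b = pooled(group_b)
    diff = est_a - est_b
    z = diff / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return diff, z, p


# Hypothetical replication results, coded by the lab's prior experience.
high_expertise = [(0.30, 0.10), (0.25, 0.12)]
low_expertise = [(0.28, 0.11), (0.20, 0.15)]

diff, z, p = moderator_test(high_expertise, low_expertise)
print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p:.3f}")
```

With expertise recorded in every replication report, the question of whether it matters becomes an empirical one instead of a gatekeeping criterion.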

My big-picture take. Retraction Watch just reported yesterday on a study showing that retractions, especially retractions due to misconduct, cause promising scientists to move to other fields and funding agencies to direct dollars elsewhere. Between alleged fraud cases like Stapel, Smeesters, and Sanna, and all the attention going to false-positive psychology and questionable research practices, psychology (and especially social psychology) is almost certainly at risk of a loss of talent and money.

Getting one of psychology’s top journals to make real reforms, with the institutional backing of APS, would go a long way to counteract those negative effects. A replication desk in particular would leapfrog psychology past what a lot of other scientific fields do. Huge credit goes to Eric Eich and everyone else at APS and the working group for trying to make real reforms happen. It stands a real chance of making our science better and improving our credibility.

A Pottery Barn rule for scientific journals

Proposed: Once a journal has published a study, it becomes responsible for publishing direct replications of that study. Publication is subject to editorial review of technical merit but is not dependent on outcome. Replications shall be published as brief reports in an online supplement, linked from the electronic version of the original.

*****

I wrote about this idea a year ago when JPSP refused to publish a paper that failed to replicate one of Daryl Bem’s notorious ESP studies. I discovered, immediately after writing up the blog post, that other people were thinking along similar lines. Since then I have heard versions of the idea come up here and there. And strands of it came up again in David Funder’s post on replication (“[replication] studies should, ideally, be published in the same journal that promulgated the original, misleading conclusion”) and the comments to it. When a lot of people are coming up with similar solutions to a problem, that’s probably a sign of something.

Like a lot of people, I believe that the key to improving our science is through incentives. You can finger-wag about the importance of replication all you want, but if there is nowhere to publish and no benefit for trying, you are not going to change behavior. To a large extent, the incentives for individual researchers are controlled through institutions — established journal publishers, professional societies, granting agencies, etc. So if you want to change researchers’ behavior, target those institutions.

Hence a Pottery Barn rule for journals: once you publish a study, you own its replicability (or at least a significant piece of it).

This would change the incentive structure for researchers and for journals in a few different ways. For researchers, there are currently insufficient incentives to run replications. This would give them a virtually guaranteed outlet for publishing a replication attempt. Such publications should be clearly marked on people’s CVs as brief replication reports (probably by giving the online supplement its own journal name, e.g., Journal of Personality and Social Psychology: Replication Reports). That would make it easier for the academic marketplace (like hiring and promotion committees, etc.) to reach its own valuation of such work.

I would expect that grad students would be big users of this opportunity. Others have proposed that running replications should be a standard part of graduate training (e.g., see Matt Lieberman’s idea). This would make it worth students’ while, but without the organizational overhead of Matt’s proposal. The best 1-2 combo, for grad students and PIs alike, would be to embed a direct replication in a replicate-and-extend study. Then if the “extend” part does not work out, the replication report is a fallback (hopefully with a footnote about the failed extend). And if it does, the new paper is a more cumulative contribution than the shot-in-the-dark papers we often see now.

A system like this would change the incentive structure for original studies too. Researchers would know that whatever they publish is eventually going to be linked to a list of replication attempts and their outcomes. As David pointed out, knowing that others will try to replicate your work — and in this proposal, knowing that reports of those attempts would be linked from your own paper! — would undermine the incentives to use questionable research practices far better than any heavy-handed regulatory response. (And if that list of replication attempts is empty 5 years down the road because nobody thinks it’s worth their while to replicate your stuff? That might say something too.)

What about the changed incentives for journals? One benefit would be that the increased accountability for individual researchers should lead to better quality submissions for journals that adopted this policy. That should be a big plus.

A Pottery Barn policy would also increase accountability for journals. It would become much easier to document a journal’s track record of replicability, which could become a counterweight to the relentless pursuit of impact factors. Such accountability would mean a greater emphasis on evaluating replicability during the review process — e.g., to consider statistical power, to let reviewers look at the raw data and the materials and stimuli, etc.

But sequestering replication reports into an online supplement means that the journal’s main mission can stay intact. So if a journal wants to continue to focus on groundbreaking first reports in its main section, it can continue to do so without fearing that its brand will be diluted (though I predict that it would have to accept a lower replication rate in exchange for its focus on novelty).

Replication reports would generate some editorial overhead, but not nearly as much as original reports. They could be published based directly on an editorial decision, or perhaps with a single peer reviewer. A structured reporting format like the one used at Psych File Drawer would make it easier to evaluate the replication study relative to the original. (I would add a field to describe the researchers’ technical expertise and experience with the methods, since that is a potential factor in explaining differences in results.)

Of course, journals would need an incentive to adopt the Pottery Barn rule in the first place. Competition from outlets like PLoS One (which does not consider importance/novelty in its review criteria) or Psych File Drawer (which only publishes replications) might push the traditional journals in this direction. But ultimately it is up to us scientists. If we cite replication studies, if we demand and use outlets that publish them, and if we speak loudly enough, individually or through our professional organizations, I think the publishers will listen.

Journals can be groundbreaking or definitive, not both

I was recently invited to contribute to Personality and Social Psychology Connections, an online journal of commentary (read: fancy blog) run by SPSP. Don Forsyth is the editor, and the contributors include David Dunning, Harry Reis, Jennifer Crocker, Shige Oishi, Mark Leary, and Scott Allison. My inaugural post is titled “Groundbreaking or definitive? Journals need to pick one.” Excerpt:

Do our top journals need to rethink their missions of publishing research that is both groundbreaking and definitive? And as a part of that, do they — and we scientists — need to reconsider how we engage with the press and the public?…

In some key ways groundbreaking is the opposite of definitive. There is a lot of hard work to be done between scooping that first shovelful of dirt and completing a stable foundation. And the same goes for science (with the crucial difference that in science, you’re much more likely to discover along the way that you’ve started digging on a site that’s impossible to build on). “Definitive” means that there is a sufficient body of evidence to accept some conclusion with a high degree of confidence. And by the time that body of evidence builds up, the idea is no longer groundbreaking.

Read it here.

 

How should journals handle replication studies?

Recently Ben Goldacre wrote about a group of researchers (Stuart Ritchie, Chris French, and Richard Wiseman) whose null replication of 3 experiments from the infamous Bem ESP paper was rejected by JPSP – the same journal that published Bem’s paper.

JPSP is the flagship journal in my field, and I’ve published in it and I’ve reviewed for it, so I’m reasonably familiar with how it ordinarily works. It strives to publish work that is theory-advancing. I haven’t seen the manuscript, but my understanding is that the Ritchie et al. experiments were exact replications (not “replicate and extend” studies). In the usual course of things, I wouldn’t expect JPSP to accept a paper that only reported exact replication studies, even if their results conflicted with the original study.

However, the Bem paper was extraordinary in several ways. I had two slightly different lines of thinking about JPSP’s rejection.

My first thought was that given the extraordinary nature of the Bem paper, maybe JPSP has a special obligation to go outside of its usual policy. Many scientists think that Bem’s effects are impossible, which created the big controversy around the paper. So in this instance, a null replication has a special significance that usually it would not. That would be especially true if the results reported by Ritchie et al. fell outside of the Bem studies’ replication interval (i.e., if they statistically conflicted; I don’t know whether or not that is the case).
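For readers unfamiliar with the idea, here is a minimal sketch of one way to operationalize a replication interval: a 95% prediction interval for the replication estimate, centered on the original estimate and widened by the sampling error of both studies. This is my assumption about a reasonable definition, not necessarily the one used in any particular paper, and the numbers are hypothetical rather than taken from Bem or Ritchie et al.

```python
import math


def replication_interval(est_orig, se_orig, se_rep, z_crit=1.96):
    """Approximate 95% prediction interval for a replication's estimate.

    Assumes both estimates are normally distributed around the same true
    effect, so their difference has variance se_orig**2 + se_rep**2.
    """
    half_width = z_crit * math.sqrt(se_orig ** 2 + se_rep ** 2)
    return est_orig - half_width, est_orig + half_width


def conflicts(est_orig, se_orig, est_rep, se_rep):
    """True if the replication estimate falls outside the interval."""
    lower, upper = replication_interval(est_orig, se_orig, se_rep)
    return not (lower <= est_rep <= upper)


# Hypothetical effect sizes and standard errors, for illustration only.
lower, upper = replication_interval(est_orig=0.25, se_orig=0.10, se_rep=0.08)
print(f"replication interval: [{lower:.3f}, {upper:.3f}]")
print("statistically conflicting:", conflicts(0.25, 0.10, 0.01, 0.08))
```

A replication that lands inside the interval is compatible with the original even if it is non-significant on its own; one that lands outside is the kind of statistical conflict described above.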

My second line of thinking was slightly different. Some people have suggested that the Bem paper shines a light on shortcomings of our usual criteria for what constitutes good methodology. Tal Yarkoni made this argument very well. In short: the Bem paper was judged by the same standard that other papers are judged by. So the fact that an effect that most of us consider impossible was able to pass that standard should cause us to question the standard, rather than just attacking the paper.

So by that same line of thinking, maybe the rejection of the Ritchie et al. null replication should make us rethink the usual standards for how journals treat replications. Prior to electronic publication, in an age when journal pages were scarce and expensive, the JPSP policy made sense for a flagship journal that strived to be “theory advancing.” But a consequence of that kind of policy is that exact replication studies are undervalued. Since researchers know from the outset that the more prestigious journals won’t publish exact replications, we have little incentive to invest time and energy running them. Replications still get run, but often only if a researcher can think of some novel extension, like a moderator variable or a new condition to compare the old ones to. And then the results might only get published if the extension yields a novel and statistically significant result.

But nowadays, in the era of electronic publication, why couldn’t a journal also publish an online supplement of replication studies? Call it “JPSP: Replication Reports.” It would be a home for all replication attempts of studies originally published in the journal. This would have benefits for individual investigators, for journals, and for the science as a whole.

For individual investigators, it would be an incentive to run and report exact replication studies simply to see if a published effect can be reproduced. The market – that is, hiring and tenure committees – would sort out how much credit to give people for publishing such papers, in relation to the more usual kind. Hopefully it would be greater than zero.

For journals, it would be additional content and added value to users of their online services. Imagine if every time you viewed the full text of a paper, there was a link to a catalog of all replication attempts. In addition to publishing and hosting replication reports, journals could link to replicate-and-extend studies published elsewhere (e.g., as a subset of a “cited by” index). That would be a terrific service to their customers.

For the science, it would be valuable to encourage and document replications better than we currently do. When a researcher looks up an article, they could immediately and easily see how well the effect has survived replication attempts. It would also help us organize information better for meta-analyses and the like. It would help us keep labs and journals honest by tracking phenomena like the notorious decline effect and publication bias. In the short term that might be bad for some journals (I’d guess that journals that focus on novel and groundbreaking research are going to show stronger decline curves). But in the long run, it would be another index (alongside impact factors and the like) of the quality of a journal, which the better journals should welcome if they really think they’re doing things right. It might even lead to improvement of some of the problems that Tal discussed. If researchers, editors, and publishers knew that failed replications would be tied around the neck of published papers, there would be an incentive to improve quality and close some methodological holes.

Are there downsides that I’m not thinking of? Probably. Would there be barriers to adopting this? Almost certainly. (At a minimum, nobody likes change.) Is this a good idea? A terrible idea? Tell me in the comments.

Postscript: After I drafted this entry and was getting ready to post it, I came across this article in New Scientist about the rejection. It looks like Richard Wiseman already had a similar idea:

“My feeling is that the whole system is out of date and comes from a time when journal space was limited.” He argues that journals could publish only abstracts of replication studies in print, and provide the full manuscript online.