Reflections on a foray into post-publication peer review

Recently I posted a comment on a PLOS ONE article for the first time. As someone who had a decent chunk of his career before post-publication peer review came along — and has an even larger chunk of his career left with it around — it was an interesting experience.

It started when a colleague posted an article to his Facebook wall. I followed the link out of curiosity about the subject matter, but what immediately jumped out at me was that it was a 4-study sequence with pretty small samples. (See Uli Schimmack’s excellent article The ironic effect of significant results on the credibility of multiple-study articles [pdf] for why that’s noteworthy.) That got me curious about effect sizes and power, so I looked a little bit more closely and noticed some odd things. Like that different N’s were reported in the abstract and the method section. And when I calculated effect sizes from the reported means and SDs, some of them were enormous. Like Cohen’s d > 3.0 level of enormous. (If all this sounds a little hazy, it’s because my goal in this post is to talk about my experience of engaging in post-publication review — not to rehash the details. You can follow the links to the article and comments for those.)
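The back-of-the-envelope check described above is easy to do yourself. Here is a minimal sketch of computing Cohen's d from reported group means and SDs using the pooled-SD formula; the numbers are invented for illustration and are not the article's:

```python
from math import sqrt

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    # Pooled-SD version of Cohen's d for two independent groups
    sd_pooled = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sd_pooled

# Hypothetical values: when the reported SDs are very small relative to
# the mean difference, d comes out implausibly large
d = cohens_d(5.0, 0.6, 20, 3.0, 0.6, 20)
print(round(d, 2))  # a mean difference of 2.0 against a pooled SD of 0.6
```

For context, conventional benchmarks treat d = 0.8 as a large effect, so a d above 3 from reported summary statistics is a strong hint that something in the reporting is off.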

In the old days of publishing, it wouldn’t have been clear what to do next. In principle many psych journals will publish letters and comments, but in practice they’re exceedingly rare. Another alternative would have been to contact the authors and ask them to write a correction. But that relies on the authors agreeing that there’s a mistake, which authors don’t always do. And even if authors agree and write up a correction, it might be months before it appears in print.

But this article was published in PLOS ONE, which lets readers post comments on articles as a form of post-publication peer-review (PPPR). These comments aren’t just like comments on some random website or blog — they become part of the published scientific record, linked from the primary journal article. I’m all in favor of that kind of system. But it brought up a few interesting issues for how to navigate the new world of scientific publishing and commentary.

1. Professional etiquette. Here and there in my professional development I’ve caught bits and pieces of a set of gentleman’s rules about scientific discourse (and yes, I am using the gendered expression advisedly). A big one is, don’t make a fellow scientist look bad. Unless you want to go to war (and then there are rules for that too). So the old-fashioned thing to do — “the way I was raised” — would be to contact the authors quietly and petition them to make a correction themselves, so it could look like it originated with them. And if they do nothing, probably limit my comments to grumbling at the hotel bar at the next conference.

But for PPPR to work, the etiquette of “anything public is war” has to go out the window. Scientists commenting on each other’s work needs to be a routine and unremarkable part of scientific discourse. So does an understanding that even good scientists can make mistakes. And to live by the old norms is to affirm them. (Plus, the authors chose to submit to a journal that allows public comments, so caveat author.) So I elected to post a comment and then email the authors to let them know, so they would have a chance to respond quickly if they weren’t monitoring the comments. As a result, the authors posted several comments over the next couple of days correcting aspects of the article and explaining how the errors happened. And they were very responsive and cordial over email the entire time. Score one for the new etiquette.

2. A failure of pre-publication peer review? Some of the issues I raised in my comment were indisputable factual inconsistencies — like that the sample sizes were reported differently in different parts of the paper. Others were more inferential — like that a string of significant results in these 4 studies was significantly improbable, even under a reasonable expectation of an effect size consistent with the authors’ own hypothesis. A reviewer might disagree about that (maybe they think the true effect really is gigantic). Other issues, like the too-small SDs, would have been somewhere in the middle, though they turned out to be errors after all.
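The improbability argument here (the same logic behind Schimmack's incredibility index) can be sketched in a few lines. Under an assumed true effect size, each study has some power to reject the null, and the chance that all four studies do so is roughly the product of their powers. The code below is a rough sketch using a normal approximation to two-sample t-test power; the effect size and sample size are hypothetical, not taken from the article:

```python
from math import erf, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_sample(d, n_per_group, alpha_z=1.96):
    # Normal approximation to two-sample t-test power
    # (two-tailed at alpha = .05; the negligible lower tail is ignored)
    delta = d * sqrt(n_per_group / 2)  # noncentrality on the z scale
    return 1 - norm_cdf(alpha_z - delta)

# Hypothetical scenario: a medium effect (d = 0.5), 20 per group
p_one = power_two_sample(0.5, 20)
p_all_four = p_one ** 4  # chance that all 4 studies reject the null
print(round(p_one, 3), round(p_all_four, 3))
```

With these assumed numbers, each study has power of only about .35, so the probability of four rejections in a row is under 2% — the kind of result that should make a reader (or reviewer) suspicious that something other than luck is at work.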

Is this a mark against pre-publication peer review? Obviously it’s hard to say from one case, but I don’t think it speaks well of PLOS ONE that these errors got through. Especially because PLOS ONE is supposed to emphasize “a high technical standard” and reporting of “sufficient detail” (the reason I noticed the issue with the SDs was because the article did not report effect sizes).

But this doesn’t necessarily make PLOS ONE worse than traditional journals like Psychological Science or JPSP, where similar errors get through all the time and then become almost impossible to correct. [UPDATE: Please see my followup post about pre-publication review at PLOS ONE and other journals.]

3. The inconsistency of post-publication peer review. I don’t think post-publication peer review is a cure-all. This whole episode depended on somebody (in this case, me) noticing the anomalies and being motivated to post a comment about them. If we got rid of pre-publication peer review and if the review process remained that unsystematic, it would be a recipe for a very biased system. This article’s conclusions are flattering to most scientists’ prejudices, and press coverage of the article has gotten a lot of mentions and “hell yeah”s on Twitter from pro-science folks. I don’t think it’s hard to imagine that that contributed to it getting a pass, and that if the opposite were true the article would have gotten a lot more scrutiny both pre- and post-publication. In my mind, the fix would be to make sure that all articles get a decent pre-publication review — not to scrap it altogether. Post-publication review is an important new development but should be an addition, not a replacement.

4. Where to stop? Finally, one issue I faced was how much to say in my initial comment, and how much to follow up. In particular, my original comment made a point about the low power and thus the improbability of a string of 4 studies with a rejected null. I based that on some hypotheticals and assumptions rather than formally calculating Schimmack’s incredibility index for the paper, in part because other errors in the initial draft made that impossible. The authors never responded to that particular point, but their corrections would have made it possible to calculate an IC index. So I could have come back and tried to goad them into a response. But I decided to let it go. I don’t have an axe to grind, and my initial comment is now part of the record. And one nice thing about PPPR is that readers can evaluate the arguments for themselves. (I do wish I had cited Schimmack’s paper though, because more people should know about it.)

The PoPS replication reports format is a good start

Big news today is that Perspectives on Psychological Science is going to start publishing pre-registered replication reports. The inaugural editors will be Daniel Simons and Alex Holcombe, who have done the serious legwork to make this happen. See the official announcement and blog posts by Ed Yong and Melanie Tannenbaum. (Note: this isn’t the same as the earlier plan I wrote about for Psychological Science to publish replications, but it appears to be related.)

The gist of the plan is that after getting pre-approval from the editors (mainly to filter for important but as-yet unreplicated studies), proposers will create a detailed protocol. The original authors (and maybe other reviewers?) will have a chance to review the protocol. Once it has been approved, the proposer and other interested labs will run the study. Publication will be contingent on carrying out the protocol but not on the results. Collections of replications from multiple labs will be published together as final reports.

I think this is great news. In my ideal world published replications would be more routine, and wouldn’t require all the hoopla of prior review by original authors, multiple independent replications packaged together, etc. etc. In other words, they shouldn’t be extraordinary, and they should be as easy or easier to publish than original research. I also think every journal should take responsibility for replications of its own original reports (the Pottery Barn rule). BUT… this new format doesn’t preclude any of that from also happening elsewhere. By including all of those extras, PoPS replication reports might function as a first-tier, gold standard of replication. And by doing a lot of things right (such as focusing on effect sizes rather than tallying “successful” and “failed” replications, which is problematic) they might set an example for more mundane replication reports in other outlets.

This won’t solve everything — not by a long shot. We need to change scientific culture (by which I mean institutional incentives) so that replication is a more common and more valued activity. We need funding agencies to see it that way too. In a painful coincidence, news came out today that a cognitive neuroscientist admitted to misconduct in published research. One of the many things that commonplace replications would do would be to catch or prevent fraud. But whenever I’ve asked colleagues who use fMRI whether people in their fields run direct replications, they’ve just laughed at me. There’s little incentive to run them and no money to do it even if you wanted to. All of that needs to change across many areas of science.

But you can’t solve everything at once, and the PoPS initiative is an important step forward.

What the heck is research anyway? The annual holiday post

Happy holidays, readers! Today, of course, is the day to gather around the aluminum pole with friends and family and air your grievances. And here at The Hardest Science I am adding a holiday tradition of my own to help that process along. So sometime after the fifth “it must be nice not to have to work over your long break” but before someone pins the head of household so you can all go home, gather together all your non-academic loved ones and read this to them aloud:

What the heck is research anyway?

by Brent Roberts

Recently, I was asked for the 17th time by a family member, “So, what are you going to do this summer?”  As usual, I answered, “research.”  And, as usual, I was met with that quizzical look that says, “What the heck is research anyway?”

It struck me in retrospect that I’ve done a pretty poor job of describing what research is to my family and friends.  So, I thought it might be a good idea to write an open letter that tries explaining research a little better.  You deserve an explanation.  So do other people, like parents of students and the general public.  You all pay a part of our salary, either through your taxes or the generous support of your kid’s education, and therefore should know where your money goes.

Continue reading…

All the personality blogging you could ask for

Want to see what’s new in personality research? Check out the new ARP Personality Meta-Blog that Chris Soto just set up. (That’s ARP as in Association for Research in Personality). It’s a blog aggregator that pulls from a bunch of different personality blogs. The Meta-Blog posts titles and excerpts, with links that you can follow to the original blogs for the full posts.

By and large these are blogs written by researchers for researchers, though some also mix in more outwardly focused content (particularly at Psych Your Mind). From my perspective this is a great thing. When I started The Hardest Science it felt like psychology had plenty of general-interest blogs (like those at Psychology Today) but relatively few blogs written with a researcher audience, especially compared to fields like economics and neuroscience. So I’m happy to see that changing.

Right now the Meta-Blog is pulling from 6 blogs. They are Tal Yarkoni’s [citation needed], David Funder’s funderstorms, Brent Roberts’s pigee, the collaborative Psych Your Mind, Brent Donnellan’s Trait-State Continuum, and yours truly. If you know of a blog that should be added, please contact Chris.

Personality psychology at SPSP

Melissa Ferguson and I are the program co-chairs for the upcoming SPSP conference in New Orleans, January 17-19. That means we are in charge of the scientific content of the program. (Cindy Pickett is the convention chair, meaning she’s in charge of pretty much everything else, which I have discovered is a heck of a lot more than 99% of the world knows. If you see Cindy at the conference, please buy her a drink.) The conference is going to be awesome. You should go.

One issue that I’m particularly attuned to is the representation of personality psychology on the program. During my work as program co-chair, I heard from some people who are from a more centrally personality-psych background that they’re worried that the conference is tilted too heavily toward social psych, and therefore there won’t be enough interesting stuff to go to.

I am writing here to dispel that notion. If you are a personality psychologist and you’re wavering about going, trust me: there’ll be lots of exciting stuff for you.

SPSP has a long-standing commitment to ensuring that both of its parent disciplines are well represented at the conference. That means, first of all, that the 2 program co-chairs are picked to make sure there is broad representation at the top. So among my predecessors are folks like Veronica Benet-Martinez, Sam Gosling, Will Fleeson, etc… — people who have both the expertise and motivation to make sure that outstanding personality submissions make it onto the program. Speaking for myself, I don’t see the personality/social distinction as mapping easily onto my work (it’s both!), but hopefully most people who are from a more canonical personality point of view will see me as intellectually connected to that.

One way that directly translates into program content is through selection of reviewers. Melissa and I made sure that both the symposium and poster review panels had plenty of personality psychologists, so all personality-related submissions get a fair shake. Not every good submission made it onto the program — there was just too much good stuff (and that’s true across all topic areas). But I personally assigned every symposium submission to its reviewers, and I promise you that anything that looked personality-ish got read by someone with relevant expertise.

On top of all that, SPSP’s 2013 president is David Funder. David got to handpick speakers for a Presidential Symposium, and he’ll also give a presidential address. Those sessions will appeal to everybody at SPSP, but I think personality psychologists will feel particularly happy.

For people interested in personality psychology content, here are some highlights:

Presidential Symposium, Thu 5:00 pm – 7:00 pm. Title: “The First ‘P’ in SPSP.” David will give the opening remarks, followed by talks by Colin DeYoung on personality and neuroscience, Sarah Hampson on lifespan personality development, and Bob Krueger on how personality psychology is shaping the DSM-5. (Hardcore social folks, these are 3 dynamite researchers. I bet you’ll like this one too!)

Presidential Address, Fri 2:00 pm – 3:15 pm. David Funder gets the spotlight this time, in a talk titled “Taking the Power of the Situation Seriously.”

Award lectures, Fri 5:00 pm – 6:30 pm. The recipients of SPSP’s 3 major awards will speak at this session. Dan McAdams is the winner of the Jack Block award for personality. Dan Wegner is the winner of the Campbell award in social psych (Thalia Wheatley will be speaking on his behalf). And Jamie Pennebaker is the winner of the inaugural Distinguished Scholar Award.

Symposium Room 217-219. In order to ensure that there is always something personality-oriented for people to go to, we picked 9 symposia that we thought would be especially appealing to personality psychologists and spread them out over every timeslot. So if you want personality, personality, and more personality, you can set up camp in room 217-219 and never leave.

All the other symposium rooms. Just because we highlighted personality stuff in one room doesn’t mean that’s the only place it appears on the schedule. “Personality versus social psychology” is a clearer distinction in people’s stereotypes than in reality. Spread across the schedule are presentations on gene-environment interactions, individual differences and health, subjective well-being, motivation and self-regulation, research methods and practices, and much more.

Posters, posters, posters. There is personality-related content in every poster session. Posters were grouped by keywords (self-nominated by the submitters), so an especially high concentration will be in Session E on Saturday morning.

As long as personality psychologists keep submitting their best stuff, the high-quality representation of personality at SPSP is going to remain the rule in years to come.

Science is more interesting when it’s true

There is a great profile of Uri Simonsohn’s fraud-detection work in the Atlantic Monthly, written by Chris Shea (via Andrew Gelman). This paragraph popped out at me:

So what, then, is driving Simonsohn? His fraud-busting has an almost existential flavor. “I couldn’t tolerate knowing something was fake and not doing something about it,” he told me. “Everything loses meaning. What’s the point of writing a paper, fighting very hard to get it published, going to conferences?”

 It reminded me of a story involving my colleague (and grand-advisor) Lew Goldberg. Lew was at a conference once when someone presented a result that he was certain could not be correct. After the talk, Lew stood up and publicly challenged the speaker to a bet that she’d made a coding error in the data. (The bet offer is officially part of the published scientific record. According to people who were there, it was for a case of whiskey.)

The research got published anyway, and there followed several years of back-and-forth with what Lew felt was a vague and insufficient admission of possible errors, culminating in Lew and colleagues publishing a comment on an erratum – the only time I’ve ever heard of that happening in a scientific journal. When someone asked Lew recently why he’d been so motivated to follow through, he answered in part: “Science is more interesting when it’s true.”

What is the Dutch word for “irony”?

Breathless headline-grabbing press releases based on modest findings. Investigations driven by confirmation bias. Broad generalizations based on tiny samples.

I am talking, of course, about the final report of the Diederik Stapel investigation.

Regular readers of my blog will know that I have been beating the drum for reform for quite a while. I absolutely think psychology in general, and perhaps social psychology especially, can and must work to improve its methods and practices.

But in reading the commission’s press release, which talks about “a general culture of careless, selective and uncritical handling of research and data” in social psychology, I am struck that those conclusions are based on a retrospective review of a known fraud case — a case that the commissions were specifically charged with finding an explanation for. So when they wag their fingers about a field rife with elementary statistical errors and confirmation bias, it’s a bit much for me.

I am writing this as a first reaction based on what I’ve seen in the press. At some point when I have the time and the stomach I plan to dig into the full 100-page commission report. I hope that — as is often the case when you go from a press release to an actual report — it takes a more sober and cautious tone. Because I do think that we have the potential to learn some important things by studying how Diederik Stapel did what he did. Most likely we will learn what kinds of hard questions we need to be asking of ourselves — not necessarily what the answers to those questions will be. Remember that the more we are shocked by the commission’s report, the less willing we should be to reach any sweeping generalizations from it.

So let’s all take a deep breath, face up to the Stapel case for what it is — neither exaggerating nor minimizing it — and then try to have a productive conversation about where we need to go next.