An editorial board discusses fMRI analysis and “false-positive psychology”
Update 1/3/2012: I have seen a few incoming links describing the Psych Science email discussion as “leaked” or “made public.” For the record, the discussion was forwarded to me from someone who got it from a professional listserv, so it was already out in the open and circulating before I posted it here. Considering that it was carefully redacted and compiled for circulation by the incoming editor-in-chief, I don’t think “leaked” is a correct term at all (and “made public” happened before I got it).
I recently got my hands on an email discussion among the Psychological Science editorial board. The discussion is about whether or how to implement recommendations by Poldrack et al. (2008) and Simmons, Nelson, and Simonsohn (2011) for research methods and reporting. The discussion is well worth reading and appears to be in circulation already, so I am posting it here for a wider audience. (All names except those of the senior editor, John Jonides, and Eric Eich, who compiled the discussion, were redacted by Eich; commenters are instead numbered.)
The Poldrack paper proposes guidelines for reporting fMRI experiments. The Simmons paper is the much-discussed “false-positive psychology” paper that was itself published in Psych Science. The argument in the latter is that slippery research and reporting practices can produce “researcher degrees of freedom” that inflate Type I error. To reduce these errors, the authors make 6 recommendations for researchers and 4 recommendations for journals.
There are a lot of interesting things to come out of the discussion. Regarding the Poldrack paper, the discussion apparently got started when a student of Jonides analyzed the same fMRI dataset under several different defensible methods and assumptions and got totally different results. I can believe that — not because I have extensive experience with fMRI analysis (or any hands-on experience at all), but because that’s true with any statistical analysis where there is not strong and widespread consensus on how to do things. (See covariate adjustment versus difference scores.)
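The covariate-adjustment-versus-difference-scores example can be made concrete with a small simulation of what is usually called Lord's paradox. This is only a sketch under assumptions of my own: two hypothetical groups that differ at baseline, nobody truly changes, and all of the numbers (group means of 50 and 60, noise SDs of 5) are invented for illustration. The same data then give two defensible but different answers.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate_group(rng, group_mean, n):
    """Hypothetical pre/post scores: a stable personal level plus independent noise."""
    rows = []
    for _ in range(n):
        stable = rng.gauss(group_mean, 5)   # stable level; no true change over time
        pre = stable + rng.gauss(0, 5)      # noisy pretest
        post = stable + rng.gauss(0, 5)     # noisy posttest
        rows.append((pre, post))
    return rows


def pooled_within_slope(groups):
    """Pooled within-group slope of post on pre (the ANCOVA covariate slope)."""
    sxy = sxx = 0.0
    for rows in groups:
        mx = mean([pre for pre, _ in rows])
        my = mean([post for _, post in rows])
        sxy += sum((pre - mx) * (post - my) for pre, post in rows)
        sxx += sum((pre - mx) ** 2 for pre, _ in rows)
    return sxy / sxx


rng = random.Random(2012)
group_a = simulate_group(rng, 50, 2000)
group_b = simulate_group(rng, 60, 2000)

# Analysis 1: compare mean difference (gain) scores between the groups.
gain_a = mean([post - pre for pre, post in group_a])
gain_b = mean([post - pre for pre, post in group_b])
gain_diff = gain_b - gain_a

# Analysis 2: covariate adjustment, i.e. the group difference in post scores
# after adjusting for pre scores with the pooled within-group slope.
slope = pooled_within_slope([group_a, group_b])
pre_diff = mean([p for p, _ in group_b]) - mean([p for p, _ in group_a])
post_diff = mean([q for _, q in group_b]) - mean([q for _, q in group_a])
adjusted_diff = post_diff - slope * pre_diff

print(f"difference-score estimate of group effect: {gain_diff:.2f}")
print(f"covariate-adjusted estimate of group effect: {adjusted_diff:.2f}")
```

Under these assumptions the difference-score analysis lands near zero, while the covariate-adjusted analysis reports a sizable group effect, because the noisy pretest only partly accounts for the baseline gap. Neither analysis is a coding mistake; they answer different questions.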
The other thing about the Poldrack discussion that caught my attention was commenter #8, who asked that more attention be given to selection and determination of ROIs. S/he wrote:
We, as psychologists, are not primarily interested in exploring the brain. Rather, we want to harness fMRI to reach a better understanding of psychological process. Thus, the choice of the various ROIs should be derived from psychological models (or at least from models that are closely related to psychological mechanisms). Such a justification might be an important editorial criterion for fMRI studies submitted to a psychological journal. Such a psychological model might also include ROIs where NO activity is expected, control regions, so to speak.
A.k.a. convergent and discriminant validity. (Once again, the psychometricians were there first.) A lot of research that is billed (in the press or in the scientific reports themselves) as reaching new conclusions about the human mind is really, when you look closely, using established psychological theories and methods as a framework to explore the brain. Which is a fine thing to do, and in fact is a necessary precursor to research that goes the other way, but shouldn’t be misrepresented.
Turning to the Simmons et al. piece, there was a lot of consensus that it had some good ideas but went too far, which is similar to what I thought when I first read the paper. Some of the Simmons recommendations were so obviously important that I wondered why they needed to be made at all, because doesn’t everybody know them already? (E.g., running analyses while you collect data and using p-values as a stopping rule for sample size is a definite no-no, because every extra look at the data is another chance to cross the significance threshold by luck.) The fact that Simmons et al. thought this needed to be said makes me worried about the rigor of the average research paper. Others of their recommendations seemed rather rigid and targeted toward a pretty small subset of research designs. The n>20 rule and the “report all your measures” rule might make sense for small-and-fast randomized experiments of the type the authors probably mostly do themselves, but may not work for everything (case studies, intensive repeated-measures studies, large multivariate surveys and longitudinal studies, etc.).
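For anyone who doubts that the stopping-rule problem matters, here is a rough simulation sketch. It is not taken from Simmons et al.; the design is my own assumption: the null hypothesis is true, data are standard normal, the test is a two-sided z-test with known SD, and the hypothetical "peeking" researcher starts at n = 20, tests after every 5 additional observations, and stops at p < .05 or at n = 50.

```python
import random
from statistics import NormalDist


def p_value(xs):
    """Two-sided z-test of mean = 0, assuming a known SD of 1."""
    n = len(xs)
    z = (sum(xs) / n) * n ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def run_study(rng, peek, n_start=20, n_max=50, step=5, alpha=0.05):
    """Return True if the study ends 'significant'. The null is true by construction."""
    xs = [rng.gauss(0, 1) for _ in range(n_start)]
    if not peek:
        # Fixed-n design: collect everything, then test exactly once.
        xs += [rng.gauss(0, 1) for _ in range(n_max - n_start)]
        return p_value(xs) < alpha
    # Optional stopping: test now, and again after each extra batch.
    while True:
        if p_value(xs) < alpha:
            return True
        if len(xs) >= n_max:
            return False
        xs += [rng.gauss(0, 1) for _ in range(step)]


def false_positive_rate(peek, n_sims=5000, seed=1):
    rng = random.Random(seed)
    return sum(run_study(rng, peek) for _ in range(n_sims)) / n_sims


fixed_rate = false_positive_rate(peek=False)
peeking_rate = false_positive_rate(peek=True)
print(f"false-positive rate, fixed n:           {fixed_rate:.3f}")
print(f"false-positive rate, optional stopping: {peeking_rate:.3f}")
```

With a fixed sample size the false-positive rate should sit near the nominal 5%, while the peeking version comes out noticeably higher, even though every individual test was run "correctly." The exact inflation depends on how often and how early you peek, so treat the printed numbers as illustrative only.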
Commenter #8 (again) had something interesting to say about a priori predictions:
It is always the educated reader who needs to be persuaded using convincing methodology. Therefore, I am not interested in the autobiography of the researcher. That is, I do not care whether s/he has actually held the tested hypothesis before learning about the outcomes…
Again, an interesting point. When there is not a strong enough theory that different experts in that theory would have drawn the same hypotheses independently, maybe a priori doesn’t mean much? Or put a little differently: a priori should be grounded in a publicly held and shared understanding of a theory, not in the contents of an individual mind.
Finally, a general point that many people made was that Psych Science (and for that matter, any journal nowadays) should make more use of supplemental online materials (SOM). Why shouldn’t stimuli, scripts, measures, etc. — which are necessary to conduct exact replications — be posted online for every paper? In current practice, if you want to replicate part or all of someone’s procedure, you need to email the author. Reviewers almost never have access to this material, which means they cannot evaluate it easily. I have had the experience of getting stimuli or measures for a published study and seeing stuff that made me worry about demand characteristics, content validity, etc. That has made me wonder why reviewers are not given the opportunity to closely review such crucial materials as a matter of course.