# An editorial board discusses fMRI analysis and “false-positive psychology”

**Update 1/3/2012:** I have seen a few incoming links describing the *Psych Science* email discussion as “leaked” or “made public.” For the record, the discussion was forwarded to me from someone who got it from a professional listserv, so it was already out in the open and circulating before I posted it here. Considering that it was carefully redacted and compiled for circulation by the incoming editor-in-chief, I don’t think “leaked” is a correct term at all (and “made public” happened before I got it).

***

I recently got my hands on an email discussion among the *Psychological Science* editorial board. The discussion is about whether and how to implement recommendations by Poldrack et al. (2008) and Simmons, Nelson, and Simonsohn (2011) for research methods and reporting. The discussion is well worth reading and appears to be in circulation already, so I am posting it here for a wider audience. (All names except those of the senior editor, John Jonides, and of Eric Eich, who compiled the discussion, were redacted by Eich; commenters are instead numbered.)

The Poldrack paper proposes guidelines for reporting fMRI experiments. The Simmons paper is the much-discussed “false-positive psychology” paper, itself published in *Psych Science*. Its argument is that slippery research and reporting practices can produce “researcher degrees of freedom” that inflate Type I error. To reduce these errors, the authors make six recommendations for researchers and four for journals.

There are a lot of interesting things to come out of the discussion. Regarding the Poldrack paper, the discussion apparently got started when a student of Jonides analyzed the same fMRI dataset under several different defensible methods and assumptions and got totally different results. I can believe that — not because I have extensive experience with fMRI analysis (or any hands-on experience at all), but because that’s true with any statistical analysis where there is not strong and widespread consensus on how to do things. (See covariate adjustment versus difference scores.)
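The covariate-adjustment-versus-difference-scores issue mentioned above is easy to demonstrate by simulation. The following is a minimal sketch of my own (not from the discussion), assuming a simple two-group pre/post design in which the groups differ at baseline and nothing actually changes for anyone; the two defensible analyses then give different answers:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # per group

# Two groups that differ at baseline; NO true change for anyone.
true_a = rng.normal(0.0, 1.0, n)
true_b = rng.normal(1.0, 1.0, n)
true = np.concatenate([true_a, true_b])
group = np.concatenate([np.zeros(n), np.ones(n)])  # 0 = A, 1 = B

pre = true + rng.normal(0.0, 1.0, 2 * n)   # noisy baseline measure
post = true + rng.normal(0.0, 1.0, 2 * n)  # noisy follow-up of the SAME true score

# Method 1: difference scores -- group effect on (post - pre)
change = post - pre
diff_effect = change[group == 1].mean() - change[group == 0].mean()

# Method 2: covariate adjustment (ANCOVA) -- regress post on group + pre
X = np.column_stack([np.ones(2 * n), group, pre])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
ancova_effect = beta[1]

print(f"difference-score estimate: {diff_effect:.3f}")   # near 0
print(f"ANCOVA estimate:           {ancova_effect:.3f}")  # clearly nonzero
```

Because the baseline is measured with error, the regression slope on `pre` is less than 1, and the group indicator soaks up the residual baseline difference (Lord’s paradox, essentially). Both analyses are defensible; they answer different questions.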

The other thing about the Poldrack discussion that caught my attention was commenter #8, who asked that more attention be given to selection and determination of ROIs. S/he wrote:

> We, as psychologists, are not primarily interested in exploring the brain. Rather, we want to harness fMRI to reach a better understanding of psychological process. Thus, the choice of the various ROIs should be derived from psychological models (or at least from models that are closely related to psychological mechanisms). Such a justification might be an important editorial criterion for fMRI studies submitted to a psychological journal. Such a psychological model might also include ROIs where NO activity is expected, control regions, so to speak.

A.k.a. convergent and discriminant validity. (Once again, the psychometricians were there first.) A lot of research that is billed (in the press or in the scientific reports themselves) as reaching new conclusions about the human mind is really, when you look closely, using established psychological theories and methods as a framework to explore the brain. Which is a fine thing to do, and in fact is a necessary precursor to research that goes the other way, but shouldn’t be misrepresented.

Turning to the Simmons et al. piece, there was a lot of consensus that it had some good ideas but went too far, which is similar to what I thought when I first read the paper. Some of the Simmons recommendations were so obviously important that I wondered why they needed to be made at all — doesn’t everybody know them already? (E.g., running analyses while you collect data and using p-values as a stopping rule for sample size is a definite no-no.) The fact that Simmons et al. thought this needed to be said makes me worried about the rigor of the average research paper. Others of their recommendations seemed rather rigid and targeted toward a pretty small subset of research designs. The n>20 rule and the “report all your measures” rule might make sense for small-and-fast randomized experiments of the type the authors probably mostly run themselves, but may not work for everything (case studies, intensive repeated-measures studies, large multivariate surveys and longitudinal studies, etc.).

Commenter #8 (again) had something interesting to say about a priori predictions:

> It is always the educated reader who needs to be persuaded using convincing methodology. Therefore, I am not interested in the autobiography of the researcher. That is, I do not care whether s/he has actually held the tested hypothesis before learning about the outcomes…

Again, an interesting point. When there is not a strong enough theory that different experts in that theory would have drawn the same hypotheses independently, maybe a priori doesn’t mean much? Or put a little differently: a priori should be grounded in a publicly held and shared understanding of a theory, not in the contents of an individual mind.

Finally, a general point that many people made was that Psych Science (and for that matter, any journal nowadays) should make more use of supplemental online materials (SOM). Why shouldn’t stimuli, scripts, measures, etc. — which are necessary to conduct exact replications — be posted online for every paper? In current practice, if you want to replicate part or all of someone’s procedure, you need to email the author. Reviewers almost never have access to this material, which means they cannot evaluate it easily. I have had the experience of getting stimuli or measures for a published study and seeing stuff that made me worry about demand characteristics, content validity, etc. That has made me wonder why reviewers are not given the opportunity to closely review such crucial materials as a matter of course.


[I just sent this note to all our doctoral students regarding the Simmons et al. paper, but am hoping to post this here without attribution. It's the points that matter, not who is making them.]

The paper makes many worthwhile points, a few of which were already well-known to statisticians (e.g., alpha levels go way up if one decides sample size by stopping whenever results “look good”).

One thing they said that surprised me concerned Bayesian statistics. Frankly, it seems highly misleading, so I want both to quote it in full and to address it:

“Although the Bayesian approach has many virtues, it actually increases researcher degrees of freedom. First, it offers a new set of analyses (in addition to all frequentist ones) that authors could flexibly try out on their data. Second, Bayesian statistics require making additional judgments (e.g., the prior distribution) on a case-by-case basis, providing yet more researcher degrees of freedom.”

Two points:

1) Bayesian analyses are *correct*; Frequentist (classical) analyses are *approximations*, based on assumptions that are nearly impossible to check in real data. The “choice” or “degrees of freedom” they speak of is between getting the fully correct answer and what may or may not be a good approximation to it. There are VERY VERY few statistical models for which classical analyses are known to yield the exactly correct sampling distribution, which is what Bayesian analysis always gives you.

2) Frequentist analyses ALSO use priors: flat ones. Bayesians can also do that, if they truly know nothing about the problem’s parameters. But we always know, for example, that variances are positive, that correlations are between -1 and 1, etc. So, the point they make about priors is a STRENGTH of the Bayesian analysis. You will never get, for example, a negative variance, like you can in some classical HLM or SEM programs.
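The negative-variance point is concrete enough to simulate. A minimal sketch of my own (not from the commenter), assuming a balanced one-way random-effects design with the within-group sd treated as known and the overall mean fixed at 0 for brevity; the classical moment estimator of the between-group variance can go negative, while a posterior constrained to the parameter space cannot:

```python
import numpy as np

rng = np.random.default_rng(2)
g, m = 8, 5          # 8 groups, 5 observations each
sigma_w = 1.0        # within-group sd, treated as known for simplicity
S = 400              # number of simulated datasets

def anova_estimate(y):
    """Classical moment (ANOVA) estimator of the between-group variance; can be < 0."""
    msb = m * np.var(y.mean(axis=1), ddof=1)
    msw = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (g * (m - 1))
    return (msb - msw) / m

def bayes_posterior_mean(y, grid_max=4.0, grid_n=400):
    """Posterior mean of sigma_b^2 under a flat prior on [0, grid_max].

    Given sigma_b^2, each group mean is N(0, sigma_b^2 + sigma_w^2 / m)."""
    s = (y.mean(axis=1) ** 2).sum()
    grid = np.linspace(0.0, grid_max, grid_n)
    v = grid + sigma_w ** 2 / m
    loglik = -0.5 * (g * np.log(v) + s / v)
    w = np.exp(loglik - loglik.max())
    w /= w.sum()
    return (grid * w).sum()  # nonnegative by construction

# TRUE between-group variance is exactly 0 in every simulated dataset.
anova = np.array([anova_estimate(rng.normal(0, sigma_w, (g, m))) for _ in range(S)])
bayes = np.array([bayes_posterior_mean(rng.normal(0, sigma_w, (g, m))) for _ in range(S)])

print(f"ANOVA estimates below zero: {(anova < 0).mean():.0%}")  # a large fraction
print(f"Bayes posterior means below zero: {(bayes < 0).mean():.0%}")  # 0%
```

The prior here encodes nothing more than the fact that a variance is nonnegative, which is exactly the commenter’s point: that extra “judgment” is a constraint the classical estimator ignores.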

It’s dispiriting reading remarks like theirs that, while seemingly plausible on one level, miss the point entirely on another.

Simon Jackman just came out with an exceptionally good book, “Bayesian Analysis for the Social Sciences”, that covers all this well. This ground has been trod lots of times, so I don’t want people reading this one little paper and thinking that Bayesian analysis isn’t worthwhile. If computers were infinitely fast, there would be no more classical statistics.

> the exactly correct sampling distribution, which is what Bayesian analysis always gives you.

Which is determined by the multivariate prior, which might have been specified or modified in light of the data!

There are many ways multiplicities can bite in Bayesian analysis, as indicated by a senior Bayesian researcher here – http://www.samsi.info/sites/default/files/berry_july2006.pdf

(Don Berry, Multiplicity Minefields for Bayesians)

False positives and spurious results in fMRI autism research have been discussed here:

http://sfari.org/news-and-opinion/news/2012/movement-during-brain-scans-may-lead-to-spurious-patterns