A collection of links to commentary on the Diederik Stapel fraud
I was asked to talk to a research ethics seminar about the Diederik Stapel fraud case. I pulled together a few links to circulate to the class, but I thought I’d put up a blog post and see if anyone has any other suggestions. I am especially interested in commentary and recommendations (rather than straight news coverage).
Here is what I have so far:
Jennifer Crocker ponders the slippery slope from corner-cutting to outright fraud (see also this short version)
Jelte Wicherts proposes that mandatory data sharing could prevent fraud. (And see his empirical study of data-sharing and research quality.)
Brent Roberts offers a long list of problematic practices in psychology, including undervaluing replication (see my thoughts on replication here), selective reporting, HARKing, and more.
Interviews with various psychologists (esp. Eric-Jan Wagenmakers) about problems with NHST and valuing surprising/counterintuitive findings
Andrew Gelman compares Stapel to other cheaters
Yours truly on whether our top journals have incoherent missions that produce perverse incentives.
Maybe this is all because psychology isn’t a real science. Benedict Carey of the New York Times, editorializing in a news story, suggests that psychology badly needs an overhaul. Hank Campbell of Science 2.0 thinks social psychology is too fuzzy. Andrew Ferguson of the Weekly Standard detects a tendency of journalists and psychologists to like gimmicky-but-meaningless findings, which he calls The Chump Effect.
Did the liberal / progressive message of Stapel’s work help it escape scrutiny? Retraction Watch suggests the possibility; Rush Limbaugh has no doubt.
What other commentary and recommendations are floating around?
Update 1/3/2012: I have seen a few incoming links describing the Psych Science email discussion as “leaked” or “made public.” For the record, the discussion was forwarded to me from someone who got it from a professional listserv, so it was already out in the open and circulating before I posted it here. Considering that it was carefully redacted and compiled for circulation by the incoming editor-in-chief, I don’t think “leaked” is a correct term at all (and “made public” happened before I got it).
***
I recently got my hands on an email discussion among the Psychological Science editorial board. The discussion is about whether or how to implement recommendations by Poldrack et al. (2008) and Simmons, Nelson, and Simonsohn (2011) for research methods and reporting. The discussion is well worth reading and appears to be in circulation already, so I am posting it here for a wider audience. (All names except those of the senior editor, John Jonides, and Eric Eich, who compiled the discussion, were redacted by Eich; commenters are numbered instead.)
The Poldrack paper proposes guidelines for reporting fMRI experiments. The Simmons paper is the much-discussed “false-positive psychology” paper that was itself published in Psych Science. The argument in the latter is that slippery research and reporting practices can produce “researcher degrees of freedom” that inflate Type I error. To reduce these errors, they make 6 recommendations for researchers and 4 recommendations for journals.
There are a lot of interesting things to come out of the discussion. Regarding the Poldrack paper, the discussion apparently got started when a student of Jonides analyzed the same fMRI dataset under several different defensible methods and assumptions and got totally different results. I can believe that — not because I have extensive experience with fMRI analysis (or any hands-on experience at all), but because that’s true with any statistical analysis where there is not strong and widespread consensus on how to do things. (See covariate adjustment versus difference scores.)
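To make the covariate-adjustment-versus-difference-scores point concrete, here is a minimal Python sketch of a Lord's-paradox-style situation. Everything in it is an assumption made up for illustration (the group means of 50 and 60, the noise levels, the `stability` parameter, the function names); it has nothing to do with the fMRI dataset mentioned above. Two hypothetical groups start at different baselines, and neither group's average changes from pre to post, but individuals regress toward their own group's mean.

```python
import random
from statistics import fmean

def simulate_group(mu, n, rng, stability=0.5):
    """Pre/post scores for one group whose average does not change over time."""
    pre = [rng.gauss(mu, 5) for _ in range(n)]
    # Individuals regress toward their own group's mean; the group mean stays at mu.
    post = [mu + stability * (x - mu) + rng.gauss(0, 3) for x in pre]
    return pre, post

def difference_score_effect(pre_a, post_a, pre_b, post_b):
    """Group B's mean gain (post minus pre) minus group A's mean gain."""
    gain_a = fmean([y - x for x, y in zip(pre_a, post_a)])
    gain_b = fmean([y - x for x, y in zip(pre_b, post_b)])
    return gain_b - gain_a

def covariate_adjusted_effect(pre_a, post_a, pre_b, post_b):
    """Group difference in post scores, adjusted for pre scores (common-slope ANCOVA)."""
    sxy = sxx = 0.0
    for pre, post in ((pre_a, post_a), (pre_b, post_b)):
        mx, my = fmean(pre), fmean(post)
        sxy += sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx += sum((x - mx) ** 2 for x in pre)
    slope = sxy / sxx  # pooled within-group regression slope of post on pre
    return (fmean(post_b) - fmean(post_a)) - slope * (fmean(pre_b) - fmean(pre_a))

rng = random.Random(2011)
pre_a, post_a = simulate_group(50, 2000, rng)  # hypothetical group A, baseline mean 50
pre_b, post_b = simulate_group(60, 2000, rng)  # hypothetical group B, baseline mean 60

print("difference-score effect:  ",
      round(difference_score_effect(pre_a, post_a, pre_b, post_b), 2))
print("covariate-adjusted effect:",
      round(covariate_adjusted_effect(pre_a, post_a, pre_b, post_b), 2))
```

Under these assumptions the difference-score analysis should land near zero (neither group changed on average), while the covariate-adjusted analysis should land near 5, i.e. the baseline gap of 10 times one minus the within-group slope of 0.5. Same data, two defensible analyses, two different stories.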
The other thing about the Poldrack discussion that caught my attention was commenter #8, who asked that more attention be given to selection and determination of ROIs. S/he wrote:
We, as psychologists, are not primarily interested in exploring the brain. Rather, we want to harness fMRI to reach a better understanding of psychological process. Thus, the choice of the various ROIs should be derived from psychological models (or at least from models that are closely related to psychological mechanisms). Such a justification might be an important editorial criterion for fMRI studies submitted to a psychological journal. Such a psychological model might also include ROIs where NO activity is expected, control regions, so to speak.
A.k.a. convergent and discriminant validity. (Once again, the psychometricians were there first.) A lot of research that is billed (in the press or in the scientific reports themselves) as reaching new conclusions about the human mind is really, when you look closely, using established psychological theories and methods as a framework to explore the brain. Which is a fine thing to do, and in fact is a necessary precursor to research that goes the other way, but shouldn’t be misrepresented.
Turning to the Simmons et al. piece, there was a lot of consensus that it had some good ideas but went too far, which is similar to what I thought when I first read the paper. Some of the Simmons recommendations were so obviously important that I wondered why they needed to be made at all, because doesn’t everybody know them already? (E.g., running analyses while you collect data and using p-values as a stopping rule for sample size is a definite no-no.) The fact that Simmons et al. thought this needed to be said makes me worried about the rigor of the average research paper. Others of their recommendations seemed rather rigid and targeted toward a pretty small subset of research designs. The n>20 rule and the “report all your measures” rule might make sense for small-and-fast randomized experiments of the type the authors probably mostly do themselves, but may not work for everything (case studies, intensive repeated-measures studies, large multivariate surveys and longitudinal studies, etc.).
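For anyone who wants to see why a p-value stopping rule is a no-no, here is a rough Python simulation. It is a sketch under stated assumptions, not a reanalysis of anything in the Simmons et al. paper: both groups are drawn from the same normal distribution (so every "significant" result is a false positive), a z-test with known SD of 1 stands in for the usual t-test to keep it stdlib-only, and the peeking schedule (look at n = 10, 20, 30, 40, 50 per group) and the function names are made up for illustration.

```python
import random
from statistics import NormalDist, fmean

def two_sided_p(a, b):
    """Two-sample z-test p-value, assuming equal group sizes and known SD = 1."""
    n = len(a)
    z = (fmean(a) - fmean(b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(peek, n_start=10, n_max=50, step=10,
                        sims=4000, alpha=0.05, seed=1):
    """Share of null experiments declared significant, with or without peeking."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        # No true effect: both groups come from the same distribution.
        a = [rng.gauss(0, 1) for _ in range(n_max)]
        b = [rng.gauss(0, 1) for _ in range(n_max)]
        # Peeking: test after every batch and stop at the first p < alpha.
        # Fixed design: test once, at the planned sample size.
        looks = range(n_start, n_max + 1, step) if peek else [n_max]
        if any(two_sided_p(a[:n], b[:n]) < alpha for n in looks):
            hits += 1
    return hits / sims

print("fixed sample size:   ", false_positive_rate(peek=False))
print("stop when p < .05:   ", false_positive_rate(peek=True))
```

The fixed design should come out close to the nominal 5%, and the peeking design well above it, even though every individual test uses the same .05 threshold.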
Commenter #8 (again) had something interesting to say about a priori predictions:
It is always the educated reader who needs to be persuaded using convincing methodology. Therefore, I am not interested in the autobiography of the researcher. That is, I do not care whether s/he has actually held the tested hypothesis before learning about the outcomes…
Again, an interesting point. When there is not a strong enough theory that different experts in that theory would have drawn the same hypotheses independently, maybe a priori doesn’t mean much? Or put a little differently: a priori should be grounded in a publicly held and shared understanding of a theory, not in the contents of an individual mind.
Finally, a general point that many people made was that Psych Science (and for that matter, any journal nowadays) should make more use of supplemental online materials (SOM). Why shouldn’t stimuli, scripts, measures, etc. — which are necessary to conduct exact replications — be posted online for every paper? In current practice, if you want to replicate part or all of someone’s procedure, you need to email the author. Reviewers almost never have access to this material, which means they cannot evaluate it easily. I have had the experience of getting stimuli or measures for a published study and seeing stuff that made me worry about demand characteristics, content validity, etc. That has made me wonder why reviewers are not given the opportunity to closely review such crucial materials as a matter of course.
Journals can be groundbreaking or definitive, not both
I was recently invited to contribute to Personality and Social Psychology Connections, an online journal of commentary (read: fancy blog) run by SPSP. Don Forsyth is the editor, and the contributors include David Dunning, Harry Reis, Jennifer Crocker, Shige Oishi, Mark Leary, and Scott Allison. My inaugural post is titled “Groundbreaking or definitive? Journals need to pick one.” Excerpt:
Do our top journals need to rethink their missions of publishing research that is both groundbreaking and definitive? And as a part of that, do they — and we scientists — need to reconsider how we engage with the press and the public?…
In some key ways groundbreaking is the opposite of definitive. There is a lot of hard work to be done between scooping that first shovelful of dirt and completing a stable foundation. And the same goes for science (with the crucial difference that in science, you’re much more likely to discover along the way that you’ve started digging on a site that’s impossible to build on). “Definitive” means that there is a sufficient body of evidence to accept some conclusion with a high degree of confidence. And by the time that body of evidence builds up, the idea is no longer groundbreaking.
What the Heck is Research Anyway? (A guest post by Brent Roberts)
Brent Roberts recently showed me a copy of this essay he wrote to explain to family members what he does for a living. I thought it would make a neat holiday-themed entry on the blog (a link to forward in response to “it must be so nice to have almost a month off between semesters!”). So I asked him if I could put it up as a guest post, and he kindly agreed.
Recently, I was asked for the 17th time[1] by a family member, “So, what are you going to do this summer?” As usual, I answered, “research.” And, as usual, I was met with that quizzical look that says, “What the heck is research anyway?”
It struck me in retrospect that I’ve done a pretty poor job of describing what research is to my family and friends. So, I thought it might be a good idea to write an open letter that tries explaining research a little better. You deserve an explanation. So do other people, like parents of students and the general public. You all pay a part of our salary, either through your taxes or the generous support of your kid’s education, and therefore should know where your money goes.
First, I should apologize if my reaction to the question “What are you going to do this summer?” has been less than positive in the past. It is hard not to react negatively. Because when asked this question it is hard not to interpret it as really asking “Hey, you’re a teacher, and now that you are done teaching, what the heck are you going to do with yourself?” Since scientific research is typically the majority of the work we do in the professoriate we tend to chafe at seeing our job pigeonholed in such a way. In fact, when we are asked “are you done for the summer?”, we typically think to ourselves “I’m going to get a boat-load of research done, like four papers, two grants, and some progress made on my book, along with starting several new projects.” In other words, we typically think along the lines of “I’m going to work my tail off this summer because I’m finally free of those teaching and service obligations which take me away from what I love to do and for that matter what I mostly get paid to do.”
Let me expand on that latter point a little before delving into what I mean by scientific research. As a professor at a major research university I am paid to do three things: research, teaching, and service. On the teaching side of things, we often teach what appears to be an appallingly small number of classes. That said, much of our teaching is done in the old-fashioned artisan-apprentice fashion: one-on-one with students. We have countless meetings throughout our week outside of the classroom working with undergraduate and graduate students and post-doctoral researchers, teaching them how to do research. In terms of service, we are tasked with helping to run our department and university, and with running the guilds to which we belong. I can expand on that later if you like. That said, one thing you may not have known is that at major research universities teaching and service constitute less than 50% of our job description, combined. You may expect us to take summers and winter breaks off, but our universities are smiling as we apply ourselves to what they hired us to do, research, often when they are not even paying us. There’s nothing like free labor[2].
So what is research anyway? Let me answer a slightly different question that my wife’s aunt asked recently as it will help frame the answer. She asked, “What purpose does research serve?” Now there is probably less consensus on the answer to this question than I’d like, but ultimately, I think the answer is knowledge. Research is supposed to provide knowledge that can be used by others and hopefully the broader society. To illustrate, let me describe the number of ways in which the knowledge we generate might be used.
The most common way that the knowledge we create is used is by other researchers. This is what you’ll hear described as “basic” research because it may or may not have a direct applied purpose. This is about all most researchers can aspire to. We are pretty happy if other researchers not only read our work, but also draw on it to inform their research too. This is important because the knowledge we generate is not only read, but also built upon and extended in meaningful ways by others. The next way our knowledge is used – and ultimately the way our research will most likely influence society – is through teaching. Yes, our research hopefully gets incorporated into the classroom because it is summarized in textbooks or our original research articles are assigned as core reading. In this way, our research forms the material that thousands of students learn in order to make themselves better-informed citizens who hopefully go on to be productive members of society. Finally, our research might be used for more practical aims like shaping social policy set by state and federal authorities, informing decisions made by employers or other organizations, or helping practitioners treat illness. For example, recently a Nobel Prize-winning economist discovered our work on the personality dimension of conscientiousness (being self-controlled, responsible, and organized). He has conducted rigorous work demonstrating the importance of this psychological attribute to human potential, and has started lobbying Congress and federal funding agencies to focus on how we can teach kids to be more conscientious. Similarly, other scholars conduct research on how best to characterize psychopathology and, in turn, how that might affect the way we treat patients. Not many researchers ever get this level of influence, but you see it in medical breakthroughs and engineering accomplishments on a regular basis. So, ultimately, we do research to provide usable knowledge back to society.
At least, that’s my opinion.
So what do we do when we do this thing called research? I can’t speak for all types of scientists, but here are what I believe to be the basic phases of the generic research project:
- We are presented with a problem, challenge, riddle, or question that needs to be solved or answered. For example, Teresa might ask: “How can an employer help workers to see work as more meaningful?”
- We come up with a method for answering the question.
- We assemble the tools and resources needed to conduct our research.
- We run the study intended to answer our question.
- We analyze the data that comes from our study.
- We write up our findings and send the paper off to a journal where it is reviewed by several (typically three) anonymous peers who, along with the journal editor, decide whether the way we answered the question provides an adequate answer and thus provides an incremental advancement to our knowledge. If they think we did add something to the knowledge pool, then Hallelujah, our work gets published.
I know, that all sounds a little abstract. So let me walk you through these steps in a little more detail. What do I mean by a problem, challenge, riddle, or question and where do these things come from? Well, typically, these riddles come from us knowing a lot about some particular area of knowledge. By becoming an expert in a specific area you become aware of not only what we know, but also of what we don’t know, and more importantly, what we need to know. To get to the point of being able to ask the right question requires a lot of time reading, going to conferences, and meeting with other experts. That is why our graduate education took so long. That is why we spend a lot of time with our noses stuck in books and journals. We need to know.
Once we have a grasp of some issue, then comes the fun (and hard) part: coming up with the question or idea you want to test. I think this is the fun part because it is often the most creative part of the job. It is like solving a riddle or puzzle. You have a bunch of disparate facts and you need to put them together in a new way.[3] This is one reason we are so often lost in thought. It is hard to turn the thinking off and new ideas might come to us at any time and from any source. One time I cooked up a program of research by having a discussion with my mother-in-law about the Ten Commandments. Inspiration can strike anywhere, any time.
The way we test our ideas is often tied to what kind of researcher we are. That said, there are only so many options available. The key to our choice of method is that anything we do should be transparent to others, replicable (i.e., we or someone else should be able to do what we did again and get the same results), and systematic so that other researchers can duplicate our efforts. Our methods range from simple observation and documentation (this might result in a book or case study), to surveys where we see if two things go together (e.g., age and maturity), to experiments where we obsessively control all extraneous variables so that we can get an idea of whether something causes something else (e.g., does increasing empathy for others increase cooperation?). Often our choices are determined by our question. For example, personality change is hard to study using experiments, whereas changing someone’s mood is relatively easy and easily tested with an experiment. Some of our choices are determined by technology: we can now take pictures of the brain in action, something not available to researchers in previous generations. We like technology, especially if it is new.[4]
Assembling our tools ranges from the simple—like going to a library—to the complex, like ordering a great big atom smasher or a satellite to be delivered to high earth orbit.[5] Many of us will populate a lab space with necessary equipment. Some of us will work solo on a computer in our office. Usually, we work in teams of 2 or more people and we sometimes work with researchers from other universities. Graduate students are often part of the team, but it can also include undergraduates, post docs, and administrative staff. Remember when I said we spend a lot of time teaching students one-on-one? This is a good example. For some of us, our labs become very much like a small business. Of course, to get to that stage, we typically need grant money, but that’s a topic for a different letter.
After we come up with our idea and assemble our team to work on it, we run the study. This can be as simple as borrowing other people’s data—economists seem to do this a lot—or more likely, we’ll run the study with people or animals, or things, in our labs. Sometimes this goes fast. Running a small experiment with undergraduate students can take as little as a few days. Sometimes this goes slow. I’ve been running several longitudinal studies now for 10 years. I may never stop. Some researchers become famous because at this stage they are either very creative in how they test their ideas or ingenious in how they develop their techniques. This type of technical skill is an underappreciated aspect of the job.
After we’ve collected our data, we analyze it. This is where that dreaded concept of statistics rears its ugly head. To be honest, some of us get really excited at this stage. Okay, to be really honest, I get excited at this stage. Call me a nerd. I’m okay with that. This is also where we lose our audience. You’ve probably heard us invoke statisticalese in describing our work or some other finding. It has a universal effect on the neurobiology of human brains—it puts 99% of them to sleep[6]. Again, please accept our apologies. If we start down this path, ask us to explain it in language that normal people can understand.
Finally, we write. This stage would be great for us, and others, if we could write like normal writers, but we can’t. We have to write for an academic audience. This means that most of the rhetorical techniques used by creative writers to keep readers engaged are off limits. We must choose our words carefully, be painfully consistent with those words, and hedge most everything we say. This doesn’t mean it is bad writing, just typically not that exciting (closer to a user’s manual than pulp fiction), and it’s full of arcane terms that only people in our field are likely to understand.
Of course, once we’ve written our research article we need to submit it to a scientific journal and have it reviewed. This step is what makes our work different from magazine writers, pundits, or reporters. We can’t just spout off. Our ideas need to be vetted by other knowledgeable researchers. More often than not, our papers get rejected, or at best rejected with an invitation to make revisions along the lines of the criticisms laid out by the reviewers. In other words, you have anonymous people ripping your precious ideas, hard work, and painful writing to shreds. It hurts. You will see us at our most depressed following rejections of our work. Considering the fact that a typical research project can often take upwards of three years from inspiration to rejection, a little depression is warranted. Eventually, some of our work gets published. Then, hopefully, somebody uses it, somehow.
Now, multiply this process several times over and you get an idea of our research lives. Most of us work on several projects simultaneously. It keeps us busy and off the streets at night. For that matter, it keeps us off the streets during the day too—thus the pasty complexion.
So, that’s research. Sorry to be long winded. That’s one reason we don’t elaborate on our answer to your questions concerning our summertime activities. Your eyes would be glazed over before we got to the second paragraph. Keep your questions coming, though. Right now, I gotta go do some research.
[1] I’ve been doing this professor thing for 17 years now.
[2] Our employers, universities, typically pay us on a 9-month contract. They like research because it makes our institutions famous. The more famous the institution, the more likely students will come and the more likely granting agencies will give us money. Teaching and service are important, but research brings in the dough.
[3] I often find ideas come to me in the bathroom, which has led to the “proximity to porcelain” hypothesis. Being in contact or near porcelain acts like a catalyst for new ideas. Or, alternatively, it is the only time you are left alone for long enough to think.
[4] The ideas described here would offend our colleagues who consider themselves “post modern” or “deconstructivists”. They don’t believe in replicable knowledge or anything roughly thought of as the scientific method. We mostly humor them as they eat their own departments and fields from the inside out and then we take their faculty lines for our own.
[5] No kidding. Some physicists at Berkeley did this while I was there.
[6] Another fMRI study begging to be done….
My university’s president, Richard Lariviere, was fired last week. I sent this letter to Chancellor George Pernsteiner and the members of the State Board of Higher Education on Friday, December 2, 2011. Links have been added for the blog post.
Dear Chancellor and Members of the Board:
I am writing to you to urge you to hire Robert Berdahl as interim President of the University of Oregon. I agree with the UO Senate Executive Committee’s recommendation that Berdahl and Berdahl alone is suited for this position. I will not restate their reasoning here (all of which I concur with), but I want to add something.
Earlier this week, Dr. Berdahl wrote an op-ed in the Register-Guard criticizing you for firing Richard Lariviere. In conversations, some of my colleagues have suggested that the op-ed would make it difficult for you to credibly hire Dr. Berdahl. I believe the opposite is true: hiring Berdahl would be a showing of credibility and strength on your part. Here is why:
About Dr. Lariviere’s termination, you have stated that it “has nothing to do with policy positions or conflicting visions for the future of the University of Oregon.” Rather, “This was an issue of lack of communication and eroded trust.” (OUS Press Release of Nov 28, 2011). Right now, rightly or wrongly, many people in the University of Oregon community, around the state, and beyond doubt those words. They believe that you could not tolerate dissent, that you acted because your authority was threatened, and that you were afraid of change.
This, now, is your opportunity to back up your words with actions and show your critics, the state, and the world that you mean what you say. Dr. Berdahl has a long and distinguished history of establishing trust and communication with people he disagrees with and working effectively with state governance bodies. And he shares much of Dr. Lariviere’s broad vision. By selecting Dr. Berdahl, you would show that your idea of teamwork does not mean lock-step submission, that this is not about ego, and that you are willing to have a change agent on your team who will work with you for the good of higher education in all of Oregon. Thus, not only would Dr. Berdahl be an outstanding president, his selection would also go a long way toward restoring confidence in your governance and repairing badly damaged communication.
For the good of the university, its students, and the state that it serves, I urge you to select Dr. Berdahl.
Sincerely,
Sanjay Srivastava
In other news, people who say “I got my bachelor’s degree in medicine” are not getting jobs as doctors
The email below has been making the rounds. The APA should post it on their website, but I have not found it there. Since it says “FYI and distribution” I am taking the liberty myself.
As noted below, the data are only based on people who stopped at a bachelor’s degree (no grad school). The vast majority of undergrad psychology majors are just called “psychology.” Since people in the survey self-reported their major, I would speculate that a lot of the people claiming to have majored in “clinical psychology,” “social psychology,” etc. were just making it up to sound impressive.
From: Chairs of Councils of Directors of Training Councils [mailto:CCTC@LISTS.APA.ORG] On Behalf Of Belar, Cynthia
Sent: Saturday, November 12, 2011 11:32 AM
To: CCTC@LISTS.APA.ORG
Subject: [CCTC] NPR report
FYI and distribution.
Unfortunately, a recent report on National Public Radio [SS: and now CBS] may be misleading regarding the employment status of undergraduate psychology majors, and confusing about the employment status of clinical psychologists.
On Nov 9, 2011 NPR reported graduates with majors in clinical psychology had the highest unemployment rate, nearly 20%. Although technically correct, these data are based on terminal bachelor’s degrees, not graduate degrees, so they have no relevance to the employment status of clinical psychologists for whom the doctoral degree is required. Nor does this report represent the employment status of undergraduate majors in psychology in general, as clinical psychology majors are only a minuscule subset (<1%) of the psychology majors reported in those data.
Since APA has received many inquiries from those interpreting the NPR report as reflecting poorly on the employment status of clinical psychologists and recipients of bachelor’s degrees in psychology in general, we have prepared the following information for clarification.
* The data NPR cited are from a table recently published by the Wall Street Journal entitled From College Major to Career. They are self-report data from the American Community Survey (ACS) by the Census Bureau.
* There are eight undergraduate degrees in psychology reported: clinical psychology, cognitive science and biopsychology, counseling psychology, educational psychology, industrial and organizational psychology, miscellaneous psychology, psychology and social psychology.
* The category of “psychology” was the 5th most popular among all majors reported, with an unemployment rate for psychology of 6.1% that is not much different from biology (5.6%), computer science (5.6%), economics (6.3%) and geography (6.1%).
* The vast majority of undergraduate institutions that provide degrees in psychology provide either a BA or BS in psychology – not a degree in an area of specialization such as clinical (perhaps explaining why the popularity of clinical psychology as a major is ranked 168, while psychology as a major is ranked as 5).
* Data from the previous year’s Census Bureau survey are available on the Georgetown University Center on Education and the Workforce website; see http://cew.georgetown.edu/collegepayoff/. These data also illustrate how unrepresentative the data on clinical psychology are of undergraduate psychology education in general. As noted on page 170, clinical psychology represents less than one percent (0.76%) of the approximately 1.5 million psychology majors reported. The authors also note: “Sample size was too small to be statistically valid.” Of interest was that the unemployment rate for clinical psychology bachelor’s degrees in that year was 5%.
* With respect to employment of individuals holding doctoral degrees in clinical psychology, the data on 2009 degree recipients reveal that 3.8% were unemployed seeking employment: http://www.apa.org/workforce/publications/09-doc-empl/table-2.pdf
Although the NPR report and its focus on clinical psychology has masked important information on the large number of undergraduate majors in psychology, it has brought to light the need for more public understanding of the undergraduate major in psychology. According to the National Center on Educational Statistics, roughly 90,000 students graduate each year with a bachelor’s degree in psychology. The Wall Street Journal data and those from the Georgetown Center for Education and the Workforce suggest that employment rates for psychology majors are similar to many other disciplines. Moreover, the graduates are employed across multiple sectors as would be consistent with the goals of the undergraduate major in psychology.
APA has specific policies guiding the undergraduate major in psychology, including Guidelines for the Undergraduate Psychology Major and Principles for Quality Undergraduate Education in Psychology. We strongly encourage consumers of undergraduate education to use these guides in making choices among majors on their campuses. We also wish to highlight that a bachelor’s degree in clinical psychology is a minuscule subset of psychology majors, and that a doctoral degree is required for one to become a clinical psychologist.
We wish to acknowledge Jeff Strohl (Georgetown Center for Education and the Workforce) and Joseph Light (Wall Street Journal) for their helpfulness in ensuring we had accurate data.
Cynthia D. Belar, PhD, ABPP | Executive Director
Education Directorate
American Psychological Association
Mark Zuckerberg on psychology and social media
In response to Florida Governor Rick Scott attacking Florida universities for graduating too many psychology majors (among other disciplines), a group of department chairs put out a report explaining and defending the discipline. Toward the end they list some famous psychology majors, and among them is Mark Zuckerberg.
Here’s Zuckerberg in the Deseret News:
“All of these problems at the end of the day are human problems,” he said. “I think that that’s one of the core insights that we try to apply to developing Facebook. What [people are] really interested in is what’s going on with the people they care about. It’s all about giving people the tools and controls that they need to be comfortable sharing the information that they want. If you do that, you create a very valuable service. It’s as much psychology and sociology as it is technology.”
And it’s not just talk — he’s hiring psychology PhDs (including a University of Oregon graduate).
See also here (psych major stuff starts around 1:00; gets especially interesting around 2:50).
Hard copy? Really?
Is there some legitimate, non-Luddite reason why some psychology departments continue to insist on hard copy for letters of recommendation? Electronic signatures are legal, folks, and most of your peers have gotten with the program.
Seriously, is there something I’m missing?
Does psilocybin cause changes in personality? Maybe, but not so fast
This morning I came across a news article about a new study claiming that psilocybin (the active ingredient in hallucinogenic mushrooms) causes lasting changes in personality, specifically the Big Five factor of openness to experience.
It was hard to make out methodological details from the press report, so I looked up the journal article (gated). The study, by Katherine MacLean, Matthew Johnson, and Roland Griffiths, was published in the Journal of Psychopharmacology. When I read the abstract I got excited. Double blind! Experimentally manipulated! Damn, I thought, this looks a lot better than I thought it was going to be.
The results section was a little bit of a letdown.
Here’s the short version: Everybody came in for 2 to 5 sessions. In session 1 some people got psilocybin and some got a placebo (the placebo was methylphenidate, a.k.a., Ritalin; they also counted as “placebos” some people who got a very low dose of psilocybin in their first session). What the authors report is a significant increase in NEO Openness from pretest to after the last session. That analysis is based on the entire sample of N=52 (everybody got an active dose of psilocybin at least once before the study was over). In a separate analysis they report no significant change from pretest to after session 1 for the n=32 people who got the placebo first. So they are basing a causal inference on the difference between significant and not significant. D’oh!
To make it (even) worse, the “control” analysis had fewer subjects, hence less power, than the “treatment” analysis. So it’s possible that openness increased as much or even more in the placebo contrast as it did in the psilocybin contrast. (My hunch is that’s not what happened, but it’s not ruled out. They didn’t report the means.)
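To put rough numbers on the power point, here is a minimal sketch. It uses a normal approximation to a pre/post (paired) test, and the effect size is a made-up value chosen for illustration, not something reported in the paper. Only the two sample sizes (52 and 32) come from the study.

```python
from statistics import NormalDist


def approx_power(d, n, alpha=0.05):
    """Approximate two-sided power for detecting a standardized mean change d
    with n subjects, using a normal approximation to the paired t-test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5  # expected value of the test statistic
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# ASSUMPTION: hypothetical standardized change, identical in both analyses.
HYPOTHETICAL_D = 0.4

power_full = approx_power(HYPOTHETICAL_D, 52)     # whole-sample analysis
power_placebo = approx_power(HYPOTHETICAL_D, 32)  # placebo-first analysis

print(f"approx. power with n=52: {power_full:.2f}")
print(f"approx. power with n=32: {power_placebo:.2f}")
```

Under that assumed effect size, the exact same true change is noticeably more likely to come out "not significant" in the smaller sample, which is why comparing the two significance verdicts tells you so little.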
None of this means there is definitely no effect of psilocybin on Openness; it just means that the published paper doesn’t report an analysis that would answer that question. I hope the authors, or somebody else, come back with a better analysis. (A simple one would be a 2×2 ANOVA comparing pretest versus post-session-1 for the placebo-first versus psilocybin-first subjects. A slightly more involved analysis might involve a multilevel model that could take advantage of the fact that some subjects had multiple post-psilocybin measurements.)
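For the curious, here is a rough sketch of what a direct comparison could look like. Rather than the ANOVA itself, it runs a permutation test on change scores (post-session-1 minus pretest), which targets the same group-by-time interaction. Every number in it is simulated: the means, SDs, and the 20/32 split are my assumptions for illustration (the 20 is just 52 minus 32), and none of it is data from the paper.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def change_scores(pre, post):
    """Post minus pre for each subject."""
    return [b - a for a, b in zip(pre, post)]


def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (extreme + 1) / (n_perm + 1)


# SIMULATED, HYPOTHETICAL DATA: not the study's numbers.
rng = random.Random(1)
placebo_pre = [rng.gauss(64, 10) for _ in range(32)]
placebo_post = [x + rng.gauss(1, 5) for x in placebo_pre]
psilo_pre = [rng.gauss(64, 10) for _ in range(20)]
psilo_post = [x + rng.gauss(4, 5) for x in psilo_pre]

placebo_change = change_scores(placebo_pre, placebo_post)
psilo_change = change_scores(psilo_pre, psilo_post)

observed, p_value = permutation_test(psilo_change, placebo_change)
print(f"difference in mean change: {observed:.2f}, permutation p = {p_value:.3f}")
```

The point of the design is that the two groups are compared to each other in a single test, instead of each being tested against zero separately.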
Aside from the statistics, I had a few observations.
One thing you’d worry about with this kind of study – where the main DV is self-reported – is demand or expectancy effects on the part of subjects. I know it was double-blind, but they might have a good idea about whether they got psilocybin. My guess is that they have some pretty strong expectations about how shrooms are supposed to affect them. And these are people who volunteered to get dosed with psilocybin, so they probably had pretty positive expectations. I wouldn’t call the self-report issue a dealbreaker, but in a followup I’d love to see some corroborating data (like peer reports, ecological momentary assessments, or a structured behavioral observation of some kind).
On the other hand, they didn’t find changes in other personality traits. If the subjects had a broad expectation that psilocybin would make them better people, you would expect to see changes across the board. If their expectations were focused around Openness-related traits, that’s less relevant.
If you accept the validity of the measures, it’s also noteworthy that they didn’t get higher in neuroticism — which is not consistent with what the government tells you will happen if you take shrooms.
One of the most striking numbers in the paper is the baseline sample mean on NEO Openness — about 64. That is a T-score (normed [such as it is] to have a mean = 50, SD = 10). So that means that in comparison to the NEO norming sample, the average person in this sample was about 1.4 SDs above the mean — which is above the 90th percentile — in Openness. I find that to be a fascinating peek into who volunteers for a psilocybin study. (It does raise questions about generalizability though.)
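The arithmetic behind that percentile claim, as a quick sketch. It assumes Openness T-scores are roughly normally distributed in the norming sample, which is my assumption and not something the paper states.

```python
from statistics import NormalDist

T_MEAN, T_SD = 50, 10   # T-score metric
baseline_t = 64         # approximate baseline sample mean on NEO Openness

z = (baseline_t - T_MEAN) / T_SD
percentile = NormalDist().cdf(z) * 100  # ASSUMPTION: normal distribution of scores

print(f"z = {z:.1f}, percentile = {percentile:.1f}")
```

That works out to roughly the 92nd percentile under the normality assumption.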
Finally, because psilocybin was manipulated within subjects, the long-term (one year-ish) followup analysis did not have a control group. Everybody had been dosed. They predicted Openness at one year out based on the kinds of trip people reported (people who had a “complete mystical experience” also had the sustained increase in openness). For a much stronger inference, of course, you’d want to manipulate psilocybin between subjects.
Do not use what I am about to teach you
I am gearing up to teach Structural Equation Modeling this fall term. (We are on quarters, so we start late — our first day of classes is next Monday.)
Here’s the syllabus. (pdf)
I’ve taught this course a bunch of times now, and each time I teach it I add more and more material on causal inference. In part it’s a reaction to my own ongoing education and evolving thinking about causation, and in part it’s from seeing a lot of empirical work that makes what I think are poorly supported causal inferences. (Not just articles that use SEM either.)
Last time I taught SEM, I wondered if I was heaping on so many warnings and caveats that the message started to veer into, “Don’t use SEM.” I hope that is not the case. SEM is a powerful tool when used well. I actually want the discussion of causal inference to help my students think critically about all kinds of designs and analyses. Even people who only run randomized experiments could benefit from a little more depth than the sophomore-year slogan that seems to be all some researchers (AHEM, Reviewer B) have been taught about causation.
