Does psilocybin cause changes in personality? Maybe, but not so fast

This morning I came across a news article about a new study claiming that psilocybin (the active ingredient in hallucinogenic mushrooms) causes lasting changes in personality, specifically the Big Five factor of openness to experience.

It was hard to make out methodological details from the press report, so I looked up the journal article (gated). The study, by Katherine MacLean, Matthew Johnson, and Roland Griffiths, was published in the Journal of Psychopharmacology. When I read the abstract I got excited. Double blind! Experimentally manipulated! Damn, I thought, this looks a lot better than I expected.

The results section was a little bit of a letdown.

Here’s the short version: Everybody came in for 2 to 5 sessions. In session 1 some people got psilocybin and some got a placebo (the placebo was methylphenidate, a.k.a. Ritalin; they also counted as “placebos” some people who got a very low dose of psilocybin in their first session). What the authors report is a significant increase in NEO Openness from pretest to after the last session. That analysis is based on the entire sample of N=52 (everybody got an active dose of psilocybin at least once before the study was over). In a separate analysis they report no significant change from pretest to after session 1 for the n=32 people who got the placebo first. So they are basing a causal inference on the difference between significant and not significant — and remember, the difference between “significant” and “not significant” is not itself a significant difference. D’oh!

To make it (even) worse, the “control” analysis had fewer subjects, hence less power, than the “treatment” analysis. So it’s possible that openness increased as much in the placebo contrast as it did in the psilocybin contrast, or even more. (My hunch is that’s not what happened, but it’s not ruled out. They didn’t report the means.)

None of this means there is definitely no effect of psilocybin on Openness; it just means that the published paper doesn’t report an analysis that would answer that question. I hope the authors, or somebody else, comes back with a better analysis. (A simple one would be a 2×2 ANOVA comparing pretest versus post-session-1 for the placebo-first versus psilocybin-first subjects. A slightly more involved analysis might use a multilevel model to take advantage of the fact that some subjects had multiple post-psilocybin measurements.)
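Just to make that suggestion concrete, here is a minimal sketch of the simple version in Python on simulated data. The 52/32 subject counts come from the paper as summarized above; everything else (the effect size, the noise, the variable names) is invented for illustration, and a multilevel model with a group × time interaction stands in for the 2×2 ANOVA.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# 52 subjects, 32 of whom got placebo in session 1 (counts from the paper);
# the effect sizes below are made up purely for illustration.
group = np.repeat(["placebo_first", "psilocybin_first"], [32, 20])

rows = []
for subject, g in enumerate(group):
    baseline = rng.normal(64, 10)  # Openness T-scores, high-ish baseline
    bump = 4.0 if g == "psilocybin_first" else 0.0  # hypothetical drug effect
    rows.append((subject, g, "1_pretest", baseline + rng.normal(0, 3)))
    rows.append((subject, g, "2_post1", baseline + bump + rng.normal(0, 3)))

df = pd.DataFrame(rows, columns=["subject", "group", "time", "openness"])

# A random intercept per subject handles the repeated measures; the
# group-by-time interaction is the difference-in-differences test that
# a significant-vs-nonsignificant comparison cannot substitute for.
model = smf.mixedlm("openness ~ group * time", data=df, groups=df["subject"])
print(model.fit().summary())
```

The coefficient to look at is the group × time interaction term; testing the two simple effects separately and comparing their p-values is exactly the trap described above.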

Aside from the statistics, I had a few observations.

One thing you’d worry about with this kind of study – where the main DV is self-reported – is demand or expectancy effects on the part of subjects. I know it was double-blind, but subjects might have a good idea about whether they got psilocybin. My guess is that they had some pretty strong expectations about how shrooms are supposed to affect them. And these are people who volunteered to get dosed with psilocybin, so they probably had pretty positive expectations. I wouldn’t call the self-report issue a dealbreaker, but in a followup I’d love to see some corroborating data (like peer reports, ecological momentary assessments, or a structured behavioral observation of some kind).

On the other hand, they didn’t find changes in other personality traits. If the subjects had a broad expectation that psilocybin would make them better people, you would expect to see changes across the board. If their expectations were focused on Openness-related traits, though, that counterargument loses some of its force.

If you accept the validity of the measures, it’s also noteworthy that subjects didn’t get higher in Neuroticism — which is not consistent with what the government tells you will happen if you take shrooms.

One of the most striking numbers in the paper is the baseline sample mean on NEO Openness — about 64. That is a T-score (normed [such as it is] to have a mean = 50, SD = 10). So that means that in comparison to the NEO norming sample, the average person in this sample was about 1.4 SDs above the mean — which is above the 90th percentile — in Openness. I find that to be a fascinating peek into who volunteers for a psilocybin study. (It does raise questions about generalizability though.)
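For anyone who wants to check that arithmetic (assuming the norming distribution is normal):

```python
from scipy.stats import norm

z = (64 - 50) / 10   # sample mean in SD units relative to the norms
print(z)             # 1.4
print(norm.cdf(z))   # ~0.92, i.e., above the 90th percentile as stated
```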

Finally, because psilocybin was manipulated within subjects, the long-term (one year-ish) follow-up analysis did not have a control group. Everybody had been dosed. They predicted Openness at one year out based on the kinds of trips people reported (people who had a “complete mystical experience” also had the sustained increase in Openness). For a much stronger inference, of course, you’d want to manipulate psilocybin between subjects.

Do not use what I am about to teach you

I am gearing up to teach Structural Equation Modeling this fall term. (We are on quarters, so we start late — our first day of classes is next Monday.)

Here’s the syllabus. (pdf)

I’ve taught this course a bunch of times now, and each time I teach it I add more and more material on causal inference. In part it’s a reaction to my own ongoing education and evolving thinking about causation, and in part it’s from seeing a lot of empirical work that makes what I think are poorly supported causal inferences. (Not just articles that use SEM either.)

Last time I taught SEM, I wondered if I was heaping on so many warnings and caveats that the message started to veer into, “Don’t use SEM.” I hope that is not the case. SEM is a powerful tool when used well. I actually want the discussion of causal inference to help my students think critically about all kinds of designs and analyses. Even people who only run randomized experiments could benefit from a little more depth than the sophomore-year slogan that seems to be all some researchers (AHEM, Reviewer B) have been taught about causation.

Modeling the Jedi Theory of Emotions

Today I gave my structural equation modeling class the following homework:

In Star Wars Episode I: The Phantom Menace, Yoda presented the Jedi Theory of Emotions: “Fear is the path to the dark side. Fear leads to anger. Anger leads to hate. Hate leads to suffering.”

1. Specify the Jedi Theory of Emotions as a path model with 4 variables (FEAR, ANGER, HATE, and SUFFERING). Draw a complete path diagram, using lowercase Roman letters (a, b, c, etc.) for the causal parameters.

2. Were there any holes or ambiguities in the Jedi Theory (as stated by Yoda) that required you to make theoretical assumptions or guesses? What were they?

3. Using the tracing rule, fill in the model-implied correlation matrix (assuming that all variables are standardized):

           FEAR  ANGER  HATE  SUFFERING
FEAR        1
ANGER              1
HATE                      1
SUFFERING                        1

4. Generate a plausible equivalent model. (An equivalent model is a model that specifies a different causal structure but implies the same correlation matrix.)

5. Suppose you run a study and collect data on these four variables. Your data gives you the following correlation matrix.

           FEAR  ANGER  HATE  SUFFERING
FEAR        1
ANGER       .5     1
HATE        .3     .6     1
SUFFERING   .4     .3     .5     1

Is the Jedi Theory a good fit to the data? In what way(s), if any, would you revise the model?

Some comments…

For #1, everybody always comes up with a recursive, full mediation model — e.g., fear causes hate only via anger as an intervening variable, and there are no loops or third-variable associations between fear and hate, etc. It’s an opportunity to bring up the ambiguity of theories expressed in natural language: just because Yoda didn’t say “and anger can also cause fear sometimes too,” does that mean he’s ruling that out?

Relatedly, observational data will only give you unbiased causal estimates — of the effect of fear on anger, for example — if you assume that Yoda gave a complete and correct specification of the true causal structure (or if you fill in the gaps yourself and include enough constraints to identify the model). How much do you trust Yoda’s model? Questions 4 and 5 are supposed to help students to think about ways in which the model could and could not be falsified.
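In case it helps to see the tracing rule mechanized: in the standardized chain model, each implied correlation is just the product of the path coefficients along the route connecting two variables. Here is a minimal numpy sketch, with path values plugged in from the adjacent observed correlations purely for illustration (the homework asks for symbols, not numbers):

```python
import numpy as np

# Chain model FEAR -> ANGER -> HATE -> SUFFERING, all variables standardized.
# Tracing rule: the implied r between two variables is the product of the
# paths along the chain connecting them.
a, b, c = 0.5, 0.6, 0.5  # fear->anger, anger->hate, hate->suffering

implied = np.array([
    [1.0,    a,    a*b,  a*b*c],
    [a,      1.0,  b,    b*c  ],
    [a*b,    b,    1.0,  c    ],
    [a*b*c,  b*c,  c,    1.0  ],
])

observed = np.array([  # the matrix from question 5
    [1.0, 0.5, 0.3, 0.4],
    [0.5, 1.0, 0.6, 0.3],
    [0.3, 0.6, 1.0, 0.5],
    [0.4, 0.3, 0.5, 1.0],
])

print(observed - implied)  # FEAR-SUFFERING is the only nonzero residual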

In a comment on an earlier post, I repeated an observation I once heard someone make: psychologists tend to model all relationships as zero unless given reason to think otherwise, whereas econometricians tend to model all relationships as free parameters unless given reason to think otherwise. I’m not sure why that is the case (maybe a legacy of NHST in experimental psychology, where you’re supposed to start by hypothesizing a zero relationship and then look for reasons to reject that hypothesis). At any rate, if you think like an econometrician and come from the “no true zeroes” school of thought, you’ll need something more than observational data on 4 variables in order to test this model. That makes the Jedi Theory a tough nut to crack. Experimental manipulation gets ethically more dubious as you proceed down the proposed causal chain. And I’m not sure how easy it would be to come up with good instruments for all of these variables.

I also briefly worried that I might be sucking the enjoyment out of the movie. But then I remembered that the quote is from The Phantom Menace, so that’s already been done.

Prepping for SEM

I’m teaching the first section of a structural equation modeling class tomorrow morning. This is the third time I’m teaching the course, and I find that the more times I teach it, the less traditional SEM I actually cover. I’m dedicating quite a bit of the first week to discussing principles of causal inference, spending the second week re-introducing regression as a modeling framework (rather than as just another test in the statistical toolbox), and returning to causal inference later when we talk about path analysis and mediation (including assigning a formidable critique by John Bullock et al. coming out soon in JPSP).

The reason I’m moving in that direction is that I’ve found that a lot of students want to rush into questionable uses of SEM without understanding what they’re getting into. I’m probably guilty of having done that, and I’ll probably do it again someday, but I’d like to think I’m learning to be more cautious about the kinds of inferences I’m willing to make. To people who don’t know better, SEM often seems like magical fairy dust that you can sprinkle on cross-sectional observational data to turn it into something causally conclusive. I’ve probably been pretty far on the permissive end of the spectrum that Andrew Gelman talks about, in part because I think experimental social psychology sometimes overemphasizes internal validity to the exclusion of external validity (and I’m not talking about the special situations that Mook gets over-cited for). But I want to instill an appropriate level of caution.

BTW, I just came across this quote from Donald Campbell and William Shadish: “When it comes to causal inference from quasi-experiments, design rules, not statistics.” I’d considered writing “IT’S THE DESIGN, STUPID” on the board tomorrow morning, but they probably said it nicer.

Causality, genes, and the law

Ewen Callaway in New Scientist reports:

In 2007, Abdelmalek Bayout admitted to stabbing and killing a man and received a sentence of 9 years and 2 months. Last week, Nature reported that Pier Valerio Reinotti, an appeal court judge in Trieste, Italy, cut Bayout’s sentence by a year after finding out he has gene variants linked to aggression. Leaving aside the question of whether this link is well enough understood to justify Reinotti’s decision, should genes ever be considered a legitimate defence?

Short answer: probably not.

Long answer: This reminds me of an issue I have with the Rubin Causal Model. In Holland’s 1986 paper on the RCM, he has a section titled “What can be a cause?” He introduces the notion of potential exposability – basically the idea that something can only be a cause if you could, in principle, manipulate it. He contrasts causes with attributes – features of individuals that are part of the definition of the individual. He uses as an example the statement, “She did well on the exam because she is a woman.” Gender can be statistically associated (correlated) with an outcome, but it cannot be a cause (according to Holland, and I believe Rubin as well), because the person who did well on the exam would not be the same person if “she” weren’t a woman.
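For readers who haven’t seen the formalism, the notation helps make the issue concrete. In the RCM, the unit-level causal effect of a binary treatment is

τ_i = Y_i(1) − Y_i(0)

where Y_i(1) and Y_i(0) are the outcomes unit i would show under treatment and under control. Holland’s “fundamental problem of causal inference” is that you can only ever observe one of the two for any given unit. The cause/attribute distinction amounts to saying that for an attribute like gender, one of the potential outcomes is not even well defined, because no intervention could move the same unit from one state to the other.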

From a scientific/philosophical standpoint, I’ve never liked the way they make the cause/attribute distinction. The RCM is so elegant and logical and principled, and then they tack on this very pragmatic and mushy issue of what can and cannot be manipulated. If technology changes so that something becomes manipulable, or if someone else thinks of a manipulation that escapes the researcher’s imagination (sex reassignment surgery?), things can shift back and forth between being classed as causes and as attributes. Philosophically speaking: Blech. Plus, it leads to places I don’t really like. What about: “Jane didn’t get the job because she is a woman.” Is Holland saying that we cannot say that an applicant’s gender affected the employer’s hiring decision?

I think we just need to be better about defining the units and the nature of the counterfactuals. Suppose we are trying to draw inferences about Jane as she existed on a specific date, time, and location, and therefore, as a principled matter of defining the question (not as a pragmatic concern), we take as an a priori fact that Jane has to be a woman for the purposes of this problem. Then okay, we’ve defined our problem space in a particular way that excludes “is a man” as a potential state of Jane. But if we are trying to draw inferences in which the units are exam-takers or job applicants, and Jane is one of many potential members of that population of units, then we’re dealing with a totally different question. In that case, we could have had either a man or a woman take the exam or apply for the job. Put another way: what is the counterfactual to Jane taking the exam or applying for the job? If Jane could have been John for the purposes of the problem we are trying to solve, then it makes perfectly good sense to say that “Jane did well on the exam because she is a woman” is a coherent causal inference. It comes back to a principled matter of how we have defined the problem, not a practical question of manipulability.

So back to the criminal… Holland (and Rubin) would frame the question as: “Is the MAOA-L variant a cause or an attribute?” And then they’d get into questions of whether you could manipulate that gene. And right now we cannot, so it’s an attribute; but maybe someday we’ll be able to, and then it’ll be a cause.

But I’d instead approach it by asking: what are the units, and what’s the counterfactual? To a scientist, it makes perfect sense to formulate a causal-inference problem in which the universe of units consists of all possible persons. Then we compare two persons whose genomes are entirely identical except for their MAOA variant, and we ask what the potential outcomes would be if one vs. the other were put in some situation that allows you to measure aggressive behavior. So the scientist gets to ask questions about MAOA causing aggression, because the scientist is drawing inferences about how persons behave, and MAOA is a variable across those units (generic persons).

But a court is supposed to ask different kinds of causal questions. The court judges the actual individual before it. And the units are potential or actual actions of that specific person as he existed on the day of the alleged crime. The units are not members of the generic category of persons. Thus, the court should not be considering what would happen if the real Abdelmalek Bayout had been replaced by a hypothetical almost-Bayout with a minutely different genome. A scientist can go there, but a court cannot. Rather, the court’s counterfactual is a different behavior from the very same real-world Abdelmalek Bayout, i.e., a Bayout who didn’t stab anybody on that day in 2007. And if Bayout had not stabbed anybody, there’d be no murder. But since he did, he caused a murder.

Addendum: whether we want to hold all persons to the same standards is a totally different question. For example, we have the insanity defense. But there, it’s not a question of causality. In fact, defendants who plead insanity have to stipulate to the causal question (e.g., in a murder trial, they have to acknowledge that their actions caused the death of another). The question before the court basically becomes a descriptive question — is this person sane or insane? — not a causal one.