Everything is fucked: The syllabus

PSY 607: Everything is Fucked
Prof. Sanjay Srivastava
Class meetings: Mondays 9:00 – 10:50 in 257 Straub
Office hours: Held on Twitter at your convenience (@hardsci)

In a much-discussed article at Slate, social psychologist Michael Inzlicht told a reporter, “Meta-analyses are fucked” (Engber, 2016). What does it mean, in science, for something to be fucked? Fucked needs to mean more than that something is complicated or must be undertaken with thought and care, as that would be trivially true of everything in science. In this class we will go a step further and say that something is fucked if it presents hard conceptual challenges to which implementable, real-world solutions for working scientists are either not available or routinely ignored in practice.

The format of this seminar is as follows: Each week we will read and discuss 1-2 papers that raise the question of whether something is fucked. Our focus will be on things that may be fucked in research methods, scientific practice, and philosophy of science. The potential fuckedness of specific theories, research topics, etc. will not be the focus of this class per se, but rather will be used to illustrate these important topics. To that end, each week a different student will be assigned to find a paper that illustrates the fuckedness (or lack thereof) of that week’s topic, and give a 15-minute presentation about whether it is indeed fucked.


Grading:

20% Attendance and participation
30% In-class presentation
50% Final exam

Week 1: Psychology is fucked

Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66, 195-244.

Week 2: Significance testing is fucked

Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312.

Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., & Wagenmakers, E. J. (2016). Is there a free lunch in inference? Topics in Cognitive Science, 8, 520-547.

Week 3: Causal inference from experiments is fucked

Chapter 3 from: Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

Week 4: Mediation is fucked

Bullock, J. G., Green, D. P., & Ha, S. E. (2010). Yes, but what’s the mechanism? (Don’t expect an easy answer). Journal of Personality and Social Psychology, 98, 550-558.

Week 5: Covariates are fucked

Culpepper, S. A., & Aguinis, H. (2011). Using analysis of covariance (ANCOVA) with fallible covariates. Psychological Methods, 16, 166-178.

Westfall, J., & Yarkoni, T. (2016). Statistically controlling for confounding constructs is harder than you think. PLoS ONE, 11, e0152719.

Week 6: Replicability is fucked

Pashler, H., & Harris, C. R. (2012). Is the replicability crisis overblown? Three arguments examined. Perspectives on Psychological Science, 7, 531-536.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.

Week 7: Interlude: Everything is fine, calm the fuck down

Gilbert, D. T., King, G., Pettigrew, S., & Wilson, T. D. (2016). Comment on “Estimating the reproducibility of psychological science.” Science, 351(6277), 1037.

Maxwell, S. E., Lau, M. Y., & Howard, G. S. (2015). Is psychology suffering from a replication crisis? What does “failure to replicate” really mean? American Psychologist, 70, 487-498.

Week 8: Scientific publishing is fucked

Fanelli, D. (2011). Negative results are disappearing from most disciplines and countries. Scientometrics, 90, 891-904.

Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2, e124.

Week 9: Meta-analysis is fucked

Inzlicht, M., Gervais, W., & Berkman, E. (2015). Bias-Correction Techniques Alone Cannot Determine Whether Ego Depletion is Different from Zero: Commentary on Carter, Kofler, Forster, & McCullough, 2015. Available at SSRN: http://ssrn.com/abstract=2659409 or http://dx.doi.org/10.2139/ssrn.2659409

Van Elk, M., Matzke, D., Gronau, Q. F., Guan, M., Vandekerckhove, J., & Wagenmakers, E. J. (2015). Meta-analyses are no substitute for registered replications: A skeptical perspective on religious priming. Frontiers in Psychology, 6.

Week 10: The scientific profession is fucked

Bakker, M., van Dijk, A., & Wicherts, J. M. (2012). The rules of the game called psychological science. Perspectives on Psychological Science, 7, 543-554.

Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7, 615-631.

Finals week

Wear black and bring a #2 pencil.

Apparently I’m on a blogging break

I just noticed that I haven’t posted in over a month. Don’t fear, loyal readers (am I being presumptuous with that plural? hi Mom!). I haven’t abandoned the blog, apparently I’ve just been too busy or preoccupied to flesh out any coherent thoughts.

So instead, here are some things that, over the last month, I’ve thought about posting but haven’t summoned up the wherewithal to turn into anything long enough to be interesting:

  • Should psychology graduate students routinely learn R in addition to, or perhaps instead of, other statistics software? (I used to think SPSS or SAS was capable enough for the modal grad student and R was too much of a pain in the ass, but I’m starting to come around. Plus R is cheaper, which is generally good for graduate students.)
  • What should we do about gee-whiz science journalism covering social neuroscience that essentially reduces to, “Wow, can you believe that X happens in the brain?” (Still working on that one. Maybe it’s too deeply ingrained to do anything.)
  • Reasons why you should read my new commentary in Psychological Inquiry. (Though really, if it takes a blog post to explain why an article is worth reading, maybe the article isn’t worth reading. I suggest you read it and tell me.)
  • A call for proposals for what controversial, dangerous, or weird research I should conduct now that I just got tenure.
  • Is your university as sketchy as my university? (Okay, my university probably isn’t really all that sketchy. And based on the previous item, you know I’m not just saying that to cover my butt.)
  • My complicated reactions to the very thought-provoking Bullock et al. “mediation is hard” paper in JPSP.

Our spring term is almost over, so maybe I’ll get to one of these sometime soon.

Prepping for SEM

I’m teaching the first section of a structural equation modeling class tomorrow morning. This is the 3rd time I’m teaching the course, and I find that the more times I teach it, the less traditional SEM I actually cover. I’m dedicating quite a bit of the first week to discussing principles of causal inference, spending the second week re-introducing regression as a modeling framework (rather than as a toolbox of canned statistical tests), and returning to causal inference later when we talk about path analysis and mediation (including assigning a formidable critique by John Bullock et al. coming out soon in JPSP).

The reason I’m moving in that direction is that I’ve found that a lot of students want to rush into questionable uses of SEM without understanding what they’re getting into. I’m probably guilty of having done that, and I’ll probably do it again someday, but I’d like to think I’m learning to be more cautious about the kinds of inferences I’m willing to make. To people who don’t know better, SEM often seems like magical fairy dust that you can sprinkle on cross-sectional observational data to turn it into something causally conclusive. I’ve probably been pretty far on the permissive end of the spectrum that Andrew Gelman talks about, in part because I think experimental social psychology sometimes overemphasizes internal validity to the exclusion of external validity (and I’m not talking about the special situations that Mook gets over-cited for). But I want to instill an appropriate level of caution.

BTW, I just came across this quote from Donald Campbell and William Shadish: “When it comes to causal inference from quasi-experiments, design rules, not statistics.” I’d considered writing “IT’S THE DESIGN, STUPID” on the board tomorrow morning, but they probably said it nicer.

When you have an interaction, which variable moderates which?

I was talking recently with a colleague about interpreting moderator effects, and the question came up: when you have a 2-way interaction between A and B, how do you decide whether to say that A moderates B versus B moderates A?

Mathematically, of course, A*B = B*A, so the underlying math is indifferent. I was schooled in the Baron and Kenny approach to moderation and mediation. I’ve never found any hard and fast rules in any of Kenny’s writing on the subject (if I’ve missed any, please let me know in the comments section). B&K talk about the moderator moderating the “focal” variable, and I’ve always taken that to be an interpretive choice by the researcher. If the researcher’s primary goal is to understand how A affects Y, and in the researcher’s mind B is some other interesting variable across which the A->Y relationship might vary, then B is the moderator. And vice versa. And to me, it’s entirely legitimate to talk about the same analysis in different ways — it’s a framing issue rather than a deep substantive issue.
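
A quick numerical sketch of the point (hypothetical data, plain numpy, all effect sizes invented for illustration): the fitted interaction coefficient is literally the same regression weight either way, and saying “B moderates A” versus “A moderates B” just corresponds to which set of simple slopes you choose to compute from the same fitted model.

```python
import numpy as np

# Simulate data with a known interaction (coefficients are made up).
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
y = 0.4 * a + 0.2 * b + 0.5 * a * b + rng.normal(size=n)

# Design matrix with intercept; the a*b and b*a columns are identical.
X = np.column_stack([np.ones(n), a, b, a * b])
b0, b_a, b_b, b_ab = np.linalg.lstsq(X, y, rcond=None)[0]

def slope_a_at(b_val):
    """'B moderates A' framing: simple slope of A at a given level of B."""
    return b_a + b_ab * b_val

def slope_b_at(a_val):
    """'A moderates B' framing: simple slope of B at a given level of A."""
    return b_b + b_ab * a_val

print(f"interaction coefficient: {b_ab:.2f}")
print(f"slope of A at B = -1, 0, +1: "
      f"{slope_a_at(-1):.2f}, {slope_a_at(0):.2f}, {slope_a_at(1):.2f}")
print(f"slope of B at A = -1, 0, +1: "
      f"{slope_b_at(-1):.2f}, {slope_b_at(0):.2f}, {slope_b_at(1):.2f}")
```

Both framings read off the same coefficients; the only thing that changes is which variable you condition on when describing the effect.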

However, my colleague has been trying to apply Kraemer et al.’s “MacArthur framework” and has been running into some problems. One of the MacArthur rules is that the variable you call the moderator (M) is the one that comes first, since (in their framework) the moderator always temporally precedes the treatment (T). But in my colleague’s study the ordering is not clear. (I believe that in my colleague’s study, the variables in question meet all of Kraemer’s other criteria for moderation — e.g., they’re uncorrelated — but they were measured at the same timepoint in a longitudinal study. Theoretically it’s not clear which one “would have” come first. Does it come down to which one came first in the questionnaire packet?)

I’ll admit that I’ve looked at Kraemer et al.’s writing on mediation/moderation a few times and it’s never quite resonated with me — they’re trying to make hard-and-fast rules for choosing between what, to me, seem like 2 legitimate alternative interpretations. (I also don’t really grok their argument that a significant interaction can sometimes be interpreted as mediation — unless it’s “mediated moderation” in Kenny-speak — but that’s a separate issue.) I’m curious how others deal with this issue…

When NOT to run a randomized experiment

Just came across a provocative article about the iatrogenic effects of self-help cognitive-behavioral therapy (CBT) books:

Self-help books based on the traditional principles of CBT, including popular titles like ‘CBT for Dummies’, can do more harm than good, according to a new study. The risks were highest for readers described as ‘high ruminators’ – those who spend time mulling over the likely causes and consequence of their negative moods.

The gist of the research (by Gerald Haeffel and colleagues) is that in some people’s hands — specifically, people prone to engage in rumination — self-guided CBT techniques can exacerbate depressive symptoms. In CBT, clients are often taught to pay attention to their negative thoughts so they can recognize and change them. But ruminators are already excessively focused on negative thoughts, which is why they are at higher risk for depression. Working from a book alone, without the help of a dedicated therapist, ruminators may be encouraged to ruminate even more, without acquiring the skills to take the next step of challenging and altering those thought patterns.

What’s interesting from a research-design perspective is that this finding comes from a study that crossed a randomized manipulation (giving people traditional CBT self-help books vs. 2 control conditions) with a person variable (individual differences in proneness to rumination) and found a meaningful statistical interaction. As such, it is able to identify a causal process that is stronger within a subset of the population.

What this design doesn’t tell us, though, is what the real-world effects will be. Experimental randomization means that high and low ruminators were equally likely to get the CBT books. In the real world we cannot assume this would be the case. If ruminators are more likely than non-ruminators to seek out these kinds of books — maybe they seek out books that are compatible with their existing cognitive tendencies — then the problem would be even worse than the experiment suggests. On the other hand, if ruminators are less likely to seek out CBT-based self-help books (maybe recognizing that the advice inside isn’t going to help them), then self-selection would mitigate the real-world effects.
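
A toy simulation makes the self-selection point concrete. Every number here is made up — the rumination base rate, the effect sizes, and the seeking probabilities are all hypothetical — but the logic is the one above: randomization makes exposure independent of rumination, while self-selection concentrates exposure among the people the book harms.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
ruminator = rng.random(n) < 0.3  # assume 30% are high ruminators (made up)

# Assumed effect of reading the book on depressive symptoms (positive = worse):
# it harms ruminators (+0.5) and helps non-ruminators a little (-0.3).
effect = np.where(ruminator, +0.5, -0.3)

# Randomized trial: exposure is independent of rumination.
random_exposed = rng.random(n) < 0.5
ate_randomized = effect[random_exposed].mean()

# Real world: suppose ruminators are more likely to seek out the book
# (70% vs. 30% — again, invented numbers).
seek = rng.random(n) < np.where(ruminator, 0.7, 0.3)
ate_selected = effect[seek].mean()

print(f"avg effect among randomized readers:    {ate_randomized:+.2f}")
print(f"avg effect among self-selected readers: {ate_selected:+.2f}")
```

With these invented numbers, the average effect among people who actually read the book flips from slightly beneficial under randomization to net harmful under self-selection, because readers are now disproportionately the ruminators the book hurts. Reversing the seeking probabilities would mitigate the harm instead.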

So a useful followup study to complement this work would be an observational design, in which high- and low-ruminators were allowed to select among books with and without the harmful CBT components, and you could model whether such self-selection mediates effects on depressive symptoms.