Does the replication debate have a diversity problem?

Folks who do not have a lot of experiences with systems that don’t work well for them find it hard to imagine that a well intentioned system can have ill effects. Not work as advertised for everyone. That is my default because that is my experience.
– Bashir, Advancing How Science is Done

A couple of months ago, a tenured white male professor* from an elite research university wrote a blog post about the importance of replicating priming effects, in which he exhorted priming researchers to “Nut up or shut up.”

Just today, a tenured white male professor* from an elite research university said that a tenured scientist who challenged the interpretation and dissemination of a failed replication is a Rosa Parks, “a powerless woman who decided to risk everything.”

Well then.

The current discussion over replicability and (more broadly) improving scientific integrity and rigor is a vitally important one. It is, at its core, a discussion about how scientists should do science. It therefore should include everybody who does science or has a stake in science.

Yet over the last year or so I have heard a number of remarks (largely in private) from scientists who are women, racial minorities, and members of other historically disempowered groups that they feel like the protagonists in this debate consist disproportionately of white men with tenure at elite institutions. Since the debate is over prescriptions for how science is to be done, it feels a little bit like the structurally powerful people shouting at each other and telling everybody else what to do.

By itself, that is enough to make people with a history of being disempowered wonder if they will be welcome to participate. And when the debate is salted with casually sexist language, and historically illiterate borrowing of other people’s oppression to further an argument — well, that’s going to hammer the point home.

This is not a call for tenured white men to step back from the conversation. Rather, it is a call to bring more people in. Those of us who are structurally powerful in various ways have a responsibility to make sure that people from all backgrounds, all career stages, and all kinds of institutions are actively included and feel safe and welcome to participate. Justice demands it. That’s enough for me, but if you need a bonus, consider that including people with personal experience seeing well-intentioned systems fail might actually produce a better outcome.

—–

* The tenured and professor parts I looked up. White and male I inferred from social presentation.

What is counterintuitive?

Simine Vazire has a great post contemplating how we should evaluate counterintuitive claims. For me that brings up the question: what do we mean when we say something is “counterintuitive?”

First, let me say what I think counterintuitive isn’t. The “intuitive” part points to the fact that when we label something counterintuitive, we are usually not talking about contradicting a formal, well-specified theory. For example, you probably wouldn’t say that the double-slit experiment was “counterintuitive;” you’d say it falsified classical mechanics.

In any science, though, you have areas of inquiry where there is not an existing theory that makes precise predictions. In social and personality psychology that is the majority of what we are studying. (But it’s true in other sciences too, probably more than we appreciate.) Beyond the reach of formal theory, scientists develop educated guesses, hunches, and speculations based on their knowledge and experience. So the “intuitive” in counterintuitive could refer to the intuitions of experts.

But in social and personality psychology we study phenomena that regular people reflect on and speculate about too. A connection to everyday lived experience is almost definitional to our field, whether you think it is something that we should actively pursue or just inevitably creeps in. So we have an extra source of intuitions – the intuitions of people who are living the phenomena that we study. Which includes ourselves, since social and personality psychologists are all human beings too.

And when you are talking about something that (a) people reflect on and wonder about and (b) is not already well settled, then chances are pretty good that people have had multiple, potentially contradictory ideas about it. Sometimes different people having different ideas; sometimes the same person having different ideas at different times. The contradictory ideas might even have made their way into cultural wisdom – like “birds of a feather flock together” versus “opposites attract.”

What I suspect that means is that “counterintuitive” is often just a rhetorical strategy for writing introduction sections and marketing our work. No matter how your results turned out, you can convince your audience that they once thought the opposite. Because chances are very good that they did. A skilled writer can exploit the same mechanisms that lead to hindsight bias to set people up, and then (surprise!) show them that the results went the other way.

I would not claim that this describes all instances of counterintuitive, but I think it describes a lot of them. As Simine points out, many people in psychology say that counterintuitive findings are more valuable — so clearly there is an incentive to frame things that way. (Counterintuitive framing is also a great way to sell a lot of books.)

Of course, it does not have to be that way. After all, we are the field that specializes in measuring and explaining people’s intuitions. Why don’t we ask our colleagues to back up their claims of being “counterintuitive” with data? Describe the procedure fully and neutrally to a group of people (experts or nonexperts, depending on whose intuitions you want to claim to be counter to) and ask what they think will happen. Milgram famously did that with his obedience experiments.

We should also revisit why we think “counterintuitive” is valuable. Sometimes it clearly is. For example, when intuition systematically leads people to make consequentially bad decisions, it can be important to document that and understand why. But being counterintuitive for counterintuitive’s sake? If intuitions vary widely — and so do results, across contexts and populations — then we run the risk that placing too much value on counterintuitive findings will do more to incentivize rhetorical flash than to reward substantive discoveries.

What did Malcolm Gladwell actually say about the 10,000 hour rule?

A new paper out in Intelligence, from a group of authors led by David Hambrick, is getting a lot of press coverage for having “debunked” the 10,000-hour rule discussed in Malcolm Gladwell’s book Outliers. The 10,000-hour rule is — well, actually, that’s the point of this post: Just what, exactly, is the 10,000-hour rule?

The debate in Intelligence is between Hambrick et al. and researcher K. Anders Ericsson, who studies deliberate practice and expert performance (and wrote a rejoinder to Hambrick et al. in the journal). But Malcolm Gladwell interpreted Ericsson’s work in a popular book and popularized the phrase “the 10,000-hour rule.” And most of the press coverage mentions Gladwell.

Moreover, Gladwell has been the subject of a lot of discussion lately about how he interprets research and presents his conclusions. The 10,000-hour rule has become a runaway meme — there’s even a Macklemore song about it. And if you google it, you’ll find a lot of people talking about it and trying to apply it to their lives. The interpretations aren’t always the same, suggesting there’s been some interpretive drift in what people think the 10,000-hour rule really is. I read Outliers shortly after it came out, but my memory of it has probably been shaped by all of that conversation that has happened since. So I decided it would be interesting to go back to the source and take another look at what Gladwell actually said.

“The 10,000-Hour Rule” is the title of a chapter in Outliers. It weaves together a bunch of stories of how people became wildly successful. The pivotal moment where Gladwell lays out his thesis, the nut graf if you will, is this:

“For almost a generation, psychologists around the world have been engaged in a spirited debate over a question that most of us would consider to have been settled years ago. The question is this: is there such a thing as innate talent? The obvious answer is yes. Not every hockey player born in January ends up playing at the professional level. Only some do—the innately talented ones. Achievement is talent plus preparation. The problem with this view is that the closer psychologists look at the careers of the gifted, the smaller the role innate talent seems to play and the bigger the role preparation seems to play.” (pp. 37-38)

This is classic Gladwell style — setting up the conventional wisdom and then knocking it down. You might think X, but I’m going to show you it’s really not-X. In this case, what is the X that you might think? That there is such a thing as talent and that it matters for success. And Gladwell is promising to challenge that view. Zoom in and it’s laid bare:

“Achievement is talent plus preparation. The problem with this view…”

Some Gladwell defenders have claimed he was just saying that talent isn’t enough by itself and preparation matters too. But that would be a pretty weak assertion for a bestselling book. I mean, who doesn’t think that violin prodigies or hockey players need to practice? And it is clear Gladwell is going for something more extreme than that. “Achievement is talent plus preparation” is not Gladwell’s thesis. To the contrary, that is the conventional wisdom that Gladwell is promising to overturn.

Gladwell then goes on to tell a bunch of stories of successful people who practiced a lot lot lot before they became successful. But that line of argument can only get you so far. Preparation and talent are not mutually exclusive. So saying “preparation matters” over and over really tells you nothing about whether talent matters too. And the difficulty for Gladwell is that, try as he might, he cannot avoid acknowledging a place for talent too. To deny that talent exists and matters would be absurd in the face of both common sense and hard data. And Gladwell can’t go that far:

“If we put the stories of hockey players and the Beatles and Bill Joy and Bill Gates together, I think we get a more complete picture of the path to success. Joy and Gates and the Beatles are all undeniably talented. Lennon and McCartney had a musical gift of the sort that comes along once in a generation, and Bill Joy, let us not forget, had a mind so quick that he was able to make up a complicated algorithm on the fly that left his professors in awe. That much is obvious.” (p. 55)

So “a more complete picture of the path to success” says that talent exists and it matters — a lot. It is actually a big deal if you have a “gift of the sort that comes along once in a generation.” So we are back to the conventional wisdom again: Achievement is talent plus preparation. Sure, Gladwell emphasizes the preparation piece in his storytelling. But that difference in emphasis tells us more about what is easier to narrate (nobody is ever going to make an ’80s-style montage about ACE models) than about which is actually the stronger cause. So after all the stories, it looks an awful lot like the 10,000-hour rule is just the conventional wisdom after all.

But wait! In the very next paragraph…

“But what truly distinguishes their histories is not their extraordinary talent but their extraordinary opportunities.” (p. 55)

“Opportunities” doesn’t sound like talent *or* preparation. What’s that about?

This, I think, has been missing from a lot of the popular discussion about the 10,000-hour rule. Narrowly, the 10,000-hour rule is about talent and preparation. But that overlooks the emphasis in Outliers on randomness and luck — being in the right place at the right time. So you might expand the formula: “Achievement is talent plus preparation plus luck.”

Only Gladwell wants his conclusion to be simpler than the conventional wisdom, not more complicated. So he tries to equate luck with preparation, or more precisely with the opportunity to prepare. Be born in the right era, live in the right place, and maybe you’ll get a chance to spend 10,000 hours getting good at something.

The problem with simplifying the formula rather than complicating it is that you miss important things. Gladwell’s point is that you need opportunities to prepare — you can’t become a computer whiz unless you have access to a computer to tinker with (10,000 hours worth of access, to be precise). He notes that a lot of wealthy and famous computer innovators, like Bill Gates, Paul Allen, and Steve Jobs, were born in 1954 or 1955. So when personal computing took off they were just the right age to get to mess around with computers: old enough to start businesses, young enough and unattached enough to have the time to sink into something new and uncertain. Gladwell concludes that the timing of your birth is a sort of cosmically random factor that affects whether you’ll be successful.

But not all opportunities are purely random — in many domains, opportunities are more likely to come to people who are talented or prepared or both. If you show some early potential and dedication to hockey or music, people are more likely to give you a hockey stick or a violin. Sure, you have to live in a time and place where hockey sticks or violins exist, but there’s more to it than that.

And let us not forget one of the most important ways that people end up in the right place at the right time: privilege (turns out Macklemore has a song about that too). That Gates, Allen, and Jobs were all born in 1954-55 may be random in some cosmic sense. But the fact that they are all white dudes from America suggests some sort of pattern, at least to me. Gladwell tells a story about how Bill Hewlett gave a young Steve Jobs spare computer parts to tinker with. The story is told like it’s a lucky opportunity for Jobs, and in a sense it is. But I wonder what would have happened if a poor kid from East Palo Alto had asked Hewlett for the same thing.

So now we are up to 4 things: talent, preparation, luck, and privilege. They all matter, they all affect each other, and I am sure we could add to the list. And you could go even deeper and start questioning the foundations of how we have carved up our list of variables (just what do we mean by “innate talent” anyway, and is it the same thing — innate in the same way — for everybody?). That would be an even more complete picture of the path to success. Not an easy story to tell, I know, but maybe a better one.

In which we admire a tiny p with complete seriousness

A while back a colleague forwarded me this quote from Stanley Schachter (yes, that Stanley Schachter):

“This is a difference which is significant at considerably better than the p < .0001 level of confidence. If, in reeling off these zeroes, we manage to create the impression of stringing pearls on a necklace, we rather hope the reader will be patient and forbearing, for it has been the very number of zeros after this decimal point that has compelled us to treat these data with complete seriousness.”

The quote comes from a chapter on birth order in Schachter’s 1959 book The Psychology of Affiliation. The analysis was a chi-square test on 76 subjects. The subjects were selected from 3 different experiments for being “truly anxious” and combined for this analysis. A subject was classified as “truly anxious” if they scored at one extreme or the other of an anxiety scale (complete denial and complete admission were both taken to indicate true anxiety), and/or if they discontinued participation because the experiment made them feel too anxious.
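To make the flavor of the analysis concrete, here is a minimal sketch of that kind of test in Python. The 2x2 table below is entirely hypothetical: Schachter reports the sample size (76) and the p-value, not these cell counts, and I am guessing at which variables were crossed (birth order by affiliation choice). The point is only to show how a chi-square on 76 people can yield a p-value with that many zeros after the decimal point.

```python
# A hypothetical illustration, NOT Schachter's data: a chi-square test on a
# 2x2 table of birth order by affiliation choice for 76 "truly anxious" subjects.
# The cell counts are invented so the total is 76 and the association is strong.
from scipy.stats import chi2_contingency

table = [[30,  8],   # firstborn / only children: wait together vs. wait alone
         [10, 28]]   # later-born children:       wait together vs. wait alone

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.6f}")   # p comes out well below .0001
```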

Let’s talk about diversity in personality psychology

In the latest issue of the ARP newsletter, Kelci Harris writes about diversity in ARP. You should read the whole thing. Here’s an excerpt:

Personality psychology should be intrinsically interesting to everyone, because, well, everyone has a personality. It’s accessible and that makes our research so fun and an easy thing to talk about with non-psychologists, that is, once we’ve explained to them what we actually do. However, despite what could be a universal appeal, our field is very homogenous. And that’s too bad, because diversity makes for better science. Good research comes from observations. You notice something about the world, and you wonder why that is. It’s probably reasonable to guess that most members of our field have experienced the world in a similar way due to their similar demographic backgrounds. This similarity in experience presents a problem for research because it makes us miss things. How can assumptions be challenged when no one realizes they are being made? What kind of questions will people from different backgrounds have that current researchers could never think of because they haven’t experienced the world in that way?

 In response, Laura Naumann posted a letter to the ARP Facebook wall. Read it too. Another excerpt:

I challenge our field to begin to view those who conduct this type of research [on underrepresented groups] as contributing work that is EQUAL TO and AS IMPORTANT AS “traditional” basic research in personality and social psychology. First, this will require editors of “broad impact” journals to take a critical eye to their initial review process in evaluating what manuscripts are worthy of being sent out to reviewers. I’ve experienced enough frustration sending a solid manuscript to a journal only to have it quickly returned praising the work, but suggesting resubmission to a specialty journal (e.g., ethnic minority journal du jour). The message I receive is that my work is not interesting enough for broad dissemination. If we want a more welcoming field on the personal level, we need to model a welcoming field at the editorial level.

This is a discussion we need to be having. Big applause to Kelci and Laura for speaking out.

Now, what should we be doing? Read what Kelci and Laura wrote — they both have good ideas.

I’ll add a much smaller one, which came up in a conversation on my Facebook wall: let’s collect data. My impressions of what ARP conferences look like are very similar to Kelci’s, but not all important forms of diversity are visible, and if we had hard data we wouldn’t have to rely on impressions. How are the members and conference attendees of ARP and other personality associations distributed by racial and ethnic groups, gender, sexual orientation, national origin, socioeconomic background, and other important dimensions? How do those break down by career stage? And if we collect data over time, is better representation moving up the career ladder, or is the pipeline leaking? I hope ARP will consider collecting this data as part of the membership and conference registration processes going forward, and releasing aggregate numbers. (Maybe they already collect this, but if so, I cannot recall ever seeing any report of it.) With data we will have a better handle on what we’re doing well and what we could be doing better.

What else should we be doing — big or small? This is a conversation that is long overdue and that everybody should be involved in. Let’s have it.

An interesting study of why unstructured interviews are so alluring

A while back I wrote about whether grad school admissions interviews are effective. Following up on that, Sam Gosling recently passed along an article by Dana, Dawes, and Peterson from the latest issue of Judgment and Decision Making:

Belief in the unstructured interview: The persistence of an illusion

Unstructured interviews are a ubiquitous tool for making screening decisions despite a vast literature suggesting that they have little validity. We sought to establish reasons why people might persist in the illusion that unstructured interviews are valid and what features about them actually lead to poor predictive accuracy. In three studies, we investigated the propensity for “sensemaking” – the ability for interviewers to make sense of virtually anything the interviewee says—and “dilution” – the tendency for available but non-diagnostic information to weaken the predictive value of quality information. In Study 1, participants predicted two fellow students’ semester GPAs from valid background information like prior GPA and, for one of them, an unstructured interview. In one condition, the interview was essentially nonsense in that the interviewee was actually answering questions using a random response system. Consistent with sensemaking, participants formed interview impressions just as confidently after getting random responses as they did after real responses. Consistent with dilution, interviews actually led participants to make worse predictions. Study 2 showed that watching a random interview, rather than personally conducting it, did little to mitigate sensemaking. Study 3 showed that participants believe unstructured interviews will help accuracy, so much so that they would rather have random interviews than no interview. People form confident impressions even when interviews are defined to be invalid, like our random interview, and these impressions can interfere with the use of valid information. Our simple recommendation for those making screening decisions is not to use them.

It’s an interesting study. In my experience people’s beliefs in unstructured interviews are pretty powerful — hard to shake even when you show them empirical evidence.

I did have some comments on the design and analyses:

1. In Studies 1 and 2, each subject made a prediction about absolute GPA for 1 interviewee. So estimates of how good people are at predicting GPA from interviews are based entirely on between-subjects comparisons. It is very likely that a substantial chunk of the variance in predictions will be due to perceiver variance — differences between subjects in their implicit assumptions about how GPA is distributed. (E.g., Subject 1 might assume most GPAs range from 3 to 4, whereas Subject 2 assumes most GPAs range from 2.3 to 3.3. So even if they have the same subjective impression of the same target — “this person’s going to do great this term” — their numerical predictions might differ by a lot.) That perceiver variance would go into the denominator as noise variance in this study, lowering the interviewers’ predictive validity correlations. (There is a small simulation sketch after this list that illustrates the point.)

Whether that’s a good thing or a bad thing depends on what situation you’re trying to generalize to. Perceiver variance would contribute to errors in judgment when each judge makes an absolute decision about a single target. On the other hand, in some cases perceivers make relative judgments about several targets, such as when an employer interviews several candidates and picks the best one. In that setting, perceiver variance would not matter, and a study with this design could underestimate accuracy.

2. Study 1 had 76 interviewers spread across 3 conditions (n = 25 or 26 per condition), and only 7 interviewees (each of whom was rated by multiple interviewers). Based on the 73 degrees of freedom reported for the test of the “dilution” effect, it looks like they treated interviewer as the unit of analysis but did not account for the dependency created by multiple interviewers rating the same interviewees. Study 2 looked to have similar issues (though in Study 2 the dilution effect was not significant).

3. I also had concerns about power and precision of the estimates. Any inferences about who makes better or worse predictions will depend a lot on variance among the 7 interviewees whose GPAs were being predicted (8 interviewees in Study 2). I haven’t done a formal power analysis, but my intuition is that 7 or 8 targets is not very many. You can see a possible sign of this in one key difference between the studies. In Study 1, the correlation between the interviewees’ prior GPA and upcoming GPA was r = .65, but in Study 2 it was r = .37. That’s a pretty big difference between estimates of a quantity that should not be changing between studies, which is just what you would expect when a correlation is computed over so few targets (the second part of the simulation sketch after this list makes the same point).
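To make points 1 and 3 a bit more concrete, here is a minimal simulation sketch in Python. It is not a reanalysis of Dana, Dawes, and Peterson’s data; the impression validity of .5, the unit variances, and the true correlation of .5 are all invented for illustration. The first part shows how judge-specific anchoring (perceiver variance) drags down a purely between-subjects validity correlation; the second shows how much sample correlations bounce around when computed across only 7 targets.

```python
# Two illustrative simulations (made-up parameters, not the paper's data).
import numpy as np

rng = np.random.default_rng(1)

# --- Point 1: perceiver variance attenuates a between-subjects validity r ---
n_judges = 100_000
true_gpa = rng.normal(0, 1, n_judges)                      # one target per judge
impression = 0.5 * true_gpa + rng.normal(0, 1, n_judges)   # imperfect impression
judge_anchor = rng.normal(0, 1, n_judges)                  # judge-specific GPA anchoring
prediction = impression + judge_anchor                     # the number the judge reports

print("validity without perceiver variance:",
      round(np.corrcoef(impression, true_gpa)[0, 1], 2))   # about .45
print("validity with perceiver variance:   ",
      round(np.corrcoef(prediction, true_gpa)[0, 1], 2))   # about .33

# --- Point 3: sample correlations are unstable with only 7 targets ---
n_targets, true_r, n_sims = 7, 0.5, 10_000
cov = [[1.0, true_r], [true_r, 1.0]]
rs = [np.corrcoef(*rng.multivariate_normal([0, 0], cov, n_targets).T)[0, 1]
      for _ in range(n_sims)]
print("middle 95% of sample r's (true r = .5, n = 7):",
      np.round(np.percentile(rs, [2.5, 97.5]), 2))
```

None of the particular numbers matter here; the qualitative pattern is the point.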

So it’s an interesting study but not one that can give answers I’d call definitive. If that’s well understood by readers of the study, I’m okay with that. Maybe someone will use the interesting ideas in this paper as a springboard for a larger followup. Given the ubiquity of unstructured interviews, it’s something we need to know more about.

The hotness-IQ tradeoff in academia

The other day I came across a blog post ranking academic fields by hotness. Important data for sure. But something about it was gnawing at me for a while, some connection I wasn’t quite making.

And then it hit me. The rankings looked an awful lot like another list I’d once seen of academic fields ranked by intelligence. Only, you know, upside-down.

Sure enough, when I ran the correlation among the fields that appear on both lists, it came out at r = -.45.

[Figure: hotness vs. intelligence across academic fields]
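For the curious, here is a minimal sketch of how that kind of cross-list correlation can be computed. The field names and rating values below are placeholders, not the actual numbers from either of the two ranking posts.

```python
# A toy example of correlating two field rankings. The values are placeholders,
# not the real hotness or intelligence numbers.
import numpy as np

hotness = {"Philosophy": 3.1, "Psychology": 2.7, "Economics": 2.2, "Mathematics": 1.9}
iq      = {"Philosophy": 128, "Psychology": 118, "Economics": 125, "Mathematics": 130}

fields = sorted(hotness.keys() & iq.keys())   # only fields appearing on both lists
x = [hotness[f] for f in fields]
y = [iq[f] for f in fields]

print("fields used:", fields)
print("r =", round(np.corrcoef(x, y)[0, 1], 2))
```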

I don’t know what this means, but it seems important. Maybe a mathematician or computer scientist can help me understand it.