On base rates and the “accuracy” of computerized Facebook gaydar

I never know what to make of reports stating the “accuracy” of some test or detection algorithm. Take this example, from a New York Times article by Steve Lohr titled How Privacy Vanishes Online:

In a class project at the Massachusetts Institute of Technology that received some attention last year, Carter Jernigan and Behram Mistree analyzed more than 4,000 Facebook profiles of students, including links to friends who said they were gay. The pair was able to predict, with 78 percent accuracy, whether a profile belonged to a gay male.

I have no idea what “78 percent accuracy” means in this context. The most obvious answer would seem to be that of all 4,000 profiles analyzed, 78% were correctly classified as gay versus not gay. But if that’s the case, I have an algorithm that beats the pants off of theirs. Are you ready for it?

Say that everybody is not gay.

Figure that around 5 to 10 percent of the population is gay. If these 4,000 students are representative of that, then saying not gay every time will yield an “accuracy” of 90-95%.

But wait — maybe by “accuracy” they mean what percentage of gay people are correctly identified as such. In that case, I have an algorithm that will be 100% accurate by that standard. Ready?

Say that everybody is gay.

You can see how silly this gets. To understand how good the test is, you need two numbers: sensitivity and specificity. My algorithms each turn out to be 100% on one and 0% on the other. Which means that they’re both crap. (A good test needs to be high on both.) I am hoping that the MIT class’s algorithm was a little better, and the useful numbers just didn’t get translated. But this news report tells us nothing that we need to know to evaluate it.
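To make that concrete, here is a minimal Python sketch of the two do-nothing "algorithms" above. The 10 percent base rate, the made-up population of 4,000, and the helper name `evaluate` are assumptions for illustration only; none of this comes from the MIT project.

```python
def evaluate(truth, predicted):
    """Return accuracy, sensitivity, and specificity for boolean labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(t and p for t, p in pairs)                # gay, called gay
    tn = sum((not t) and (not p) for t, p in pairs)    # not gay, called not gay
    fp = sum((not t) and p for t, p in pairs)          # not gay, called gay
    fn = sum(t and (not p) for t, p in pairs)          # gay, called not gay
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of gay profiles correctly flagged
        "specificity": tn / (tn + fp),   # share of non-gay profiles correctly cleared
    }

# Hypothetical population: 4,000 profiles with an assumed 10% base rate.
truth = [True] * 400 + [False] * 3600

nobody = evaluate(truth, [False] * 4000)     # "say that everybody is not gay"
everybody = evaluate(truth, [True] * 4000)   # "say that everybody is gay"

print(nobody)
print(everybody)
```

Under these assumptions the first rule scores 90% accuracy with 0% sensitivity, and the second scores 100% sensitivity with 0% specificity. A single "accuracy" figure cannot tell either of them apart from a test that actually works.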

Say it again

When students learn writing, they often are taught that if you have to say the same kind of thing more than once, word things in a slightly different way each time. The idea is to add interest through variety.

But when I work with psychology students on their writing, I often have to work hard to break them of that habit. In scientific writing, precision and clarity matter most. This doesn’t mean that scientific writing cannot also be elegant and interesting (the vary-the-wording strategy is often just a cheap trick anyhow). But your first priority is to make sure that your reader knows exactly what you mean.

Problems arise when journalists trained in vary-the-wording write about statistics. Small thing, but take this sentence from a Slate piece (in the oft-enlightening Explainer column) about the Fort Hood shooting:

Studies have shown that the suicide rate among male doctors is 40 percent higher than among men overall and that female doctors take their own lives at 130 percent the rate of women in general.

The same comparison is being made for men and for women: how does the suicide rate among doctors compare to the general population? But the numbers are not presented in parallel. For men, the number presented is 40, as in “40 percent higher than” men in general. For women, the number is 130, as in “130 percent the rate of” women in general.

The prepositions are the tipoff that the writer is doing different things, and a careful reader can probably figure that out. But the attempt to add variety just bogs things down. A reader will have to slow down and possibly re-read once or twice to figure out that 40% and 130% are both telling us that doctors commit suicide more often than others.
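Putting both numbers on the same scale takes one line of arithmetic each. Here is a small Python sketch; the function names are made up for illustration, and it takes the Slate sentence at face value.

```python
def pct_higher_to_pct_of(pct_higher):
    """'X percent higher than' -> 'Y percent the rate of'."""
    return 100 + pct_higher

def pct_of_to_pct_higher(pct_of):
    """'Y percent the rate of' -> 'X percent higher than'."""
    return pct_of - 100

male_pct_of = pct_higher_to_pct_of(40)          # male doctors, as "percent the rate of"
female_pct_higher = pct_of_to_pct_higher(130)   # female doctors, as "percent higher than"

print(f"Male doctors: 40 percent higher = {male_pct_of} percent the rate")
print(f"Female doctors: 130 percent the rate = {female_pct_higher} percent higher")
```

Stated in parallel, that is 40 percent higher for men and 30 percent higher for women (or 140 percent and 130 percent of the general rate), which is much easier to compare at a glance.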

Separately: why break it out by gender? In context, the writer is trying to make a point about doctors versus everybody else. Not male doctors versus female doctors. We often reflexively categorize things by gender (I’m using “we” in a society-wide sense) when it’s unnecessary and uninformative.

Improving the grant system ain’t so easy

Today’s NY Times has an article by Gina Kolata about how the National Cancer Institute plays it safe with grant funding. The main point of the article is that NCI funds too many “safe” studies — studies that promise a high probability of making a modest, incremental discovery. This is done at the expense of more speculative and exploratory studies that take bigger risks but could lead to greater leaps in knowledge.

The article, and by and large the commenters on it, seem to assume that things would be better if the NCI funded more high-risk research. Missing is any analysis of what might be the downsides of adopting such a strategy.

By definition, a high-risk proposal has a lower probability of producing usable results. (That’s what people mean by “risk” in this context.) So for every big breakthrough, you’d be funding a larger number of dead ends. That raises three problems: a substantive policy problem, a practical problem, and a political problem.

1. The substantive problem is in knowing what would be the net effect of changing the system. If you change the system so that you invest grant dollars in research that pays off half as often, but when it does the findings are twice as valuable, it’s a wash — you haven’t made things better or worse overall. So it’s a problem of adjusting the system to optimize the risk X reward payoffs. I’m not saying the current situation is optimal; but nobody is presenting any serious analysis of whether an alternative investment strategy would be better.

2. The practical problem is that we would have to find some way to choose among high-risk studies. The problem everybody is pointing to is that in the current system, scientists have to present preliminary studies, stick to incremental variations on well-established paradigms, reassure grant panels that their proposal is going to pay off, etc. Suppose we move away from that… how would you choose amongst all the riskier proposals?

People like to point to historical breakthroughs that never would have been funded by a play-it-safe NCI. But it may be a mistake to believe those studies would have been funded by a take-a-risk NCI, because we have the benefit of hindsight and a great deal of forgetting. Before the research was carried out — i.e., at the time it would have been a grant proposal — every one of those would-be-breakthrough proposals would have looked just as promising as a dozen of their contemporaries that turned out to be dead-ends and are now lost to history. So it’s not at all clear that all of those breakthroughs would have been funded within a system that took bigger risks, because they would have been competing against an even larger pool of equally (un)promising high-risk ideas.

3. The political problem is that even if we could solve #1 and #2, we as a society would have to have the stomach for putting up with a lot of research that produces no meaningful results. The scientific community, politicians, and the general public would have to be willing to constantly remind themselves that scientific dead ends are not a “waste” of research dollars — they are the inevitable consequence of taking risks. There would surely be resistance, especially at the political level.
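The "half as often, twice as valuable" arithmetic from point 1, and the pile of dead ends from point 3, can be sketched in a few lines of Python. Every number here is hypothetical (100 grants, a 50% versus 25% success rate, payoffs of 1 versus 2 arbitrary units), chosen only to match the wording above.

```python
def portfolio(n_grants, p_success, value_per_success):
    """Expected payoff and expected number of dead ends for a set of grants."""
    expected_successes = n_grants * p_success
    return {
        "expected_value": expected_successes * value_per_success,
        "expected_dead_ends": n_grants - expected_successes,
    }

# Hypothetical numbers: risky studies pay off half as often but are worth twice as much.
safe = portfolio(n_grants=100, p_success=0.50, value_per_success=1.0)
risky = portfolio(n_grants=100, p_success=0.25, value_per_success=2.0)

print(safe)
print(risky)
```

With these made-up inputs both strategies have the same expected value (50 units), but the risky one produces 75 expected dead ends instead of 50. Whether a real high-risk portfolio would come out ahead depends on the actual probabilities and payoffs, which is exactly the analysis nobody is presenting.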

So what’s the solution? I’m sure there could be some improvements made within the current system, especially in getting review panels and program officers to reorient to higher-risk studies. But I think the bigger issue has to do with the overall amount of money available. As the top-rated commenter on Kolata’s article points out, the FY 2010 defense appropriation is more than 6 times what we have spent at NCI since Nixon declared a “war” on cancer 38 years ago. If you make resources scarce, of course you’re going to make people cautious about how they invest those resources. There’s a reason angel investors are invariably multimillionaires. If you want to inspire the scientific equivalent of angel investing, then the people giving out the money are going to have to feel like they’ve got enough money to take risks with.

Taking aim at evolutionary psychology

Sharon Begley has a doozy of an article in Newsweek taking aim at evolutionary psychology. The article is a real mixed bag and is already starting to generate vigorous rebuttals.

As background, the term “evolutionary psychology” tends to confuse outsiders because it sounds like a catchall for any approach to psychology that incorporates evolutionary theory and principles. But that’s not how it’s used by insiders. Rather, evolutionary psychology (EP) refers to one specific way (of many) of thinking about evolution and human behavior. (This article by Eric Alden Smith contrasts EP with other evolutionary approaches.) EP can be differentiated from other evolutionary approaches on at least 3 different levels. There are the core scientific propositions, assumptions, and methods that EPs use. There are the particular topics and conclusions that EP has most commonly been associated with. And there is a layer of politics and extra-scientific discourse regarding how EP is discussed and interpreted by its proponents, its critics, and the media.

Begley makes clear that EP is not the only way of applying evolutionary principles to understanding human behavior. (In particular, she contrasts it with human behavioral ecology.) Thus, hopefully most readers won’t take this as a ding on evolutionary theory broadly speaking. But unfortunately, she cherrypicks her examples and conflates the controversies at different levels, something that I suspect is going to drive the EP folks nuts.

At the core scientific level, one of the fundamental debates is over modularity versus flexibility. EP posits that the ancestral environment presented our forebears with specific adaptive problems that were repeated over multiple generations, and as a result we evolved specialized cognitive modules that help us solve those problems. Leda Cosmides’s work on cheater detection is an example of this — she has proposed that humans have specialized cognitive mechanisms for detecting when somebody isn’t holding up their obligations in a social exchange. Critics of EP argue that our ancestors faced a wide and unpredictable range of adaptive problems, and as a result our minds are more flexible — for example they say that we detect cheaters by applying a general capacity for reasoning, not through specialized cheater-detecting skills. This is an important, serious scientific debate with broad implications.

Begley discusses the modularity versus flexibility debate — and if her article stuck to the deep scientific issues, it could be a great piece of science journalism. But it is telling what topics and examples she uses to flesh out her arguments. Cosmides’s work on cheater detection would have been a great topic to focus on: Cosmides has found support across multiple methods and levels of analysis, and at the same time critics like David Buller have presented serious challenges. That could have made for a thoughtful but still dramatic presentation. But Begley never mentions cheater detection. Instead, she picks examples of proposed adaptations that (a) have icky overtones, like rape or the abuse of stepchildren; and (b) do not have widespread support even among EPs. (Daly and Wilson, the researchers who originally suggested that stepchild abuse might be an adaptation, no longer believe that the evidence supports that conclusion.) Begley wants to leave readers with the impression that EP claims are falling apart left and right because of fundamental flaws in the underlying principles (as opposed to narrower instances of particular arguments or evidence falling through). To make her case, she cherrypicks the weakest and most controversial claims. She never mentions less-controversial EP research on topics like decision-making, emotions, group dynamics, etc.

Probably the ugliest part of the article is the way that Begley worms ad hominem attacks into her treatment of the science, and then accuses EPs of changing topics when they defend themselves. A major point of Begley’s is that EP is used to justify horrific behavior like infidelity, rape, and child abuse. Maybe the findings are sometimes used that way — but in my experience that is almost never done by the scientists themselves, who are well aware of the difference between “is” and “ought.” (If Begley wants to call somebody out on committing the naturalistic fallacy, she should be taking aim at mass media, not science.) Begley also seems to play a rhetorical “I’m not touching you” baiting game. Introducing EP research on jealousy she writes, “Let’s not speculate on the motives that (mostly male) evolutionary psychologists might have in asserting that their wives are programmed to not really care if they sleep around…” Then amazingly a few paragraphs later she writes, “Evolutionary psychologists have moved the battle from science, where they are on shaky ground, to ideology, where bluster and name-calling can be quite successful.” Whahuh? Who’s moving what battle now?

The whole thing is really unfortunate, because evolutionary psychology deserves serious attention by serious science journalists (which Begley can sometimes be). David Buller’s critique a few years ago raised some provocative challenges and earned equally sharp rebuttals, and the back-and-forth continues to reverberate. That makes for a potentially gripping story. And EP claims frequently get breathless coverage and oversimplified interpretations in the mass media, so a nuanced and thoughtful treatment of the science (with maybe a little media criticism thrown in) would play a needed corrective role. I’m no EP partisan: I tend to take EP on a claim-by-claim basis, and I find the evidence for some EP conclusions to be compelling and others poorly supported. I just wish the public were getting a more informative and more scientifically grounded view of the facts and controversies.

Newsflash: TV news sucks. Film at 11.

A study of medical news reporting in Australian media has reached the following conclusions:

  • In general, news outlets don’t do a great job of reporting medical research.
  • “Broadsheet” newspapers (vs. tabloids; or what we in America call “newspapers, you know, but not the crappy kind”) do relatively better than other media formats, with 58% of stories being considered satisfactory.
  • Online news sites lag behind print media but are catching up.
  • TV news does the worst job.

Oh, that explains it

A new study by Timothy Salthouse adds to the body of work suggesting that raw cognitive performance begins to decline in early adulthood.

News reports are presenting the basic age pattern as a new finding. It’s not, or at least it’s not new in the way it’s being portrayed. The idea that fluid intelligence peaks in the 20s and then declines has been around for a while. I remember learning it as an undergrad. I teach it in my Intro classes.

So why is a new study being published? Because the research, reported in Neurobiology of Aging, tries to tease apart some thorny methodological problems in estimating how mental abilities change with age.

If you simply compare different people of different ages (a cross-sectional design), you don’t know if the differences are because of what happens to people as they get older, or instead because of cohort effects (i.e., generational differences). In other words, maybe members of more recent generations do better at these tasks by virtue of better schooling, better early nutrition, or something like that. In that case, apparent differences between old people and young people might have nothing to do with the process of getting older per se.

To avoid cohort effects, you could follow the same people over time (a longitudinal design). However, if you do that you have to worry about something else — practice effects. The broad underlying ability may be declining, but people might be getting “test-smart” if you give them the same (or similar) tests again and again, which would mask any true underlying decline.
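A toy model shows how the two designs can miss in opposite directions. This is only a sketch with invented numbers (a true decline of 0.5 points per year, a cohort advantage of 0.3 points per later birth year, and a practice gain of 0.2 points per retest); it is not Salthouse's method or his data.

```python
AGING = -0.5     # true change in score per year of age (hypothetical)
COHORT = 0.3     # advantage per later birth year (hypothetical)
PRACTICE = 0.2   # gain per previous testing session (hypothetical)

def score(age, birth_year, prior_tests):
    """Toy test score: baseline plus aging, cohort, and practice components."""
    return (100
            + AGING * (age - 20)
            + COHORT * (birth_year - 1950)
            + PRACTICE * prior_tests)

# Cross-sectional: a 50-year-old (born 1950) and a 30-year-old (born 1970),
# each tested once in the same year.
cross_slope = (score(50, 1950, 0) - score(30, 1970, 0)) / 20

# Longitudinal: one person born in 1970, retested every year from age 30 to 50.
long_slope = (score(50, 1970, 20) - score(30, 1970, 0)) / 20

print(f"true slope:            {AGING:.1f} per year")
print(f"cross-sectional slope: {cross_slope:.1f} per year")
print(f"longitudinal slope:    {long_slope:.1f} per year")
```

With these assumed values the cross-sectional comparison overstates the decline (about -0.8 per year) because the cohort difference is mixed in with aging, while the longitudinal comparison understates it (about -0.3 per year) because practice gains mask part of the drop.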

As a result of different findings obtained with different methods, there was a majority view among researchers that fluid performance starts to decline in early adulthood, but also a significant minority view that declines happen later.

What Salthouse did was to look at cross-sectional and longitudinal data side-by-side in a way that allowed him to estimate the age trajectory after accounting for both kinds of biases. In principle, this should yield more precise estimates than previous studies about the particular shape of the trend. Based on the combined data, Salthouse concluded that the early-adulthood peak was more consistent with the evidence.

It’s understandable, but unfortunate, that the media coverage isn’t going into this level of nuance. Science is incremental, and this study is a significant contribution (though by no means the last word). But news stories often have a set narrative – the lone scientist having a “eureka!” moment with a shattering breakthrough that “proves” his theory. Science doesn’t work that way, but that’s the way it’s usually covered.