CITI is still misrepresenting Milgram’s obedience research

Unbelievable.

Two years ago, after taking the Collaborative Institutional Training Initiative (CITI) online ethics training course required by my institution, I wrote to them to object to the way Milgram’s obedience research was characterized in their “History and Ethics” module. Short version: they compared Milgram’s research to Nazi medical experiments and the Tuskegee syphilis study.

In response to my email, I got a very nice-sounding reply from a CITI staff member. Quoting from her email:

I agree with you.

The module was adapted from a module written for biomedical researchers. When it was adopted, in order to make it more relevant for researchers in the social and behavioral sciences, the writers simply added cases that seemed more relevant. The important distinctions you note were not made.

I would like to revise the module completely and the only obstacle right now is time.  I will see if I can get some minor changes approved, in the meantime, that will address your issues. One simple solution might be to change the introductory language to the case studies and remove the word scandals where it is not appropriate.

But yesterday, when I took my required CITI refresher course, I discovered that the promise was an empty one. Not a word has been changed.

So I’m back to writing to them. Below is my latest email. Below it, I have quoted the objectionable text from the CITI course.

Dear CITI staff:

Two years ago, I wrote to you to object to the way that Stanley Milgram’s obedience research, and other behavioral science research, was portrayed in your “History and Ethics” module. That email is appended below.

The crux of my objection was that CITI mischaracterized Milgram’s research as unethical and drew parallels to Nazi medical experiments and the Tuskegee syphilis study. To the contrary, Milgram’s obedience research was conducted ethically (in fact, it was replicated with IRB approval just a few years ago). It is indeed relevant to contemporary research ethics — not as an exemplar of harmful research, but because of what it teaches us about how research subjects may respond to scientific and institutional authority.

In response, I received a message from Lorna Hicks (appended below) in which she stated that Paul Braunschweiger had forwarded her my email. She stated quite bluntly, “I agree with you.” She assured me that CITI would update its materials. At that time, I was pleased both with CITI’s prompt responsiveness to feedback and with the specific substance of the reply.

So perhaps you can imagine my surprise and dismay when I sat down to take the CITI refresher yesterday — two years later — and discovered that Milgram and several other behavioral studies are still being described as “similar events” to Nazi war crimes and the Tuskegee syphilis study. Despite your assurances made two years ago, the module has not been changed to remove the objectionable comparisons.

So once again, I am writing to you to strongly object to your characterization of Milgram’s obedience research. You are doing a disservice to the legacy of an important body of behavioral science research, and you have continued to do so for several years despite agreeing that it was wrong and promising to stop.

Sincerely,
Sanjay Srivastava

Here is the text of the CITI “History and Ethics” module that I objected to, as it appeared both in 2009 and 2011. I have quoted at length to provide the full context, so you can see the comparison for yourself. Boldface emphases have been added by me.

The development of the regulations to protect human subjects were driven by scandals in both biomedical and the social/behavioral research, and as such reflect social concerns regarding research involving human subjects including:

* The importance of meeting the requirements of basic ethical principles underlying the involvement of humans as research subjects

* The need for independent, objective review of research

* The need to preserve the public trust in research involving human subjects

1.0 Historical Development

The events that led up to the development of the currently regulatory system occurred in both biomedical and social/behavioral research.

1.1 Events in Biomedical Research

Attention to the ethics of human subjects research first received wide-spread attention after WWII with revelations of the Nazi “research” which led to the Nuremburg Code, a statement of ethical principles of human experimentation. In 1964, the World Medical Association developed a code of research ethics that came to be known as the Declaration of Helsinki. It was a reinterpretation of the Nuremberg Code, with an eye to medical research with therapeutic intent. In 1966, Dr. Henry K. Beecher, an anesthesiologist, wrote an article (Beecher HK. “Ethics and Clinical Research” NEJM June 16, 1966) describing 22 examples of research studies with controversial ethics that had been conducted by reputable researchers and published in major journals. Beecher’s article played an important role in heightening the awareness of researchers, the public, and the press to the problem of unethical human subjects research.

One of the seminal events in the development of the current regulatory environment was the Public Health Service (PHS) Syphilis Study (1932 – 1972), the so-called “Tuskeegee Syphilis Study”. Initiated and funded by the PHS, this study was designed to document the natural history of syphilis in African-American men. Hundreds of poor, African-American men with syphilis were enrolled into the study. The men were recruited without informed consent and were deliberately misinformed about the need for some of the procedures. This longitudinal study lasted over 40 years until newspaper reports forced the US government to terminate the study. For more information follow the link to the PHS Syphilis Study.

1.2 Events in Social & Behavioral Research

Events contributing to the development of the current regulatory system were not limited to biomedical research; during the same period there were several similar events in the social and behavioral sciences:

* The Wichita Jury Case (1953) where researchers tape recorded jurors’ deliberations in six cases to measure influence of attorney comments on decision making. The research was conducted with knowledge of the judge and attorneys, but not jurors.

* The Milgram “Obedience to Authority”(1963) studies which were conducted to determine how far subjects would go in administering seemingly severe electric “shock” as directed/instructed by an authority figure (to continue when the experimenter) to another subject (a confederate) even when the latter subject appeared to be in extreme pain but continued to fail test questions.

* Humphreys “Tearoom Trade” study (1970), which involved the observation of men engaged in sex acts in restrooms, secretly following them to their cars, transcribing license plate numbers, tracking them through DMV records to their homes and interviewing them about personal issues.

* The Zimbardo “Simulated Prison” (1973) research, which involved assigning roles to male student volunteers as “prisoners” and “guards”. The research became so intense as physical and psychological abuse of “prisoners” by “guards” escalated, that the researcher stopped the experiment/simulation after six days. See Dr. Zimbardo’s web site for more details on this study.

UPDATE (7/8/2011): I heard back from CITI.

Why does an IRB need an analysis plan?

My IRB has updated its forms since the last time I submitted an application, and I just saw this section, which I think is new (emphasis added by me):

Analysis: Explain how the data will be analyzed or studied (i.e. quantitatively or qualitatively and what statistical tests you plan on using). Explain how the interpretation will address the research questions. (Attach a copy of the data collection instruments).

What statistical tests I plan on using?

My first thought was “mission creep,” but I want to keep an open mind. Are there some statistical tests that are more likely to do harm to the human subjects who provided the data? Has anybody ever been given syphilis by a chi-square test? If I do a median split, am I damaging anything more than my own credibility? (“What if there are an odd number of subjects? Are you going to have to saw a subject in half?”)

Seriously though, is there something I’m missing?

Milgram is not Tuskegee

My IRB requires me to take a course on human subjects research every couple of years. The course, offered by the Collaborative Institutional Training Initiative (CITI), mostly deals with details of federal research regulations covering human subjects research.

However, the first module is titled “History and Ethics” and purports to give an overview and background of why such regulations exist. It contains several historical inaccuracies and distortions, including attempts to equate the Milgram obedience studies with Nazi medical experiments and the Tuskegee syphilis study. I just sent the following letter to the CITI co-founders in the hopes that they will correct their presentation:

* * *

Dear Dr. Braunschweiger and Ms. Hansen:

I just completed the CITI course, which is mandated by my IRB. I am writing to strongly object to the way the research of Stanley Milgram and others was presented in the “History and Ethics” module.

The module begins by stating that modern regulations “were driven by scandals in both biomedical and social/behavioral research.” It goes on to list events whose “aftermath” led to the formation of the modern IRB system. The subsection for biomedical research lists Nazi medical experiments and the PHS Tuskegee Syphilis study. The subsection for social/behavioral research lists what it calls “similar events,” including the Milgram obedience experiments, the Zimbardo/Stanford prison experiment, and several others.

The course makes no attempt to distinguish among the reasons why the various studies are relevant. They are all called “scandals,” described as “similar,” and presented in parallel. This is severely misleading.

Clearly, the Nazi experiments are morally abhorrent on their face. The Tuskegee study was also deeply unethical by modern standards and, most would argue, even by the standards of its day: it involved no informed consent, and after the discovery that penicillin was an effective treatment for syphilis, continuation of the experiment meant withholding a life-saving medical treatment.

But Milgram’s studies of obedience to authority are a much different case. His research predated the establishment of modern IRBs, but even by modern standards it was an ethical experiment, as the societal benefits from knowledge gained are a strong justification for the use of deception. Indeed, just this year a replication of Milgram’s study was published in the American Psychologist, the flagship journal of the American Psychological Association. The researcher, Jerry M. Burger of Santa Clara University, received permission from his IRB to conduct the replication. He made some adjustments to add further safeguards beyond what Milgram did — but these adjustments were only possible by knowing, in hindsight, the outcome of Milgram’s original experiments. (See: http://www.apa.org/journals/releases/amp641-1.pdf)

Thus, Tuskegee and Milgram are both relevant to modern thinking about research ethics, but for completely different reasons. Tuskegee is an example of a deeply flawed study that violated numerous ethical principles. By contrast, Milgram was an ethically sound study whose relevance to modern researchers is in the substance of its findings — to wit, that research subjects are more vulnerable than we might think to the influence of scientific and institutional authority. Yet in spite of these clear differences, the CITI course calls them all “scandals” and presents them in parallel, and alongside other ethically questionable studies, implying that they are all relevant in the same way.

(The parallelism implied with other studies on the list is problematic as well. Take for example the Stanford prison experiment. It would arguably not be approved by a modern IRB. But an important part of its modern relevance is that the researchers discontinued the study when they realized it was harming subjects — anticipating a central tenet of modern research ethics. This is in stark contrast to Tuskegee, where even after an effective treatment for syphilis was discovered, the researchers continued the study and never intervened on behalf of the subjects.)

In conclusion, I strongly urge you to revise your course. It appears that the module is trying to get across the point that biomedical research and social/behavioral research both require ethical standards and regulation — which is certainly true. But the histories, relevant issues, and ramifications are not the same. The attempt to create some sort of parallelism in the presentation (Tuskegee = Milgram? Nazis = Zimbardo?) is inaccurate and misguided, and does a disservice to the legacy of important social/behavioral research.

Sincerely,
Sanjay Srivastava

UPDATE: I got a response a day after I sent the letter. See this post: A very encouraging reply.

UPDATE 7/6/2011: Scratch that. Two years later, they haven’t changed a thing.

The perverse incentive structure of IRBs

As a researcher at a university, all of my human subjects research has to go through my university’s IRB. I believe that IRBs have an important role in research. However, in practice I sometimes find dealing with an IRB to be frustrating.

Pretty much all of the research that I do is very low risk. Yet I have to go through a review system that was invented as a response to Nazi medical experiments and other horrific incidents half a century ago. You might think that should make my behavioral research easier to get approved — I could just say, “hey, guess what, I’m not secretly giving people syphilis or anything” and get the thumbs-up. Sadly, though, it doesn’t work like that. Even when I have a study that is eligible for expedited review, there is a heck of a lot of paperwork to fill out, and time to wait, and often pointless revisions to make — all in order to do something as simple as asking people a few questions about what kind of day they had yesterday.

So why are university IRBs so inefficient? There are a number of reasons, but I believe that one of the core problems is that the system is built on a foundation of perverse incentives for the IRB.

The IRB’s task can be thought of like a signal detection problem. Simplifying a little bit, you can think of the protocols that researchers submit as being either worthy or unworthy. For any given protocol, the IRB has to decide to approve or reject. So there are two kinds of correct decisions (approve a worthy protocol or reject an unworthy one) and two kinds of mistaken decisions (reject a worthy protocol or approve an unworthy one). And the big problem is that the IRB’s potential costs associated with the two different kinds of mistakes are severely imbalanced.

If the IRB mistakenly rejects a worthy protocol, what is the worst thing that could happen? The investigator might make a phone call and resubmit the application, taking up some extra staff time, but the IRB will not get into any serious trouble. And the costs of this mistake are chiefly borne by the researcher, not the IRB. Furthermore, within a university, there is no appeals process or oversight authority empowered to act on a rejected protocol.

By contrast, if the IRB mistakenly approves an unworthy protocol, all kinds of bad things could happen. Even if no subjects are harmed, an audit could turn up the mistake and the IRB could get in trouble. And in more serious cases — if subjects do get exposed to inappropriate risks, or actually get harmed — things can get much, much worse. The IRB could get shut down (halting all research at the university), the professional IRB staff could get fired, and the university could get sued by the harmed subjects.

These asymmetric incentives mean that IRBs have a very strong incentive to err on the side of rejecting too much research. So it’s no wonder that the process is so slow and clunky, and even simple low-risk protocols are routinely sent back for revisions. The staff at my IRB are good people who want to help researchers when they can. But the actual review board members are often people with no personal stake in seeing that research gets done efficiently, and some have no formal science training at all (which can lead them to imagine harmful effects of research that have no basis in reality). And for both the paid staff and the board members, even those with the best intentions work within an incentive structure that is completely out of whack.
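To make the signal detection framing concrete, here is a rough sketch of how lopsided costs move the approval bar. Everything in it is an assumption for illustration: the function names are hypothetical, the cost numbers are made up, and I am pretending the IRB can put a single probability on whether a protocol is worthy. If p is that probability, approving has an expected cost of (1 - p) times the cost of a mistaken approval, and rejecting has an expected cost of p times the cost of a mistaken rejection. An IRB that only minimizes its own expected cost approves when p exceeds the break-even point computed below.

```python
def approval_threshold(cost_false_reject: float, cost_false_approve: float) -> float:
    """Smallest P(worthy) at which approving costs the IRB less, in expectation, than rejecting.

    Expected cost of approving: (1 - p) * cost_false_approve
    Expected cost of rejecting: p * cost_false_reject
    Setting the two equal and solving for p gives the break-even point.
    """
    return cost_false_approve / (cost_false_approve + cost_false_reject)


def irb_decision(p_worthy: float, cost_false_reject: float, cost_false_approve: float) -> str:
    """Decision of a hypothetical IRB that only minimizes its own expected cost."""
    threshold = approval_threshold(cost_false_reject, cost_false_approve)
    return "approve" if p_worthy > threshold else "reject"


# Hypothetical cost units, chosen only for illustration.
balanced = approval_threshold(cost_false_reject=1, cost_false_approve=1)
lopsided = approval_threshold(cost_false_reject=1, cost_false_approve=99)

print(balanced)  # 0.5
print(lopsided)  # 0.99

# A protocol that is 90% likely to be worthy still gets rejected
# when a mistaken approval is 99 times as costly to the IRB.
print(irb_decision(0.9, cost_false_reject=1, cost_false_approve=99))  # reject
```

With balanced costs the bar sits at 50%. Once a mistaken approval is far more costly to the IRB than a mistaken rejection, the bar climbs toward certainty, which is one way to read the routine requests for revisions on low-risk protocols.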

So a big part of me was outraged (and a tiny, naughty part of me jealous) to learn that in commercial medical settings, the IRB incentives are out of whack too, but in the opposite direction. If you are a researcher at a private, for-profit research company, you get approval for your research by paying a commercial IRB to review it. It doesn’t take a genius to look at this setup and figure out that a commercial IRB that approves lots of research is going to be popular with its customer base. So it was probably just a matter of time before a scandal erupted. And now one has.

In a test of the commercial IRB system, the Government Accountability Office submitted a fake protocol to 3 different commercial IRBs. The protocol was rigged to be full of unsafe, high-risk elements. And apparently one of the companies, Coast IRB, fell for the sting, deeming the protocol safe and low-risk and giving it approval. Upon further investigation from the GAO, it turns out that Coast has not rejected a single proposal in the last 5 years, and it made over $9 million last year. Hmmm…

In the aftermath of this incident, it is very likely that attention is again going to get focused where it always gets focused: on the possibility that IRBs might be approving bad, unsafe research. But such a focus may be misguided. The case of Coast IRB shows that even commercial IRBs face very serious costs when they get caught approving bad research. The company has just seen its entire $9-mil-a-year business evaporate while it undergoes an audit. Employees may lose their jobs. Owners may lose profits and see their shares lose value. The entire company could go out of business.

Instead, the problem with both university and commercial IRBs is on the approval side: the system does not present the right level of incentives for approving worthy research. In the university IRB case, the incentive is too low. And in the commercial IRB case, it’s too high. Hypothetically speaking, even if somebody at a Coast IRB kind of place knew the potential costs of getting caught approving bad research, in a rational cost-benefit analysis those potential costs would have been balanced against a multimillion-dollar revenue stream that depended on them approving lots of protocols, good and bad.

So what will happen next? If you are a member of Congress and you want to fix commercial IRBs, you could alter the cost-benefit balance on either side. That is, you could either diminish the profit motive associated with approving research, or you could make it even more costly for a company to mistakenly approve bad research. The problem is that any new regulatory policy designed to fix commercial IRBs could very well affect university IRBs as well, since both kinds of IRBs fall under many of the same regulations. And if you raise the costs and punishments associated with approving bad research (or institute even more intrusive regulations and oversight to try to prevent such approvals from happening), you will make the perverse incentives at universities even more perverse.

Personally, I think it’s at least a little bit weird that IRBs, institutions designed to safeguard the interests of research subjects, can be run as for-profit businesses whose very financial existence depends upon those they are supposed to watch. If Congress wants to fix the system in the commercial medical industry, they need to look at the fundamental question of whether that is a sustainable model, and narrowly tailor any changes to apply to commercial IRBs. The answer is most definitely not to create more intrusive oversight or threaten punishments across the board. Let’s hope that is not the direction they choose to go.