Changing software to nudge researchers toward better data analysis practice
The tools we have available to us affect the way we interact with and even think about the world. “If all you have is a hammer” etc. Along these lines, I’ve been wondering what would happen if the makers of data analysis software like SPSS, SAS, etc. changed some of the defaults and options. Sort of in the spirit of Nudge — don’t necessarily change the list of what is ultimately possible to do, but make changes to make some things easier and other things harder (like via defaults and options).
Would people think about their data differently? Here’s my list of how I might change regression procedures, and what I think these changes might do:
1. Let users write common transformations of variables directly into the syntax. Things like centering, z-scoring, log-transforming, multiplying variables into interactions, etc. This is already part of some packages (it’s easy to do in R), but not others. In particular, running interactions in SPSS is a huge royal pain. For example, to do a simple 2-way interaction with centered variables, you have to write all this crap *and* cycle back and forth between the code and the output along the way:
desc x1 x2. * Run just the above, then look at the output and see what the means are, then edit the code below. compute x1_c = x1 - [whatever the mean was]. compute x2_c = x2 - [whatever the mean was]. compute x1x2 = x1_c*x2_c. regression /dependent y /enter x1_c x2_c x1x2.
Why shouldn’t we be able to do it all in one line like this?
regression /dependent y /enter center(x1) center(x2) center(x1)*center(x2).
The nudge: If it were easy to write everything into a single command, maybe more people would look at interactions more often. And maybe they’d stop doing median splits and then jamming everything into an ANOVA!
2. By default, the output shows you parameter estimates and confidence intervals.
3. Either by default or with an easy-to-implement option, you can get a variety of standardized effect size estimates with their confidence intervals. And let’s not make variance-explained metrics (like R^2 or eta^2) the defaults.
The nudge: #2 and #3 are both designed to focus people on point and interval estimation, rather than NHST.
This next one is a little more radical:
4. By default the output does not show you inferential t-tests and p-values — you have to ask for them through an option. And when you ask for them, you have to state what the null hypotheses are! So if you want to test the null that some parameter equals zero (as 99.9% of research in social science does), hey, go for it — but it has to be an active request, not a passive default. And if you want to test a null hypothesis that some parameter is some nonzero value, it would be easy to do that too.
The nudge. In the way a lot of statistics is taught in psychology, NHST is the main event and effect estimation is an afterthought. This would turn it around. And by making users specify a null hypothesis, it might spur us to pause and think about how and why we are doing so, rather than just mining for asterisks to put in tables. Heck, I bet some nontrivial number of psychology researchers don’t even know that the null hypothesis doesn’t have to be the nil hypothesis. (I still remember the “aha” feeling the first time I learned that you could do that — well along into graduate school, in an elective statistics class.) If we want researchers to move toward point or range predictions with strong hypothesis testing, we should make it easier to do.
All of these things are possible to do in most or all software packages. But as my SPSS example under #1 shows, they’re not necessarily easy to implement in a user-friendly way. Even R doesn’t do all of these things in the standard lm function. As a result, they probably don’t get done as much as they could or should.
Any other nudges you’d make?