Saturday, April 22, 2017

What will happen when we combine replication studies with positive-result bias?

Just read a nice blog post from Stephen Heard about replicability vs. robustness that I really agree with. Basically, the idea under discussion is how much effort we should devote to exactly repeating experiments (narrow robustness) vs. the more standard way of doing science, in which everyone does their own version to see whether the result holds more generally (broad robustness). In my particular niche of molecular biology, I think most (though definitely not all, you know who you are!) errors are those of judgement rather than technical competence/integrity, and so I think most exact replication efforts are a waste of time, an argument which many others have made as well.

In the comments, some people arguing for more narrow replication studies made the point that very little (~0%) of our current research budget is devoted explicitly to replication. Which got me wondering: what might happen if we suddenly funded a lot of replication studies?

In particular, I worry about positive-result bias. Positive-result bias is basically the natural human desire to find something new: our expectation is X, but instead we found Y. Hooray, look, new science! Press release, please! :)

Now what happens when we start a bunch of studies with the explicit mandate to replicate a previous study? Here, the expectation is now what was already found, and so positive-result bias would bias towards a refutation. I mean, let’s face it, people want to do something interesting and new that other people care about. The cancer reproducibility project in eLife provides an interesting case study: most of the press around the publication was about how the results were “muddy”, and I definitely saw a great deal more interest in what didn’t replicate than what did.

Look, I’m not saying that scientists are so hungry for attention that most, or even more than a few, would consciously try to have a replication fail (although I do wonder about that eLife replication paper that applied what seemed to be overly stringent statistical criteria in order to say something did not replicate). All I’m saying is the same hype incentives that we complain about are clearly aligned with failed replication results, and so we should be just as critical and vigilant about them.

As for apportionment of resources towards replication, I think that setting aside the question as to whether it’s a good use of money from the scientific perspective (I, like others, would argue largely not), there’s also the question of whether it’s a good use of human resources. Having a student or postdoc work on a replication study for years during their training period is not, I think, a good use of their time, and keeps them from the more valuable training experience of actually, you know, doing their own science—let alone robbing them of the thrill of new discovery. Perhaps such studies are best left to industry, which is where I believe they already largely reside.

1 comment:

  1. I think there are far more narrow(ish) sense replication studies than anyone credits - the result in your last paper becomes the positive control in my new study. When my student can't get their positive control to work, this casts doubt on the robustness of the original result. Over time people "in the field" get to know what is robust and what is not. Of course this is a problem when people not in the narrow field in question want to use results from your field's literature.