Welcome to “So Why ARE Most Published Research Findings False?”, a step by step walk through of the John Ioannidis paper “Why Most Published Research Findings Are False”. It probably makes more sense if you read this in order, so if you missed the intro, check it out here and check out Part 1 here.
First, a quick recap: Last week we took a look at the statistical framework that helps us analyze the chances that any given paper we are reading found a relationship that actually exists. This first involves turning the study design (assumed Type 1 and Type 2 error rate) in to a positive predictive value….aka given the assumed error rate, what is the chance that a positive result is actually true. We then added in a variable R or “pre-study odds” which sought to account for the fact that some fields are simply more likely to find true results than others due to the nature of their work. The harder it is to find a true relationship, the less likely it is that any apparently true relationship you do find is actually true. This is all just basic math (well, maybe not basic math), and provides us the coat hook on which to hang some other issues which muck things up even further.
Oh, bias: Yes, Ioannidis talks about bias right up front. He gives it the letter “u” and defines it as “the proportion of probed analyses that would not have been “research findings,” but nevertheless end up presented and reported as such, because of bias“. Note that he is specifically focusing on research that is published claiming to have found a relationship between to things. He does mention that bias could be used to bury true findings, but that is beyond the current scope. It’s also probably less common simply because positive findings are less common. Anyway, he doesn’t address reasons for bias at this point, but he does add it in to his table to show how much it mucks about with the equations:
This pretty much confirms our pre-existing beliefs that bias makes everything messy. Nearly everyone knows that bias screws things up and makes things less reliable, but Ioannidis goes a step further and seeks to answer the question “how much less reliable?” He helpfully provides these graphs (blue line is low bias of .05, yellow is high bias of .8):
Eesh. What’s interesting to note here is that good study power (the top graph) has a pretty huge moderating effect on all levels of bias over studies with low power (bottom graph). This makes sense since study power is influenced by sample size and the size of the effect your are looking for. While even small levels of bias (the blue line) influence the chance of a paper being correct, it turns out good study design can do wonders for your work. To put some numbers on this, a well powered study with 30% pre-study odds with a positive finding has a 83% chance of being correct with no bias. If that bias is 5%, the chances drop to about 80%. If the study power is dropped, you have about a 70% chance of a true finding being real. Drop the study power further and you’re under 60%. Keep your statisticians handy folks.
Independent teams, or yet another way to muck things up: Now when you think about bias, the idea of having independent teams work on the same problems sounds great. After all, they’re probably not all equally biased, and they can confirm each other’s findings right?
It’s not particularly intuitive to think that having lots of people working on a research question would make results less reliable, but it makes sense. For every independent team working on the same research question, the chances that one of them gets a false positive finding goes up. This is a more complicated version of the replication crisis, because none of these teams necessarily have to be trying the same method to address the question. Separating out what’s a study design issue and what’s a false positive is more complicated than it seems. Mathematically, the implications of this are kind of staggering. The number of teams working on a problem (n) actually increase some of the factors exponentially. Even if you leave bias out of the equation, this can have an enormous impact on the believability of positive results:
If you compare this to the bias graph, you’ll note that having 5 teams working on the same question actually decreases the chances of have a true positive finding more than having a bias rate of 20% does….and that’s for well designed studies. This is terrible news because while many people have an insight in to how biased a field might be and how to correct for it, you rarely hear people discuss how many teams are working on the same problem. That Indeed, researchers themselves may not know how many people are researching their question. I mean, think about how this is reported in the press “previous studies have not found similar things”. Some people take that as a sign of caution, but many more take that as “this is groundbreaking”. Only time can tell which one is which, and we are not patient people.
Now we have quite a few factors to take in to account. Along with the regular alpha and beta, we’ve added R (pre-study odds), u (bias) and n (number of teams). So far we’ve looked at them all in isolation, but next week we’re going to review what the practical outcomes are of each and how they start to work together to really screw us up. Stay tuned.
Part 3 is up! Click here to read “The Corollaries”