So Why ARE Most Published Research Findings False? The Corollaries

Welcome to “So Why ARE Most Published Research Findings False?”, a step-by-step walkthrough of the John Ioannidis paper “Why Most Published Research Findings Are False”. It probably makes more sense if you read this in order, so check out the intro here, Part 1 here and Part 2 here.

Okay, first a quick recap: Up until now, Ioannidis has spent most of the paper providing a statistical justification for considering not just study power and p values, but also pre-study odds, bias measures, and the number of teams working on a problem when trying to figure out if a published finding is true or not. Because he was writing a scientific paper and not a blog post, he did a lot less editorializing than I did when I was breaking down what he did. In this section he changes all that, and goes through a point-by-point breakdown of what this all means with a set of 6 corollaries. The words here in bold are his, but I’ve simplified the explanations. Some of this is a repeat from the previous posts, but hey, it’s worth repeating.

Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true. In part 1 and part 2, we saw a lot of graphs that showed good study power had a huge effect on result reliability. Larger sample sizes = better study power.
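If you want to see the sample size effect for yourself, here’s a rough sketch. It assumes a simple two-group comparison analyzed with a two-sided z-test and a standardized effect size (Cohen’s d) of 0.3, which is a number I picked for illustration; the function name approx_power is mine, not anything from the paper.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is a standardized mean difference (Cohen's d). The tiny
    chance of rejecting in the wrong direction is ignored.
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)                # about 1.96 for alpha = 0.05
    shift = effect_size * (n_per_group / 2) ** 0.5   # expected z-statistic under the effect
    return Z.cdf(shift - z_crit)


for n in (10, 25, 50, 100, 200):
    print(f"n = {n:>3} per group -> power about {approx_power(0.3, n):.2f}")
```

Holding the effect size fixed, power climbs steadily as the groups get bigger, which is exactly what Corollary 1 leans on.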

Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. This is partially just intuitive, but also part of the calculation for study power. Larger effect sizes = better study power. Interestingly, Ioannidis points out here that given all the math involved, any field looking for effect sizes smaller than 5% (relative risks below 1.05, in his example) is pretty much never going to be able to confirm its results.
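Here’s the same idea run in reverse: roughly how many people per group you’d need for 80% power at different effect sizes. This is a sketch assuming a two-sided, two-sample z-test with α = 0.05. It uses standardized effect sizes (Cohen’s d), not the relative-risk scale Ioannidis is talking about, so treat it as an illustration of the trend rather than a reproduction of his numbers. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group_needed(effect_size: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Approximate per-group sample size for a two-sided, two-sample z-test."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)  # critical value for the significance test
    z_power = Z.inv_cdf(power)          # extra distance needed to hit the target power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


for d in (0.8, 0.5, 0.2, 0.05):
    print(f"effect size {d}: about {n_per_group_needed(d)} people per group")
```

Because the effect size is squared in the denominator, halving it roughly quadruples the sample you need, which is why fields chasing tiny effects struggle so much.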

Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true. That R value we talked about in part 1 is behind this one. Pre-study odds matter, and fields that are generating new hypotheses or exploring new relationships are always going to have more false positives than fields that replicate earlier studies or run meta-analyses.
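To see why testing lots of loosely selected relationships hurts, here’s a small sketch. Two hypothetical fields each have 10 real relationships to find, but one tests 50 carefully chosen candidates while the other screens 1,000. It uses the no-bias PPV formula from part 1 with α = 0.05 and β = 0.2; the field sizes and function names are made up for illustration.

```python
def ppv(R: float, alpha: float = 0.05, beta: float = 0.2) -> float:
    """No-bias positive predictive value: (1 - beta) R / (R - beta R + alpha)."""
    return (1 - beta) * R / (R - beta * R + alpha)


def pre_study_odds(true_relationships: int, tested_relationships: int) -> float:
    """R = real relationships : null relationships among those tested (hypothetical helper)."""
    return true_relationships / (tested_relationships - true_relationships)


# Two hypothetical fields, each with 10 real relationships to find.
for label, tested in (("targeted field", 50), ("broad screening field", 1000)):
    R = pre_study_odds(10, tested)
    print(f"{label}: R = {R:.3f}, PPV = {ppv(R):.2f}")
```

Same number of real relationships, same study quality, but the broad screening field ends up with a PPV of roughly 0.14 compared to 0.80.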

Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. This should be intuitive, but it’s often forgotten. I work in oncology, and we tend to use a pretty clear-cut endpoint for many of our studies: death. Our standards around this are so strict that if you die in a car crash less than 100 days after your transplant, you get counted in our mortality statistics. Other fields have more wiggle room. If you are looking for mortality OR quality of life OR reduced cost OR patient satisfaction, you’ve nearly quadrupled your chance of a false positive.
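A quick way to check the multiple-outcomes math is to compute the chance that at least one outcome crosses p < .05, assuming the outcomes are independent and none of them has a real effect. This is a sketch with a function name of my own; real outcomes like mortality and cost are usually correlated, which would typically pull the number down somewhat.

```python
def chance_of_any_false_positive(k_outcomes: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k independent null outcomes gives p < alpha."""
    return 1 - (1 - alpha) ** k_outcomes


for k in (1, 2, 4, 10):
    print(f"{k} outcome(s): {chance_of_any_false_positive(k):.1%}")
```

With four outcomes it comes out to about 18.5%, close to four times the 5% you signed up for.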

Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. This one’s pretty obvious. Worth noting: he points out “trying to get tenure” and “trying to preserve one’s previous findings” are both sources of potential bias.

Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true. This was part of our discussion last week. Essentially it’s saying that if you have 10 people with tickets to a raffle, the chance that one of you wins is higher than the chance that you personally win. If we assume each team has a 5% chance of getting a positive result when no real relationship exists, having multiple teams work on a question will inevitably lead to more false positives.

Both before and after listing these 6 things out, Ioannidis reminds us that none of these factors are independent or isolated. He gives some specific examples from genomics research, but then also gives this helpful table. To refresh your memory, the 1-beta column is study power (influenced by sample size and effect size), R is the pre-study odds (varies by field), u is bias, and the “PPV” column over on the side there is the chance that a paper with a positive finding is actually true. Oh, and “RCT” is “Randomized Controlled Trial”:

I feel a table of this sort should hang over the desk of every researcher and/or science enthusiast.
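Since the table is all about how the 1-beta, R, and u columns combine, here’s a sketch that computes the PPV from those three inputs using the bias-adjusted formula from the paper, ([1 – β]R + uβR)/(R + α – βR + u – uα + uβR), with α = 0.05. The particular combinations below are ones I picked for illustration, and the function name is mine.

```python
def ppv_with_bias(power: float, R: float, u: float, alpha: float = 0.05) -> float:
    """Bias-adjusted PPV:
    ([1 - beta]R + u*beta*R) / (R + alpha - beta*R + u - u*alpha + u*beta*R)
    """
    beta = 1 - power
    numerator = power * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator


# Illustrative combinations of the three columns: (power, pre-study odds R, bias u)
scenarios = [
    (0.80, 1.0, 0.10),
    (0.80, 1 / 3, 0.40),
    (0.20, 1 / 5, 0.20),
    (0.20, 1 / 1000, 0.80),
]
for power, R, u in scenarios:
    print(f"power={power:.2f}  R={R:.3f}  u={u:.2f}  ->  PPV={ppv_with_bias(power, R, u):.3f}")
```

The same 0.8 power gives a PPV around 0.85 in one row and around 0.41 in another, purely because R and u changed.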

Now all this is a little bleak, but we’re still not entirely at the bottom. We’ll get to that next week.

Part 4 is up! Click here to read it.

So Why ARE Most Published Research Findings False? Bias and Other Ways of Making Things Worse

Welcome to “So Why ARE Most Published Research Findings False?”, a step-by-step walkthrough of the John Ioannidis paper “Why Most Published Research Findings Are False”. It probably makes more sense if you read this in order, so if you missed the intro, check it out here and check out Part 1 here.

First, a quick recap: Last week we took a look at the statistical framework that helps us analyze the chances that any given paper we are reading found a relationship that actually exists. This first involves turning the study design (assumed Type 1 and Type 2 error rates) into a positive predictive value….aka given the assumed error rates, what is the chance that a positive result is actually true. We then added in a variable R or “pre-study odds”, which sought to account for the fact that some fields are simply more likely to find true results than others due to the nature of their work. The harder it is to find a true relationship, the less likely it is that any apparently true relationship you do find is actually true. This is all just basic math (well, maybe not basic math), and provides us the coat hook on which to hang some other issues which muck things up even further.

Like bias.

Oh, bias: Yes, Ioannidis talks about bias right up front. He gives it the letter “u” and defines it as “the proportion of probed analyses that would not have been ‘research findings,’ but nevertheless end up presented and reported as such, because of bias”. Note that he is specifically focusing on research that is published claiming to have found a relationship between two things. He does mention that bias could be used to bury true findings, but that is beyond the current scope. It’s also probably less common simply because positive findings are less common. Anyway, he doesn’t address reasons for bias at this point, but he does add it into his table to show how much it mucks about with the equations:

This pretty much confirms our pre-existing beliefs that bias makes everything messy. Nearly everyone knows that bias screws things up and makes things less reliable, but Ioannidis goes a step further and seeks to answer the question “how much less reliable?”  He helpfully provides these graphs (blue line is low bias of .05, yellow is high bias of .8):

Eesh. What’s interesting to note here is that good study power (the top graph) has a pretty huge moderating effect at every level of bias compared with low study power (the bottom graph). This makes sense, since study power is influenced by sample size and the size of the effect you are looking for. While even small levels of bias (the blue line) reduce the chance of a paper being correct, it turns out good study design can do wonders for your work. To put some numbers on this, a well-powered study (power of 0.8) with pre-study odds of 0.3 and a positive finding has about an 83% chance of being correct with no bias. If that bias is 5%, the chance drops to about 71%. Drop the study power to 0.5 and you’re at about 62%; drop it to 0.2 and you’re down near 42%. Keep your statisticians handy, folks.
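For anyone who’d rather compute than squint at graphs, here’s a sketch using the paper’s bias-adjusted PPV formula with pre-study odds of 0.3 and α = 0.05. The three power levels (0.8, 0.5, 0.2) are my assumption for what “well powered” and “dropped” look like, and the function name is mine.

```python
def ppv_with_bias(power: float, R: float, u: float, alpha: float = 0.05) -> float:
    """Bias-adjusted PPV; setting u = 0 gives the no-bias version from part 1."""
    beta = 1 - power
    return (power * R + u * beta * R) / (R + alpha - beta * R + u - u * alpha + u * beta * R)


R = 0.3  # pre-study odds
for power in (0.8, 0.5, 0.2):
    no_bias = ppv_with_bias(power, R, u=0.0)
    low_bias = ppv_with_bias(power, R, u=0.05)
    print(f"power {power}: PPV {no_bias:.0%} with no bias, {low_bias:.0%} with u = 0.05")
```

Setting u = 0 collapses the formula back to the no-bias PPV, so the gap between the two columns is the cost of just 5% bias.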

Independent teams, or yet another way to muck things up: Now when you think about bias, the idea of having independent teams work on the same problems sounds great. After all, they’re probably not all equally biased, and they can confirm each other’s findings right?

Well, sometimes.

It’s not particularly intuitive to think that having lots of people working on a research question would make results less reliable, but it makes sense. With every additional independent team working on the same research question, the chance that at least one of them gets a false positive finding goes up. This is a more complicated version of the replication crisis, because none of these teams necessarily have to be using the same method to address the question. Separating out what’s a study design issue and what’s a false positive is more complicated than it seems. Mathematically, the implications of this are kind of staggering. The number of teams working on a problem (n) shows up as an exponent in the equations: with n teams, the chance that a relationship that doesn’t exist produces at least one positive result becomes 1 – (1 – α)^n instead of just α. Even if you leave bias out of the equation, this can have an enormous impact on the believability of positive results:

If you compare this to the bias graph, you’ll note that having just 5 teams working on the same question drags down the chance of a positive finding being true nearly as much as a bias rate of 20% does….and that’s for well-designed studies. This is terrible news, because while many people have some insight into how biased a field might be and how to correct for it, you rarely hear anyone discuss how many teams are working on the same problem. Indeed, researchers themselves may not know how many people are researching their question. I mean, think about how this gets reported in the press: “previous studies have not found similar things”. Some people take that as a sign of caution, but many more take it as “this is groundbreaking”. Only time can tell which one is which, and we are not patient people.
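Here’s a sketch of the n-teams version of the PPV formula, R(1 – β^n)/(R + 1 – (1 – α)^n – Rβ^n), which counts a relationship as a “finding” if at least one of n independent teams gets a positive result, and leaves bias out. It assumes α = 0.05, β = 0.2 (a well-powered study) and pre-study odds of 0.3; those inputs and the function name are my choices for illustration.

```python
def ppv_n_teams(R: float, n: int, alpha: float = 0.05, beta: float = 0.2) -> float:
    """PPV when n independent teams probe the same question and any single
    positive result counts as a finding (no bias):
    R(1 - beta^n) / (R + 1 - (1 - alpha)^n - R*beta^n)
    """
    return R * (1 - beta ** n) / (R + 1 - (1 - alpha) ** n - R * beta ** n)


for n in (1, 5, 10, 50):
    print(f"{n:>2} teams: PPV = {ppv_n_teams(R=0.3, n=n):.2f}")
```

With one team this matches the basic PPV (about 0.83); by five teams it’s already down to roughly 0.57.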

Now we have quite a few factors to take into account. Along with the regular alpha and beta, we’ve added R (pre-study odds), u (bias) and n (number of teams). So far we’ve looked at them all in isolation, but next week we’re going to review what the practical outcomes of each are and how they start to work together to really screw us up. Stay tuned.

Part 3 is up! Click here to read “The Corollaries”

So Why ARE Most Published Research Findings False? A Statistical Framework

Welcome to “So Why ARE Most Published Research Findings False?”, a step-by-step walkthrough of the John Ioannidis paper bearing that name. If you missed the intro, check it out here.

Okay, so last week I gave you the intro to the John Ioannidis paper Why Most Published Research Findings are False. This week we’re going to dive right in with the first section, which is excitingly titled “Modeling the Framework for False Positive Findings”.

Ioannidis opens the paper with a review of the replication crisis (as it stood in 2005, that is) and announces his intention to focus particularly on studies that yield false positive results….aka those papers that find relationships between things where no relationship exists.

To give a framework for understanding why so many of these false positive findings exist, he creates a table showing the 4 possibilities for research findings, and how to calculate how large each one is. We’ve discussed these four possibilities before, and they look like this:

Now that may not look too shocking off the bat, and if you’re not into this sort of thing you’re probably yawning a bit. However, for those of us in the stats world, this is a paradigm shift. See, historically, stats students and researchers have been taught that the table looks like this:


This table represents a lot of the decisions you make right up front in your research, often without putting much thought into it. Those values are used to drive error rates, study power and confidence intervals:


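The alpha value drives the notorious “.05” level used in p-value testing. It’s the Type I error rate: the chance that you’ll declare a relationship when none actually exists. (The p-value itself is the chance of seeing a relationship at least as extreme as the one you’re seeing if random chance alone were at work.)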
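A quick simulation makes the definition of alpha concrete. This sketch draws two groups from the exact same normal distribution (so there’s truly nothing to find), compares them with a two-sided z-test assuming a known standard deviation of 1, and counts how often p < .05 anyway. The group size, number of trials, random seed, and function name are arbitrary choices of mine.

```python
import random
from statistics import NormalDist, fmean


def false_positive_rate(trials: int = 5000, n: int = 20, alpha: float = 0.05, seed: int = 1) -> float:
    """Share of simulated no-effect experiments where a two-sided z-test
    (known sigma = 1) still rejects at the given alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]  # same distribution: no real effect
        z = (fmean(a) - fmean(b)) / (2 / n) ** 0.5
        if abs(z) > z_crit:
            hits += 1
    return hits / trials


print(f"share of no-effect experiments with p < 0.05: {false_positive_rate():.3f}")
```

The share should hover around 0.05: that’s alpha doing exactly what it says on the tin.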

What Ioannidis is adding in here is c, or the overall number of relationships you are looking at, and R, which is the ratio of true relationships to non-relationships among those being tested in the field. Put another way, this is the “Pre-Study Odds”. It asks researchers to think about it up front: if you took your whole field and every relationship ever tested in it, what would you say the odds are that any given one is real?
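Here’s a sketch of how c and R split a field’s tested relationships into the four cells of that table, using the usual α = 0.05 and β = 0.2. The values c = 1,000 and R = 0.25 are made up for illustration, and the function name is mine.

```python
def expected_counts(c: float, R: float, alpha: float = 0.05, beta: float = 0.2) -> dict:
    """Expected cell counts of the 2x2 table for c tested relationships."""
    true_rel = c * R / (R + 1)  # relationships that are actually real
    null_rel = c / (R + 1)      # relationships that are not
    return {
        "true positive": (1 - beta) * true_rel,
        "false positive": alpha * null_rel,
        "false negative": beta * true_rel,
        "true negative": (1 - alpha) * null_rel,
    }


counts = expected_counts(c=1000, R=0.25)
for cell, value in counts.items():
    print(f"{cell:>15}: {value:.0f}")
```

Dividing true positives by all positives (160 out of 200) gives a PPV of 0.8 for this made-up field, which is the quantity Ioannidis builds his PPV formula from.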

Obviously R would be hard to calculate, but it’s a good add-in for all researchers. If you have some sense that your field is error prone or that it’s easy to make false discoveries, you should be adjusting your calculations accordingly. Essentially he is asking people to consider the base rate here, and to keep it front and center. For example, a drug company that has carefully vetted its drug development process may know that 30% of the drugs that make it to phase 2 trials will ultimately prove to work. On the other hand, a psychologist attempting to create a priming study could expect a much lower rate of success. The harder it is for everyone to find a real relationship, the greater the chance that a relationship you do find is actually a false positive. I think requiring every field to come up with an R would be an enormously helpful step in and of itself, but Ioannidis doesn’t stop there.

Ultimately, he ends up with an equation for the Positive Predictive Value (aka the chance that a positive result is true aka PPV aka the chance that a paper you read is actually reporting a real finding) which is PPV = (1 – β)R/(R – βR + α). For a study with a typical alpha and a good beta (.05 and .2, respectively), here’s what that looks like for various values of R:


So the lower the pre-study odds of success, the more likely it is that a finding is a false positive rather than a true positive. Makes sense right?
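If you want to reproduce the shape of that graph, here’s a minimal sketch of the PPV formula above with α = 0.05 and β = 0.2. The list of R values is arbitrary.

```python
def ppv(R: float, alpha: float = 0.05, beta: float = 0.2) -> float:
    """Positive predictive value: (1 - beta) R / (R - beta R + alpha)."""
    return (1 - beta) * R / (R - beta * R + alpha)


for R in (1.0, 0.5, 0.3, 0.1, 1 / 16, 0.01):
    print(f"R = {R:.4f}: PPV = {ppv(R):.2f}")
```

At pre-study odds of 0.3 the PPV is about 0.83, meaning roughly 17% of positive findings would still be false even with no bias at all.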

Now most readers will very quickly note that this graph shows that you have a 50% chance of being able to trust the result at a fairly low level of pre-study odds, and that is true. Under this model, a positive finding is more likely to be true than false if (1 – β)R > α. In the case of my graph above, this translates into pre-study odds that are greater than 1/16. So where do we get the “most findings are false” claim?
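The 1/16 figure falls straight out of that inequality: rearranging (1 – β)R > α gives R > α/(1 – β). Here’s a small sketch using exact fractions so the arithmetic comes out clean, assuming the same α = 0.05 and β = 0.2 as the graph.

```python
from fractions import Fraction

alpha = Fraction(5, 100)  # Type I error rate (0.05)
power = Fraction(8, 10)   # 1 - beta, with beta = 0.2

# A positive result is more likely true than false when (1 - beta) * R > alpha,
# so the break-even pre-study odds are alpha / (1 - beta).
threshold = alpha / power
print(threshold)  # 1/16


def ppv(R: Fraction) -> Fraction:
    """No-bias PPV: (1 - beta) R / ((1 - beta) R + alpha)."""
    return power * R / (power * R + alpha)


print(ppv(threshold))  # 1/2: exactly a coin flip at the break-even point
```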

Enter bias.

You see, Ioannidis was setting this framework up to remind everyone what the best-case scenario was. He starts here to remind everyone that even within a perfect system, some fields are going to be more accurate than others simply due to the nature of the investigations they do, and that no field should ever expect its ceiling to be 100% accuracy. That imperfection is built into the statistical methods themselves (through α and β), but it is frequently forgotten when people actually sit down to review the literature. Most researchers would not even think of claiming that their pre-study odds were more than 30%, yet very few would say off the top “17% of studies finding significant results in my field are wrong”, even though that’s what the math tells us. And again, that’s in a perfect system. Going forward we’re going to add more terms to the statistical models, and those odds will never get better.

In other words, see you next week folks, it’s all downhill from here.

Click here to go straight to part 2.

So Why ARE Most Published Research Findings False? (An Introduction)

Well hello hello! I’m just getting back from a conference in Minneapolis and I’m completely exhausted, but I wanted to take a moment to introduce a new Sunday series I’ll be rolling out starting next week. I’m calling it my “Important Papers” series, and it’s going to be my attempt to cover/summarize/explain the important points and findings in some, well, important papers.

I’m going to start with the 2005 John Ioannidis paper “Why Most Published Research Findings are False”. Most people who have ever questioned academic findings have heard of this one, but fewer seem familiar with what it actually says or recommends. Given the impact this paper has had, I think it’s a vital one for people to understand. I got this idea when my professor for this semester made us all read it to kick off our class, and I was thinking how helpful it was to use that as a framework for further learning. It will probably take me 6 weeks or so to get through the whole thing, and I figured this week would be a good time to do a bit of background. Ready? Okay!

John Ioannidis is a Greek physician who works at Stanford University. In 2005 he published the paper “Why Most Published Research Findings Are False”. This quickly became the most cited paper from PLOS Medicine, and is apparently one of the most accessed papers of all time with 1.5 million downloads. The paper is really the godfather of the meta-research movement…i.e. the push to research how research goes wrong. The Atlantic did a pretty cool breakdown of Ioannidis’s career and work here.

The paper has a few different sections, and I’ll be going through each of them. I’ll probably group a few together based on length, but I’m not sure quite yet how that will look. However, up front I’m thinking the series will go like this:

  1. The statistical framework for false positive findings
  2. Bias and failed attempts at corrections
  3. Corollaries (aka uncomfortable truths)
  4. Research and Bias
  5. A Way Forward
  6. Some other voices/complaints

I’ll be updating that list with links as I write them.

We’ll kick off next week with that first one. There will be pictures.

Week one is up! Go straight to it here.