During my recent series on “Why Most Published Research Findings Are False”, I mentioned a concept called “study power” quite a few times. I haven’t talked about study power much on this blog, so I thought I’d give a quick primer for those who weren’t familiar with the term. If you’re looking for a more in-depth primer, try this one here, but if you’re just looking for a few quick hits, I gotcha covered:
- It’s sort of the flip side of the p-value. We’ve discussed the p-value and how it’s based on the alpha value before, and study power is based on a value called beta. If alpha can be thought of as the chance of committing a Type 1 error (false positive), then beta is the chance of committing a Type 2 error (false negative), i.e. missing an effect that really is there. Study power is 1 – beta, so if someone says a study’s power is .8, its beta was .2. Setting the alpha and beta values is up to the researcher’s discretion; their values are more about risk tolerance than mathematical absolutes.
- The calculation is not simple, but what it’s based on is important. Calculating study power is not easy math, but if you’re desperately curious, try this explanation. For most people, though, the important part to remember is that it’s based on 3 things: the alpha you use, the effect size you’re looking for, and your sample size. These trade off against each other: pick any two, plus the power you want, and the third is determined. As an example, imagine you were trying to figure out if a coin was weighted or not. The more confident you want to be in your answer (alpha), the more times you have to flip it (sample size). However, if the coin is REALLY unfairly weighted (effect size), you’ll need fewer flips to figure that out. Basically, a coin weighted 80-20 will be easier to spot than one weighted 55-45.
- It is weirdly underused. As we saw in the “Why Most Published Research Findings Are False” series, adequate study power does more than prevent false negatives. It can help blunt the impact of bias and of multiple teams chasing the same question, and it helps everyone else trust your research. So why don’t more researchers put much thought into it, more science articles mention it, or more people comment on it? I’m not sure, but I suspect it’s simply because the specter of false negatives is not as scary as that of false positives. Regardless, you just won’t see it mentioned as often as other statistical issues. Poor study power.
- It can make negative (aka “no difference”) studies less trustworthy. With all the current attention on false positive/failure to replicate studies, it’s not terribly surprising that false negatives have received less attention, but they are still an issue. Even though a power calculation tells you how large an effect your study can reliably detect, a surprising number of researchers don’t report theirs. That means a lot of “negative finding” trials could also be suspect: a study that found “no difference” may simply have been too small to see one. In this breakdown of study power, Stats Done Wrong author Alex Reinhart cites studies that found up to 84% of studies don’t have sufficient power to detect even a 25% difference in primary outcomes. An ASCO review found that 47% of oncology trials lacked the power to detect anything but the largest effect sizes. That’s not nothing.
- It’s possible to overdo it. While underpowered studies are clearly an issue, it’s good to remember that overpowered studies can be a problem too. Besides wasting resources, they can flag effects so small as to be clinically meaningless as statistically significant.
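To make the coin example from the “what it’s based on” bullet concrete, here is a minimal Python sketch (standard library only) that computes power for an exact two-sided test of “is this coin fair?”. It's a brute-force illustration, not the formula-based approach stats packages use. The function names (`binom_pmf`, `rejection_cutoff`, `power`, `flips_needed`) are hypothetical. The symmetric rejection region is just one simple way to set up the test.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n flips when P(heads) = p (0 < p < 1)."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_comb + k * log(p) + (n - k) * log(1 - p))


def rejection_cutoff(n: int, alpha: float = 0.05) -> int:
    """Largest c such that calling the coin unfair when heads <= c or heads >= n - c
    keeps the false-positive (Type 1) rate for a fair coin at or below alpha.
    Returns -1 when even the most extreme outcomes can't reach significance."""
    c, tail = -1, 0.0
    while 2 * (c + 1) < n:  # keep the lower and upper tails from overlapping
        next_tail = tail + binom_pmf(c + 1, n, 0.5)
        if 2 * next_tail > alpha:  # both tails together would exceed alpha
            break
        c, tail = c + 1, next_tail
    return c


def power(n: int, p_true: float, alpha: float = 0.05) -> float:
    """Chance (1 - beta) that n flips of a coin with P(heads) = p_true lead us to call it unfair."""
    c = rejection_cutoff(n, alpha)
    return sum(binom_pmf(k, n, p_true) for k in range(n + 1) if k <= c or k >= n - c)


def flips_needed(p_true: float, alpha: float = 0.05, target: float = 0.8, max_n: int = 2000):
    """First number of flips whose power reaches the target, or None if max_n isn't enough."""
    for n in range(1, max_n + 1):
        if power(n, p_true, alpha) >= target:
            return n
    return None


for p_true in (0.8, 0.55):
    print(f"{p_true:.0%} heads coin: power with 100 flips = {power(100, p_true):.2f}, "
          f"flips for 80% power = {flips_needed(p_true)}")
```

Because the binomial is discrete, power wobbles up and down slightly as flips are added. That's why `flips_needed` reports the first count that reaches the target. Under these assumptions the 80-20 coin needs only around 20 flips, while the 55-45 coin needs several hundred.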
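The “weirdly underused” bullet says power helps everyone else trust your research. One way to see why is the positive predictive value (PPV) formula from the Ioannidis paper the series is named after. PPV is the chance that a statistically significant finding is actually true. The sketch below uses the simplest form of that formula, with the bias term left out. The function name `ppv` and the pre-study odds of R = 0.25 are hypothetical choices made for illustration.

```python
def ppv(power: float, pre_study_odds: float, alpha: float = 0.05) -> float:
    """Positive predictive value from Ioannidis (2005), no-bias version:
    PPV = (1 - beta) * R / (R - beta * R + alpha), where power = 1 - beta
    and R is the pre-study odds that a tested relationship is true."""
    beta = 1 - power
    r = pre_study_odds
    return (1 - beta) * r / (r - beta * r + alpha)


# Hypothetical field where 1 in 5 tested relationships is real (odds R = 1/4).
for study_power in (0.8, 0.5, 0.2):
    print(f"power {study_power:.0%}: PPV {ppv(study_power, pre_study_odds=0.25):.0%}")
```

With these made-up inputs, dropping power from 80% to 20% cuts the PPV from 80% to 50%. In other words, a significant result from an underpowered study is much closer to a coin toss.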
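The last two bullets are two sides of the same calculation. A power calculation tells you the smallest effect a study can reliably detect, and with a huge sample that effect becomes trivially small. Here is a rough Python sketch for the coin case. It uses the normal approximation and, as a simplifying assumption, uses a fair coin's per-flip standard deviation (0.5) for both the null and the alternative. The function name `min_detectable_bias` is hypothetical.

```python
from statistics import NormalDist


def min_detectable_bias(n: int, alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate smallest shift away from 50% heads that n flips detect with the
    given power in a two-sided test (normal approximation, per-flip SD fixed at 0.5)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # about 0.84 when power = 0.8
    return (z_alpha + z_beta) * 0.5 / n ** 0.5


for n in (100, 10_000, 1_000_000):
    print(f"{n:>9,} flips: detects a coin landing heads {0.5 + min_detectable_bias(n):.2%} of the time")
```

At 100 flips this only reliably catches a coin landing heads about 64% of the time. At a million flips it will flag a coin landing heads about 50.14% of the time. That is a real difference, but exactly the kind of meaningless one the overpowering bullet warns about.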
Okay, so there you have it! Study power may not get all the attention the p-value does, but it’s still a worthwhile trick to know about.