This past week I was having a discussion with my brother, a high school teacher, about an experiment his class was running and appropriate statistical methods for analyzing the data. We were discussing using the chi-square statistic to compare data from an in-class calorimetry experiment to the expected/published values (this is the point where the rest of my family wandered off), and he asked what other statistical analysis his kids could do that might help them understand their results. I mentioned that I was a big fan of confidence intervals for understanding data like this, and started to rattle off my reasons. While testing that produces a p-value is more commonly used in scientific work, I think for most people confidence intervals are more intuitive to use and should be worked into the mix. Since we were talking about all this at around 7am (prior to both the second cup of coffee AND a trip out to move the cows at his farm), I figured I’d use my blog post today to more thoroughly lay out a few reasons confidence intervals should be more widely used (particularly by teachers) and provide a few helpful links.
The foundation of my argument comes from a paper published a few years ago called “Why the P-value culture is bad and confidence intervals a better alternative”, which gets into the weeds on the stats, but makes a good overall case for moving away from a reliance on p-values and towards a focus on confidence intervals. The major reasons are:
- Confidence intervals use values you already have. P-values and confidence intervals are mathematically similar and take the same basic variables into account: the number of observations (n) and the variation of those observations (the standard deviation, which combined with n gives the standard error, or SE). More observations and lower variability within those observations are generally considered good things. If you can get a p-value, you can get a confidence interval.
- Confidence intervals give you more information than the p-value. Where a p-value tells you just the statistical significance of the difference, the confidence interval tells you something about the magnitude of the difference. It gives an upper and lower limit, so you get a sense of what you’re really seeing. For kids learning about variation, I also think this can give them a better sense of how each of their experimental values affects the overall outcome. For example, if you’re doing a calorimetry experiment and know that the expected outcome is 100, and your class of 30 kids gets an average of 98 with a standard deviation of 5, you would tell them that p≈.04 and thus this is a significant difference. Using the confidence interval, however, you would give them the range 96.2 to 99.8. This gives a better sense of how different the difference really is, as opposed to just accepting or rejecting a binary “is there a difference” assumption.
- Confidence intervals are more visual. The paper I mentioned above has a great figure with it that illustrates what I’m talking about. On that graph you can draw lines to show not just “is it different” but also “when do we really care”. I think this is easier to show kids than just a p-value by itself, as there’s no equivalent visual to show p-values.
- It’s easier to see the effect of sample size with a confidence interval. For the calorimetry experiment mentioned above, let’s show what happens if the class were different sizes, all with the same average of 98 and a standard deviation of 5:

| n | 95% confidence interval | p-value |
|----|-------------------------|---------|
| 10 | 94.9-101.1 | .2377 |
| 15 | 95.5-100.5 | .1436 |
| 20 | 95.8-100.2 | .0896 |
| 25 | 96-100 | .0569 |
| 30 | 96.2-99.8 | .0366 |

I think watching the range shrink is clearer than watching a p-value drop, and again, this can easily be converted into a graph. If you’re running the experiment with multiple classes, comparing their results can also help show kids a wider range of what the variation can look like.
- Confidence intervals reiterate that some variation is to be expected. One of the harder statistical concepts for people to grasp is how predictable a bit of unpredictability really is. For some things we totally get this (like our average commute times), but for other things we seem to stumble (like success of medical treatments) and let outliers color our thinking. In the calorimetry experiment, if one kid gets 105 as a value, a confidence interval makes it much easier than a single p-value does to see how that one outlier fits in with the bigger picture.
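For anyone who would rather script the sample-size comparison than use a web calculator, here is a minimal Python sketch. It is a sketch under assumptions, not the calculator's actual method: the intervals in the table above are consistent with a normal approximation (a multiplier of about 1.96), so that is what this uses. A t-based interval, which many tools use for small samples, would come out slightly wider. The function name `confidence_interval` is my own, and the p-values are not computed here because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(mean, sd, n, level=0.95):
    """Return (low, high) for a sample mean using the normal approximation.

    Assumption: a z multiplier is used rather than a t multiplier.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * sd / sqrt(n)                  # z times the standard error
    return mean - margin, mean + margin


# Same class average (98) and standard deviation (5), different class sizes.
for n in (10, 15, 20, 25, 30):
    low, high = confidence_interval(98, 5, n)
    print(f"n={n:2d}  {low:.1f} to {high:.1f}")
```

The only thing that changes from row to row is `n`, which makes it easy for kids to see that the interval narrows purely because the standard error shrinks as the class gets bigger.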
So there you go. Confidence intervals are a superior way of presenting both effect size and the significance of a finding, and they are easy to visualize for those who have trouble with written numbers. While they don’t do away with all of the pitfalls of p-values, they really don’t add any new pitfalls to the mix, and they confer some serious benefits for classroom learning. I used GraphPad to quickly calculate the confidence intervals I used here, and they have options for both summary and individual data.
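If a class has individual readings rather than a summary, the same idea can be sketched in a few lines of Python. Everything here is illustrative: the `readings` list is made up for the example (it is not data from any real class), the function name `interval_from_values` is hypothetical, and the interval again uses the normal approximation, which is rough for a group this small.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def interval_from_values(values, level=0.95):
    """Return (low, high) for the mean of raw readings (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    center = mean(values)
    margin = z * stdev(values) / sqrt(len(values))  # stdev is the sample SD
    return center - margin, center + margin


# Hypothetical readings, invented purely for illustration.
readings = [96, 99, 97, 101, 95, 98, 100, 94, 99, 97]
with_outlier = readings + [105]  # one kid reports 105

for label, values in (("without 105", readings), ("with 105", with_outlier)):
    low, high = interval_from_values(values)
    print(f"{label}: mean {mean(values):.1f}, interval {low:.1f} to {high:.1f}")
```

Running it with and without the extra value lets students see the one high reading nudge the average up and widen the range, rather than overturn the whole result.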