In yesterday’s post, I got a bit worked up over sloppy reporting on a study on dietary interventions in pregnancy.
Correlation and Causation: the Teen Pregnancy Edition
One of the first posts I ever did was on correlation and causation. In it, I spelled out the three possibilities to consider whenever two variables (X and Y) are linked:
- X is causing Y
- Y is causing X
- Something else is causing both X and Y
Delivering the commencement address last weekend at the evangelical Liberty University, Mitt Romney naturally stuck primarily to “family values” and religious themes. He did, however, make one economic observation that intersects with some fascinating new research. “For those who graduate from high school, get a full-time job, and marry before they have their first child,” he said, “the probability that they will be poor is 2 percent. But if [all] those things are absent, 76 percent will be poor.”
These are striking numbers, but they raise the age-old question of correlation and causation. Does this mean that the representative high-school dropout would be doing much better had he stuck it out in school for a few more years? Or is it instead the case that the population of high-school dropouts is disproportionately composed of people who have attributes that lead to low earnings?
When it comes to early pregnancy, surprising new evidence indicates that Romney and most everyone else have it backward: Having a baby early does not hamper a young woman’s economic prospects, as Romney implies. Rather, young women choose to become mothers because their economic outlook is so objectively bleak.
Say what?
As a former teenage girl myself, I find this a strange conclusion….I certainly never met a teen mom who would have put it that way. But surely there was some wonderful evidence to support this scathing conclusion?
Well, not really. Here’s the original paper….and here’s how the authors conveyed their thoughts:
We describe some recent analysis indicating that the combination of being poor and living in a more unequal (and less mobile) location, like the United States, leads young women to choose early, non-marital childbearing at elevated rates, potentially because of their lower expectations of future economic success. …These findings lead us to conclude that the high rate of teen childbearing in the United States matters mostly because it is a marker of larger, underlying social problems.
The emphasis was mine….but notice how much more careful they are in their language. If you take my list above, you see that they are challenging possibility number 1, seeing if #2 is a feasible conclusion, but ultimately pointing the finger at #3….i.e. “larger, underlying social problems”.
For example, they cite low maternal education as a risk factor for teen pregnancy…which one could presume is either the result of or the cause of low income.
Teen pregnancy is complicated, and honestly I would be very surprised if you could ever figure out a way to pin it on just one factor. Additionally, so much information is unavailable that it can be hard to parse through what you have left. A key factor in all of this would be to determine if higher income girls weren’t having babies because they weren’t getting pregnant or because they were having abortions….data which could lead to very different conclusions.
I fully support this study, by the way; questioning the prevailing wisdom is always a good thing. What I resent is when people think that just by flipping the order of a normal conclusion they’re being clever.
X could cause Y, Y could cause X, something else could be causing both.
Then again, it could also just be a coincidence.
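To make the "something else is causing both" case concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: Z is a hypothetical hidden factor, and X and Y each depend on Z plus their own random noise, with no direct link between them. It is a toy model, not an analysis of the teen pregnancy data.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden common cause (assumption for this toy model).
z = [rng.gauss(0, 1) for _ in range(n)]
# X and Y each depend on Z plus independent noise; neither affects the other.
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

# X and Y look linked even though there is no causal arrow between them.
r_xy = pearson(x, y)

# Subtract Z's contribution and the apparent link should mostly vanish.
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
r_resid = pearson(x_resid, y_resid)

print(f"correlation of X and Y:          {r_xy:.2f}")
print(f"correlation after removing Z:    {r_resid:.2f}")
```

In this setup the first number should come out clearly positive and the second close to zero. In real life, of course, you rarely get to measure Z so cleanly, which is the whole problem.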
Weekend Moment of Zen 4-29-12
Circumventing the Middle Man
Well, my post on justifiable skepticism (Paranoia is just good sense if people actually are out to get you) certainly was the big winner for traffic/comments this week. I was happy to see that…I had a lot of fun putting that graph together and thought the outcomes were pretty striking. Thanks to Maggie’s Farm for linking to it.
It was my post on food deserts, however, that got me the most IRL comments. Both my mother and my brother commented on it, and not terrifically positively. In retrospect, I wasn’t very clear about the points I was trying to make, though to be fair I had spent a lot of the day on an airplane.
My issue with food desert research, or any similar research, is that what we’re really talking about is a proposed proximate cause to a larger issue: obesity. In my experience, just having people tell you why they think something’s happening isn’t good enough to prove that’s the actual reason. Thus my quibble with much of the theorizing about obesity problems….you have to make sure that what you’re theorizing is the cause is actually the cause (or one of the causes) before you start dumping money into it. You cannot make the middle man the holy grail if you haven’t established that it’s really a cause.
Unfortunately, people love to jump on good ideas before truly establishing this link.
Example: A few years ago, it was discovered that 22% of school children were eating vending machine food. Schools had an obesity problem and the food in the vending machines was unhealthy, so a push began to remove vending machines from schools. Schools balked, as they make money from vending machines, but the well-being of children came first…..until of course this study came out proving that reducing access to vending machines didn’t actually affect obesity rates. Oops.
It’s really a simple logic exercise…proving that kids are (a) obese and (b) eating from vending machines does not actually prove that getting rid of (b) will reduce (a).
That’s why I liked the research into the difference food deserts make in obesity. It’s a question that needs to be asked more often when trying to address a large issue: are we sure that the thing we’re trying to fix will actually help the issue we were concerned about in the first place???
If you haven’t established that it will, then be careful with how you proceed. Addressing food deserts (or vending machines or whatever) is a means to an end, and you shouldn’t confuse it with the end itself…unless you have really good data backing you up.
Paranoia is just good sense if people really are out to get you
Yesterday I posted about retractions in scientific journals, and the assertion that they are going up. I actually woke up this morning thinking about that study, and wishing I could see more data on how it’s changed year to year (yes, I’m a complete nerd…but what do you ponder while brushing your teeth????). Anyway, that brought to mind a post I did a few weeks ago, on how conservatives’ trust in the scientific community has gone steadily down.
It occurred to me that if you superimposed the retraction rate of various journals over the trust in the scientific community rates, it could actually be an interesting picture. It turns out PubMed actually has a retraction rate by year available here. For purposes of this graph I figured that would be a representative enough sample.
Why most nutritional research is useless
Nutrition research is big money these days. Our national obsession with weight loss is at a fever pitch, and any new or interesting research is sure to make headlines.
- Was the data self reported? Even CNN brought this up in their article. People, especially those embarrassed about their weight, don’t accurately assess what they eat. My mother, skinny little thing that she is, could eat one peppermint patty and tell you she’d had a serving of chocolate. I don’t think I’d even count it until I had 3 or so.
- How much was “more”? I actually can’t find this for this study. Is it the difference between 1 and 2 servings per week? Or the difference between 1 and 5? Both would produce statistically significant correlations, but the practical outcome would be different. In 2005, researchers made news by saying that eating more fruits and veggies did not, in fact, prevent cancer. The cancer treating establishment (which I work in, btw) promptly responded by pointing out that they compared people who ate half a fruit per day to those who ate 1-2 fruits per day. It was all reported in grams too, so the data look extra impressive: “Those eating less than 114 grams showed no difference from those eating 367 grams”. The link gives more examples, but 250 grams is one medium apple. Watch out for this.
- Who classified people as “normal weight” or “overweight”? If this was also self reported (and in this study, it looks like there were clinic visits), then look out. There’s a great study I can’t find right now that shows that women tend to lie about weight, and men tend to lie about height. Both lies will screw up the BMI calculation (the most common metric for assessing “normal”).
- Were the overweight people actively (or even somewhat) trying to modify their diets to lose weight? A few years ago, I heard about the study that suggested diet soda was linked to obesity. I remember my first reaction was “are we sure they’re all not just on diets?”. This seemed like a classic correlation/causation issue. All the analysis seemed to presume they were overweight because they drank diet soda. I wondered why they never seemed to look at the idea that they could be drinking diet soda because they were overweight. That’s one of the first swaps most people I know make when they try to lose weight.
- Don’t even get me started if it’s a population study. That’s a big topic for another time, but let’s just say they’re really really tricky.
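On the BMI point in the list above, a quick sketch shows how little fudging it takes to change someone’s category. BMI is weight in kilograms divided by height in meters squared, and 25 is the usual cutoff for “overweight”. The person and the size of the fibs below are invented for illustration, not taken from any study.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms over height in meters squared."""
    return weight_kg / height_m ** 2

def category(bmi_value):
    """Simplified two-way split at the standard overweight cutoff of 25."""
    return "overweight" if bmi_value >= 25 else "not overweight"

# Hypothetical person (made-up numbers): actually 80 kg and 1.75 m tall.
true_bmi = bmi(80, 1.75)

# Same person shaving 5 kg off the reported weight.
lighter_bmi = bmi(75, 1.75)

# Same person adding 5 cm to the reported height.
taller_bmi = bmi(80, 1.80)

for label, value in [("measured", true_bmi),
                     ("under-reported weight", lighter_bmi),
                     ("over-reported height", taller_bmi)]:
    print(f"{label}: BMI {value:.1f} ({category(value)})")
```

With these particular numbers, either fib on its own is enough to move the same person from one side of the cutoff to the other, which is why it matters who did the measuring.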
Correlation and Causation: the Housework Edition
After yesterday’s comic, I was hoping to find a good example of a news story where they equated correlation and causation. In case you’re curious, it took me under 5 minutes.
Headline: Why Being Less of a Control Freak May Make You Happier
To start, let me just mention that correlation implies that two things are moving together….as one goes up, so does the other. Alternatively, as one goes up, the other goes down, or vice versa. Either way, their outcomes appear to be tied.
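For anyone who wants to see "moving together" as a number, here is a small sketch using the standard Pearson correlation coefficient, which runs from -1 to +1. The data are made up purely for illustration: one series that rises with x and one that falls.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Made-up illustrative data, not from any article or study.
x = [1, 2, 3, 4, 5]
goes_up = [2, 3, 5, 6, 8]    # tends to rise as x rises
goes_down = [9, 7, 6, 4, 1]  # tends to fall as x rises

r_up = pearson(x, goes_up)
r_down = pearson(x, goes_down)

print(f"x vs goes_up:   {r_up:+.2f}")   # positive: they move together
print(f"x vs goes_down: {r_down:+.2f}") # negative: they move in opposite directions
```

Note that nothing in this calculation knows or cares which variable, if any, is doing the causing.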
Causation on the other hand, says that one thing is causing another. What yesterday’s post was referring to is the often made mistake that just because two things are correlated, we can infer that one is causing the other. This is not always true, and believing so may get you drawn as a stick figure.
Anyway, the article above illustrates that point nicely. The author set out to find out if being a control freak mom made people unhappy….and lo and behold it appears to. 55% of women who said they delegate to a partner or spouse at least once a week reported themselves as “very satisfied” with their life. For those who did not delegate that often, the number was 43%.
Now, I’ll mostly skip the use of the word “delegate” in this article, though it does bother me. My husband does plenty around the house, but we mostly just consider that “teamwork” not “delegating”. I don’t start the week handing out tasks to him, and he doesn’t consider the work he does around the house a favor to me. It’s just what needs to get done.
More important, however, is the article’s conclusion that delegating will make people happier. While delegating and happiness are perhaps correlated, the relationship is not necessarily causal. It’s possible that the women who don’t delegate do so because their spouse is lazy, hostile, or generally not involved….all things which would also make them less happy overall. It’s also possible that women who don’t delegate are controlling, martyrs, passive aggressive, etc., and that makes them unhappy too.
I had a great stats professor once who opened every class with this:
“If you get one thing out of this class, let it be this:
When X and Y are correlated, you have 3 possibilities:
- X is causing Y
- Y is causing X
- Something else is causing both X and Y”

