Correlation and Causation: Real World Problems

In yesterday’s post, I got a bit worked up over sloppy reporting on a study on dietary interventions in pregnancy. 

This led to an interesting comment from the Assistant Village Idiot regarding weight gain recommendations for pregnant women.  The current weight gain recommendation is 25 – 35 pounds for a normal BMI woman, but AVI commented that it used to be much lower, and that women were hospitalized to stop them from eating too much.
I didn’t actually know that, so I immediately decided to look it up.  
I stumbled onto a fascinating presentation put together by an OB at UCSF on the history of maternal weight gain recommendations (link goes to the PowerPoint slides).  It not only confirmed what AVI had mentioned, but also gave some of the reasoning….which turned out to be a very interesting example of people erroneously conflating correlation with causation.  
Apparently part of the reason why they (they being doctors circa 1930) were so nervous about weight gain in pregnancy was that they were trying to prevent preeclampsia.  Now preeclampsia is a life-threatening condition if left untreated, and one of the warning signs is rapid weight gain.  Apparently some doctors actually thought that the symptom was the cause, and believed that all excessive weight gain was a sign the patient was about to become preeclamptic.  Thus, the theory went, limiting weight gain would prevent preeclampsia and aid in “figure preservation” to boot*.  
Sadly, this also led to higher infant mortality, disability, and mental retardation….which seems a pretty steep price to pay for what was really a data analysis error.  As I’ve said before, this is why statistics are so relevant in medicine….the cost for getting things wrong is too steep to not be careful.
*To note, it is actually true that preeclampsia is linked to higher weight/glucose/insulin production….but the way they went about addressing it did as much harm to the fetus as good.  Current weight gain recommendations are set to optimize outcomes for the babies, not the mothers.  

Correlation and Causation: the Teen Pregnancy Edition

One of the first posts I ever did was on correlation and causation.  In it, I spelled out the three rules to consider whenever two variables (x and y) are linked:

  1. X is causing Y
  2. Y is causing X
  3. Something else is causing both X and Y
While most people jump to the conclusion that it’s number 1, Matthew Yglesias wrote a piece for Slate.com this week where he rather awkwardly jumps to conclusion number 2.  
He starts off well with the second paragraph, but then goes to a very strange place in the third: 

Delivering the commencement address last weekend at the evangelical Liberty University, Mitt Romney naturally stuck primarily to “family values” and religious themes. He did, however, make one economic observation that intersects with some fascinating new research. “For those who graduate from high school, get a full-time job, and marry before they have their first child,” he said, “the probability that they will be poor is 2 percent. But if [all] those things are absent, 76 percent will be poor.”
These are striking numbers, but they raise the age-old question of correlation and causation. Does this mean that the representative high-school dropout would be doing much better had he stuck it out in school for a few more years? Or is it instead the case that the population of high-school dropouts is disproportionately composed of people who have attributes that lead to low earnings?
When it comes to early pregnancy, surprising new evidence indicates that Romney and most everyone else have it backward: Having a baby early does not hamper a young woman’s economic prospects, as Romney implies. Rather, young women choose to become mothers because their economic outlook is so objectively bleak.

Say what?

As a former teenage girl myself, this is a strange conclusion….I certainly never met a teen mom who would have put it that way.  But surely there was some wonderful evidence to support this scathing conclusion?

Well, not really.  Here’s the original paper….and  here’s how the authors conveyed their thoughts:

We describe some recent analysis indicating that the combination of being poor and living in a more unequal (and less mobile) location, like the United States, leads young women to choose early, non-marital childbearing at elevated rates, potentially because of their lower expectations of future economic success. …These findings lead us to conclude that the high rate of teen childbearing in the United States matters mostly because it is a marker of larger, underlying social problems.

The emphasis was mine….but notice how much more careful they are in their language.  If you take my list above, you see that they are challenging possibility number 1, seeing if #2 is a feasible conclusion, but ultimately pointing the finger at #3….i.e. “larger, underlying social problems”.

For example, they cite low maternal education as a risk factor for teen pregnancy…which one could presume could be either the result of or the cause of low income.

Teen pregnancy is complicated, and honestly I would be very surprised if you could ever figure out a way to pin it on just one factor.  Additionally, so much information is unavailable that it can be hard to parse through what you have left.  A key factor in all of this would be to determine if higher income girls weren’t having babies because they weren’t getting pregnant or because they were having abortions….data which could lead to very different conclusions.

I fully support this study, by the way; questioning the prevailing wisdom is always a good thing. What I resent is when people think that just by flipping the order of a normal conclusion they’re being clever.

X could cause Y, Y could cause X, something else could be causing both.

Then again, it could also just be a coincidence.  
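The coincidence option is easy to underestimate when there are only a handful of data points. Here is a minimal Python sketch, not tied to any study mentioned above: it generates pairs of completely unrelated random series and counts how often they still look correlated. The function names, the 10-point series length, and the 0.5 cutoff are all assumptions chosen for illustration.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_chance_correlations(trials=1000, points=10, threshold=0.5, seed=42):
    """Count how many pairs of independent random series have |r| above threshold."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # xs and ys are drawn independently: neither causes the other,
        # and nothing else causes both.
        xs = [rng.random() for _ in range(points)]
        ys = [rng.random() for _ in range(points)]
        if abs(pearson_r(xs, ys)) > threshold:
            hits += 1
    return hits


hits = count_chance_correlations()
print(f"{hits} of 1000 unrelated pairs had |r| > 0.5")
```

With series this short, a noticeable share of the unrelated pairs should clear the cutoff, which is the whole point: a sizable r from a few data points is not much evidence of anything by itself.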

Circumventing the Middle Man

Well, my post on justifiable skepticism (Paranoia is just good sense if people actually are out to get you) certainly was the big winner for traffic/comments this week.  I was happy to see that…I had a lot of fun putting that graph together and thought the outcomes were pretty striking.  Thanks to Maggie’s Farm for linking to it.

It was my post on food deserts, however, that got me the most IRL comments.  Both my mother and my brother commented on it, and not terrifically positively.  In retrospect, I wasn’t very clear about the points I was trying to make, though to be fair I had spent a lot of the day on an airplane.

My issue with food desert research, or any similar research, is that what we’re really talking about is a proposed proximate cause of a larger issue: obesity.  In my experience, just having people tell you why they think something’s happening isn’t good enough to prove that’s the actual reason.  Thus my quibble with much of the theorizing about obesity problems….you have to make sure that what you’re theorizing is the cause is actually the cause (or one of the causes) before you start dumping money into it.  You cannot make the middle man the holy grail if you haven’t established that it’s really a cause.

Unfortunately, people love to jump on good ideas before truly establishing this link.

Example:  A few years ago, it was discovered that 22% of school children were eating vending machine food.  Schools had an obesity problem, the food in the vending machines was unhealthy, so a push began to remove vending machines from schools.  Schools balked, as they make money from vending machines, but the well-being of children came first…..until of course this study came out proving that reducing access to vending machines didn’t actually affect obesity rates.   Oops.

It’s really a simple logic exercise…proving that kids are (a) obese and (b) eating from vending machines does not actually prove that getting rid of (b) will reduce (a).
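The logic can be sketched as a toy simulation in Python. Everything in it is an assumption made up for illustration, not data from the vending machine study: a hidden factor (here called poor_diet_at_home) raises both the chance of obesity and the chance of using vending machines, while vending machine use itself has no effect on obesity.

```python
import random


def simulate_school(n=20000, vending_available=True, seed=1):
    """Return a list of (uses_vending, obese) pairs for n hypothetical kids."""
    rng = random.Random(seed)
    kids = []
    for _ in range(n):
        # Hypothetical hidden driver of BOTH outcomes (all rates are invented).
        poor_diet_at_home = rng.random() < 0.30
        obese = rng.random() < (0.35 if poor_diet_at_home else 0.10)
        wants_vending = rng.random() < (0.40 if poor_diet_at_home else 0.15)
        # Removing the machines only changes whether kids CAN use them.
        uses_vending = vending_available and wants_vending
        kids.append((uses_vending, obese))
    return kids


def obesity_rate(kids):
    return sum(obese for _, obese in kids) / len(kids)


def rate_by_vending_use(kids):
    """Obesity rate among vending users and among non-users."""
    users = [obese for uses, obese in kids if uses]
    non_users = [obese for uses, obese in kids if not uses]
    return sum(users) / len(users), sum(non_users) / len(non_users)


before = simulate_school(vending_available=True)
after = simulate_school(vending_available=False)

users_rate, non_users_rate = rate_by_vending_use(before)
print(f"obesity among vending users:     {users_rate:.1%}")
print(f"obesity among non-users:         {non_users_rate:.1%}")
print(f"overall obesity, machines in:    {obesity_rate(before):.1%}")
print(f"overall obesity, machines gone:  {obesity_rate(after):.1%}")
```

In this made-up world the vending users really are heavier on average, yet pulling the machines changes nothing, because the machines were never the cause. Real life is messier, but the sketch shows why observing (a) alongside (b) is not enough to predict what an intervention on (b) will do.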

That’s why I liked the research into the difference food deserts make in obesity.  It’s a question that needs to be asked more often when trying to address a large issue:  are we sure that fixing the thing we’re trying to address will actually help the issue we were concerned about in the first place?


If you haven’t established that it will, then be careful with how you proceed.  Addressing food deserts (or vending machines or whatever) is a means to an end, and you shouldn’t confuse it with the end itself…unless you have really good data backing you up.

Paranoia is just good sense if people really are out to get you

Yesterday I posted about retractions in scientific journals, and the assertion that they are going up.  I actually woke up this morning thinking about that study, and wishing I could see more data on how it’s changed year to year (yes, I’m a complete nerd…but what do you ponder while brushing your teeth????).  Anyway, that brought to mind a post I did a few weeks ago, on how conservatives’ trust in the scientific community has gone steadily down.

It occurred to me that if you superimposed the retraction rate of various journals over the trust in the scientific community rates, it could actually be an interesting picture.   It turns out PubMed actually has a retraction rate by year available here.  For purposes of this graph I figured that would be a representative enough sample.

I couldn’t find the raw numbers for the original public trust study, so these are eyeballed from the original graph in blue, with the exact numbers from the PubMed database in green.  
So it looks like a decreasing trust in the scientific community may actually be a rational thing*.  
It’s entirely possible, by the way, that the increased scrutiny of the internet led to the higher retraction rate…but that would still have given people more reasons not to blindly trust.  As the title of this post suggests, skepticism isn’t crazy if you actually should be skeptical.
Speaking of trust, I obviously had to manipulate the axes a bit to get this all to fit.  Still not sure I got it quite right, but if anyone wants to check my work, the raw data for the retraction rate is here and the data for the public trust study is here.  These links are included earlier as well, just wanted to be thorough.  
*Gringo requested that I run the correlation coefficients.  Conservatives r = -0.81 Liberals r = 0.52 Moderates r = 0.  I can’t stand by these numbers since my data points were all estimates based on the original chart, but they should be about correct.

Why most nutritional research is useless

Nutrition research is big money these days.  Our national obsession with weight loss is at a fever pitch, and any new or interesting research is sure to make headlines.

Here are some basic guidelines on what to look for in nutritional research (any study, not just this one):
  1. Was the data self reported? Even CNN brought this up in their article.  People, especially those embarrassed about their weight, don’t accurately assess what they eat.  My mother, skinny little thing that she is, could eat one peppermint patty and tell you she’d had a serving of chocolate.  I don’t think I’d even count it until I had 3 or so.
  2. How much was “more”? I actually can’t find this for this study.  Is it the difference between 1 and 2 servings per week?  Or the difference between 1 and 5?  Both could produce statistically significant correlations, but the practical outcome would be different.  In 2005, researchers made news by saying that eating more fruits and veggies did not, in fact, prevent cancer.  The cancer treating establishment (which I work in, btw) promptly responded by pointing out that they compared people who ate half a fruit per day to those who ate 1-2 fruits per day.  It was all reported in grams too, so the data look extra impressive: “Those eating less than 114 grams showed no difference from those eating 367 grams”.  The link gives more examples, but 250 grams is one medium apple.  Watch out for this.
  3. Who classified people as “normal weight” or “overweight”?  If this was also self reported (and in this study, it looks like there were clinic visits), then look out.  There’s a great study I can’t find right now that shows that women tend to lie about weight, and men tend to lie about height.  Both lies will screw up the BMI calculation (the most common metric for assessing “normal”).
  4. Were the overweight people actively (or even somewhat) trying to modify their diets to lose weight?  A few years ago, I heard about the study that suggested diet soda was linked to obesity.  I remember my first reaction was “are we sure they’re all not just on diets?”.  This seemed like a classic correlation/causation issue.  All the analysis seemed to presume they were overweight because they drank diet soda.  I wondered why they never seemed to look at the idea that they could be drinking diet soda because they were overweight.  That’s one of the first swaps most people I know make when they try to lose weight.  
  5. Don’t even get me started if it’s a population study.  That’s a big topic for another time, but let’s just say they’re really, really tricky.
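To see how the small lies in point 3 can change a classification, here is a rough Python sketch. The BMI formula for pounds and inches and the standard cutoffs (18.5, 25, 30) are the usual ones; the two people and the sizes of their fibs are hypothetical numbers picked for illustration, not figures from any study.

```python
def bmi(weight_lb, height_in):
    """Body mass index from pounds and inches (703 is the unit conversion factor)."""
    return 703 * weight_lb / height_in ** 2


def category(value):
    """Standard adult BMI categories."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"


# Hypothetical examples: (label, weight in lb, height in inches)
examples = [
    ("woman, measured in clinic", 158, 65),
    ("same woman, weight under-reported by 10 lb", 148, 65),
    ("man, measured in clinic", 180, 70),
    ("same man, height over-reported by 2 in", 180, 72),
]

for label, weight_lb, height_in in examples:
    value = bmi(weight_lb, height_in)
    print(f"{label}: BMI {value:.1f} ({category(value)})")
```

In both made-up cases a fairly modest fib is enough to move someone from “overweight” to “normal”, which is why it matters whether height and weight were measured or self reported.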
If you ever want a fabulous crash course in how nutrition research can be skewed, pick up two diet books that contradict each other, and read through their parts on research.  Take something like Atkins (high protein, low carb) and Joel Fuhrman (nearly vegan), and watch them rip to shreds the research the other one builds their whole case on.  
He may have his own controversy, but this is why I like Michael Pollan.  The book I linked to has a great crash course in why most nutritional research just sees what it wants to.  He refused to take a strict nutritional stance and instead condensed it down to a few “rules” that he gleaned from quizzing nutritionists on “what they could say for sure”.  The answer? Eat real food, not too much, mostly plants.  

Correlation and Causation: the Housework Edition

After yesterday’s comic, I was hoping to find a good example of a news story where they equated correlation and causation.  In case you’re curious, it took me under 5 minutes.

Headline: Why Being Less of a Control Freak May Make You Happier

To start, let me just mention that correlation implies that two things are moving together….as one goes up, so does the other.  Alternatively, as one goes up, the other goes down, or vice versa.  Either way, their outcomes appear to be tied.

Causation on the other hand, says that one thing is causing another.  What yesterday’s post was referring to is the often made mistake that just because two things are correlated, we can infer that one is causing the other.  This is not always true, and believing so may get you drawn as a stick figure.  

Anyway, the article above illustrates that point nicely.  The author set out to find out if being a control freak mom made people unhappy….and lo and behold it appears to.  55% of women who said they delegate to a partner or spouse at least once a week reported themselves as “very satisfied” with their life.  For those who did not delegate that often, the number was 43%.  

Now, I’ll mostly skip the use of the word “delegate” in this article, though it does bother me.  My husband does plenty around the house, but we mostly just consider that “teamwork” not “delegating”.  I don’t start the week handing out tasks to him, and he doesn’t consider the work he does around the house a favor to me.  It’s just what needs to get done.

More important, however, is the article’s conclusion that delegating will make people happier.  While delegating and happiness are perhaps correlated, the relationship is not necessarily causal.  It’s possible that the women who don’t delegate do so because their spouse is lazy, hostile, or generally not involved….all things which would also make them less happy overall.  It’s also possible that women who don’t delegate are controlling, martyrs, passive aggressive, etc, and that makes them unhappy too.

I had a great stats professor once who opened every class with this:

“If you get one thing out of this class, let it be this:

When x and y are correlated, you have 3 possibilities:

  1. X is causing Y
  2. Y is causing X
  3. Something else is causing both X and Y”
Lack of delegating could cause unhappiness.
Unhappiness could cause people to stop delegating.
Something else entirely could cause people to not delegate and to be unhappy.
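As a rough illustration of why the survey numbers alone can’t settle which of these is true, here is a minimal Python sketch that builds three made-up worlds, one per possibility. All the probabilities are assumptions tuned by hand so that each world reproduces roughly the 55% vs. 43% split from the article; the supportive_partner factor is a hypothetical stand-in for “something else”.

```python
import random


def simulate(model, n=20000, seed=0):
    """Return (delegates, happy) pairs under one of three hypothetical causal stories."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        if model == "delegating_causes_happiness":
            delegates = rng.random() < 0.50
            happy = rng.random() < (0.55 if delegates else 0.43)
        elif model == "happiness_causes_delegating":
            happy = rng.random() < 0.49
            delegates = rng.random() < (0.56 if happy else 0.44)
        else:  # "common_cause": a third factor drives both
            supportive_partner = rng.random() < 0.50
            delegates = rng.random() < (0.80 if supportive_partner else 0.20)
            happy = rng.random() < (0.59 if supportive_partner else 0.39)
        rows.append((delegates, happy))
    return rows


def satisfaction_by_group(rows):
    """Share of happy respondents among delegators and non-delegators."""
    yes = [happy for delegates, happy in rows if delegates]
    no = [happy for delegates, happy in rows if not delegates]
    return sum(yes) / len(yes), sum(no) / len(no)


for model in ("delegating_causes_happiness",
              "happiness_causes_delegating",
              "common_cause"):
    delegators, non_delegators = satisfaction_by_group(simulate(model))
    print(f"{model}: {delegators:.0%} vs {non_delegators:.0%} very satisfied")
```

All three worlds should print nearly the same two percentages, even though telling women to delegate more would only raise happiness in the first one. A survey like the one in the article sees the percentages, not the world that produced them.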