Correlation and Causation: the Teen Pregnancy Edition

One of the first posts I ever did was on correlation and causation.  In it, I spelled out the three rules to consider whenever two variables (x and y) are linked:

  1. X is causing Y
  2. Y is causing X
  3. Something else is causing both X and Y

While most people jump to the conclusion that it’s number 1, Matthew Yglesias wrote a piece this week where he rather awkwardly jumps to conclusion number 2.  He starts off well with the second paragraph, but then goes to a very strange place in the third:

Delivering the commencement address last weekend at the evangelical Liberty University, Mitt Romney naturally stuck primarily to “family values” and religious themes. He did, however, make one economic observation that intersects with some fascinating new research. “For those who graduate from high school, get a full-time job, and marry before they have their first child,” he said, “the probability that they will be poor is 2 percent. But if [all] those things are absent, 76 percent will be poor.”
These are striking numbers, but they raise the age-old question of correlation and causation. Does this mean that the representative high-school dropout would be doing much better had he stuck it out in school for a few more years? Or is it instead the case that the population of high-school dropouts is disproportionately composed of people who have attributes that lead to low earnings?
When it comes to early pregnancy, surprising new evidence indicates that Romney and most everyone else have it backward: Having a baby early does not hamper a young woman’s economic prospects, as Romney implies. Rather, young women choose to become mothers because their economic outlook is so objectively bleak.

Say what?

As a former teenage girl myself, I find this a strange conclusion….I certainly never met a teen mom who would have put it that way.  But surely there was some wonderful evidence to support this scathing conclusion?

Well, not really.  Here’s the original paper….and  here’s how the authors conveyed their thoughts:

We describe some recent analysis indicating that the combination of being poor and living in a more unequal (and less mobile) location, like the United States, leads young women to choose early, non-marital childbearing at elevated rates, potentially because of their lower expectations of future economic success. …These findings lead us to conclude that the high rate of teen childbearing in the United States matters mostly because it is a marker of larger, underlying social problems.

The emphasis was mine….but notice how much more careful they are in their language.  If you take my list above, you see that they are challenging possibility number 1, seeing if #2 is a feasible conclusion, but ultimately pointing the finger at #3….i.e. “larger, underlying social problems”.

For example, they cite low maternal education as a risk factor for teen pregnancy…which one could presume is either the result or the cause of low income.

Teen pregnancy is complicated, and honestly I would be very surprised if you could ever figure out a way to pin it on just one factor.  Additionally, so much information is unavailable that it can be hard to parse through what you have left.  A key factor in all of this would be to determine if higher income girls weren’t having babies because they weren’t getting pregnant or because they were having abortions….data which could lead to very different conclusions.

I fully support this study, by the way; questioning the prevailing wisdom is always a good thing. What I resent is when people think that just by flipping the order of a normal conclusion they’re being clever.

X could cause Y, Y could cause X, something else could be causing both.

Then again, it could also just be a coincidence.  

Why most marriage statistics are completely skewed

Apparently is now doing a “map of the week”.  This week, it was a map of states by marriage rate.

It shows Nevada as the overwhelming winner, with Hawaii second.  This reminded me about my annoyance at most marriage data.

Marriage data is often quoted, but fairly poorly understood.  The top two states in the map above should tip you off as to the major problem with marriage data derived from the CDC in particular….it’s based on the state that issued the marriage license, not the state where the couple resides.  Since all (heterosexual) marriages affirmed by one state are currently recognized by every other state, state of residence information is not reported to the CDC.  This means that states with destination wedding type locations (Las Vegas anyone?) skew high, and all others are presumably a bit lower than they should be.  Anecdotally, it’s also conceivable that states with large meccas for young people (New York City, Boston, DC) may be artificially low because many young people return to their childhood home states to marry.
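To make the license-versus-residence problem concrete, here is a small sketch in Python.  The wedding records are entirely made up (eight hypothetical couples), and the residence column is exactly the piece of information that, as described above, never reaches the CDC.  It only shows how the same weddings rank states differently depending on which column you count.

```python
from collections import Counter

# Hypothetical records, invented for illustration:
# (state that issued the license, state where the couple actually lives)
WEDDINGS = [
    ("NV", "CA"), ("NV", "CA"), ("NV", "NV"), ("NV", "TX"),
    ("HI", "CA"), ("HI", "HI"),
    ("CA", "CA"), ("TX", "TX"),
]

# What a license-based tally sees.
by_license = Counter(license_state for license_state, _ in WEDDINGS)

# What a residence-based tally would see, if that data were collected.
by_residence = Counter(home_state for _, home_state in WEDDINGS)

print("By license state:  ", dict(by_license))
print("By residence state:", dict(by_residence))
```

In this toy data Nevada issues half of the licenses while only one of the eight couples lives there, and California shows the reverse pattern.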

The other problem with marriage data is that the resulting divorce data is even more skewed.  Quite a few states don’t report divorce statistics at all (California, Georgia, Hawaii, Indiana, Louisiana, Minnesota) and the statistics from the remaining states are often misinterpreted.  One of the most commonly quoted statistics is that “50% of marriages end in divorce”.  This isn’t true.

In any given year, there are about twice as many marriages as there are divorces….but thanks to changing population, changing marriage rates, people with multiple divorces, and the pool of the already married, this does not mean that half of all marriages end in divorce.  In fact, if you change the stat to “percent of people who have been married and divorced”, you wind up at only about 33%.  More explanation here.
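Here is a toy model of just one of those effects, a changing marriage rate.  Every number in it is an assumption I picked so the arithmetic comes out clean: one third of every wedding cohort eventually divorces, those divorces are spread evenly over the ten years after the wedding, and the number of weddings per year drops from 3000 to 1500 at year 20.  It tracks marriages rather than people, so it is a sketch of the mechanism, not a reconstruction of the real statistics.

```python
from fractions import Fraction

# Hypothetical toy model: all numbers are invented for illustration.
EVER_DIVORCE = Fraction(1, 3)   # share of each cohort that eventually divorces
SPREAD_YEARS = 10               # divorces fall evenly over the 10 years after the wedding


def marriages_in(year):
    """Hypothetical wedding counts: 3000 a year at first, 1500 a year from year 20 on."""
    return 3000 if year < 20 else 1500


def divorces_in(year):
    """Divorces in `year`, summed over cohorts married 1 to SPREAD_YEARS years earlier."""
    per_year_share = EVER_DIVORCE / SPREAD_YEARS
    return sum(
        marriages_in(wedding_year) * per_year_share
        for wedding_year in range(year - SPREAD_YEARS, year)
        if wedding_year >= 0
    )


def same_year_ratio(year):
    """Divorces in a year divided by marriages in that same year."""
    return divorces_in(year) / marriages_in(year)


for year in (15, 25, 35):
    print(year, same_year_ratio(year))
```

While the wedding count is steady (years 15 and 35), the same-year ratio matches the true one-in-three share.  In year 25, right after the drop, there are exactly twice as many marriages as divorces, even though only a third of marriages in this model ever end in divorce.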

Ultimately, when considering any marriage data, it is important to remember that there are no national databases for this stuff.  All data has to come from somewhere, and if the source is spotty, the conclusions drawn from the data will likely be wrong.  This all applies to quite a few types of data….but marriage data is used with such confidence that it’s tough to remember how terrible the sources are.  A few people have let me know that I’ve ruined infographics for them forever, and I’m hoping to do the same with all marriage data.

You’re welcome.


Hate’s a strong word.  I get that.  I also get that data and survey types are not always the sort of thing that inspires people to strong hatred, but here we are.

In this post I mentioned my annoyance at perception/prediction polls.  The one I referenced was based on women who didn’t change their last names and their level of marital commitment.  Commenter Assistant Village Idiot mentioned another example, which I also liked: “‘Do you think earthquakes are more likely now because of climate change?’ What we think has nothing to do with anything. The earthquakes will happen according to their own rules.”

In writing that post, however, I forgot to mention that the same study included an even worse piece of data.  As a rebuttal to the “Midwestern college kids don’t think non-name changing women are committed” finding, they included a remark that women who didn’t plan on changing their names didn’t feel less committed.


I would really love it if someone could tell me if there’s a proper name for this sort of thing, but I always think of it as “the embarrassing question debacle”.  Basically, researchers ask people questions with a potentially embarrassing answer, and then report it as meaningful when people do not answer embarrassingly.

There are only two types of people I have ever heard admit they went into their marriages less than completely committed:

  1. Those who have been married successfully for quite some time who are now comfortable in admitting they were totally naive when they walked down the aisle.
  2. Those who are already divorced and reflecting on what went wrong.

Level of commitment is best assessed in retrospect, and I look with great skepticism at anyone who says they can gauge it before the fact.

Getting at the reasons people do things can be brutal.  Your only source for your data also has the biggest motivation to conceal it from you.  Some people are actually doing things for good reasons, some just want to look like they are, and some are lying to themselves.  Unless a study at least attempts to account for all 3 scenarios, I would hold all answers suspect.

Correlation and Causation: the Housework Edition

After yesterday’s comic, I was hoping to find a good example of a news story where they equated correlation and causation.  In case you’re curious, it took me under 5 minutes.

Headline: Why Being Less of a Control Freak May Make You Happier

To start, let me just mention that correlation means that two things are moving together….as one goes up, so does the other.  Alternatively, as one goes up, the other goes down.  Either way, their outcomes appear to be tied.
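If you want to see “moving together” as a number, one common measure is the Pearson correlation coefficient, which runs from -1 to 1.  The article doesn’t say which measure (if any) it used, so treat this as a generic sketch with made-up data, not a description of their analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
rises_with_x = [2, 4, 6, 8, 10]   # goes up as x goes up
falls_with_x = [10, 8, 6, 4, 2]   # goes down as x goes up

print(pearson_r(x, rises_with_x))
print(pearson_r(x, falls_with_x))
```

The first pair comes out at 1 (perfectly tied, same direction) and the second at -1 (perfectly tied, opposite direction).  Neither number says anything about which variable, if either, is doing the causing.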

Causation, on the other hand, says that one thing is causing another.  What yesterday’s post was referring to is the often-made mistake of assuming that just because two things are correlated, one is causing the other.  This is not always true, and believing so may get you drawn as a stick figure.

Anyway, the article above illustrates that point nicely.  The author set out to find out if being a control freak mom made people unhappy….and lo and behold, it appears to.  55% of women who said they delegate to a partner or spouse at least once a week reported themselves as “very satisfied” with their life.  For those who did not delegate that often, the number was 43%.

Now, I’ll mostly skip the use of the word “delegate” in this article, though it does bother me.  My husband does plenty around the house, but we mostly just consider that “teamwork” not “delegating”.  I don’t start the week handing out tasks to him, and he doesn’t consider the work he does around the house a favor to me.  It’s just what needs to get done.

More important, however, is the article’s conclusion that delegating will make people happier.  While delegating and happiness are perhaps correlated, the relationship is not necessarily causal.  It’s possible that the women who don’t delegate do so because their spouse is lazy, hostile, or generally not involved….all things which would also make them less happy overall.  It’s also possible that women who don’t delegate are controlling, martyrs, passive aggressive, etc, and that makes them unhappy too.

I had a great stats professor once who opened every class with this:

“If you get one thing out of this class, let it be this:

When x and y are correlated, you have 3 possibilities:

  1. X is causing Y
  2. Y is causing X
  3. Something else is causing both X and Y”

Lack of delegating could cause unhappiness.
Unhappiness could cause people to stop delegating.
Something else entirely could cause people to not delegate and to be unhappy.
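To show how the third possibility can produce a gap all by itself, here is a small sketch with invented counts (these are not the survey’s numbers).  The hypothetical “something else” is whether the partner is involved around the house.  In this made-up data, delegating has no effect at all once you know that: satisfaction is 60% in involved households and 30% in uninvolved ones, whether or not the woman delegates.

```python
from fractions import Fraction

# Hypothetical counts, invented for illustration; not the survey's data.
# Keys: (partner_involved, delegates) -> (women in group, number "very satisfied")
GROUPS = {
    (True, True): (80, 48),
    (True, False): (20, 12),
    (False, True): (20, 6),
    (False, False): (80, 24),
}


def satisfied_share(delegates, partner_involved=None):
    """Share 'very satisfied' among women with the given delegating status.

    If partner_involved is None, pool both kinds of household together.
    """
    total = satisfied = 0
    for (involved, group_delegates), (n, n_satisfied) in GROUPS.items():
        if group_delegates != delegates:
            continue
        if partner_involved is not None and involved != partner_involved:
            continue
        total += n
        satisfied += n_satisfied
    return Fraction(satisfied, total)


print("All delegators:    ", float(satisfied_share(True)))
print("All non-delegators:", float(satisfied_share(False)))
print("Involved partner:  ", float(satisfied_share(True, True)), float(satisfied_share(False, True)))
print("Uninvolved partner:", float(satisfied_share(True, False)), float(satisfied_share(False, False)))
```

Pooled together, delegators come out at 54% very satisfied and non-delegators at 36%, simply because delegators are mostly the ones with involved partners.  Split by household type and the gap disappears.  This only illustrates that possibility 3 can generate a headline like the one above; it says nothing about which of the three is actually true for the real survey.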