More Sex, More Models, More Housework

Well hi! If you got here via Google, this is probably not the type of post you are looking for. This one has math, and the only pictures are graphs.  Sorry about that.

For everyone else, welcome to “From the Archives” where I revisit old posts  to see where the science (or my thinking) has gone since I put them up originally.

Back in 2013, a concerned reader sent me a headline that warned men about a terrible scourge depriving them of all that was good in life. Oh yes, I’m talking about housework. The life advice came from the headline “Want to Have More Sex? Men, stop helping with chores.” The article covered a study that had devised a mathematical model of a couple’s sexual frequency vs the number of chores they did. I couldn’t resist, and ended up writing a post called “Sex, Models and Housework”. It’s still one of my most viewed posts, though probably not the most read.

A few things to know about the original study (found here):

  1. That headline was pretty misleading. The study never said that men who didn’t do chores had more sex; it said that men who did more traditionally female chores had less sex. Men who did more traditionally male chores actually had more sex.
  2. Despite being released in 2013, the study used data from 1992. The people in the study had an average age in the early to mid 40s at that time, so this is a study of Baby Boomers and their relationships in the early 90s. Given how much the culture has shifted since, this is important to keep in mind.
  3. The model extrapolated out to men who do 100% of the traditionally female housework. One of my core concerns was how many data points they had in that range, or whether they extrapolated beyond the range of their data. At baseline, men reported doing an average of 25% of the “traditionally female chores”, with a standard deviation of 0.19. It does not look likely they had many men in the 100% range, and those relationships may have had something else unusual going on.
  4. Given #3, you’ll excuse me if I doubt that this relationship really should have been modeled as perfectly linear.
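To put point #3 in perspective, here is a quick back-of-the-envelope sketch in Python using only the mean (25%) and standard deviation (0.19) quoted above. The normal-distribution tail estimate is my assumption, not something the study reported, and since a chore share can’t go below 0% or above 100% it is only a rough guide.

```python
from statistics import NormalDist

# Reported in the original study: men did an average of 25% of the
# "traditionally female" chores, with a standard deviation of 0.19.
MEAN_SHARE = 0.25
SD_SHARE = 0.19


def z_score(share: float) -> float:
    """Number of standard deviations between `share` and the average share."""
    return (share - MEAN_SHARE) / SD_SHARE


def fraction_at_or_above(share: float) -> float:
    """Expected fraction of men at or above `share`.

    ASSUMPTION: treats chore shares as normally distributed, which can only be
    approximate because a share is bounded between 0 and 1.
    """
    return 1.0 - NormalDist(MEAN_SHARE, SD_SHARE).cdf(share)


for share in (0.50, 0.75, 1.00):
    print(f"{share:.0%} of chores: z = {z_score(share):.2f}, "
          f"expected fraction at or above = {fraction_at_or_above(share):.5f}")
```

A man doing 100% of the traditionally female chores sits nearly four standard deviations above the average, which is exactly the kind of territory where a fitted line is extrapolating rather than describing data.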

Those were my original thoughts, and rereading the paper I wanted to add a few more:

  1. One point I can’t believe I didn’t mention the first time around is the inherent selection bias in this data. You had to be a married couple to be included in the data, so a hypothetical couple who had an uneven distribution of housework and divorced was not counted. To be perfectly fair, the authors did take a bit of a look at this. These respondents were surveyed in 1988 and then again in 1992-1994, and the authors checked whether those who were married in 1988 but divorced by 1992 had a different chore distribution or sexual frequency. They didn’t. However, given the ages of the respondents (born in the 40s-60s), many of them could have already been divorced before 1988 rolled around.[1] Additionally, those who were going through a divorce or in an otherwise rocky marriage likely didn’t take part in the survey. We don’t know if those numbers would have changed things, but I think we have reason to suspect that those most bothered by chore arrangements would be more likely to divorce.
  2. The women in the study spent an average of 15 fewer hours per week than men in paid labor, and 18 more hours per week than men on household chores. That means an “average” man in this study doing half of the chores would actually have been putting more total labor into the household than the “average” woman. It would have been interesting to see a total for “labor for the household” to see what the effect of an even vs uneven total workload was. This matters because it would help rule out the possibility that perceived unfairness, rather than the “gender” of the chores, is what drives the decrease in sex.
  3. Child care hours were not included anywhere for either partner.
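The arithmetic behind point #2 can be sketched in Python as follows. The study only reports the gaps (15 hours of paid labor, 18 hours of chores), so the baseline hours passed in below are hypothetical placeholders; they cancel out of the result.

```python
# Gaps reported in the original study (women relative to men, hours/week).
PAID_GAP = 15   # women did 15 fewer hours of paid labor
CHORE_GAP = 18  # women did 18 more hours of household chores


def current_labor_gap(man_paid: float, man_chores: float) -> float:
    """Man's total weekly labor minus the woman's under the reported averages.

    `man_paid` and `man_chores` are HYPOTHETICAL baseline hours; only the
    gaps come from the study, and the baseline cancels out.
    """
    woman_paid = man_paid - PAID_GAP
    woman_chores = man_chores + CHORE_GAP
    return (man_paid + man_chores) - (woman_paid + woman_chores)


def labor_gap_if_man_does_half(man_paid: float, man_chores: float) -> float:
    """Same comparison, but with the couple's total chore hours split 50/50."""
    woman_paid = man_paid - PAID_GAP
    total_chores = man_chores + (man_chores + CHORE_GAP)
    half = total_chores / 2
    return (man_paid + half) - (woman_paid + half)


print(current_labor_gap(40, 10))           # negative: woman does more in total
print(labor_gap_if_man_does_half(40, 10))  # positive: man does more in total
```

Under the reported averages, the woman currently does about 3 more total hours per week, while a 50/50 chore split would leave the man doing 15 more, whatever baseline you plug in.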

Other than that, how has this research fared?

Well, as you can imagine, it caused a stir in academic circles. There was a New York Times Magazine cover story about it, provocatively asking “Do More Equal Marriages Mean Less Sex?”, based heavily on the study. Many people walked away concerned about the age of the data and how applicable it was to people over 20 years later. Researchers from Georgia State University were able to (somewhat) replicate the study (pre-published copy) using data from 2006. A few things about that study:

  1. The study population was younger by about a decade and less wealthy than the original study population, and they had more sex overall.
  2. Cohabiting (unmarried) couples were included, but couples without children were not.
  3. They tossed 10 respondents who said they had sex 50 times a month.
  4. This study ended up with three categories of couples: traditional, egalitarian, and counter-conventional. Of those:
    1. Egalitarian: Divided housework approximately evenly, anywhere from a 35%-65% split. This group made up 30% of the sample and had the most sex and the highest satisfaction.
    2. Traditional: The woman did more than 65% of the housework. This group made up about 63% of the sample; these couples had slightly less sex than the egalitarian couples, and the women reported slightly lower satisfaction.
    3. Counter-conventional: The man did more than 65% of the housework. This was only 5% of the sample, and it did not work out well. These couples had a lower sexual frequency than either of the first two groups, and were less satisfied overall.
  5. I felt thoroughly vindicated by this line: “No research, however, has considered the possibility that the observed effect of men’s shares of domestic labor on sexual frequency and satisfaction could be non-linear.”
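The three groups above are just cutoffs on the man’s share of the housework. Here is a minimal Python sketch of that grouping; how the authors handled a share of exactly 35% or 65% isn’t stated here, so treating both boundaries as egalitarian is my assumption. Splitting couples into groups like this lets each group have its own average, which is one simple way a non-linear pattern can show up that a single straight line would hide.

```python
def classify_couple(man_share: float) -> str:
    """Group a couple by the man's share of the housework (0.0 to 1.0).

    Uses the 35%/65% cutoffs described for the 2006 data. ASSUMPTION: shares
    of exactly 0.35 or 0.65 are counted as egalitarian.
    """
    if not 0.0 <= man_share <= 1.0:
        raise ValueError("man_share must be between 0 and 1")
    if man_share < 0.35:
        return "traditional"           # woman does more than 65%
    if man_share > 0.65:
        return "counter-conventional"  # man does more than 65%
    return "egalitarian"               # roughly a 35%-65% split


for share in (0.20, 0.50, 0.80):
    print(share, classify_couple(share))
```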

So I was at least correct in my concerns. Presuming that these data hold, the line is likely fairly straight until it hits the extreme on one end, then plummets. Interestingly, this study still didn’t compare total labor, and the women in this study worked 20 fewer hours per week at paid labor than the men, and about 15 more hours per week on housework. Again, child care was not included in the work totals. Since this group was younger, it’s likely that at least some of that discrepancy is child care.
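To see why a straight line can mislead when the real pattern is “flat-ish, then plummets,” here is a toy sketch in Python. The curve and every number in it are invented for illustration (they are not from either study): frequency rises gently with the man’s share up to 65%, then drops sharply. The slope is an ordinary least-squares fit written out by hand.

```python
from statistics import fmean
from typing import Sequence

BREAK = 0.65  # share of housework where the toy curve turns downward


def toy_frequency(share: float) -> float:
    """HYPOTHETICAL relationship: slope +2 up to BREAK, slope -10 after it."""
    return 5.0 + 2.0 * share - 12.0 * max(0.0, share - BREAK)


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = fmean(xs), fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


shares = [i / 20 for i in range(21)]  # 0%, 5%, ..., 100%
freqs = [toy_frequency(s) for s in shares]
slope = ols_slope(shares, freqs)
print(f"slope below {BREAK:.0%}: +2.00, straight-line slope overall: {slope:+.2f}")
```

In this made-up example, the straight-line fit comes out with a negative slope (about -1.5) across the whole range, even though the curve rises everywhere below 65%. A linear model would report that every extra chore costs something, when the entire decline is coming from the extreme end.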

So where does this leave us?

Well, it looks like my concerns about assuming a linear model were valid, and that assuming relationships haven’t changed between Baby Boomers and Gen Xers is not a great idea. While some changes to marital setups can have a negative effect (say, a wife working longer hours), they are frequently offset immediately by a positive effect (increased income). This paper here has some interesting examples of these sorts of trade-offs. I’m increasingly convinced that the details of the division of labor matter much less than whether the overall labor is sufficient and equally divided.

I would love to see a breakdown of just the couples on the “man doing all the housework” end. In the second study, that was only 24 couples, and we don’t know if the arrangement was a conscious choice or the result of circumstances such as unemployment. In fact, I think further research should ask people “How much does your current relationship reflect your expectations prior to the relationship?” That might capture some of the effect of changing cultural scripts better than just asking people what they are doing.

Regardless, I have to go do some dishes.

1. According to this, the median age at first marriage in 1975 was 21. If you got married in 1975, your chance of being divorced 13 years later was about 30%. That is not a negligible number of people.

Women, Ovulation and Voting in 2016

Welcome to “From the Archives” where I revisit old posts  to see where the science (or my thinking) has gone since I put them up originally.

Back in good old October of 2012, it was an election year and I was getting irritated.[1] First, I was being bombarded with Elizabeth Warren vs Scott Brown Senate ads, and then I was confronted with this study: The fluctuating female vote: politics, religion, and the ovulatory cycle (Durante et al.), which purported to show that women’s political and religious beliefs varied wildly over their monthly cycle, but in different ways depending on whether they were married or single. For single women, the authors claimed that being fertile made them more liberal and less religious, because they had more liberal attitudes toward sex. For married women, being fertile made them more conservative and religious, as a way to compensate for their urge to cheat. The swing was wide too: about 20%. Of note, the study never actually observed any women changing their vote; it compared two groups of women to find the differences. The study got a lot of attention because CNN initially put it up, then took it back down when people complained. I wrote two posts about this, one irritated and ranty, and one pointing to some more technical issues I had.

With a new election coming around, I was thinking about this paper and wanted to take a look at where it had gone since then. I knew that Andrew Gelman had ultimately taken shots at the study for reporting an implausibly large effect[2] and for potentially collecting lots of data/comparisons and only publishing some of them, so I was curious how this study had subsequently fared.
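The “collect lots of comparisons, publish some” concern is easy to quantify in its simplest form. If you run several comparisons at the usual 5% significance level and there is no real effect anywhere, the chance that at least one comes out “significant” grows quickly. The Python sketch below assumes the comparisons are independent, which comparisons drawn from one dataset usually aren’t, so treat it as an illustration of the direction rather than an exact figure for this study.

```python
def chance_of_false_positive(comparisons: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `comparisons` independent tests is
    "significant" at level `alpha` when no real effect exists anywhere."""
    return 1.0 - (1.0 - alpha) ** comparisons


for k in (1, 5, 10, 20):
    print(f"{k:>2} comparisons: {chance_of_false_positive(k):.0%}")
```

With 10 independent comparisons, the chance of at least one false positive is already about 40%.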

Well, there are updates! First, in 2014, a different group tried to replicate their results in a paper called Women Can Keep the Vote: No Evidence That Hormonal Changes During the Menstrual Cycle Impact Political and Religious Beliefs by Harris and Mickes. This paper recruited a different group of women, but essentially recreated much of the analysis of the original paper with one major addition: they conducted their survey both before AND after the 2012 election, to compare predicted voting behavior with actual voting behavior. A few findings:

  1. The first paper (Durante et al) had found that fiscal policy beliefs didn’t change for women, but social policy beliefs did change around ovulation. The second paper (Harris and Mickes) failed to replicate this finding, and also failed to detect any change in religious beliefs.
  2. In the second paper, married women had a different stated preference for Obama depending on fertility (higher at low fertility, lower at high fertility), but that difference went away when you looked at how they actually voted. For single women, it was the opposite: they reported the same preference level for Obama regardless of fertility, but voted differently based on the time of the month.
  3. The original Durante study had taken some heat for how it assessed fertility. There were concerns that self-reported fertility was so likely to be inaccurate that it would render any conclusions void. I was interested to see that Harris and Mickes clarified that the Durante paper didn’t accurately describe how fertility was assessed in the original paper, and that both teams had ultimately used the same method. This was supposed to be in the supplementary material, but I couldn’t find a free copy of that online. It’s an interesting footnote.
  4. A reviewer asked them to combine the pre- and post-election data to see if they could find a fertility/relationship interaction effect. When the pre- and post-election data were kept separate, there was no effect. When they were combined, there was.

Point #4 is where things got a little interesting. The authors of the Harris and Mickes study said combining their data was not valid, but Durante et al. hit back and said “why not?”. There’s an interesting piece of stat/research geekery about the dispute here, but the TL;DR version is that this could be considered either a partial replication or a failure to replicate, depending on your statistical strategy. Unfortunately, this is one of those areas where there is some legitimate concern that a researcher’s judgment calls are being shaded by their view of the outcome. Since we don’t know what either team’s original plan was, we don’t know whether either one modified their strategy based on the results. Additionally, the “is it valid to combine these data sets?” question is a good one, and would be open for discussion even if we were discussing something totally innocuous. The political nature of the topic intensifies the debate, but it didn’t create it.
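Part of why the pooling decision matters so much is mechanical. Here is a Python sketch of a standard pooled two-proportion z-test with invented counts (these are not the Harris and Mickes data, and their actual analysis may have used a different model): the same 55% vs 45% split appears in each survey wave, and stacking two waves together doubles the sample size, which multiplies the z statistic by the square root of 2.

```python
from math import sqrt


def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic for the difference between groups."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    p_pool = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


# HYPOTHETICAL counts: 55% vs 45% of women reporting some preference,
# with the same split showing up in each survey wave.
wave_z = two_proportion_z(88, 160, 72, 160)      # one wave on its own
pooled_z = two_proportion_z(176, 320, 144, 320)  # both waves stacked together
print(f"single wave z = {wave_z:.2f}, pooled z = {pooled_z:.2f} (cutoff 1.96)")
```

Here each wave on its own gives a z of about 1.79, short of the usual 1.96 cutoff, while the pooled data give about 2.53. Whether stacking is legitimate depends on whether the two waves are really measuring the same thing, which is exactly what the two teams disagreed about.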

Fast forward now to 2015, when yet another study was published: Menstrual Cycle Phase Does Not Predict Political Conservatism. This study ALSO used data from the 2012 election cycle,[3] but with a few further changes. The highlights:

  1. This study, by Scott and Pound, addressed some of the “how do you measure fertility when you can’t test” concerns by asking about medical conditions that might influence fertility, in order to screen out women whose self-reporting might be less accurate. They also ranked fertility on a continuum rather than as a dichotomous “high” vs “low”. This should have made their assessment more accurate.
  2. The other two studies both measured voting in terms of Romney vs Obama. Scott and Pound were concerned that this might capture a change in preference about Obama and Romney as people rather than a political change. They measured both self-reported political leanings and a “moral foundations” test, combined them into an overall “conservatism” rank, then compared that with the chance of conception.
  3. They controlled for age, number of children, and other sociological factors.

So overall, what did this show? Well, basically, political philosophy doesn’t vary much no matter where a woman is in her cycle.

The authors have a pretty interesting discussion at the end about the problems with Mechanical Turk (where all three studies recruited their participants within the same few months), the difference between measuring personal preference (Obama vs Romney) and political preference (Republican vs Democrat), and some statistical analysis problems.

So what do I think now?

First off, I’ve realized that getting all ranty when someone brings up women’s hormones affecting things may be counterproductive. Lesson learned.

More seriously though, I find the hypothesis that our preferences for individuals may change with hormonal changes more compelling than the hypothesis that our overall philosophy of religion or government changes with our hormones. The first simply seems more plausible to me. In a tight presidential election, though, this may be hopelessly confounded by the candidates’ actual behavior. It’s pretty well known that single women voted overwhelmingly for Obama, and that Romney had a better chance to capture the votes of married women. Candidates know this and can play to it, so if a candidate makes a statement playing to their base, you may see shifts that have nothing to do with voters’ hormones and are instead a reaction to real-time statements. This may be a case where research into the hypothetical (i.e. made-up candidate A vs B) would be helpful.

The discussions on fertility measures and statistical analysis were interesting, and a good insight into how much study conclusions can change based on how we define particular metrics. I was happy to see that both follow-up papers hammered on clear and standard definitions of “fertility”. If that is one of the primary metrics you are assessing, then the utmost care must be taken to measure it accurately, or else the noise can drown out the signal.

Do I still think CNN should have taken the story down? Yes... but only in the sense that I believe they should take down most sensational new social/psych research stories. If you follow the research for just two more papers, you see the conclusion go from broad (women change their social, political and religious views and votes based on fertility!) to much narrower (women may in some cases change their preference or voting patterns for particular candidates based on fertility, but their religious and political beliefs do not appear to change regardless). I’ll be interested to see if anyone tries to replicate this with the 2016 election, and if so, what the conclusions are.

This concludes your trip down memory lane!
1. Gee, this is sounding familiar.
2. This point was really interesting. He pointed out that around elections, pollsters are pretty obsessive about tracking things, and short of a major scandal breaking, literally NOTHING causes a rapid 20 point swing. The idea that swings that large were happening regularly and everyone had missed them seemed implausible to him. Statistically, of course, the authors were only testing whether there was a difference at all, not how big it was... but the large effect should possibly have given them pause. It would be like finding that ovulation made women spend twice as much on buying a house. People don’t change THAT dramatically, and if you find that they do, you may want to rerun the numbers.
3. Okay, so I can’t be the only one noticing at this point that this means three different studies all recruited around 1000 American women not on birth control, not pregnant, not recently pregnant or breastfeeding but of child-bearing age, and interested in participating in a study on politics, all at the same time and all through Amazon’s Mechanical Turk. Has anyone asked the authors to check how much of their samples were actually the same women? Does Mechanical Turk have any barriers to this? Do we care? Oh! Yes, it turns out this is actually a bit of a problem.