More Sex, More Models, More Housework

Well hi! If you got here via Google, this is probably not the type of post you are looking for. This one has math, and the only pictures are graphs.  Sorry about that.

For everyone else, welcome to “From the Archives” where I revisit old posts  to see where the science (or my thinking) has gone since I put them up originally.

Back in 2013, a concerned reader sent me a headline that warned men about a terrible scourge depriving them of all that was good in life. Oh yes, I’m talking about housework.  The life advice started from the headline “Want to Have More Sex? Men, stop helping with chores.”  The article covered a study that had devised a mathematical model of a couple’s sexual frequency vs the number of chores they did.  I couldn’t resist, and ended up writing a post called “Sex, Models and Housework”. It’s still one of my most viewed posts, though probably not the most read.

A few things to know about the original study (found here):

  1. That headline was pretty misleading. The study never said that men who didn’t do chores had more sex, the study said that men who did more traditionally female chores had less sex. Men who did more traditionally male chores actually had more sex.
  2. Despite being released in 2013, the data the study used was from 1992. The people in the study had an average age of early to mid 40s at that time, so this is a study looking at Baby Boomers and their relationships in the early 90s. With shifting culture, this is important to keep in mind.
  3. The model extrapolated out to men who do 100% of the traditionally female housework. One of my core concerns was how many data points they had in that range, or whether they were extrapolating beyond the range of their data. Men reported doing an average of 25% of the “traditionally female chores” at baseline, with a standard deviation of .19.  It does not look likely they had many men in the 100% range, and those relationships may have had something else unusual going on.
  4. Given #3, you’ll excuse me if I doubt that this model really should have been perfectly linear:
    [Graph: the study’s linear model of sexual frequency vs share of traditionally female chores]
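To put a rough number on that concern: if men’s reported shares were approximately normally distributed with that mean and standard deviation (a real simplification, since shares are bounded between 0 and 1), almost nobody in the sample would land anywhere near the 100% end. A quick sketch:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# From the study's baseline numbers: men did an average of 25% of the
# "traditionally female" chores, with a standard deviation of 0.19.
mu, sigma = 0.25, 0.19

# Under a (rough) normality assumption, what fraction of men would
# report doing 90% or more of those chores?
tail = 1 - normal_cdf(0.90, mu, sigma)
print(f"Expected fraction above 90%: {tail:.5f}")
```

That works out to roughly 3 in 10,000 men, which is why fitting a straight line all the way out to 100% makes me nervous.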

Those were my original thoughts, and rereading the paper I wanted to add a few more:

  1. One point I can’t believe I didn’t mention the first time around is the inherent selection bias in this data. You had to be a married couple to be included in the data. So a hypothetical couple who had an uneven distribution of housework and divorced was not counted. To be perfectly fair, they did take a bit of a look at this. These respondents were surveyed in 1988 and then again in 1992-1994. They did look at those who were married in 1988 but divorced by 1992 to see if the chore distribution/sexual frequency was different. It wasn’t.  However, given the ages of the respondents (born in the 40s-60s) many of them could have actually already been divorced before 1988 rolled around1. Additionally, those who are going through a divorce or in an otherwise rocky marriage likely didn’t take part in the survey. We don’t know if those numbers would have changed things, but I think we have reason to suspect that those most bothered by chore arrangements would be more likely to divorce.
  2. The women in the study worked an average of 15 hours fewer per week than men at paid labor. The women in the study spent 18 more hours per week than men at household chores. It’s worth noting that an “average” man in this study doing half of the chores would have actually been doing more labor for the house than the “average” woman. It would have been interesting to see a total on “labor for household” to see what the effect of an even vs uneven total workload was. This matters for distinguishing whether it’s the “gender” of the chores, or a perceived unfairness in total workload, that drives the decrease in sex.
  3. Child care hours were not included anywhere for either partner.

Other than that, how has this research fared?

Well, as you can imagine, it caused a stir in academic circles. There was a New York Times Magazine cover story about it provocatively asking “Do More Equal Marriages Mean Less Sex?” based heavily on the study. Many people walked away concerned about the age of the data, and how applicable it was to  people over 20 years later.  Researchers from Georgia State University were able to (somewhat) replicate the study (pre-published copy) using data from 2006. A few things about that study:

  1. The study population was younger by about a decade and less wealthy than the original study population, and they had more sex overall.
  2. Cohabiting but not married couples were included, but couples without children were not.
  3. They tossed 10 respondents who said they had sex 50 times a month.
  4. This study ended up with three categories of couples: traditional, egalitarian, and counter-conventional. Of those:
    1. Egalitarian: Divided housework approximately evenly, with anywhere from a 35%-65% split. This group was 30% of the sample, and had the most sex and the highest satisfaction.
    2. Traditional: The woman did more than 65% of the housework. This was about 63% of the sample, and had slightly less sex and women had slightly less satisfaction than the egalitarian couples.
    3. Counter-conventional: The man did more than 65% of the housework. This was only 5% of the sample size, and did not work out well. These couples had a lower sexual frequency than either of the first two groups, and were less satisfied overall.
  5. I felt thoroughly vindicated by this line “No research, however, has considered the possibility that the observed effect of men’s shares of domestic labor on sexual frequency and satisfaction could be non-linear.”

So I was at least correct in my concerns. Presuming that this data holds, the line is likely fairly straight until it hits the extreme on one end, then plummets.  Interestingly, this study still didn’t compare total labor, and the women in this study worked 20 hours fewer at paid labor than the men, and about 15 hours more per week in housework. Again, child care was not included in the work totals. Since this group was younger, it’s likely at least some of that discrepancy is child care.

So where does this leave us?

Well, it looks like my concerns about assuming a linear model are valid, and that assuming relationships haven’t changed between Baby Boomers and Gen Xers is not a great idea. While some changes to marital setups can have a negative effect (say a wife working longer hours), they are frequently immediately offset by a positive effect (increased income). This paper here has some interesting examples of these sorts of trade-offs. I’m increasingly convinced that the details of the division of labor matter much less than sufficient and equally divided labor.

I would love to see a breakdown of just the couples on the “man doing all the housework” end. In the second study that was only 24 couples, and we don’t know if the arrangement was through conscious choice or because of circumstances such as unemployment. In fact, I think further research should ask people “how much does your current relationship reflect your expectations prior to the relationship?” That might catch some of the effect of cultural script changes better than just asking people what they are doing.

Regardless, I have to go do some dishes.

1. According to this, the median age at first marriage in 1975 was 21. If you got married in 1975, your chance of being divorced 13 years later was about 30%. That is not a negligible number of people.

What I’m Reading: March 2016

The Unbearable Asymmetry of Bullshit. Alas, we are outnumbered.

It won’t help with the asymmetry thing much, but I love this site. I plan on using it early and often.

Oh wait, here’s some more on bullshit and academic infighting, along with a proposal to call the study of bullshit “Taurascatics”. I’m in.

And one more thing about bullshit and rage….for anyone who is overwhelmed or perplexed by the current state of politics, I read this blog post once a month to keep myself grounded: The Toxoplasma of Rage.  It’s a great reminder that your ingroup is persecuting my ingroup, and that you really need to stop. My ingroup is far too busy enumerating the faults of your ingroup to have time to deal with this crap.

On a lighter note, did you know James Garfield came up with his own proof of the Pythagorean theorem during a discussion with Congress? I am wondering how many current members of Congress could actually define the Pythagorean theorem.

My book for the month (well, one of them) is Guesstimation: Solving the World’s Problems on the Back of a Cocktail Napkin. Basically it’s about how to estimate complicated problems. A little repetitious, but an interesting mental exercise book so far.

These are some interesting numbers on growing American commute times.  Apparently I spend 20.8 days a year commuting. I resent the “wasted life” part though. Between the train and the bus I get a lot of reading and thinking done. That’s pretty much what I would have done with that time if I had my druthers anyway.

This was an interesting piece about how to make science fairs better. I like the idea of a myth busters style fair. That could get fun.

There’s an interesting Vox piece about health/science journalism and how it’s a good way of losing friends. I liked the piece, but I think she left out the issue of policy recommendations. It’s one thing to talk about evidence for a problem, and it’s another thing to talk about policy recommendations. Very often we see people start with the former, end with the latter, then claim all criticism is because people “don’t like evidence”.  At work when this happens, we have one doctor who will immediately announce “you realize we just all wandered into an evidence-free zone, right?” I like him.  Anyway, describing a problem and prescribing solutions are two different things, and if you mix them up you are DEFINITELY going to lose some folks.

And speaking of evidence and policy, here’s an interesting one on weird statistical methodology in a nutrition paper.

Finally, here’s an interesting deep dive into social psychology’s replication problem, what it means, and how seriously we should take it.

Intro to Internet Science: A Postlude

All right, we did it! 10 topics, 10 weeks, and a whole slew of examples. I’ve had a lot of fun, gotten some great feedback, and had some very kind comments from some very lovely teachers. It’s also given me some good ideas for some ongoing posts.  In the talks I give I almost never have time to get into any actual math, but hey, what’s the point of having a blog if you can’t go on and on about the stuff you like? I’ll probably be calling that “crazy stats tricks” and at a minimum I’ll cover some of the topics I complained about in Part 7.  Any suggestions for that series, or feedback on this series, are welcome either in the comments or on the feedback page.

Now that I have that out of the way, let’s take a moment to reflect on what we’ve learned, eh?  Overall, there are four P’s:

Presentation
Pictures
Proof
People

We spent a little time on all four:

Presentation: How They Reel You In
In Parts 1 and 2, we learned how quickly the internet spreads completely false information, and to always make sure what you are quoting is actually real. We also learned that headlines are marketing tools, and to be wary of what they are selling.

Pictures: Trying to Distract You
In Parts 3 and 4, we added some visuals. Narrative pictures, or those that help illustrate the story, can set impressions that can be ridiculously hard to correct. It gets even worse when you add graphs. Even a little bit of technical information can make things look more credible than they deserve.

Proof: Using Facts to Deceive
In Parts 5, 6, and 7, we covered “the truths people use to lie with”. Here we covered information that is true, but used to give false impressions. We started with stories and anecdotes, which are often used to humanize and emphasize various points. Next we moved on to experts and balance, and how we need to be careful who we listen to and who we dismiss.  Finally I gave a woefully short and incomplete overview of some statistical tricks that get used a lot.

People: Our Own Worst Enemy
And now we come to the part where we have only ourselves to blame. First we took a look at how our own pre-existing beliefs color our views of facts and even impact our ability to do math. Next, we take a look at how our tendency to not be entirely honest can screw up surveys and research based on them.  Finally, we had a bit of a discussion about the limits of scientific understanding, research ethics and things we may never know.

And that’s a wrap!

Mr Uniform Distribution

This will likely not be of interest to anyone who hasn’t taken a probability theory class, but I’ve been a little obsessive about probability distributions and how they relate to each other. This is my way of dealing with that.

Meet Mr Uniform Distribution, the continuous one that is. A continuous uniform probability distribution is one that has a constant probability density at every point from “a” to “b”.  You can learn the technical piece here or the Wikipedia version here, but I’m mostly focused on what he would look like as a cartoon character.

[Cartoon: Mr Uniform Distribution]
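For the non-cartoon version, the math is about as simple as distributions get; a minimal sketch:

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]:
    a constant 1/(b - a) inside the interval, zero outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """CDF: ramps linearly from 0 at a up to 1 at b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# On [0, 4] every point has density 0.25, and half the mass lies below 2.
print(uniform_pdf(1.5, 0, 4))  # 0.25
print(uniform_cdf(2.0, 0, 4))  # 0.5
```

That flat line of constant density is the whole personality: no peaks, no tails, just a rectangle.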

Terrorist Timelines and Bar Graphs

A reader going by the name of “Sound Information” sent along the following graph from this Breitbart article, with this comment:

Just saw the following graph in a Breitbart article, and thought “wow! those increasing bar lengths really indicate increase” — except really they are just an artifact of earlier dates being closer to the y-axis than later ones.

It’s a good point. The bar lengths do, at first glance, appear to represent something in terms of magnitude. It’s only when you look closely that you realize their length is mostly about making the dates readable.  I was curious how this graph would look if I just took the absolute numbers for each year, so I did that and came up with this graph:

[Graph: number of plots/attacks per year, absolute counts]

Note: all I did was transcribe their data. They got it from this Heritage Foundation timeline, and I didn’t look to see what got counted or not. I did, however, take a look at discrepancies. I think I found 2 typos and 1 intentional addition to the Breitbart data:

  1. Breitbart lists a plot on June 3, 2008 that the Heritage Foundation doesn’t list and I couldn’t find (probably a typo).
  2. The Heritage Foundation has a plot listed on May 16, 2013 that Breitbart did not include (probably a typo).
  3. September 11th, 2012 is included on the Breitbart list but not the Heritage Foundation one. This is the date of the Benghazi attacks on the US diplomatic mission in Libya (almost certainly intentionally added).

So overall there does appear to be an increase in absolute number, at least of the plots and events we know about or have record of.  This is one of those strange areas where we never quite know how big the sample size was. Some plots (especially single person events) likely fizzle with no one knowing, and more massive plots might be kept from us by FBI/CIA/etc for ongoing investigation reasons.

The other thing missing from both graphs, of course, is the magnitude of any of these attacks. 2015 had 15 plots or attacks overall, but 9 of those involved just one person, and 5 involved 2 people. It’s hard to know if it’s more accurate to show the number of events, the magnitude of events, or both. It feels strange to look at 9/11/01 and say “that’s one”, but there also is some value in seeing trends of smaller events.

Regardless of how you do the numbers, I think we all hope 2016 is a record low in every way possible.

People: Our Own Worst Enemy (Part 10)

Note: This is part 10 in a series for high school students about reading and interpreting science on the internet. Read the intro and get the index here, or go back to Part 9 here.

Wow folks….10 weeks later we are coming to the end.   This is a shorter one than the rest of the series, but I think it’s still important. Up until now I’ve been referencing science as though it could always provide the guidance we need if we just know where to look. Unfortunately that’s not always true. It’s at this point that I like to step back and get a little bit reflective about evidence and science in general, and how we acknowledge what we may never know. That’s why I call this section:

Acknowledging our Limitations

Okay, so what’s the problem here?

The problem is that just like research and evidence can be manipulated, so can lack of research and evidence. The reality is that there are practical, financial, moral and ethical issues facing all researchers, and there are limits on both what we know at the moment and what we can ever know. A lack of evidence doesn’t always mean someone’s hiding something. Unfortunately, none of this stops people from claiming it does. This normally comes up when someone is explaining why their opponent’s evidence doesn’t count.

What kinds of things should we be looking out for? 

Mostly calls for more research. It’s tricky business because sometimes this is a perfectly reasonable claim, but sometimes it’s not. Sometimes it’s just a smokescreen for an agenda.

For example, in 2012 two doctors from the CDC were called in front of Congress to discuss vaccine safety. As part of the hearing, Congressman Bill Posey asked the doctors if they had done a study on autism in vaccinated vs unvaccinated children. You can read the whole exchange here, but the answer to the question was no. Why? Well,  a double-blind placebo controlled trial of vaccines would be unethical to do. For non-fatal diseases you can sometimes do them, but you can’t actually knowingly put people in the way of harm no matter how much you need or want the data. To give a placebo (i.e. fake) measles vaccine to a child  just to see if they get sick and die or not would be unethical. The NIH requires studies to actually have a “fair risk-benefit ratio”, so there either has to be low risk or high benefit. I work in oncology and have actually seen trials closed to enrollment immediately because data suggested a new treatment might have more side effects than we suspected.

A Congressman looking into vaccine safety should know this, but to anyone listening it might have sounded like a reasonable question. Why aren’t we doing the gold standard research? What are they hiding?

Other examples of this include asking for impossible evidence, such as “prove to me my treatment DOESN’T work“.

Why do we fall for this stuff?

Well, mostly because many of us never considered it. If you’re not working in research, it can be hard to notice when someone’s asking for something that would never get past the IRB. Even if something would be ethical, it’s hard to realize how tricky some studies would be. In something like nutrition science this is rampant.  I mean, how much money would it take for you to change your diet for the next 30 years so scientists could study you?

I took a “Statistics in Clinical Trials” class a few years ago, and I was surprised that nearly half of it was really an ethics class. Every two years I (and everyone else at my institute) also have to take 8 hours of training in human subject research, just to make sure I stay clear on the guidelines. It’s not easy stuff, but you have to remember the data can’t always come first.

So what can we do about it?

Well first, recognize these limitations exist. We can and should always be refining our research, but we have to respect limits. Read about famous cases where this has gone wrong, if you’ve got the stomach for it. The Tuskegee Syphilis Experiments  and the Doctor’s Trial that resulted in the Nuremberg Code are two of the most famous examples of this, but there are others.  The more you know, the more you’ll be prepared for this one when you see it.

 

All right, that wraps up part 10! I think I’m going to cut this off here and do my wrap up next week. See you then!

A is for Alternative Hypothesis

A few weeks ago, I hung out with two lovely people who were self-professed language lovers who didn’t really like math. Over a wonderful spread of french fries and beer, I tried to get them a little more well versed in why math and statistics were so appealing to me.  After a few more beers and a wonderful reading of Introductory Calculus For Infants, I had an idea: wouldn’t it be fun to put together a list of statistics words for logophiles? Since my urge to systematize pervades all aspects of my life, I figured I’d start with the alphabet. More specifically, the letter A. Obviously I’m cheating a bit here as this is technically a phrase, but bear with me.

[Image: the letter A]

Hypothesis testing in general and the Alternative Hypothesis in particular are beautiful things. Learn more about them here.

How Do They Call Elections so Early?

I live in Massachusetts now, but for the first 18 or so years of my life I lived in New Hampshire. I still have most of my family and many friends there, so every 4 years around primary time my Facebook feed turns into a front row seat for the “first in the nation primary” show1.  This year the primary was on Tuesday February 9th, and it promised to be an interesting time as both parties have unexpected races going on. I was interested in the results of the primary, but since I tend to go to bed early, was unsure I’d stay up late enough to see it through. Thus like many others, I was completely surprised to see CNN had called the race around 8:30 for Trump and Sanders with only 8% of the votes counted. By 8:45 I had a message in my inbox from a NH family member/Sanders supporter saying “okay, how’d they do that????”.

It’s a great question and one I was interested to learn more about. It turns out most networks keep their exact strategies secret, but I figured I’d take a look at the most likely general approach. I start with some background math stuff, but I include pictures!

Okay, first things first, what information do we need?

Whenever you’re doing any sort of polling (including voting), there are a couple things you need to think through.  These are:

  1. What your population size is
  2. How confident you want to be in your guess (confidence level)
  3. How close you want your guess to be to reality  (margin of error)
  4. If you have any idea what the real value is
  5. Sampling bias risk

#1 is pretty easy here. About 250,000 voters voted in the Democratic primary, and 280,000 voted in the Republican primary. This doesn’t matter much when it’s this large.

#2 Confidence is up to the individual network, but they’re almost universally pretty conservative. They’re skittish here because every journalist to ever pick up a pen has seen this image and lives in fear of it:

If you’re missing the reference Wikipedia’s got your back, but suffice it to say networks live in fear of a missed call.

#3 is how close you want to be to reality. We’ll come back to this, but basically it’s how much you need your answer to look like the real answer. When polls say “the margin of error is +/- 3 percentage points”, this is what they’re saying.  If you look at this diagram:

Margin of error is basically how close those x’s need to be to the target; confidence level (#2) is how close you need them to be to each other.

#4 is whether or not you’re working from scratch or you have a guess. Basically, do you know ahead of time what percent of people might be voting for a candidate or are you going in blind?

#5 is all the other messy stuff that has nothing to do with math.

Okay, so what do we do with this?

Well factors 1-4 all end up in this equation:

n = z^2 * p * (1 - p) / MOE^2

where n is the number of votes you need to count, z is the z-score for your confidence level (#2), p is your best guess at the candidate’s share (#4), and MOE is the margin of error (#3).

So basically what that’s saying is that the more confident and precise you need to be, the more people you need to poll. Additionally, the larger the gap between your “percent saying yes” and “percent saying something else”, the fewer people you need before you can make a call. A landslide result may be bad for your candidate, but great for predictions.

Okay, thanks for the math lesson. Now what?

Now things get dirty. What I showed you above is basically how we’d do an estimate for each of the candidates, putting in their prior polling numbers for p one at a time. What about the other numbers though? We know we have to set our confidence high so we’re not embarrassed, but what about our margin of error?  Well here’s where all those phone calls you get prior to the election help.

Going in to voting day, the pollsters had Trump in the lead at 31%, with his next closest rival at 14%. This 17 point lead means we can set our margin of error pretty wide. After all, CNN doesn’t have to know what percent of the vote Trump got as much as it needs to know that someone is really unlikely to beat him. If you split it down the middle, you get a margin of error of 8. Their count could be off by that much and still only lower Trump to 23% of the vote and raise his opponent to 22%. However, that assumes all of his error would go to his closest opponent. With so many others in the race that’s unlikely to happen, so they could probably go with +/- 10.

For the Democrats, I found the prior polls showed Sanders leading 54% to Hillary’s 41%. Splitting that difference you could go about +/- 6.

In a perfect world this means we’d need about 160 random votes to predict Trump’s win and about 460 to predict Sanders’ win at the 99% confidence level.
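Those counts can be roughly reproduced with the standard sample size formula for a proportion. This is a sketch of my own arithmetic, not the networks’ actual method; it assumes z ≈ 2.576 for 99% confidence and the conservative worst case p = 0.5:

```python
from math import ceil

def required_sample(z, p, moe):
    """Sample size needed to estimate a proportion:
    n = z^2 * p * (1 - p) / moe^2, rounded up."""
    return ceil(z**2 * p * (1 - p) / moe**2)

z99 = 2.576  # z-score for 99% confidence
p = 0.5      # worst-case proportion (maximizes p * (1 - p))

print(required_sample(z99, p, 0.10))  # Trump lead, +/- 10 points: 166 votes
print(required_sample(z99, p, 0.06))  # Sanders lead, +/- 6 points: 461 votes
```

Using p = 0.5 rather than the candidates’ polled shares is the cautious choice, since it gives the largest required sample.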

Whoa that’s it? Why’d they wait so long then?

Well, remember #5 up there? That’s the killer. All those pretty equations I just showed you only work if you get a random sample, and that’s really hard to come by in a situation like this. Even in a small state like New Hampshire you will have geographic differences in the types of candidates people like.  This post from smartblogs has a map showing some of the differences:

So as precincts report, we know there’s likely some bias to those numbers. If the 8% of the votes you’ve counted are from throughout the state, you have a lot more information than if those 8% are just from Manchester or Nashua. Because of this most networks have eschewed strict stats limits like that one I did above in favor of slightly messier rules.

So why’d you tell us all that other stuff?

Because frequentist probability theory is great and you should know more about it. Also, those are still the steps that underlie everything else the networks do. As we discussed above, the size of the leads made the initial/perfect world required number quite small.  To highlight this, watch what would happen to that base number of votes needed as we close the margin of error:

[Graph: required sample size as the margin of error shrinks]

Any lead closer than about +/- 4 (or about an 8 point difference) gets increasingly difficult to call. If you’re over that though, you can act a little faster. In this case, both leads were bigger than that from the get go.
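To see that blow-up in numbers, the same proportion formula can be swept over tighter and tighter margins (again assuming, as above, 99% confidence and worst-case p = 0.5):

```python
from math import ceil

def required_sample(z, p, moe):
    # Sample size for a proportion: n = z^2 * p * (1 - p) / moe^2
    return ceil(z**2 * p * (1 - p) / moe**2)

# Votes needed at 99% confidence as the margin of error tightens:
for moe in (0.10, 0.08, 0.06, 0.04, 0.02, 0.01):
    n = required_sample(2.576, 0.5, moe)
    print(f"+/- {moe:.0%}: {n:>6} votes")
```

Halving the margin of error quadruples the required count, which is why close races take so much longer to call than blowouts.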

To hedge their bets against bias, the networks likely produce some models of the state based on past elections, polling, exit polls and demographic shifts, call the election the day before, then spend election night validating their models/predictions. Bayesian inference would come in handy here, as the networks could rapidly update their guesses with new information. So they’re not really calculating “what is the probability that Trump is winning” they’re calculating “given that the polls said Trump was winning, what are the chances he is also winning now”.  That sounds like semantics, but it can actually make a huge difference. If they saw anything unusual happening or any conflicting information, they could delay (justifying a few veteran election watchers hanging out to pick up on this stuff), but in this case all their information sources were agreeing.
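One simple way to picture that kind of updating is a beta-binomial sketch. The “prior strength” and early-return counts below are made up for illustration, and the networks’ real models are surely far fancier:

```python
# Prior from pre-election polls: Trump at 31%, encoded as Beta(31, 69),
# i.e. as if we had seen 100 poll responses (a hypothetical "strength").
a, b = 31, 69

# Hypothetical early returns: 35 of the first 100 counted votes for Trump.
trump_votes, other_votes = 35, 65

# Conjugate beta-binomial update: just add the observed counts.
a_post, b_post = a + trump_votes, b + other_votes
posterior_mean = a_post / (a_post + b_post)
print(f"Posterior estimate of Trump's share: {posterior_mean:.3f}")
```

The appeal is exactly what the post describes: each new precinct’s counts fold into the running estimate instantly, instead of starting the calculation over from scratch.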

As the night went on, it became apparent that Trump and Sanders were actually outperforming the pre-election polls, so this probably increased the network’s confidence rapidly. In pre-election polls, the most worrying thing is non-response bias. You get concerned that those answering the polls are not the same as those who are going to vote. Voting results eliminate this bias….in a democracy we only count the opinions of those who show up at the polls. So if you get two different types of samples with different error sources saying the same things, you increase your confidence.

Overall, I don’t totally know all the particulars about how the networks do it, but they almost certainly use some of the methods above in addition to some gut reactions. With today’s computing power, they could be individually computing probabilities for every precinct or have very advanced models to predict which areas were most likely to go rogue. It’s worth noting that the second-place finishers Clinton and Kasich won very few individual districts, so this strategy would have produced results quickly as well.

So there you have it. The more accurate the prior polling, the greater the gap between candidates, the more regions reporting at least some of their votes, and the less inter-region variability, the faster the call. An hour and a half after the polls close seems speedy until you consider that statistically they probably could have called it accurately after the first 1% came in. No matter how mathematically backed, however, that definitely would have gotten them the same level of love that my over-zealous-in-class-question-answering habits got me in middle school. They had to be quick, but not too quick. My guess is that last half hour was more a debate over the respectability of calling so soon rather than the math. Life’s annoying like that sometimes.

Got a stats question? Send it in here!

Updated to add: Based on a Facebook conversation about this post, I thought I should add that if the race is REALLY close, the margin of error with the vote counting itself starts to come into play. Typically things like absentee ballots aren’t even counted if it won’t make a difference, but in very close races when every ballot matters, which ballots are valid becomes a big deal. The weirdest example of this I know of is the Al Franken/Minnesota senate seat election from 2008. It took 8 months to resolve which votes were valid and get someone sworn in.

1. This is the quadrennial tradition where New Hampshire acts like a hot girl in a bar who totally hates the fact that she’s getting so much attention yet never seems to want to leave.

SCOTUS Nomination Timing

After yesterday’s news of Antonin Scalia’s death, the conversation almost immediately turned to whether or not President Obama should or would nominate a new candidate.  There’s obviously a lot being said about this right now by better legal and political minds than mine, but I did start wondering what kind of timing there normally was between Supreme Court nominations and Presidential Elections.  Thanks to Wikipedia, I was able to find a list of all 160 Supreme Court nominations that have occurred since 1789. I combined this with a list of election dates, and calculated the difference between the day the person was submitted to the Senate and the next presidential election.  I graphed days vs election year, and color coded the dots with the outcome of the nomination.

A few notes:

  1. I didn’t fully vet the Wikipedia data. If there’s an error in that data, it’s in this chart.
  2. All day calculations for years prior to the 1848 election are approximate. Prior to that, states had a 34 day window prior to the first Wednesday in December to hold their election. I gave them a default date of November 3rd for their year, which could be off in some cases.
  3. There were a few cases in which presidents attempted to nominate someone after the election but before the next inauguration. If they got re-elected, I counted that nomination from the election that would take place 4 years later. If they were leaving office, I gave them a negative number.
  4. 310 days is approximately the number of days between January 1st of a year and the general election, so I put a reference line there.
  5. These nominations include Chief Justice nominations… and those nominees may have been sitting justices at the time they were nominated.
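For anyone who wants to replicate the core calculation, it’s straightforward with Python’s datetime module. This is just a sketch: the March 2016 submission date below is illustrative, the `next_presidential_election` helper is mine, and the Election Day rule it encodes (first Tuesday after the first Monday in November) only holds from 1848 on, per note 2 above.

```python
from datetime import date, timedelta

def next_presidential_election(d: date) -> date:
    """Return the first presidential Election Day on or after date d.

    Uses the post-1848 rule: the first Tuesday after the first Monday
    in November of a year divisible by 4.
    """
    year = d.year
    while True:
        if year % 4 == 0:
            nov1 = date(year, 11, 1)
            # Days forward from Nov 1 to the first Monday, then one more to Tuesday.
            first_monday = nov1 + timedelta(days=(7 - nov1.weekday()) % 7)
            election = first_monday + timedelta(days=1)
            if election >= d:
                return election
        year += 1

# Example: a nomination submitted to the Senate on March 16th of an election year.
submitted = date(2016, 3, 16)
election = next_presidential_election(submitted)
print((election - submitted).days)  # days remaining before the election
```

Nominations made by a lame-duck president after the election (note 3 above) would need the negative-number adjustment applied by hand; this sketch only finds the next election going forward.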

With that out of the way, here you go:

Days to election

Rutherford B. Hayes sets the record for getting things in under the wire, as he nominated William Burnham Woods in late December of 1880. He also nominated Stanley Matthews in January of 1881, but that nomination never went to a vote. Matthews was renominated by Garfield and confirmed a few months later.

Overall, only about 15% of nominations have ever come this close to an election, and the success rate of those nominations is a little under half. For comparison, nominees submitted before January 1st of an election year have about an 80% all-time success rate. Obviously we haven’t dealt with this in a while, but it’s interesting to see that historically it was more common than in recent years.

This could get interesting kids!

People: Our Own Worst Enemies (Part 9)

Note: This is part 9 in a series for high school students about reading and interpreting science on the internet. Read the intro and get the index here, or go back to Part 8 here.

Okay, we’re in the home stretch here! In part 8 I talked about how we as individuals work to confuse ourselves when we read and interpret data. Today I’m going to talk about how we as a society collectively work to undermine our own understanding of science, one little step at a time.  Oh that’s right, we’re talking about:

Surveys and Self Reporting

Okay, so what’s the problem here?

The problem is that people are weird. Not any individual really (ed note: this is false, some people really are weird), but collectively we have some issues that add up. Nowhere is this more evident than on surveys. There is something about those things that brings out the worst in us.  For example, in this paper from 2013, researchers found that 59% of men and 67% of women in the National Health and Nutrition Examination Survey (NHANES) database had reported calorie intakes that were “physiologically implausible” and “incompatible with life”.  The NHANES database has been widely used in nutrition research for about 40 years, and these findings have caused some to call for an end to self-reporting in nutrition research.  Now, I doubt any individual intended to mislead, but as a group those effects add up.

Nutrition isn’t the only field with a problem though. Any field that studies something where people think they can make themselves look better has an issue. For example, the Bureau of Labor Statistics found that most people exaggerate how many hours they work per week. People who say they work 40 hours normally only work 37. People who say they work 75 hours a week typically work about 50. One or two people exaggerating doesn’t make a difference, but when it’s a whole lot of people it adds up.

So what kinds of things should we be looking out for?

Well, any time something says it’s based on a survey, you may want to get the particulars. Before we even get to the reporting bias I mentioned above, we also have to contend with questions that are asked one way and reported another.  For example, back in 2012 I wrote about an article that claimed “1/3rd of women resent that their husbands don’t make more money”. When you read the original question, it asked if they “sometimes” resent that their husband doesn’t make more money.  It’s a one-word difference, but it changes the whole tone of the question.  Every time you see a headline about what “people think”, be a little skeptical.  Especially if it looks like this:

lizardpeople

That one’s from a survey about conspiracy theories, and they got that 12 million number by extrapolating the 4% of respondents who said they believed in lizard people out to the entire US population.  In the actual survey, this represented 50 people.  Do you think it’s more plausible that the pollsters found 50 people who believed in lizard people, or 50 people who thought this was an amusing thing to say yes to?
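To see how that extrapolation works, here’s the arithmetic as a quick Python sketch. The 50 believers come from the survey as described above; the sample size and US population figure are my own approximations, not numbers taken from the poll itself:

```python
# How a "12 million Americans believe in lizard people" headline gets made.
believers = 50               # respondents who said yes (from the survey)
respondents = 1_247          # assumed total sample size, chosen so believers ~ 4%
us_population = 315_000_000  # approximate US population at the time

share = believers / respondents           # ~4% of the sample
headline_number = share * us_population   # ~12.6 million once extrapolated

print(f"{share:.1%} of the sample -> about {headline_number / 1e6:.1f} million Americans")
```

The point is how small the underlying count is: shift a few dozen joking respondents and the headline number swings by millions.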

But people who troll polls aren’t the only problem; polling companies play this game too, asking questions designed to grab a headline. For example, recently a poll found that 10% of college graduates believe a woman named Judith Sheindlin sits on the Supreme Court.  College graduates were given a list of names and told to pick the one who was a current Supreme Court justice.  So what’s the big deal, other than a wrong answer? Well, apparently Judith Sheindlin is the real-life name of “Judge Judy”, a TV show judge. News outlets had a field day with the “college grads think Judge Judy is on the Supreme Court” headlines. However, the original question never used the phrase “Judge Judy”, only the nearly unrecognizable name “Judith Sheindlin”. The Washington Post thankfully called this out, but the headlines had already run. Putting a little-known celebrity name in your question and then writing a headline with the well-known name is beyond obnoxious. It’s a question designed to make people look dumb and make everyone reading feel superior. I mean, quick, who is Caryn Elaine Johnson? Thomas Mapother IV? People taking a quiz will often guess things that sound vaguely right or familiar, and I wouldn’t read too much into it.

Why do we fall for this stuff?

This one I fully blame on the people reporting things without proper context. This is one area where journalists really don’t seem to be able to help themselves. They want the splashy headline, methodology or accuracy be damned. They’re playing to one of our worst tendencies and desires… the desire to feel better about ourselves. I mean, it’s really just a basic ego boost. If you know that Judge Judy isn’t on the Supreme Court, then you must clearly be smarter than all those people who didn’t, right?

So what can we do about it?

The easiest thing to do is not to trust the journalists. Don’t let someone else tell you what people said; try to find the question itself.  Good surveys will always provide the actual questions they asked.  Remember that tiny word shifts can change answers enormously.  Words like “sometimes”, “maybe” and “occasionally” can be used up front, then dropped when the results are reported. Even more innocuous word choices can make a difference. For example, in 2010 CBS found that asking whether “gays and lesbians” should be able to serve in the military instead of “homosexuals” caused quite the change in people’s opinions:

gaysinmilitary

So watch the questions, watch the wording, watch out for people lying, and watch out for the reporting.  Basically, paranoia is just good sense when lizard people really are out to get you.

See you in Week 10! Read Part 10 here.