Friday links for fun – 4.13.12

This will be completely lost on you if you’re not a Hunger Games fan, but the stats work/extrapolation is pretty damn impressive.

Professionally, I found this interesting….I can only get you the numbers, ma’am, I can’t make you use them wisely.

I haven’t talked much about small sample sizes, but this blog does.

These guys are my new heroes.  They noticed a statistical error that kept popping up in neuro research, and then went back and figured out how often people were getting it wrong….half of the studies that could have got it wrong did.  It’s a stat geeky read, but here’s the story.

Age Bias and Polling Methods

A few years ago, in one of my research methods classes in grad school, a professor asked us to raise our hands if we had a cell phone.  

Everyone raised their hands.  
Then he asked people to keep their hands up if they had a land line as well.  
Many hands went down.  
For those left, he asked how many answered it regularly or had caller ID and screened calls.  
Pretty much everyone.
This of course then led into a discussion of political polling and how many of us had ever considered who was actually answering these questions.  It was an interesting discussion, as pretty much the entire class admitted they would have self-excluded.  The Pew Research Center suggests this was not an anomaly, and that this is actually a problem that’s becoming more acute in political polling.  
While many large national polling organizations have started calling cell phones as well, on the state level this is not often corrected for.  This can, and has, resulted in some inaccurate polls, as the sample of people home, with a landline, willing to answer a pollster’s call, does not always reflect the general population.  Actually, I think there’s good reason to question the representativeness of a sample willing to answer their phone for an unknown number, but that could be disputed (those interested enough to pick up the phone also might be more likely to actually go vote).  
Anyway, none of this is new.  What is new this (presidential) election cycle is that news organizations are now starting to put up stats on Twitter and Facebook status updates.  I decided to take a look and see exactly how skewed these stats are, and found that Twitter is most popular in the 18-29 demographic.  Of course, this is the least likely demographic to actually vote.  Interestingly, the poll on Twitter usage did not include people under 18, but under-18 users are not excluded when the trends are compiled.  
So two different ways of tracking elections, two different sets of flaws.  Pick your poison.
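
To put rough numbers on the landline problem, here is a minimal Python sketch.  Every figure in it (each age group's share of voters, how often each group answers a landline, and their support for a made-up Candidate A) is an assumption I invented for illustration, not data from Pew or any real poll.  It compares the true level of support with what a landline-only poll would report.

```python
# Hypothetical numbers for illustration only; none come from a real poll.
# Each group: (share of voters, chance a landline call is answered, support for Candidate A)
GROUPS = {
    "18-29": (0.20, 0.05, 0.60),
    "30-49": (0.35, 0.15, 0.50),
    "50-64": (0.25, 0.30, 0.45),
    "65+": (0.20, 0.50, 0.40),
}


def true_support(groups):
    """Support for Candidate A across the whole voting population."""
    return sum(share * support for share, _, support in groups.values())


def landline_poll_support(groups):
    """Support among only the people a landline poll actually reaches."""
    reached = sum(share * answer for share, answer, _ in groups.values())
    supporters = sum(
        share * answer * support for share, answer, support in groups.values()
    )
    return supporters / reached


if __name__ == "__main__":
    print(f"True support:       {true_support(GROUPS):.3f}")
    print(f"Landline-only poll: {landline_poll_support(GROUPS):.3f}")
```

With these made-up inputs the landline poll understates Candidate A's support, because the groups most likely to pick up are also the least supportive.  Change the assumed rates and the bias can shrink, vanish, or flip.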

There’s bad data, and then there’s data that’s just plain mean….

I’ve worked at teaching hospitals for pretty much my whole post-college career, so I generally heave a bit of a sigh when I hear the initials “IRB”.  IRBs (Institutional Review Boards) are set up to protect patients and approve research, but they also have the power to reject proposed studies and cause lots of paperwork.  Sometimes though, you need a good reminder of why they were invented.

Apparently, some scientists in the 1940s tried to develop a pain scale based on burning people and rating the pain.  Then, to make sure they had a good control, they burned pregnant women while they were between contractions.

While it actually wasn’t a half bad way of figuring out what their numerical scale should look like, that is just WRONG.  As a pregnant woman, I can pretty confidently say that anyone coming at me with a flat iron during labor will be kicked.  Hard.

Unethical gathering of data is not only not worth it, but also frequently wasted.  In the study mentioned above, the data proved useless, as pain is too subjective to be really quantified.  After this fiasco, it wouldn’t be until 2010 that someone came up with a really workable pain scale.

You can’t misquote a misquote

Yesterday I talked about sensational statistics and the need to always verify that there are no missing adjectives that would change the statistic.  It was thus a bit serendipitous that today I happened to hear a debate about a misquoted statistic, and whether the quote or the misquote was more accurate.  It was on a podcast I listen to, and it was about a month old (sometimes I don’t keep up well).

It was happening around the time the contraception debate was at its most furious (see what I did there?  It was a federally mandated coverage of contraception debate, to give you all the adjectives).  Anyway, at the time the statistic about the prevalence of birth control usage among Catholic women was getting tossed around quite a bit.  The statistic, in its most detailed form, is this:  98% of self-identified Catholic women of childbearing age who are sexually active have used a contraceptive method other than natural family planning at some point in their lives.

Now, this stat rarely got quoted in its entirety.  First, I always think designating that the religion is self-identified is important.  The women answering this survey didn’t have to clarify if they thought they were good Catholics, just Catholic.  Second, the “sexually active” got glossed over as well, despite the fact that it probably cuts down the numbers at least a bit (for young adult Catholics, to approximately 89% of respondents).  Third, “at some point”.  The study’s authors have justified this qualifier by arguing that if a woman is on birth control for years, then decides to start trying to have children and goes off of it, she would otherwise have been excluded.  Critics have argued that this strategy was designed to include women who may have tried it, decided it was wrong, and stopped.  Both have a point.

That being said, I most often heard this being quoted as “98% of Catholic women use birth control” or sometimes even “98% of Catholics use birth control”.  
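
To show how much work those qualifiers do, here is a minimal Python sketch.  The six respondent records and their field names are hypothetical, invented purely for illustration; they are not the study's data.  The helper computes the share with some outcome among only those respondents who meet every stated condition, so you can watch the number move as qualifiers are dropped or swapped.

```python
# Hypothetical survey records, invented purely for illustration.
RESPONDENTS = [
    {"catholic": True, "sexually_active": True, "ever_used": True, "uses_now": True},
    {"catholic": True, "sexually_active": True, "ever_used": True, "uses_now": False},
    {"catholic": True, "sexually_active": False, "ever_used": False, "uses_now": False},
    {"catholic": True, "sexually_active": True, "ever_used": True, "uses_now": True},
    {"catholic": False, "sexually_active": True, "ever_used": True, "uses_now": True},
    {"catholic": True, "sexually_active": False, "ever_used": False, "uses_now": False},
]


def rate(records, outcome, *conditions):
    """Share of records where `outcome` is True, among those meeting every condition."""
    eligible = [r for r in records if all(r[c] for c in conditions)]
    if not eligible:
        raise ValueError("no respondents match the conditions")
    return sum(r[outcome] for r in eligible) / len(eligible)


if __name__ == "__main__":
    # Full set of qualifiers: ever used, among sexually active Catholic respondents.
    print(rate(RESPONDENTS, "ever_used", "catholic", "sexually_active"))
    # Drop "sexually active" from the denominator.
    print(rate(RESPONDENTS, "ever_used", "catholic"))
    # Swap "ever used" for "uses now", which is what the short misquote implies.
    print(rate(RESPONDENTS, "uses_now", "catholic"))
```

On these invented records, “ever used, among sexually active Catholic respondents” comes out at 100%, while “uses now, among all Catholic respondents” is 40%.  Same survey, very different headline.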

It was that last phrase that got the debate going on the show I was listening to.  Person 1 argued that it annoyed him that people kept dropping the “women” part of the quote.  Person 2 shot back that it actually drove him nuts that people felt the need to add it.  He argued that for every straight female using contraception, there was by definition a straight man using it.  Unless one presumed a statistically significant number of women were misleading their partners, 98% of Catholic men were also using birth control (of course, even if they were being misled, they were actually still using it…just not knowingly).  Since according to Catholic doctrine the teaching on contraception applies to both genders, both parties are therefore guilty.

I liked the debate, and would be totally fascinated to hear the numbers on men who have used (or had a partner who used) contraception.  I am curious if a significant number don’t know, or would claim not to know.  I still think that clarifying “women” in the quote is fine, as it’s who the study was actually done on.  In my mind extrapolation should always be classified as extrapolation, not an actual finding.

Also of note, this was an in-person survey.  It’s always useful to remember that every respondent in a survey like this had to verbalize their answers to another person….important when the topic is anything highly subject to social pressures.  For a further breakdown of issues with that study, see here.

Beware the Adjective

My tax return showed up in my bank account this weekend, which is always nice (even if it was my money to begin with).  It brought to mind a few months back when people were big on the “50% of American households don’t pay any federal income tax” statistic.

Now, that was an interesting statistic, and one that no doubt caused a lot of emotion.  I mean, heck, this is my percent breakdown of taxes paid for 2011 (excluding sales-linked taxes…that retrospective would have taken all week):

Edit: My labels got a little hinky, so assume federal tax = federal income tax and state tax = state income tax.  So yes, life would have been a great deal cheaper if I could have avoided federal income tax.

Anyway, I was thinking about this when I stumbled across this chart:

It came along with this post explaining that many of the households not paying taxes were actually older workers.  Interesting, but economic data is so easily manipulated it doesn’t normally catch my attention (example: nowhere on this graph does it indicate how large each population slice is…I’m sure there are far fewer people represented at the end of the graph than at the middle).
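
To see why the size of each slice matters, here is a minimal Python sketch.  The three brackets, their household counts, and their rates are all hypothetical numbers I made up; they are not read off the chart.  It compares a naive average of the per-bracket rates with the rate you get once each bracket is weighted by how many households are in it.

```python
# Hypothetical brackets: (households in bracket, share paying no federal income tax).
# These figures are invented for illustration and are not taken from the chart.
BRACKETS = {
    "young": (30_000_000, 0.50),
    "middle": (60_000_000, 0.30),
    "older": (10_000_000, 0.80),
}


def unweighted_rate(brackets):
    """Naive average of the bracket rates, ignoring how big each bracket is."""
    rates = [rate for _, rate in brackets.values()]
    return sum(rates) / len(rates)


def weighted_rate(brackets):
    """Overall rate, with each bracket weighted by its number of households."""
    total = sum(count for count, _ in brackets.values())
    return sum(count * rate for count, rate in brackets.values()) / total


if __name__ == "__main__":
    print(f"Unweighted: {unweighted_rate(BRACKETS):.3f}")
    print(f"Weighted:   {weighted_rate(BRACKETS):.3f}")
```

With these made-up inputs the naive average is about 53%, but the weighted rate is 41%, because the bracket with the highest rate is also the smallest.  A chart that hides the slice sizes invites the first reading.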

Anyway, what this jogged my memory about was how this statistic got quoted by many at the time.  Rick Warren was one of the more notable examples, but many people made the mistake of stating “half of all Americans pay no taxes”.  The “Federal Income” part of that phrase makes a huge difference.

I’m certainly not saying that everyone who misquotes a stat does so intentionally.  Many times it’s innocent, and thus it’s something to keep in mind when you hear a crazy statistic from anything but the source.  Politicians and other public speakers do just flat out miss words sometimes.  There are some pretty horrifying stats out there that become much more reasonable when the correct modifiers are put back in their place.  

Friday links for fun – 4.6.12

Two fun articles taking on bad data:

This one covers everything I will probably ever say in this blog, but with less pizzazz.

This one is trying to stop bad data before it starts.  Don’t try to make things into a scientific experiment if you have to fudge around things to do it.  Just call it a model.  I like that.

That’s some bad data, bad to the bone

Not the most useful data on the planet, but fun nevertheless….especially if you are a data geek married to a metal head.  Not that I’d know anything about that.

Heavy metal bands per capita for every country except Bhutan:

In case you’re curious, here’s an article explaining more, including the actual numbers used.

Thanks to some research carried out by me and my wonderful husband, we discovered that Bhutan now has 1 metal band that was formed in 2008.  Their name is Metal Visage.  Here’s a review.  Oh, and if you’re super curious, here’s a video.  I have no idea if they’re good or offensive or what, as my dog started barking as soon as I hit play, but my husband assures me they are better than Ugra Karma (one of Nepal’s 12 metal bands).

Anyway, not much to criticize here, as sadly this is probably more accurate than most of the studies I write about.  I did find it amusing that I saw a comment about this where someone was greatly disturbed that the CIA World Factbook was cited as a source.  I considered politely explaining to them that that was probably where the population numbers came from, not the metal band numbers, but I decided not to.  Read ALL the sources, folks, thank you.

Opinions, everybody’s got one

I was listening to a management podcast recently where a man named John Blackwell was being interviewed.  He was talking about how he was constantly reading things about how the whole workplace was changing, but he was getting curious as to why he felt like the companies he worked with weren’t reflecting this.  When he tried to investigate, he found out that the ongoing surveys commonly used in British management journals (can’t find a link) were being done on the “up and coming business leaders”.  When he looked into what that meant, he realized it was people who were second year MBA students.

The problem with this, of course, was that this was asking people not in the workforce what the workforce was going to look like 10 years from now.  They found, not surprisingly, that young people in grad school tend to be very optimistic about things like “working from home” or “flex time” when they’re in school, but when they got into business, they toed the line.  Thus, every survey done was essentially useless.  
This all reminded me of a conversation I got into several years ago when I was working the overnight shift.  Someone had brought in a magazine (People or Vogue or something like that) and they had a ranking of the 100 most beautiful women in Hollywood.  Drew Barrymore was number one that year, and one of my (young, male) coworkers was actively scoffing at that.  “She’s unattractive,” he stated definitively.  “All the guys I know think so too.”
Now, I was feeling a little feisty feminist that night, so I thought about how to challenge him on that.  Leaving aside that “Hollywood unattractive” would still turn heads in any average crowd (and be more attractive than any girl he’d dated), something about his comment irked my data side.  “So maybe the voting was done by women,” I replied.  
He was floored.
I noted that it was not a men’s magazine that ran the story, so really women’s opinions of other women’s attractiveness would actually be more relevant to this list.  Furthermore, as most of the leading women in Hollywood make their money on romantic comedies, professionally women’s opinions of their attractiveness (which presumably included a certain likeability factor) would actually matter more than men’s.
I was fascinated that this clearly disturbed him.  It had clearly never occurred to him that straight men may not be the target audience for female attractiveness, or even that the relevance of his opinion might get questioned.  He wasn’t trying to be a jerk, he was legitimately confused at the whole idea.
A long intro, but the bigger point is important.  In any opinion survey or research, it’s important to figure out whose opinion is most relevant to what you’re trying to get at and why.  When it comes to law and public policy questions, I think every voter is relevant.  When it comes to workplace trends?  You may need to narrow your sample.
Sampling bias is a huge problem in many contexts, but my primary one for today’s post is when the survey was not conducted with the end in mind.  For any sample, you have to figure out how much your subjects’ opinions actually matter given what you’re trying to find out.  In social conversation it may be interesting to find out what a particular person thinks of a topic, but for good data, show me why I care.