That’s some bad data, bad to the bone

Not the most useful data on the planet, but fun nevertheless…especially if you are a data geek married to a metalhead.  Not that I’d know anything about that.

Heavy metal bands per capita for every country except Bhutan:

In case you’re curious, here’s an article explaining more, including the actual numbers used.

Thanks to some research carried out by me and my wonderful husband, we discovered that Bhutan now has 1 metal band that was formed in 2008.  Their name is Metal Visage.  Here’s a review.  Oh, and if you’re super curious, here’s a video.  I have no idea if they’re good or offensive or what, as my dog started barking as soon as I hit play, but my husband assures me they are better than Ugra Karma (one of Nepal’s 12 metal bands).

Anyway, not much to criticize here, as sadly this is probably more accurate than most of the studies I write about.  I did find it amusing that I saw a comment about this where someone was greatly disturbed that the CIA World Factbook was cited as a source.  I considered politely explaining to them that that was probably where the population numbers came from, not the metal band numbers, but I decided not to.  Read ALL the sources, folks, thank you.

Opinions, everybody’s got one

I was listening to a management podcast recently where a man named John Blackwell was being interviewed.  He was talking about how he was constantly reading things about how the whole workplace was changing, but he was getting curious as to why he felt like the companies he worked with weren’t reflecting this.  When he tried to investigate, he found out that the ongoing surveys commonly used in British management journals (can’t find a link) were being done on the “up and coming business leaders”.  When he looked into what that meant, he realized it was people who were second-year MBA students.

The problem with this, of course, was that this was asking people not in the workforce what the workforce was going to look like 10 years from now.  They found, not surprisingly, that young people in grad school tend to be very optimistic about things like “working from home” or “flex time” when they’re in school, but when they got into business, they toed the line.  Thus, every survey done was essentially useless.  
This all reminded me of a conversation I got into several years ago when I was working the overnight shift.  Someone had brought in a magazine (People or Vogue or something like that) and they had a ranking of the 100 most beautiful women in Hollywood.  Drew Barrymore was number one that year, and one of my (young, male) coworkers was actively scoffing at that.  “She’s unattractive,” he stated definitively.  “All the guys I know think so too.”
Now, I was feeling a little feisty feminist that night, so I thought about how to challenge him on that.  Leaving aside that “Hollywood unattractive” would still turn heads in any average crowd (and be more attractive than any girl he’d dated), something about his comment irked my data side.  “So maybe the voting was done by women,” I replied.  
He was floored.
I noted that it was not a men’s magazine that ran the story, so really women’s opinions of other women’s attractiveness would actually be more relevant to this list.  Furthermore, as most of the leading women in Hollywood make their money on romantic comedies, professionally speaking, women’s opinions of their attractiveness (which presumably include a certain likeability factor) would actually matter more than men’s.
I was fascinated that this clearly disturbed him.  It had clearly never occurred to him that straight men may not be the target audience for female attractiveness, or even that the relevance of his opinion might get questioned.  He wasn’t trying to be a jerk; he was legitimately confused by the whole idea.
A long intro, but the bigger point is important.  In any opinion survey or research, it’s important to figure out whose opinion is most relevant to what you’re trying to get at and why.  When it comes to law and public policy questions, I think every voter is relevant.  When it comes to workplace trends?  You may need to narrow your sample.
Sampling bias is a huge problem in many contexts, but my primary one for today’s post is when the survey was not conducted with the end in mind.  For any sample, you have to figure out how much your subject’s opinions actually matter given what you’re trying to find out.  In social conversation it may be interesting to find out what a particular person thinks of a topic, but for good data, show me why I care.

Stand Back! I’m going to try SCIENCE!

Today I discovered that my favorite webcomic (xkcd.com) actually has a special comic up if you check it from my employer’s server.  Turns out the artist’s wife is a patient, doing well, and he wanted to show some love.  This post is thus titled for this shirt, which would make an awesome Christmas present for me, even in April.

Anyway, this weekend I saw this story with the headline “Study: Conservatives’ Trust In Science At Record Low”.

My first thought on seeing this was that the word “science” is a loaded word.  I mean, I’m as much a science geek as anyone.  Math’s my favorite, but science will always be a close second.  But do I trust science? I’m not sure.  Something really bothered me about that question, but I couldn’t quite put my finger on it until I read this post on the study from First Things today.  

My love of science makes me a skeptic.  It makes me question relentlessly and then continuously revisit to figure out what got left out.  I don’t trust science because not trusting your assumptions is science done right.  If we could all trust our assumptions, what would we need science for?  This is the problem with vague questions and loaded words.  Much like the discussion in the comments section of this post where several commenters weighed in on the word “delegate” in relation to household tasks, it’s clear that people will interpret the phrase “trust science” in many different ways.

Some might say it means the scientific method, scientists, science as a career, science’s role in the world, or something else not springing to mind.  Given the vagueness of the question though, I would have a hard time actually calling anyone’s interpretation wrong.  Mine is based on my own bias, but I would wager everyone’s is.  So isn’t this survey more about how we’re defining a phrase than about anything else?

I thought my annoyance was going to end there, I really did.

Then I looked at the graph with the story, and had no choice but to get annoyed all over again.

That’s what I get for just reading headlines.

So over the course of this survey, moderates have consistently trusted science less than conservatives for all but four data points?  Why didn’t this get mentioned?  I found the original study and took a quick look for the breakdown: 34% self identified as conservative, 39% as moderate, and 27% as liberal.  So 73% of the population has shown a significant drop off in “trust of science” and yet they’re somehow portrayed as the outliers?  Science and technology have changed almost unimaginably since 1974, and yet liberals’ opinions about all that haven’t changed*?  Does that strike anyone else as the more salient feature here?

*Technically this may not be true.  I don’t know what the self identified proportions were in 1974, so it could be a self-identification shift.  Still.  This might be that media bias everyone’s always talking about.

Book Recommendation – How to Lie With Statistics

If one has free reading time or just really likes lists (and boy do I love a good list!) the Personal MBA reading list is pretty darn cool.  It claims to give you knowledge equivalent to an MBA in 99 books, without any of the crippling debt.  I’m about 10 books in, and there’s some really great stuff on data, statistics, analysis and presentation.

One of the classics of course is How to Lie With Statistics.  It’s a great book, easy reading, though the examples are outdated to the point of near distraction (salaries listed at $8,000/year, that sort of thing).  Still, clear and concise, and shows you that bad data has been around for quite some time.

One of my favorite moments is when he goes after Joseph Stalin for his bad data….in retrospect that kind of feels like saying Hitler was a bad dresser.  Still, pretty interesting to see where the misinformation starts.  This book should be required reading for everyone.

Arguments and Discussions…learning the rules

I was struck by something that commenter Erin mentioned in response to my post about data that I hate.   She ended her comments with this:

I teach this stuff to my AP students…I love trying to get them to understand how to break apart political rhetoric and other arguments around them. I figure even if we disagree wildly in politics or social issues, at least I’ll have an intelligent opponent to argue with someday. 

I like that, because I fully endorse that approach to life.  That’s part of why I wanted to do a blog like this.  Quite some time ago, the Assistant Village Idiot put up a post I liked very much (and can’t find now…circa 2007?) about how far too many people treated their political opinions the way defense lawyers treat clients…never giving an inch, never admitting that anything they had said or cited could be wrong or skewed.  This makes lots of people defend really stupid things.

In my office, this flowchart hangs just to the right of my computer:

I often have fantasies of taking it down during debates and serenely handing it to the other person whilst telling them to try again.  Sadly, I have never done this.  The fantasy keeps me going some days though, doubly so in political debates.
Though I’m probably preaching to the choir here, I feel the need to state for the record:  Just because something you cited is wrong does not mean you are wrong.  You can keep your belief while also admitting that something that agrees with you is a load of crap.  That actually makes you a better person, not a worse one.  This is not an April Fools’ joke; people actually can operate like this.

Say What?

I can’t figure out if this is the worst statistic I’ve read this week, or just the most poorly phrased good statistic.  I’m leaning towards that first one:

“….more than half of students earning bachelor’s degrees at public colleges – 56 percent – are graduating with $22,000 of debt, on average.”  –Nancy Zimpher on CNN.com

If I’m reading this correctly, they tossed out everyone at private college, then anyone who didn’t graduate, then (most disturbingly) 44% of those who were left?  Why did they get the boot?  I mean, if we’re just tossing out arbitrary numbers of students, can’t we get any average we want?
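A toy sketch makes the complaint concrete (the debt figures below are made up, not anything from the report): once you get to pick which subgroup goes in the denominator, "the average" moves wherever you like.

```python
# Hypothetical debt figures for ten graduates, in dollars (not real data).
debts = [0, 0, 0, 0, 5_000, 10_000, 20_000, 25_000, 30_000, 40_000]

# Average over every graduate:
mean_all = sum(debts) / len(debts)

# Keep only the borrowers, mimicking "of those graduating with debt":
borrowers = [d for d in debts if d > 0]
mean_borrowers = sum(borrowers) / len(borrowers)

print(mean_all)               # 13000.0
print(round(mean_borrowers))  # 21667
```

Both numbers are honestly "the average debt"; which one makes the headline depends entirely on who was allowed into the denominator.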

Nancy Zimpher, what are you up to??? 

Why most nutritional research is useless

Nutrition research is big money these days.  Our national obsession with weight loss is at a fever pitch, and any new or interesting research is sure to make headlines.

Here are some basic guidelines on what to look for in nutritional research (any study, not just this one):
  1. Was the data self-reported? Even CNN brought this up in their article.  People, especially those embarrassed about their weight, don’t accurately assess what they eat.  My mother, skinny little thing that she is, could eat one peppermint patty and tell you she’d had a serving of chocolate.  I don’t think I’d even count it until I had 3 or so.
  2. How much was “more”? I actually can’t find this for this study.  Is it the difference between 1 and 2 servings per week?  Or the difference between 1 and 5?  Both would produce statistically significant correlations, but the practical outcome would be different.  In 2005, researchers made news by saying that eating more fruits and veggies did not, in fact, prevent cancer.  The cancer-treating establishment (which I work in, btw) promptly responded by pointing out that they compared people who ate half a fruit per day to those who ate 1-2 fruits per day.  It was all reported in grams too, so the data look extra impressive: “Those eating less than 114 grams showed no difference from those eating 367 grams”.  The link gives more examples, but 250 grams is one medium apple.  Watch out for this.
  3. Who classified people as “normal weight” or “overweight”?  If this was also self-reported (and in this study, it looks like there were clinic visits), then look out.  There’s a great study I can’t find right now that shows that women tend to lie about weight, and men tend to lie about height.  Both lies will screw up the BMI calculation (the most common metric for assessing “normal”).
  4. Were the overweight people actively (or even somewhat) trying to modify their diets to lose weight?  A few years ago, I heard about the study that suggested diet soda was linked to obesity.  I remember my first reaction was “are we sure they’re not all just on diets?”.  This seemed like a classic correlation/causation issue.  All the analysis seemed to presume they were overweight because they drank diet soda.  I wondered why they never seemed to look at the idea that they could be drinking diet soda because they were overweight.  That’s one of the first swaps most people I know make when they try to lose weight.  
  5. Don’t even get me started if it’s a population study.  That’s a big topic for another time, but let’s just say they’re really, really tricky.
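Point 3’s worry about misreported height and weight is easy to make concrete.  BMI is weight in kilograms divided by height in metres squared; the self-reports below are hypothetical, but they show how one small fib on each input can move a person across a category line.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

honest = bmi(75.0, 1.70)  # truthful self-report
fibbed = bmi(72.0, 1.73)  # 3 kg lighter, 3 cm taller (hypothetical fibs)

print(round(honest, 1))  # 26.0 -- "overweight" (BMI of 25 or more)
print(round(fibbed, 1))  # 24.1 -- back in the "normal" range
```

A study that sorts people into “normal” and “overweight” buckets from numbers like these inherits every one of those little lies.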
If you ever want a fabulous crash course in how nutrition research can be skewed, pick up two diet books that contradict each other, and read through their parts on research.  Take something like Atkins (high protein, low carb) and Joel Fuhrman (nearly vegan), and watch them rip to shreds the research the other one builds their whole case on.  
He may have his own controversy, but this is why I like Michael Pollan.  The book I linked to has a great crash course in why most nutritional research just sees what it wants to.  He refused to take a strict nutritional stance and instead condensed it down to a few “rules” that he gleaned from quizzing nutritionists on “what they could say for sure”.  The answer? Eat real food, not too much, mostly plants.  

Blog Rules

Thanks to some links from the kind people at Assistant Village Idiot  and Maxed Out Mama, I have gotten a bit more traffic than I expected in the past two days.  As such, I realized it might be a good moment to spell out some of the rules for this blog I’ve had bouncing around in my head.  These are rules for me really, not for commenters, as no one can hope to tame the internet:

  1. I will try my best to provide a link for every study I cite, and this link will get as close to source data as possible.  Nothing drives me crazier than reading about “new research” with absolutely no clue as to where to find it.  I spent almost 20 minutes trying to find where the heck Jack Cafferty got his numbers for this article, and it made me mad.  I won’t do that to you.  And here are the numbers he reported on, as a sign of good faith.
  2. I will attempt to remain non-partisan. I have political opinions.  Lots of them.  But really, I’m not here to try to go after one party or another.  They both fall victim to bad data, and lots of people do it outside of politics too.  Lots of smart people have political blogs, and I like reading them…I just don’t feel I’d be a good person to run one.  My point is not to change people’s minds about issues, but to at least trip the warning light that they may be supporting themselves with crap.  That being said, if I start to lean too far to one side, smack me back to center.
  3. I will admit that I will probably fail at #2, and have lots of other biases as well.  What, you thought I was going to claim to be neutral?  No special snowflake here, we humans can’t help ourselves.
  4. I will, when I can, declare those biases up front.  When I review a study on changing last names, I think it’s relevant that I didn’t change mine.  When mentioning healthcare reform, I think it’s relevant that I live in the one state in the nation that won’t be affected by it either way.  It makes it easier to judge where I’m coming from.
  5. I will attempt to explain all stats words that are used.  I am not a stats teacher, I am just someone who uses a lot of data to get a job done.  I would love to do more than just preach to the choir, and thus I will try not to have any prereqs for this class.  For the very smart commenters I have here, this may get tedious, but bear with me.
  6. I will try to improve my use of apostrophe’s.  I’m really not good at those.
  7. Suggestions always welcome.  The internet is awesome because I get to learn from smart people I normally wouldn’t meet.  

THE KIND OF DATA I ABSOLUTELY HATE

Hate’s a strong word.  I get that.  I also get that data and survey types are not always the sort of thing that inspires people to strong hatred, but here we are.

In this post I mentioned my annoyance at perception/prediction polls.  The one I referenced was based on women who didn’t change their last names and their level of marital commitment.  Commenter Assistant Village Idiot mentioned another example, which I also liked: “Do you think earthquakes are more likely now because of climate change? What we think has nothing to do with anything. The earthquakes will happen according to their own rules.”

In writing that post however, I forgot to mention that same study included an even worse piece of data.  As a rebuttal to the “Midwestern college kids don’t think non-name-changing women are committed” finding, they included a remark that women who didn’t plan on changing their names didn’t feel less committed.


I HATE STATEMENTS LIKE THAT.

I would really love it if someone could tell me if there’s a proper name for this sort of thing, but I always think of it as “the embarrassing question debacle”.  Basically, researchers ask people questions with a potentially embarrassing answer, and then report it as meaningful when people do not answer embarrassingly.

There are only two types of people I have ever heard who will admit they went into their marriages less than completely committed:

  1. Those who have been married successfully for quite some time who are now comfortable in admitting they were totally naive when they walked down the aisle.
  2. Those who are already divorced and reflecting on what went wrong.
Level of commitment is best assessed in retrospect, and I look with great skepticism at anyone who says they can gauge it before the fact.  
Getting at the reasons people do things can be brutal.  Your only source for your data also has the biggest motivation to conceal it from you.  Some people are actually doing things for good reasons, some just want to look like they are, and some are lying to themselves.  Unless a study at least attempts to account for all 3 scenarios, I would hold all answers suspect.