Weekend Moment of Zen 5-26-2012

Hans Rosling’s enthusiasm gets me every time. Here, he takes on the idea of unlimited population growth and religion’s influence on baby making:

[Embedded TED video]

Apparently he has a one-hour documentary on statistics. Watching it is now on my list of goals for the long weekend.

Watch the definitions

A quick one for a Friday:

I’ve blogged before about paying careful attention to the definitions of words used in study results. Often the definition used in a study or statistic does not match the one you presume.

Eugene Volokh posted a good example of this today, when he linked to this op-ed in the Detroit Free Press.  It cites a spokesperson from the Violence Policy Center who states that “Michigan is one of 10 states in which gun deaths now outpace motor vehicle deaths”.

My knee-jerk reaction was that this seemed high, but my tired Friday brain probably would have kept skimming. Then I read why Volokh was posting it:

The number of accidental gun deaths in Michigan in 2009 (the most recent year reported in WISQARS) was … 12, compared to 962 accidental motor-vehicle-related deaths. 99% of the gun deaths in Michigan that year consisted of suicides (575) and homicides (495).

To be honest, I had presumed homicides were included, but suicides didn’t even occur to me. I’d be interested to see how many of the vehicular deaths were suicides; my guess is the percentage would not be as high as in the gun case. Either way, I’m sure I’m not the only one who didn’t realize what was being counted.
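For anyone who wants to check the arithmetic behind that “99%”, here is a quick Python sketch using only the figures quoted above. It assumes (my assumption, not Volokh’s) that accidents, suicides, and homicides make up the entire gun-death total; the real WISQARS total may include other categories, such as deaths of undetermined intent. The only motor vehicle figure quoted is the accidental one, so that is what it compares against.

```python
# Michigan 2009 figures as quoted above (via Volokh, from WISQARS).
# ASSUMPTION: these three categories are the whole gun-death total;
# the real total may include other categories (e.g. undetermined intent).
gun_deaths = {"accidental": 12, "suicide": 575, "homicide": 495}
accidental_vehicle_deaths = 962  # the only vehicle figure quoted

total_gun = sum(gun_deaths.values())
suicide_and_homicide = gun_deaths["suicide"] + gun_deaths["homicide"]
share = suicide_and_homicide / total_gun

print(f"Total gun deaths counted: {total_gun}")
print(f"Share that are suicides or homicides: {share:.1%}")
print(f"Accidents only: guns {gun_deaths['accidental']}, vehicles {accidental_vehicle_deaths}")
print(f"Gun total exceeds vehicle accidents: {total_gun > accidental_vehicle_deaths}")
```

Under that assumption the share works out to 98.9%, which rounds to the 99% quoted. Count every gun death and guns “outpace” cars (1,082 vs. 962); count only accidents and it is 12 vs. 962. Same data, different definition.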

Watch the definitions, and have a fabulous Memorial Day weekend!

More census data….the minority-majority issue

I was happy to see that my post from yesterday got an excellent comment from Glenn, a former Census Bureau employee. He let me know that the sample they used was likely a stratified cluster sample, which is not exactly what I had surmised, but close.

As I was looking up more info on some of the Census Bureau data, I ran into a fascinating column from Matthew Yglesias over at Slate.com. In it, he describes his experience filling out the census form, and how his own experience made him question some of the data being released.

Specifically, he questioned the recent headline that we are quickly heading towards a minority-majority society. He mentions that as a man who is 25% Cuban, he looks very white, but was not sure how to answer the question asking whether he was “Hispanic in origin”. If he wasn’t sure how to answer a race question, how many others were in the same boat? He further comments that as more and more people come from mixed racial backgrounds (keeping in mind that 1 out of 12 marriages is now mixed race), it is much more likely that we will have to shift our concept of what “white” is to keep up with the times.

As Elizabeth Warren can tell you, percentage of heritage matters….but where do we draw the line?  If 3% Native American isn’t enough, how much is?  I mean that quite literally.  I don’t know.

In my cultural competency class in school, we had a fascinating example of racial confusion. One of the girls I sat next to mentioned that her grandparents were from Lebanon and had immigrated to South America; her parents were both born there, married, and moved to the US, and that’s where she was born. Her skin was fair, she was fluent in Spanish, and she felt she spent her life explaining that she was genetically Arabic, ethnically South American and culturally American. I don’t know what she checked off on the census, but I’m sure nothing captured that particular combination accurately.

As times change, so do our ideas of race. When reading the history of census racial classification, it’s hard to disagree with Yglesias’ assertion that today’s racial breakdown will not be comparable to whatever breakdown we have in ten years.  That’s a good thing to keep in mind when analyzing racial data.

Racial numbers are only as good as the categories we have to put them in.

The (ACS) Devil and Daniel Webster

As a New Hampshire native, I am prone to liking people named Daniel Webster.

It is thus with some interest that I realized the Florida Congressman sponsoring the bill to eliminate the American Community Survey happens to share a name with the famous NH statesman. I have been following this situation since I read about it on the pretty cool Civil Statistician blog, run by a guy who does statistics for the Census Bureau.

Clearly there’s some interesting debate going on here about data, analysis, the role of government, and the classic “good of the community vs. personal liberty” debate.

I’m going to skip over most of that.

So why, then, do I bring up Daniel Webster?

Well, I was intrigued by this comment from him, as reported in the NYT article on the ACS:

“We’re spending $70 per person to fill this out. That’s just not cost effective,” he continued, “especially since in the end this is not a scientific survey. It’s a random survey.”

It was that last part of the sentence that caught my eye.

I was curious, first of all, what the background was of someone making that claim.  I took a look at his website, and was pleased to discover that Rep. Webster is an engineer.   It’s always interesting to see one of my own take something like this on (especially since Congress only has 6 of his kind!).

That being said, is a random survey unscientific?

Well, maybe.

In grad school, we actually had to take a whole class on surveys, testing, and evaluations, and the number one principle of polling methods is that there is no one-size-fits-all approach. The most scientifically accurate way to survey a group depends on the group you’re trying to capture. All survey methods have pitfalls. One very interesting example our professor gave us was of students who tried to capture a sample of their college by surveying the first 100 students to walk by them in the campus center. What they hadn’t realized was that a freshman seminar was just letting out, so their “random” survey turned out to be 85% freshmen. So overall, a polling method that isn’t random is probably worse than one that is.

There are all kinds of sampling methods that have been created to account for these issues:

  • simple random sampling – attempts to be totally random
  • systematic sampling – picking, say, every 5th item on a list
  • stratified sampling – dividing the population into groups and then picking a certain percentage from each one (in the example above, this would have meant picking 25 random people from each class year)
  • convenience sampling – grabbing whoever is closest
  • snowball sampling – allowing sampled parties to refer/lead to other samples
  • cluster sampling – taking one cluster of participants (one city, one classroom, etc.) and presuming that’s representative of the whole

There are others, though most are subtypes of these (see more here).

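To make the freshman-seminar story and the list above a bit more concrete, here is a small Python simulation. Everything in it is made up for illustration: a hypothetical campus with 500 students per class year, and a hypothetical walk-by order in which 85 freshmen from the seminar pass the survey table first. It draws 100 students with four of the methods listed and counts how many come from each class year.

```python
import random
from collections import Counter

# HYPOTHETICAL campus: 500 students in each class year, listed in year order.
YEARS = ["freshman", "sophomore", "junior", "senior"]
population = [(year, i) for year in YEARS for i in range(500)]


def simple_random(pop, n, rng):
    """Every student has the same chance of being picked."""
    return rng.sample(pop, n)


def systematic(pop, n, rng):
    """Take every k-th student on the list, starting at a random offset."""
    k = len(pop) // n
    start = rng.randrange(k)
    return pop[start::k][:n]


def stratified(pop, n, rng):
    """Split by class year, then draw the same number at random from each year."""
    per_year = n // len(YEARS)
    sample = []
    for year in YEARS:
        stratum = [s for s in pop if s[0] == year]
        sample.extend(rng.sample(stratum, per_year))
    return sample


def convenience(walk_by_order, n):
    """Take whoever walks by first."""
    return walk_by_order[:n]


def year_counts(sample):
    """Count how many sampled students come from each class year."""
    return Counter(year for year, _ in sample)


rng = random.Random(2012)

# HYPOTHETICAL walk-by order: a freshman seminar lets out, so 85 freshmen
# pass the table first, followed by everyone else in random order.
seminar = [s for s in population if s[0] == "freshman"][:85]
seminar_set = set(seminar)
everyone_else = [s for s in population if s not in seminar_set]
rng.shuffle(everyone_else)
walk_by_order = seminar + everyone_else

samples = {
    "simple random": simple_random(population, 100, rng),
    "systematic": systematic(population, 100, rng),
    "stratified": stratified(population, 100, rng),
    "convenience": convenience(walk_by_order, 100),
}

for name, sample in samples.items():
    counts = year_counts(sample)
    print(f"{name:>13}: " + ", ".join(f"{y} {counts[y]}" for y in YEARS))
```

The stratified sample always comes back 25/25/25/25. The systematic one does too here, but only because the hypothetical list happens to be sorted by class year. The convenience sample is at least 85% freshmen no matter which random seed you pick.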
So what does the ACS use?

As best I can tell, they use stratified sampling. They compile as comprehensive a list as they can, then they assign geocodes, and select from there. So technically, their sampling is both random and non-random.

Now, NYT analysis aside, I wonder if this is really what Webster was questioning.  The other meaning one could take from his statement is that he was challenging the lack of scientific method.  As an engineer, he would be more familiar with this than with sampling statistics (presuming his coursework looked like mine).  What would a scientific survey look like there?  Well, here’s the scientific method in a flowchart (via Sciencebuddies.org):

So it seems plausible he was actually criticizing the polling being done, not the specific polling methodology.  It’s an important distinction, as all data must be analyzed on two levels: integrity of data, and integrity of concept.   When discussing “randomness” in surveys, we must remember to acknowledge that there are two different levels going on, and criticisms can potentially have dual meanings.

Back It Up!

One of the more thought-provoking moments of my high school career came from a youth pastor who found an amusing way of calling out a bunch of church kids. It seemed he had at some point grown weary of hearing too many good churchgoing adolescents start sentences with “Well, it says in the Bible….” when what they actually meant was “I heard in a sermon/my dad says/my mom believes/I read this book once/I’m pretty sure this is true”. Anyway, he was a clever sort of youth pastor, and he realized that calling out and/or publicly shaming offenders would probably lead to lots of discord, hurt feelings, and possibly calls from parents, so he decided to take a different tack.

Starting with a few key young gentlemen, he began to tell everyone that whenever they heard the phrase “It says in the Bible” they were allowed (in fact encouraged) to yell “BACK IT UP!!!!” At this point whoever had made the claim had to stop, find a verse to back themselves up, and read it to the group. If they couldn’t find a verse quickly, the conversation continued, and they were condemned to sit rifling through the concordance until they either admitted they couldn’t find it or stayed quiet for the rest of the conversation.

I bring this up because I WANT THIS TO BE A THING.

Wouldn’t it be great if we could do this with research?  I bet the story I mentioned in my post this morning wouldn’t have happened if every time we heard/read someone saying “Research shows” we could all scream “BACK IT UP!!!” and then silence them until they found the proper citation (and no, Wikipedia and Malcolm Gladwell would not count as actual citations).

For the printed word on the internet, we need some sort of meme for this that people could leave in the comments sections of articles with vague “research” claims…perhaps a gif of some sort (where are the 4channers when I need them???). I took a poke around the internets, and the best I could find was this lady:

[GIF via GIFSoup]

I think this could work.  There has to be an unemployed journalist or two out there who could help me spread this around.

Sometimes people just make things up

ALWAYS LOOK FOR A PRIMARY SOURCE.

THEN MAKE SURE THAT PRIMARY SOURCE SAYS WHAT THEY SAY IT SAYS.

Sorry for the caps lock, but some people seem to doubt that others would actually fabricate stats to prove a point. They do, and in the New York Times no less.

H/T Instapundit.

Correlation and Causation: Real World Problems

In yesterday’s post, I got a bit worked up over sloppy reporting on a study on dietary interventions in pregnancy. 

This led to an interesting comment from the Assistant Village Idiot regarding weight gain recommendations for pregnant women. The current weight gain recommendation is 25 – 35 pounds for a normal-BMI woman, but AVI commented that it used to be much lower, and that women were hospitalized to stop them from eating too much.

I didn’t actually know that, so I immediately decided to look it up.

I stumbled onto a fascinating presentation put together by an OB at UCSF on the history of maternal weight gain recommendations (link goes to the PowerPoint slides). It not only confirmed what AVI had mentioned, but also gave some of the reasoning….which turned out to be a very interesting example of people erroneously conflating correlation with causation.

Apparently part of the reason why they (they being doctors circa 1930) were so nervous about weight gain in pregnancy was that they were trying to prevent preeclampsia. Now, preeclampsia is a life-threatening condition if left untreated, and one of its warning signs is rapid weight gain. Some doctors actually thought that the symptom was the cause, and believed that all excessive weight gain was a sign the patient was about to become preeclamptic. Thus, the theory went, limiting weight gain would prevent preeclampsia and aid in “figure preservation” to boot*.

Sadly, this also led to higher infant mortality, disability, and mental retardation….which seems a pretty steep price to pay for what was really a data analysis error. As I’ve said before, this is why statistics are so relevant in medicine….the cost of getting things wrong is too steep not to be careful.

*To note, it is actually true that preeclampsia is linked to higher weight/glucose/insulin production….but the way they went about addressing it did as much harm to the fetus as good.  Current weight gain recommendations are set to optimize outcomes for the babies, not the mothers.  
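Here is a toy Python simulation of the mistake those doctors made. It is a sketch under loudly stated assumptions, not a model of real obstetrics: a hypothetical 5% preeclampsia rate, a hypothetical typical gain of about 30 pounds, and a rule, built in by construction, that preeclampsia adds extra weight while weight gain never causes preeclampsia (so it deliberately ignores the real links mentioned in the footnote). In that made-up world, rapid gainers look far riskier, yet capping everyone’s weight gain leaves the preeclampsia rate where it was.

```python
import random


def simulate(n, cap_gain, rng):
    """Toy model (ASSUMPTION): preeclampsia causes extra weight gain, never the reverse.

    Returns a list of (weight_gain_lbs, has_preeclampsia) tuples.
    """
    patients = []
    for _ in range(n):
        preeclampsia = rng.random() < 0.05   # hypothetical 5% incidence
        gain = rng.gauss(30, 5)              # hypothetical typical gain in pounds
        if preeclampsia:
            gain += rng.gauss(10, 3)         # the symptom: rapid extra gain
        if cap_gain is not None:
            gain = min(gain, cap_gain)       # 1930s-style "limit the weight gain" rule
        patients.append((gain, preeclampsia))
    return patients


def preeclampsia_rate(patients):
    """Fraction of patients with preeclampsia."""
    return sum(p for _, p in patients) / len(patients)


rng = random.Random(1930)
uncapped = simulate(100_000, None, rng)

# Correlation: rapid gainers look much riskier...
rapid = [pt for pt in uncapped if pt[0] > 40]
normal = [pt for pt in uncapped if pt[0] <= 40]
print(f"Preeclampsia among rapid gainers: {preeclampsia_rate(rapid):.1%}")
print(f"Preeclampsia among everyone else: {preeclampsia_rate(normal):.1%}")

# ...but intervening on the symptom does nothing to the cause.
capped = simulate(100_000, 25, rng)
print(f"Overall rate, no cap:      {preeclampsia_rate(uncapped):.1%}")
print(f"Overall rate, gain capped: {preeclampsia_rate(capped):.1%}")
```

The cap in this toy model changes everyone’s weight gain and nothing else, which is exactly the point: treating the symptom as the cause buys you nothing on the thing you actually care about.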

When in doubt, blame the journalist: prenatal dieting edition

Sometimes bad science reporting makes me laugh, and sometimes it actually kind of stresses me out.  This is one of the “this stresses me out” times.

The headline reads: Diet during pregnancy is safe and reduces risk for complications, study finds

Now aside from being a bit on the garbled side, it’s a pretty provocative headline. As someone who has been in and out of obstetricians’ offices for the past 7 months or so, I also find it runs counter to everything I’ve been told. According to this write-up, however, here are a few things this study found:

 Is it safe for a pregnant woman to go on a diet? According to a new study, not only is it safe, but it can even be beneficial and reduce the risk of dangerous complications.

That would seem to contradict what my doctor has told me….but let’s read on (to what they found about dieting methods):

The researchers found that all three methods reduced a mother’s weight, but diet showed the greatest effect with an average reduction of almost 9 pounds. Pregnant moms who only exercised lost about 1.5 pounds, and moms who did a combination of diet and exercise lost an average of 2.2 pounds.

So they had mothers-to-be lose weight during pregnancy? That seems….extra wrong….but go on:

Women who went on a calorie-restricted diet were 33 percent less likely to develop pre-eclampsia, a spike in blood pressure caused by significant amounts of protein in the urine.

Wait, now I know he’s just phoning it in. Pre-eclampsia is not high blood pressure caused by protein in the urine; it’s high blood pressure AND protein in the urine. In fact, the Mayo Clinic article he links to says so.

At this point, I took a look at the original study, and found other “oops” moments in the reporting.  First, the study never looked at “diets”.  What they actually looked at was “dietary interventions”…which they describe as follows:

Typical dietary interventions included a balanced diet consisting of carbohydrates, proteins, and fat and maintenance of a food diary. 

Since this was a meta-analysis, I took a look at the references, and in fact only one study cited directly looked at caloric restriction….the sort of thing most of us think of when we hear the word “diet”.

Furthermore, that part about the women’s weight being reduced? It wasn’t. Their weight gain was reduced, something the study authors are clear about but the subsequent write-up completely leaves out.
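The difference between “weight reduction” and “weight gain reduction” is easiest to see with a little arithmetic. In this quick Python sketch, the starting weight and the control group’s gain are hypothetical (the gain is picked from the 25 to 35 pound range recommended for normal-BMI women); only the roughly 9 pound figure comes from the write-up.

```python
# HYPOTHETICAL numbers for illustration; only the ~9 lb figure is from the write-up.
pre_pregnancy_weight = 140.0   # lbs, hypothetical
control_gain = 30.0            # lbs gained without the intervention, hypothetical
gain_reduction = 9.0           # lbs, the "almost 9 pounds" in the write-up

intervention_gain = control_gain - gain_reduction
weight_at_delivery = pre_pregnancy_weight + intervention_gain
headline_weight = pre_pregnancy_weight - gain_reduction

# What the headline implies vs. what the study measured.
print(f"Headline reading: weight went from {pre_pregnancy_weight} to {headline_weight}")
print(f"Study's meaning:  weight went from {pre_pregnancy_weight} to {weight_at_delivery}")
print(f"She still gained {intervention_gain} lbs, just {gain_reduction} lbs less than the control group")
```

Same 9 pounds, opposite direction on the scale.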

I actually got a little angry about this. You can feel free to blame pregnancy hormones, but I find this sort of thing irresponsible. CBS is a major news network, and people are going to take what they say seriously. As the Assistant Village Idiot likes to point out, people believing faulty science on small things can be funny and doesn’t matter much….but when you realize bad studies could actually affect the way people live, it gets scary. Someone following this story could do some real damage. To be fair, the article does get clearer towards the end (when it quotes the original study author), but that’s 6 paragraphs in. It drives me nuts that a good, carefully thought-through study can get reported so sloppily and potentially dangerously. There is a world of difference between what most of us think of when we say “diet” and what the researchers here described, which was essentially just formalized prenatal nutritional counseling.

Overall, real dieting during pregnancy is still dangerous….and can backfire in a big way. Mothers who are forced to restrict calories during pregnancy (famine victims, etc.) actually wind up having children who are more likely to become obese and develop diabetes. As a side note, one of the most fascinating studies on this is the Dutch Famine Study, in which the children of mothers who went through temporary famine conditions during pregnancy could be followed for long-term effects.

This is why it matters that the media report things correctly.  People should not walk away from reading about good science with bad ideas.  Words like “diet” or “weight reduction” do not mean the same thing as “dietary interventions” or “weight gain reduction”. No one should have to read to paragraph 7 to get accurate information.  That’s just bad form.

The only thing that could have made this story worse would have been an infographic.  I’m going to have nightmares about that tonight.

Friday Fun Links 5-18-12

When someone who writes about bad science for a living calls something “The worst government statistic ever created”, you know it’s going to be good.

Okay, that report was from the UK….now do you US folks want to know what’s wrong with your state?  Massachusetts has blisters, apparently.

If there’s something wrong with this data, I don’t want to know about it.  There is no such thing as strong coffee, only weak people.

I kid, actually; the above study has all the normal problems of nutritional research. The Time write-up did give me the quote of the week, however:

Since the study was observational only, the authors couldn’t conclude that coffee drinking actually reduces death risk.

Gee, with a headline like “Coffee: Drink More, Live Longer?” I can’t see why anyone would jump to that conclusion. Also, I kinda hate the phrase “death risk”. Unless we’re about to get into an eschatology debate, I’m pretty sure my death risk is 100%, no matter how much coffee I drink.

Moving on, the Pew Research Center started meta-analyzing its own analysis…with sad results.

On a perkier note, if you want to win your weekend geek-off, here’s a (NSFW…sorta) guide to why Tesla > Edison…even with that whole pigeon thing.