Paranoia is just good sense if people really are out to get you

Yesterday I posted about retractions in scientific journals, and the assertion that they are going up.  I actually woke up this morning thinking about that study, wishing I could see more data on how it's changed year to year (yes, I'm a complete nerd…but what do you ponder while brushing your teeth?).  Anyway, that brought to mind a post I did a few weeks ago on how conservatives' trust in the scientific community has gone steadily down.

It occurred to me that if you superimposed the retraction rate of various journals on the trust-in-the-scientific-community numbers, it could actually make an interesting picture.  It turns out PubMed actually has a retraction rate by year available here.  For purposes of this graph I figured that would be a representative enough sample.

I couldn’t find the raw numbers for the original public trust study, so these are eyeballed from the original graph in blue, with the exact numbers from the PubMed database in green.  
So it looks like a decreasing trust in the scientific community may actually be a rational thing*.  
It’s entirely possible, by the way, that the increased scrutiny of the internet led to the higher retraction rate…but that would still have given people more reasons not to blindly trust.  As the title of this post suggests, skepticism isn’t crazy if you actually should be skeptical.
Speaking of trust, I obviously had to manipulate the axes a bit to get this all to fit.  Still not sure I got it quite right, but if anyone wants to check my work, the raw data for the retraction rate is here and the data for the public trust study is here.  These links are included earlier as well, just wanted to be thorough.  
*Gringo requested that I run the correlation coefficients: conservatives r = -0.81, liberals r = 0.52, moderates r = 0.  I can't fully stand by these numbers since my data points were all estimates based on the original chart, but they should be roughly correct.
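Since the coefficients above came from hand-estimated points, they're easy to recompute if anyone re-digitizes the charts.  Here's a minimal sketch with numpy; the series below are made-up placeholders, not the actual eyeballed values:

```python
# A minimal sketch of how the correlation coefficients above could be
# computed.  The numbers here are invented placeholders, not the actual
# eyeballed chart values.
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.corrcoef(x, y)[0, 1]

# Hypothetical series: trust falling as the retraction rate rises.
retraction_rate = [0.01, 0.02, 0.04, 0.05, 0.08]
conservative_trust = [48, 45, 41, 40, 35]

r = pearson_r(retraction_rate, conservative_trust)  # strongly negative
```

Swap in the real digitized points and the same two lines reproduce (or correct) the r values above.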

Bad Data vs. False Data

We here at Bad Data Bad would like to note that when we pick studies to criticize, we operate under the assumption that the studies' published findings are accurate, and that most of the mistakes are made in the interpretation or the translation of those findings into news.

This article from the New York Times last week reminds us that this may not always be a good assumption.

A few fabricated papers have managed to make news headlines over the past few years…the Korean researcher who claimed to have cloned human stem cells…the UConn researcher who falsified data in a series of papers on the health benefits of red wine…and a Dutch social scientist who faked entire experiments to get his data.

This is where the scientific principle of replication is supposed to step in, and why it's always a decent idea to withhold judgement until somebody else can find the same thing the first study did.  Without replication, it's nearly impossible to catch falsified data unless someone in the researcher's own lab blows the whistle.

If you’re curious about these retractions, the Retraction Watch blog is a pretty good source for papers that get yanked.

Food Deserts and Big City Living

The Assistant Village Idiot did a good post on a new report on the prevalence of "food deserts" and whether this is the crisis it's been reported to be.

While I will point out that the study refuting the idea of food deserts uses self-reported data for height, weight and eating habits (check out my previous post on this issue), I was glad to see someone take this issue on.  Food desert reporting has always fascinated me, mostly because I lived in the middle of the Boston area (albeit in different locations) for about 9 years.  The food desert idea always sort of baffled me, and when I took a look at the USDA's food desert locator, I noticed that the only part of Boston proper or the close suburbs that qualifies as a food desert is…Logan Airport.

I currently live in a suburb that is near 2 food deserts, so checking those out was interesting as well.  One is actually a small peninsula, and I happen to know you have to drive by a grocery store to get on the main route out there.  The other is next to the docks.

For cities, this data gets complicated by the fact that many very small grocers sell all sorts of produce in small spaces that wouldn't make the list.  For rural areas, personal gardens are not counted.  I also liked that the article pointed out that some people researching this have used grocery stores per 1,000 people, a metric which makes cities look bleak.  That's a classic case of needing to review why you actually want the data: a busy grocery store is not a lack of a grocery store.  Additionally, I have never seen one of these surveys that added in farmers' markets or grocery store delivery services.  While not always the cheapest option, delivery services allowed me (when I was a broke college student) to buy in bulk and save money other ways.  They run about $7 ($5 when I was in college), when a train ride to and from the store was $4 round trip, and a taxi would have been at least $10 (not counting the unofficial "ghost taxis" that exist almost exclusively in front of city grocery stores and help you with your groceries for around $5).
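The per-capita problem is easy to see with a toy calculation.  All the numbers below are invented, purely to show the shape of the distortion:

```python
# A toy illustration (all numbers invented) of why "grocery stores per
# 1,000 people" makes a dense city look bleak: each city store serves
# far more people, but that says nothing about how far anyone has to
# travel to reach one.
def stores_per_1000(stores, population):
    return stores / population * 1000

city = stores_per_1000(stores=40, population=600_000)  # dense city
town = stores_per_1000(stores=2, population=8_000)     # small town

# The town "wins" the metric even though it has only two stores,
# possibly both on the far side of town.
```

A busy store serving 15,000 people scores worse on this metric than a quiet one serving 4,000, which is exactly backwards if the question is access.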

Overall, I'm sure access is an issue for some people; I just balk when people who haven't lived in the middle of a city on a limited budget like I did try to tell me what it's like.  I DO think that before we flip out about an issue, doing research on how much access really affects obesity is key.  The number of regulations and reforms that get pushed without any data proving their relevance staggers me, and I'm glad to see someone questioning the wisdom in this case.

Fun Quotes for Friday

Intuition becomes increasingly valuable in the new information society precisely because there is so much data.
John Naisbitt

It is a capital mistake to theorize before one has data.
Arthur Conan Doyle

I do not think it means what you think it means….

Oh teamwork.

I sat in a fascinating talk yesterday about some pretty interesting team failures.  One in particular stuck out to me: two teams, working on the east and west coasts, funded by a huge grant from the NSF.  One team was tasked with building a database, the other was going to populate it with all of the data.  A year's worth of work later, it was discovered that the two teams had never clarified what they meant by several words (including the word "data"), and that the whole thing was completely useless.
Oops.  
Now, there are several lessons in that story, but one of them is the importance of knowing what certain words mean to the people who are saying them.  This can be a big issue in reading research and interpreting data, especially around popular public health type issues.  There are many terms…"rape," "excessive drinking," "binge eating" and "substance abuse," to name a few…that people tend to believe have one hard and fast definition.  When reading studies on these things, always verify that the author's definition matches your own.  In looking for good examples of this, I found this report on some drinking statistics that were being floated around a few years ago.

A new study from Columbia University’s National Center on Addiction and Substance Abuse (CASA) claims that adults who drink excessively and youths who drink illegally account for over half of the alcohol consumed in the United States, and that the alcoholic beverage industry makes too much money from these groups to ever voluntarily address the problem.

The article goes on to point out that, if you look at the data, "excessive drinking" was defined as more than two servings of alcohol in one day, with no regard for height, weight, or frequency.  I somehow doubt this is the picture most people got when they read "adults who drink excessively".

This comes up a lot in studies that have psychiatric diagnoses attached as well.  I have a friend who works with eating disorders who gets annoyed to no end that you can't technically call someone anorexic until they're 15% under a healthy body weight or their period has stopped, even if they stop eating for weeks.  Not many people know that up until this year, the FBI defined rape as something that could only happen to women.

Things to watch out for.

Boston vs Chicago

This week, Bad Data Bad is coming to you live from downtown Chicago, just a few feet away from the Magnificent Mile!

I'm at the Science of Team Science conference, and so far it's going pretty well.  I got a chance to present and discuss some of my research with people last night, and it's fun having people recognize more of the psych aspect of what I've been doing.  Your normal bone marrow transplant crowd really doesn't care about that part of anything, so it was nice to have people appreciate the theories behind what I was doing.  They're posting the abstracts online at some point; I'll link to them when I figure that out.

Anyway, on my flight out here the data geek in me realized that a Boston/Chicago comparison would be a great input for the Google Ngram Viewer.  If you haven't played with this yet, it's fun.  Basically, it tracks how many times the words you put in were mentioned in books over the last 200 years.  They uploaded a massive number of books to get the data, so the results are fascinating.  Here's Boston vs. Chicago:

For reference, Chicago wasn't founded until 1837.  I tried running it starting at Boston's founding in 1630, but that made a weird spike that made the rest of the graph look silly.  My guess is that's a function of fewer books from that era loaded into the database, since the y-axis is a percentage.
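The small-corpus spike is easy to sketch with arithmetic.  All the counts below are invented, just to show the denominator effect:

```python
# A toy sketch (invented counts) of why a tiny early corpus makes a
# percentage y-axis spiky: the same handful of extra mentions moves
# the percentage far more when the yearly corpus is small.
def ngram_pct(mentions, total_words):
    return mentions / total_words * 100

small_year = ngram_pct(5, 50_000)             # a 1630s-sized year
small_year_bump = ngram_pct(25, 52_000)       # one extra Boston-heavy book
big_year = ngram_pct(5_000, 500_000_000)      # a modern-sized year
big_year_bump = ngram_pct(5_020, 500_002_000) # same-sized extra book
```

One chatty book in a sparse year multiplies the percentage several times over; the same book in a modern year barely registers.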

For more about the project behind Google Ngrams, there's a good TED talk explaining it.

Why career advice on the Internet can be total crap

I like nurses, though I’ve never wanted to be one.  My mother’s a nurse, my sister will be in a year or so.  Most of my best projects have been done in conjunction with nursing departments.  Due to my proximity to lots and lots of nurses, I tend to hear a lot about the ups and downs of the profession.

Given that, this article annoyed the heck out of me.

The headline reads "How To Land A $60K Health Care Job With A Two-Year Degree", and being curious about the salaries of those around me, I took a peek.  I was stunned to see that the supposed "$60K job with 2 years of education" was nursing.  As proof, they offered the average annual salary for RNs as $67,000 (backed up by the BLS here; the BLS actually used the median, which is slightly lower at $64,000).  They went on to mention that nurses in Massachusetts make an average of $84,000 a year.
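That mean-above-median gap is itself a tell: a few high salaries pull the average up while the median stays at the middle worker.  A quick illustration with invented salaries (not BLS data):

```python
# A made-up illustration of why a reported mean ($67K) can sit above
# the median ($64K): a handful of high earners drags the mean upward
# while the median stays at the middle of the distribution.
# Salaries below are invented, in thousands of dollars.
import statistics

salaries = [52, 55, 58, 60, 64, 66, 70, 75, 95, 120]

mean_salary = statistics.mean(salaries)      # pulled up by the top earners
median_salary = statistics.median(salaries)  # the "typical" worker
```

Whenever the mean beats the median like this, the typical worker earns less than the headline number.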

Now that all sounds awesome, but here's what's deceptive: RN is not a degree.  RN is a license.  Neither the Bureau of Labor Statistics nor this article differentiates between the salaries of those who get an RN after getting an associate's degree, and those who get it after getting a bachelor's degree.  It turns out there's a lot of debate over how much of a difference this makes, but I can definitely speak to that Massachusetts salary number.  I work for one of the institutions that's notorious for paying nurses extremely well.  They do not hire nurses who don't have a BSN.  For most of the major Boston teaching hospitals, this is an increasing trend.  The Institute of Medicine is calling for 80% of nurses to be BSN educated by 2020, and many hospitals are responding accordingly.  Most management jobs are off limits to associate's-level nurses.

I'll leave it to the nursing associations to debate whether all this is necessary, but I will point out that averaging the salaries of two different degrees with two different sets of job prospects, without mentioning the distinction, is comparing apples and oranges.  Additionally, even when nurses and nurse managers make the same amount, it's often because one is overtime eligible (and works nights and evenings) and the other isn't.  So overall: a deceptive headline, designed to make people click on it.

Of course since I did click on it, I guess that worked.

Friday links for fun – 4.13.12

This will be completely lost on you if you’re not a Hunger Games fan, but the stats work/extrapolation is pretty damn impressive.

Professionally, I found this interesting….I can only get you the numbers, ma’am, I can’t make you use them wisely.

I haven’t talked much about small sample sizes, but this blog does.

These guys are my new heroes.  They noticed a statistical error that kept popping up in neuro research, and then went back and figured out how often people were getting it wrong…half of the studies that could have made the error did.  It's a stat-geeky read, but here's the story.