Never trust an infographic over 30

I’ve been tinkering with improving my data visualization skills recently, as I’m sick of using nothing but Excel (although if you want to continue using Excel for everything, this is a pretty useful website).

As anyone who takes a look around the interweb can tell you though, there is a pretty insidious type of data visualization that’s been flooding our society.

Oh yes, I’m talking about the infographic.

While sometimes these are endearing and amusing, they are often terrible, misleading and ridiculous.  I was going to formulate some thoughts on why they were terrible, and then I found out that Megan McArdle already had, in a column for the Atlantic.  It’s a pretty good read with lots of pictures.  Her summation at the end pretty much says it all:

If you look at these lovely, lying infographics, you will notice that they tend to have a few things in common:
  1. They are made by random sites without particularly obvious connection to the subject matter. Why is Creditloan.com making an infographic about the hourly workweek?
  2. Those sites, when examined, either have virtually no content at all, or are for things like debt consolidation–industries with low reputation where brand recognition, if it exists at all, is probably mostly negative.
  3. The sources for the data, if they are provided at all, tend to be in very small type at the bottom of the graphic, and instead of easy-to-type names of reports, they provide hard-to-type URLs which basically defeat all but the most determined checkers.
  4. The infographics tend to suggest that SOMETHING TERRIBLE IS HAPPENING IN THE US RIGHT NOW!!! the better to trigger your panic button and get you to spread the bad news BEFORE IT’S TOO LATE!
If that’s too many words for you though, she also includes this graphic:

So while the infographic can be quite useful when tamed and sedated, if you meet one in the wild, be very very careful.  Do not approach directly, do not look it in the eye.


Friends don’t let friends use lousy infographics (I’m looking at you, Facebook).

Weekend Moment of Zen 4-29-12

Since my mother still doesn’t agree with any of my food desert postings, I thought of this comic.

Mom, I think we should consider that maybe obesity causes food deserts.  Think about it: I’m pretty sure I heard about obesity before I heard the phrase food desert.  I’m pretty sure that proves something.

Circumventing the Middle Man

Well, my post on justifiable skepticism (Paranoia is just good sense if people actually are out to get you) certainly was the big winner for traffic/comments this week.  I was happy to see that…I had a lot of fun putting that graph together and thought the outcomes were pretty striking.  Thanks to Maggie’s Farm for linking to it.

It was my post on food deserts however, that got me the most IRL comments.  Both my mother and my brother commented on it, and not terrifically positively.  In retrospect, I wasn’t very clear about the points I was trying to make, though to be fair I had spent a lot of the day on an airplane.

My issue with food desert research, or any similar research, is that what we’re really talking about is a proposed proximate cause of a larger issue: obesity.  In my experience, just having people tell you why they think something’s happening isn’t good enough to prove that’s the actual reason.  Thus my quibble with much of the theorizing about obesity problems….you have to make sure that what you’re theorizing is the cause is actually the cause (or one of the causes) before you start dumping money into it.  You cannot make the middle man the holy grail if you haven’t established that it’s really a cause.

Unfortunately, people love to jump on good ideas before truly establishing this link.

Example:  A few years ago, it was discovered that 22% of school children were eating vending machine food.  Schools had an obesity problem, the food in the vending machines was unhealthy, so a push began to remove vending machines from schools.  Schools balked, as they make money from vending machines, but the well-being of children came first…..until of course this study came out proving that reducing access to vending machines didn’t actually affect obesity rates.   Oops.

It’s really a simple logic exercise…proving that kids are (a) obese and (b) eating from vending machines does not actually prove that getting rid of (b) will reduce (a).
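If it helps to see that logic play out, here is a toy simulation in Python.  Everything in it is made up for illustration: the hidden factor, the probabilities and the function names are my assumptions, not numbers from the vending machine study.  In this pretend world a hidden factor (say, eating habits at home) drives both vending machine use and obesity, and the vending machines themselves do nothing.

```python
import random


def simulate(n=10_000, seed=0):
    """Build a toy population of (uses_vending, is_obese) pairs.

    A hidden factor raises the chance of BOTH behaviours; vending machine
    use has no direct effect on obesity in this made-up world.
    """
    rng = random.Random(seed)
    kids = []
    for _ in range(n):
        hidden = rng.random() < 0.5                       # hypothetical confounder
        vending = rng.random() < (0.4 if hidden else 0.1)  # made-up probabilities
        obese = rng.random() < (0.3 if hidden else 0.1)    # depends on hidden only
        kids.append((vending, obese))
    return kids


def obesity_rate(kids):
    """Fraction of the group that is obese."""
    return sum(obese for _, obese in kids) / len(kids)


kids = simulate()
users = [k for k in kids if k[0]]
non_users = [k for k in kids if not k[0]]

# "Remove the vending machines": nobody uses them any more, but obesity
# stays exactly where it was because it never depended on them.
after_removal = [(False, obese) for _, obese in kids]

print(f"obesity among vending users:     {obesity_rate(users):.1%}")
print(f"obesity among non-users:         {obesity_rate(non_users):.1%}")
print(f"overall obesity before removal:  {obesity_rate(kids):.1%}")
print(f"overall obesity after removal:   {obesity_rate(after_removal):.1%}")
```

Vending machine users come out looking noticeably more obese than non-users, yet taking the machines away changes nothing.  Real life is obviously messier than this sketch, but it shows why (a) and (b) showing up together isn't enough.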

That’s why I liked the research into the difference food deserts make in obesity.  It’s a question that needs to be asked more often when trying to address a large issue:  are we sure that the issue we’re trying to address will actually help the issue we were concerned about in the first place???


If you haven’t established that it will, then be careful with how you proceed.  Addressing food deserts (or vending machines or whatever) is a means to an end, and you shouldn’t confuse it with the end itself…unless you have really good data backing you up.

Trillion Dollar Debt Day

Bias alert:  I graduated college with a LOT of debt.  It was nearly ten years ago, but I was still far above the current average widely reported in the media.  In 3 years, I had paid off all but one loan that was locked at 2.3% interest.  I paid that off two years later due to the fact that Sallie Mae is an absurdly evil company and I was sick of dealing with them.  All in all, I was debt free 20 years earlier than projected and today have zero debt from either my bachelor’s or master’s degree.

Now, all that being said, I guess I can’t feel too left out that I didn’t get invited to the student protest that was Trillion Dollar Debt Day.  Apparently yesterday was the day that total student loan debt in this country hit $1,000,000,000,000.  Want to see it in real time?  Here you go: 

http://www.finaid.org/loans/studentloandebtclock.html

Anyway, student debt is a complicated issue with lots of statistics ripe for dissection.  Actually, the debt really isn’t that complicated….it’s there because college costs have gone up far more than average household income has, and more people are going for both grad and undergrad degrees.  What’s complicated is how people interpret what to do with these statistics.  For example (from the clock website above):  “Student loan debt, on the other hand, as been growing steadily because need-based grants have not been keeping pace with increases in college costs.” Not hard to see what that website’s solution would be to this issue.

The 1 trillion number is impressive, but it is not often mentioned how heavily the increase in debt level correlates with how sharply the number of students has gone up.  According to the National Center for Education Statistics “enrollment in degree-granting postsecondary institutions increased by 9 percent between 1989 and 1999. Between 1999 and 2009, enrollment increased 38 percent, from 14.8 million to 20.4 million.”  Nearly 6 million extra people in 10 years, combined with rising costs and a recession…that will make that number shoot up in a hurry.

In the past 5 years, the average debt per graduating college student (bachelor’s level) has only gone up by about $4000, unadjusted, or $2500 in adjusted dollars.

Year   Average Debt   Average Debt (2010 $)   Median Earnings   Median Earnings (2010 $)   Debt:Earnings (inflation-adjusted)
2006   $21,100        $22,822                 $45,221           $48,912                    0.47
2007   $21,900        $23,032                 $46,805           $49,224                    0.47
2008   $23,200        $23,497                 $47,094           $47,696                    0.49
2009   $24,000        $24,394                 $47,510           $48,289                    0.51
2010   $25,250        $25,250                 $47,422           $47,422                    0.53

Sources: Project on Student Debt, U.S. Census American Community Surveys (1-year estimates, 2006-2010), Bureau of Labor Statistics CPI Inflation Calculator.
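For anyone who wants to check the arithmetic, here is a short Python sketch that recomputes the last column and the five-year changes from the figures in the table.  The variable names are mine; the numbers are copied straight from the table above.

```python
# (year, average debt, average debt in 2010 $, median earnings in 2010 $)
rows = [
    (2006, 21_100, 22_822, 48_912),
    (2007, 21_900, 23_032, 49_224),
    (2008, 23_200, 23_497, 47_696),
    (2009, 24_000, 24_394, 48_289),
    (2010, 25_250, 25_250, 47_422),
]

# Debt-to-earnings ratio using the inflation-adjusted columns.
ratios = {year: round(real_debt / real_earnings, 2)
          for year, _, real_debt, real_earnings in rows}

# Change in average debt from 2006 to 2010, unadjusted and adjusted.
nominal_change = rows[-1][1] - rows[0][1]
real_change = rows[-1][2] - rows[0][2]

for year, ratio in ratios.items():
    print(year, ratio)
print(f"unadjusted change: ${nominal_change:,}")
print(f"adjusted change:   ${real_change:,}")
```

The ratios match the table, and the changes work out to $4,150 unadjusted and $2,428 adjusted, which is where the "about $4000" and "$2500" figures come from.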

You multiply even that amount over 20.4 million however, and the levels start reaching crisis proportion.  Additionally, these “average” numbers, while reported very exactly, are all self-reported by the schools.  Also, out of the 2,300 schools they asked, 500 were tossed for identification reasons, and about 300 just didn’t report anything.  This makes these numbers highly suspect.
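To put a number on how much of the survey actually made it into those averages, here is the back-of-envelope version in Python.  It treats the "about 300" non-reporting schools as exactly 300, which is my simplification.

```python
schools_asked = 2_300
tossed_for_identification = 500
did_not_report = 300  # the post says "about 300"; treated as exact here

usable = schools_asked - tossed_for_identification - did_not_report
usable_share = usable / schools_asked

print(f"usable schools: {usable} of {schools_asked} ({usable_share:.0%})")
```

That leaves roughly 1,500 schools, or about 65% of those asked, and those that did answer were reporting on themselves.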

Overall, I’m not saying there’s not a crisis.  I work in health care, and it’s totally ludicrous to me that while we’re all scrambling to cut costs as fast as we can, higher education is not doing the same. I’ve also had a mortgage for nearly as long as I had my student loans, and I can tell you that my mortgage company has not once pulled any of the disgusting shenanigans that Sallie Mae pulled with my student loans.  I used to have to save my receipts because they, I kid you not, used to ADD small amounts of money to my balance at random.  I would then have to spend 45 minutes on the phone with them proving that this had happened.  I was always right, but they would merely “apologize for the misunderstanding”.

However, with this issue, as with so many others, watch the numbers when emotions run high.  People love to throw data at others in these moments, knowing it won’t be questioned.  Business Insider, for example, claims that “For many of you, your degrees won’t matter. One-third of you will land full-time jobs that don’t require them.”  They don’t mention that’s 33% of 500 people who just graduated.  Check back in 5 years, BI, then show me the numbers.

Begin with the end in mind

Most of what I do all day is in the loose category known as operations research.  This is an interesting sort of research that typically starts with a question, and then involves gathering qualitative and quantitative data until you get a hypothesis.  Adjustments are made until you get going in the right direction, which is normally related to either getting more of a good thing or less of a bad thing…or often both.

This is my favorite type of research for any field for a variety of reasons: it’s practical, it helps people, it tends to cut through feelings and deals with facts, and it leaves room for people to be surprised.

The downside is that the questions are often complex and the answers multi-dimensional.  That’s why good research of this kind is so darn impressive.  I read a great article today about Jacqueline Campbell and her work to reduce domestic homicide.  She started with a complex problem, and worked both forward and backwards until she came up with something that worked.  Working backwards, she went deep into the statistics to figure out which situations were the most likely to result in homicide, and then trained the front end responders how to reach out to those who were at the most risk.  While she will not claim credit, it is noted that the state where she implemented this program (Maryland) has cut its domestic homicide rate in half.

Domestic violence is an issue that can very easily get mired down in politics and emotion, so it’s interesting to note that this is one of the few programs that is getting bipartisan support.  That’s such a good outcome when somebody actually pragmatically addresses an issue rather than just catering to their own pet theories.

To note: starting research with a goal in mind is beneficial only when it’s not a guise to push an agenda.  It’s only good if you really don’t know how to get there.   I feel this is research at its best, research that actually helps a real world problem.  I have nothing against research that helps us see the world in new ways, but my practicality bias is probably why I did engineering and not theoretical physics.  It takes all types, I just wish more would focus on the “how do we get there” type questions.

The rise of the datasexual

Datasexual…apparently it’s a thing.

Sometimes I worry that’s what I might become…obsessed with my own personal data, quantifying myself until there’s nothing left that can’t be counted.  I already have an embarrassing number of spreadsheets in Google Docs dedicated to tracking all sorts of things in my life….7 I’m currently updating regularly.

Normally, my love for efficiency saves me though.  In healthcare, there’s a pretty unending stream of data, so we’ve had to learn how to sort through to what’s useful.  If we don’t know how we’ll use it immediately, or at least have a very large hunch, we don’t collect it.

If efficiency doesn’t work as a motivator, I figure that’s a sign I need to get outside.  Good thing I have a dog to remind me to do that.

In case you’re curious, on a sunny day like today, he’ll walk for an average of 24.6 minutes, with a standard deviation of 3.3, highly dependent on whether or not we see the UPS guy go by.  He HATES the UPS guy.
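For fellow spreadsheet addicts, this is roughly how that kind of summary falls out of a log of walk times in Python.  The durations below are invented for the example and are not my dog's actual data.

```python
import statistics

# Hypothetical walk durations in minutes (made-up values for illustration).
walk_minutes = [20, 26, 25, 23, 29, 24]

average = statistics.mean(walk_minutes)
# stdev() is the sample standard deviation (divides by n - 1).
spread = statistics.stdev(walk_minutes)

print(f"average walk: {average:.1f} minutes, standard deviation: {spread:.1f}")
```

I've used the sample standard deviation here since a handful of walks is only a sample of all his walks; `statistics.pstdev` would be the choice if you treated the log as the whole population.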

Paranoia is just good sense if people really are out to get you

Yesterday I posted about retractions in scientific journals, and the assertion that they are going up.  I actually woke up this morning thinking about that study, and wishing I could see more data on how it’s changed year to year (yes, I’m a complete nerd…but what do you ponder while brushing your teeth????).  Anyway, that brought to mind a post I did a few weeks ago, on how conservatives’ trust in the scientific community has gone steadily down.

It occurred to me that if you superimposed the retraction rate of various journals over the trust in the scientific community rates, it could actually be an interesting picture.   It turns out PubMed actually has a retraction rate by year available here.  For purposes of this graph I figured that would be a representative enough sample.

I couldn’t find the raw numbers for the original public trust study, so these are eyeballed from the original graph in blue, with the exact numbers from the PubMed database in green.

So it looks like a decreasing trust in the scientific community may actually be a rational thing*.

It’s entirely possible, by the way, that the increased scrutiny of the internet led to the higher retraction rate…but that would still have given people more reasons not to blindly trust.  As the title of this post suggests, skepticism isn’t crazy if you actually should be skeptical.

Speaking of trust, I obviously had to manipulate the axes a bit to get this all to fit.  Still not sure I got it quite right, but if anyone wants to check my work, the raw data for the retraction rate is here and the data for the public trust study is here.  These links are included earlier as well, just wanted to be thorough.

*Gringo requested that I run the correlation coefficients.  Conservatives r = -0.81 Liberals r = 0.52 Moderates r = 0.  I can’t stand by these numbers since my data points were all estimates based on the original chart, but they should be about correct.
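If anyone wants to run correlation coefficients on their own eyeballed numbers, here is a minimal Pearson r in plain Python.  The two series at the bottom are placeholder values I invented to show the call; they are not the retraction or trust data from the graph, and the function name is my own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations around each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one series never changes")
    return cov / math.sqrt(var_x * var_y)


# Placeholder series, invented for illustration only.
retraction_rate = [1.0, 1.5, 2.5, 4.0, 6.0]
trust_percent = [48, 46, 43, 40, 36]

r = pearson_r(retraction_rate, trust_percent)
print(f"r = {r:.2f}")
```

An r near -1 means the two lines move in opposite directions, near +1 means they move together, and near 0 means no linear relationship.  With only a handful of estimated points per group, any value should be read loosely.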

Bad data vs False Data

We here at Bad Data Bad would like to note that when we pick studies to criticize, we operate under the assumption that what the studies actually published is accurate, and that most of the mistakes are made in the interpretation or the translation of those findings into news.

This article from the New York Times last week reminds us that this may not always be a good assumption.

A few fabricated papers have managed to make news headlines over the past few years….the Korean researcher who said he’d cloned a stem cell….the UConn researcher who falsified data in a series of papers on the health benefits of red wine….and a Dutch social scientist who faked entire experiments to get his data.

This is where the scientific principle of replication is supposed to step in, and why it’s always a decent idea to withhold judgement until somebody else can find the same thing the first study did.  Without that, it’s nearly impossible to know if someone falsified their data, without people in their own lab blowing the whistle.

If you’re curious about these retractions, the Retraction Watch blog is a pretty good source for papers that get yanked.

Food Deserts and Big City Living

The Assistant Village Idiot did a good post on a new report on the prevalence of “food deserts” and if this was the crisis it’s been reported to be.

While I will point out that the study refuting the idea of food deserts uses self-reported data for height, weight and eating habits (check out my previous post on this issue), I was glad to see someone take this issue on.  Food deserts reporting has always fascinated me, mostly because I lived in the middle of the Boston area (albeit in different locations) for about 9 years.   The food desert idea always sort of baffled me, and when I took a look at the USDA’s food desert locator, I noticed that the only part of Boston proper or the close suburbs that qualifies as a food desert is…..Logan Airport.

I currently live in a suburb that is near 2 food deserts, so checking those out was interesting as well.  One is actually a small peninsula, and I happen to know you have to drive by a grocery store to get on the main route out there.  The other is next to the docks.

For cities, this data gets complicated by the fact that many very small grocers sell all sorts of produce in small spaces that wouldn’t make the list.  For rural areas, personal gardens are not counted.  I also liked that the article pointed out that some people researching this have done grocery stores/1000 people, a metric which makes cities look bleak.  That’s a classic case of needing to review why you actually want the data.  A busy grocery store is not a lack of a grocery store.  Additionally, I have never seen one of these surveys that added in farmers’ markets or grocery store delivery services.  While not always the cheapest option, delivery services allowed me (when I was a broke college student) to buy in bulk and save money other ways.  They run about $7 ($5 when I was in college), when a train ride to and from the store was $4 round trip, and a taxi would have been at least $10 (not counting ghost taxis that exist almost exclusively in front of city grocery stores and help you with your groceries for around $5).

Overall, I’m sure access is an issue for some people, I just balk when people who don’t live in the middle of cities on a limited budget like I did try to tell me what it’s like.  I DO think that before we flip out about an issue, doing research as to how much access really affects obesity is key.  The number of regulations and reforms that get pushed without any data proving their relevance staggers me, and I’m glad to see someone questioning the wisdom in this case.