Friday Fun Links 5-18-12

When someone who writes about bad science for a living calls something “The worst government statistic ever created”, you know it’s going to be good.

Okay, that report was from the UK….now do you US folks want to know what’s wrong with your state?  Massachusetts has blisters, apparently.

If there’s something wrong with this data, I don’t want to know about it.  There is no such thing as strong coffee, only weak people.

I kid, actually: the above study has all the normal problems of nutritional research.  The Time write-up did give me the quote of the week, however:

Since the study was observational only, the authors couldn’t conclude that coffee drinking actually reduces death risk.

Gee, with a headline like “Coffee: Drink More, Live Longer?” I can’t see why anyone would jump to that conclusion.  Also, I kinda hate the phrase “death risk”.  Unless we’re about to get into an eschatology debate, I’m pretty sure my death risk is 100%, no matter how much coffee I drink.

Moving on, the Pew Research Center started meta-analyzing their own analysis…with sad results.
On a perkier note, if you want to win your weekend geek-off, here’s a (NSFW…sorta) guide to why Tesla > Edison…even with that whole pigeon thing.

…and now for something completely different.

I don’t normally get that involved with sports statistics, if only because it’s the one place in the stats world where you could study them for an hour every day and still be barely a rookie.  However, something awfully strange is happening in my house recently, and I feel it’s worth mentioning:  the Orioles are leading the AL East (in fact the whole American League), and the Red Sox are last.

Now, this is particularly interesting to my household, as my husband happens to be a lifelong Orioles fan.  I, on the other hand, have always been a Red Sox fan.  Since we met almost 6 years ago, this has pretty much meant that I have had exclusive bragging rights when it came to baseball.  I know it’s not even a quarter of the way into the season, but this is the longest we’ve gone with the Orioles on top, and it’s surreal.

Yesterday, Grantland put up an article on the Orioles’ under-.500 curse.  Apparently they have not finished over .500 since 1997…more than enough seasons for the baseball stats guys to go nuts with.  I was curious exactly how bad it was, so I looked around until I found this graph generator*.

For those of you who don’t know much about the Orioles, here’s what they’ve looked like since 1998:

Yowza.  Even if they don’t hang in there this season, it’s still the most encouraging thing to happen in 7 years or so.

Now, here’s the Red Sox in the same time period:

Yikes.  If they don’t pick it up soon, this will be the worst they’ve started off in 15 years.

Sweetly enough, if the Sox win tonight against the Rays, that will both increase the Orioles’ lead in the AL East and look good for the Red Sox.

Honey, the data proves it: tonight we’re both Red Sox fans.

*If it shows you how crazy sports stats people are, I found that graph generator in exactly one try on Google.  Conversely, when I tried to find historic gas prices for this post, I searched for almost half an hour trying to find an official source for anything pre-1978.  Didn’t happen.

Compensation Data for Mother’s Day

This year for Mother’s Day, use data to figure out how much you owe your mother for her pregnancy and labor.

It turns out I owe mine $99.28*.  I got some good discounts for my low birth weight and my early arrival.  I also got a decent “good offspring” discount for calling her this morning to wish her a happy Mother’s Day, so that was positive.  
Of course, one could quibble that perhaps a mother should not be charging her child for a pregnancy that the child did not have a say in…though the idea of issuing a bill to my own child in 12 weeks or so when he shows up is tempting.  For now though, I think I’ll pass the bill off to my dad and see if he’d like to chip in.  I’m pretty sure the Edible Arrangement I sent her should cover my half.
Good luck with the rest, Dad.

Love you, Mom!

*I am not even going to try to criticize this number.  There is absolutely no explanation for any of the numbers or why they vary the way they do.  This is actually somewhat refreshing to me.  Normally you have overly precise numbers being justified by vague guesses.  Here they don’t even pretend to have reasons.  I like the tacit admission of complete BS.  

Historical accuracy, ngram style

I’ve used Google Ngrams a few times on this blog already, mostly for silly things, but this website has the best use of them I’ve seen so far.

He takes the scripts of Downton Abbey (WWI) and Mad Men (1960s) and feeds them through the Ngram Viewer to find out which phrases are the most anachronistic.

I find the whole project pretty cool, because apparently he took it on as a response to a few magazine articles about phrases that wouldn’t have been said at the time.  It struck him that those phrases were just the ones that people could hear and think “hey, that sounds modern!”, but no one was thinking through which phrases we’ve gotten so used to that we don’t even recognize them as out of place.
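For the curious, here is a rough sketch in Python of how a check like that might work.  This is my own guess at the general idea, not his actual method: compare how common a phrase is in print around the year the show is set with how common it is in a modern year, and rank phrases by the ratio.  The function names, the `floor` parameter and the frequency table are all made up for illustration; the numbers are placeholders, not real Ngram Viewer output.

```python
from typing import Dict, List, Tuple

# Hypothetical relative frequencies by year.
# Placeholder numbers only, NOT real Ngram Viewer data.
TOY_FREQS: Dict[str, Dict[int, float]] = {
    "phrase A": {1965: 2e-9, 2000: 4e-8},
    "phrase B": {1965: 5e-8, 2000: 5e-8},
    "phrase C": {1965: 0.0, 2000: 1e-8},
}


def anachronism_score(freqs: Dict[int, float], period_year: int,
                      modern_year: int, floor: float = 1e-10) -> float:
    """Ratio of modern frequency to period frequency.

    A high score means the phrase is much more common now than in the
    year the show is set. `floor` (an assumed smoothing value) avoids
    dividing by zero when a phrase never appears in the period year.
    """
    period = max(freqs.get(period_year, 0.0), floor)
    modern = max(freqs.get(modern_year, 0.0), floor)
    return modern / period


def rank_phrases(table: Dict[str, Dict[int, float]], period_year: int,
                 modern_year: int) -> List[Tuple[str, float]]:
    """Return (phrase, score) pairs, most anachronistic first."""
    scored = [(phrase, anachronism_score(freqs, period_year, modern_year))
              for phrase, freqs in table.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for phrase, score in rank_phrases(TOY_FREQS, 1965, 2000):
        print(f"{phrase}: {score:.1f}x more common in 2000 than in 1965")
```

The nice thing about ranking by ratio rather than by ear is that it can flag phrases nobody would think to question, which is the whole point of his project.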

I’ve never seen Downton Abbey, and have only seen an episode or two of Mad Men, but I still found it interesting what they got wrong.  The last episode of Mad Men apparently had an aspiring actress use the phrase “got a callback”, which was barely used in a theater context at the time (he cross-references the OED).  He also makes pretty charts, which I loved (this one is for Downton Abbey):

Overall, a very fun use of data.

What I missed

Apparently in my travels, I missed the series premiere of a new History Channel show: United Stats of America.

I was hoping it would be up my alley, but reading the synopsis makes me suspect it’s going to be more about reciting cool numbers than figuring out if those numbers have any accuracy.  Sigh.

Greetings from Maine

After a treacherous journey up Route 1 (over an hour to clear the city of Boston), I’m pleased to tell you that we’re coming to you tonight from Portland, Maine.

I’m running a conference tomorrow at the University of Southern Maine about bone marrow transplant patients who have to travel long distances…or, as it’s more floridly called, “Improving Patient Pathways for Complex Care Across Multiple Healthcare Systems”.  This is not my forte, and thus I have nothing long-winded tonight…but after the stress of conference planning, I’m sure I’ll have to spend several weeks with nothing but numbers and spreadsheets before I calm down.

While we wait to see where that takes me, I thought I’d continue my pattern of figuring out a good Google Ngram for the trips I take.  This time I decided to run all the New England states to see who got mentioned the most.  

I’m happy to see Massachusetts made a strong showing.  Connecticut managed to eke out a win over Maine, and it looks like Vermont, New Hampshire and Rhode Island have just been hanging out for years.

Never trust an infographic over 30

I’ve been tinkering with improving my data visualization skills recently, as I’m sick of using nothing but Excel (although if you want to continue using Excel for everything, this is a pretty useful website).

As anyone who takes a look around the interweb can tell you though, there is a pretty insidious type of data visualization that’s been flooding our society.

Oh yes, I’m talking about the infographic.

While sometimes these are endearing and amusing, they are often terrible, misleading and ridiculous.  I was going to formulate some thoughts on why they were terrible, and then I found out that Megan McArdle already had in a column for the Atlantic.  It’s a pretty good read with lots of pictures.  Her summation at the end pretty much says it all:

If you look at these lovely, lying infographics, you will notice that they tend to have a few things in common:
  1. They are made by random sites without particularly obvious connection to the subject matter. Why is Creditloan.com making an infographic about the hourly workweek?
  2. Those sites, when examined, either have virtually no content at all, or are for things like debt consolidation–industries with low reputation where brand recognition, if it exists at all, is probably mostly negative.
  3. The sources for the data, if they are provided at all, tend to be in very small type at the bottom of the graphic, and instead of easy-to-type names of reports, they provide hard-to-type URLs which basically defeat all but the most determined checkers.
  4. The infographics tend to suggest that SOMETHING TERRIBLE IS HAPPENING IN THE US RIGHT NOW!!! the better to trigger your panic button and get you to spread the bad news BEFORE IT’S TOO LATE!
If that’s too many words for you though, she also includes this graphic:

So while the infographic can be quite useful when tamed and sedated, if you meet one in the wild, be very, very careful.  Do not approach directly, do not look it in the eye.


Friends don’t let friends use lousy infographics (I’m looking at you, Facebook).