A Pie Chart Smorgasbord

This past week I was complaining about pie charts to a friend of mine, and I was trying to locate this image to show what I was complaining about:

Source.

I knew I had come across this on Twitter, and in finding the original thread, I ALSO discovered all sorts of people defending/belittling the lowly pie chart. Now I generally fall in the anti-pie chart camp, but these made me happy. I credited what I could find a source for, but will update if anyone knows who else I should credit.

First, we have the first and best use for a pie chart:

No other chart represents that data set quite as well.

Sometimes though, you feel like people are just using them to mess with you:

Source.

Sometimes the information they convey can be surprising:

But sometimes the conclusions are just kind of obvious:

And you have to know how to use them correctly:

They’re not all useless; there are some dramatic exceptions:

If you want more on pie charts, try these 16, uh, creative combinations, or read why they’re just the worst here.

Sharing Your Feelings

Yesterday morning during some random Twitter scrolling, I saw two interesting tweets in my feed that seemed a bit related. The first complained about a phenomenon that has been irritating the heck out of me recently:


If the embed doesn’t work, here’s the link. The first shot is some text from a Pacific Standard article about Lisa Durden’s firing. In it, the author claims that “In contrast to other free speech-related controversies on college campuses, there has been almost no media coverage of Durden’s ouster.” The Google News search, however, shows a different story….in fact, many media outlets have covered it.

Now this type of assertion always seems a little surprising to me for two reasons:

  1. We have absolutely unprecedented access to what people and news outlets are/are not reporting on, and any claim like this should be easy to verify.
  2. It’s an easy claim to modify in a way that makes it a statement of opinion, not fact. “There has been far less media outrage” would seem to preserve the sentiment without being a statement of fact.

Once I started thinking about it, I felt like I heard this type of assertion made quite frequently. Which of course got me wondering if that sort of hyper-attention was part of the phenomenon. I think everyone knows the feeling of “I heard one reference to this issue/unusual word/obscure author and now I have seen it 5 places in two days”. I got to wondering….could a related (but opposite) phenomenon happen when it came to people you disagreed with saying things? Were people purposefully ignoring or discounting reporting from outlets that didn’t fit their narrative, or were they actually not hearing/registering things that were getting said?

I started wondering further when, in one recent case, a writer for the Federalist actually tweeted out the links to her search results that “proved” the New York Times wasn’t covering a story about NSA abuses under Obama. However, the NYT had actually covered the story (they broke it, in fact), and clicking on her links shows that their story was among the results she had been scanning over. She issued a correction tweet a few hours later when someone pointed that out, which makes me doubt she was really trying to deceive anyone. So what made her look at the story and not see it?

Well, this brings me to the second Tweet I saw, which was about a new study about the emotional drivers of political sharing across social networks. I don’t have access to the full text of the paper, but two interesting findings are making headlines:

  1. For the issues studied (gun control, same-sex marriage, climate change), including moral-emotional language in your headline increased sharing by 20%
  2. This sharing increase occurred almost exclusively in your in-group. Liberals and conservatives weren’t sharing each other’s stories.

I’m speculating wildly here, but I wonder if this difference in the way we share stories contributes to perceptions that the other side is “not talking” about something. When something outrages my liberal (or conservative) friends, the exact same article will show up in my news feed 10 times. When the opposing party comments on it or covers it, they almost never share the exact same story; they comment on and share different ones. They only comment on the same story when they oppose the coverage.

For example, in the NSA case above, the story that got Mollie Hemingway looking at search results was titled “Obama intel agency secretly conducted illegal searches on Americans for years.” The ones she missed in the NYT results were “N.S.A. Halts Collection of Americans’ Emails About Foreign Targets” and “How Trump’s N.S.A. Came to End a Disputed Type of Surveillance”. Looking at those 3 headlines, it’s easy to see how you could miss that they were all talking about the same thing. At the same time, if you’re going to claim that a story isn’t being reported, you need to double check that it’s not just your feelings on the story that aren’t being mirrored.

And lest I be a hypocrite here, I should talk about the time I committed this error myself by failing to update my information. Back in February I claimed that TED didn’t update their webpage to reflect the controversy with Amy Cuddy’s research. I was right the first time I claimed it and wrong the second time. I could have sworn I rechecked it, but I either didn’t recheck when I thought I did, or I simply didn’t see the correction that got added. Was it because I was looking for a more dramatic correction, bold letters or some other sort of red flag? Yeah, I’d say that was part of it. TED does not appear nearly as concerned about the controversy as I am, but that doesn’t mean they failed to talk about it.

I need a name for this one I think.

Statisticians and Gerrymandering

Okay, I just said I was blogging less, but this story was too interesting to pass without comment. A few days ago it was announced that the Supreme Court had agreed to hear a case about gerrymandering, or the practice of redrawing voting district lines to influence the outcome of elections. This was a big deal because the court had previously only heard these cases when the lines had something to do with race, and had declined to weigh in on redraws that were based on politics. The case they agreed to hear was from Wisconsin, where a lower court found that a 2011 redistricting plan was so partisan that it potentially violated the rights of all minority-party voters in the affected districts.

Now obviously I’ll leave it to better minds to comment on the legal issues here, but I found this article on how statisticians are getting involved in the debate quite fascinating. Both parties want the district lines to favor their own candidates, so it can be hard to cut through the noise and figure out what a “fair” plan would actually look like. Historically, this came down to two parties bickering over street maps, but now, with more data available, there’s actually a chance that both the existence and the extent of gerrymandering can be measured.

One way of doing this is called the “efficiency gap,” and it is the work of Eric McGhee and Nicholas Stephanopoulos, who explain it here. Basically, this measures “wasted” votes, which they explain like this:

Suppose, for example, that a state has five districts with 100 voters each, and two parties, Party A and Party B. Suppose also that Party A wins four of the seats 53 to 47, and Party B wins one of them 85 to 15. Then in each of the four seats that Party A wins, it has 2 surplus votes (53 minus the 51 needed to win), and Party B has 47 lost votes. And in the lone district that Party A loses, it has 15 lost votes, and Party B has 34 surplus votes (85 minus the 51 needed to win). In sum, Party A wastes 23 votes and Party B wastes 222 votes. Subtracting one figure from the other and dividing by the 500 votes cast produces an efficiency gap of 40 percent in Party A’s favor.

Basically, this metric highlights unevenness across the state. If one party is winning dramatically in one district yet losing in all the others, you have some evidence that those lines may not be fair. If this is only happening to one party and never to the other, your evidence grows. Now there are obvious responses to this….maybe some party members really are clustering together in certain locations….but it does provide a useful baseline measure. If your current plan increases this gap in favor of the party in power, then that party should have to offer some explanation. The authors’ proposal is that if the other party could show a redistricting plan that had a smaller gap, the initial plan would be considered unconstitutional.
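For the arithmetically inclined, here’s a minimal sketch of that calculation in Python (my own illustration of the metric, not the authors’ code), using the district tallies from the quoted example:

```python
# Minimal sketch of the efficiency gap calculation, reproducing the
# quoted five-district example. Each district is (votes_A, votes_B).
districts = [(53, 47), (53, 47), (53, 47), (53, 47), (15, 85)]

def efficiency_gap(districts):
    wasted_a = wasted_b = total_votes = 0
    for votes_a, votes_b in districts:
        total = votes_a + votes_b
        threshold = total // 2 + 1           # votes needed to win (51 of 100)
        total_votes += total
        if votes_a > votes_b:
            wasted_a += votes_a - threshold  # winner's surplus votes
            wasted_b += votes_b              # all of the loser's votes are wasted
        else:
            wasted_b += votes_b - threshold
            wasted_a += votes_a
    # Positive gap means the plan favors Party A (Party B wastes more votes).
    return (wasted_b - wasted_a) / total_votes

print(efficiency_gap(districts))  # 0.398, i.e. roughly 40% in Party A's favor
```

Computing the gap for a challenger’s alternative plan would be the same call with different district tallies, which is what makes the proposed test mechanical.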

To help with that last part, two mathematicians have created a computer algorithm that draws districts according to state laws but irrespective of voting histories. They then compare these hypothetical districts’ “average” results to the proposed maps to see how far off the new plans are. In other words, they basically create a distribution of expected results, then see where the current proposals fall. To give context: of the 24,000 maps they drew for North Carolina, all were less gerrymandered than the one the legislature came up with. When a group of retired judges tried to draw new districts for North Carolina, their map was less gerrymandered than 75% of the computer models.
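The comparison against the computer-drawn maps then boils down to asking where the enacted plan falls in that distribution. A toy sketch of the idea (the simulated numbers below are random placeholders, not real map data):

```python
import random

# Toy sketch: place an enacted plan within an ensemble of computer-drawn maps.
# The simulated efficiency gaps here are random placeholders; in the real
# analysis each value would come from an algorithmically drawn district map.
random.seed(0)
simulated_gaps = [random.gauss(0.0, 0.03) for _ in range(24000)]
enacted_gap = 0.13  # hypothetical value for the legislature's plan

share_less_extreme = sum(
    abs(g) < abs(enacted_gap) for g in simulated_gaps
) / len(simulated_gaps)
print(f"{share_less_extreme:.1%} of simulated maps are less gerrymandered than the enacted plan")
```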

It’s interesting to note that some of the most gerrymandered states by this metric are actually not the ones being challenged. Here are all the states with more than 8 districts and how they fared in 2012. The ones in red are the ones facing a court challenge. The range is based on plausible vote swings:

Now again, none of these methods may be perfect, but they do start to point the way toward less biased ways of drawing districts and neutral tests for accusations of bias. The authors note that the courts currently employ a simple mathematical test to evaluate whether districts have equal populations: they must be within +/- 10% of each other. It will be interesting to see if any of these newer tests are considered straightforward enough to serve as a legal standard. Stay tuned!

What I’m Reading: June 2017

Happy Father’s Day folks! As summer approaches I’m probably going to be blogging just once a week for a bit as I fill my time with writing papers for my practicum/vacation/time on the beach. Hopefully some of those distractions will be more frequent than others. I figured that means it’s a great time to put up some links to other stuff.

First up, after 12 weeks of doing my Calling Bullshit read-along, I got a chance to interview the good professors for this piece for Misinfocon. Check it out! Also, they got a nice write up in the New Yorker in a piece about problems with big data. I have to say, reading a New Yorker writer’s take on a topic I had just attempted to write about was definitely one of the more humbling experiences of my life. Whatever, I was an engineering major, y’all should be glad I can even string a sentence together (she said bitterly).

I don’t read Mother Jones often, but I’ve seen some great stuff from them lately calling their own team out on the potential misuses of science they let fly. This piece about the World Health Organization’s decision to declare RoundUp a possible carcinogen raises interesting questions about certain data that wasn’t presented to the committee making the decision. It turns out a large study suggesting RoundUp was safe was never shown to the committee, for reasons that remain a bit murky. Whether or not those reasons were valid, it’s hard to imagine that if the withheld data had been Monsanto’s and had shown a safety issue, anyone would have let that fly.

Speaking of calling out errors (and after spending some time mulling over my own) I picked up Megan McArdle’s book “The Up Side of Down: Why Failing Well is the Key to Success“. I just started it, but in the first few chapters she makes an interesting point about the value of blogging for development: unlike traditional media folks, bloggers can fail at a lower level and faster than regular journalists. By essentially working in public, they can get criticism faster and react more quickly, which over time can make their (collective) conclusions (potentially) better. This appears to be why so many traditional media scandals (she highlights the Dan Rather incident) were discovered and called out by bloggers. It’s not that the bloggers were more accurate, but that their worst ideas were called out faster and their best ones could more quickly rise to the top. Anyway, so far I’d recommend it.

This post about how the world-wide rate of population growth is slowing was interesting to me for two reasons: 1) I didn’t actually know the rate of growth had slowed that much 2) it’s a great set of charts to show the difference between growth and rate of growth and why extrapolation from visuals can sometimes be really hard.

I also learned interesting things from this Economist article about worldwide beer consumption. Apparently beer consumption is falling, and with it the overall consumption of alcohol. This seems to be driven by economic development in several key countries like China, Brazil and Russia. The theory is that when countries start to develop, people immediately start using their new-found income to buy beer. As development continues, they become more concerned about health, buy less beer, and move on to more expensive types of alcohol. I never thought about this before, but it makes sense.

On a “things I was depressed to learn” note, apparently we haven’t yet figured out the best strategy for evacuating high-rises during fires. Most fire safety efforts for high-rises are about containing and controlling the blaze, but if that fails there isn’t a great strategy for how to evacuate or even who should evacuate. You would assume everyone should just take the stairs, but the article points out that this could create a fluid mechanics problem for getting firefighters into the building. Huh.

This post on why women are underrepresented in philosophy provides a data set I thought was interesting: the percent of women expressing interest in a field as a college major during their freshman year vs the percent of women receiving a PhD in that field 10 years later, with a correlation of .95. I’d be interested to see if there are other data points that could be worked in there (like the % of women graduating with a particular undergrad degree) to see if the pattern holds, but it’s an interesting result all on its own. Note: there’s a small data error in the calculations that I pointed out in the comments, and the author acknowledged it. Running a quick adjustment, I don’t think it actually changes the correlation numbers, which is why I’m still linking. Update: the author informs me he got a better data set that fixed the error and confirmed the correlation held.

Born to Run Fact Check: USA Marathon Times

I’ve been playing/listening to a lot of Zombies, Run! lately, and for a little extra inspiration I decided to pull out my copy of “Born to Run” and reread it. Part way through the book I came across a statistic I thought was provocative enough that I decided to investigate it. In a chapter about the history of American distance running, McDougall is talking about the Greater Boston track club and says the following:

“…by the early ’80s, the Greater Boston Track club had half a dozen guys who could run a 2:12 marathon. That’s six guys, in one amateur club, in one city. Twenty years later, you couldn’t find a single 2:12 marathoner anywhere in the country.”

Now this claim seemed incredible to me. Living in Boston, I’d imagine I’m exposed to more marathon talk every year than most people, and I had never heard this. I had assumed that, like most sports, those who participated in the 70s would be getting trounced by today’s high-performance gear/nutrition/coached/sponsored athletes. Marathoning in particular seems like it would have benefited quite a bit from the entry of money into the sport, given the training time required.

So what happened?

Well, the year 2000 happened, and it got everyone nervous.

First, some background: in order to make the US Olympic marathon team, you have to do two things: 1) finish as one of the top 3 in a one-off qualifying race, and 2) be under the Olympic qualifying time. In 1984, professional marathoners were allowed to enter the Olympics. In 1988, the US started offering a cash prize for winning the Olympic trials. Here’s how the men did, starting from 1972:

I got the data from this website and the USATF. I noted a few things on the chart, but it’s worth spelling them out: the winners from 1976 and 1984 would have qualified for every team except 2008 and 2012. The 1980 winner would have qualified for every year except 2012, and that’s before you consider that the course was specifically chosen for speed after the year 2000 disaster.

So it appears to be relatively well supported that the guys who were running marathons for fun in the 70s really could keep pace with the guys today, which is pretty strange. It’s especially weird when you consider how much marathoning has taken off with the general public in that time. The best estimates I could find say that 25,000 people in the US finished a marathon in 1976, and by 2013 that number was up to about 550,000. You would think that would have swept up at least a few extra competitors, but it doesn’t look like it did. All that time and popularity, and the winning time was only 2 minutes faster for a 26-mile race.

For women it appears to be a slightly different story. Women got their start with marathoning a bit later than men, and as late as 1967 had to dodge race officials when they ran. Women’s marathoning was added to the Olympics in 1984, and here’s how the women did:

A bit more of a dropoff there.

If you’ve read Born to Run, you know that McDougall’s explanation for the failure to improve has two main threads: 1) that shoe companies potentially ruined our ability to run long distances, and 2) that running long distances well requires having some fun with it and building it on community. Both seem plausible given the data, but I wanted to compare it to a different running event to see how it stacked up. I picked the 5000m run since that’s the most commonly run race length in the US. The history of winning times is here, and the more recent times are here. It turns out the 5k hasn’t changed much either:

So that hasn’t changed much either….but there still wasn’t a year where we couldn’t field a team. Also complicating things are the different race strategies employed by 5000m runners vs marathon runners. To qualify for the 5k, you run the race twice in a matter of a few days. It is plausible that 5k runners don’t run faster than they have to in order to qualify. Marathon runners, on the other hand, may only run a few per year, especially at the Olympic level. They are more likely to go all out. Supporting this theory is how the runners do when they get to the Olympics. The last man to win a 5000m Olympic medal for the US is Paul Chelimo. He qualified with a 13:35 time, then ran a 13:03 in the Olympics for the silver medal. Ryan Hall, on the other hand (the only American ever to run a sub-2:05 marathon), set the Olympic trials record in 2008 running a 2:09 marathon. He placed 10th in the Olympics with a 2:12. Galen Rupp won the bronze in Rio in 2016 with a time 1 minute faster than his qualifying time. I doubt that’s an unusual pattern….you have far more control over your time when you’re running 3 miles than when you’re running 26.

To further parse it, I decided to pull the data from the Association of Road Racing Statisticians website and get ALL men from the US who had run a sub-2:12 marathon. Since McDougall’s original claim was that there were none to be found around the year 2000, I figured I’d see if this was true. Here’s the graph:

So he was exaggerating. There were 5.
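For anyone who wants to redo this kind of tally, the counting step is trivial once the race results are in hand. A rough sketch, assuming a hand-assembled CSV (the file name and column names are hypothetical; the actual data came from the ARRS site):

```python
import csv
from collections import Counter

# Rough sketch: count sub-2:12 US marathon performances by year.
# Assumes a hand-assembled file "us_marathon_times.csv" with hypothetical
# columns "year" and "time_seconds"; the real data came from the ARRS site.
cutoff = 2 * 3600 + 12 * 60  # 2:12:00 expressed in seconds

counts = Counter()
with open("us_marathon_times.csv", newline="") as f:
    for row in csv.DictReader(f):
        if int(row["time_seconds"]) < cutoff:
            counts[int(row["year"])] += 1

for year in sorted(counts):
    print(year, counts[year])
```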

Pedantry aside, there was a remarkable lack of good marathoners in those years, though it appears the pendulum has started to swing back. McDougall’s book came out in 2009 and was credited with a huge resurgence of interest in distance racing, so he may have partially caused that 2010-2014 spike. Regardless, it does not appear that Americans have recaptured whatever happened in the early 80s, even with the increase in nearly every resource you would think would be helpful. Interestingly enough, two of the most dominant marathoners in the post-2000 spike (Khalid Khannouchi and Meb Keflezighi) came here as immigrants, in poverty, when they were 29 and 12, respectively. Between the two of them they are responsible for almost a third of the sub-2:12 marathon times posted between 2000 and 2015. It seems resources simply don’t help marathon times that much. Genetics may play a part, but it doesn’t explain why the US had such a drop off. As McDougall puts it, “this isn’t about why other people got faster; it’s about why we got slower.”

So there may be something to McDougall’s theory, or there may be something about US running in general. It may be that money in other sports siphoned off potential runners, or that our shoes screwed us, or that camaraderie and love of the sport were more important than you’d think. Good runners may also run fewer races these days, just out of fear that they’ll get injured. I don’t really know enough about it, but the stagnation is a little striking. It does look like there was a bit of an uptick after the year 2000 disaster….I suspect seeing the lack of good marathon runners encouraged a few who might have focused on other sports to dive in.

As an interesting data point for the camaraderie/community influence point, I did discover that women can no longer set a marathon world record in a race where men also run.  From what I can tell, the governing bodies decided that being able to run with a faster field/pace yourself with men was such an advantage that it didn’t count. The difference is pretty stark (2:15 vs 2:17), so they may have a point. The year Paula Radcliffe set the 2:15 record in London, she was 16th overall and presumably had plenty of people to pace herself with. Marathoning does appear to be a sport where your competition is particularly important in driving you forward.

My one and only marathon experience biases me in this direction. In 2009 I ran the Cape Cod Marathon and finished second to last. At mile 18 or so, I had broken out in a rash from the unusually hot October sun, had burst into tears, and was ready to quit. It was at that moment that I came across another runner, also in tears due to a sore knee. We struck up a conversation and laughed/talked/yelled/cried at each other for the remaining 7 miles to the finish line. Despite my lack of bragging rights for my time, I was overjoyed to have finished, especially when I realized over 400 people (a third of entrants) had dropped out. I know for a fact I would not have made it if I hadn’t run into my new best friend at that moment of despair, and she readily admitted the same thing. McDougall makes the point that this type of companionship running is probably how our ancestors ran, though for things like food and safety as opposed to a shiny medal with the Dunkin Donuts logo. Does this sort of thing make a difference at the Olympic level? Who knows, but the data and anecdotes do suggest there’s some interesting psychological stuff going on when you get to certain distances.

Race on folks, race on.

Linguistic vs Numeric Probability

It will probably come as a surprise to absolutely no one that I grew up in the kind of household where the exact range of probabilities covered by the phrase “more likely than not” was a topic of heavy and heated debate. While the correct answer to that question is obviously 51%-60%1, I think it’s worth noting that this sort of question actually has some scholarly resources behind it.

Sherman Kent, a researcher for the CIA, decided to actually poll NATO officers to see how they interpreted different statements about probability, and came up with this:

Interesting that the term “probable” itself seems to cause the widest range of perceptions in this data set.

A user on reddit’s r/samplesize decided to run a similar poll and made a much prettier graph that looked like this:

The results are similar, though with some clearer outliers. Interestingly, they also took a look at what people thought certain “number words” meant, and got this:

This is some pretty interesting data for any of us who attempt to communicate probabilities to others. While it’s worth noting that people had to assign just one value rather than a range, I still think it gives some valuable insight into how different people perceive the same word.
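If you wanted to summarize a survey like this yourself, the basic exercise is just a median and a spread for each phrase. A small sketch with made-up responses (not the actual Kent or reddit data):

```python
from statistics import median, quantiles

# Small sketch: summarize the probability (in %) that respondents assign
# to each phrase. The responses below are invented for illustration.
responses = {
    "almost certainly": [92, 95, 97, 90, 99, 93],
    "probable":         [60, 75, 80, 55, 85, 70],
    "slight chance":    [5, 10, 15, 8, 20, 12],
}

for phrase, values in responses.items():
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the responses
    print(f"{phrase:>16}: median {median(values):.0f}%, IQR {q1:.0f}-{q3:.0f}%")
```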

I also wonder if this should be used a little more often as a management tool. Looking at the variability, especially within the NATO officers, one realizes that some management teams actually do use the word “probable” to mean different things. We’ve all had that boss who used “slight chance” to mean “well, maybe” and didn’t use “almost no chance” until they were really serious. Some of the bias around certain terms may be coming from a perfectly rational interpretation of events.

Regardless, it makes a good argument for putting the numeric estimate next to the word if you are attempting to communicate in writing, just to make sure everyone’s on the same page.

1. Come at me Dad.

Using Data to Fight Data Fraud: the Carlisle Method

I’m creating a new tag for my posts “stories to check back in on”, for those times when I want to remember to see how a sensational headline played out once the dust settled.

The particular story prompting this today is the new paper “Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals” that is getting some press under headlines like “Dozens of recent clinical trials may contain wrong or falsified data, claims study“. The paper author (John Carlisle) is one of the people who helped expose the massive fraud by Yoshitaka Fujii, an anesthesiologist who ended up having 183 papers retracted due to fabricated results.

While his previous work focused on anesthesiology studies, Carlisle decided to use similar statistical techniques on a much broader group of papers. As he explains in the paper, he started to question whether anesthesiology journals were retracting more papers because anesthesiologists were more likely to fabricate data, or because the community was simply keeping a sharper eye out for fabrications. To test this, he examined over 5,000 papers published in both specialty anesthesia journals and major medical journals like the New England Journal of Medicine and the Journal of the American Medical Association, looking for data anomalies that might point to fraud or errors.

The method Carlisle used to do this is an interesting one. Rather than look at the primary outcomes of the papers for evidence of fabrication, he looked at baseline variables, like the height and weight of the patients, in the control groups vs the intervention groups. In a properly randomized controlled trial, these should be about the same. His statistical methods are described in depth here, but in general his calculations focus on the standard deviations of both populations. The bigger the difference between the control group and the intervention group, the more likely your numbers are wrong. The math isn’t simple, but the premise is: data frauds will probably work hard to make the primary outcome look realistic, but likely won’t pay much attention to the more boring variables. Additionally, most of us reading papers barely glance at patient height and weight, particularly the standard deviations associated with them. It’s the data fraud equivalent of a kid telling their mom they cleaned their room when really they just shoved everything in the closet….you focus on where people will look first, and ignore everything else.
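To make the premise concrete, here’s a simplified stand-in for the idea (not Carlisle’s actual procedure, which his methods paper spells out): from the baseline summary statistics a trial reports, you can ask how surprising the difference between arms would be under genuine randomization, then look at how those baseline p-values behave across many trials. All numbers below are invented for illustration:

```python
from scipy.stats import ttest_ind_from_stats

# Simplified illustration of the premise, not Carlisle's actual method:
# given the baseline summary statistics a trial reports, compute how
# surprising the difference between arms is under genuine randomization.
# All numbers are invented for illustration.
result = ttest_ind_from_stats(
    mean1=172.4, std1=9.1, nobs1=150,   # control group height (cm)
    mean2=172.1, std2=8.8, nobs2=150,   # intervention group height (cm)
)
print(f"baseline p-value: {result.pvalue:.3f}")

# Across many honestly randomized trials, these baseline p-values should
# look roughly uniform; a distribution piled up at the extremes is the
# kind of anomaly this sort of screening is hunting for.
```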

This paper gets REALLY interesting because Carlisle not only opened the closets, but published the names (or rather the journal locations) of the studies he thinks are particularly suspect….about 90 in all, or 1-2% of the total. He also mentions that some authors have multiple studies that appear to have anomalous baseline data. Given the information he provided, the journals will almost certainly have to investigate, and people will likely be combing over the work of those named. If you want to see at least some of the named papers, check out this post.

Now I definitely am a fan of finding and calling out data fraud, but I do have to wonder about the broad net cast here. I have not looked up the individual studies, nor do I know enough about most of the fields to know the reputations of the authors in question, but I do wonder what the explanations for the issues with the 90 trials will be. With all the care taken by Carlisle (i.e. setting his own p-value cutoff at < .0001), it seems likely that a large number of these will be real fraud cases, and that’s great! But it seems likely at least some will have a different explanation, and I’m not sure how many will fall in each bucket. The paper itself raises these possibilities, but it will be interesting to see what proportion of the sample was innocent mistakes vs fraud.

This is an interesting data point in the whole ethics-of-calling-BS debate. While the paper is nuanced in its conclusions and raises multiple possibilities for the data issues, it’s hard to imagine the people named won’t have a very rough couple of months. This is why I want to check back in a year to see what happened. For more interesting discussion of the ethics and possible implications, see here. Some interesting points raised there include a discussion about a statute of limitations (are we going back for decades?) and how to judge trials going forward now that the method has been released to the public.

To note: Carlisle has published a sensational paper here, but he actually has a great idea about how to use this going forward. He recommends that all journals run this analysis on papers submitted or accepted for publication, so they can ask authors about discrepancies up front. This would make sure that innocent mistakes were caught before being published, and that would-be frauds would know there were extra safeguards in place. That seems like a nice balance between addressing a problem and not overreaching, and it has apparently already been implemented by the journal Anaesthesia.