Statisticians and Gerrymandering

Okay, I just said I was blogging less, but this story was too interesting to pass up without comment. A few days ago it was announced that the Supreme Court had agreed to hear a case about gerrymandering, the practice of redrawing voting district lines to influence the outcome of elections. This is a big deal because the court has previously struck down district lines only when they had something to do with race, and has never thrown out a map for being too partisan. The case comes from Wisconsin, where a lower court found that a 2011 redistricting plan was so partisan that it potentially violated the rights of all minority-party voters in the affected districts.

Now obviously I’ll leave it to better minds to comment on the legal issues here, but I found this article on how statisticians are getting involved in the debate quite fascinating. Both parties want the district lines to favor their own candidates, so it can be hard to cut through the noise and figure out what a “fair” plan would actually look like. Historically, this came down to two parties bickering over street maps, but now, with more data available, there’s a real chance that both the presence and the extent of gerrymandering can be measured.

One way of doing this is called the “efficiency gap”, the work of Eric McGhee and Nicholas Stephanopoulos, who explain it here. Basically, it measures “wasted” votes, which they explain like this:

Suppose, for example, that a state has five districts with 100 voters each, and two parties, Party A and Party B. Suppose also that Party A wins four of the seats 53 to 47, and Party B wins one of them 85 to 15. Then in each of the four seats that Party A wins, it has 2 surplus votes (53 minus the 51 needed to win), and Party B has 47 lost votes. And in the lone district that Party A loses, it has 15 lost votes, and Party B has 34 surplus votes (85 minus the 51 needed to win). In sum, Party A wastes 23 votes and Party B wastes 222 votes. Subtracting one figure from the other and dividing by the 500 votes cast produces an efficiency gap of 40 percent in Party A’s favor.
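To make the arithmetic in that example concrete, here is a minimal Python sketch of the wasted-vote calculation. It assumes exactly two parties per district and uses the example's rule that a winner needs half the votes plus one (51 of 100); the function names are my own, not anything published by McGhee and Stephanopoulos.

```python
def wasted_votes(votes_a: int, votes_b: int) -> tuple[int, int]:
    """Return (wasted_a, wasted_b) for a single two-party district.

    The winner "wastes" every vote beyond the number needed to win;
    the loser wastes all of its votes. The threshold follows the quoted
    example: half the votes plus one (51 of 100).
    """
    if votes_a == votes_b:
        raise ValueError("ties are not handled in this sketch")
    needed = (votes_a + votes_b) // 2 + 1
    if votes_a > votes_b:
        return votes_a - needed, votes_b
    return votes_a, votes_b - needed


def efficiency_gap(districts: list[tuple[int, int]]) -> float:
    """Efficiency gap as a fraction of all votes cast.

    Positive values favor Party A (Party B wasted more votes).
    """
    wasted_a = wasted_b = total = 0
    for votes_a, votes_b in districts:
        w_a, w_b = wasted_votes(votes_a, votes_b)
        wasted_a += w_a
        wasted_b += w_b
        total += votes_a + votes_b
    return (wasted_b - wasted_a) / total


# The quoted example: Party A wins four seats 53-47, Party B wins one 85-15.
example = [(53, 47)] * 4 + [(15, 85)]
print(f"{efficiency_gap(example):.1%}")  # 39.8%
```

Exactly, it comes out to 199/500, or 39.8%, which the quote rounds to 40 percent.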

Basically, this metric highlights unevenness across the state. If one party is winning dramatically in one district and yet losing in all the others, you have some evidence that those lines may not be fair. If this is only happening to one party and never to the other, your evidence grows. Now there are obvious responses to this….maybe some party members really do cluster together in certain locations….but it does provide a useful baseline measure. If your current plan increases this gap in favor of the party in power, then that party should have to offer some explanation. The authors’ proposal is that if the other party could show a redistricting plan with a smaller gap, the initial plan would be considered unconstitutional.

To help with that last part, two mathematicians have created a computer algorithm that draws districts according to state laws but irrespective of voting histories. They then compare these hypothetical districts’ “average” results to the proposed maps to see how far off the new plans are. In other words, they build a distribution of plausible results, then see where the current proposals fall within it. To give context, of the 24,000 maps they drew for North Carolina, all were less gerrymandered than the one the legislature came up with. When a group of retired judges drew their own districts for North Carolina, their map was less gerrymandered than 75% of the computer-drawn maps.
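The comparison step can be sketched in a few lines of Python. This is only the ranking step, not the mathematicians' map-drawing algorithm: it assumes you already have one score per simulated map, and it uses the absolute efficiency gap as a stand-in for "how gerrymandered" a map is, which is my assumption rather than necessarily the metric they used. The ensemble below is random placeholder data, not the North Carolina maps.

```python
import random


def share_less_gerrymandered(proposed_score: float, ensemble_scores: list[float]) -> float:
    """Fraction of simulated maps whose score is closer to zero than the proposal's.

    Scores are signed efficiency gaps; only their magnitude matters here.
    """
    target = abs(proposed_score)
    less = sum(1 for s in ensemble_scores if abs(s) < target)
    return less / len(ensemble_scores)


# Placeholder ensemble: 24,000 made-up efficiency gaps centered on zero.
rng = random.Random(2017)
ensemble = [rng.gauss(0.0, 0.05) for _ in range(24_000)]

# A hypothetical enacted map with a 25% gap sits far out in the tail,
# so nearly every simulated map is less gerrymandered than it.
print(share_less_gerrymandered(0.25, ensemble))
```

Read the other way around, the retired judges' result corresponds to this function returning roughly 0.25: only about a quarter of the simulated maps were less gerrymandered than theirs.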

It’s interesting to note that some of the most gerrymandered states by this metric are actually not the ones being challenged. Here are all the states with more than 8 districts and how they fared in 2012. The ones in red are the ones facing a court challenge. The range is based on plausible vote swings:

Now again, none of these methods is perfect, but they do start to point the way toward less biased ways of drawing districts and neutral tests for accusations of bias. The authors note that the courts already use a simple mathematical test to evaluate whether districts have equal populations: +/- 10%. It will be interesting to see if any of these tests are considered straightforward enough for a legal standard. Stay tuned!
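For comparison, the equal-population test really is simple enough to fit in a few lines. This Python sketch computes each district's percentage deviation from the ideal (total population divided by the number of districts) and checks it two ways: the +/- 10% reading used above, and the overall range between the largest and smallest districts, which is how courts often state the rule. Treat the exact legal standard as an assumption; the district populations below are made up.

```python
def deviations(populations: list[int]) -> list[float]:
    """Percentage deviation of each district from the ideal district size."""
    ideal = sum(populations) / len(populations)
    return [100 * (p - ideal) / ideal for p in populations]


def ten_percent_checks(populations: list[int], limit: float = 10.0) -> tuple[bool, bool]:
    """Return (every district within +/- limit, overall range within limit)."""
    devs = deviations(populations)
    within_plus_minus = max(abs(d) for d in devs) <= limit
    within_range = (max(devs) - min(devs)) <= limit
    return within_plus_minus, within_range


# Hypothetical four-district plan: +8% and -8% districts pass the +/- reading
# but span a 16-point range.
print(ten_percent_checks([100_000, 108_000, 92_000, 100_000]))  # (True, False)
```

The two readings can disagree, as the example shows, which is one reason even "simple" legal tests need a precise definition.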

What I’m Reading: June 2017

Happy Father’s Day folks! As summer approaches I’m probably going to be blogging just once a week for a bit as I fill my time with writing papers for my practicum/vacation/time on the beach. Hopefully some of those distractions will be more frequent than others. I figured that means it’s a great time to put up some links to other stuff.

First up, after 12 weeks of doing my Calling Bullshit read-along, I got a chance to interview the good professors for this piece for Misinfocon. Check it out! Also, they got a nice write up in the New Yorker in a piece about problems with big data. I have to say, reading a New Yorker writer’s take on a topic I had just attempted to write about was definitely one of the more humbling experiences of my life. Whatever, I was an engineering major, y’all should be glad I can even string a sentence together (she said bitterly).

I don’t read Mother Jones often, but I’ve seen some great stuff from them lately calling out their own team on potential misuses of science. This piece about the World Health Organization’s decision to declare glyphosate, the active ingredient in Roundup, a probable carcinogen raises interesting questions about data that wasn’t presented to the committee making the decision. It turns out that a large study suggesting Roundup was safe was never shown to the committee, for reasons that remain a bit murky. Whether or not those reasons are valid, it’s hard to imagine anyone letting that fly if it had been Monsanto’s data and it had shown a safety issue.

Speaking of calling out errors (and after spending some time mulling over my own), I picked up Megan McArdle’s book “The Up Side of Down: Why Failing Well Is the Key to Success”. I just started it, but in the first few chapters she makes an interesting point about the value of blogging for development: bloggers can fail on a smaller scale, and faster, than traditional journalists. By essentially working in public, they get criticism sooner and can react more quickly, which over time can make their (collective) conclusions (potentially) better. This appears to be why so many traditional media scandals (she highlights the Dan Rather incident) were discovered and called out by bloggers. It’s not that the bloggers were more accurate, but that their worst ideas were called out faster and their best ones could more quickly rise to the top. Anyway, so far I’d recommend it.

This post about how the worldwide rate of population growth is slowing was interesting to me for two reasons: 1) I didn’t actually know the rate of growth had slowed that much, and 2) it’s a great set of charts to show the difference between growth and rate of growth, and why extrapolating from visuals can sometimes be really hard.
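The growth-versus-rate-of-growth distinction is easy to see with a toy calculation. In this Python sketch (made-up numbers, not the data from the linked post), the annual growth rate falls every single year, yet the population keeps rising, because a smaller positive rate still adds people.

```python
def project(population: float, annual_rates: list[float]) -> list[float]:
    """Apply each year's growth rate in turn and record the population."""
    path = []
    for rate in annual_rates:
        population *= 1 + rate
        path.append(population)
    return path


# Hypothetical rates falling from 2.0% to 1.1% over ten years.
rates = [0.020 - 0.001 * year for year in range(10)]
path = project(100.0, rates)

for rate, pop in zip(rates, path):
    print(f"rate {rate:.1%}  population {pop:.1f}")
```

A chart of the rate column slopes down while a chart of the population column slopes up, which is exactly the kind of picture that is easy to misread at a glance.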

I also learned interesting things from this Economist article about worldwide beer consumption. Apparently beer consumption is falling, and with it the overall consumption of alcohol. This seems to be driven by economic development in several key countries like China, Brazil and Russia. The theory is that when countries start to develop, people immediately start using their newfound income to buy beer. As development continues, they become more concerned about health, buy less beer, and move on to more expensive types of alcohol. I never thought about this before, but it makes sense.

On a “things I was depressed to learn” note, apparently we haven’t yet figured out the best strategy for evacuating high-rises during fires. Most fire safety efforts for high-rises are about containing and controlling the blaze, but if that fails there’s no great strategy for how to evacuate, or even who should evacuate. You would assume everyone should just take the stairs, but they point out that this could create a fluid mechanics problem for getting firefighters into the building. Huh.

This post on why women are underrepresented in philosophy provides a data set I thought was interesting: the percent of women expressing interest in a field as a college major during their freshman year vs the percent of women receiving a PhD in that field 10 years later, with a correlation of .95. I’d be interested to see if there are other data points that could be worked in there (like % of women graduating with a particular undergrad degree) to see if the pattern holds, but it’s an interesting data point all on its own. Note: there’s a small data error in the calculations that I pointed out in the comments, and the author acknowledged it. Running a quick adjustment, I don’t think it actually changes the correlation numbers, which is why I’m still linking. Update: the author informs me he got a better data set that fixed the error and confirmed the correlation held.
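For anyone who wants to rerun a check like that one, the Pearson correlation is short enough to write out by hand. Here is a minimal Python sketch. The example inputs are arbitrary test values, not the philosophy data set. Python 3.10+ also ships `statistics.correlation`, which computes the same quantity.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Arbitrary values: y rises with x, but not perfectly.
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.775
```

To test a correction like the one described above, change the affected value and compare the two results.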

Born to Run Fact Check: USA Marathon Times

I’ve been playing/listening to a lot of Zombies, Run! lately, and for a little extra inspiration I decided to pull out my copy of “Born to Run” and reread it. Partway through the book I came across a statistic I thought was provocative enough to investigate. In a chapter about the history of American distance running, McDougall is talking about the Greater Boston Track Club and says the following:

“…by the early ’80s, the Greater Boston Track club had half a dozen guys who could run a 2:12 marathon. That’s six guys, in one amateur club, in one city. Twenty years later, you couldn’t find a single 2:12 marathoner anywhere in the country.”

Now this claim seemed incredible to me. Living in Boston, I’d imagine I’m exposed to more marathon talk every year than most people, and I had never heard this. I had assumed that, as in most sports, those who competed in the 70s would be getting trounced by today’s athletes with their high-performance gear, nutrition, coaching and sponsorships. Marathoning in particular seems like it would have benefited quite a bit from the entry of money into the sport, given the training time required.

So what happened?

Well, the year 2000 happened, and it got everyone nervous.

First, some background. In order to make the US Olympic marathon team, you have to do two things: 1) finish in the top 3 in a one-off qualifying race, and 2) be under the Olympic qualifying time. In 1984, professional marathoners were allowed to enter the Olympics. In 1988, the US started offering a cash prize for winning the Olympic trials. Here’s how the men did, starting from 1972:

I got the data from this website and the USATF. I noted a few things on the chart, but they’re worth spelling out: the winners from 1976 and 1984 would have qualified for every team except 2008 and 2012. The 1980 winner would have qualified for every year except 2012, and that’s before you consider that the course was specifically chosen for speed after the year 2000 disaster.

So it appears relatively well supported that the guys who were running marathons for fun in the 70s really could keep pace with the guys today, which is pretty strange. It’s especially weird when you consider how much marathoning has taken off with the general public in that time. The best estimates I could find say that 25,000 people in the US finished a marathon in 1976, and by 2013 that number was up to about 550,000. You would think that would have swept up at least a few extra competitors, but it doesn’t look like it did. All that time and popularity, and the winning time was 2 minutes faster for a 26-mile race.
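To put those finisher counts in perspective, here is the arithmetic as a short Python sketch. It uses only the two estimates quoted above. It treats 1976 to 2013 as 37 years of steady compound growth, which is a simplifying assumption.

```python
finishers_1976 = 25_000
finishers_2013 = 550_000
years = 2013 - 1976

growth_factor = finishers_2013 / finishers_1976  # how many times larger the field got
annual_growth = growth_factor ** (1 / years) - 1  # compound annual growth rate

print(f"{growth_factor:.0f}x larger, about {annual_growth:.1%} per year")  # 22x larger, about 8.7% per year
```

A field 22 times larger, paired with a winning time only about 2 minutes faster, is the heart of the puzzle.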

For women it appears to be a slightly different story. Women got their start with marathoning a bit later than men, and as late as 1967 had to dodge race officials when they ran. Women’s marathoning was added to the Olympics in 1984, and here’s how the women did:

A bit more of a dropoff there.

If you’ve read Born to Run, you know that McDougall’s explanation for the failure to improve has two main threads: 1) that shoe companies potentially ruined our ability to run long distances and 2) that running long distances well requires you to have some fun with your running and should be built on community. Both seem plausible given the data, but I wanted to compare it to a different running event to see how it stacked up. I picked the 5000 m run since that’s the most commonly run race length in the US. The history of winning times is here, and the more recent times are here. It turns out the 5k hasn’t changed much either:

So the 5k hasn’t changed much either….but there was never a year when we couldn’t field a team. Also complicating things are the different race strategies employed by 5000m runners vs marathon runners. To qualify for the 5k, you run the race twice in a matter of a few days, so it is plausible that 5k runners don’t run faster than they have to in order to qualify. Marathon runners, on the other hand, may only run a few races per year, especially at the Olympic level, so they are more likely to go all out. Supporting this theory is how the runners do when they get to the Olympics. The last man to win a 5000m Olympic medal for the US is Paul Chelimo. He qualified with a 13:35 time, then ran a 13:03 in the Olympics for the silver medal. Ryan Hall, on the other hand (the only American to ever run a sub-2:05 marathon), set the Olympic trials record in 2008 running a 2:09 marathon. He placed 10th in the Olympics with a 2:12. Galen Rupp won the bronze in Rio in 2016 with a time 1 minute faster than his qualifying time. I doubt that’s an unusual pattern….you have far more control over your time when you’re running 3 miles than when you’re running 26.

To dig further, I decided to pull the data from the Association of Road Racing Statisticians website and get ALL the men from the US who had run a sub-2:12 marathon. Since McDougall’s original claim was that there were none to be found around the year 2000, I figured I’d see if this was true. Here’s the graph:

So he was exaggerating. There were 5.

Pedantry aside, there was a remarkable lack of good marathoners in those years, though the pendulum appears to have started swinging back afterward. McDougall’s book came out in 2009 and was credited with a huge resurgence in interest in distance racing, so he may have partially caused that 2010-2014 spike. Regardless, it does not appear that Americans have recaptured whatever happened in the early 80s, even with the increase in nearly every resource you would think would help. Interestingly enough, two of the most dominant marathoners of the post-2000 spike (Khalid Khannouchi and Meb Keflezighi) came here in poverty as immigrants, at ages 29 and 12 respectively. Between the two of them, they are responsible for almost a third of the sub-2:12 marathon times posted between 2000 and 2015. It seems resources simply don’t help marathon times that much. Genetics may play a part, but it doesn’t explain why the US had such a drop-off. As McDougall puts it, “this isn’t about why other people got faster; it’s about why we got slower.”

So there may be something to McDougall’s theory, or there may be something about US running in general. It may be that money in other sports siphoned off potential runners, or it may be that our shoes screwed us or that camaraderie and love of the sport was more important than you’d think. Good runners may run fewer races these days, just out of fear that they’ll get injured. I don’t really know enough about it, but the stagnation is a little striking. It does look like there was a bit of an uptick after the year 2000 disaster….I suspect seeing the lack of good marathon runners encouraged a few who may have focused on other sports to dive in.

As an interesting data point for the camaraderie/community theory, I did discover that women can no longer set a marathon world record in a race where men also run. From what I can tell, the governing bodies decided that being able to run with a faster field and pace yourself off the men was such an advantage that it didn’t count. The difference is pretty stark (2:15 vs 2:17), so they may have a point. The year Paula Radcliffe set the 2:15 record in London, she was 16th overall and presumably had plenty of people to pace herself with. Marathoning does appear to be a sport where your competition is particularly important in driving you forward.

My one and only marathon experience biases me in this direction. In 2009 I ran the Cape Cod Marathon and finished second to last. At mile 18 or so, I had broken out in a rash from the unusually hot October sun, had burst into tears and was ready to quit. It was at that moment that I came across another runner, also in tears due to a sore knee. We struck up a conversation and laughed/talked/yelled/cried at each other for the remaining 7 miles to the finish line. Despite my lack of bragging rights for my time, I was overjoyed to have finished, especially when I realized over 400 people (a third of entrants) had dropped out. I know for a fact I would not have made it if I hadn’t run into my new best friend at that moment of despair, and she readily admitted the same thing. McDougall makes the point that this type of companionship running is probably how our ancestors ran, though for things like food and safety as opposed to a shiny medal with the Dunkin Donuts logo. Does this sort of thing make a difference at the Olympic level? Who knows, but the data and the anecdote do suggest there’s some interesting psychological stuff going on when you get to certain distances.

Race on folks, race on.

Linguistic vs Numeric Probability

It will probably come as a surprise to absolutely no one that I grew up in the kind of household where the exact range of probabilities covered by the phrase “more likely than not” was a topic of heavy and heated debate. While the correct answer to that question is obviously 51%-60%[1], I think it’s worth noting that this sort of question actually has some scholarly research behind it.

Sherman Kent, a researcher for the CIA, decided to actually poll NATO officers to see how they interpreted different statements about probability, and came up with this:

Interesting that the term “probable” itself seems to cause the widest range of perceptions in this data set.

A user on Reddit’s r/samplesize decided to run a similar poll and made a much prettier graph that looked like this:

The results are similar, though with some clearer outliers. Interestingly, they also looked at what people thought certain “number words” meant, and got this:

This is some pretty interesting data for any of us who attempt to communicate probabilities to others. While it’s worth noting that people had to assign just one value rather than a range, I still think it gives some valuable insight into how different people perceive the same word.

I also wonder if this should be used a little more often as a management tool. Looking at the variability, especially within the NATO officers, one realizes that some management teams actually do use the word “probable” to mean different things. We’ve all had that boss who used “slight chance” to mean “well, maybe” and didn’t use “almost no chance” until they were really serious. Some of the bias around certain terms may be coming from a perfectly rational interpretation of events.

Regardless, it makes a good argument for putting the numeric estimate next to the word if you are attempting to communicate in writing, just to make sure everyone’s on the same page.

1. Come at me, Dad.

Using Data to Fight Data Fraud: the Carlisle Method

I’m creating a new tag for my posts, “stories to check back in on”, for those times when I want to remember to see how a sensational headline played out once the dust settled.

The particular story prompting this today is the new paper “Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals”, which is getting some press under headlines like “Dozens of recent clinical trials may contain wrong or falsified data, claims study”. The paper’s author (John Carlisle) is one of the people who helped expose the massive fraud by Yoshitaka Fujii, an anesthesiologist who ended up having 183 papers retracted due to fabricated results.

While his previous work focused on anesthesiologists, Carlisle decided to use similar statistical techniques on a much broader group of papers. As he explains in the paper, he started to wonder whether anesthesiology journals were retracting more papers because anesthesiologists were more likely to fabricate, or because the community was simply keeping a sharper eye out for fabrications. To test this, he examined over 5,000 papers published in both specialty anesthesia journals and major medical journals like the New England Journal of Medicine and the Journal of the American Medical Association, looking for data anomalies that might point to fraud or errors.

The method Carlisle used to do this is an interesting one. Rather than look at the primary outcomes of the papers for evidence of fabrication, he looked at baseline variables like the height and weight of the patients in the control groups vs the intervention groups. In a properly randomized controlled trial, these should be about the same. His statistical methods are described in depth here, but in general his calculations focus on the reported means and standard deviations of both populations. The bigger the difference between the control group and the intervention group, the more likely your numbers are wrong. The math isn’t simple, but the premise is: data fraudsters will probably work hard to make the primary outcome realistic, but likely won’t pay much attention to the more boring variables. Additionally, most of us reading papers barely glance at patient height and weight, particularly the standard deviations associated with them. It’s the data fraud equivalent of a kid telling their mom they cleaned their room when really they just shoved everything in the closet….you focus on where people will look first, and ignore everything else.
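A heavily simplified Python sketch of the idea follows. It is not Carlisle's published procedure, which is more involved. It assumes each baseline variable is reported as a mean, standard deviation and group size. It uses a large-sample z-test instead of exact tests, and it combines the per-variable p-values with Stouffer's method. Randomization should produce ordinary-looking differences, so the sketch flags a trial when the combined p-value is extreme in either direction: groups that are far too different, or suspiciously identical. The 0.0001 cutoff mirrors the p < .0001 threshold Carlisle set for himself. The function names and example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1
EPS = 1e-12  # keeps p-values strictly inside (0, 1) for inv_cdf


def baseline_p_value(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sided p-value for a difference in baseline means (large-sample z-test)."""
    standard_error = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_a - mean_b) / standard_error
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def stouffer_combined_p(p_values):
    """Combine p-values with Stouffer's method: Z = sum(z_i) / sqrt(k)."""
    clipped = [min(max(p, EPS), 1 - EPS) for p in p_values]
    z_scores = [STD_NORMAL.inv_cdf(1 - p) for p in clipped]
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return 1 - STD_NORMAL.cdf(combined_z)


def flag_trial(variables, alpha=1e-4):
    """variables: (mean_a, sd_a, n_a, mean_b, sd_b, n_b) per baseline variable.

    Returns (combined_p, flagged). A combined p near 0 means the groups differ
    more than chance allows; near 1 means they are implausibly alike.
    """
    combined = stouffer_combined_p([baseline_p_value(*v) for v in variables])
    return combined, (combined < alpha or combined > 1 - alpha)


# Hypothetical trials: (mean, sd, n) for group A, then group B, per variable.
plausible = [(170.1, 8, 50, 169.5, 8, 50), (72, 12, 50, 73, 12, 50)]
too_different = [(180, 8, 50, 165, 8, 50), (72, 12, 50, 73, 12, 50)]
too_similar = [(170, 8, 50, 170, 8, 50)] * 3

for name, trial in [("plausible", plausible), ("too different", too_different), ("too similar", too_similar)]:
    p, flagged = flag_trial(trial)
    print(f"{name}: combined p = {p:.6f}, flagged = {flagged}")
```

The sketch only needs the summary table that every trial already publishes. That is what makes this kind of screening cheap enough to run across thousands of papers.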

This paper gets REALLY interesting because Carlisle not only opened the closets, but published the names (or rather the journal locations) of the studies he thinks are particularly suspect….about 90 in all, or 1-2% of the total. He also mentions that some authors have multiple studies that appear to have anomalous baseline data. Given the information he provided, the journals will almost certainly have to investigate, and people will likely be combing over the work of those named. If you want to see at least some of the named papers, check out this post.

Now I am definitely a fan of finding and calling out data frauds, but I do have to wonder about the broad net cast here. I have not looked up the individual studies, nor do I know enough about most of the fields to know the reputations of the authors in question, but I do wonder what the explanations for the issues with the 90 trials will be. With all the care taken by Carlisle (i.e., setting his own p-value cutoff at < .0001), it seems likely that a large number of these will be real fraud cases, and that’s great! But it seems likely that at least some will have a different explanation, and I’m not sure how many will be in each bucket. The paper itself raises these possibilities, but it will be interesting to see what proportion of the sample turns out to be innocent mistakes vs fraud.

This is an interesting data point in the whole ethics of calling BS debate. While the paper is nuanced in its conclusions and raises multiple possibilities for the data issues, it’s hard to imagine the people named won’t have a very rough couple of months. This is why I want to check back in a year to see what happened. For more interesting discussion of the ethics and possible implications, see here. Some interesting points raised there include a discussion about a statute of limitations (are we going back for decades?) and how to judge trials going forward now that the method has been released to the public.

To note: Carlisle has published a sensational paper here, but he also has a great idea about how to use this going forward. He recommends that all journals run this analysis on papers submitted or accepted for publication, so they can ask authors about discrepancies up front. This would make sure that innocent mistakes were caught before publication, and that would-be fraudsters knew there were extra safeguards in place. That seems a nice balance of addressing a problem without overreaching, and apparently it has already been implemented by the journal Anaesthesia.


Evangelical Support for Trump: A Review of the Numbers

This is not a particularly new question, but a few friends and readers have asked me over the past few months about the data behind the “Evangelicals support Trump” assertions. All of the people who asked me about this are long term Evangelicals who attend church regularly and typically vote Republican, but did not vote for Trump. They seemed to doubt that Evangelical support for Trump was as high as was being reported, but of course weren’t sure if that was selection bias on their part.

The first data set of interest is the exit polling from right after Election Day. This showed that Evangelical support had gone up from 78% for Romney to 81% for Trump. The full preliminary analysis is here, but I thought it would be interesting to see how all of the tracked religions had changed over the years, so I turned the table into a bar chart. This shows the percent of people who claimed affiliation with a particular religious group AND said they voted for the Republican candidate:

Since some religions tend to show large disparities along racial lines (such as Catholicism), race is included. White evangelical Christian was added as its own affiliation after the 2000 election, when those voters were given credit for putting Bush in office. Mormonism has not been consistently tracked, which is why the 2008 data is missing.

Anyway, I thought it was interesting to see that while support for Trump did increase over Romney’s support, it wasn’t a huge change. On the other hand, Mormons saw a fairly substantial drop in support for Trump as opposed to Romney or Bush. Hispanic Catholics and “other faiths” saw the biggest jump in support for Trump over Romney. However, white Evangelicals remained the most likely to vote for Trump at a full 21 points higher than the next closest group, white Catholics.

So with those kind of numbers, why aren’t my friends hearing this in their churches? A few possible reasons:

We don’t actually know the true percentage of Evangelicals who voted for Trump. Even with a number like 81%, we still have to remember that about half of all people don’t vote at all. I couldn’t find data about how likely Evangelicals were to vote, but if they vote at the same rate as other groups, then only about 40% of those sitting in the pews on Sunday morning actually cast a vote for Trump.

Some who raise this objection also point out that we don’t know whether those calling themselves “Evangelical” were actually sitting in the pews on Sunday morning, so Pew looked at this question specifically. At least as of April, Evangelicals who said they attended church at least once a month were actually the most likely to support Trump and the job he is doing, at 75%. Interestingly, that survey also found that relatively few people (20%) call themselves Evangelical but don’t attend church often.
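The 40% estimate in the first point comes from a single multiplication, sketched here in Python. The 50% turnout is the rough assumption from that paragraph, not a measured Evangelical turnout rate. The 60% line is just an alternative input to show how sensitive the result is.

```python
def share_of_whole_group(vote_share: float, turnout: float) -> float:
    """Share of an entire group (voters and non-voters) who voted for a candidate."""
    return vote_share * turnout


print(f"{share_of_whole_group(0.81, 0.50):.1%}")  # 40.5%
print(f"{share_of_whole_group(0.81, 0.60):.1%}")  # 48.6%
```

Under either assumption, fewer than half of the people in the pews are Trump voters. That fits what my friends have been seeing.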

The pulpit and the pews may have a difference of opinion. While exit polls capture the Evangelical vote broadly, some groups decided to poll Evangelical pastors specifically. In a poll taken at least a month before the election, only 30% of Evangelical pastors said they were planning to vote for Trump, and 44% were still undecided. While more of them may have ended up voting for him, that level of hesitancy suggests they probably weren’t publicly endorsing him on Sunday mornings. Indeed, that same poll found that only 3% of pastors had endorsed a candidate from the pulpit during this election.

People weren’t voting based on things you hear sermons about. After the data emerged about the Evangelical vote, many pundits hypothesized that the Supreme Court nomination and abortion were the major drivers. However, when Evangelicals were actually asked what their primary issues were, they told a different story. When asked to pick their main issue, they named “improving the economy” and “national security”, with the Supreme Court nominee ranking 4th (picked by 10%) and abortion ranking 7th (picked by 4%). Even when allowed to name multiple issues, they ranked the Supreme Court and abortion as less concerning than terrorism, the economy, immigration, foreign policy and gun policy.

This distinction may seem minor, but think about what people actually discuss in church on Sunday morning. Abortion or moral concerns are far more likely to come up in that context than terrorism. Basically, if Evangelicals are voting for Trump based on issues that aren’t traditionally talked about on Sunday morning, you are not likely to hear about that vote on Sunday morning.

National breakdowns may not generalize to individual states. I couldn’t find an overall breakdown of the white Evangelical vote by state, but it was widely reported that in some key states like Florida, Evangelical voters broke for Trump at an even higher rate (85%) than the national average, which means some states must have gone lower. What might skew the data even further, however, is the uneven distribution of Evangelicals themselves. The Pew Research data tells us that about 26% of the voting public is white Evangelical, and Florida is very close to that at 23%. The states where my friends are from (New Hampshire and Massachusetts), however, are much lower at 13% and 9% respectively. This means small shifts in Evangelical voting in Florida could be the equivalent of huge shifts in New Hampshire.

As an example: according to the Election Project numbers, Florida had 9.5 million people cast votes and New Hampshire had 750,000. If Evangelicals were represented proportionally in the voting population, that means about 2.18 million Evangelicals cast a vote in Florida, and about 97,500 cast their vote in NH. That’s 22 times as many Evangelical voters in Florida as in NH. Roughly speaking, this means a 1% change in Florida would be about 22,000 people….more than 20% of the NH Evangelical population. Massachusetts Evangelicals are similarly outnumbered, at about 7 to 1 compared to Florida. If 0% of NH/MA Evangelical voters went for Trump but 85% of Florida Evangelicals did, that would still average out to 71% of Evangelicals voting for Trump across the three states. New England states just don’t have the population to move the dial much, and even wildly divergent voting patterns there wouldn’t move the national average.
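The weighting in that example can be checked with a short Python sketch. Massachusetts's vote total isn't given above, so the sketch derives its Evangelical voter count from the "about 7 to 1" ratio. That is an assumption, not a measured figure. The other inputs are the numbers quoted in the paragraph.

```python
# Figures quoted above: total votes cast times white Evangelical share.
florida = 9_500_000 * 0.23      # about 2.185 million Evangelical voters
new_hampshire = 750_000 * 0.13  # about 97,500
# Assumption: Massachusetts inferred from the "about 7 to 1" ratio, not a real count.
massachusetts = florida / 7


def pooled_share(groups: list[tuple[float, float]]) -> float:
    """Voter-weighted share across groups of (voters, share_for_candidate)."""
    total_voters = sum(voters for voters, _ in groups)
    return sum(voters * share for voters, share in groups) / total_voters


print(f"FL/NH ratio: {florida / new_hampshire:.1f}")  # 22.4
print(f"1% of FL as a share of NH: {0.01 * florida / new_hampshire:.1%}")  # 22.4%
print(f"Pooled share: {pooled_share([(florida, 0.85), (new_hampshire, 0.0), (massachusetts, 0.0)]):.1%}")  # 71.6%
```

The pooled figure lands within a point of the 71% above. Even with zero support in both New England states, Florida's size dominates the average.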

Hopefully that sheds a bit of light on the numbers here, even if it is about 7 months too late to be a hot take.

4 Examples of Confusing Cross-Cultural Statistics

In light of my last post about variability in eating patterns across religious traditions, I thought I’d put together a few other examples of times when attempts to compare data across international borders got a little more complicated than you would think.

Note: not all of this confusion changed the conclusions that people were trying to get to, but it did make things a little confusing.

  1. Who welcomes the refugee. About a year or so ago, when Syrian refugees were making headlines, there was a story going around that China was the most welcoming country for people fleeing their homeland. The basis of the story was an Amnesty International survey that showed a whopping 46% of Chinese citizens saying they would be willing to take a refugee into their home…..far more than any other country. The confusion arose when a Quartz article pointed out that there is no direct Chinese translation for the word “refugee”, and the word used in the survey meant “person who has suffered a calamity” without clarifying whether that person is international or lives down the street. It’s not clear how much this translation influenced the response, but a different question on the same survey that made the “international” part clearer received much lower support.
  2. The French Paradox (reduced by 20%). In the process of researching my last post, I came across a rather odd tidbit I’d never heard before regarding the “French Paradox”. A term that originated in the 80s, the French Paradox is the apparent contradiction that French people eat lots of cholesterol/saturated fat and yet don’t get heart disease at the rates you would expect based on data from other countries. I had heard of the paradox itself, but the part I hadn’t heard was the assertion that French doctors under-counted deaths from coronary heart disease. When researchers compared death certificates to data collected by more standardized methods, they found that this was true:

    They suspect the discrepancy arose because doctors in many countries automatically attribute sudden deaths in older people to coronary heart disease, whereas French doctors were only doing so if they had clinical evidence of heart disease. This didn’t actually change France’s rank very much; it still has a lower than expected rate of heart disease. However, it did nearly double the reported incidence of CHD and cut the paradox down by about 20%.

  3. Crime statistics of all sorts. This BBC article is a few years old, but it has some interesting tidbits about cross-country crime rate comparisons. For example, Canada and Australia have the highest kidnapping rates in the world. The reason? They count all parental custody disputes as kidnappings, even if everyone knows where the child is. Other countries keep this data separate and only use “kidnapping” to describe a missing child. Countries that widen their definitions of certain crimes tend to see an uptick in those crimes, like Sweden saw with rape when it widened its definition in 2005.
  4. Infant mortality. This World Health Organization report has some interesting notes about how different countries count infant mortality. It notes that some countries (such as Belgium, France and Spain) only count infant deaths among infants who survive beyond a certain time period after birth, such as 24 hours. Those countries tend to have lower infant mortality rates but higher stillbirth rates than countries that don’t set such a cutoff. Additionally, as of 2008, approximately 3/4 of countries lacked the infrastructure to count infant mortality through hospitals and did so through household surveys instead.

Like I said, not all of these change the conclusions people come to, but they are good things to keep in mind.