Blue Zones Update: the Response

After my post last week about the pre-print paper calling the “Blue Zones” (aka areas with unusual longevity) into question, an anonymous commenter stopped by to drop a link to the Blue Zones group’s response. I thought their response was rather formidable, so I wanted to give it a whole post. They made three major points, all of which I was gratified to see I had raised in my initial read-through:

  1. Being designated as a Blue Zone is no small matter, and they have clearly published criteria. Some places that come to their attention are invalidated. There are also some pretty extensive papers published on how they validated each of the 5 existing regions, which they linked to. Sardinia here, here and here. Okinawa had a paper written on it literally called “They really are that old: a validation study of centenarian prevalence in Okinawa“. Dr Poulain (who did much of the age validation for Sardinia) wrote a paper 10 years ago called “On the age validation of supercentenarians”, where he points out that the first concerns about validating supercentenarian ages were raised in 1873. This book has more information about their methods, but notably starts by mentioning 5 regions they were unable to certify. Basically they responded with one big link dump saying “yeah, we thought of that too”. From what I can tell there actually is some really fascinating work being done here, which was very cool to read about. In every place they looked at individuals’ records, cross-checked them with numerous sources, did personal interviews with people and their families, and then calculated overall population metrics to look for evidence of fraud. In Okinawa, they mention asking people about the animal for the year of their birth, something people would be unlikely to forget or want to change (a toy version of that cross-check is sketched after this list). It seems pretty thorough to my eye, but I was also struck that none of the papers above were included as references in the original paper. I have no idea if the author knew about them or not, but given that he made statements like “these findings raise serious questions about the validity of an extensive body of research based on the remarkable reported ages of populations and individuals”, it seems like a gap not to include work that had already been done.
  2. Supercentenarians are not the focus of the Blue Zones. Again, they publish their criteria, and this is not a focal point. They have focused much more heavily on reaching 90 or 100, particularly with limited chronic disease. As I was scanning through the papers they linked to, I noticed an interesting anecdote about an Okinawan man who for a time was thought to have lived to 120. After he got into the Guinness Book of World Records, it came out that he had likely been given the name of an older brother who died, and thus was actually “only” 105. This is interesting because it’s a case where his age was fraudulent, but the correction wouldn’t impact the “Blue Zone” status.
  3. Relative poverty could be correlated with old age. I raised this point in my initial post, and I was glad to see they echoed it here. Again, most of the ways modernity raises life expectancy are by eliminating child mortality, decreasing accidents, and repairing congenital defects. Those are the things that will kill you under 55. Over 55, it’s a whole new set of issues.
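Since item 1 mentions the zodiac-animal question, here’s a toy Python sketch of why it works as a cross-check: a claimed birth year implies a specific animal in the 12-year cycle, so a mismatch with what a person remembers is a red flag. This is my own illustration (and it ignores calendar-boundary subtleties), not the validators’ actual procedure.

```python
# Toy version of the birth-year / zodiac-animal cross-check described above.
# Ignores the fact that the traditional year does not start on January 1.
ANIMALS = ["Rat", "Ox", "Tiger", "Rabbit", "Dragon", "Snake",
           "Horse", "Goat", "Monkey", "Rooster", "Dog", "Pig"]

def zodiac_animal(year: int) -> str:
    """Animal for a given Gregorian year in the 12-year cycle."""
    return ANIMALS[(year - 4) % 12]

def consistent(claimed_birth_year: int, remembered_animal: str) -> bool:
    """Does the claimed birth year match the animal the person remembers?"""
    return zodiac_animal(claimed_birth_year) == remembered_animal

print(zodiac_animal(1900))            # "Rat"
print(consistent(1905, "Snake"))      # True: 1905 was a Snake year
print(consistent(1910, "Snake"))      # False: a mismatch would prompt more digging
```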

Now I want to be clear: no one has questioned the fundamental mathematical finding of the paper, which is that in the US the supercentenarian records are probably shaky before birth registration. What’s being questioned is whether that finding is generalizable to specific areas that have been heavily studied. This is important because in the US “old age” type benefits kick in at 65 and there is no additional tier after that. So a random 94 year old claiming to be 111 might get a social bump out of the whole thing, but none of the kind of benefits that might have caused people to really look into it. Once we start getting to things like Blue Zones or international attention though, there are actually whole groups dedicated to looking into these claims. One person faking their age won’t cause much of an issue, but if your claim is that a dozen people in one town are faking their ages, that’s going to start to mess up population curves and show other anomalies. The poverty of the regions actually helps the case as well. If you’re talking to people in relative poverty with high illiteracy, it’s hard to argue that they could have somehow been criminal masterminds in their forgeries. One or two people can get away with things, but a group deception is much harder to pull off.

I’m still going to keep an eye on this paper, and my guess is it will eventually be published somewhere with some of the claims about generalizability toned down and more references added to previous work on validating ages.

Life Expectancy and Record Keeping

Those of you who follow any sort of science/data/skepticism news on Twitter will almost certainly have heard of the new pre-print taking the internet by storm this week: “Supercentenarians and the oldest-old are concentrated into regions with no birth certificates and short lifespans”.

This paper is making a splash for two reasons:

  1. It is taking on a hypothesis that has turned into a cottage industry over the years.
  2. The statistical reasoning makes so much sense it makes you feel a little silly for not questioning point #1 earlier.

Of course #2 may be projection on my part, because I have definitely read up on the whole “Blue Zone” hypothesis (and one of the associated books) and never questioned the underlying data. So let’s go over what happened here, shall we?

For those of you not familiar with the whole “Blue Zone” concept, let’s start there. The Blue Zones were popularized by Dan Buettner, who wrote a long article about them for National Geographic magazine back in 2005. The article highlighted several regions in the world that seemed to have extraordinary longevity: Sardinia (Italy), Okinawa (Japan) and Loma Linda (California, USA). All of these areas seemed to have a well above-average number of people living to be 100. They studied the residents’ habits to see if they could find anything the rest of us could learn. In the original article, that was this:

This concept proved so incredibly popular that Dan Buettner was able to write a book, then follow-up books, then build a whole company around the concept. Eventually Ikaria (Greece) and the Nicoya Peninsula (Costa Rica) were added to the list.

As you can see, the ultimate advice list derived from these regions looks pretty good on its face. The idea that not smoking, good family and social connections, daily activity, and fruits and vegetables are good for you certainly isn’t turning conventional wisdom on its head. So what’s being questioned?

Basically, the authors of the paper didn’t feel that alternative explanations for the longevity had been adequately tested, specifically the hypothesis that maybe not all of these people were as old as they said they were, or that otherwise bad record keeping was inflating the numbers. While many of the countries didn’t have clean data sets, the authors were able to pull some data sets from the US, and discovered that the chances of having people in your district live until they were 110 fell dramatically once statewide birth registration was introduced:

Now this graph is pretty interesting, and I’m not entirely sure what to make of it. There seems to be a peak at around 15 years before implementation, with some notable fall-off before birth registration is even introduced. One suspects birth registration might be a proxy for expanding records or increased awareness of birth year. Actually, now that I think about it, I bet we’re catching some WWI and WWII related things in here. I’m guessing the fall-off before complete birth registration had something to do with the drafts around those wars, when proving your age would have been very important. The paper notes that the most supercentenarians were born in the years 1880 to 1900, and there was a draft in 1917 for men 21-30. It would be interesting to see if there’s a cluster of men at birth years just prior to 1887. Additionally, the WWII draft starting in 1941 went up to age 45, so I wonder if there’s a cluster at 1897 or just before. Conversely, family lore says my grandfather exaggerated his age to join the service early in WWII, so it’s possible there are clusters at the young end too.
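Just to make that arithmetic concrete, here’s a quick sketch of which birth years each draft would have covered. It uses the figures mentioned above; the lower age bound for the 1941 case is my assumption, and the actual historical registration rules were more complicated than a single age range.

```python
# Rough sketch using the draft figures mentioned above; actual registration
# rules varied by year and were more complicated than a single age range.
def draft_birth_window(draft_year, min_age, max_age):
    """Return the (earliest, latest) birth years covered by a draft."""
    return draft_year - max_age, draft_year - min_age

print(draft_birth_window(1917, 21, 30))  # (1887, 1896): born just before 1887 means "too old"
print(draft_birth_window(1941, 21, 45))  # (1896, 1920): lower age of 21 is assumed here
```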

The other interesting thing about this graph is that it focuses on supercentenarians, aka those who live to 110 or beyond. I’d be curious to see the same data for centenarians (those who live to 100) to see if it’s as dramatic. A quick Google suggests that being a supercentenarian is really rare (300ish in the US out of 320 million), but there are 72,000 or so centenarians, and those living to 90 or over number well over a million. It’s much easier to overwhelm very rare event data with noise than more frequent data. I have the Blue Zone book on Kindle, so I did a quick search and noticed that he mentions supercentenarians 5 times, all on the same page. Centenarians are mentioned 167 times.

This is relevant because if we saw a drop-off in all advanced ages when birth registrations were introduced, we’d have evidence that the problem was potentially fraud across the board. However, if we see that only the rarest ages were impacted, then we start to get into issues like typos or other very rare events as opposed to systematic misrepresentation. Given the splash this paper has made already, I suspect someone will do that study soon. Additionally, the only US based “Blue Zone”, Loma Linda, California, does not appear to have been studied specifically at all. That also may be worth looking at to see if the pattern still holds.
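To put rough numbers on why rare-event counts are so fragile, here’s a back-of-the-envelope calculation using the ballpark figures above (the 90+ count is my assumption for “well over a million”; none of these are official statistics):

```python
# Ballpark US counts from the discussion above (rough figures, not official stats).
counts = {
    "supercentenarians (110+)": 300,
    "centenarians (100+)": 72_000,
    "age 90+": 1_500_000,   # assumed figure for "well over a million"
}

fake_records = 10  # suppose ten misrecorded ages slip into each group

for label, real_count in counts.items():
    inflation = fake_records / real_count
    print(f"{label}: 10 bad records inflate the count by {inflation:.3%}")
# supercentenarians: ~3.3% inflation; centenarians: ~0.014%; 90+: ~0.001%
```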

The next item the paper took a shot at was the non-US locations, specifically Okinawa and Sardinia. From my reading I had always thought those areas were known for being healthy and long-lived, but the paper claims they are actually some of the poorest areas with the shortest life expectancies in their countries. This was a surprise to me, as I had never seen it mentioned before. But here’s their data from Sardinia:

The Sardinian provinces are in blue, and you’ll note that there is eventually a negative correlation between “chance of living to 55” and “chance of living to 110”. Strange. In the last graph there seem to be 3 provinces in particular that are driving the correlation negative, and one wonders what’s going on there. Considering Sardinia as a whole has a population of 1.6 million, it would only take a few errors to produce that rate of longevity.

On the other hand, I was a little surprised to see the author cite Sardinia as having one of the lowest life expectancies. Exact quote: “Italians over the age of 100 are concentrated into the poorest, most remote and shortest-lived provinces.” In looking for a citation for this, I found this report (in Italian) via Wikipedia. It had this table:

If I’m using Google Translate correctly, Sardegna is Sardinia and this is a life expectancy table from 2014. While it doesn’t show Sardinia having the highest life expectancy, it doesn’t show it having the lowest either. I tried pulling the Japanese reports, but unfortunately the one that looks the most useful is in Japanese. As noted though, the paper hasn’t yet gone through peer review, so it’s possible some of this will be clarified.

Finally, I was a little surprised to see the author say “[these] patterns are difficult to explain through biology, but are readily explained as economic drivers of pension fraud and reporting error.” While I completely agree about errors, I do actually think there’s a plausible mechanism by which a population that less often survives to 55 could still produce longer lifespans among those who do. Deaths under 55 tend to be from things like accidents, suicide, homicide and congenital anomalies….external forces. The CDC lists the leading causes of death by age group here:

Over 55, we mostly switch to heart disease and cancer. A white-collar office worker with a high-stress job and bad eating habits may be more likely to live to 55 than a shepherd who could get trampled, but once they’re both 75 the shepherd may get the upper hand.

I’m not doubting the overall hypothesis, by the way….I do think fraud or errors in record keeping can definitely introduce issues into the data. Checking outliers to make sure they aren’t errors is key, and having some skepticism about source data is always warranted. After writing most of this post though, I decided to check back in on the Blue Zones book to see if they addressed this. To my surprise, the book claims that at least in Sardinia, this was actually done. On pages 25 and 26, they mention specifically how much doubt they faced and how one doctor personally examined about 200 people to help establish the truthfulness of their stated ages. Dr Michel Poulain (a Belgian demographer) apparently was nominated by a professional society specifically to go to Sardinia to check for signs of fraud. According to the book, he visited the region ten times to review records and interview people. I have no idea how thorough he was or how well his methods hold up, but his work seems at odds with the idea that someone just blindly pulled ages out of a database, or with the paper’s claim that “These results may reflect a neglect of error processes as a potential generative factor in remarkable age records”. Interestingly, I’d imagine WWI and WWII actually help with much of this work. Since most people have very vivid memories of where they were and what they were doing during the war years, those stories might go far toward establishing age.

Basically, it seems like sporadic exaggeration, error or fraud might give mistaken impressions about how many supercentenarians there are overall, but I do wonder if having an unusual cluster brings enough scrutiny that we don’t have to worry as much that something was missed. In the Blue Zone book, they mention that the group that brought attention to the Sardinians had helped debunk 3 other similar claims. Also, as mentioned, the paper doesn’t say whether the one US Blue Zone was among the areas that got late birth registration, but I do know the Seventh Day Adventists are one of the most intensely studied groups in the country.

Anyway, given the attention and research that has been paid to these areas, I’d imagine we’re going to hear some responses soon.  Dr Poulain appears to still be active, and one suspects he will be responding to this questioning of his work. This post is getting my “things to check back in on” tag. Stay tuned!


Statisticians and Gerrymandering

Okay, I just said I was blogging less, but this story was too interesting to let pass without comment. A few days ago it was announced that the Supreme Court had agreed to hear a case about gerrymandering, or the practice of redrawing voting district lines to influence the outcome of elections. This was a big deal because previously the court had only heard these cases when the lines had something to do with race, and had declined to weigh in on redraws based purely on politics. The case they agreed to hear was from Wisconsin, where a lower court found that a 2011 redistricting plan was so partisan that it potentially violated the rights of all minority party voters in the affected districts.

Now obviously I’ll leave it to better minds to comment on the legal issues here, but I found this article on how statisticians are getting involved in the debate quite fascinating. Both parties want the district lines to favor their own candidates, so it can be hard to cut through the noise and figure out what a “fair” plan would actually look like. Historically, this came down to just two parties bickering over street maps, but now with more data available there’s actually a chance that both the existence and the extent of gerrymandering can be measured.

One way of doing this is called the “efficiency gap” and is the work of Eric McGhee and Nicholas Stephanopoulos, who explain it here. Basically this measures “wasted” votes, which they explain like this:

Suppose, for example, that a state has five districts with 100 voters each, and two parties, Party A and Party B. Suppose also that Party A wins four of the seats 53 to 47, and Party B wins one of them 85 to 15. Then in each of the four seats that Party A wins, it has 2 surplus votes (53 minus the 51 needed to win), and Party B has 47 lost votes. And in the lone district that Party A loses, it has 15 lost votes, and Party B has 34 surplus votes (85 minus the 51 needed to win). In sum, Party A wastes 23 votes and Party B wastes 222 votes. Subtracting one figure from the other and dividing by the 500 votes cast produces an efficiency gap of 40 percent in Party A’s favor.
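To make that arithmetic concrete, here’s a short Python sketch of the efficiency-gap calculation applied to the example above. This is my own illustration of the published formula, not the authors’ code.

```python
def efficiency_gap(districts):
    """Efficiency gap for a list of (party_a_votes, party_b_votes) districts.

    Wasted votes: every vote cast for the losing party, plus every
    winning-party vote beyond the simple majority needed to win the district.
    """
    wasted_a = wasted_b = total_votes = 0
    for votes_a, votes_b in districts:
        total = votes_a + votes_b
        total_votes += total
        needed = total // 2 + 1  # votes needed for a simple majority
        if votes_a > votes_b:
            wasted_a += votes_a - needed  # surplus votes for the winner
            wasted_b += votes_b           # all losing votes are wasted
        else:
            wasted_b += votes_b - needed
            wasted_a += votes_a
    # Positive result = gap in Party A's favor, as a share of all votes cast.
    return (wasted_b - wasted_a) / total_votes

# The example from McGhee and Stephanopoulos above:
districts = [(53, 47)] * 4 + [(15, 85)]
print(efficiency_gap(districts))  # ~0.398, i.e. roughly 40% in Party A's favor
```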

Basically this metric highlights unevenness across the state. If one party is winning dramatically in one district and yet losing in all the others, you have some evidence that those lines may not be fair. If this is only happening to one party and never to the other, your evidence grows. Now there are obvious responses to this….maybe some party members really are clustering together in certain locations….but it does provide a useful baseline measure. If your current plan increases this gap in favor of the party in power, then that party should have to offer some explanation. The authors’ proposal is that if the other party could show a redistricting plan that had a smaller gap, the initial plan would be considered unconstitutional.

To help with that last part, two mathematicians have created a computer algorithm that draws districts according to state laws but irrespective of voting histories. They then compare these hypothetical districts’ “average” results to the proposed maps to see how far off the new plans are. In other words, they basically create a reference distribution of results, then see where the current proposals fall within it. To give context, of the 24,000 maps they drew for North Carolina, all were less gerrymandered than the one the legislature came up with. When a group of retired judges tried to draw new districts for North Carolina, their maps were less gerrymandered than 75% of the computer models.
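The comparison step itself is conceptually simple: once you have a gerrymandering score (like the efficiency gap) for thousands of politics-blind computer-drawn maps, you ask where the proposed map falls in that distribution. Here’s a hedged sketch of that step; the simulated numbers are made up for illustration, and this is not the researchers’ actual pipeline.

```python
import numpy as np

def gerrymander_percentile(simulated_gaps, proposed_gap):
    """Fraction of simulated, politics-blind maps less gerrymandered than the proposal."""
    simulated = np.abs(np.asarray(simulated_gaps))
    return np.mean(simulated < abs(proposed_gap))

# Hypothetical numbers for illustration only.
rng = np.random.default_rng(0)
simulated_gaps = rng.normal(loc=0.0, scale=0.03, size=24_000)
print(gerrymander_percentile(simulated_gaps, proposed_gap=0.15))  # ~1.0: an extreme outlier
```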

It’s interesting to note that some of the most gerrymandered states by this metric are actually not the ones being challenged. Here are all the states with more than 8 districts and how they fared in 2012. The ones in red are the ones facing a court challenge. The range is based on plausible vote swings:

Now again, none of these methods may be perfect, but they do start to point the way towards less biased ways of drawing districts and neutral tests for accusations of bias. The authors note that the courts currently employ a simple mathematical test to evaluate whether districts have equal populations: +/- 10%. It will be interesting to see if any of these newer tests are considered straightforward enough for a legal standard. Stay tuned!

Using Data to Fight Data Fraud: the Carlisle Method

I’m creating a new tag for my posts, “stories to check back in on”, for those times when I want to remember to see how a sensational headline played out once the dust settled.

The particular story prompting this today is the new paper “Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals”, which is getting some press under headlines like “Dozens of recent clinical trials may contain wrong or falsified data, claims study”. The paper’s author (John Carlisle) is one of the people who helped expose the massive fraud by Yoshitaka Fujii, an anesthesiologist who ended up having 183 papers retracted due to fabricated results.

While his previous work focused on anesthesia studies, Carlisle decided to use similar statistical techniques on a much broader group of papers. As he explains in the paper, he started to question whether anesthesiology journals were retracting more papers because anesthesiologists were more likely to fabricate data, or because the community was simply keeping a sharper eye out for fabrication. To test this he examined over 5,000 papers published in both specialty anesthesia journals and major medical journals like the New England Journal of Medicine and the Journal of the American Medical Association, looking for data anomalies that might point to fraud or errors.

The method Carlisle used to do this is an interesting one. Rather than look at the primary outcomes of the papers for evidence of fabrication, he looked at baseline variables like the height and weight of the patients in the control groups vs the intervention groups. In a properly randomized controlled trial, these should be about the same. His statistical methods are described in depth here, but in general his calculations focus on the standard deviations of both populations. The bigger the difference between the control group and the intervention group, the more likely the numbers are wrong. The math isn’t simple, but the premise is: data frauds will probably work hard to make the primary outcome look realistic, but likely won’t pay much attention to the more boring variables. Additionally, most of us reading papers barely glance at patient height and weight, particularly the standard deviations associated with them. It’s the data fraud equivalent of a kid telling their mom they cleaned their room when really they just shoved everything in the closet….you focus on where people will look first, and ignore everything else.
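For a rough sense of what a baseline check can look like, here’s a simplified sketch. Published trials report only summary statistics, so this reconstructs a two-sample comparison from the reported means, standard deviations and group sizes; Carlisle’s actual method (linked above) is considerably more involved, and the numbers below are invented for illustration.

```python
import math
from scipy import stats

# Simplified sketch of the idea, not Carlisle's actual method: under proper
# randomization, baseline differences between arms should look like chance,
# so an implausibly large (or oddly patterned) difference is a red flag.
def baseline_pvalue(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t-test p-value reconstructed from reported summary statistics."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite degrees of freedom
    df = se**4 / ((sd1**2 / n1)**2 / (n1 - 1) + (sd2**2 / n2)**2 / (n2 - 1))
    return 2 * stats.t.sf(abs(t), df)

# Hypothetical baseline weights (kg), for illustration only.
p = baseline_pvalue(mean1=74.6, sd1=4.1, n1=50, mean2=91.9, sd2=2.4, n2=50)
print(p, p < 1e-4)  # flag mirrors the post's < .0001 cutoff mentioned below
```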

This paper gets REALLY interesting because Carlisle not only opened the closets, but published the names (or rather the journal locations) of the studies he thinks are particularly suspect….about 90 in all, or 1-2% of the total. He also mentions that some authors have multiple studies that appear to have anomalous baseline data. Given the information he provided, the journals will almost certainly have to investigate, and people will likely be combing over the work of those named. If you want to see at least some of the named papers, check out this post.

Now I definitely am a fan of finding and calling out data fraud, but I do have to wonder about the broad net cast here. I have not looked up the individual studies, nor do I know enough about most of the fields to know the reputations of the authors in question, but I do wonder what the explanations for the issues with the ~90 trials will be. With all the care taken by Carlisle (i.e. setting his own p-value cutoff at < .0001), it seems likely that a large number of these will be real fraud cases, and that’s great! But it seems likely at least some will have a different explanation, and I’m not sure how many will fall in each bucket. The paper itself raises these possibilities, but it will be interesting to see what proportion of the sample turns out to be innocent mistakes vs fraud.

This is an interesting data point in the whole “ethics of calling BS” debate. While the paper is nuanced in its conclusions and raises multiple possibilities for the data issues, it’s hard to imagine the people named won’t have a very rough couple of months. This is why I want to check back in a year to see what happened. For more interesting discussion of the ethics and possible implications, see here. Some interesting points raised there include a discussion about a statute of limitations (are we going back for decades?) and how to judge trials going forward now that the method has been released to the public.

One thing to note: Carlisle has published a sensational paper here, but he actually has a great idea about how to use this going forward. He recommends that all journals run this analysis on papers submitted or accepted for publication, so they can ask authors about discrepancies up front. This would make sure that innocent mistakes were caught before being published, and that potential frauds would know there were extra safeguards in place. That seems a nice balance of addressing a problem without overreaching, and it apparently has already been implemented by the journal Anaesthesia.