Statisticians and Gerrymandering

Okay, I just said I was blogging less, but this story was too interesting to let pass without comment. A few days ago it was announced that the Supreme Court had agreed to hear a case about gerrymandering, the practice of redrawing voting district lines to influence the outcome of elections. This is a big deal because previously the court had only heard these cases when the lines had something to do with race, and had declined to weigh in on redraws based purely on politics. The case they agreed to hear comes from Wisconsin, where a lower court found that a 2011 redistricting plan was so partisan that it potentially violated the rights of minority-party voters in the affected districts.

Now obviously I’ll leave it to better minds to comment on the legal issues here, but I found this article on how statisticians are getting involved in the debate quite fascinating. Both parties want the district lines to favor their own candidates, so it can be hard to cut through the noise and figure out what a “fair” plan would actually look like. Historically this came down to two parties bickering over street maps, but with more data available there’s now a real chance that both the existence and the extent of gerrymandering can be measured.

One way of doing this is called the “efficiency gap” and is the work of Eric McGhee and Nicholas Stephanopoulos, who explain it here. Basically this measures “wasted” votes, which they explain like this:

Suppose, for example, that a state has five districts with 100 voters each, and two parties, Party A and Party B. Suppose also that Party A wins four of the seats 53 to 47, and Party B wins one of them 85 to 15. Then in each of the four seats that Party A wins, it has 2 surplus votes (53 minus the 51 needed to win), and Party B has 47 lost votes. And in the lone district that Party A loses, it has 15 lost votes, and Party B has 34 surplus votes (85 minus the 51 needed to win). In sum, Party A wastes 23 votes and Party B wastes 222 votes. Subtracting one figure from the other and dividing by the 500 votes cast produces an efficiency gap of 40 percent in Party A’s favor.
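To make the bookkeeping concrete, here’s a minimal sketch in Python of that calculation, using the numbers from the quoted example. The function name and the sign convention are my own illustration, not the authors’ code:

```python
# Minimal sketch of the efficiency gap calculation, using the hypothetical
# five-district example quoted above. Each district is (votes_A, votes_B).

def efficiency_gap(districts):
    """Return the efficiency gap as a fraction of all votes cast.

    Positive values mean Party B wasted more votes, i.e. the gap favors Party A.
    """
    wasted_a = wasted_b = 0
    total_votes = 0
    for votes_a, votes_b in districts:
        total_votes += votes_a + votes_b
        threshold = (votes_a + votes_b) // 2 + 1  # votes needed to win (51 of 100)
        if votes_a > votes_b:
            wasted_a += votes_a - threshold  # winner's surplus votes
            wasted_b += votes_b              # all of the loser's votes are "lost"
        else:
            wasted_b += votes_b - threshold
            wasted_a += votes_a
    return (wasted_b - wasted_a) / total_votes

districts = [(53, 47)] * 4 + [(15, 85)]
print(efficiency_gap(districts))  # 0.398, i.e. roughly the 40% gap described above
```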

Basically this metric highlights unevenness across the state. If one party is winning dramatically in one district yet losing in all the others, you have some evidence that those lines may not be fair. If this only ever happens to one party and never to the other, your evidence grows. Now there are obvious responses to this….maybe one party’s voters really do cluster together in certain locations….but it does provide a useful baseline measure. If your current plan increases this gap in favor of the party in power, then that party should have to offer some explanation. The authors’ proposal is that if the other party could show a redistricting plan with a smaller gap, the initial plan would be considered unconstitutional.

To help with that last part, two mathematicians have created a computer algorithm that draws districts according to state laws but irrespective of voting histories. They then compare these hypothetical districts’ “average” results to the proposed maps to see how far off the new plans are. In other words, they basically create a reference distribution of results, then see where the current proposals fall within it. To give context: of the 24,000 maps they drew for North Carolina, all were less gerrymandered than the one the legislature came up with. When a group of retired judges tried to draw new districts for North Carolina, their maps were less gerrymandered than 75% of the computer-drawn ones.
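The comparison step itself is simple enough to sketch. Here’s a toy Python version with made-up bias scores standing in for the real simulated maps (the map-drawing algorithm is the mathematicians’ own work and isn’t reproduced here); the point is just to show how an enacted plan gets placed within the simulated distribution:

```python
# Hedged sketch of the comparison step: given a bias score (e.g. an efficiency
# gap) for each computer-drawn map, see where the enacted plan falls in that
# distribution. The scores below are invented stand-ins, not real data.

import random

random.seed(0)
simulated_scores = [random.gauss(0.02, 0.03) for _ in range(24000)]  # stand-in ensemble
enacted_score = 0.15  # hypothetical score for the legislature's map

more_extreme = sum(score >= enacted_score for score in simulated_scores)
print(f"{more_extreme} of {len(simulated_scores)} simulated maps are at least as "
      f"gerrymandered as the enacted plan "
      f"({more_extreme / len(simulated_scores):.1%})")
```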

It’s interesting to note that some of the most gerrymandered states by this metric are actually not the ones being challenged. Here are all the states with more than 8 districts and how they fared in 2012. The ones in red are the ones facing a court challenge. The range is based on plausible vote swings:

Now again, none of these methods may be perfect, but they do start to point the way toward less biased ways of drawing districts and neutral tests for accusations of bias. The authors note that the courts already employ a simple mathematical test to evaluate whether districts have roughly equal populations: deviations must stay within +/- 10%. It will be interesting to see if any of these newer tests are considered straightforward enough to serve as a legal standard. Stay tuned!

Using Data to Fight Data Fraud: the Carlisle Method

I’m creating a new tag for my posts, "stories to check back in on," for those times when I want to remember to see how a sensational headline played out once the dust settled.

The particular story prompting this today is the new paper “Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals”, which is getting some press under headlines like “Dozens of recent clinical trials may contain wrong or falsified data, claims study”. The paper’s author (John Carlisle) is one of the people who helped expose the massive fraud by Yoshitaka Fujii, an anesthesiologist who ended up having 183 papers retracted due to fabricated results.

While his work had previously focused on anesthesiologists, Carlisle decided to apply similar statistical techniques to a much broader group of papers. As he explains in the paper, he started to question whether anesthesiology journals were retracting more papers because anesthesiologists were more likely to fabricate data, or whether that community was simply keeping a sharper eye out for fabrications. To test this he examined over 5,000 papers published both in specialty anesthesia journals and in major medical journals like the New England Journal of Medicine and the Journal of the American Medical Association, looking for data anomalies that might point to fraud or error.

The method Carlisle used to do this is an interesting one. Rather than look at the primary outcomes of the papers for evidence of fabrication, he looked at baseline variables, like the height and weight of the patients in the control groups vs the intervention groups. In a properly randomized controlled trial, these should be about the same. His statistical methods are described in depth here, but in general his calculations focus on the reported means and standard deviations of the two groups: the bigger the difference between the control group and the intervention group, the more likely it is that the numbers are wrong. The math isn’t simple, but the premise is: data frauds will probably work hard to make the primary outcome look realistic, but likely won’t pay much attention to the more boring variables. Additionally, most of us reading papers barely glance at patient height and weight, particularly the standard deviations associated with them. It’s the data-fraud equivalent of a kid telling their mom they cleaned their room when really they just shoved everything in the closet….you focus on where people will look first, and ignore everything else.
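For flavor, here’s a simplified sketch of that core idea in Python. This is not Carlisle’s actual procedure (his method, described in the paper, pools probabilities across many baseline variables and trials); it just shows how the published summary statistics alone let you ask how surprising a baseline difference would be under genuine randomization. The numbers are invented:

```python
# Simplified sketch of the core idea (NOT Carlisle's full method): given only
# the summary statistics reported for a baseline variable like weight, ask how
# surprising the control-vs-intervention difference is under proper randomization.

from scipy.stats import ttest_ind_from_stats

# Hypothetical baseline weights (kg) as they might appear in a trial's "Table 1".
control = dict(mean=71.2, std=11.4, nobs=150)
intervention = dict(mean=70.9, std=11.1, nobs=150)

stat, p_value = ttest_ind_from_stats(
    control["mean"], control["std"], control["nobs"],
    intervention["mean"], intervention["std"], intervention["nobs"],
)
print(f"t = {stat:.2f}, p = {p_value:.3f}")  # a very small p here means the groups
                                             # differ more than chance should allow
```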

This paper gets REALLY interesting because Carlisle not only opened the closets, he published the names (or rather the journal locations) of the studies he thinks are particularly suspect….about 90 in all, or 1-2% of the total. He also mentions that some authors have multiple studies with anomalous-looking baseline data. Given the information he provided, the journals will almost certainly have to investigate, and people will likely be combing over the work of those named. If you want to see at least some of the named papers, check out this post.

Now I definitely am a fan of finding and calling out data fraud, but I do have to wonder about the broad net cast here. I have not looked up the individual studies, nor do I know enough about most of the fields involved to know the reputations of the authors in question, but I do wonder what the explanations for the roughly 90 flagged trials will turn out to be. Given all the care taken by Carlisle (i.e., setting his own p-value cutoff at < .0001), it seems likely that a good number of these will be real fraud cases, and that’s great! But it also seems likely that at least some will have a different explanation, and I’m not sure how many will end up in each bucket. The paper itself raises these possibilities, but it will be interesting to see what proportion of the sample turns out to be innocent mistakes vs fraud.
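To put that cutoff in context, here’s a quick back-of-the-envelope calculation (my own arithmetic, using the 5,087 trials from the paper’s title) of roughly how many trials you’d expect to flag purely by chance at different cutoffs:

```python
# Rough expected number of false positives when screening many trials:
# approximately (number of trials screened) x (p-value cutoff), assuming the
# tests were independent and the null held for every trial. My arithmetic,
# not the paper's.

n_trials = 5087
for cutoff in (0.05, 0.001, 0.0001):
    expected = n_trials * cutoff
    print(f"cutoff {cutoff}: ~{expected:.1f} trials flagged by chance alone")
```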

This is an interesting data point in the whole ethics-of-calling-BS debate. While the paper is nuanced in its conclusions and raises multiple possible explanations for the data issues, it’s hard to imagine the people named won’t have a very rough couple of months. This is why I want to check back in a year to see what happened. For more interesting discussion of the ethics and possible implications, see here. Some interesting points raised there include a discussion of a statute of limitations (are we going back for decades?) and how to judge trials going forward now that the method has been released to the public.

To note: Carlisle has published a sensational paper here, but he also has a good idea about how to use this going forward. He recommends that all journals run this analysis on papers submitted or accepted for publication, so they can ask authors about discrepancies up front. This would help catch innocent mistakes before publication and put possible frauds on notice that extra safeguards are in place. That seems a nice balance between addressing a problem and not overreaching, and apparently it has already been implemented by the journal Anaesthesia.