Using Data to Fight Data Fraud: the Carlisle Method

I’m creating a new tag for my posts “stories to check back in on”, for those times when I want to remember to see how a sensational headline played out once the dust settled.

The particular story prompting this today is the new paper “Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals”, which is getting some press under headlines like “Dozens of recent clinical trials may contain wrong or falsified data, claims study”. The paper's author, John Carlisle, is one of the people who helped expose the massive fraud by Yoshitaka Fujii, an anesthesiologist who ended up having 183 papers retracted due to fabricated results.

While his previous work focused on anesthesiologists, Carlisle decided to apply similar statistical techniques to a much broader group of papers. As he explains in the paper, he started to question whether anesthesiology journals were retracting more papers because anesthesiologists were more likely to fabricate, or because the community was simply keeping a sharper eye out for fabrications. To test this, he examined over 5,000 papers published in both specialty anesthesia journals and major medical journals like the New England Journal of Medicine and the Journal of the American Medical Association, looking for data anomalies that might point to fraud or errors.

The method Carlisle used to do this is an interesting one. Rather than look at the primary outcomes of the papers for evidence of fabrication, he looked at baseline variables, like the height and weight of the patients in the control groups vs the intervention groups. In a properly randomized controlled trial, these should be about the same. His statistical methods are described in depth here, but in general his calculations focus on the means and standard deviations of both groups. The bigger the difference between the control group and the intervention group, the more likely your numbers are wrong (and, less intuitively, groups that match each other too perfectly are suspicious as well). The math isn’t simple, but the premise is: data frauds will probably work hard to make the primary outcome realistic, but likely won't pay much attention to the more boring variables. Additionally, most of us reading papers barely glance at patient height and weight, let alone the standard deviations associated with them. It’s the data fraud equivalent of a kid telling their mom they cleaned their room when really they just shoved everything in the closet. You focus on where people will look first, and ignore everything else.
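To make the idea a little more concrete, here is a minimal Python sketch of a Carlisle-style baseline check. It is not Carlisle's actual procedure, which is more involved; the normal approximation for each variable, the use of Stouffer's method to combine variables, the two-tailed flag, the function names, and all the numbers are my own assumptions for illustration. For each baseline variable, it turns the reported mean, SD, and group size into a p-value for the difference between arms, then combines those into a single p-value per trial. Under genuine randomization that combined p-value should land roughly uniformly between 0 and 1, so values very close to 0 (arms too different) or very close to 1 (arms too similar) both deserve a second look.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1
EPS = 1e-15                # keeps p-values strictly inside (0, 1) for inv_cdf


def baseline_p_value(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sided p-value for the difference in one baseline variable between
    two arms, computed from reported summary statistics.

    Assumption: a simple normal (z) approximation, not Carlisle's exact test.
    """
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def combined_p_value(p_values):
    """Combine per-variable p-values into one per-trial p-value with
    Stouffer's method (assumes the variables are independent)."""
    # A small p (big difference between arms) maps to a large positive z.
    zs = [STD_NORMAL.inv_cdf(1 - min(max(p, EPS), 1 - EPS)) for p in p_values]
    z_total = sum(zs) / sqrt(len(zs))
    return 1 - STD_NORMAL.cdf(z_total)


def flag_trial(p_trial, cutoff=1e-4):
    """Flag a trial whose combined p-value sits in either extreme tail:
    near 0 means the arms differ too much, near 1 means they match too well.
    The two-tailed rule and the default cutoff are illustrative assumptions."""
    return p_trial < cutoff or p_trial > 1 - cutoff


# Hypothetical trial: (mean, SD, n) for each arm, per baseline variable.
height_p = baseline_p_value(170.2, 8.1, 50, 169.8, 7.9, 50)
weight_p = baseline_p_value(72.5, 11.0, 50, 71.9, 10.5, 50)
trial_p = combined_p_value([height_p, weight_p])
print(round(trial_p, 3), flag_trial(trial_p))  # the flag should be False here
```

The sketch treats variables as independent, which height and weight clearly aren't, and it ignores rounding in published tables; both are reasons a real screening tool needs more care than this.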

This paper gets REALLY interesting because Carlisle not only opened the closets, but also published the names (or rather the journal locations) of the studies he thinks are particularly suspect: about 90 in all, or 1-2% of the total. He also mentions that some authors have multiple studies that appear to have anomalous baseline data. Given the information he provided, the journals will almost certainly have to investigate, and people will likely be combing over the work of those named. If you want to see at least some of the named papers, check out this post.

Now, I am definitely a fan of finding and calling out data frauds, but I do have to wonder about the broad net cast here. I have not looked up the individual studies, nor do I know enough about most of the fields to know the reputations of the authors in question, but I do wonder what the explanations for the issues with the 90 trials will be. Given all the care Carlisle took (i.e. setting his own p-value cutoff at < .0001), it seems likely that a large number of these will be real fraud cases, and that’s great! But at least some will probably have a different explanation, and I’m not sure how many will fall in each bucket. The paper itself raises these possibilities, but it will be interesting to see what proportion of the sample turns out to be innocent mistakes vs fraud.
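A quick back-of-the-envelope calculation shows why a cutoff that strict matters. Assuming (my simplification, not Carlisle's analysis) that honest trials produce combined p-values spread evenly between 0 and 1, that trials are independent, and that a trial is flagged when its p-value falls below 0.0001, chance alone should flag only about half a trial out of 5087, and the odds of chance flagging 90 are astronomically small. The helper names below are made up for this sketch.

```python
from math import exp, lgamma, log, log1p


def expected_chance_flags(n_trials, cutoff):
    """Expected number of honest trials flagged purely by chance.

    Assumes each honest trial's combined p-value is uniform on [0, 1] and
    that a trial is flagged when p < cutoff (one tail only)."""
    return n_trials * cutoff


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1 and trials independent.

    Each term is computed in log space because binomial coefficients such as
    C(5087, 2500) are far too large to convert to a float."""
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_term)
    return total


n_trials = 5087  # trials analysed, from the paper's title
cutoff = 1e-4    # the p < .0001 threshold mentioned above
print(expected_chance_flags(n_trials, cutoff))  # about 0.51 trials
print(prob_at_least(90, n_trials, cutoff))      # astronomically small
```

Counting both tails roughly doubles the chance expectation to about one trial, which doesn't change the picture: the flagged baselines really are non-random. What the arithmetic can't tell you is why, which is exactly the fraud-vs-mistake question.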

This is an interesting data point in the broader debate over the ethics of calling BS. While the paper is nuanced in its conclusions and raises multiple possibilities for the data issues, it’s hard to imagine the people named won’t have a very rough couple of months. This is why I want to check back in a year to see what happened. For more interesting discussion of the ethics and possible implications, see here. Some interesting points raised there include a discussion of the statute of limitations (are we going back decades?) and how to judge trials going forward now that the method has been released to the public.

To note: Carlisle has published a sensational paper here, but he also has a great idea about how to use this going forward. His recommendation is that all journals should do this analysis on papers submitted or accepted for publication, so they can ask authors about discrepancies up front. This would ensure that innocent mistakes are caught before publication, and that would-be frauds know there are extra safeguards in place. That seems like a nice balance between addressing a problem and not overreaching, and the journal Anaesthesia has apparently already implemented it.


2 thoughts on “Using Data to Fight Data Fraud: the Carlisle Method”

  1. Regarding your suggestion “all journals should do this analysis on papers submitted or accepted for publication”, I think you are missing Goodhart’s law here: “When a measure becomes a target, it ceases to be a good measure.” If people are aware of these tests, they will make sure during their manipulation that the test results come back negative.


    • Alas, you’re probably correct. The editorial that accompanied the paper raised a similar point. I do wonder if there is some point at which you can make faking the statistics so cumbersome that it starts to discourage fraud anyway, but I guess as long as the incentives are there, someone will find a way around any standards put into place.

