Big news in the meta-science world last week, when Ben Goldacre (of Bad Science and Bad Pharma fame) released some new studies about ethical standards in scientific publishing. The studies, "COMPare: a prospective cohort study correcting and monitoring 58 misreported trials in real time" and "COMPare: Qualitative analysis of researchers' responses to critical correspondence on a cohort of 58 misreported trials", looked at what happened when study authors didn't follow the ethical or quality standards set forward by the journals they published in. The first paper examined the journals' responses to the issues that were pointed out; the second looked at the responses of the paper authors themselves. Goldacre and his team found these papers, identified the errors, then pointed them out to the journals and watched what happened. Unfortunately, it went less well than you would hope.
Before we get into the results though, I want to give a bit of context. Over the past few years, there's been a lot of debate over how to ethically "call out" bad publication patterns. The Calling BS guys have a whole section about this in their class, "The Ethics of Calling Bullshit", which I wrote about here. To highlight concerns about scientific publishing, people have published fake papers, led replication efforts, and developed statistical tools to try to ferret out bad actors. In all of these cases, concerns have been raised about the ethics of each approach. People have complained about mob mentalities, picking on individual researchers, taking advantage of people's trust, or using these efforts to advance one's own career. "Science is already self-correcting," the complaint goes, "no need to make a bigger deal out of it."
I have to think Goldacre had this in mind when he designed this study. His approach is fascinating in that it shares the blame between journals and authors, and also focuses heavily on people's ability to respond to criticism. Journals tend to point to their ethical/quality standards as proof that they care about the quality of the studies they publish, but it is often unclear how those standards are actually enforced. Additionally, issues with a journal's standards or enforcement are a big deal with widespread impact. Finding a study author who made a mistake or committed fraud is useful, but still only impacts the work of the person in question. Finding out a journal has a systemic issue can have ripple effects on hundreds of studies and a whole field of research. To highlight this fact, Goldacre and his team specifically looked at some of the biggest journals out there: the New England Journal of Medicine (NEJM), the Journal of the American Medical Association (JAMA), the Annals of Internal Medicine, the British Medical Journal (BMJ), and the Lancet. No small fish in this pond.
In the first study, the journals and their responses were the focus. Goldacre and his team looked at 67 trials and found that 58 had issues. The metrics they were looking for were simple: did the papers report the outcomes laid out in their publicly available pre-trial registrations, or did they explain any changes from their original plan? These are the basic requirements laid out by the CONSORT guidelines (found here), which all the journals said they endorse. Basic findings:
- Only 40% of their letters were published
- JAMA and NEJM published NONE of the letters they received
- Most letters were published online only
- Letters that were published in the hard copy journals were often delayed by months
Now, the more concerning findings were grouped by the researchers into themes:
- Conflicts with CONSORT standards: Despite saying they endorsed the CONSORT standards, when instances of non-compliance were pointed out to them, the journals said they didn't really agree with the standard or didn't think it was necessary.
- Timing of pre-specification/registries in general: Several journals objected that the trial pre-registrations were actually done too early, or were too unreliable to go by.
- Rhetoric: This was my favorite category. This is where Goldacre et al put complaints like "space constraints prevented us from adding a reason why we changed our outcome metric", with a note that the authors apparently had plenty of space to add new and interesting outcomes. They also got some "we applaud your goal but think you're going about this poorly".
- Journal processes: This one was weird too. Journals clarified that they had asked authors to do things by the book, despite Goldacre et al showing that the authors weren't actually doing those things. Odd defense.
- Placing responsibility on others: Sometimes the journals claimed it was actually up to the reader to go check the pre-registration. Sometimes they said it was the pre-registration databases that were wrong, not them. The Lancet didn't reply at all and just let the authors of the paper in question respond.
The paper also goes on to summarize the criticism they got from journal editors once reporters started asking questions. Just scroll down to the tables in the paper here to read all the gory details. The summary of the responses from individual journals was pretty interesting too:
- NEJM: Published no letters, said they never required authors to adhere to CONSORT. Provided journalists with a rebuttal they had not sent to Goldacre et al.
- JAMA: Published no letters, said the letters didn't have enough detail to find the errors. Goldacre points out that letters have a word limit, and that they linked to their full complaints on their website.
- Lancet: Published almost every letter, but its editors didn’t reply to anything.
- BMJ: Published all letters and issued a correction for one study out of three.
- Annals: Got into a weird fight with the COMPare folks that has its own timeline in the paper.
Overall, the results suggest there is still a lot of work to be done in getting journals to adhere to clear and transparent standards. The authors suggested that something like CONSORT should perhaps maintain separate lists of journals that merely "endorse" the standards and those that agree to "enforce" them.
They also noted that they were quite surprised by the number of responses saying that trial pre-registrations were inaccurate or not useful, because journals were actually one of the driving forces behind getting those registries set up in the first place. The idea that registrations were useless or not the journals' problem was a very troubling rewrite of history.
Interestingly, the COMPare folks noted that for all the back and forth, they had a feeling their findings might actually be making a difference. They plan on doing a follow-up study to see if anything's changed. Something about knowing people are watching does tend to change behavior.
Alright, I've gone on a bit on this one, so I'll wait until next week to review the second paper.