Does Popularity Influence Reliability? A Discussion

Welcome to the “Papers in Meta Science” series, where we walk through published papers that use science to scrutinize science. At the moment we’re taking a look at the paper “Large-Scale Assessment of the Effect of Popularity on the Reliability of Research” by Pfeiffer and Hoffmann. Read the introduction here, and the methods and results section here.

Well hi! Welcome back to our review of how scientific popularity influences the reliability of results. When last we left off, we had established that the popularity of an interaction itself did not affect the reliability of results, but the popularity of the proteins involved did affect the reliability of results naming them as partners. In other words, you can identify the popular kids pretty well, but figuring out who they are actually connected to gets a little tricky. People like being friends with the popular kids.

Interestingly, the overall results showed a much stronger effect for the “multiple testing” hypothesis than for the “inflated error” hypothesis, meaning that many of the false positives seem to come from the extra teams running many different experiments and racking up a predictable number of false positives. More overall tests = more overall false positives. This effect was 10 times stronger than the inflated error effect, though the latter was still present.
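If the “more tests = more false positives” arithmetic feels abstract, here’s a minimal simulation of it. The 5% threshold and the test counts are my own illustrative numbers, not anything from the paper:

```python
import random

random.seed(0)  # reproducible toy run

# If every test of a truly nonexistent interaction has a 5% chance of a
# false positive, spurious findings pile up in direct proportion to the
# number of tests run.
ALPHA = 0.05  # conventional significance threshold

for n_tests in (10, 100, 1000):
    false_positives = sum(random.random() < ALPHA for _ in range(n_tests))
    print(f"{n_tests:5d} tests -> {false_positives:3d} false positives "
          f"(expected about {ALPHA * n_tests:.0f})")
```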

So what should we do here? Well, a few things:

  1. Awareness: Researchers should be extra aware that running lots of tests on a new and interesting protein could result in less accurate results.
  2. Encourage novel testing: Continue to encourage people to branch out in their research as opposed to giving more funding to those researching more popular topics.
  3. Informal research wikis: This was an interesting idea I hadn’t seen before: use the Wikipedia model to let researchers note things they had tested that didn’t pan out. As I mentioned when I reviewed the Ioannidis paper, there’s no easy way of knowing how many teams are working on a particular question at any given time. Setting up a less formal place for people to check what other teams are doing may give researchers better insight into how many false positives they can expect to see.

Overall, it’s also important to remember that this is just one study and that findings in other fields may be different. It would be interesting to see a similar study repeated in a social science field or something similar, to see whether public interest makes results better or worse.

Got another paper you’re interested in? Let me know!

Does Popularity Influence Reliability? Methods and Results

Welcome to the “Papers in Meta Science” series, where we walk through published papers that use science to scrutinize science. At the moment we’re taking a look at the paper “Large-Scale Assessment of the Effect of Popularity on the Reliability of Research” by Pfeiffer and Hoffmann. Read the introduction here.

Okay, so when we left off last time, we were discussing the idea that findings in (scientifically) popular fields are less likely to be reliable than those in less popular fields. The theory goes that popular fields produce more false positives (due to an overall higher number of experiments being run), or that increased competition encourages things like p-hacking and data dredging on the part of research teams, or both.

Methods: To test this hypothesis empirically, the researchers decided to look at the exciting world of protein interactions in yeast. While this is not what most people think of when they think of “popular” research, it’s actually a great choice. Since the general public is mostly indifferent to protein interactions, all the popularity studied here is purely scientific. Any bias the researchers picked up will have come from their scientific training, not their own preconceived beliefs.

To get data on protein interactions, the researchers pulled large data sets that cast a wide net and smaller data sets that targeted specific proteins, then compared the results between the two. The thought was that the large data sets tested huge numbers of interactions all using the same algorithm, so they would be less likely to be biased by human judgement and could therefore be used to confirm or cast doubt on the smaller experiments that required more human intervention.

Thanks to the wonders of text mining, the sample size here was HUGE – about 60,000 statements/conclusions made about 30,000 hypothesized interactions. The smaller data sets had about 6,000 statements/conclusions about 4,000 interactions.
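To make that comparison concrete, here’s a rough sketch of the basic logic as I understand it, with made-up protein names and a deliberately tiny toy data set: treat the high-throughput screens as a reference set and ask what fraction of the small-scale statements show up there.

```python
# Toy illustration of the confirmation logic; all names and data are invented.
high_throughput_hits = {("YFG1", "YFG2"), ("YFG1", "YFG3"), ("ABC1", "XYZ9")}

small_scale_statements = [
    ("YFG2", "YFG1"),   # confirmed by a large screen (order shouldn't matter)
    ("YFG1", "YFG3"),   # confirmed
    ("YFG1", "QRS7"),   # never seen in a large screen
]

def normalize(pair):
    """Order each pair so (A, B) and (B, A) count as the same interaction."""
    return tuple(sorted(pair))

reference = {normalize(p) for p in high_throughput_hits}
confirmed = sum(normalize(p) in reference for p in small_scale_statements)
print(f"confirmation rate: {confirmed}/{len(small_scale_statements)} "
      f"= {confirmed / len(small_scale_statements):.0%}")
```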

Results: The overall results showed some interesting differences in confirmation rates.

Basically, the more popular an interaction, the more often the interaction was confirmed. However, the more popular an interaction partner was, the less often interactions involving it were confirmed. Confused? Try this analogy: think of the popular proteins as the popular kids in school. The popular kids were fairly easy to identify, and researchers got the popular kids right a lot of the time. However, once they tried to turn that around and figure out who interacted with the popular kids, they started getting a lot of false positives. Just like the less-cool kids in high school might overplay their relationship to the cooler kids, many researchers tried to tie their new findings to previously recognized popular findings.
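To see why the partner side goes wrong, here’s a toy simulation (entirely my own construction, not the paper’s model). Every real partner is found, but each truly non-interacting candidate still slips through 5% of the time, so the more candidates get thrown at a popular protein, the more of its reported partner list is noise:

```python
import random

random.seed(1)

ALPHA = 0.05  # per-test false positive rate (illustrative)

def reported_partners(true_partners, candidates_tested):
    """Assume every real partner is detected; each non-interacting
    candidate still produces a false positive at rate ALPHA."""
    reported = set(true_partners)
    for i in range(candidates_tested):
        if random.random() < ALPHA:
            reported.add(f"spurious_{i}")
    return reported

true_partners = {"P1", "P2", "P3"}
for n in (20, 200, 2000):
    hits = reported_partners(true_partners, n)
    print(f"{n:4d} candidates tested -> {len(hits):3d} reported partners, "
          f"only {len(true_partners)} real")
```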

This held true for both the “inflated error effect” and the “multiple testing effect”. In other words, having a popular protein involved made the individual statements or conclusions less likely to be validated, and also produced more interactions that were found once but never replicated. This held true across all types of experimental techniques, and across databases curated by experts as well as broader searches.
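As I read it, that gives two separate things you could measure from a pile of text-mined statements: how often individual statements get validated (the inflated error side), and how many interactions get reported exactly once and then never confirmed (the multiple testing side). A rough sketch, using an invented statement log rather than the paper’s actual data:

```python
from collections import Counter

# Invented statement log: (interaction, confirmed by a large screen?)
statements = [
    (("YFG1", "YFG2"), True),
    (("YFG1", "YFG2"), True),
    (("YFG1", "QRS7"), False),   # reported once, never confirmed
    (("ABC1", "XYZ9"), True),
    (("ABC1", "TUV4"), False),   # reported once, never confirmed
]

# Inflated-error proxy: how often individual statements are validated.
validated = sum(ok for _, ok in statements)
print(f"per-statement validation rate: {validated}/{len(statements)}")

# Multiple-testing proxy: interactions reported exactly once and never
# confirmed anywhere in the data.
report_counts = Counter(pair for pair, _ in statements)
confirmed_pairs = {pair for pair, ok in statements if ok}
one_offs = [pair for pair, n in report_counts.items()
            if n == 1 and pair not in confirmed_pairs]
print(f"one-off, never-confirmed interactions: {len(one_offs)}")
```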

We’ll dive into the conclusions we can draw from this next week.

Does Popularity Influence Reliability? An Introduction

Well hi there! Welcome to the next edition of “Papers in Meta Science”, where I walk through interesting papers that use science to scrutinize science. During the first go-around we looked at the John Ioannidis paper “Why Most Published Research Findings Are False”, and this time we’re going to look at a paper that attempted to test one of that paper’s key assertions: that “hot” scientific fields produce less trustworthy results than less popular fields. The paper is called “Large-Scale Assessment of the Effect of Popularity on the Reliability of Research”, and was published in PLoS ONE by Pfeiffer and Hoffmann in 2009. They sought to test empirically whether or not this particular claim was true, using the field of protein interactions.

Before we get to the good stuff though, I expect this series to have three parts:

  1. The Introduction/Background. You’re reading this one right now.
  2. Methods and Results
  3. Further Discussion

Got it? Let’s go!

Introduction: As I mentioned up front, one of the major goals of this paper was to confirm or refute the mathematical argument put forth by John Ioannidis that “hot” fields are more likely to produce erroneous results than less popular ones. There are two basic theories as to why this could be the case:

  1. Popular fields create competition, and competing teams are more likely to be incentivized to cut corners or do whatever it takes to get positive results (Ioannidis Corollary 5).
  2. Lots of teams working on a problem means lots of hypothesis testing, and lots of tested hypotheses means more false positives due to random chance (Ioannidis Corollary 6). A quick bit of arithmetic after this list shows how fast that adds up.
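To put a rough number on theory #2 (my arithmetic, not the paper’s): if n teams independently test a hypothesis that is actually false, each with a 5% false positive rate, the chance that at least one team “finds” something is 1 - (1 - 0.05)^n, and that climbs fast:

```python
# Chance that at least one of n independent teams gets a false positive,
# assuming each test has a 5% false positive rate. Illustrative numbers only.
ALPHA = 0.05

for n_teams in (1, 5, 20, 50):
    p_any = 1 - (1 - ALPHA) ** n_teams
    print(f"{n_teams:2d} teams -> P(at least one false positive) = {p_any:.0%}")
```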

While Pfeiffer and Hoffmann don’t claim to be able to differentiate between those two mechanisms, they were hopeful that by looking at the evidence they could figure out whether the effect was real and, if it was, perhaps estimate its magnitude. For their scrutiny, they chose the field of protein interactions in yeast.

This may seem a little counter-intuitive, as almost no definition of “popular science” conjures up pictures of protein interactions. However, it is important to remember that the point of this paper was to examine scientific popularity, not mentions in the popular press. Since most of us probably already assume that headline-grabbing research can cause its own set of bias problems, it’s interesting to consider a field that doesn’t grab headlines. Anyway, despite its failure to lead the 6 o’clock news, it turns out that the world of protein interactions actually does have a popularity issue. Some proteins and their corresponding genes are studied far more frequently than others, and this makes it a good field for examination. If a field like this can fall prey to the effect of multiple teams, then we can assume that more public-oriented fields could as well.

Tune in next week to see what we find out!