One of the more basic principles of science is the idea of replication, or reproducibility. In its simplest form, the concept is pretty obvious: if you really found the phenomenon you say you found, then when I look where you looked I should be able to find it too. Most people who have ever taken a science class are at least theoretically familiar with this idea, and it makes a lot of sense… if someone tells you the moon is green, and no one else backs them up, then we know the observation of the moon being green is actually a commentary on the one doing the looking as opposed to being a quality of the moon itself.
That being said, this is yet another concept that everyone seems to forget the minute they see a headline saying “NEW STUDY BACKS UP IDEA YOU ALREADY BELIEVED TO BE TRUE”. Scientific knowledge is not really a binary “we know it or we don’t” thing. Some studies are stronger than others, but in general, the more studies that find the same thing, the stronger we consider the evidence. While every field faces different challenges and has its own standards and nuances, I wanted to go over some of the possibilities that arise when someone tries to go back and replicate someone else’s work.
Quick note: in general, replication really applies to currently observable or ongoing phenomena. Some fields rely heavily on modeling future phenomena (see: climate science), and obviously future predictions cannot be replicated in the traditional sense. Additionally, attempts to explain past events (see: evolution) often cannot be replicated, as those events have already occurred.
Got it? Great… let’s talk in generalities! So what happens when someone tries to replicate a study?
- The replication works. This is generally considered a very good thing. Either someone redid the study under similar circumstances and confirmed the original findings, or someone undertook an even stronger study design and reached the same conclusions. This is what you want to happen: your case is now strong. It’s not always 100% definitive, as different studies can replicate the same error over and over again (see the ego depletion studies, which were replicated 83 times before being called into question), but in general this is a good sign.
- You get a partial replication. For most science, the general trajectory of discovery is one of refining ideas. In epidemiology, for example, you start with population-level correlations and then try to work your way back to recommendations that are useful at the individual level. This is normal. It also means that when you try to replicate certain findings, it’s totally normal to discover that the original paper contained some signal and some noise. For example, a few months ago I wrote about a headline-grabbing study claiming that women’s political, social, and religious preferences varied with their monthly cycle. The original study grabbed headlines in 2012, and by the time I went back and looked at it in 2016, further studies had narrowed the conclusions substantially. The claims were down to “facial preferences for particular politicians may vary somewhat based on monthly cycles, but fundamental beliefs do not”. I would like to see the studies replicated using the 2016 candidates to see if any of the findings bear out, but even without that you can see that subsequent studies narrowed the findings quite a bit. This isn’t necessarily a bad thing; it’s actually a pretty normal process. Almost every initial splashy finding will undergo some refinement as it continues to be studied.
- The study is disputed. Okay, this one can meander off the replication path a bit. When I say “disputed” here, I’m referring to what happens when one study’s findings are called into question by another study that found something different, but the two used totally different methods to get there and now no one knows what’s right. Slate Star Codex has a great overview of this in Beware the Man of One Study, and a great example in Trouble Walking Down the Hallway. In the second post he covers two studies: one that shows a pro-male bias in hiring a lab manager, and one that shows a pro-female bias in hiring a new college faculty member. Everyone used the study whose conclusions they liked better to bolster their case while calling the other study “discredited” or “flawed”. The SSC piece breaks it down nicely, but it’s actually really hard to tell what happened here and why these studies came out so differently. To note: neither came to the conclusion that no one was biased. Maybe someone switched the data columns on one of them.
- The study fails to replicate. As my kiddo’s preschool teacher would say, “that’s sad news”. This is what happens when a study is performed under the same conditions and the effect goes away. For a good example, check out the Power Pose/Wonder Woman study, where a larger sample size undid the original findings… though not before the TED talk went viral and the book the researcher wrote about it got published. This isn’t necessarily bad either; given our reliance on p-value thresholds, we expect some false positives to slip through, but in some fields non-replication has become a bit of a crisis.
- Fraud is discovered. Every possibility I mentioned above assumes good faith. However, some of the most bizarre scientific fraud cases get discovered because someone attempts to replicate a previously published study and can’t do it. Not replicate the findings, mind you, but the experimental setup itself. Most methods sections are dense enough that any setup can sound plausible on paper, but it’s in practice that anomalies appear. For example, in the case of a study about how to change people’s views on gay marriage, a researcher realized the setup was prohibitively expensive when he tried to replicate the original. While straight-up scientific fraud is fairly rare, it does happen. In these cases, attempts at replication are some of our best allies in keeping everyone honest.
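The point about p-value thresholds above can be made concrete with a quick simulation. This is a minimal sketch: the base rate of true effects, the statistical power, and the significance cutoff below are all illustrative assumptions I chose for the example, not estimates from any particular field. It shows why, even with everyone acting in good faith, a chunk of published “significant” findings will fail to replicate simply because a 0.05 threshold lets some false positives through:

```python
import random

random.seed(0)

ALPHA = 0.05       # significance threshold (chance a null effect looks "real")
TRUE_RATE = 0.10   # assumption: only 10% of tested effects actually exist
POWER = 0.5        # assumption: a study detects a real effect 50% of the time
N_EFFECTS = 1000   # hypothetical effects investigated

# Original studies: an effect gets "published" if it reaches significance.
published = []
for _ in range(N_EFFECTS):
    is_real = random.random() < TRUE_RATE
    p_significant = POWER if is_real else ALPHA
    if random.random() < p_significant:
        published.append(is_real)

# Replication attempts, run with the same power and threshold.
replicated = sum(
    1 for is_real in published
    if random.random() < (POWER if is_real else ALPHA)
)

print(f"published 'positive' findings: {len(published)}")
print(f"successful replications: {replicated} "
      f"({replicated / len(published):.0%})")
```

With these made-up numbers, roughly half the published findings are false positives, and the false ones almost never replicate, so the overall replication rate lands well below 100% without any fraud or sloppiness involved. Tweaking `TRUE_RATE` and `POWER` shows how strongly the replication rate depends on the field, not just on the honesty of its researchers.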
It’s important to note here that not all issues fall neatly into one of these categories. For example, in the women, ovulation, and voting study I mentioned in #2, two of the research teams had quite the back and forth over whether or not certain findings had been replicated. In an even more bizarre twist, when the fraudulent study from #5 was actually carried out, the findings stood (still waiting on subsequent studies!). For psychology, the single biggest criticism of the replication project (which claims #4) is that its replications aren’t fair, and thus the situation is actually #3 or #2.
My point here is not that any one replication effort falls obviously into one bucket or another, but that there is a range of possibilities. As I said in the beginning, very few findings end up in a “totally right” or “total trash” bucket, at least at first. However, it’s important to realize that, any time you see a big exciting headline, subsequent research will almost certainly add or subtract something from the original story. Wheels of progress and all that.