I Got a Problem, Don’t Know What to Do About It

Help and feedback request! This past weekend I encountered an interesting situation: I discovered that a study I had used to help make a point in several posts over the years has come under some scrutiny (full story at the bottom of the post). I have often blogged about meta-science, but this whole incident got me thinking about meta-blogging, and what the responsibility of someone like me is when they find out a study they’ve leaned on may not be as good as they thought it was. I’ve been poking around the internet for a few days, and I really can’t find much guidance on this.

I decided to put together a couple of quick poll questions to gauge people’s feelings on this. Given that I tend to have some incredibly savvy readers, I would also love to hear more lengthy opinions, either in the comments or sent to me directly. The polls will stay open for a month, and I plan on doing a write-up of the results. The goal of these poll questions is to establish a baseline for how people feel about error correction, as I completely acknowledge the specifics of a situation may change people’s views. If you have strong feelings about what would make you take error correction more or less seriously, please leave them in the comments!

Why I’m asking (aka the full story)

This past weekend I encountered a rather interesting situation that I’m looking for some feedback on. I was writing my post for week 6 of the Calling BS read-along, and remembered an interesting study that found that people were more likely to find stories with “science pictures” or graphs credible than those that were just text. It’s a study I had talked about in one of my Intro to Internet Science posts, and I have used it in presentations to back up my point that graphs are something you should watch closely. Since the topic of the post was data visualization and the study seemed relevant, I included it in the intro to my write-up.

The post had only been up for a few hours when I got a message from someone tipping me off that the lab the study came from was under scrutiny for some questionable data/research practices. They thought I might want to review the evidence and consider removing the reference to the study from my post. While the study I used doesn’t appear to be one of the ones being reviewed at the moment, I did find the allegations against the lab concerning. Since the post’s argument didn’t really change without the citation, I edited the post to remove the citation and replaced it with a note alerting people that the paragraph had been modified. I put a full explanation at the bottom of the post that included links to a summary of the issue and the research lab’s response.

I didn’t stop thinking about it though. There’s not much I could have done about using the study originally….I started citing it almost a full year before concerns were raised, and the “visuals influence perception” point seemed reasonable. I’ll admit I missed the story about the concerns with the research group, but even if I’d seen it I don’t know if I would have remembered that they were the ones who had done that study. Now that I know, though, I’ve been mulling over the best course of action in situations like this. As someone who at least aspires to blog about truth and accuracy, I’ve always felt that I should watch my own blogging habits pretty carefully. I didn’t really question removing the reference, as I’ve always tried to update/modify things when people raise concerns. I also don’t modify posts after they’ve been published without noting that I’ve done so, other than fixing small typos. I feel good about what I did with that part.

What troubled me more was the question of “how far back do I go?” As I mentioned, I know I’ve cited that study previously. I know of at least one post where I used it, and there may be more. Given that my Intro to Internet Science series is occasionally assigned by high school teachers, I feel I have some obligation to go a little retro on this.


Current hypothesis (aka my gut reaction)

My gut reaction here is that I should probably start keeping an updates/corrections/times-I-was-wrong page just to discuss these issues. While I think notations should be made in the posts themselves, some of them warrant their own discussion. If I’m going to blog about where others go wrong, having a dedicated place to discuss where I go wrong seems pretty fair. I would also likely put in some links to my “from the archives” columns to have a repository for posts that have more updated versions. Not only would this give people somewhere easy to look for updates and add some transparency to my own process and weaknesses, it would also probably give me a better overview of where I tend to get tripped up and help me improve. If I get really crazy I might even start doing root cause analysis investigations into my own missteps. Thoughts on this or examples of others doing this would be appreciated.


5 Replication Possibilities to Keep in Mind

One of the more basic principles of science is the idea of replication or reproducibility. In its simplest form, this concept is pretty obvious: if you really found the phenomenon you say you found, then when I look where you looked I should be able to find it too. Most people who have ever taken a science class are at least theoretically familiar with this concept, and it makes a lot of sense…..if someone tells you the moon is green, and no one else backs them up, then we know the observation of the moon being green is actually a commentary on the one doing the looking as opposed to a quality of the moon itself.

That being said, this is yet another concept that everyone seems to forget the minute they see a headline saying “NEW STUDY BACKS UP IDEA YOU ALREADY BELIEVED TO BE TRUE”. While every field faces different challenges and has different standards, scientific knowledge is not really a binary “we know it or we don’t” thing. Some studies are stronger than others, but in general, the more studies that find the same thing, the stronger we consider the evidence. With those nuances in mind, I wanted to go over some of the possibilities we have when someone tries to go back and replicate someone else’s work.
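To make the “more studies, stronger evidence” point a bit more concrete, here’s a minimal sketch of how confidence in a finding can accumulate with each successful replication, using Bayes’ rule. Every number in it (the prior, the statistical power, the false positive rate) is a made-up assumption for illustration, not a claim about any real study:

```python
# A minimal sketch of why replications strengthen evidence, via Bayes' rule.
# All three numbers below are illustrative assumptions, not real estimates.

prior = 0.10   # assumed prior probability the effect is real
power = 0.80   # assumed chance a study finds the effect if it IS real
alpha = 0.05   # chance a study "finds" the effect if it ISN'T (false positive)

p_real = prior
for study in range(1, 4):  # three successive successful replications
    # Bayes' rule: P(effect is real | another positive result)
    p_real = (power * p_real) / (power * p_real + alpha * (1 - p_real))
    print(f"After study {study}: P(effect is real) = {p_real:.2f}")
```

Even starting from a skeptical 10% prior, three clean replications push the probability above 99% under these (admittedly tidy) assumptions. That’s the basic logic behind trusting a replicated result more than a splashy one-off.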

Quick note: in general, replication is really applicable to currently observable/ongoing phenomena. The work of some fields can rely heavily on modeling future phenomena (see: climate science), and obviously future predictions cannot be replicated in the traditional sense. Additionally, attempts to explain past events (see: evolution) often cannot be replicated, as those events have already occurred.

Got it? Great….let’s talk in generalities! So what happens when someone tries to replicate a study?

  1. The replication works. This is generally considered a very good thing. Either someone attempted to redo the study under similar circumstances and confirmed the original findings, or someone undertook an even stronger study design and still came to the same conclusions. This is what you want to happen. Your case is now strong. This is not always 100% definitive, as different studies could replicate the same error over and over again (see the ego depletion studies that were replicated 83 times before being called into question), but in general, this is a good sign.
  2. You get a partial replication. For most science, the general trajectory of discovery is one of refining ideas. In epidemiology, for example, you start with population level correlations, and then try to work your way back to recommendations that can be useful on an individual level. This is normal. It also means that when you try to replicate certain findings, it’s totally normal to find that the original paper had some signal and some noise. For example, a few months ago I wrote about a headline-grabbing study that claimed that women’s political, social and religious preferences varied with their monthly cycle. The original study grabbed headlines in 2012, and by the time I went back and looked at it in 2016, further studies had narrowed the conclusions substantially. The claims were now down to “facial preferences for particular politicians may vary somewhat based on monthly cycles, but fundamental beliefs do not”. I would like to see the studies replicated using the 2016 candidates to see if any of the findings bear out, but even without this you can see that subsequent studies narrowed the findings quite a bit. This is not necessarily a bad thing, but actually a pretty normal process. Almost every initial splashy finding will undergo some refinement as it continues to be studied.
  3. The study is disputed. Okay, this one can meander off the replication path a bit. When I say “disputed” here, I’m referencing the phenomenon that occurs when one study’s findings are called into question by another study that found something different, but the two used totally different methods to get there and now no one knows what’s right. Slate Star Codex has a great overview of this in Beware the Man of One Study, and a great example in Trouble Walking Down the Hallway. In the second post he covers two studies, one that shows a pro-male bias in hiring a lab manager and one that shows a pro-female bias in hiring a new college faculty member. Everyone used the study whose conclusions they liked better to bolster their case while calling the other study “discredited” or “flawed”. The SSC piece breaks it down nicely, but it’s actually really hard to tell what happened here and why these studies would be so different. To note: neither came to the conclusion that no one was biased. Maybe someone switched the data columns on one of them.
  4. The study fails to replicate. As my kiddo’s preschool teacher would say, “that’s sad news”. This is what happens when a study is performed under the same conditions and the effect goes away. For a good example, check out the Power Pose/Wonder Woman study, where a larger sample size undid the original findings….though not before the TED talk went viral and the book the researcher wrote about it got published. This isn’t necessarily bad either; thanks to our reliance on p-value thresholds we expect some of this (see the sketch after this list), but in some fields it has become a bit of a crisis.
  5. Fraud is discovered. Every possibility I mentioned above assumes good faith. However, some of the most bizarre scientific fraud situations get discovered because someone attempts to replicate a previously published study and can’t do it. Not replicate the findings, mind you, but the experimental setup itself. Most methods sections are dense enough that any setup can sound plausible on paper, but it’s in practice that anomalies appear. For example, in the case of a study about how to change people’s views on gay marriage, a researcher realized the study setup was prohibitively expensive when he tried to replicate the original. While straight up scientific fraud is fairly rare, it does happen. In these cases, attempts at replication are some of our best allies at keeping everyone honest.
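On the “we expect some of this” point in #4: with a p < .05 cutoff, roughly 1 in 20 studies of an effect that doesn’t exist will come up significant anyway, and a larger replication usually makes such noise-driven results vanish. Here’s a rough simulation of that idea, assuming a simple two-group t-test; the sample sizes and the number of simulated studies are arbitrary choices for illustration:

```python
# Simulating point #4: small "significant" findings that fail to replicate.
# The true effect is zero by construction, so every significant result is noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def run_study(n):
    """Compare two groups drawn from the SAME distribution; return the p-value."""
    group_a = rng.normal(loc=0, scale=1, size=n)
    group_b = rng.normal(loc=0, scale=1, size=n)
    return stats.ttest_ind(group_a, group_b).pvalue

# 1,000 small "original" studies: about 5% hit p < .05 by chance alone.
small_p = np.array([run_study(15) for _ in range(1000)])
false_positives = int((small_p < 0.05).sum())
print(f"Small studies reaching p < .05: {false_positives / 1000:.1%}")

# Rerun each false positive with a much bigger sample: since there is no real
# effect, only ~5% stay significant, i.e., almost all fail to replicate.
big_p = np.array([run_study(200) for _ in range(false_positives)])
print(f"False positives still significant on replication: {(big_p < 0.05).mean():.1%}")
```

Under these assumptions, the small studies “discover” an effect about 5% of the time, and the larger replications wipe almost all of those discoveries back out. That’s the expected, non-scandalous version of a failure to replicate.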

It’s important to note here that not all issues fall neatly into one of these categories. For example, in the Women, Ovulation and Voting study I mentioned in #2, two of the research teams had quite the back and forth over whether or not certain findings had been replicated. In an even more bizarre twist, when the fraudulent study from #5 was actually carried out, the findings stood (still waiting on subsequent studies!). For psychology, the single biggest criticism of the replication project (which claims #4) is that its replications aren’t fair and thus it’s actually #3 or #2.

My point here is not necessarily that any one replication effort falls obviously into one bucket or the other, but to point out that there is a range of possibilities available. As I said in the beginning, very few findings will end up going in a “totally right” or “total trash” bucket, at least at first. However, it’s important to realize any time you see a big exciting headline that subsequent research will almost certainly add or subtract something from the original story. Wheels of progress and all that.