The True Crime Replication Crisis: Intro

September 14, 2025, updated November 24, 2025 / bs king / 2 Comments

I started stats and data blogging back in 2012. Those were heady days, as the scientific replication crisis (which called into question the validity of many published scientific findings) was just being uncovered and would indeed first be called a crisis in November of that year. It was a fun time to be a blogger who knew a thing or two about research, and while mine was always a little niche blog with a small but excellent set of readers, I did get the occasional nod from bigger accounts for some of my work. Topics to comment on were plentiful, a good number of people were interested, and it was overall a good way to improve my scientific communication skills.

Over the next decade+, life got busier, my health got more difficult, and my blogging trailed off. A lot of people, even outside stats and research fields, knew to question numbers, and blogging was replaced by shorter-form social media. I was pretty content just hanging out on the sidelines. I didn’t much expect to revisit that, until a rather unexpected event got me thinking about data blogging and the replication crisis again: I found myself near the epicenter of a true crime shit storm.

If you’ve followed this blog long enough to have some familiarity with me personally and have any familiarity with true crime, you might be able to guess which case. I don’t plan on publicly naming it due to the extreme toxicity around it, but if you’re a regular feel free to shoot me a message and we can chat privately. For everyone else I will only reassure you that neither I nor anyone close to me was directly involved, but I was physically extremely close to the location of the crime and most of the major players, enough that it was extremely hard to ignore even if we’d wanted to. It’s a weird feeling to watch national media descend on mundane places you’ve been to hundreds of times, and to suddenly have people commenting on your town as though it’s their new favorite TV show. We couldn’t check in for appointments without people catching their breath when they saw the town name, and everyone wanted to know our opinion. It was WEIRD. I also got a firsthand look at a genre of media I hadn’t spent much time with: true crime. It was rather eye-opening, but I couldn’t shake the feeling that I had seen a lot of these issues before. I started looking around at other true crime cases to see how they were handled in the media, and I slowly put it together. This was the replication crisis all over again. Many of the same errors, many of the same issues, sucking a whole different group of people in with various logical fallacies, questionable motivations, and creative data twisting.

I couldn’t find anyone else drawing this comparison, so I decided I needed to blog about it. I want to write the guide I wish I’d had before I had to assess a true crime case from the ground up.

Ready? Ok, let’s go!

So how are we going to do this?

I’ve been trying to figure out how to lay out all the reasons for the replication crisis, and really the most comprehensive thing I’ve found is the Wikipedia page. I’ll be using this archive page from September 10th, just so no one rearranges the article on me halfway through this series. I’m mostly going to focus on the causes and how I think statistical issues actually apply more broadly to the way we evaluate all evidence even outside of traditional scientific study.

I will not, generally speaking, be commenting on court procedures or rules of evidence etc. There are many other people much better placed to do that than I. What I will be covering is how I’ve covered data here in the past: how should you as a media consumer evaluate a claim you hear? If you watch a true crime documentary or listen to a podcast, what should you look for? How should you think about the different claims? One of the reasons I’m not naming the specific case I got familiar with is because I think most of this should apply to every case you hear about.

But wait I’m not totally sure what the Replication Crisis is!

Oh, yeah. Sorry about that, I should have clarified earlier. The replication crisis, broadly speaking, was the slow realization that in many scientific fields published research couldn’t always be replicated. Replication is a cornerstone of scientific research, and a study that fails to replicate is a bad sign that the initial findings may not have been all that correct. For example, if I tell you that on average men are taller than women, it shouldn’t matter if you get a sample from Montana, Maine or Minnesota. If your sample is large enough and random, you should find the same thing. The problem that started to occur is that people would get very large and compelling findings that would disappear in subsequent studies. There were a lot of reasons for this, which we will get into going forward, but also feel free to search “replication crisis” on this blog for a lot of my prior writing on the topic. Here’s a sample.
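To make that contrast a little more concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not real data: the effect sizes, the sample sizes, the number of studies, and the function names `mean_difference` and `simulate` are all made up. The first part draws three large random samples with a big true difference (standing in for the height example), and the second part runs many small studies of a small true effect, keeps only the ones that clear a rough significance cutoff, and then reruns each of those once.

```python
import random
import statistics


def mean_difference(rng, true_diff, n):
    """Difference in sample means between two groups of size n (SD of 1 in both)."""
    group_a = [rng.gauss(true_diff, 1.0) for _ in range(n)]
    group_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.fmean(group_a) - statistics.fmean(group_b)


# Part 1: a big true difference measured with large random samples.
# Three seeds stand in for three different places to sample from.
state_samples = [mean_difference(random.Random(seed), 1.5, 1000) for seed in range(3)]


def simulate(true_diff=0.2, n=20, studies=2000, seed=1):
    """Run many small studies, keep the 'splashy' ones, then replicate each once."""
    rng = random.Random(seed)
    standard_error = (2.0 / n) ** 0.5   # both groups have a known SD of 1
    cutoff = 1.96 * standard_error      # rough stand-in for p < 0.05
    originals, replications = [], []
    for _ in range(studies):
        estimate = mean_difference(rng, true_diff, n)
        if estimate > cutoff:           # only the big positive results get noticed
            originals.append(estimate)
            replications.append(mean_difference(rng, true_diff, n))
    return statistics.fmean(originals), statistics.fmean(replications), len(originals)


original_mean, replication_mean, published = simulate()

print("large-sample differences:", [round(d, 2) for d in state_samples])
print(f"small studies that cleared the cutoff: {published}")
print(f"average effect in those studies: {original_mean:.2f}")
print(f"average effect when each was rerun: {replication_mean:.2f}")
```

Under these assumptions the large samples all point the same direction, while the small studies that cleared the cutoff overstate the true effect of 0.2 and shrink when rerun. The selection step is doing the work here: nothing about the underlying effect changed between the original and the rerun.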

Great thanks, but what do you mean by “true crime”?

True crime is not crime in general, but rather the genre of media that covers crime. This includes books, movies, podcasts, documentaries and other media that go more in depth into crimes, perpetrators or trials. The genre is actually pretty broad. While we typically think about murders or other sensational cases, it can also include fraud cases like John Carreyrou’s Bad Blood reporting on the Elizabeth Holmes/Theranos scandal. It can involve missing person cases or open cases, or it can revisit cases where we already have a conviction. The vast majority (about 75%) of fans are women, and it’s the third biggest genre of podcast on iTunes. It has extremely high market penetration, with about 85% of people saying they’ve consumed at least some true crime content. True crime is the number one podcast content choice for women, and your average true crime podcast listener consumes more content than your average podcast listener in general. There’s also a heavy social aspect: true crime podcast listeners are far more likely to recommend their favorite podcasts to others. Overall it’s a several billion dollar market with individual podcasts making millions per year. My Favorite Murder literally calls their fans “the Fan Cult” and “Murderinos”.

I’ll start in next week with more historical and sociological causes, but I want to point out we’re already seeing some similarities. It was at exactly the moment scientific research started becoming more lucrative and in demand, and scientists started becoming superstars, that we started seeing cracks form. That’s not a coincidence, but I will follow up in a future post.

To go straight to part 1, click here.

Index:

The True Crime Replication Crisis: Part 1

The True Crime Replication Crisis Part 2: Problems With the Publication System

The True Crime Replication Crisis Part 3: More Problems with the Publication System

The True Crime Replication Crisis Part 4: Questionable Research Practices

The True Crime Replication Crisis Part 5: Fraud

The True Crime Replication Crisis: Part 6 Statistical Errors

The True Crime Replication Crisis Part 7: Random Other Issues

The True Crime Replication Crisis Part 8: Consequences

I Got a Problem, Don’t Know What to do About It

April 12, 2017 / bs king / 11 Comments

Help and feedback request! This past weekend I encountered an interesting situation where I discovered that a study I had used to help make a point in several posts over the years has come under some scrutiny (full story at the bottom of the post). I have often blogged about meta-science, but this whole incident got me thinking about meta-blogging, and what the responsibility of someone like me is when you find out a study you’ve leaned on may not be as good as you thought it was. I’ve been poking around the internet for a few days, and I really can’t find much guidance on this.

I decided to put together a couple quick poll questions to gauge people’s feelings on this. Given that I tend to have some incredibly savvy readers, I would also love to hear more lengthy opinions either in the comments or sent to me directly. The polls will stay open for a month, and I plan on doing a write up of the results. The goal of these poll questions is to assess a starting point for error correction, as I completely acknowledge the specifics of a situation may change people’s views. If you have strong feelings about what would make you take error correction more or less seriously, please leave it in the comments!

Why I’m asking (aka the full story)

This past weekend I encountered a rather interesting situation that I’m looking for some feedback on. I was writing my post for week 6 of the Calling BS read-along, and remembered an interesting study that found that people were more likely to find stories with “science pictures” or graphs credible than those that were just text. It’s a study I had talked about in one of my Intro to Internet Science posts and I have used it in presentations to back up my point that graphs are something you should watch closely. Since the topic of the post was data visualization and the study seemed relevant, I included it in the intro to my write up.

The post had only been up for a few hours when I got a message from someone tipping me off that the lab the study was from was under some scrutiny for some questionable data/research practices. They thought I might want to review the evidence and consider removing the reference to the study from my post. While the study I used doesn’t appear to be one of the ones being reviewed at the moment, I did find the allegations against the lab concerning. Since the post didn’t really change without the citation, I edited the post to remove the citation and replaced it with a note alerting people the paragraph had been modified. I put a full explanation at the bottom of the post that included the links to a summary of the issue and the research lab’s response.

I didn’t stop thinking about it though. There’s not much I could have done about using the study originally….I started citing it almost a full year before concerns were raised, and the “visuals influence perception” point seemed reasonable. I’ll admit I missed the story about the concerns with the research group, but even if I’d seen it I don’t know if I would have remembered that they were the ones who had done that study. Now that I know though, I’ve been mulling over what the best course of action is in situations like this. As someone who at least aspires to blog about truth and accuracy, I’ve always felt that I should watch my own blogging habits pretty carefully. I didn’t really question removing the reference, as I’ve always tried to update/modify things when people raise concerns. I also don’t modify posts after they’ve been published without noting that I’ve done so, other than fixing small typos. I feel good about what I did with that part.

What troubled me more was the question of “how far back do I go?” As I mentioned, I know I’ve cited that study previously. I know of at least one post where I used it, and there may be more. Given that my Intro to Internet Science series is occasionally assigned by high school teachers, I feel I have some obligation to go a little retro on this.

Current hypothesis (aka my gut reaction)

My gut reaction here is that I should probably start keeping an updates/corrections/times I was wrong page just to discuss these issues. While I think notations should be made in the posts themselves, some of them warrant their own discussion. If I’m going to blog about where others go wrong, having a dedicated place to discuss where I go wrong seems pretty fair. I also would likely put some links to my “from the archives” columns to have a repository for posts that have more updated versions. Not only would this give people somewhere easy to look for updates and give some transparency to my own process and weaknesses, but it would also probably give me a better overview of where I tend to get tripped up and help me improve. If I get really crazy I might even start doing root cause analysis investigations into my own missteps. Thoughts on this or examples of others doing this would be appreciated.

5 Replication Possibilities to Keep in Mind

June 19, 2016 / bs king

One of the more basic principles of science is the idea of replication or reproducibility. In its simplest form, this concept is pretty obvious: if you really found the phenomenon you say you found, then when I look where you looked I should be able to find it too. Most people who have ever taken a science class are at least theoretically familiar with this concept, and it makes a lot of sense. If someone tells you the moon is green, and no one else backs them up, then we know the observation of the moon being green is actually a commentary on the one doing the looking as opposed to being a quality of the moon itself.

That being said, this is yet another concept that everyone seems to forget the minute they see a headline saying “NEW STUDY BACKS UP IDEA YOU ALREADY BELIEVED TO BE TRUE”. While every field faces different challenges and has different standards, scientific knowledge is not really a binary “we know it or we don’t” thing. Some studies are stronger than other studies, but in general, the more studies that find the same thing, the stronger we consider the evidence. While many fields have different nuances, I wanted to go over some of the possibilities we have when someone tries to go back and replicate someone else’s work.

Quick note: in general, replication is really applicable to currently observable/ongoing phenomena. The work of some fields can rely heavily on modeling future phenomena (see: climate science), and obviously future predictions cannot be replicated in the traditional sense. Additionally, attempts to explain past behavior often cannot be replicated (see: evolution) as the events have already occurred.

Got it? Great….let’s talk in generalities! So what happens when someone tries to replicate a study?

1. The replication works. This is generally considered a very good thing. Either someone attempted to redo the study under similar circumstances and confirmed the original findings, or someone undertook an even stronger study design and still found the same findings. This is what you want to happen. Your case is now strong. This is not always 100% definitive, as different studies could replicate the same error over and over again (see the ego depletion studies that were replicated 83 times before they were called into question) but in general, this is a good sign.
2. You get a partial replication. For most science, the general trajectory of discovery is one of refining ideas. In epidemiology for example, you start with population level correlations, and then try to work your way back to recommendations that can be useful on an individual level. This is normal. It also means that when you try to replicate certain findings, it’s totally normal to find that the original paper had some signal and some noise. For example, a few months ago I wrote about a headline-grabbing study that claimed that women’s political, social and religious preferences varied with their monthly cycle. The original study grabbed headlines in 2012, and by the time I went back and looked at it in 2016 further studies had narrowed the conclusions substantially. Now the claims were down to “facial preferences for particular politicians may vary somewhat based on monthly cycles, but fundamental beliefs do not”. I would like to see the studies replicated using the 2016 candidates to see if any of the findings bear out, but even without this you can see that subsequent studies narrowed the findings quite a bit. This is not necessarily a bad thing, but actually a pretty normal process. Almost every initial splashy finding will undergo some refinement as it continues to be studied.
3. The study is disputed. Okay, this one can meander off the replication path a bit. When I say “disputed” here, I’m referencing the phenomenon that occurs when one study’s findings are called into question by another study that found something different, but they used two totally different methods to get there and now no one knows what’s right. Slate Star Codex has a great overview of this in Beware the Man of One Study, and a great example in Trouble Walking Down the Hallway. In the second post he covers two studies, one that shows a pro-male bias in hiring a lab manager and one that shows a pro-female bias in hiring a new college faculty member. Everyone used the study whose conclusions they liked better to bolster their case while calling the other study “discredited” or “flawed”. The SSC piece breaks it down nicely, but it’s actually really hard to tell what happened here and why these studies would be so different. To note: neither came to the conclusion that no one was biased. Maybe someone switched the data columns on one of them.
4. The study fails to replicate. As my kiddo’s preschool teacher would say, “that’s sad news”. This is what happens when a study is performed under the same conditions and the effect goes away. For a good example, check out the Power Pose/Wonder Woman study, where a larger sample size undid the original findings, though not before the TED talk went viral or the book the researcher wrote about it got published. This isn’t necessarily bad either, since thanks to p-value dependency we expect some of this, but in some fields it has become a bit of a crisis.
5. Fraud is discovered. Every possibility I mentioned above assumes good faith. However, some of the most bizarre scientific fraud situations get discovered because someone attempts to replicate a previously published study and can’t do it. Not replicate the findings mind you, but the experimental setup itself. Most methods sections are dense enough that any setup can sound plausible on paper, but it’s in practice that anomalies appear. For example, in the case of a study about how to change people’s views on gay marriage, a researcher realized the study setup was prohibitively expensive when he tried to replicate the original. While straight up scientific fraud is fairly rare, it does happen. In these cases, attempts at replication are some of our best allies at keeping everyone honest.
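
On the “fails to replicate” case above: the remark that p-value dependency means we should expect some failures can be put into rough arithmetic. The sketch below is only an illustration under stated assumptions. The function name `share_of_false_positives` and the example inputs (how many tested ideas are actually true, and how much statistical power the studies have) are hypothetical, not figures from any real field.

```python
def share_of_false_positives(prior_true, power, alpha=0.05):
    """Fraction of 'significant' findings that are false alarms.

    prior_true: assumed share of tested hypotheses that are actually true
    power:      assumed chance a study detects an effect that is really there
    alpha:      significance threshold (chance a null effect looks significant)
    """
    true_hits = prior_true * power          # real effects that were detected
    false_hits = (1 - prior_true) * alpha   # null effects that passed by chance
    return false_hits / (true_hits + false_hits)


# Two hypothetical scenarios: long-shot ideas tested with weak studies,
# versus plausible ideas tested with well-powered studies.
long_shots = share_of_false_positives(prior_true=0.1, power=0.5)
safe_bets = share_of_false_positives(prior_true=0.5, power=0.8)

print(f"long shots, weak studies: {long_shots:.0%} of significant results are false")
print(f"plausible ideas, strong studies: {safe_bets:.0%} of significant results are false")
```

The point of the comparison is that the same 0.05 threshold can produce very different failure rates depending on what is being tested and how well, which is one way a normal amount of non-replication can tip over into a crisis in some fields.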

It’s important to note here that not all issues fall neatly into one of these categories. For example, in the Women, Ovulation and Voting study I mentioned in #2, two of the research teams had quite the back and forth over whether or not certain findings had been replicated. In an even more bizarre twist, when the fraudulent study from #5 was actually done, the findings actually stood (still waiting on subsequent studies!). For psychology, the single biggest criticism of the replication project (which claims #4) is that its replications aren’t fair and thus it’s actually #3 or #2.

My point here is not necessarily that any one replication effort is obviously in one bucket or the other, but to point out that there are a range of possibilities available. As I said in the beginning, very few findings will end up going in a “totally right” or “total trash” bucket, at least at first. However, it’s important to realize any time you see a big exciting headline that subsequent research will almost certainly add or subtract something from the original story. Wheels of progress and all that.

graph paper diaries

because some of us need a few more lines to keep everything straight

replication crisis
