The True Crime Replication Crisis: Part 6 Statistical Errors

Welcome back folks! This week we’re still talking about true crime, and I’m going to cover some statistical errors and how they relate to the cognitive errors we see being made when we discuss true crime stories. Before I get to that though, I want to touch on a point made in the comments last week. David brought up that a good example of a fraudulent case that gained traction was the Duke Lacrosse rape accusation, which was ultimately found to be a false accusation. Many people continued to cling to it long after the evidence turned because they believed it was “an important conversation”. This sounds silly, but in the phenomenal “Toxoplasma of Rage” essay by Scott Alexander over at Slate Star Codex, he points out the following:

The University of Virginia rape case profiled in Rolling Stone has fallen apart. In doing so, it joins a long and distinguished line of highly-publicized rape cases that have fallen apart. Studies sometimes claim that only 2 to 8 percent of rape allegations are false. Yet the rate for allegations that go ultra-viral in the media must be an order of magnitude higher than this. As the old saying goes, once is happenstance, twice is coincidence, three times is enemy action.

The enigma is complicated by the observation that it’s usually feminist activists who are most instrumental in taking these stories viral. It’s not some conspiracy of pro-rape journalists choosing the most dubious accusations in order to discredit public trust. It’s people specifically selecting these incidents as flagship cases for their campaign that rape victims need to be believed and trusted. So why are the most publicized cases so much more likely to be false than the almost-always-true average case?

Scott goes on to hypothesize why this is: basically we are attracted to controversial stories because they allow us to signal our beliefs about different topics. I tend to believe he’s onto something, but for purposes of this series I want to emphasize his point that cases that get talked about are often more likely to contain extreme deception than regular, everyday cases. We have no reason to believe this is limited to rape cases, and every reason to believe that stories that grab headlines are uniquely unreliable.

Alright, with that out of the way, let’s move on to some stats issues!

Low Statistical Power

One issue that has likely contributed to the replication crisis is that many studies lack statistical power, which basically means a study doesn’t have enough data to reliably detect real effects. This makes the findings unstable, so when you repeat the study, the result might not appear again. Adequate statistical power depends on a few things, including sample size and the size of the effect you’re looking to detect. For example, if you want to understand height differences between adult men and women, you might need a decent group before you can accurately say if the difference is 3 inches or 5 inches. If you’re looking at the height differences between adults and 5-year-olds however, you’re going to need a much smaller group to establish there’s a huge difference. The smaller the effect size, the more people you need to reliably see what’s happening.
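To put rough numbers on that last sentence, here is a minimal sketch using the standard normal-approximation formula for comparing two group averages. The function name, the 5% significance cutoff, the 80% power target, and the example effect sizes are all my own assumptions for illustration, not figures from any particular study.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate people needed per group for a two-sided, two-group comparison.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the difference in means divided by the standard deviation.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


# Hypothetical effect sizes, from a subtle difference to an enormous one.
for d in (0.2, 0.5, 0.8, 2.0):
    print(f"d = {d}: about {n_per_group(d)} people per group")
```

The exact counts depend on the assumptions above, but the pattern is the point: shrinking the effect size makes the required group balloon, while a huge difference needs only a handful of people.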

So how does this apply to true crime? Well, as I pointed out in part 2, most popular crime stories are highly unusual. While they are often things we deeply fear, they are almost always things we have no experience with. Given this lack of data, we have almost no basis for deciding what’s normal/abnormal, and yet we do it anyway! It’s a running joke on social media that every time a new subject comes up, people immediately switch from being infectious disease experts to nuclear war experts to trade agreement experts, etc. True crime is an extension of that, with people who have never experienced any part of the justice system loudly opining about what should or shouldn’t have been done. In the rush to get press coverage, I also noticed a lot of experts who did have experience in related fields would often comment on cases without actually having read all the details. I also consider this a lack of statistical power: all the general knowledge in the world doesn’t help if you don’t actually know the specifics of the case you’re talking about.

Positive Effect Size Bias

Otherwise known as the decline effect, this is the phenomenon where studies initially find a large effect size that keeps getting smaller with each subsequent study. A classic example is medications, which often appear to work extremely well when they’re first rolled out, only to be much less impressive when studied after a few years.
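One common explanation for this pattern is that small early studies only get attention when they happen to land on a big result. The simulation below is a rough sketch of that idea, not a model of any real drug: the true effect of 0.2, the study sizes, the 1.96 cutoff, and the function name are all assumptions I picked for illustration.

```python
import random
from statistics import fmean


def simulate_study(true_effect, n, rng):
    """Return (estimated effect, whether it looks 'significant') for one study."""
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimate = fmean(treated) - fmean(control)
    standard_error = (2.0 / n) ** 0.5  # both groups are simulated with an SD of 1
    return estimate, estimate / standard_error > 1.96


rng = random.Random(0)
TRUE_EFFECT = 0.2  # hypothetical small real effect

# Many small early studies; only the exciting "significant" ones make headlines.
small_studies = [simulate_study(TRUE_EFFECT, 20, rng) for _ in range(2000)]
headline_effects = [est for est, significant in small_studies if significant]

# Later, much larger follow-up studies of the same effect.
followup_effects = [simulate_study(TRUE_EFFECT, 2000, rng)[0] for _ in range(20)]

print(f"True effect:                 {TRUE_EFFECT}")
print(f"Average headline effect:     {fmean(headline_effects):.2f}")
print(f"Average follow-up effect:    {fmean(followup_effects):.2f}")
```

Nothing about the underlying effect changes between the two rounds. The "decline" comes entirely from which early results were surprising enough to get noticed.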

I have seen this in a lot of true crime cases, where initially you are told “oh hey, you have to look at this absolutely CRAZY case they cover in this documentary”. If you look at the other side though, you gradually discover most of the things that hooked your attention are a lot more nuanced than they appeared. In our local case, there was one article that sparked all the interest, and several years later someone went back and fact-checked it. They estimated about 75% of it was proven incorrect, often laughably so. Bizarrely, people who got interested in the case didn’t seem to care that the thing that hooked them was so unreliable; they had simply moved on to new claims. Regardless of what you think happened in some case, it’s good to note when claims don’t hold up rather than simply moving on to new ones.

Problems of Meta-Analysis

One guardian against the replication crisis was supposed to be meta-analyses, which take a lot of studies on the same topic and analyze them together. One issue with this is that one bad study can “infect” the whole meta-analysis, so even lumping a whole bunch of studies together doesn’t help. If you get one 6’2″ basketball player in your female height sample, it’s going to take a while for that average to come back to normal. Another issue is that if the hypothesis is wrong, you are not going to get studies with a strong effect in the opposite direction to balance things out; you are going to get studies that cluster around zero. Again, this means it will take a LOT of studies to show the real effect size.
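Here is a small sketch of how slowly one inflated study gets diluted. It uses inverse-variance weighting, which is a standard way to pool studies in a fixed-effect meta-analysis. The numbers are made up: one "bad" study reporting an effect of 0.8 and a pile of equally sized honest studies that all found exactly zero.

```python
def pooled_estimate(studies):
    """Inverse-variance weighted average of (estimate, standard_error) pairs."""
    weights = [1.0 / se ** 2 for _, se in studies]
    weighted_sum = sum(w * est for w, (est, _) in zip(weights, studies))
    return weighted_sum / sum(weights)


# Hypothetical studies: (estimated effect, standard error).
bad_study = (0.8, 0.1)
null_study = (0.0, 0.1)

for n_null in (1, 3, 7, 15):
    pooled = pooled_estimate([bad_study] + [null_study] * n_null)
    print(f"1 inflated study + {n_null} null studies -> pooled effect {pooled:.2f}")
```

Because the honest studies sit at zero rather than at a negative number, each one only halves the problem at best. In this toy setup it takes 15 null studies to drag the pooled effect from 0.8 down to 0.05.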

So how does this work in true crime? Well, I actually think meta-analyses are the worst thing that can happen to a true crime case. Our justice system is supposed to be based on individual facts, not on group dynamics. This gets argued a lot with racial profiling, but perhaps my favorite example is family criminality. Crime is highly heritable, and yet our justice system doesn’t let your family history into court, and for good reason. The foundation of our justice system is that you are supposed to be judged as an individual based on evidence, not on “well this would make sense”. True crime on the other hand is rife with this type of commentary. The police are always like this, people in small towns are like this, white rich kids are like this, etc. I’m actually not strongly against stereotypes as a first step, but stereotypes are not evidence. If the evidence starts to contradict your stereotype, you may want to consider that someone might have been attempting to evoke exactly that stereotype to get you to override your reason.

P-hacking

I covered p-hacking back in part 4, where we talked about the idea of looking through tons of data for “surprising” connections. In both research and true crime, the more data you take in, the more likely you are to find connections that may or may not be meaningful. I did want to emphasize one more part of this though, something I’ll call “narrative hacking”. If p-hacking is when you overinterpret random connections, then narrative hacking is selectively including or emphasizing details, interpretations, or coincidences until a desired emotional or moral conclusion “feels significant”. As I said to someone when talking about my local case, “some of what they complain about is real, some of it is just normal stuff said in a scary voice”. Selective interpretation of events is a normal human trait, and trying to make mundane things sound significant is a key trait of anyone trying to hook you on a story. Suddenly “weirdly, he never left the house all day” is said just the same as “oddly, he only left the house once that day” and “bizarrely, he left the house multiple times that day”. It’s good to be alert for when a narrator is emphasizing details that really aren’t that interesting.
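The "more data, more connections" problem has a simple formula behind it. If every comparison has a 5% chance of looking significant by pure luck, and the comparisons are independent (an assumption that real data rarely satisfies perfectly), the chance of at least one fluke is 1 - 0.95^k for k comparisons. A quick sketch, with a function name I made up:

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests looks 'significant'."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20, 100):
    chance = chance_of_false_positive(n_tests)
    print(f"{n_tests:>3} comparisons -> {chance:.0%} chance of at least one fluke")
```

At 20 comparisons you are already more likely than not to find something "surprising" even when nothing is going on, which is about how many odd details a long documentary can throw at you.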

Statistical heterogeneity

Statistical heterogeneity means that different studies of “the same” effect actually vary in methods, samples, measures, or contexts. What this means is that when you try to replicate a study, you can run into the issue of changing something that actually was important to the study. For example, you might find an effect in a study done on all men that disappears if you add women to the sample, or a study on college students that doesn’t replicate in senior citizens. Sometimes slight changes in the wording of questions can radically change answers, etc. This can actually be an important issue to note, because sometimes it can show a previously hidden factor was influencing the original results.
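The men-only example can be sketched with a quick simulation. Everything here is hypothetical: I am assuming an effect of 0.5 that exists only in men and no effect at all in women, and the function name and sample sizes are just for illustration.

```python
import random
from statistics import fmean


def simulate_effect(effects_by_group, n_per_group, rng):
    """Estimate the overall effect when each subgroup has its own true effect."""
    treated, control = [], []
    for true_effect in effects_by_group.values():
        treated += [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        control += [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    return fmean(treated) - fmean(control)


rng = random.Random(1)

# Hypothetical true effects: present in men, absent in women.
men_only = simulate_effect({"men": 0.5}, 5000, rng)
mixed = simulate_effect({"men": 0.5, "women": 0.0}, 5000, rng)

print(f"Original study (men only):    {men_only:.2f}")
print(f"Replication (men and women):  {mixed:.2f}")
```

The replication isn't "wrong" and neither is the original. The effect shrinks because the sample changed, which is exactly the kind of hidden factor worth noticing.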

In true crime, similar inputs do not always yield similar outputs. Two missing child cases can have very different reactions from parents, not because one is lying and the other isn’t, but because there’s a huge range of possible reactions to a horrible situation. This is somewhat akin to what I said above about overgeneralizations. There’s a huge range of crimes, contexts, and individuals involved, and even in a perfect system that would produce a huge range of human behavior. Trying to “follow” unusual tragic cases may lead to false confidence in your conclusions.

Alright, I think that’s all I have for today. Tune in next week for what I’m hoping might be my last post before the wrap up, depending on how long-winded I get. It’ll be fun!

To go straight to part 7, click here.

One thought on “The True Crime Replication Crisis: Part 6 Statistical Errors”

  1. For actual True Crime stories I restrict myself to Dragnet, Adam-12, The New Detectives, and a YouTube channel called Crime Zone (where the focus is on telling the story as a whodunnit, while never forgetting that there are real people in the story).
