There Weren’t Just 2 Scientific Advances that Made the Sexual Revolution Possible, There Were 4

There’s a Bret Weinstein speech going around on Twitter where he comments on how birth control and abortion changed the game around sex during the sexual revolution of the 1950s-1970s. I have not listened to his speech, so I have no comment on what he was saying specifically, but in reading some of the comments I was struck that when people discuss “what changed” during the 1950s through the 1970s, they focus on abortion and birth control on repeat. Even the Wikipedia page for the sexual revolution only mentions these two. Those things absolutely changed behavior, but I think there are two more things that need to be a bigger part of the discussion:

  1. Paternity testing
  2. Antibiotics

Paternity testing started out with blood testing in the 1920s, but hit its stride in the 1960s with HLA testing. Prior to that, you had to use social rules and general vibes to determine paternity; it largely relied on people’s own truthfulness. Before paternity testing, marriage was the surest way to ensure no one questioned whose kids were whose, but after we got a better method the share of kids born to single moms went from 5% to 40%. You can see that as good/bad/neutral, but it almost certainly doesn’t happen without the ability to identify a father accurately.

As for antibiotics: penicillin was discovered in 1928, but WWII sped up the development of antibiotics for treating bacterial infections, and widespread public use came in the 1950s. From 1935 to 1968, 12 new classes of antibiotics were launched. Prior to this, basic STDs like syphilis were killing people at a rate similar to suicide today.

And that’s just deaths from syphilis, not cases. That figure comes from this analysis, which notes that prior treatment methods may have been as effective, but they were expensive and time consuming; penicillin just made everything easier. Of course, syphilis is just one of the diseases people were dodging; chlamydia and gonorrhea would also have been issues. Antibiotics changed the game here.

I bring these up not to take any particular stance on any issue, but to point out that the past was very different in ways we don’t often think about. Even if somehow birth control and abortion were wiped off the face of the planet today, antibiotics and paternity testing would still ensure our population-level practices around sex were different from what they were 100 years ago. Sexual mores were never just about pregnancy; they were also about ensuring you could establish paternity and avoid STDs.

I think this is important for both cultural conservatives and cultural liberals to remember, as at times we can look at the past as either a golden era of morality or a deep pit of oppression. But in prior “moral” eras, a lot of sexual behavior was kept in check by people lying or threatening to lie about true things, and paternity testing stopped that. Conversely, things like religion may never have had quite the level of influence we attribute to them; religious rules were often coping with very real issues around STD control in an era when the medical community couldn’t help much. When those things changed, behavior changed. It’s a good reminder that most social changes have several causes, and are not just related to one thing.

To note: the things I mention above are those I believe had a direct impact on sexual issues in the 1950s-1970s specifically. There are a few other advances that probably changed sexual behavior in a slightly less direct fashion: cars (teenagers could go see each other more easily), at-home pregnancy tests (earlier identification of pregnancy, no doctor needed), mass distribution of porn (TBD), dating apps (thank God I missed that era).

Anything else I missed?

Snip Happens: A Study in Hypothetical Hair Sabotage

Earlier this week, the Assistant Village Idiot tagged me in one of his link roundups:

Off With Her Hair Women tell attractive women to cut their hair. The study’s authors are all female.  I wonder what it is like for women studying female intrasexual competition. Is it harder to get along, or easier? Bethany, you need to get in on researching the women who research women.

I’ll admit I got a kick out of this, in part because I love a good gender study, and in part because I have REALLY long hair. I mostly wear it up, but it’s the kind of hair that makes people say “whoa, I had no idea it was that long” if I take it down. I call it homeschool hair. The last time I wore it down for an extended period of time, someone (who I knew) stopped me and asked if she could take a picture of it. I have no particular attachment to this style, but I actually don’t like haircuts, so here we are.

I hadn’t yet had a chance to dive into the study when a Tweet popped up on a similar topic:

It actually came to my attention because a few people immediately pointed out that these women were in a no-win situation: if they’d told their coworker “she looked like shit” they would be considered catty, but if they tell her it looks good they are intrasexually competitive. Additionally, they were coworkers of hers, not friends, and it’s pretty weird to expect that all women at all moments must be aiding every other woman they know with her appearance. I suppose there’s an option where they could have tried to be pleasant but not endorse the haircut, but that’s a very hard tone to hit correctly. And honestly? I’ve also seen plenty of male coworkers say things “looked great” when other men came in proud of some new thing they did/purchased/whatever. Why start conflict with a coworker for no reason?

All of this prompted me to deep-dive into this study, to see what they found. Ready? Let’s go!

Study Setup

So the basic setup is that 200ish (mostly college-aged) women were recruited for two studies. In both, they were shown a series of female faces cropped to the shoulders like this:

The participants were asked to suggest how many centimeters (the study was Australian) the pictured woman should cut off. They were given the picture of the woman, an assessment of the hair’s condition, and how much hair the woman was comfortable cutting off. Those last two were binary: hair condition was either good/damaged, and the requested length of cut was either as much as needed/as little as possible. After that, the women were asked to rank themselves on a few different scales, including one that measured intrasexual competitiveness.

What’s intrasexual competitiveness, you might ask? Well, it’s apparently a 12-question measure that asks how you feel about those of your gender who might be better than you on some level. The questions they mention ask you to agree/disagree with statements like “I just don’t like very ambitious women” or “I tend to look for negative characteristics in attractive women”. Highly intrasexually competitive women are those who strongly agree with statements like that.

They hypothesized that women who scored high on this scale might be more aggressive with their recommendations to other women about how much hair they should cut off, the idea being that men like long hair, so this would sabotage other women who might be competitors. And to be honest, this sounds like a pretty plausible hypothesis to me! These are women who just answered a bunch of questions reiterating that they really don’t particularly like other women; I would imagine they’d actually end up being meaner to other women than people who disagreed with those statements. It reminded me of a point someone recently made about introvert/extravert tests: we ask a bunch of people if they like big groups of people, call those who said “no” introverts, then declare that we found introverts don’t really like parties. I mean, that makes sense! But it does at times seem like most of the sorting already took place before we even got to the study itself. But I digress, let’s keep going.

The Findings

Ok, so the first thing that caught my eye: the primary finding of the study is that all women, regardless of scale ranking, first and foremost based their haircut recommendations on two things:

  1. The condition of the woman’s hair (those with damaged hair were told to get more cut off)
  2. The hypothetical client’s stated preference (it was followed).

So to be clear: even women who stated they didn’t much like other women primarily based their recommendations on what was best for the other woman and what the other woman wanted. And it wasn’t even close. Every other effect we are going to talk about was much smaller both in absolute value and in statistical significance. Here’s the graph:

To orient you, the top panel is the recommendations for healthy hair, the bottom is the recommendations for unhealthy hair. As you can see, in general the difference in recommendations based on that condition alone is quite large, around the 2cm (a bit under an inch for us USA folks) range for all conditions. The second biggest impact was what the women wanted, which made a difference of about 1-1.5cm in the recommendations. Then we get to everything else.

It’s important to note that despite how this topic often gets introduced, there was no significant effect found based on attractiveness in general. This is notable because, as the Tweet above shows, this stuff is often portrayed in popular culture as something “women” do, and we don’t have much proof that it is! They did find an attractiveness effect for the women with healthy hair being judged by regular and highly competitive women, but it went the opposite way: it was actually unattractive women who got the recommendation to cut off more hair. And again, the difference was a fraction of the impact of the other two factors: somewhere between .1-.2cm. For those of us in the US, that’s less than 1/10 of an inch. A quick Google suggests that’s less than a week’s worth of hair growth, and certainly not enough for anyone to notice.

I think it’s good to hammer on this because if I told you someone was out to sabotage you, you might be worried. But if I told you someone was out to sabotage you but they’d first do what was best for you, then follow what you wanted, then would sabotage you so subtly it would be imperceptible to the naked eye… well, you’d probably calm down substantially. Much like when we see studies like “eating eggs will double your risk of heart disease in your 50s (from .001 to .002 per thousand)”, we need to be careful when quoting results that find a near imperceptible difference that can be fixed with 5 days of regular hair growth.
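If you want to check that arithmetic yourself, here’s a quick back-of-the-envelope sketch in Python. The ~1.25 cm/month growth rate is my assumption (it’s a commonly cited average), not a number from the study:

```python
# Back-of-the-envelope check on the size of the attractiveness effect.
# Assumes scalp hair grows ~1.25 cm per month (a commonly cited average).
growth_per_week_cm = 1.25 * 12 / 52           # ~0.29 cm per week

effect_cm = 0.2                               # upper end of the reported effect
effect_inches = effect_cm / 2.54              # ~0.08 inches

days_to_regrow = effect_cm / (growth_per_week_cm / 7)
print(f"{effect_inches:.2f} inches, regrown in ~{days_to_regrow:.0f} days")
# 0.08 inches, regrown in ~5 days
```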

But back to the finding that attractive women didn’t actually get penalized, and that the slight increase in haircut recommendation was instead aimed in the other direction. The study authors conclude the following:

This suggests that appearance advice may act as a vector for intrasexual competition, and that such competition (in this scenario at least) tends to be projected downward to less attractive competitors.

I will admit that annoyed me a bit, because it means that ANY variation is now considered to prove the thesis. They stated this was ok because there was no active “mate threat”, so they would expect it to go this way, but if attractive women had been penalized it would also have been considered proof. Having just finished our series on the replication crisis, I will point out that explaining every finding as proof of your original thesis is a big driver of non-replicated findings.

Moving on to the second study: the authors made a few really smart changes to their setup. First, they provided participants with a picture of a ruler and a credit card up front so they’d actually have a reminder of what different lengths meant. They also changed from a free text box for “how much hair would you recommend they cut off” to a Likert-type setup where you had to recommend a whole number of centimeters from 1-10. I liked these changes because they showed a good faith effort to improve the results. In this study, they added faces that were considered “average” to the mix and repeated most of the same experiment.

The findings were similar. The biggest variations were based on hair damage and client wishes, with relatively small differences of .1-.2cm appearing across different individual groups. The graph that got the headline though is this one:

This is the graph they used for the title of the study, and it comes from dropping the whole client’s-wishes/hair-damage breakdown and just looking at the overall amount of hair these women suggested be removed for anyone. You will note again the variation across attractiveness levels is .1-.2cm, but indeed the “high” intrasexual competitiveness women recommend more than the other two groups. The highest recommendation is about .8cm higher than the lowest value. That’s about 1/3 of an inch. Not enough for you to visually notice, but still something.

What caught my eye though was that we only really saw variation in the high and low groups, which got me wondering how many women were in each category. And that’s where I found something interesting. In the first study, they defined “high” and “low” intrasexual competitiveness as being 1 SD from the mean. Assuming a normal distribution, that would mean about 16% of the sample was in each of the high/low groups, and the remaining 68% was in the average group. For this study though, they changed it to 1.5 SD, which puts a little less than 7% of the group in each of the high/low groups. Given the sample size of around 250, we’re looking at about 17 people each in the high and low groups (34 people total) and 216 or so in the average group. By itself that will lead to higher variation in the groups with smaller sample sizes. You will note there is very little variation in what the group with most of the participants answered.
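If you want to see where those numbers come from, here’s a quick sketch of the arithmetic, assuming the scores are normally distributed (the n of 250 is my approximation from the paper’s description):

```python
# How the SD cutoff determines group sizes, assuming normally
# distributed intrasexual competitiveness scores and n ~ 250.
from scipy.stats import norm

n = 250
for cutoff_sd in (1.0, 1.5):
    tail = 1 - norm.cdf(cutoff_sd)   # fraction beyond the cutoff, one side
    middle = 1 - 2 * tail            # fraction in the "average" group
    print(f"{cutoff_sd} SD: ~{tail * n:.0f} people per tail, "
          f"~{middle * n:.0f} in the middle")

# 1.0 SD: ~40 people per tail, ~171 in the middle
# 1.5 SD: ~17 people per tail, ~217 in the middle
```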

My thoughts

So like I said at the beginning, I find this study’s conclusion fairly plausible. The idea that women who specifically state they don’t like other women will give other women worse advice just kind of makes sense. But a few thoughts:

  1. The main findings weren’t mentioned. The title and gist of this study were presented as “intrasexually competitive women advise other women to cut more hair off”, but it could just as easily have been “intrasexually competitive women primarily take other women’s best interests and preferences into account” and it would be just as (if not more) accurate. The extra haircut is presented as a primary driver of haircut recommendations, but really it’s a distant third to the other two. This is fine for academic research, but if you’re trying to talk about how this applies to real life, it’s probably good to note that women actually gave quite reasonable advice, with slight variation around the edges.
  2. The absolute value was never discussed. I was curious if the authors would bring up the small absolute findings as part of their discussion, and alas, they did not. The AVI let me know he found the link in Rob Henderson’s post here, and I was amused to find this line one paragraph before his discussion of this study: “This is why reproductive suppression is primarily a female phenomenon. Of course, there have been cases of male suppression (e.g., eunuchs). Or men raiding a village and simply slaughtering all of the males and abducting the women as wives and concubines. But suppression among women is subtler.” If by subtler you mean 2mm of extra hair, then yes. If I had to pick between that and murder or castration, I admit I’m feeling women got the better end of the deal here. If you would keep eating eggs (or whatever other food) associated with a tiny increase in cancer risk, then you probably can’t take this haircut study as a good sign of intrasexual competition. How are women sabotaging other women if they are doing so at a level most men wouldn’t notice? I suspect there’s an assumption this effect is magnified in real life, but again, this study doesn’t prove that.
  3. Motives are assumed. Much like in the critiques of the Tweet above, I noticed that throughout the paper the authors explained why targeting attractive women, average women and unattractive women would all be intrasexual competition. What I did not see was any attempt to consider non-intrasexual-competition explanations. Maybe people suggest unattractive people cut more hair off because they think they should try a different look? Maybe scoring high on an intrasexual competition survey is an indication of aggressiveness, and aggressiveness correlates with more aggressive hair cutting? Unclear, but I will note that the idea that all variance could only be explained by intrasexual competition surprised me, particularly when we’re discussing effects that are likely too subtle to be spotted by the opposite sex.
  4. We don’t know this is a female-only phenomenon. Despite Rob Henderson’s claim above, you will be unsurprised to hear no one (that I could find) has ever done this study on men. I actually would have been interested to see that study, even if it was men making suggestions for female hair. One reason I’d like to see this is that I heavily suspect men would be somewhat more erratic in their rankings, which would actually increase the risk of spurious findings. Frankly, it would amuse me to watch people have to explain why their statistically significant findings were still meaningful, or to have to admit sometimes that just happens and it doesn’t mean anything at all. But still, we’re told constantly that “subtle” sabotage is a woman thing, and I couldn’t find any studies suggesting anyone has looked at this. Might be interesting.

Ok, well that’s all I have! Thanks for reading, and I’m going to go consider cutting my hair an amount no one will notice, just for fun.

The True Crime Replication Crisis Part 8: Consequences

Well we’ve reached the end of the road here folks, and it’s time to wrap things up with some conclusions and consequences. As I mentioned in the first post, I’ve been loosely following the Wikipedia entry on the replication crisis, and I’d like to point out the opening of its consequences section (bolding mine):

When effects are wrongly stated as relevant in the literature, failure to detect this by replication will lead to the canonization of such false facts.[195]

A 2021 study found that papers in leading general interest, psychology and economics journals with findings that could not be replicated tend to be cited more over time than reproducible research papers, likely because these results are surprising or interesting. The trend is not affected by publication of failed reproductions, after which only 12% of papers that cite the original research will mention the failed replication.[196][197] Further, experts are able to predict which studies will be replicable, leading the authors of the 2021 study, Marta Serra-Garcia and Uri Gneezy, to conclude that experts apply lower standards to interesting results when deciding whether to publish them.[197]

So overall, in science, with highly educated PhDs whose professional reputations and institutional affiliations are built on truth, we find that:

  • False facts end up being canonized
  • Less reliable studies get more attention
  • Even when findings are formally challenged, they will continue to be repeated as true, with almost no one mentioning they were called into question
  • Standards are lower for anything surprising or interesting

Do we really believe that Youtubers and TikTokers, competing for nothing but attention, are actually more reliable than this? I hate to beat a dead horse, but papers can get retracted, colleges can investigate you, and you can sink a career in academia. Maybe it doesn’t happen often, but the odds are certainly better than those of even a mainstream journalist losing a defamation case. Science is set up to self-police, maybe not as well as it should be, but there are mechanisms. True crime documentaries and podcasts are set up to entertain, and there are no mechanisms to self-correct outside of a person getting aggravated enough to file a lawsuit against you. So it is very likely that:

  • Some portion of what you believe you know about popular cases is flat out false
  • The most popular cases will have more incorrect facts floating around than the “boring” cases
  • Even when things are proven to be incorrect, they will not stop circulating as fact
  • Standards are lower for anything surprising or interesting

So what do we do?

Well, it’s actually not straightforward. Because of the apparatus around science, it has been relatively easy to propose changes there. Change hasn’t always come fast, but it has been progressing. True crime has no such oversight, so any change will be a challenge. However, I think the framework from my Intro to Internet Science Course still works here. I broke the things to watch for into 4 categories: Presentation: How They Reel You In, Pictures: Trying to Distract You, Proof: Using Numbers to Deceive, and People: Our Own Worst Enemy. All of those still apply, with just a few tweaks.

  1. Presentation: How They Reel You In A high production value documentary is not the same as an honest documentary, and a lengthy series on a topic does not mean people didn’t leave anything out. Be skeptical of things, no matter how glossy or voluminous.
  2. Pictures: Trying to Distract You In the stats and data world, graphs are often used to catch people’s eye and give them the immediate visual impression something is happening before they’ve had a chance to read anything. In true crime, this is often what the victims or the perpetrator look like, immediately playing on tropes of who we think commits crimes or which victims get our sympathy. Be skeptical of anything that focuses on the good looking, wealthy or college educated to the exclusion of others. Additionally, watch for any attempt to immediately invoke another case or movie in the current case, which will prime you to skip actual facts in favor of an “I know this type of person, they do X”. When our local case hit national media, one of the first things one of the main people did was start citing a popular movie filmed in the area almost 20 years ago, based somewhat on events that had occurred 20-30 years prior to that. The attempt to evoke specific imagery was clear.
  3. Proof: Using Numbers to Deceive While numbers aren’t always at play in the true crime world, evidence certainly gets kicked around pretty often. And just like numbers, out-of-context evidence is often worse than useless and extremely misleading.
  4. People: Our Own Worst Enemy We bring our biases to every case, and some narratives will be more palatable to us than others. Be careful with people who bring cases in to make a “bigger point” or anything that seems a little too outrageous or focuses on extremely unusual types of crime. It’s also good to look back on early reporting to see if what got you into the case held up, and to actually take it into account if it didn’t.

To all of this, I’d add two more points. The first is that a surprising number of people tell me that true crime is fine playing fast and loose with the facts as long as it challenges the police, because there the state has more power. This is of course how our whole justice system is set up, but I think it falls rather flat. In science we are taught that there are both type 1 errors (false positives) and type 2 errors (false negatives) and that both carry consequences. This is also true in the criminal justice system. Blackstone’s principle says it’s better that ten guilty men go free than one innocent man hang, and that is what we build our system around. But this doesn’t mean there are no consequences to a guilty person going free. The obvious first issue is that they offend again, and that we will then also be upset that nobody stopped them. But this is a natural consequence of “it’s never bad to let the accused go”, and we can’t have it both ways. A recent Twitter thread highlighted this from a victim’s perspective, as she recounted both the emotional toll of testifying against a stranger who assaulted her and then watching him get let go repeatedly, only to continue assaulting other women. The other issue of course is that if you have a justice system that never finds anyone guilty, people take things into their own hands. It’s commonly noted that the mafia initially gained power within immigrant Italian communities because the police wouldn’t investigate crimes against them, and the same is true of newer gangs. Likewise, the Old Testament is riddled with references to the sin of denying justice. Even if you’re not religious, it’s good to flag that unpunished crimes have been considered a socially destabilizing force for thousands of years. Playing fast and loose with the truth about government actions is not a victimless crime just because the government has power, as people typically find when their particular group falls out of favor in the court of public opinion.

And finally, I want to give a mini rant about why this topic bothers me so much. Watching a case up close and personal like this, I was stunned and appalled at how many people seemed to completely miss that this case was, for many people, one of the darkest moments of their lives even before the internet was involved. Watching people turn that into their own personal whodunit/reality TV show was horrifying. People talked about those involved like they were merely characters in a movie, like you could say horrifying things about them with no consequences. I didn’t know these people, but I do see many of them frequently, and the pain on their faces was visible. None of this was fun. None of this was asked for. We’re in a time when we have blockbuster documentaries about how exploitative reality TV was, so it’s bizarre to me that so many people are excited to tune in to stories about people who never volunteered for this. While errors in scientific publishing can erroneously impact how we view the world, errors in true crime reporting can irreparably ruin lives. The first one may sound worse, unless you’re the target of the second. Power posing failing to replicate hurt a few self-help gurus’ talks; thousands of people falsely accusing someone of murder is something you probably never recover from. Consume media that reminds you that everyone involved, whether accused or victim, is a human.

Thanks for reading folks.

The True Crime Replication Crisis Part 7: Random Other Issues

Ok folks, we’re nearing the end of our Wikipedia list of issues, so I’m at the point where I don’t know what to call this one. We have a bunch of random issues I’ll run through in order. Ready? Let’s go!

Context Sensitivity

In scientific study, context sensitivity refers to the idea that the same study performed under two different sets of circumstances might yield different results in ways people didn’t expect. This seems somewhat obvious when you say it directly, but often isn’t actually on people’s minds when they are reading a study. I have covered this a LOT on my blog over the years, as often people will make huge claims about how men or women (in general) view marriage, and you’ll find out the whole study was done on a group of 18-year-old psychology students who are almost certainly not married or getting married any time soon. Zooming out, there’s a big criticism that most psychological research is done on “WEIRD” people, meaning Western, Educated, Industrialized, Rich and Democratic. What we consider settled science around human behavior may not be so settled if you include people from wildly different countries and contexts.

So how does this apply to true crime? Well, just like the first thing I do when I look up a paper is go to the methods section to understand the context in which the data was collected, I think the most important thing in a true crime story is to understand the big picture of where and how things happened. As I mentioned previously, true crime cases are often really unusual cases, so it’s important to flag that any abnormalities will be heightened substantially. A few questions: how much crime is in the area in general? Were there any unusual events influencing people’s behavior? True crime often goes over this stuff, but I’ve noticed some cases breeze through contextualizing things or fail to acknowledge that unusual circumstances might change people’s behavior.

The other odd context thing is that a lot of people seem to think that because a case became well known later, the initial investigators should have been thinking from the get-go how things would look on Dateline. Unfortunately most investigators/witnesses/defendants don’t have the luxury of knowing in the first 24 hours that people will be reviewing their actions for decades to come. If the case is OJ Simpson? Well yes, you should be prepared for that. If the case is JonBenét Ramsey? You should give them some grace for not predicting the firestorm. Context matters.

Bayesian Explanation

This is similar to some of the statistical concerns I mentioned last week, but basically: if you have a “surprising” result and a low-powered study, Bayes’ theorem suggests you will have a high failure-to-replicate rate. Bayesian statistics can be a powerful way to think this through, because they force you to consider how likely you thought something was before you ran your study, which can help you put your subsequent results in context.

So what’s the true crime equivalent? Well, I think it’s actually a good reminder to put all the evidence in context. Here’s an example: imagine a police department (or podcaster) believes a suspect is guilty mainly because they failed a polygraph. The polygraph has a low ability to detect real guilt (low power), many innocent people fail it (a high false-positive rate), and the prior likelihood that this particular person committed the crime is low. Even though the polygraph result says “guilty,” that does not translate to a high probability they did it. Just like a weak psychological study, a “positive” polygraph doesn’t reliably tell you whether the hypothesis is true or whether the result will replicate.
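Here’s a toy version of that calculation in Python. Every number below is invented for illustration, not taken from any real polygraph data:

```python
# Toy Bayes update: why a failed polygraph doesn't imply probable guilt
# when the prior is low. All numbers below are made up for illustration.
prior_guilt = 0.05         # assumed prior probability this suspect did it
p_fail_if_guilty = 0.80    # assumed sensitivity of the polygraph
p_fail_if_innocent = 0.30  # assumed false-positive rate

p_fail = (p_fail_if_guilty * prior_guilt
          + p_fail_if_innocent * (1 - prior_guilt))
posterior = p_fail_if_guilty * prior_guilt / p_fail

print(f"P(guilty | failed polygraph) = {posterior:.0%}")   # ~12%
```

Even with a polygraph that catches 80% of guilty people, the low prior means a failed test only moves us from 5% to about 12%, nowhere near certainty.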

This can be reapplied to all sorts of evidence, and should be, particularly when you have one piece of evidence that flies in the face of the rest. We even have a legal standard for this: circumstantial evidence, which can only be let in under certain circumstances. However, in true crime reporting a lot of circumstantial evidence is treated as extremely weighty, regardless of how discordant it is with everything else. You have to be honest about the prior probability or all your subsequent calculations are going to be skewed.

The Problem With Null Hypothesis Testing

This is a somewhat interesting theory, based on the idea that null hypothesis testing may not be appropriate for every field. For example, if you are testing whether a new drug helps cure cancer, you want to know if it has an effect or not. Pretty simple. But in a field like social psychology, human behavior may be too nuanced for a true yes or no question. Running statistical tests that suggest there is a clear yes/no might end up with unreliable results because the whole setup was inappropriate for the question asked.

In true crime, this reminds me of people using legal standards as though they are moral standards or everyday standards we might use. For example, a person accused of rape may not be convicted under a reasonable doubt standard, but that doesn’t mean you’d be ok with them dating your daughter/sister/friend. In murder cases, even when the police get things wrong they often had a good reason to start believing people were guilty. Drug or alcohol use can make people look suspicious, lying up front to the police can make you look suspicious, prior similar convictions can make you look suspicious, etc. I’ve seen a strong tendency for people to decide that whoever they favor is blameless (null hypothesis = absolutely nothing wrong), but as we covered last week, a lot of people mixed up with legal trouble have something working against them.

Base Rate Fallacy

I’ve written about the base rate fallacy before, and it can be a tricky thing to overcome. In short, the base rate fallacy happens when something is extremely uncommon and you use an imperfect method to try to find it. For example, if you use an HIV test on a thousand random people in the US, we know that 3-4 might have HIV. If you are using a test that catches 99% of true cases but has a 1% false positive rate, that actually means more people (10) will get a false positive result than a true positive result. When the frequency of something is low, false positives become a much bigger problem. In publishing, the theory is that genuinely new phenomena are getting rarer, so surprising findings are increasingly likely to be false positives.
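The arithmetic is easy to check yourself. A minimal sketch using roughly the numbers above (the ~0.35% prevalence is my approximation for “3-4 per thousand”):

```python
# Base rate arithmetic for the screening example above.
n = 1000
prevalence = 0.0035          # assumed ~3-4 per thousand
sensitivity = 0.99           # test catches 99% of true cases
false_positive_rate = 0.01   # 1% of uninfected people test positive

infected = n * prevalence
true_positives = infected * sensitivity                  # ~3.5
false_positives = (n - infected) * false_positive_rate   # ~10.0

print(f"true positives:  {true_positives:.1f}")
print(f"false positives: {false_positives:.1f}")
# False positives outnumber true positives roughly 3 to 1.
```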

So how does this apply to true crime? Well, it’s a little hard to make a clean comparison, because so many crimes have unusual things happening by default. To take OJ Simpson as an example, it’s unusual for a celebrity of his stature to be accused of a crime. However, it’s also pretty unusual for a celebrity’s ex-wife to end up dead like his did. Our base rate doesn’t totally work because we actually know something weird has happened. This is where we have to get back to judging people by evidence, not statistics.

However, in the broader scheme of true crime content, I think it’s good to note that the demand for new cases currently exceeds the supply. As we’ve continued to cover, people want attractive, articulate defendants with “interesting” cases, and we just don’t have that many of them, especially since the murder rate in the US is down substantially from the 80s and 90s and we have fewer current cases to draw from. This creates a vacuum where people are heavily incentivized to make their cases “interesting” enough for true crime podcasters to pick up on.

Alright, that’s all I have for this week. I’ll be looking to wrap up next week with a few lessons learned and thoughts. Thanks all!

To go to part 8, click here.

The True Crime Replication Crisis Part 6: Statistical Errors

Welcome back folks! This week we’re still talking about true crime, and I’m going to cover some statistical errors and how they relate to cognitive errors we see when we discuss true crime stories. Before I get to that though, I want to touch on a point made in the comments last week. David brought up the Duke Lacrosse case as a good example of a fraudulent claim that gained traction; it was ultimately found to be a false accusation, yet many people continued to cling to it long after the evidence turned because they believed it was “an important conversation”. This sounds silly, but in the phenomenal “Toxoplasma of Rage” essay over at Slate Star Codex, Scott Alexander points out the following:

The University of Virginia rape case profiled in Rolling Stone has fallen apart. In doing so, it joins a long and distinguished line of highly-publicized rape cases that have fallen apart. Studies sometimes claim that only 2 to 8 percent of rape allegations are false. Yet the rate for allegations that go ultra-viral in the media must be an order of magnitude higher than this. As the old saying goes, once is happenstance, twice is coincidence, three times is enemy action.

The enigma is complicated by the observation that it’s usually feminist activists who are most instrumental in taking these stories viral. It’s not some conspiracy of pro-rape journalists choosing the most dubious accusations in order to discredit public trust. It’s people specifically selecting these incidents as flagship cases for their campaign that rape victims need to be believed and trusted. So why are the most publicized cases so much more likely to be false than the almost-always-true average case?

Scott goes on to hypothesize why this is: basically, we are attracted to controversial stories because they allow us to signal our beliefs about different topics. I tend to believe he’s on to something, but for purposes of this series I want to emphasize his point that cases that get talked about are often more likely to contain extreme deception than regular everyday cases. We have no reason to believe this is limited to rape cases, and every reason to believe that stories that grab headlines are uniquely unreliable.

Alright, with that out of the way, let’s move on to some stats issues!

Low Statistical Power

One issue that has likely contributed to the replication crisis is that many studies lack statistical power, meaning they don’t have enough data to reliably detect real effects. This makes the findings unstable, so when you repeat the study, the result might not appear again. Adequate statistical power depends on a few things, including sample size and the size of the effect you’re looking to detect. For example, if you want to understand height differences between adult men and women, you might need a decent-sized group before you can accurately say whether the difference is 3 inches or 5 inches. If you’re looking at the height differences between adults and 5 year olds however, you’re going to need a much smaller group to establish there’s a huge difference. The smaller the effect size, the more people you need to reliably see what’s happening.
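To make that concrete, here’s a rough power calculation sketch. The effect sizes are illustrative picks on my part, not from any particular study:

```python
# Required sample size per group to detect a difference in means with
# 80% power at alpha = 0.05, across a range of (illustrative) effect sizes.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.2, 0.5, 1.0, 2.0):   # Cohen's d, from small to enormous
    n = analysis.solve_power(effect_size=d, power=0.8, alpha=0.05)
    print(f"d = {d}: ~{n:.0f} per group")

# d = 0.2: ~394 per group
# d = 0.5: ~64 per group
# d = 1.0: ~17 per group
# d = 2.0: ~5 per group
```

Small effects need hundreds of people per group; huge effects (like adult vs child height) need only a handful.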

So how does this apply to true crime? Well, as I pointed out in part 2, most popular crime stories are highly unusual. While they are often things we deeply fear, they are almost always things we have no experience with. Given this lack of data, we have almost no basis for deciding what’s normal/abnormal, and yet we do it anyway! It’s a running joke on social media that every time a new subject comes up, people immediately switch from being infectious disease experts to nuclear war experts to trade agreement experts, etc. True crime is an extension of that, with people who have never experienced any part of the justice system loudly opining about what should or shouldn’t have been done. In the rush to get press coverage, I also noticed a lot of experts who did have experience in related fields would often comment on cases without actually having read all the details. I also consider this a lack of statistical power: all the general knowledge in the world doesn’t help if you don’t actually know the specifics of the case you’re talking about.

Positive Effect Size Bias

Otherwise known as the decline effect, this is the phenomenon where studies initially find a large effect size that keeps getting smaller with each subsequent study. A classic example is medications, which often appear to work extremely well when they’re first rolled out, only to be much less impressive when studied a few years later.

I have seen this in a lot of true crime cases, where initially you are told “oh hey, you have to look at this absolutely CRAZY case they cover in this documentary”. If you look at the other side though, you gradually discover most of the things that hooked your attention are a lot more nuanced than they appeared. In our local case, there was one article that sparked all the interest, and several years later someone went back and fact-checked it. They estimated about 75% of it was incorrect, and often laughably so. Bizarrely, people who got interested in the case didn’t seem to care that the thing that hooked them was so unreliable; they had simply moved on to new claims. Regardless of what you think happened in a case, it’s good to note when claims don’t hold up rather than simply moving on to new ones.

Problems of Meta-Analysis

One safeguard against the replication crisis was supposed to be meta-analyses, which take a lot of studies on the same topic and analyze them together. One issue is that a single bad study can “infect” the whole meta-analysis, so even lumping a whole bunch of studies together doesn’t help. If you get one 6’2″ basketball player in your female height sample, it’s going to take a while for that average to come back to normal. Another issue is that if the hypothesis is wrong, you are not going to get studies with a strong effect in the opposite direction to balance things out; you are going to get studies that cluster around zero. Again, this means it will take a LOT of studies to show the real effect size.
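As a trivial illustration of that first point, with made-up numbers:

```python
# One extreme value dragging a pooled average, with invented heights
# (in inches) standing in for the female height sample above.
import statistics

sample = [64, 63, 65, 62, 66, 64, 63]   # typical values
print(statistics.mean(sample))           # ~63.9

sample.append(74)                        # one 6'2" outlier sneaks in
print(statistics.mean(sample))           # ~65.1 -- the pooled mean shifts
```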

So how does this work in true crime? Well, I actually think meta-analysis thinking is the worst thing that can happen to a true crime case. Our justice system is supposed to be based on individual facts, not on group dynamics. This gets argued a lot with racial profiling, but perhaps my favorite example is family criminality. Crime is highly heritable, and yet our justice system doesn’t let your family history into court, and for good reason. The foundation of our justice system is that you are supposed to be judged as an individual based on evidence, not on “well, this would make sense”. True crime, on the other hand, is rife with this type of commentary. The police are always like this, people in small towns are like this, white rich kids are like this, etc. I’m actually not very against stereotypes as a first step, but stereotypes are not evidence. If the evidence starts to contradict your stereotype, you may want to consider that someone might have been attempting to evoke exactly that stereotype to get you to override your reason.

P-hacking

I covered p-hacking back in part 4, where we talked about the idea of looking through tons of data for “surprising” connections. In both research and true crime, the more data you take in, the more likely you are to find connections that may or may not be meaningful. I did want to emphasize one more part of this though, something I’ll call “narrative hacking”. If p-hacking is overinterpreting random connections, narrative hacking is selectively including or emphasizing details, interpretations, or coincidences until a desired emotional or moral conclusion “feels significant”. As I said to someone when talking about my local case, “some of what they complain about is real, some of it is just normal stuff said in a scary voice”. Selective interpretation of events is a normal human trait, and trying to make mundane things sound significant is a key trait of anyone trying to hook you on a story. Suddenly “weirdly, he never left the house all day” is said in the same tone as “oddly, he only left the house once that day” and “bizarrely, he left the house multiple times that day”. It’s good to be alert for when a narrator is emphasizing details that really aren’t that interesting.

Statistical Heterogeneity

Statistical heterogeneity means that different studies of “the same” effect actually vary in methods, samples, measures, or contexts. This means that when you try to replicate a study, you can run into the issue of changing something that was actually important to the original. For example, you might find an effect in a study done on all men that disappears if you add women to the sample, or a study on college students that doesn’t replicate with senior citizens. Sometimes slight changes in question wording can radically change answers. This can actually be an important issue to note, because sometimes it can reveal that a previously hidden factor was influencing the original results.

In true crime, similar inputs do not always yield similar outputs. Two missing child cases can have very different reactions from parents, not because one is lying and the other isn’t, but because there’s a huge range of possible reactions to a horrible situation. This is somewhat akin to what I said above about overgeneralizations. There’s a huge range of crimes, contexts, and individuals involved, and even in a perfect system that would produce a huge range of human behavior. Trying to “follow” unusual tragic cases may lead to false confidence in your conclusions.

Alright, I think that’s all I have for today. Tune in next week for what I’m hoping might be my last post before the wrap-up, depending on how long-winded I get. It’ll be fun!

To go straight to part 7, click here.

The True Crime Replication Crisis Part 5: Fraud

This week, I have to say, we are getting to one of my favorite topics: straight up fraud. Prior to this we have covered a lot of things that can skew the thinking of otherwise good people despite their best efforts, which covers the vast majority of issues we run into, but today we’re going to cover those who intentionally deceive others. Even in the context of the replication crisis, straight up fraud cases make up a very small percentage of the concerns about research findings, but they are still worth focusing on as a potential trouble source.

Before we get started though, I want to mention a somewhat weird thing I’ve noticed over the past few years: when it comes to research, people are often very quick to call human error and/or bias fraud, and then too slow to call actual fraud, fraud. I have wondered why this is, and my suspicion is that well intentioned humans who make errors are often very ashamed and may not defend themselves vigorously, whereas straight up fraudsters are extremely prepared to be challenged and to be aghast that you would ever suspect them of anything. Thus the well intentioned error-makers seem a lot more “guilty looking” than the fight-to-the-death fraudsters.

So with that in mind, let’s talk about what fraud is and isn’t. Fraud is not making a mistake, even if it means you have to retract your study. Admitting you got it wrong and owning up to it is exactly what we want researchers to do. Fraud is also not publishing a faulty study you didn’t question rigorously enough because it matched your pre-existing beliefs; at this point most of us have accidentally shared a link to a story that turned out to be false because it just “sounded true”. Fraud isn’t even necessarily publishing only certain outcomes of a study and failing to publish others. Many of these things can teeter towards fraud depending on the circumstances, but most people in their day to day lives will occasionally jump to conclusions or tell stories in ways that benefit them. It’s not humanity’s finest trait, but it’s one we see often. So if those things aren’t fraud, what is? Well, in the research world one of the main examples is data falsification. From making up numbers to pretending to have done experiments that never happened, this is an unfortunate reality of some research, and it’s often only through replication efforts that it can be uncovered.

The wildest example in the research world is actually fairly recent, the sordid tale of Francesca Gino. Gino was a Harvard Business School professor who, amazingly, specialized in “honesty and ethical behavior” research. Back in 2020, a graduate student raised concerns about one of her papers, and then tried to replicate it in 2021. She became suspicious that not only did the study fail to replicate, but the whole set of results seemed wildly implausible. She got some data bloggers involved, and things spiraled from there. To condense a very long story, Gino was eventually put on leave and ultimately her tenure was revoked and she was fired.

What’s interesting, given my second paragraph, is that this all came to the attention of most people because Gino sued both Harvard and the Data Colada bloggers for $25 million, saying they were all defaming her. It was actually her own lawsuit that caused Harvard’s internal investigation of her to be released, which made her look incredibly bad. She has alternated between claiming she was the victim of sexism, that it’s all a big mistake, and that she was framed. Her coauthors, on the other hand, started a website to investigate all of the papers they’d worked on with her to make sure they knew which findings were reliable and which weren’t. While I will note that Gino still has defenders, it’s an interesting story of defensiveness in the face of accusations.

So how does this relate to true crime?

Well, I’d imagine much of the connection is obvious, but I’d like to point out that in true crime we actually know pretty much from the get-go that someone is straight up lying. In scientific research, fraud is always a possibility, but probably not more so than in regular human endeavors. It reminds me of the old stats 101 problems where you calculate things like “given that the child is a boy, what are the chances his name is John” vs “given that the child’s name is John, what are the chances he is a boy”, which highlight that those are wildly different answers. Here it’s the difference between “given that a scientist published a paper, what are the chances there is fabricated data?” and “given that a bunch of suspects have told incompatible stories, so someone is lying, what are the chances person X is lying?”. Why do I point this out? Because as I mentioned at the beginning of this series, for some reason the average person I talk to is more open to hearing that a research study they heard about is wrong than they are to hearing that the new true crime podcast they’re listening to is. This makes no sense, because crime stories are almost by definition full of liars. One of the first types of lies little kids tell is lies to get themselves out of trouble. If you have even a passing familiarity with the Biblical story of the Garden of Eden, you’ll know that it’s alleged humanity’s response to its very first crime was to try to shift the blame for eating the apple. Lying about this stuff is as innate to human nature as it gets. So again, why are we so resistant to being skeptical about these stories when someone puts them on a podcast?
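For the curious, here’s the stats 101 inversion worked out in Python, with numbers I made up for the illustration:

```python
# Inverting a conditional probability: P(named John | boy) is small,
# but P(boy | named John) is near certain. All numbers are invented.
p_boy = 0.51
p_john_given_boy = 0.02       # assume 1 in 50 boys is named John
p_john_given_girl = 0.0001    # assume essentially no girls are

p_john = (p_john_given_boy * p_boy
          + p_john_given_girl * (1 - p_boy))
p_boy_given_john = p_john_given_boy * p_boy / p_john

print(f"P(named John | boy) = {p_john_given_boy:.1%}")    # 2.0%
print(f"P(boy | named John) = {p_boy_given_john:.1%}")    # ~99.5%
```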

I think there are a few things skewing our thinking here. The first is that crimes tend to involve a lot of human error from the get-go. Witnesses often don’t have the best memories of times/dates/sequences of events, so any attempt to call someone a liar has to be tempered by the frailty of human memory. Additionally, in many crimes, victims are purposely selected because they have pre-existing credibility issues, making things even harder to sort through. In the documentary about the fraudulent results from the Massachusetts state crime lab, a defense attorney notes that the number one risk factor for being falsely accused of a crime is already having a criminal record. Two fraudsters in two different labs got away with filing false drug test results for years in large part because the results mostly impacted known drug dealers.

Interestingly, this applies to any group that comes under fire. I don’t think it’s coincidental that the true crime genre exploded in popularity around the same time George Floyd/Black Lives Matter gained steam, as “police framed/justice system railroaded innocent person” is perhaps the most popular true crime storyline. Just like having a criminal history does not make you automatically guilty of a crime, police having issues does not negate the fact that nearly every defendant claims to be framed. There’s actually been some interesting discussion of this in defense attorney circles, with some attorneys arguing that any media drawing attention to the flaws of our justice system is useful, and others maintaining that this type of infotainment does more harm than good. Scott Greenfield, my personal favorite defense attorney/blogger, falls in the latter camp. For this post I went looking for his thoughts on true crime, and was interested to see that in the years since Serial debuted, he’s gotten even harsher than his initial skepticism. I’d recommend the whole thing, but I love the first three paragraphs he wrote back in 2023 (bolding mine):

After the podcast Serial became a hit, the phone started ringing. The calls were from journalists, producers, wannabe podcasters, asking whether I had any cases involving a clearly innocent defendant who was abused by the system and ended up convicted and serving a lengthy sentence. Well, of course I did. We all do. But as it turned out, that really wasn’t the story they were interested in.

What they really wanted was a sympathetic defendant, the sort of innocent person people could love, and a simple, clear story of misconduct and abuse that ended with imprisonment. This was where I made the mistake. I had no stories like that, as few defendants were up for beatification before being charged with murder, and while there were arguments for the defense, and complex, messy problems along the way, it wasn’t as if the prosecution didn’t have a case to show they committed the murder.

The sort of post hoc contentions, like witnesses who recanted after they had nothing on the line or jailhouse snitches who say their cellies confessed to them, that true crime producers adored and thought critically valuable were the sort of things judges laughed off, as did I. People lie, all the time, for all sorts of reasons. Why is a post-trial recantation more credible than sworn trial testimony? Defendants bought witness silence or post-trial recantations on occasion. They often claimed innocence all along, even though they were guilty as sin. That’s the nature of criminal defense.

This is a man who makes his living in criminal defense pointing out the rather obvious fact that very few people get to trial without some pretty good evidence that they did it, and that people are constantly lying. If someone claims they lied under oath but now are telling the truth on a podcast, you may want to mark that person down in credibility. So what do we do here? Well, I think we have to approach these things with a huge eye towards fraud, both for the defendant and the podcaster. A few thoughts:

  1. Compare the story being told to different sources/established facts: I’ve said it before but I’ll say it again, before you start any documentary or podcast, look for a summary of the facts so you can tell if something’s being left out. Remember that every single person involved, from the defendant to the witnesses to the podcaster, is highly motivated to make themselves look as good as possible. It’s also good to note that wanting your story to be public is not, in and of itself, a sign of honesty; see my prior comments about Francesca Gino being the one to get her own damning internal investigation released. Some people truly believe facts make them look better than they do.
  2. Beware of emotional investment, your own or others’: Over a 10-podcast series, you can feel you get to know the host/the subject/whoever, which can lead you to overattribute credibility to them and become less skeptical as time goes by. By the time you finish, it can feel mentally awkward to consider someone you’ve come to like a liar. This goes double for podcasters, by the way, especially if they got exclusive access to some of the players in the case. I have a rule of thumb that when someone covers a controversial case and interviews someone extensively, then starts hemming and hawing about their opinion while saying “I guess we’ll never know”, they think the person’s guilty. With my local case, we had at least one documentary film maker admit that’s exactly what they did. The burden of highlighting someone’s case just to condemn them is too much for some people.
  3. Beware of applying big picture thinking to individual cases: We live in a world where people get raped. This does not mean every individual rape accusation is true. We live in a world where people falsely accuse others of rape. This does not mean every claim of a false accusation is true. Unfortunately, there’s an odd thing that happens with true crime I’ll call “true crime as thought experiment”, where people use a true crime case as a stand-in for a bigger issue. This can work in the research world, where research that aligns with prior findings actually can be considered more credible than novel research. But in a crime case? The facts of every single crime still matter on their own. Once a case gets big enough though, a surprising number of people will claim the exact details don’t matter because we need a “bigger conversation”. But good lord, imagine if it was you stuck in the middle of things. If your loved one was murdered and someone else decided to fudge some details and portray the murderer sympathetically because they wanted to make a “bigger point”? You’d hate it; we all would. Always consider that there are real humans at the center of things, and ask if they signed up to be your morality tale.
  4. Remember, people do in fact just make things up and people have been hurt by it: One weird thing I’ve noticed with some true crime assessments is that people will try to play “fair” and give everyone equal credit, like they all are lying a little bit. I think this comes from our natural instinct when we’re adjudicating arguments in our personal life. If you have two friends in a fight and you hear both sides, our instinct tends to be to split the difference and assume both have some points and both are being a little self serving. With many crime stories though, some stories are just incompatible. This happened in my local case, and I was surprised how many people wanted to try to split the difference between two extremely incompatible claims. I ended up having more respect for those who went all in on one side or the other than those who tried to “both sides” two stories that clearly could not coexist.
  5. Factual innocence is different from not guilty: As I have throughout this series, I will reiterate that I support the reasonable doubt standard and our justice system. However, I continue to ding some true crime folk for acting like “beyond a reasonable doubt” means the defendant should be given more deference vs every other person involved. As we saw at the height of the #metoo era, a claim of wrongdoing that never enters a courtroom can destroy lives very easily. Not as dramatically as actually being wrongfully convicted in a court of law, but well beyond a level that’s reasonable to accept. It surprises me therefore that so many podcasters take this responsibility so lightly. If you know that one person committed a murder and you spend hours talking about 6 suspects, you should be aware 5 of those people are innocent and you may have just helped ruin their lives. Even Sarah Koenig admitted she’s ashamed of this part of the Serial podcast, that it encouraged people to treat others as pieces in a puzzle to be solved rather than humans who had been through pain. I get the reason people focus on the person on trial, meticulously cataloguing every issue with the case against them, but it’s notable they tend to spend just a few minutes on the weaknesses of the case against alternative suspects, if they mention them at all. This mimics the tactic of defense lawyers who are explicitly there to do this, but I’m surprised it doesn’t weigh heavier on the conscience of those just doing it as a hobby. If the defense lawyer was wrong, he did his job. If you’re wrong, you actually just wrecked somebody’s life for entertainment.
  6. Watch how people address the victims: This is a somewhat weird one, but hear me out: the more dismissive a true crime podcast or a suspect is of the loved ones of the deceased, specifically those who could not themselves be suspects, the more I’d question the story. Victims by definition shift the attention away from neatly crafted stories, and thus seem to prompt outsized anger or complete dismissal from those seeking to push a narrative. A good recent example of this is Candace Owens attacking Charlie Kirk’s widow Erika. Owens has stated there was a conspiracy to murder Kirk, and it seems the further she went with the story, the more it annoyed her that the grieving widow wasn’t asking similar questions. Even if you believed Charlie Kirk was killed as part of a bigger conspiracy (I don’t), raging at a young widow would be a weird place to start in making your case. Watch how people treat the undisputed victims, and you’ll get a good insight into where their focus is.

Ok, that’s all I have for today! Tune in next week when I go over some statistical issues.

To go straight to part 6, click here.

Pink Sparkle Unicorn Science

I unfortunately have a packed weekend and have not been feeling well, so no True Crime post today. Instead, I would like to mention something I was wrong about.

For years I have disliked the whole “science for girls” thing, believing that it was fairly condescending to slap pink on something sciencey and to declare it “for girls”. I believed this right up until I had to buy a present for a precocious 4 year old girl I know who is obsessed with all things pink, sparkly, and unicorn adorned. It had been requested I try to find something sciencey, so I decided to take a chance with, well, a pink sparkle unicorn science kit for girls.

She loved it. Last I heard she had told one of her parents to “go away, I’m doing scientist things”.

Worth every penny.

The True Crime Replication Crisis Part 4: Questionable Research Practices

Welcome back folks! This week we’re going to be diving into questionable research practices, and their impact on the replication crisis.

There’s a few specific questionable research practices that we should cover here, but I want to start with the umbrella concept of “researcher degrees of freedom”. This is a concept that basically means that any time you want to do an experiment, there are a bunch of different ways to do things. Two people locked in separate rooms and asked to create an experiment will almost certainly come up with two different ways of doing things, just because there is often not one right way to do anything. This is fine. What’s less fine is when people start making choices that we know have a high tendency to produce false positive results. Some of these include data dredging, selective reporting of outcomes, HARKing (hypothesizing after results are known), and PARKing (pre-registering after results are known). So what are these things and how do they impact research? Let’s get into it!

Ok, before we talk about the questionable research practices, I want to take a step back and remind everyone that at its core, science is largely about trying to deal with the problem of coincidences. We as humans have a tendency to see patterns that may or may not actually exist, and most scientific methods were developed with the idea that you need to separate out coincidences from true causal effects. For example, if a lady claims she can taste a difference in tea based on when the milk was added, you can design a randomized experiment to actually test her prowess. This helps us separate a few lucky guesses from a real effect. Now to reiterate, we almost exclusively have to use this to assess coincidences. If a woman claimed she could tell how her tea was made and then promptly misidentified it, we’d all move on. It’s mostly when something starts to look plausible that we have to really dig down into the scientific method to figure out what’s going on.

Enter data dredging. Often researchers get large data sets at one time, and may start to run something called exploratory data analyses, just to see if any interesting correlations emerge. This is fine! The problem is that it wasn’t always well communicated that this is what was being done. I’ve actually written about this before when covering some of Brian Wansink’s ongoing scandals and comparing it to my approach for my masters thesis. If you are looking at hundreds of data points, the chance of uncovering a coincidence goes way up, and there are actually statistical methods created just to correct for this problem. Data dredging often leads to spurious correlations, which are fine and relatively easy to deal with if you admit this might be a problem and that you’re going to have to investigate more to figure out if the correlation is real or not. The problem is that even very smart researchers can trick themselves into thinking that a coincidental finding is more meaningful than it actually is. Andrew Gelman has done excellent work explaining this in his paper “The garden of forking paths: Why multiple comparisons can be a problem, even when there is no fishing expedition or p-hacking and the research hypothesis was posited ahead of time“. In it, he points out that no one has to do this on purpose:

In a recent article, we spoke of fishing expeditions, with a willingness to look hard for patterns and report any comparisons that happen to be statistically significant (Gelman, 2013a). But we are starting to feel that the term fishing was unfortunate, in that it invokes an image of a researcher trying out comparison after comparison, throwing the line into the lake repeatedly until a fish is snagged. We have no reason to think that researchers regularly do that. We think the real story is that researchers can perform a reasonable analysis given their assumptions and their data, but had the data turned out differently, they could have done other analyses that were just as reasonable in those circumstances.

The paper goes on to explain several examples, but the basic issue is that when you are looking hard for something, you will unintentionally start expanding your definitions until you are very likely to find something coincidental that you believe fits your initial hypothesis. This can quickly lead to some of the other issues I mentioned earlier: failing to report all endpoints (because you only report the meaningful ones, leaving people in the dark that you actually tested other things), HARKing (stating after the fact that you “knew it all along”), and pre-registering after the fact (PARKing), where you take it a step further and claim that this is what you were always looking for. Gelman has great examples, but XKCD perhaps put it best:
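If you’d rather see the problem in numbers than in jelly beans, here’s a minimal simulation, entirely my own sketch rather than anything from Gelman’s paper. It generates pure noise (two groups drawn from the same distribution across 20 outcome measures) and counts how often at least one outcome clears the usual significance bar:

```python
import math
import random

random.seed(1)

def t_stat(a, b):
    """Welch two-sample t statistic."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

N_DATASETS, N_OUTCOMES, N_PER_GROUP = 1000, 20, 30

datasets_with_a_hit = 0
for _ in range(N_DATASETS):
    hits = 0
    for _ in range(N_OUTCOMES):
        # Both groups come from the SAME distribution: every "effect" is noise.
        group_a = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
        group_b = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
        if abs(t_stat(group_a, group_b)) > 2.0:  # roughly p < 0.05 at this sample size
            hits += 1
    if hits:
        datasets_with_a_hit += 1

print(f"datasets with at least one 'significant' result: {datasets_with_a_hit / N_DATASETS:.0%}")
```

Roughly two thirds of these pure-noise datasets (about 1 − 0.95^20) cough up at least one “finding”, which is exactly why reporting only the outcomes that worked is so misleading.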

So how does this apply to true crime? Oh you sweet summer child, let me tell you the ways.

Remember a paragraph or two ago when I said “the basic issue is that when you are looking hard for something, you will unintentionally start expanding your definitions until you are very likely to find something coincidental that you believe fits your initial hypothesis”? This is so rampant in true crime you wouldn’t even believe it, and there’s literally no statistical test waiting in the wings to help sort things out.

For years, people have been aware that police can do this. Once they fixate on a suspect, everything that person does can look suspicious. Did the suspect get extremely upset when they heard their wife was dead? Suspicious, probably an act. Did they fail to get upset? Also suspicious! I would argue a large part of our justice system is set up specifically to guard against this issue, and it’s why we have a standard of “beyond a reasonable doubt”. People are fallible; our justice system is imperfect, but it at least acknowledges this problem exists. But as we’ve covered, the disparate and competitive true crime ecosystem takes very little time for self-reflection, and often breathlessly reports coincidences with no particular acknowledgement that sometimes a coincidence is just a coincidence.

This gets complicated by the fact that (as we covered earlier in this series), true crime type cases happen disproportionately in suburbs or smaller towns vs large cities. I actually have wondered if part of the reason is that it’s easier to have “coincidences” in places with smaller, more stable populations. If a police officer gets called to a random murder in the projects, it’s very unlikely they will have a connection to whoever the victim/suspect is. A police officer in a town of 25,000 though? You’ve got a much higher chance of knowing someone who knows someone. Now if there are three officers who respond to a place where 4 people live? It’s almost guaranteed someone has a connection. Suspicious!!!! You may think I’m exaggerating, but the case in my town involved DOZENS of things like this. It’s actually what first got me thinking about the connection with the replication crisis. I have had so many people say things like “you REALLY believe that it’s a coincidence that <one of the ten first responders> had a connection with <a relative of one of the four people there>.” Yes, actually I do. It would actually be pretty bizarre if there were literally no connections. I can’t run errands on a Saturday morning without running into someone I know, and you think a bunch of random people from the same town would have absolutely no connections to each other?

The problem here is that people way underestimate the probability of a coincidence. One of my favorite examples of this is “the birthday problem”, a classic stats problem that asks what the chances are that two people in a randomly selected group share a birthday. Most people are surprised to find out that you only need 23 people before the chances are 50/50 that you’ll get a match, and by 30 people you have a 70% chance of a match. The issue is that people forget you are not trying to find one particular birthday match, but any birthday match. This wildly increases the chance you’ll find something. For my problem above, let’s say you have 6 first responders. Each of them has some combination of parents, siblings, in-laws, children, neighbors and other “suspiciously close” people in their lives. Let’s give them 15-20 each. Now compare that to the 5 people at the scene, who each have a similar number of people around them. So in the end we have 100+ people compared to 75-100 people, all living in a similar area, and we are looking for ANY connection. What are the chances you think we find one? I’d say quite high! But in true crime land, stuff like this is stated like a stunning development.
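If you want to play with this yourself, here’s a minimal sketch of both calculations. The birthday half is the standard problem; the connection half assumes a 0.1% chance that any given pair of townspeople know each other, a number I made up purely for illustration:

```python
import random

def shared_birthday_prob(group_size: int, trials: int = 20_000) -> float:
    """Estimate the chance that at least two people in a group share a birthday."""
    hits = sum(
        len({random.randrange(365) for _ in range(group_size)}) < group_size
        for _ in range(trials)
    )
    return hits / trials

for n in (10, 23, 30, 50):
    print(f"group of {n:2d}: ~{shared_birthday_prob(n):.0%} chance of a shared birthday")

# Same "any match counts" logic for the first-responder example: ~100 people
# on one side, ~90 on the other, and ANY acquainted pair counts as a hit.
p_pair = 0.001                 # assumed per-pair odds of knowing each other
n_pairs = 100 * 90
p_any = 1 - (1 - p_pair) ** n_pairs
print(f"chance of at least one connection: {p_any:.1%}")
```

Even at one-in-a-thousand odds per pair, nine thousand possible pairs makes at least one “suspicious” connection a near certainty.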

Ok now, I’m going to take a step back and be fair for a second: if you’re trying to figure out who did a crime, coincidences may be something you have to look at. I’ve often been annoyed when people start yelling “correlation does not imply causation”, because of course it does. That’s the whole problem. Two items that are correlated may not be causal, but they are much more likely to be causal than two uncorrelated items. But just like with all coincidences, you have to test it against other things to figure out if it’s a coincidence or if you’re really on to something. Otherwise you’re just data dredging, desperately looking for any connection to jump off from.

And overinterpretation of data around crimes can cause a LOT of problems. One of the saddest parts of In Cold Blood (the Truman Capote classic that kicked off the modern true crime genre) comes after the killers were caught, six weeks after murdering a family in a quiet town: Capote mentions many locals had trouble accepting that the killers (who ended up confessing) were really the ones, because various coincidences had convinced them others were involved. Those types of suspicions can stick with you for YEARS, with people vaguely feeling you must have done something.

Ok, so you may be thinking I’m just calling out random individuals here, surely no big time groups report on coincidences like they are meaningful? Au contraire! Recently I saw a Charles Manson documentary that makes the charge that Manson was actually a CIA sleeper agent, based largely on the fact that a CIA member was operating near Manson for years. The guy pushing the theory goes into great detail about this CIA agent, and how many places were linked to both Manson and this CIA operative. I was already feeling skeptical when, about three quarters of the way through the documentary, the main guy drops “it’s the perfect theory, I just can’t put them in the same room at the same time”. Wait, what? Record scratch. So everything we’ve been talking about is just to establish that these two men lived in the same city for some period of time, but you can’t prove they ever met? That’s sort of a key piece of evidence! And this documentary was based on a whole book that was a NYT best seller and is considered an “Editor’s pick” on Amazon. I don’t see how you can consider any of this anything more than data dredging, looking for some coincidence, any coincidence, you can use to prop up your theory.

And don’t even get me started on only reporting certain outcomes, this is endemic. One of the weirder examples I’ve come across is one of the most viewed Netflix true crime documentaries of all time, The Keepers. This documentary looks into the murder of a young nun in 1969, and correlates it with sex abuse allegations a woman (then a student) came forward with against a priest at the same school. The heroes are the now adult women who went to the school at the time, and it holds a 97% positive rating on Rotten Tomatoes. Surely it can’t be a coincidence that a young nun turned up dead at a school where girls were being molested, can it? Well, it may not be that simple. The problem? The viewers are never told about the background of the primary accuser, which I got suspicious about when I heard she’d been offered a very low settlement. I went looking and discovered that the primary accuser admits the memories of her abuse were all “recovered memories” that she had no idea about until decades later, shortly after she started visiting a new therapist. She actually initially accused her uncle and dozens of strangers, then after a decade of accusations moved on to accusing dozens of school employees. All her initial accusations were actually documented pretty early on, as she filed a 1992 lawsuit that ended up enumerating all of them, including her own admission that many of her reported memories were verifiably false and (in her words) “bull crap”.

The documents go on to point out that even once she settled on the school, she first accused a different priest, then after finding out he was deceased, recanted and moved on to a new priest. And literally none of this is in the documentary. I don’t know what happened here and it sounds like the priest may have been creepy to some people, but it does feel relevant to know someone actually accused dozens of other people before they got to the person in question. There is another accuser, but interestingly the documentary does actually make clear she didn’t come forward until the first woman sent out a mass mailing to all her classmates. I’d really encourage you to read the whole article if you want a sense of how badly a smash hit true crime documentary can shape a narrative through omission.

So where does this leave us? Well, much like with correlated data points, I don’t think it’s wrong to point out coincidences, as long as they are properly contextualized. But a few things to keep in mind:

  1. Watch how big you’re making your data set. As mentioned, the more people you look at, the more likely you are to find odd coincidences. Expanding your timeline to everything dozens of people have ever done or to include all of their family members/friends/neighbors/acquaintances wildly expands your data pool. The bigger the data pool, the less meaningful every individual coincidence.
  2. Don’t discard data that doesn’t fit the narrative. It’s interesting to watch some coincidence finders totally discard certain coincidences with “well of course that person behaved strangely, they were in shock” while jumping all over other coincidences. I understand the temptation, but it’s always good to admit when your own side has holes.
  3. Be aware other people might have already discarded data before you got there. Most documentaries/podcast series have limited space and are going to discard some pieces of information and some of that information could have been important. My new suggestion for everyone is that if you must watch a true crime documentary, google the name + criticism before you watch it. Once someone gives you a slick narrative you are much less likely to care about information that could have shaped your conclusions had you known it beforehand. At least you’ll know when they’re breezing by something that could have been important.
  4. Compare coincidences to your own life/baseline probability (see the sketch after this list). Some coincidences are more likely than others. For example, if you hear that someone bought a knife the day before someone got stabbed, but that someone else did laundry the day after, those are two weird coincidences, right? Well yes! But also no. Just in my own life, I am several thousand times more likely to do laundry than I am to buy a knife, and I’d imagine most people are. We can certainly look at both people, but remember how frequently you yourself do the “suspicious” action. This also applies to relationships between people. I once had someone ask me how I felt about all the “close relationships” in our local case, and I pointed out the one they were talking about was somebody’s brother’s wife’s sister’s friend. Knowing a bit about their family, I asked them how close they felt to all their brother’s wife’s sisters’ friends, and they admitted it actually did feel rather more distant once they mapped it out through their own brother, sister-in-law, sister-in-law’s sister, and her friends. Again, this is fine! But if film makers can make a benign scene feel ominous with scary music, true crimers can make somewhat distant relationships feel close by describing them ominously.
  5. Understand the modern environment for information finding. One thing that is very new since the time of In Cold Blood or even JonBenet Ramsey is social media. Nowadays if a new crime occurs, it is trivially easy to go online and find who people might be connected to through Facebook. Suddenly, that guy who ran the trivia night you used to go to 20 years ago can be connected to you just as easily as your actual best friends from college or actual family members, making coincidences even easier to find. Additionally, there are even bigger groups of online sleuths desperate to track down leads, and they scour the internet finding even smaller and smaller discrepancies. I recently saw someone mention they believed JonBenet’s mother was complicit in her murder because of weird phrasing she used in an interview 10+ years after the fact. At that point your data set has become absolutely massive and you may need to take a break.
  6. Prioritize coincidences backed by other evidence. You’d think this would be obvious, but a coincidence that precisely fits the theory of the crime as backed by physical evidence is a lot more meaningful than “this person did something weird elsewhere”.
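As promised in point 4, here’s the sort of back-of-the-envelope base rate check I have in mind. Every rate here is a made-up assumption; the point is how lopsided the ratio comes out, not the exact figures:

```python
# Hypothetical base rates for a hypothetical town -- illustration only.
DAYS_PER_YEAR = 365
TOWN_POPULATION = 25_000

laundry_loads_per_person_per_year = 100    # assumed: roughly two loads a week
knife_purchases_per_person_per_year = 0.5  # assumed: one knife every two years

laundry_doers_today = TOWN_POPULATION * laundry_loads_per_person_per_year / DAYS_PER_YEAR
knife_buyers_today = TOWN_POPULATION * knife_purchases_per_person_per_year / DAYS_PER_YEAR

print(f"people doing laundry on any given day:  ~{laundry_doers_today:,.0f}")  # ~6,800
print(f"people buying a knife on any given day: ~{knife_buyers_today:,.0f}")   # ~34
```

“Did laundry the day after” describes thousands of townspeople; “bought a knife the day before” describes a few dozen. The rarer the behavior, the more a coincidence built on it deserves your attention.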

That’s all I can think of for now! I’ll close with a quote from Frederick Mosteller: “it is easy to lie with statistics, but easier without them.” If scientists attempting to adhere to good statistical methods can make these mistakes, those not even trying to watch their work are several times more likely to fall into these errors.

To go straight to part 5, click here.

The True Crime Replication Crisis Part 3: More Problems with the Publication System

Hi friends! Last week we covered some problems with the publication system that helped cause the replication crisis in science, and this week we’re continuing in the same vein with three more topics. Ready? Good, let’s dive in.

Standards of Reporting

When people started getting interested in the replication crisis, one of the first things everyone wanted to do was figure out how bad it was. Before this could even be tested, an immediate problem was noticed: most papers don’t describe what they did well enough for anyone to even try to replicate it. And this was in cancer biology. Yikes. Now having actually written some papers in oncology and also having written work instructions for how people should do their job, I will state this is almost certainly for two reasons: it is not easy to write out what you did well enough for someone else to truly follow, and it’s very boring to try to do so. If you’ve ever tried to teach a kid how to do a multi-step home chore, you’ve probably seen this. “Ok now put the soap in the dishwasher” quickly makes you realize you did not specify there is special dishwasher soap and a specific spot in the dishwasher for said soap. So basically this is not necessarily sketchy, but it could still seriously impede replication efforts.

So how does this relate to true crime? Well, the biggest content delivery systems for true crime right now are podcasts and documentaries, which just so happen to be the hardest mediums to include any sourcing in. Depending on venue, court documents can be really inaccessible, police don’t tend to release a detailed timeline of their investigation until the trial, and even then they keep it pretty narrow to the specific case. So figuring out the big picture of how an investigation played out can be super hard, and it’s even harder to find a source document to check whether your podcaster/documentary film maker of choice is being honest, or even just reading the facts the way you would have. I ran into this recently when someone tweeted out an “everything you think you know about Amanda Knox is wrong” type thread, and I decided I’d check a fact or two to see if this person was trustworthy. The problem? All the stuff necessary to do that is in Italian. I did eventually find a fan maintained document repository that has some translations, but I still wasn’t sure how to check a quick fact. I gave up.

This is not great, because most of how we sort through which podcasts to listen to on history or politics or other topics comes from a quick assessment of how honest people are, but with true crime it’s almost impossible to do this easily. Even when I talk to people about my local case, it’s often very hard to send them sources for corrections; often the source is buried in the middle of 5 hours of testimony with no transcripts, so you’d have to watch hours of footage to link to the spot. So you’ve got a situation where you are telling a story, but it is extremely hard for anyone to check the specifics of what you’re saying. That kind of setup has never once bred honesty. The only advice I can give here is to see if there are podcasts/documentaries from two different sides and try to consume both of them. At least then you’ll see what people leave in and what they leave out. And honestly? The crazier the story sounds for people’s behavior, especially over long periods of time? Question it harder. Some interesting studies were pretty rightfully called into question when people started pointing out they had very brief methods sections for very elaborate study set ups. “Dozens of people acted in insane ways for a period of several years” is a claim you should always be skeptical of. It’s not impossible, but it’s always good to see if there is any nuance being left out. After all, if it happens in science, where you’re required to write up everything and cite sources, it’s almost certainly happening even more often in podcasts that are required to do neither of those things.

Procedural Bias

So in that last section we covered some issues that can arise with science even when everyone has the best intentions, and in this section we are going to cover another one. Procedural bias concerns arose from the Duhem-Quine thesis, which talks about how most scientific research actually rests on a bunch of different assumptions, including that your instruments are actually working correctly. In psychology research, there are concerns that people could end up only testing their instruments/procedures if their tests show nothing, but could assume any positive result is proof their thesis is correct. This is a pretty human tendency, right? If I ask you to show me how far you can hit a baseball and you don’t do well, you’ll probably ask for another try because your finger slipped/there was a loud noise/the wind blew the ball/etc/etc. But if the wind blows the ball in your favor, you would almost certainly accept the extra few feet. We tend to look for what went wrong only when we don’t get the result we want.

In science? This is not a great tendency. But in the justice system? This is a feature, not a bug! When someone is accused of a crime, they get a lawyer whose actual job is to sit and nitpick every single thing that was done to get to the point of indictment. This is good and how the system is supposed to work; a defense attorney who showed up and said “gosh your honor, it actually looks like the police did everything they could, guess we gotta take the L here” would be grossly negligent and probably lose their license. If the police searched a two block radius? Well why didn’t they search a 3 block radius? If the local police handled it? Why didn’t they call in the state police? If they called the state police? Why didn’t they call earlier? If they called early, what were they so worried about? Etc etc etc. Again, this is a good and proper design of the system, and anyone who isn’t getting this type of representation should be. But I think true crime has taken this tendency a little too far in a few different ways.

First, as trust in police declines, I’m seeing people put a surprising amount of trust in defense attorneys, as though they are not paid to question everything. If the police went right instead of left first, the defense isn’t saying “oh well everyone knows you go right first” because they necessarily believe that to be true, they are saying it because it is absolutely their job. You’d think this would be obvious, but especially with good defense attorneys I see a surprising number of people quote them as though they are authorities on the topic. This works in both directions, btw, with people claiming offense that a defense attorney claimed a rape victim actually consented (not a lotta defenses left if you don’t use that) or people saying the police were obviously wrong because the defense attorney said so.

None of this is to say that defense attorneys can’t cross ethical lines, and indeed, I’ve been discovering there are surprisingly few ways of reining rogue defense attorneys in. However, the point is that just because a defense attorney claims something does not mean it’s true, or even what they would be claiming in a non-professional setting. One of the more interesting points I read while looking into this is that while defense attorneys have done an excellent job branding themselves as defenders of our constitutional rights, defense attorneys at work are only defending their clients’ constitutional rights. They will absolutely argue that the police could have or should have violated other people’s constitutional rights if it will help their client. In the case I’m familiar with, the lawyers actually argued multiple times for warrantless searches of other people, in a way many critics pointed out they’d be infuriated by if it were done to any of their clients.

So this is all fine for defense attorneys, who are doing their job. But I think this tendency has snuck into true crime, particularly amongst people who fancy themselves civil rights defenders. If your answer to how one person’s rights should have been preserved is to suggest violating someone else’s rights, then you’re not a civil rights advocate, you’re a fangirl. While courts attach certain rights to those on trial, it is absolutely insane to act like only those accused of crime have rights. A defense attorney doesn’t get the contents of my phone just because he wants it; he has to offer evidence, just like what had to be offered to get his or her client’s phone. The constitution applies to all of us at all times, not just individual people at specific times.

All of this points to a slightly different problem I’ve noticed with a lot of true crime media and fans: they want it both ways with legal standards. Some time ago, I had a heated discussion with someone who felt differently than I did about our local case. She dismissed multiple things I said as “irrelevant to the court case” and declared she wanted to just follow the case like the jury would. Ok, fair enough! But less than 5 minutes later, when she wanted to counter a different point, she promptly mentioned several things that had also not been allowed into court. I’ve known this person for years and truly believe she was not trying to be manipulative; I think she actually didn’t notice what she was doing. I didn’t notice what she was doing either until I thought about it later, but since then I’ve proactively brought it up to people when I see it. We can either talk about everything from a strict legal perspective, or we can talk about it from a colloquial “do we think they’re guilty” standard, but we can’t have two different standards depending on which part of the case we’re talking about. Make sure your machine’s working when you get results you like and when you get results you don’t like. Having two different standards is human nature, but it’s a recipe for disaster when it comes to truth.

Cultural Evolution

Ok, this is a fun one, based on a paper with a great name, “The Natural Selection of Bad Science“. In it, the authors use some fun statistical methods to show that if scientists primarily get promoted based on publications, and there are no particular penalties for a study failing to replicate, the quality of science will, over time, optimize towards a high volume of low-quality publications. They explain it this way:

An incentive structure that rewards publication quantity will, in the absence of countervailing forces, select for methods that produce the greatest number of publishable results. This, in turn, will lead to the natural selection of poor methods and increasingly high false discovery rates. Although we have focused on false discoveries, there are additional negative repercussions of this kind of incentive structure. Scrupulous research on difficult problems may require years of intense work before yielding coherent, publishable results. If shallower work generating more publications is favoured, then researchers interested in pursuing complex questions may find themselves without jobs, perhaps to the detriment of the scientific community more broadly.

Yuck.
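To get a feel for the dynamic they’re describing, here’s a toy simulation in the same spirit. To be clear, this is a loose sketch of the selection mechanism, not a reimplementation of the paper’s model: labs get copied into the next generation in proportion to how many papers they produce, and low-effort labs produce more papers:

```python
import random

random.seed(3)

N_LABS, GENERATIONS = 100, 100

def mean(xs):
    return sum(xs) / len(xs)

# Each lab has an "effort" level in (0, 1): more effort means fewer papers
# per year but more reliable results.
labs = [random.uniform(0.2, 0.8) for _ in range(N_LABS)]
print(f"generation   0: mean effort {mean(labs):.2f}")

for _ in range(GENERATIONS):
    # Fitness is publication count, and cutting corners boosts output.
    papers = [1 + 9 * (1 - e) for e in labs]
    # Selection: new labs copy old ones in proportion to their output...
    labs = random.choices(labs, weights=papers, k=N_LABS)
    # ...with a little mutation in methods along the way.
    labs = [min(1.0, max(0.0, e + random.gauss(0, 0.02))) for e in labs]

print(f"generation {GENERATIONS}: mean effort {mean(labs):.2f}")
```

Run it and mean effort drifts steadily toward zero. No individual lab has to decide to do bad science; the reward structure selects for it all on its own.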

So how does this apply to true crime? Well, as we covered in the publish or perish section, true crime is a highly competitive space, and making sure you have a steady stream of content is more important than having an entirely accurate retelling of the story. Currently, there are almost no ramifications for those who are inaccurate, so one assumes the same dynamics will come into play.

This is actually one spot where I think true crime may be in a slightly better position than science, as there are some podcasters who literally make “we are going to do a ton of research and be moderate and careful” their brand. At least some of these have gotten a pretty dedicated following, so it is possible for consumers to demand more of this. With science, sadly, most of us will never hear of the careful researchers who toiled away in obscurity and didn’t succeed in their field.

Ok, that’s it for this week! Next up we’ll start getting into questionable research practices. Some of these are field-specific, so they may not entirely apply, but we’ll see what we can draw out. Thanks for reading, and stay safe out there.

To go straight to part 4, click here.

The True Crime Replication Crisis Part 2: Problems With the Publication System

Welcome back! Last week we covered the proposed historical and sociological causes of the replication crisis and applied it to things we see in the true crime genre, and this week we’ll be doing a similar analysis with the group of causes under “problems with the publication system”. As a reminder, I’m loosely following the order in the replication crisis Wiki page, so if you want more you can go there. There’s about 6 reasons listed under the “problems with the publication system” section, so we’ll be taking those one at a time.

Publication Bias

Publication bias in science is a topic I’ve written a lot about over the years, but perhaps my most succinct take was when I covered it during my review of the Calling Bullshit course, where they did a whole class on it. I think the second paragraph I wrote in that post broke it down nicely:

This week we’re taking a look at publication bias, and all the problems that can cause. And what is publication bias? As one of the readings so succinctly puts it, publication bias  “arises when the probability that a scientific study is published is not independent of its results.” This is a problem because it not only skews our view of what the science actually says, but also is troubling because most of us have no way of gauging how extensive an issue it is.  How do you go about figuring out what you’re not seeing?

In science, some of the findings were skewed by people being more interested in doing novel research than in trying to replicate others’ findings, by the “file drawer effect” (papers that didn’t find an association between two factors were much less likely to be published), and, outside of science itself, by the fact that the media will mostly focus on unusual findings rather than the full body of scientific literature. My guess is you already see where this is going, but first a quick sketch of the file drawer effect below, then let’s think about how this applies to true crime.
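This is a minimal sketch, assuming a small true effect and a journal that only publishes results crossing p < 0.05; none of the numbers come from any real literature:

```python
import math
import random

random.seed(7)

TRUE_EFFECT = 0.2        # a real but small difference between two groups
N_PER_GROUP = 30
N_STUDIES = 2000

published = []
for _ in range(N_STUDIES):
    a = [random.gauss(TRUE_EFFECT, 1) for _ in range(N_PER_GROUP)]
    b = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    diff = sum(a) / N_PER_GROUP - sum(b) / N_PER_GROUP
    se = math.sqrt(2 / N_PER_GROUP)   # standard error of the difference (sigma = 1)
    if abs(diff / se) > 2.0:          # the file drawer: only "significant" results see print
        published.append(diff)

print(f"published: {len(published)} of {N_STUDIES} studies")
print(f"true effect: {TRUE_EFFECT}, mean published effect: {sum(published) / len(published):.2f}")
```

Only around one study in nine clears the bar, and the ones that do average roughly three times the true effect, even though every individual study was run honestly. The readable literature is skewed even when no individual paper is.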

One of the first things anyone who looks at true crime as a genre starts to realize is that the crimes covered by traditional true crime are almost never the most common types of crimes. While there’s no “average” homicide, there is certainly a “modal” one! If you were going to describe a typical homicide in the US, most people would pretty quickly come up with something close to this: a young adult man, shot with a handgun, by another young adult man he knows, during an argument or dispute, in a city setting.

Looking at the stats, we see why this would come to mind: about 80% of homicide victims are male, about 90% of perpetrators are male. Age-wise, crime is dominated by the young. The FBI data also tells us that firearms are used in about 73% of homicides, that you are much more likely to be killed by someone you know than someone you don’t know, and just based on population density alone we would assume most homicides happen in crowded cities. I didn’t include race in the “modal” case because it’s actually closer to 50/50 black vs white, but both homicide victims and perpetrators are disproportionately black. Now there’s a lot of holes in this data because we don’t always know who killed someone, but I think most people would agree based on the general news that these data match what we assume.

If you listen to true crime though? You won’t find that type of crime represented almost at all. I think nearly everyone who’s ever glanced at the news knows if you go missing, heaven help you if you’re anything other than a young attractive white woman, and true crime’s racism problem has been remarked upon for years. I asked ChatGPT which 20 true crime cases it thought got the most media attention in the US (post-1980), and it’s pretty clear these cases caught fire in part because they are so unlike the “typical” crime:

  1. OJ Simpson Trial (1994-95): famous defendant, white female victim, knife violence
  2. JonBenet Ramsey (1996): white female child victim, wealthy family, beauty pageants, strangled
  3. Menendez Brothers (1989): Wealthy family, kids murdering parents
  4. Jeffrey Dahmer (1991): Male victims, gay sex, cannibalism
  5. Casey Anthony (2008): Attractive female defendant, white female child victim, strangulation
  6. Scott Peterson (2002): White female pregnant victim, knife victim
  7. Central Park Five (1989): white female victim, not murdered
  8. Amanda Knox (2007): white female victim, white female defendant, stabbed
  9. Michael Jackson child abuse trials (1993, 2005): celebrity defendant
  10. OJ Simpson Robbery case (2007): famous defendant, already accused of a crime
  11. BTK Killer (2005): Serial killer
  12. Richard Ramirez “Night Stalker” (1980s): serial killer
  13. Waco Siege (1993): Mass death, cult
  14. Columbine High (1999): large school shooting
  15. Unabomber (1996): Mass death
  16. Tylenol murders (1982): multiple dead, unknown culprit
  17. Jodi Arias (2008): white female defendant
  18. Gabby Petito (2021): white female victim, social media star victim
  19. Michael Skakel/Martha Moxley: Kennedy connection for defendant
  20. Pamela Smart (1990): white female defendant

What’s interesting about this list is that a few things immediately jump out: gun violence is very underrepresented, with only a few cases (Menendez brothers, Pam Smart, Columbine, sort of Jodi Arias) involving a firearm. Outside of the mass deaths/serial killers, almost all of the cases involve someone who was at least middle class or higher. Very few cases involve a solo male victim; it’s mostly solo females or men dying in mixed groups. The exceptions are actually 2 cases where there were accusations of homosexuality (Dahmer, Jackson), and the remaining one is Pam Smart’s husband. There’s also a dearth of black or Hispanic victims on their own; they only appear in groups. In other words, it’s pretty clear the true crime genre does not get hooked on your “average” crimes, it wants the unusual ones. I asked ChatGPT what the modal true crime case was, and it summed it up this way: A White, female (often young) victim, killed in a domestic or intimate context, often by a male partner/family member, in an otherwise “safe” suburban setting. The perpetrator is either wealthy/celebrity or a seemingly ordinary middle-class person hiding darkness. The case includes salacious details, a highly publicized trial, and often an ambiguous or polarizing outcome.

Seems about right.

The problem here is that if you pay attention to true crime, you are getting an inaccurate view of how crimes are committed and who the victims are. I think this not only skews people’s perception of their own risk of being a victim, but also leads to people getting weirdly judgmental of police departments. Once I started poking around at older true crime cases, I found that an incredibly common criticism is of police treating a big unusual case as though it was going to be a normal case. Well, yeah. Even large police departments may not be prepared for a celebrity perpetrator, simply because we don’t have too many celebrities running around killing people. The day the call comes in for a surprisingly big case, no one flags for the police “actually, better send your best guys down there and double the amount of resources available, this case is going to be on Dateline next year”. The police are operating under the assumption they will be responding to the modal crime story; true crimers believe they should have been prepping for the modal true crime story.

I also think it’s very relevant how many of these cases happen in quiet suburbs or small towns. In the case I’ve become familiar with, we’ve had 4 murders here in 40 years, and our county has a murder rate of 1/100,000 a year. That’s on par with the safest countries in the world. The idea that taxpayers were going to pay through the nose to keep our police department in a constant state of readiness for unusual events disregards how most taxpayers actually function. Indeed, there was an audit done on our local police department during all this, and one of its conclusions was “if you want your police trained for unusual events, you’re going to have to increase their training budget so they can go do that”, and people FLIPPED OUT. And these were the people who were most viciously critiquing the police! Even after years of unrest they were still unwilling to increase the training budget, believing (as one person actually publicly put it) that “you can just watch CSI to know what you should do”. Sure, and you can skip medical school if you watch old ER reruns.

And finally, I think a lot of people justify the focus on white wealthy attractive people with a sort of “trickle down justice system” type philosophy. If we can only monitor how the police handle the most vaunted in our society, this will somehow trickle down to help the poor and the marginalized. The problem is, I’ve never seen particular evidence that’s true. How did the myriad of resources poured into JonBenet Ramsey’s case help anyone? I grew up near Pam Smart, and I don’t recall that case making much difference after things settled down. Indeed, I think these cases often give us a false impression of what accused killers and victims “worthy” of sympathy look like. In reality, attractive people are much more likely to get preferential treatment in every part of the justice system. They are arrested less often, convicted less often, and get shorter sentences when they do. It’s hard to get numbers on what percent have a college degree, but even the most generous estimates suggest it’s around 6% of inmates compared to 37% of the general population. The pre-jail income average for prisoners is about $19k a year. Wealthy educated attractive people have very little trouble getting their stories boosted; the people who need help are those not in those groups.

All of this to say, listening constantly to a non-random assortment of cases is not going to give you a good sense of how our justice system works on a day to day basis, any more than only publishing (or pushing) flashy science results gave us an accurate sense of scientific fields. As a pro-tip, when you hear about a case that’s gaining traction, it’s not a bad idea to try to find a couple of similar non-famous cases with victims/perpetrators who aren’t wealthy or attractive to see how those cases were handled. Your concerns may remain, but at least it will give you a baseline to work from that typical true crime reporting lacks.

Mathematical Errors

One interesting issue that has played into a few replication attempts seems almost too silly to mention, but typos and other errors can and do end up influencing papers and their published conclusions. Within the past few days I’ve actually seen this happen at work, when we found out that an abstract had the wrong units for a medication dose we wanted to add to a regimen. The typo was mg vs g, so it would have been a very easy typo to make and a pretty disastrous issue for patients. So at least a few replications might fail due to simple human errors in pulling together the information. For example, an oft-quoted study saying that men frequently leave their wives when they are diagnosed with cancer was quickly retracted when it was found the whole result was a coding error. The error was regrettable, but what’s even worse is I still see the original finding quoted any time the topic comes up. It’s not true. It was never true, and the finding would definitely not replicate. Even the authors admit this, but the rumor doesn’t die.

So how does this relate to true crime? In almost every major case I’ve peeked at, rumors get going about things that did/didn’t happen, and it is very hard to kill them once they’ve started. One good example is actually the Michael Brown/Ferguson case, where it was initially reported he said something like “Hands Up, Don’t Shoot”, a phrase so popular it now has its own Wikipedia page. The problem? It doesn’t exist. When the DOJ looked into the whole thing, the witness who initially said it happened no longer said it did, none of the evidence matched this account, and it’s considered so debunked even the Washington Post ran an article titled “Hands up Don’t Shoot Did not happen in Ferguson“. For the public though? This is considered gospel. I’ve told a few people in the past few years that this didn’t happen, and they look at me like I kicked a puppy. When I’ve pulled up the WaPo headline, Wiki page or DOJ report, they’re still convinced something is wrong. How is it possible something so repeated just…didn’t happen?

I’m not sure, but this is way way way more common in true crime reporting than anyone wants to believe, especially on the internet. Shortly after my local case was resolved, I saw a Reddit thread about it on a non-true crime subreddit where people were naming the evidence that most convinced them of their opinion, and 7 out of the first 10 things listed didn’t occur. And I’m not talking “are disputed” didn’t occur, I’m talking “both the defense and prosecution would look at you like a crazy person if you made these claims in court” stuff. People were publicly proclaiming they’d based a guilt or innocence opinion on stuff they’d never checked out. Since then I play a game in my head every time I talk to someone about the case: I count how many pieces of evidence they mention before they get to one that’s entirely made up. 90% of people don’t get past their third piece of evidence before they quote something made up. That is…not great.

Interestingly when I’ve corrected people, they generally look at me like I’ve missed the point and I’m dwelling on trivialities. To that I have two responses:

  1. If it’s worth your time to lie about it, it’s worth my time to correct you.
  2. If I were being accused of a heinous crime I didn’t do, whether in court or just in public opinion, I would want people to correctly quote the evidence against me. So would you. So would everyone. These are real people’s lives, this is not a TV show plot you’re only half remembering.

It’s totally fine in my mind not to be super familiar with any famous public case btw, but if you’re going to speak on it and declare you have a strong opinion, you may want to make sure all of your foundational facts are true. With the internet providing so much of our information now, it is really easy to mistakenly quote something you saw someone tweet about rather than something you actually saw testified to.

Publish or Perish Culture

When you take up a career in science, publication is key to career advancement. One of the issues this leads to is that papers with large or novel findings are far more likely to be published than those without those qualities. And what’s less interesting than spending tons of time and resources on a study that someone already did, just to say “yeah, seems about right, slightly smaller effect size though”? If there’s no particular reward for trying to replicate studies, people aren’t going to do it. And if people aren’t going to do it, you are not going to spend too much time worrying about whether your own study can be replicated or not. One can easily see how this would lead to an issue where studies replicated less often than they would in a system that rewarded replication efforts.

So with true crime, the pressure is all on people to make interesting and bold claims about a story to catch eyes. The remedy for this has basically always been defamation claims, and if you think replications are slow and time consuming, boy have I got news for you about defamation claims. Netflix got incredibly sloppy with its documentaries and has a stack of lawsuits waiting to get sorted out in court, but progress is glacial.

This problem has been heightened by the influx of small creators who don’t actually have a lot to lose in court. If you work for the NYT and report something wrong, your employer takes the hit. If you have a TikTok account you started in your parents’ basement, you can pretty much say whatever you want, knowing no one’s going to spend the cash to go after you. This is starting to change as people realize they need to send messages to these content creators who make reckless accusations, but change is slow. Even true crime podcast redditors have wondered how some of the hosts get away with saying all the stuff they do and why more people don’t sue. Oh, and now the mainstream media can just report on the social media backlash rather than report on things directly. Covington Catholic helped set some better guidelines for this, but the problem remains that none of this has improved the accuracy of reporting.

Even if the content creators confine themselves to facts, they often aren’t their facts. As with all social media, pumping out weekly content is king, and most people simply do not have time to thoroughly research cases themselves. A surprising number of true crime podcasts have been hit with plagiarism accusations, including one where they were just reading other people’s articles on air without attribution. Given that podcasts often end up licensing their content, this drives a lot of potentially sticky legal issues. So what are the consequences for this? As of right now, almost nothing. The podcast named in the article above just removed the episode, and as of this writing it is the 6th most listened to podcast in the world. Publish or perish, good research be damned.

All right, we have a few more publication issues to cover, but I think I’ve gone on long enough and will save those for next week. Stay safe everyone!

To go straight to part 3, click here.