The True Crime Replication Crisis Part 4: Questionable Research Practices

Welcome back folks! This week we’re going to be diving into questionable research practices and their impact on the replication crisis.

There are a few specific questionable research practices we should cover here, but I want to start with the umbrella concept of “researcher degrees of freedom”. This concept basically means that any time you want to run an experiment, there are a bunch of different ways to do things. Two people locked in separate rooms and asked to design the same experiment will almost certainly come up with two different approaches, just because there is often no one right way to do anything. This is fine. What’s less fine is when people start making choices that we know have a high tendency to produce false positive results. Some of these include data dredging, selective reporting of outcomes, HARKing (hypothesizing after results are known), and PARKing (pre-registering after results are known). So what are these things and how do they impact research? Let’s get into it!

Ok, before we talk about the questionable research practices themselves, I want to take a step back and remind everyone that at its core, science is largely about trying to deal with the problem of coincidences. We as humans have a tendency to see patterns that may or may not actually exist, and most scientific methods were developed with the idea that you need to separate coincidences from true causal effects. For example, if a lady claims she can taste a difference in tea based on when the milk was added, you can design a randomized experiment to actually test her prowess. This helps us separate a few lucky guesses from a real effect. Now to reiterate, we almost exclusively have to use this for claims that look plausible. If a woman said she could identify how her tea was made and then promptly misidentified it, we’d all move on. It’s mostly when something starts to look plausible that we have to really dig down into the scientific method to figure out what’s going on.
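As a quick illustration of why the randomized design matters: in Fisher’s classic version of this experiment, the lady gets 8 cups, 4 with the milk added first, and has to pick out which 4. A few lines of Python (my own sketch, not anything from Fisher) show how unlikely it is to ace that by pure guessing:

```python
from math import comb

# Fisher's lady tasting tea: 8 cups, 4 milk-first, and the taster
# must identify which 4. If she is purely guessing, every way of
# choosing 4 cups out of 8 is equally likely.
total_ways = comb(8, 4)          # 70 possible guesses
p_all_correct = 1 / total_ways   # only 1 of them is exactly right

print(f"Ways to choose 4 of 8 cups: {total_ways}")
print(f"P(perfect score by luck):  {p_all_correct:.3f}")  # ~0.014
```

A perfect score by luck alone happens less than 2% of the time, which is exactly why a correct answer under these conditions tells you something a single lucky guess never could.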

Enter data dredging. Researchers often get large data sets all at once, and may start by running something called exploratory data analysis, just to see if any interesting correlations emerge. This is fine! The problem is that it wasn’t always well communicated that this is what was being done. I’ve actually written about this before when covering some of Brian Wansink’s ongoing scandals and comparing it to my approach for my master’s thesis. If you are looking at hundreds of data points, the chance of uncovering a coincidence goes way up, and there are actually statistical methods created just to correct for this problem. Data dredging often leads to spurious correlations, which are fine and relatively easy to deal with if you admit this might be a problem and that you’re going to have to investigate more to figure out if the correlation is real or not. The problem is that even very smart researchers can trick themselves into thinking that a coincidental finding is more meaningful than it actually is. Andrew Gelman has done excellent work explaining this in his paper “The garden of forking paths: Why multiple comparisons can be a problem, even when there is no ‘fishing expedition’ or ‘p-hacking’ and the research hypothesis was posited ahead of time”. In it, he points out that no one has to do this on purpose:

In a recent article, we spoke of fishing expeditions, with a willingness to look hard for patterns and report any comparisons that happen to be statistically significant (Gelman, 2013a). But we are starting to feel that the term fishing was unfortunate, in that it invokes an image of a researcher trying out comparison after comparison, throwing the line into the lake repeatedly until a fish is snagged. We have no reason to think that researchers regularly do that. We think the real story is that researchers can perform a reasonable analysis given their assumptions and their data, but had the data turned out differently, they could have done other analyses that were just as reasonable in those circumstances.

The paper goes on to explain several examples, but the basic issue is that when you are looking hard for something, you will unintentionally start expanding your definitions until you are very, very likely to find something that is coincidental but that you believe fits your initial hypothesis. This can quickly lead to some of the other issues I mentioned earlier: failing to report all endpoints (because you only report the meaningful ones, leaving people in the dark about the fact that you actually tested other things), HARKing (basically stating after the fact that you “knew it all along”), and pre-registering after the fact (PARKing), where you take it a step further and state that this is what you were always looking for. Gelman has great examples, but XKCD’s famous “Significant” jelly bean comic perhaps put it best.
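You can see the jelly bean effect directly in simulation. Here’s a minimal sketch (the numbers and variable names are mine, purely for illustration): run 20 significance tests on pure noise and watch roughly one of them come up “significant” at p < 0.05, then note how a standard multiple-comparisons fix like the Bonferroni correction handles it:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_tests, alpha = 20, 0.05

# 20 comparisons of two groups drawn from the SAME distribution,
# so any "significant" result is a guaranteed false positive.
p_values = [
    ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
    for _ in range(n_tests)
]

hits = sum(p < alpha for p in p_values)
bonferroni_hits = sum(p < alpha / n_tests for p in p_values)

print(f"'Significant' results on pure noise: {hits}")             # ~1 expected
print(f"After Bonferroni correction:         {bonferroni_hits}")  # usually 0
```

With 20 tests at a 5% threshold, you expect about one false positive on average even when there is literally nothing to find, which is the whole joke of the comic.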

So how does this apply to true crime? Oh you sweet summer child, let me tell you the ways.

Remember a paragraph or two ago when I said “the basic issue is that when you are looking hard for something, you will unintentionally start expanding your definitions until you are very, very likely to find something that is coincidental but that you believe fits your initial hypothesis”? This is so rampant in true crime you wouldn’t even believe it, and there’s literally no statistical test waiting in the wings to help sort things out.

For years, people have been aware that police can do this. Once they fixate on a suspect, everything that person does can look suspicious. Did the suspect get extremely upset when they heard their wife was dead? Suspicious, probably an act. Did they fail to get upset? Also suspicious! I would argue a large part of our justice system is set up specifically to guard against this issue, and it’s why we have a standard of “beyond a reasonable doubt”. People are fallible; our justice system is imperfect, but it at least acknowledges that this problem exists. As we’ve covered, though, the disparate and competitive true crime ecosystem takes very little time for self-reflection and often breathlessly reports coincidences with no particular acknowledgement that sometimes a coincidence is just a coincidence.

This gets complicated by the fact that (as we covered earlier in this series) true crime type cases happen disproportionately in suburbs and smaller towns vs large cities. I have actually wondered if part of the reason is that it’s easier to have “coincidences” in places with smaller, more stable populations. If a police officer gets called to a random murder in the projects, it’s very unlikely they will have a connection to the victim or suspect. A police officer in a town of 25,000 though? You’ve got a much higher chance of knowing someone who knows someone. Now if three officers respond to a place where four people live? It’s almost guaranteed someone has a connection. Suspicious!!!! You may think I’m exaggerating, but the case in my town involved DOZENS of things like this. It’s actually what first got me thinking about the connection with the replication crisis. I have had so many people say things like “you REALLY believe that it’s a coincidence that <one of the ten first responders> had a connection with <a relative of one of the four people there>?” Yes, actually, I do. It would be pretty bizarre if there were literally no connections. I can’t run errands on a Saturday morning without running into someone I know, and you think a bunch of random people from the same town would have absolutely no connections to each other?

The problem here is that people badly underestimate the probability of a coincidence. One of my favorite examples of this is “the birthday problem”, a classic stats problem that asks what the chances are that two people in a randomly selected group share a birthday. Most people are surprised to find out that you only need 23 people before the chances are 50/50 that you’ll get a match, and by 30 people you have a 70% chance of a match. The issue is that people forget you are not trying to find one particular birthday match, but any birthday match. This wildly increases the chance you’ll find something. For my problem above, let’s say you have 6 first responders. Each of them has some combination of parents, siblings, in-laws, children, neighbors and other “suspiciously close” people in their lives; let’s give them 15-20 each. Now the 5 people at the scene, who each have a similar number of people, are compared to this. So in the end we have 100+ people compared against 60-80 people, all living in the same general area, and we are looking for ANY connection. What are the chances we find one? I’d say quite high! But in true crime land, stuff like this gets stated like a stunning development.
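If you want to check the birthday numbers yourself, the exact calculation is only a few lines (a quick sketch of the standard formula, nothing fancy):

```python
def p_shared_birthday(n: int) -> float:
    """Probability that at least two of n people share a birthday
    (365 equally likely birthdays, ignoring leap years)."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (365 - i) / 365
    return 1 - p_all_distinct

print(f"23 people: {p_shared_birthday(23):.1%}")  # ~50.7%
print(f"30 people: {p_shared_birthday(30):.1%}")  # ~70.6%
```

The trick is computing the chance that every birthday is different and subtracting from 1; each added person has more chances to collide with the people already in the room.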

Ok now, I’m going to take a step back and be fair for a second: if you’re trying to figure out who committed a crime, coincidences may be something you have to look at. I’ve often been annoyed when people start yelling “correlation does not imply causation”, because of course it does. That’s the whole problem. Two items that are correlated may not be causally linked, but they are much more likely to be than two uncorrelated items. But just like with all coincidences, you have to test it against other things to figure out if it’s a coincidence or if you’re really on to something. Otherwise you’re just data dredging, desperately looking for any connection to jump off of.

And overinterpretation of data around crimes can cause a LOT of problems. One of the saddest parts of In Cold Blood (the Truman Capote classic that kicked off the modern true crime genre) comes after the killers were caught, six weeks after they murdered a family in a quiet town: Capote mentions that many locals had trouble accepting the killers’ guilt (even though they ended up confessing) because various coincidences had convinced people that others were involved. Those types of suspicions can stick with you for YEARS, with people vaguely feeling you must have done something.

Ok, so you may be thinking I’m just calling out random individuals here, and that surely no big time groups report on coincidences like they are meaningful? Au contraire! Recently I saw a Charles Manson documentary that makes the charge that Manson was actually a CIA sleeper agent, based largely on the fact that a CIA operative was working near Manson for years. The guy pushing the theory goes into great detail about this agent, and how many places were linked to both him and Manson. I was already feeling skeptical when, about three quarters of the way through the documentary, the main guy drops “it’s the perfect theory, I just can’t put them in the same room at the same time”. Wait, what? Record scratch. So everything we’ve been talking about just establishes that these two men lived in the same city for some period of time, but you can’t prove they ever met? That’s sort of a key piece of evidence! And this documentary was based on a whole book that was a New York Times bestseller and is considered an “Editor’s pick” on Amazon. I don’t see how you can consider any of this anything more than data dredging: looking for some coincidence, any coincidence, you can use to prop up your theory.

And don’t even get me started on only reporting certain outcomes; this is endemic. One of the weirder examples I’ve come across is one of the most viewed Netflix true crime documentaries of all time, The Keepers. This documentary looks into the murder of a young nun in 1969, and connects it with sex abuse allegations a woman (then a student) came forward with against a priest at the same school. The heroes are the now-adult women who went to the school at the time, and it holds a 97% positive rating on Rotten Tomatoes. Surely it can’t be a coincidence that a young nun turned up dead at a school where girls were being molested, can it? Well, it may not be that simple. The problem? The viewers are never told about the background of the primary accuser, which I got suspicious about when I heard she’d been offered a very low settlement. I went looking and discovered that the primary accuser admits that the memories of her abuse were all “recovered memories” she had no idea about until decades later, shortly after she started visiting a new therapist. She actually initially accused her uncle and dozens of strangers, then after a decade of accusations moved on to accusing dozens of school employees. All her initial accusations were documented pretty early on, as she filed a 1992 lawsuit that ended up enumerating all of them, including her own admission that many of her reported memories were verifiably false and (in her words) “bull crap”.

The documents go on to point out that even once she settled on the school, she first accused a different priest, then after finding out he was deceased, recanted and moved on to a new priest. And literally none of this is in the documentary. I don’t know what happened here, and it sounds like the priest may have been creepy to some people, but it does feel relevant to know that someone accused dozens of other people before they got to the person in question. There is another accuser, but interestingly the documentary does make clear she didn’t come forward until the first woman sent out a mass mailing to all her classmates. I’d really encourage you to read the whole article if you want a sense of how badly a smash hit true crime documentary can shape a narrative through omission.

So where does this leave us? Well, much like with correlated data points, I don’t think it’s wrong to point out coincidences, as long as they are properly contextualized. But a few things to keep in mind:

  1. Watch how big you’re making your data set. As mentioned, the more people you look at, the more likely you are to find odd coincidences. Expanding your timeline to everything dozens of people have ever done, or to include all of their family members/friends/neighbors/acquaintances, wildly expands your data pool. The bigger the data pool, the less meaningful every individual coincidence becomes (I’ve put a quick sketch of this math after this list).
  2. Don’t discard data that doesn’t fit the narrative. It’s interesting to watch some coincidence finders totally discard certain coincidences with “well of course that person behaved strangely, they were in shock” while jumping all over other coincidences. I understand the temptation, but it’s always good to admit when your own side has holes.
  3. Be aware other people might have already discarded data before you got there. Most documentaries and podcast series have limited space and are going to discard some pieces of information, and some of that information could have been important. My new suggestion for everyone is that if you must watch a true crime documentary, google the name + “criticism” before you watch it. Once someone gives you a slick narrative, you are much less likely to care about information that could have shaped your conclusions had you known it beforehand. At least you’ll know when they’re breezing by something that could have been important.
  4. Compare coincidences to your own life/baseline probability. Some coincidences are more likely than others. For example, if you hear that someone bought a knife the day before someone got stabbed, but that someone else did laundry the day after, those are two weird coincidences, right? Well, yes! But also no. Just in my own life, I am several thousand times more likely to do laundry than I am to buy a knife, and I’d imagine most people are. We can certainly look at both people, but remember how frequently you yourself do the “suspicious” action. This also applies to relationships between people. I once had someone ask me how I felt about all the “close relationships” in our local case, and I pointed out the one they were talking about was somebody’s brother’s wife’s sister’s friend. Knowing a bit about their family, I asked them how close they felt to all their brother’s wife’s sister’s friends, and they admitted that it actually did feel rather more distant when they mapped it out using their own brother/sister-in-law/sister of sister-in-law’s friend. Again, this is fine! But if film makers can make a benign scene feel ominous with scary music, true crimers can make somewhat distant relationships feel close by describing them ominously.
  5. Understand the modern environment for information finding. One thing that is very new since the time of In Cold Blood, or even JonBenet Ramsey, is social media. Nowadays if a new crime occurs, it is trivially easy to go online and find who people might be connected to through Facebook. Suddenly, the guy who ran the trivia night you used to go to 20 years ago can be connected to you just as easily as your actual best friends from college or actual family members, making coincidences even easier to find. Additionally, there are now enormous groups of online sleuths desperate to track down leads, and they scour the internet finding smaller and smaller discrepancies. I recently saw someone claim they believed JonBenet’s mother was complicit in her murder because of weird phrasing she used in an interview 10+ years after the fact. At that point your data set has become absolutely massive and you may need to take a break.
  6. Prioritize coincidences backed by other evidence. You’d think this would be obvious, but a coincidence that precisely fits the theory of the crime as backed by physical evidence is a lot more meaningful than “this person did something weird elsewhere”.
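To put rough numbers on point #1 (and the first responder example from earlier), here’s a quick back-of-the-envelope sketch. The per-pair connection probability is a made-up assumption on my part (1 in 1,000 for two random residents of the same area), but it shows how fast “ANY connection” becomes near-certain as the pool grows:

```python
def p_any_connection(group_a: int, group_b: int, p_pair: float) -> float:
    """Probability that at least one cross-group pair is 'connected',
    treating each of the group_a * group_b pairs as independent."""
    n_pairs = group_a * group_b
    return 1 - (1 - p_pair) ** n_pairs

# Assumed odds that two random same-town residents have some
# connection: 1 in 1,000 (purely illustrative).
print(f"{p_any_connection(100, 70, 0.001):.1%}")  # ~99.9%
```

With 100 people on one side and 70 on the other, you’re checking 7,000 pairs, so even a one-in-a-thousand connection rate makes finding at least one connection a near-certainty. It’s the birthday problem all over again.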

That’s all I can think of for now! So I’ll close with a quote from Frederick Mosteller: “It is easy to lie with statistics, but easier without them.” If scientists attempting to adhere to good statistical methods can make these mistakes, those not even trying to watch their work are several times more likely to fall into these errors.

To go straight to part 5, click here.
