Weather in Minneapolis vs Boston

I’m just getting back from a conference in Minneapolis, which is an interesting city to go to in November. I’m from Boston so cold doesn’t bother me, but it did strike me as interesting how much colder it seemed to be this time of year.

I did a quick Google search and found the climate data for Minneapolis and Boston and decided to do a quick comparison.

The average high temps in the two cities are nearly identical (+/-4 degrees) from March to October. In November the average high drops 10 degrees lower in Minneapolis, then the gap widens to 12-14 degrees for Dec-Jan, then narrows back to a 10 degree gap for February before the climates converge again for the rest of the year. My guess is that’s some sort of ocean moderating effect.

The precipitation levels were even more interesting:

Note: the temperature axes are different on these graphs, with the Boston one starting at 10 degrees and going to 90, and Minneapolis going 0-90. Still, you see that Boston doesn’t get the same level of “dry winter air” that Minneapolis does. I felt that when I got my first nosebleed in years on day 3 there.

Always interesting to see the side by side.

What I’m Reading: November 2019

My migraines have been in full swing this week, so we’ve got a few lighter ones here. Like the Audubon “What Kind of Owl Are You?” quiz. I’m apparently a spotted owl, but that may just be because a dark woods sounds good right about now.

For those who Tweet, if you ever want to see how many words you’ve racked up over time, this link hooks you up.  It tells you what your Twitter feed would be if it were a book. I’m slacking at “Where the Wild Things Are”. The goal apparently is to beat Proust at 1.5 million words.

The above led me down a rabbit hole of “longest novel” Googling, which got me here. Turns out a lot of long novels end up with controversy over whether they are one or many books. Regardless, I’d only even heard of 3 of these. Interesting.

For a slightly longer read, I thought SSC’s post on the fall of New Atheism was pretty interesting. As someone who blogged for one of the websites involved in all this for a few years, I’d say Scott hits on a lot of interesting things, and he’s right that more people should be asking “what happened here?”. My two cents: I think people involved in the movement were there for two different reasons. One group rejected religion primarily because they believed religion opposed science and reason, the other because they believed religion promoted oppression. When the second group started to accuse the first group of being oppressive, they were upset to find the first group didn’t care as much as they’d assumed. When the first group started hearing about oppression, they got upset because they believed it to be a secondary concern. I think this was a case of finding out the hard way that the enemy of your enemy isn’t always your friend, but if any readers who were involved have other thoughts I’d like to hear them.

On a related note, the AVI wrote me this week to tell me he wants to lend me The Genesis of Science, which chronicles the history of science and the church in the middle ages. It looks interesting.


From the Archives: Blinded Orchestra Auditions Update

Welcome to “From the Archives”, where I dig up old posts and see what’s changed in the years since I originally wrote them.

A few years ago, after the now infamous James Damore/Google memo incident, I decided to write a post about one of the most famous “unconscious sexism” studies of all time. Known as the “blinded orchestra auditions” study, it is frequently used to claim that when orchestras started hiding the appearance of the applicant by using a screen, they increased the number of women getting a job. When I started reading the paper, however, I realized the situation was a bit more complicated. Sometimes women were helped by the blinding, sometimes they weren’t. It certainly wasn’t as clear cut as often reported, and I thought there were some interesting details that got left out of popular retellings. Read my original post if interested.

This post was decently well received when I put it up in 2017, but I was surprised back in May to see it suddenly getting traffic again. Turns out a data scientist from Denmark, Jonatan Pallesen, had written a very thorough post criticizing this study. That post made its way to Andrew Gelman, who agreed the conclusions of the paper were much murkier than the press seemed to think they were. He also pointed out that these observations weren’t new, and as proof pointed to….my post. That felt good.

After all this, I was interested to see my post spike again this week, and I wondered what happened. A quick jaunt to Twitter showed me that Christina Hoff Sommers had done a YouTube video explainer about this study, raising some of the same objections. She also wrote a Wall Street Journal op-ed on the same topic.

Now obviously I was pretty happy to see that my original concerns had some merit. I had felt a little crazy when I originally wrote my post because I couldn’t figure out how a paper with so many caveats had been portrayed as such definitive proof for the effectiveness of blinding. However, I started to worry that the pushback was overstepping a bit too.

For example, Jesse Singal (who I follow and whose work I generally like) said this:

I questioned this on Twitter, as typically when we say a study “fell” we mean it failed to replicate or that the authors had evidence of fraud. In this case there was neither. All the evidence we have that these conclusions were not as strong as often repeated comes from the paper itself. My question got a reply from Sommers herself:

I think this statement needs to be kept in mind. While the replication crisis has rocked a lot of our understanding of social science studies, it’s a little incredible that so many people cited this study without noticing the very clear limitations that were presented within the paper itself. As Gelman said in his post, “Pallesen’s objections are strongly stated but they’re not new. Indeed, the authors of the original paper were pretty clear about its limitations. The evidence was all in plain sight.”

Additionally, while the authors’ 50% claim in the concluding paragraph seems unwise, it should be noted that this is the paper abstract (bold mine):

A change in the audition procedures of symphony orchestras, the adoption of “blind” auditions with a “screen” to conceal the candidate’s identity from the jury, provides a test for sex-biased hiring. Using data from actual auditions, in an individual fixed-effects framework, we find that the screen increases the probability a woman will be advanced and hired. Although some of our estimates have large standard errors and there is one persistent effect in the opposite direction, the weight of the evidence suggests that the blind audition procedure fostered impartiality in hiring and increased the proportion women in symphony orchestras.

Journalists and others quoting the 50% stat weren’t being limited by a paywall and relying on the abstract, because that stat wasn’t in the abstract. It appears to have been in the press release, and that seems to be what everyone copied it from.

While I totally agree that the study authors could have been more careful, I do think they deserve credit for keeping those stats out of the abstract and for making sure some of the limitations were mentioned there instead. They didn’t know when that press release was put together that this study would still be quoted as gospel two decades later, and it’s not clear how much control they had over it.

I’m hammering on this because I think it’s worth examining what really went wrong here. I suspect at some point people stopped reading this study entirely, and started just copying and pasting conclusions they saw printed elsewhere. This is a phenomenon I noted back in 2017 and have dubbed The Bullshit Two-Step: a dance in which a story or research with nuanced points and specific parameters is shared via social media. With each share some of the nuance or specificity is eroded, finally resulting in a story that is almost total bullshit but that no one individually feels responsible for.

While I do think the researchers bear some responsibility, it’s worth noting that there’s no clear set of ethics for how researchers should handle seeing their studies misquoted. Misquotes or unnuanced recitations of studies can happen at any time, and researchers may not see them, or might be busy with an illness or something. I do think it would be interesting for someone to propose a set of standards for this….if anyone knows of such a thing, let me know.

For the rest of us, I think the moral of this story is that no matter how often you hear a study quoted, it’s always worth taking a look at the original information. You never know what you could find.

Short Takes: Perception vs Reality vs Others

I was going to do a normal “what I’m reading” column this week, but I thought so much about the first two links I just decided to turn it into a Short Takes. I’m seeing a lot of interesting parallels between these two articles, so I wanted to highlight a few things.

The first link was a Medium post called “How to Change a Mind“, an excerpt from an upcoming book called “Stop Being Reasonable: How We Really Change Our Minds“. It tells the story of a woman named Missy, and how she got her husband Dylan to leave a cult. The whole story is worth reading, but the ultimate conclusion is worth pondering. Dylan didn’t leave because she was able to point out some of the ridiculousness in what the cult believed (though she tried), but because one of the leaders ended up offering a sweeping and objectively unfair critique of Missy, ending with an encouragement to leave her.

This was the proverbial straw that broke the camel’s back. Dylan knew his wife had been nothing but kind and supportive, and the attempt to cast her in a different light caused him to doubt the leaders in a way he never had. As the article says, “Dylan did not need to lose his faith in what his elders were saying; he needed to lose his faith in them.” And lose it he did. He spent two days straight Googling every critique of the group that was out there, then severed his ties. He describes his faith in them as being like a faucet that suddenly got shut off.

The article does a good job of contextualizing this, and pointing out the lessons here for all of us. While most of us have never joined a cult, many of us take the word of others for granted on many topics. We have faith in certain sources, and barring any challenges will continue to believe those things. Maybe the topic is history, chemistry, math or some other subject we are aware of but didn’t study much personally. Even something as simple as another person’s name is mostly taken on faith. The point is, we can’t check every single thing that comes across our path, so we all have shortcuts and rubrics to decide what information we believe and what we don’t. The point of this story is that the “who” part of that rubric can at times be more important than the “what”.

Given that, it was interesting that this next link landed in my inbox this morning: “The Dangers of Fluent Lectures“. The article is based on a study that compared Harvard freshmen who took a physics class with lots of well polished lectures (passive learning) and those who took a class that made students work through problems on their own before explaining the answers to them (active learning). The results were interesting. Those who sat through the nicely polished lecture believed they learned more, but those who sat through the active lecture actually learned more:

There are a couple of theories about why this happens, but I think at least some of it has to do with the first article. Feeling that you are in the presence of someone hyper-competent could end up giving you the impression that you are more competent than you are. The active learning forces students to focus on their own deficiencies, while the passive learning lets them ignore that and focus on the professor. As the study authors say, “novice students are poor at judging their actual learning and thus rely on inaccurate metacognitive cues such as fluency of instruction when they attempt to assess their own learning.” Again, it’s not always what you believe, it’s who.

Now there are a couple of caveats with this study: it’s not clear what would have happened if they had tried it on 4th year students who were doing more advanced work, or if they had tried it at a state school rather than Harvard. They also mentioned that the kids in the study weren’t given any warning about the teaching methods up front. In a later version of the study, they spent a few minutes in the first lecture teaching kids about active learning methods and the evidence that they help students learn more. The students subsequently rated those classes as more effective, and said they felt better about the learning methods.

As always, we continue to be poor judges of our own objectivity.

Rotten Tomatoes and Selection Bias

The AVI sent along a link (from 2013) this week about movies that audiences love and critics hated as judged by their Rotten Tomatoes scores.

For those of you not familiar with Rotten Tomatoes, it’s a site that aggregates movie reviews so you can see overall what percentage of critics liked a movie. After a few years of that, they also allowed users to leave reviews so you can see what percentage of audience members liked a movie. This article pulled out every movie with a critic score and an audience score in their database and figured out which ones were most discordant. The top movies audiences loved/critics hated are here:

The most loved by critics/hated by audiences ones are here:

The article doesn’t offer a lot of commentary on these numbers, but I was struck by how much selection bias goes into them. While movie critics are often (probably fairly) accused of favoring “art pieces” or “movies with a message” over blockbuster entertainment, I think there’s some skewing of audience reviews as well. Critic and audience scores are interesting because critics are basically assigned to everything, and are supposed to write their reviews with the general public in mind. Audience members select movies they are already interested in seeing, and then review them based solely on personal feelings.
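For what it’s worth, the “most discordant” ranking the article built is just a sort on the gap between the two scores. Here’s a minimal sketch of that idea; the titles and numbers are made up for illustration (the real article used Rotten Tomatoes’ full database):

```python
# Hypothetical (title, critic %, audience %) rows; all scores invented.
movies = [
    ("Big Dumb Blockbuster", 25, 80),
    ("Acclaimed Art Piece", 95, 35),
    ("Middle of the Road", 60, 55),
]

# Audiences loved / critics hated: largest audience-minus-critic gap first.
audience_favorites = sorted(movies, key=lambda m: m[2] - m[1], reverse=True)

# Critics loved / audiences hated: largest critic-minus-audience gap first.
critic_favorites = sorted(movies, key=lambda m: m[1] - m[2], reverse=True)

print(audience_favorites[0][0])  # the most audience-favored movie
print(critic_favorites[0][0])    # the most critic-favored movie
```

Nothing in a sort like this corrects for who chose to show up and rate, which is the selection bias problem above.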

For example, my most perfect moviegoing experience ever was seeing “Dude, Where’s my Car?” in the theater. I was in college when it came out, and had just finished some grueling final exams. My brain was toast. A friend suggested we go, and the theater was full of other college students who had also just finished their exams. It was a dumb movie, a complete stoner comedy from the early 2000s. We all laughed uproariously. I have very fond memories of this, and the movie in general. It was a great movie for a certain moment in my life, but I would probably never recommend it to anyone. It has a 17% critic score on Rotten Tomatoes, and a 47% audience score. This seems very right to me. No one walks into a movie with that title thinking they are about to see something highbrow, and critics were almost certainly not the target audience. Had more of the population been forced to go to that movie as part of their employment, the audience score would almost certainly dip. If only the critics who wanted to see it went, their score would go up.

This is key with lists like this, especially when we’re looking at movies that came out before the site existed. Rotten Tomatoes started in 1998, but a quick look at the top 20 audiences loved/critics hated list shows that the top 3 most discordant movies all came out prior to that year. So essentially the user scores are all from people who cared enough about the movie to go in and rank it years after the fact.

For the critics loved/users hated movies, the top one came out in 1974. I was confused about the second one (Bad Biology, a sex/horror movie that came out in 2008), but noted that Rotten Tomatoes no longer assigns it a critic score. My suspicion is that “100%” might have been one review. From there, numbers 3-7 are all pre-1998 films. In the early days of Rotten Tomatoes you could sort movies by critic score, so I suspect some people decided to watch those movies based on the good critic score and got disappointed. Who knows.

It’s interesting to think about all of this and how websites can improve their counts. Rotten Tomatoes recently had to stop allowing users to rate movies before they came out, as they found too many people were using it to try to tank movies they didn’t like. I wonder if sending emails to users asking them to rank (or say “I haven’t seen this”) 10 random movies on a regular basis might help lower the bias in the audience score. I’m not sure, but as we crowdsource more and more of our rankings, bias prevention efforts may have to get a little more targeted. Interesting to think about.


5 Things About Appendicitis Rates Over Time

A close relative of mine had a bit of a scare this week when she ended up admitted to the hospital for (what was ultimately diagnosed as) acute appendicitis. She ended up in surgery with a partially ruptured appendix, though she’s doing fine now.

When I mentioned this saga to a coworker, she said she felt like she didn’t hear much about appendicitis anymore. We started wondering what the rates were, and if they were going down over time. Of course this meant I had to take a look, so here’s what I found:

  1. The rates have fallen over the decades, and no one is really sure why. This paper suggests that rates fell by 15% between 1970 and the mid-80s. Did appendicitis become less common? Less deadly? Or did our diagnostic tools get better and some number of cases get reclassified? This is a valid question because of this next point….
  2. A surprisingly high number of appendectomies aren’t necessary. An interesting study from 2011 showed that about 12% of patients who get an appendectomy end up not getting diagnosed with appendicitis. They suggest that this rate has been falling over time, which could have helped the numbers in point #1. Is it the whole story? It’s not clear! But definitely something to keep in mind.
  3. The number of incorrectly removed appendixes may not be going down. Contrary to the assertions of the study above, it’s not certain that misdiagnosed appendicitis is going down. Despite better diagnostics, it appears that easier surgical techniques (i.e. laparoscopic surgeries) actually may have increased the rate of unnecessary surgeries. This sort of makes sense. If you have to do a big complicated surgery, you are going to really want to verify that it’s necessary before you go in. As the surgery gets easier, you may focus more on getting people to surgery more quickly.
  4. The data sources may not be great. One of the more interesting papers I found compared an administrative database (based on insurance coding) with a pathology database, and found that insurance coding consistently underestimated the number of cases of appendicitis. Since most studies have been done off of insurance code databases, it’s not clear how this has skewed our view of appendicitis rates.
  5. Other countries seem to be seeing a drop too. Whatever’s going on with appendicitis diagnosis, the whole world seems to be seeing a similar trend. Greece has seen a 75% decrease. England has also seen falling rates. To be fair though, some data shows it’s mixed: developed countries seem to be stabilizing, while newly developed countries seem to see high rates.

So who knew how hard it was to get a handle on appendicitis rates? I certainly thought it would be a little more straightforward. Always fascinating to explore the limits of data.

What’s My Age Again?

One of my favorite weird genres of news story occurs when the journalist/editor/newsroom all forget how old they are in relation to the people they are writing about. This phenomenon is what often gives rise to articles about millennials that don’t actually quote millennials, or articles about millennial parents of small children that compare them to Boomer parents of teenage children. I also see this in the working world, where there are still seminars about “how to manage millennials”, even though the oldest millennials are nearing 40 (and age discrimination laws!) and new college grads are most likely “Gen Z”.

Anyway, given my love for this genre of story, I got a kick out of a Megan McArdle Tweet this week that pointed out a Mother Jones article that fell a bit into this trap.

She was pointing to this article that explained how Juul (an e-cigarette manufacturer) had been marketing to teens for several years. As proof, they cited this:

Now for many millennials, this makes perfect sense. How could you screen three teen movies like “Can’t Hardly Wait”, “SCREAM” and “Cruel Intentions” and say you were marketing to adults? Well, that depends on your perspective. Can’t Hardly Wait came out in 1998, SCREAM in 1996 and Cruel Intentions in 1999. Current 14-18 year olds were born between 2001 and 2005. Does a party featuring movies made 5 years before you were born sound like it is trying to attract current teens? Or is it more likely that it would draw those who were teens at the time they were released….i.e. those in their early 30s?

As a quick experiment, subtract 5 years from your birth year, Google “movies from ______”, take out the actual classics/Oscar winners, and see how many of those movies you would have gone to an event to see at age 16. I just did it for myself and I’d have gone to see Rocky (though that’s an actual classic) and that’s pretty much it. I enjoyed The Omen, but not until later in college, ditto for Murder by Death and Network. Thinking back to my teen years, I did attend an event where Jaws was screened at a pool party, but I suspect the appeal of Jaws is more widespread/durable than “Can’t Hardly Wait”.

To be clear, I have very little insight into Juul’s marketing plan or anything about them other than what I’ve seen on the news. What I do know, though, is that some movies appeal to broad audiences, and some appeal to a very narrow band of people who saw them at the right age. Teen movies in particular do not tend to appeal endlessly to teens, but rather continue to appeal to the cohort who originally saw them.

There is an odd phenomenon with some movies where they do poorly at the box office, then pick up steam on DVD or cable broadcasts. The movie Hocus Pocus (1993) is a good example. It was a flop at the box office, but was rebroadcast on ABC Family and the Disney Channel and then landed on a kids’ “13 Nights of Halloween” special in the early 2000s. This has caused the very odd situation of kids who weren’t born when it was released remembering it as a movie of their childhood more than those in the “right” cohort would.

So basically I think it can be a bit of a challenge to triangulate what pop culture appeals to what age groups, particularly once you are out of that age group. Not that I’m judging. I struggled enough to figure out what was cool with teens when I actually was one. I have no idea how I’d figure it out now.


Diagnoses: Common and Uncommon

There was an interesting article in the Washington Post this week, about a man with a truly bizarre disorder. Among many other terrible symptoms, he essentially never has to go to the bathroom while he’s standing up and going about his day and appears to be dehydrated no matter how much he drinks, but the minute he lies down at night he has to urinate copiously and shows signs of being overhydrated. He has so many bizarre symptoms that he ended up in something called the Undiagnosed Disease Program, a fascinating group run by the NIH that seeks to find diagnoses for people who have baffled other physicians. They conduct all sorts of testing and try to either find people a diagnosis or to add their information to a database in the hopes that eventually they’ll get some information that will help them figure this out. The overall goal is to both help people and add to our collective knowledge about the human body.

Outlier medical cases are truly fascinating to many people, myself included. The WaPo column is actually part of a series called “medical mysteries“. Oliver Sacks made a whole career out of writing books about them. These cases make it into our textbooks in school, and they are the stories that stick in our minds. These aren’t even one in a million cases, they are often one in 10 or 100 million. The guy in the WaPo story might even be 1 in a billion or 10 billion.

I am also fascinated by these stories in part because last year I started in on a medical mystery of my own. It started innocuously enough: random bouts of nausea, random bouts of extreme fatigue, then noticeable increased sensitivity to smells, tastes and pain. I assumed I was pregnant. I wasn’t.

I followed up with my doctor who confirmed that my hormone and other blood levels were fine. She ran tests to see if I was being poisoned, if I had a weird vitamin deficiency or had ODed on something accidentally. She referred me to a couple of other doctors. The bouts came and went, but they actually started to get very disconcerting. My increased sense of smell meant that my car would frequently smell strongly of gas…something most of us take to mean there’s a problem. I couldn’t wear certain clothes because it felt like the seams or zippers were cutting my skin, but my skin showed no signs of redness. I couldn’t drink my coffee some mornings because I was convinced it was scalding my mouth. When I ate food I was convinced I could still taste the wrapper. Sensory information is supposed to help us make our way through the world, and to have it suddenly shifting around on you is incredibly disorienting.

Over the course of 6 months I saw 7 different doctors, all of whom were baffled. Since I work at a hospital I informally talked to half a dozen other NPs/PAs/MDs, and none of them had any idea either. The nausea and fatigue could come with hundreds of disorders, but nervous system hypersensitivity is a much less common symptom.

In the course of all this, the Assistant Village Idiot made a comment about how I should remember that strange symptoms were more likely to be an uncommon presentation of a common thing than an uncommon thing. The most experienced doctors I saw also mentioned the limitations of diagnosis. We build diagnoses based on the most common presentations of things, but we often don’t know if there are other possible presentations. We give names to clusters of symptoms because we see them together often, but it’s possible the biological underpinnings of a disorder show up in other ways we don’t see as often. One doctor mentioned that in 6 months or a year I might add more symptoms that made things much clearer.

After about 6 months I still had no answers, but got some relief when I discovered that a magnesium supplement I’d taken to help me sleep seemed to help my symptoms. My doctor told me I could increase the dose and take it daily, and over the course of 6 weeks it mostly worked. I had relief, even if I still had no answers.

That was in January, and for the last 8 months I’ve seen small flares of symptoms that magnesium seemed to help. Then, about a month ago a new symptom started that made the whole thing much clearer: I got a headache. A one sided, splitting “gotta go lay down in a dark room” headache. A week or two later I got another one, then another one. I had always gotten a handful of migraines a year, but with the sudden change in frequency I started to notice something. For two days before, I would be extra sensitive to light, pain, and smell. Sound too. Then during the migraine I would be incredibly nauseous, then the day after I would be so fatigued I could barely get out of bed. I looked back at the journals of mystery symptoms I’d started keeping last year and realized they fit the same pattern. The symptoms that seemed so mysterious were actually part of the very classic migraine prodrome/aura/postdrome pattern. It was then that I learned about the existence of acephalgic or “silent” migraines…..migraines that occur with all of the symptoms except the classic headache. My doctor confirmed my suspicions. I had been having chronic migraines without the headache, which had now developed into chronic migraines with the headache. Once the headache appeared, my case was textbook. I got prescriptions for Imitrex and Fioricet along with a prophylactic medication.

Now per the Wiki page (and everything else I’ve read), acephalgic migraines are uncommon. It’s not particularly normal to get them as badly as I did without regular migraines, though they admit the data may be flawed. Since most people wouldn’t identify those symptoms as migraines, they might have an underreporting problem. Regardless, the AVI’s point stood: this was an uncommon presentation of a common thing, not an uncommon disorder.

I like this story both because I am relieved to have a diagnosis and because it is an interesting example of the entire concept of base rates. Migraines are the third most common disease in the world, after tension-type headaches and dental caries (cavities). One out of every 7 people gets them. If we assume that my symptoms are highly unusual for migraine sufferers….say 1% of cases….that still means about 15 out of 10,000 people will get them. For comparison, schizophrenia is 1.5 out of 10,000. Epilepsy is 120 out of 10,000, or about 10% the rate of migraine sufferers. A small percentage of a big number is often still a big number. An uncommon presentation of a common disorder can often be more common than uncommon disorders.
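The arithmetic behind that claim is worth making explicit. Note the 1% figure is purely my assumption for how rare my presentation might be:

```python
# Back-of-the-envelope base rate math: a small slice of a common
# disorder can still outnumber a rare disorder.
migraine_prevalence = 1 / 7   # roughly 1 in 7 people get migraines
unusual_share = 0.01          # assumed: my presentation is 1% of cases

per_10k = migraine_prevalence * unusual_share * 10_000
print(round(per_10k, 1))  # 14.3, i.e. about 15 people per 10,000
```

Even at a 1% slice, that 14-15 per 10,000 comfortably beats a genuinely rare disorder sitting at 1.5 per 10,000.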

See, everything’s a stats lesson if you look hard enough. While I’m relieved to have a diagnosis, the downside of this is that the more frequent headaches are impacting my ability to sit in front of a screen as often, which may impact blogging. While we figure out what works to reduce the frequency of these, I may end up doing some more archives posting, maybe a top 100 post countdown like the AVI has been doing. We’ll see. While my doctor is great, any good resources are appreciated!

From the Archives: Birthday Math

Three years ago on my birthday, I put up a post of 5 fun math related birthday things. One of these was the “Cheryl’s Birthday” math problem which had gone viral the year prior. Here it is:

I was thinking about this recently, and found out it now had its own Wikipedia page.

The Wiki informed me that there had been a follow up problem released by the same university:

Albert and Bernard now want to know how old Cheryl is.
Cheryl: I have two younger brothers. The product of all our ages (i.e. my age and the ages of my two brothers) is 144, assuming that we use whole numbers for our ages. 
Albert: We still don’t know your age. What other hints can you give us? 
Cheryl: The sum of all our ages is the bus number of this bus that we are on. 
Bernard: Of course we know the bus number, but we still don’t know your age. 
Cheryl: Oh, I forgot to tell you that my brothers have the same age. 
Albert and Bernard: Oh, now we know your age.

So what is Cheryl’s age?

It’s a fun problem if you have a few minutes. I thought it was easier than the first one, but it still requires actually sitting down and doing a few steps to get to the answer. Very hard to shortcut this one. It also retains the charm of the original problem of making you flip your thinking around a bit to think about what you don’t know and why you don’t know it.
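If you’d rather make a computer do the stepping, the logic brute-forces nicely. A sketch (running it prints the answer, so hold off if you want to solve it yourself):

```python
from collections import Counter

# Candidate triples (cheryl, brother1, brother2): the product of the
# ages is 144 and Cheryl is strictly older than both younger brothers.
triples = []
for b1 in range(1, 145):
    for b2 in range(b1, 145):
        if 144 % (b1 * b2) == 0:
            cheryl = 144 // (b1 * b2)
            if cheryl > b2:
                triples.append((cheryl, b1, b2))

# Bernard knows the sum (the bus number) but still can't tell Cheryl's
# age, so the sum must be shared by more than one candidate triple.
sum_counts = Counter(sum(t) for t in triples)
ambiguous = [t for t in triples if sum_counts[sum(t)] > 1]

# "My brothers have the same age" settles it: exactly one triple remains.
solutions = [t for t in ambiguous if t[1] == t[2]]
print(solutions)
```

The nice part is that the code mirrors the flipped thinking: Bernard’s “we still don’t know” is itself a clue, encoded as the ambiguity filter.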

The answer’s at the bottom of the Wiki page if you’re curious.
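The clues also translate directly into a brute-force search, if you’d rather let a computer do the flipping. This is just a sketch of one way to encode them (not the Wiki’s solution method), and it keeps the answer in the program’s output so it isn’t spoiled on the page:

```python
# Brute-force the follow-up puzzle: Cheryl is strictly older than both
# brothers, and the three whole-number ages multiply to 144.
triples = [(c, b1, b2)
           for c in range(2, 145)
           for b1 in range(1, c)
           for b2 in range(1, b1 + 1)   # order the brothers to avoid duplicates
           if c * b1 * b2 == 144]

# Bernard knows the bus number (the sum of the ages) but still can't tell
# Cheryl's age, so that sum must allow more than one possible age for Cheryl.
def possible_ages(s):
    return {c for (c, b1, b2) in triples if c + b1 + b2 == s}

ambiguous = [t for t in triples if len(possible_ages(sum(t))) > 1]

# Final clue: the two brothers are the same age.
solutions = [t for t in ambiguous if t[1] == t[2]]
print(solutions)   # a single (Cheryl, brother, brother) triple
```

Run it and you get exactly one surviving triple, which is a nice sanity check that the puzzle’s clues are neither contradictory nor loose.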

There’s More to that Story: 4 Psych 101 Case Studies

Well, it’s back-to-school time, folks, and for many high schoolers and college students that means “Intro to Psych” is on the docket. While every teacher teaches it a little differently, there are a few famous studies that pop up in almost every textbook. For years these studies were taken at face value; however, with the onset of the replication crisis, many have gotten a second look and have been found to be a bit more complicated than originally thought. I haven’t been in a psych classroom for quite a few years, so I’m hopeful the teaching of these has changed, but just in case it hasn’t, here’s a post with the extra details my textbooks left out.

Kitty Genovese and the bystander effect: Back in my undergrad days, I learned all about Kitty Genovese, murdered in NYC while 37 people watched and did nothing. Her murder helped coin the term “bystander effect,” where large groups of people do nothing because they assume someone else will act. It also helped prompt the creation of “911,” the emergency number we can all call to report anything suspicious.

So what’s the problem? Well, the number 37 was made up by the reporter and likely not even close to true. The New York Times had published the original article reporting on the crime, and in 2016 called their own reporting “flawed”. A documentary made in 2015 by Kitty’s brother investigated what happened, and while there are no clear answers, what is clear is that a murder that occurred at 3:20am probably didn’t have 37 witnesses who saw anything, or even understood what they were hearing.

Zimbardo/Stanford Prison Experiment: The Zimbardo (or Stanford) Prison Experiment is a famous experiment in which participants were asked to act as prisoners or guards in a multi-day recreation of a prison environment. However, things quickly got out of control: the guards got so cruel and the prisoners so rowdy that the whole thing had to be shut down early. This supposedly showed the tendency of good people to immediately conform to expectations when put in bad circumstances.

So what’s the problem? Well, basically the researcher coached a lot of the bad behavior. Seriously, there’s audio of him doing it. This directly contradicts his own statements later that there were no instructions given. Reporter Ben Blum went back and interviewed some of the participants who said they were acting how they thought the researchers wanted them to act. One guy said he freaked out because he wanted to get back to studying for his GREs and thought the meltdown would make them let him go early. Can bad circumstances and power imbalances lead people to act in disturbing ways? Absolutely, but this experiment does not provide the straightforward proof it’s often credited with.

The Robbers Cave Study: A group of boys are camping in the wilderness and are divided into two teams. They end up fighting each other based on nothing other than assigned team, but then come back together when facing a shared threat. This shows how tribalism works, and how we can overcome it through common enemies.

So what’s the problem? The famous, most-reported-on version was actually take two of the experiment. In the first version the researchers couldn’t get the boys to turn on each other, so they ran a second try, eliminating everything they thought had added group cohesion the first time, and finally got the boys to behave as they wanted. There’s a whole book written about it, and it showcases some rather disturbing behavior on the part of the head researcher, Muzafer Sherif. He was never clear with the parents about what type of experiment the boys were subjected to, and he both destroyed personal belongings himself (to blame it on the other team) and egged the boys on in their destruction. When Gina Perry wrote her book, she found that many of the boys who participated (now in their 70s) were still unsettled by the experiment. Not great.

Milgram’s electric shock experiment: A study participant is brought into a room and asked to administer electric shocks to a person they can’t see who is supposedly participating in another experiment. When the hidden person gets a question “wrong,” the participant is supposed to zap them to help them learn, and when they do, a recording plays of someone screaming in pain. It was found that 65% of people would administer a fatal shock as long as the researcher kept encouraging them to do so. This shows that our obedience to authority can override our own ethics.

So what’s the problem? Well, this one’s a little complicated. The original study was actually 1 of 19 studies conducted, all with varying rates of compliance, and the most often reported findings were from the version that produced the highest compliance. A more recent study also reanalyzed participants’ behavior in light of their (self-reported) belief about whether the subject was actually in pain. One of the things the researchers told people to get them to continue was that the shocks were not dangerous, and it also appears many participants didn’t think what they were participating in was real (and it wasn’t). The reanalysis found that those who either believed the researchers’ assurances or expressed skepticism about the entire experiment were far more likely to administer higher voltages than those who believed the experiment was legit. To note though, there have been replication attempts that did find compliance rates comparable to Milgram’s, though the shock voltage has always been lower due to ethics concerns.

So overall, what can we learn from this? Well, first and foremost, that once study results hit psych textbooks, the errors can be really hard to correct. Even if kids today aren’t learning these things, many of us who took psych classes before the more recent scrutiny of these studies may keep repeating them.

Second, I think that we actually can conclude something rather dark about human nature, even if it’s not what we first thought. The initial conclusion of these studies is always something along the lines of “good people have evil lurking just under the surface”, when in reality the researchers had to try a few times to get it right. And yet this also shows us something….a person dedicated to producing a particular outcome can eventually get it if they get enough tries. One suspects that many evil acts were carried out after the instigators had been trying to inflame tensions for months or years, slowly learning what worked and what didn’t. In other words, random bad circumstances don’t produce human evil, but dedicated people probably can produce it if they try long enough. Depressing.

Alright, any studies you remember from Psych 101 that I missed?