Welcome to “From the Archives”, where I dig up old posts and see what’s changed in the years since I originally wrote them.
A few years ago, after the now infamous James Damore/Google memo incident, I decided to write a post about one of the most famous “unconscious sexism” studies of all time. Known as the “blinded orchestra auditions” study, it is frequently used to claim that when orchestras started hiding the appearance of the applicant by using a screen, they increased the number of women getting a job. When I started reading the paper, however, I realized the situation was a bit more complicated. Sometimes women were helped by the blinding, sometimes they weren’t. It certainly wasn’t as clear cut as often got reported, and I thought there were some interesting details that got left out of popular retellings. Read my original post if interested.
This post was decently well received when I put it up in 2017, but I was surprised back in May to see it suddenly getting traffic again. Turns out a data scientist from Denmark, Jonatan Pallesen, had written a very thorough post criticizing this study. That post got flipped to Andrew Gelman, who agreed the conclusions of the paper were much murkier than the press seemed to think they were. He also pointed out that these observations weren’t new, and as proof pointed to….my post. That felt good.
After all this, I was interested to see my post spike again this week, and I wondered what happened. A quick jaunt to Twitter showed me that Christina Hoff Sommers had done a YouTube video explainer about this study, raising some of the same objections. She also wrote a Wall Street Journal op-ed on the same topic.
Now obviously I was pretty happy to see that my original concerns had some merit. I had felt a little crazy when I originally wrote my post because I couldn’t figure out how a paper with so many caveats had been portrayed as such definitive proof for the effectiveness of blinding. However, I started to get some concerns that the pushback was overstepping a bit too.
For example, Jesse Singal (who I follow and whose work I generally like) said this:
I questioned this wording on Twitter, as typically when we say a study “fell” we mean that it failed to replicate or that there was evidence of fraud. In this case there was neither. All the evidence we have that these conclusions were not as strong as often repeated comes from the paper itself. I got a reply from Sommers herself:
I think this statement needs to be kept in mind. While the replication crisis has rocked a lot of our understanding of social science studies, it’s a little incredible that so many people cited this study without noticing the very clear limitations that were presented within the paper itself. As Gelman said in his post, “Pallesen’s objections are strongly stated but they’re not new. Indeed, the authors of the original paper were pretty clear about its limitations. The evidence was all in plain sight.”
Additionally, while the authors’ 50% claim in the concluding paragraph seems unwise, it should be noted that this is the paper abstract (bold mine):
A change in the audition procedures of symphony orchestras -- adoption of “blind” auditions with a “screen” to conceal the candidate’s identity from the jury -- provides a test for sex-biased hiring. Using data from actual auditions, in an individual fixed-effects framework, we find that the screen increases the probability a woman will be advanced and hired. Although some of our estimates have large standard errors and there is one persistent effect in the opposite direction, the weight of the evidence suggests that the blind audition procedure fostered impartiality in hiring and increased the proportion women in symphony orchestras.
Journalists and others quoting this study weren’t being limited by a paywall and relying on the abstract, because that stat wasn’t in the abstract. Those stats appear to have been in the press release, and that seems to be what everyone copied them from.
While I totally agree that the study authors could have been more careful, I do think they deserve credit for putting the caveats and limitations in the abstract itself. They didn’t know when that press release was put together that this study would still be quoted as gospel two decades later, and it’s not clear how much control they had over it. They deserve credit for not putting those stats in their abstract, and for making sure some of the limitations were mentioned there instead.
I’m hammering on this because I think it’s worth examining what really went wrong here. I suspect at some point people stopped reading this study entirely, and started just copying and pasting conclusions they saw printed elsewhere. This is a phenomenon I noted back in 2017 and have dubbed The Bullshit Two-Step: A dance in which a story or research with nuanced points and specific parameters is shared via social media. With each share some of the nuance or specificity is eroded, finally resulting in a story that is almost total bullshit but that no one individually feels responsible for.
While I do think the researchers bear some responsibility, it’s worth noting that there’s no clear set of ethics for how researchers should handle seeing their studies misquoted. Misquotes or unnuanced recitations of studies can happen at any time, and researchers may not see them, or might be busy with an illness or something. I do think it would be interesting for someone to propose a set of standards for this….if anyone knows of such a thing, let me know.
For the rest of us, I think the moral of this story is that no matter how often you hear a study quoted, it’s always worth taking a look at the original information. You never know what you could find.