Weather in Minneapolis vs Boston

I’m just getting back from a conference in Minneapolis, which is an interesting city to go to in November. I’m from Boston so cold doesn’t bother me, but it did strike me as interesting how much colder it seemed to be this time of year.

I did a quick Google search and found the climate data for Minneapolis and Boston and decided to do a quick comparison.

The average high temps in both cities are nearly identical (within about 4 degrees) from March to October. In November the average high drops 10 degrees lower in Minneapolis, the gap widens to 12-14 degrees for December and January, narrows back to 10 degrees in February, and then the two cities are back to similar climates for the rest of the year. My guess is that’s some sort of ocean moderating effect.
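For anyone who wants to run the same comparison, the month-by-month gap is a quick script once you have the two cities’ monthly average highs. This is a minimal sketch; the numbers below are rough placeholder values to show the shape of the output, not official climate normals, so swap in whatever data your own search turns up:

```python
# Sketch of the comparison above: subtract one city's monthly average highs
# from the other's. The values are placeholders, not real climate normals.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def monthly_gaps(city_a, city_b):
    """Return {month: city_a average high - city_b average high}."""
    return {m: a - b for m, a, b in zip(MONTHS, city_a, city_b)}

boston = [36, 39, 45, 56, 66, 76, 81, 79, 72, 61, 51, 41]       # placeholder
minneapolis = [24, 29, 41, 58, 69, 79, 83, 80, 72, 58, 41, 27]  # placeholder

for month, gap in monthly_gaps(boston, minneapolis).items():
    print(f"{month}: Boston minus Minneapolis = {gap:+d} degrees")
```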

The precipitation levels were even more interesting:

Note: the temperature axes are different on these graphs, with the Boston one running from 10 to 90 degrees and the Minneapolis one from 0 to 90. Still, you can see that Boston doesn’t get the same level of “dry winter air” that Minneapolis does. I felt that firsthand when I got my first nosebleed in years on day 3 of the trip.

Always interesting to see the side by side.

What I’m Reading: November 2019

My migraines have been in full swing this week, so we’ve got a few lighter ones here. Like the Audubon “What Kind of Owl Are You?” quiz. I’m apparently a spotted owl, but that may just be because a dark forest sounds good right about now.

For those who Tweet, if you ever want to see how many words you’ve racked up over time, this link hooks you up.  It tells you what your Twitter feed would be if it were a book. I’m slacking at “Where the Wild Things Are”. The goal apparently is to beat Proust at 1.5 million words.

The above led me down a rabbit hole of “longest novel” Googling, which got me here. Turns out a lot of long novels end up with controversy over whether they are one or many books. Regardless, I’d only even heard of 3 of these. Interesting.

For a slightly longer read, I thought SSC’s post on the fall of New Atheism was pretty interesting. As someone who blogged for one of the websites involved in all this for a few years, I’d say Scott hits on a lot of interesting things, and he’s right that more people should be asking “what happened here?”. My two cents: I think people involved in the movement were there for two different reasons. One group rejected religion primarily because they believed religion opposed science and reason, the other because they believed religion promoted oppression. When the second group started to accuse the first group of being oppressive, they were upset to find the first group didn’t care as much as they’d assumed. When the first group started hearing about oppression, they got upset because they believed it to be a secondary concern. I think this was a case of finding out the hard way that the enemy of your enemy isn’t always your friend, but if any readers who were involved have other thoughts I’d like to hear them.

On a related note, the AVI wrote me this week to tell me he wants to lend me The Genesis of Science, which chronicles the history of science and the church in the middle ages. It looks interesting.

 

From the Archives: Blinded Orchestra Auditions Update

Welcome to “From the Archives”, where I dig up old posts and see what’s changed in the years since I originally wrote them.

A few years ago, after the now infamous James Damore/Google memo incident, I decided to write a post about one of the most famous “unconscious sexism” studies of all time. Known as the “blinded orchestra auditions” study, it is frequently used to claim that when orchestras started hiding the appearance of applicants behind a screen, they increased the number of women getting hired. When I started reading the paper, however, I realized the situation was a bit more complicated. Sometimes women were helped by the blinding, sometimes they weren’t. It certainly wasn’t as clear cut as often reported, and I thought there were some interesting details that got left out of popular retellings. Read my original post if interested.

This post was decently well received when I put it up in 2017, but I was surprised back in May to see it suddenly getting traffic again. Turns out a data scientist from Denmark, Jonatan Pallesen, had written a very thorough post criticizing this study. That post made its way to Andrew Gelman, who agreed the conclusions of the paper were much murkier than the press seemed to think they were. He also pointed out that these observations weren’t new, and as proof pointed to….my post. That felt good.

After all this, I was interested to see my post spike again this week, and I wondered what happened. A quick jaunt to Twitter showed me that Christina Hoff Sommers had done a YouTube video explainer about this study, raising some of the same objections. She also wrote a Wall Street Journal op-ed on the same topic.

Now obviously I was pretty happy to see that my original concerns had some merit. I had felt a little crazy when I originally wrote my post because I couldn’t figure out how a paper with so many caveats had been portrayed as such definitive proof of the effectiveness of blinding. However, I started to get some concerns that the pushback was overstepping a bit too.

For example, Jesse Singal (who I follow and whose work I generally like) said this:

I questioned this wording on Twitter, as typically when we say a study “fell” we mean it failed to replicate or that the authors were shown to have committed fraud. In this case there was neither: all the evidence we have that these conclusions were not as strong as often repeated comes from the paper itself. My question got a reply from Sommers herself:

I think this statement needs to be kept in mind. While the replication crisis has rocked a lot of our understanding of social science studies, it’s a little incredible that so many people cited this study without noticing the very clear limitations that were presented within the paper itself. As Gelman said in his post, “Pallesen’s objections are strongly stated but they’re not new. Indeed, the authors of the original paper were pretty clear about its limitations. The evidence was all in plain sight.”

Additionally, while the authors’ 50% claim in the concluding paragraph seems unwise, it should be noted that this is the paper’s abstract (bold mine):

A change in the audition procedures of symphony orchestras—adoption of “blind” auditions with a “screen” to conceal the candidate’s identity from the jury—provides a test for sex-biased hiring. Using data from actual auditions, in an individual fixed-effects framework, we find that the screen increases the probability a woman will be advanced and hired. Although some of our estimates have large standard errors and there is one persistent effect in the opposite direction, the weight of the evidence suggests that the blind audition procedure fostered impartiality in hiring and increased the proportion women in symphony orchestras.

Journalists and others quoting this study weren’t hitting a paywall and falling back on the abstract, because that stat wasn’t in the abstract. Those stats appear to have come from the press release, and that seems to be where everyone copied them from.

While I totally agree that the study authors could have been more careful, I do think they deserve credit for keeping those stats out of the abstract and making sure some of the limitations were mentioned there instead. They didn’t know when that press release was put together that this study would still be quoted as gospel two decades later, and it’s not clear how much control they had over it.

I’m hammering on this because I think it’s worth examining what really went wrong here. I suspect at some point people stopped reading this study entirely, and started just copying and pasting conclusions they saw printed elsewhere. This is a phenomenon I noted back in 2017 and have dubbed The Bullshit Two-Step: a dance in which a story or research with nuanced points and specific parameters is shared via social media. With each share some of the nuance or specificity is eroded, finally resulting in a story that is almost total bullshit but that no one individually feels responsible for.

While I do think the researchers bear some responsibility, it’s worth noting that there’s no clear set of ethics for how researchers should handle seeing their studies misquoted. Misquotes or unnuanced recitations of studies can happen at any time, and researchers may not see them, or might be busy with an illness or something. I do think it would be interesting for someone to propose a set of standards for this….if anyone knows of such a thing, let me know.

For the rest of us, I think the moral of this story is that no matter how often you hear a study quoted, it’s always worth taking a look at the original information. You never know what you could find.

Short Takes: Perception vs Reality vs Others

I was going to do a normal “what I’m reading” column this week, but I thought so much about the first two links that I decided to turn it into a short takes instead. I’m seeing a lot of interesting parallels between these two articles, so I wanted to highlight a few things.

The first link was a Medium post called “How to Change a Mind“, an excerpt from an upcoming book called “Stop Being Reasonable: How We Really Change Our Minds“. It tells the story of a woman named Missy, and how she got her husband Dylan to leave a cult. The whole story is worth reading, but the ultimate conclusion is worth pondering. Dylan didn’t leave because she was able to point out some of the ridiculousness in what the cult believed (though she tried), but because one of the leaders ended up delivering a lengthy and objectively unfair critique of Missy, ending with an encouragement to leave her.

This was the proverbial straw that broke the camel’s back. Dylan knew his wife had been nothing but kind and supportive, and the attempt to cast her in a different light caused him to doubt the leaders in a way he never had. As the article says, “Dylan did not need to lose his faith in what his elders were saying; he needed to lose his faith in them.” And lose it he did. He spent two days straight Googling every critique of the group that was out there, then severed his ties. He describes his faith in them as being like a faucet that suddenly got shut off.

The article does a good job of contextualizing this, and pointing out the lessons here for all of us. While most of us have never joined a cult, many of us take the word of others for granted on many topics. We have faith in certain sources, and barring any challenges will continue to believe those things. Maybe the topic is history, chemistry, math or some other subject we are aware of but didn’t study much personally. Even something as simple as another person’s name is mostly taken on faith. The point is, we can’t check every single thing that comes across our path, so we all have shortcuts and rubrics to decide what information we believe and what we don’t. The point of this story is that the “who” part of that rubric can at times be more important than the “what”.

Given that, it was interesting that this next link landed in my inbox this morning: “The Dangers of Fluent Lectures“. The article is based on a study that compared Harvard freshmen who took a physics class with lots of well-polished lectures (passive learning) to those who took a class that made students work through problems on their own before explaining the answers to them (active learning). The results were interesting. Those who sat through the nicely polished lectures believed they learned more, but those who sat through the active classes actually learned more:

There are a couple of theories about why this happens, but I think at least some of it ties back to the first article. Feeling that you are in the presence of someone hyper-competent can give you the impression that you are more competent than you are. Active learning forces students to focus on their own deficiencies, while passive learning lets them ignore those and focus on the professor. As the study authors say, “novice students are poor at judging their actual learning and thus rely on inaccurate metacognitive cues such as fluency of instruction when they attempt to assess their own learning.” Again, it’s not always what you believe, it’s who.

Now there are a couple of caveats with this study: it’s not clear what would have happened if it had been run on 4th year students doing more advanced work, or at a state school rather than Harvard. The authors also mention that the students weren’t given any warning about the teaching methods up front. In a later version of the study, they spent a few minutes in the first lecture teaching students about active learning methods and the evidence that they help students learn more. The students subsequently rated those classes as more effective, and said they felt better about the learning methods.

As always, we continue to be poor judges of our own objectivity.

Rotten Tomatoes and Selection Bias

The AVI sent along a link (from 2013) this week about movies that audiences love and critics hated as judged by their Rotten Tomatoes scores.

For those of you not familiar with Rotten Tomatoes, it’s a site that aggregates movie reviews so you can see overall what percentage of critics liked a movie. After a few years of that, they also allowed users to leave reviews so you can see what percentage of audience members liked a movie. This article pulled out every movie with a critic score and an audience score in their database and figured out which ones were most discordant. The top movies audiences loved/critics hated are here:

The most loved by critics/hated by audiences ones are here:

The article doesn’t offer a lot of commentary on these numbers, but I was struck by how much selection bias goes into them. While movie critics are often (probably fairly) accused of favoring “art pieces” or “movies with a message” over blockbuster entertainment, I think there’s some skewing of audience reviews as well. The two scores differ because critics are basically assigned to everything, and are supposed to write their reviews with the general public in mind. Audience members select movies they are already interested in seeing, and then review them based solely on personal feelings.

For example, my most perfect moviegoing experience ever was seeing “Dude, Where’s My Car?” in the theater. I was in college when it came out, and had just finished some grueling final exams. My brain was toast. A friend suggested we go, and the theater was full of other college students who had also just finished their exams. It was a dumb movie, a complete stoner comedy from the early 2000s. We all laughed uproariously. I have very fond memories of it, and of the movie in general. It was a great movie for a certain moment in my life, but I would probably never recommend it to anyone. It has a 17% critic score on Rotten Tomatoes, and a 47% audience score. This seems very right to me. No one walks into a movie with that title thinking they are about to see something highbrow, and critics were almost certainly not the target audience. Had more of the population been forced to see it as part of their employment, the audience score would almost certainly dip. If only the critics who wanted to see it went, their score would go up.

This is key with lists like this, especially when we’re looking at movies that came out before the site existed. Rotten Tomatoes started in 1998, but a quick look at the top of the audiences loved/critics hated list shows that the 3 most discordant movies all came out prior to that year. So essentially the user scores are all from people who cared enough about the movie to go in and rank it years after the fact.

For the critics loved/users hated movies, the top one came out in 1974. I was confused about the second one (Bad Biology, a sex/horror movie that came out in 2008), but noted that Rotten Tomatoes no longer assigns it a critic score. My suspicion is that the “100%” might have been based on a single review. From there, numbers 3-7 are all pre-1998 films. In the early days of Rotten Tomatoes you could sort movies by critic score, so I suspect some people decided to watch those movies based on the good critic score and got disappointed. Who knows.
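The ranking the article describes boils down to sorting by the gap between the two scores. Here’s a minimal sketch, seeded with the “Dude, Where’s My Car?” numbers quoted above; the other two entries are made-up placeholders, not real Rotten Tomatoes data:

```python
# (title, critic %, audience %) -- the first entry uses the scores quoted
# above; the rest are made-up placeholders, not real Rotten Tomatoes data.
movies = [
    ("Dude, Where's My Car?", 17, 47),
    ("Placeholder Blockbuster", 30, 85),
    ("Placeholder Art Film", 95, 40),
]

def most_discordant(movies, audience_favored=True):
    """Sort by the audience-minus-critic gap, biggest gap first.

    audience_favored=True ranks the "audiences loved / critics hated" list;
    False ranks the reverse.
    """
    sign = 1 if audience_favored else -1
    return sorted(movies, key=lambda m: sign * (m[2] - m[1]), reverse=True)

print(most_discordant(movies)[0][0])                         # biggest audience-over-critic gap
print(most_discordant(movies, audience_favored=False)[0][0]) # biggest critic-over-audience gap
```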

It’s interesting to think about all of this and how websites could improve their counts. Rotten Tomatoes recently had to stop allowing users to rate movies before they came out, as they found too many people were using it to try to tank movies they didn’t like. I wonder if regularly emailing users and asking them to rate (or mark “I haven’t seen this”) 10 random movies might help lower the bias in the audience score. I’m not sure, but as we crowdsource more and more of our rankings, bias prevention efforts may have to get a little more targeted. Interesting to think about.
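The random-survey idea above is simple enough to sketch. This is just my illustration of the idea, not anything Rotten Tomatoes actually does; the catalog here is a stand-in:

```python
import random

def pick_survey_movies(catalog, k=10, seed=None):
    """Pick k movies to email a user about, each equally likely.

    random.sample draws without replacement, so no title repeats and no
    movie gets extra weight from its popularity -- the point of the exercise.
    """
    rng = random.Random(seed)
    return rng.sample(catalog, k)

catalog = [f"Movie {i}" for i in range(1000)]  # stand-in catalog
survey = pick_survey_movies(catalog, k=10, seed=42)
print(survey)
```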

 

5 Things About Appendicitis Rates Over Time

A close relative of mine had a bit of a scare this week when she ended up admitted to the hospital for (what was ultimately diagnosed as) acute appendicitis. She ended up in surgery with a partially ruptured appendix, though she’s doing fine now.

When I mentioned this saga to a coworker, she said she felt like she didn’t hear much about appendicitis anymore. We started wondering what the rates were, and if they were going down over time. Of course this meant I had to take a look, so here’s what I found:

  1. The rates have fallen over the decades, and no one is really sure why. This paper suggests that rates fell by 15% between 1970 and the mid-80s. Did appendicitis become less common? Less deadly? Or did our diagnostic tools get better and some number of cases get reclassified? This is a valid question because of this next point….
  2. A surprisingly high number of appendectomies aren’t necessary. An interesting study from 2011 showed that about 12% of patients who get an appendectomy end up not getting diagnosed with appendicitis. They suggest that this rate has been falling over time, which could have helped the numbers in point #1. Is it the whole story? It’s not clear! But definitely something to keep in mind.
  3. The number of incorrectly removed appendixes may not be going down. Contrary to the assertions of the study above, it’s not certain that misdiagnosed appendicitis is going down. Despite better diagnostics, it appears that easier surgical techniques (i.e. laparoscopic surgeries) may actually have increased the rate of unnecessary surgeries. This sort of makes sense. If you have to do a big complicated surgery, you are going to really want to verify that it’s necessary before you go in. As the surgery gets easier, you may focus more on getting people to surgery more quickly.
  4. The data sources may not be great. One of the more interesting papers I found compared the administrative database (based off insurance coding) vs a pathology database and found that insurance coding consistently underestimated the number of cases of appendicitis. Since most studies have been done off of insurance code databases, it’s not clear how this has skewed our view of appendicitis rates.
  5. Other countries seem to be seeing a drop too. Whatever’s going on with appendicitis diagnoses, the whole world seems to be seeing a similar trend. Greece has seen a 75% decrease. England has also seen falling rates. To be fair though, some data shows it’s mixed: developed countries seem to be stabilizing, while newly developed countries seem to see high rates.

So who knew how hard it was to get a handle on appendicitis rates? I certainly thought it would be a little more straightforward. Always fascinating to explore the limits of data.

What’s My Age Again?

One of my favorite weird genres of news story occurs when the journalist/editor/newsroom all forget how old they are in relation to the people they are writing about. This phenomenon is what often gives rise to articles about millennials that don’t actually quote millennials, or articles about millennial parents of small children that compare them to Boomer parents of teenage children. I also see this in the working world, where there are still seminars about “how to manage millennials”, even though the oldest millennials are nearing 40 (and age discrimination laws!) and new college grads are most likely “Gen Z”.

Anyway, given my love for this genre of story, I got a kick out of a Megan McArdle Tweet this week that pointed out a Mother Jones article that fell a bit into this trap.

She was pointing to this article that explained how Juul (an e-cigarette manufacturer) had been marketing to teens for several years. As proof, they cited this:

Now for many millennials, this makes perfect sense. How could you screen three teen movies like “Can’t Hardly Wait”, “Scream” and “Cruel Intentions” and say you were marketing to adults? Well, that depends on your perspective. Can’t Hardly Wait came out in 1998, Scream in 1996 and Cruel Intentions in 1999. Current 14-18 year olds were born between 2001 and 2005. Does a party featuring movies made 5 years before you were born sound like it is trying to attract current teens? Or is it more likely that it would draw those who were teens at the time they were released….i.e. those now in their 30s?

As a quick experiment, subtract 5 from your birth year, Google “movies from ______”, take out the actual classics/Oscar winners and see how many of those movies you would have gone to an event to see at age 16. I just did it for myself, and I’d have gone to see Rocky (though that’s an actual classic) and that’s pretty much it. I enjoyed The Omen, but not until later in college, ditto for Murder by Death and Network. Thinking back to my teen years, I did attend an event where Jaws was screened at a pool party, but I suspect the appeal of Jaws is more widespread and durable than that of “Can’t Hardly Wait”.
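The cohort arithmetic is quick to sanity check. A minimal sketch using the release years and the 2001-2005 birth range quoted above:

```python
# Release years for the three movies cited in the Mother Jones piece.
RELEASES = {"Can't Hardly Wait": 1998, "Scream": 1996, "Cruel Intentions": 1999}

# Current (2019) 14-18 year olds were born between 2001 and 2005.
for title, year in RELEASES.items():
    oldest_gap = 2001 - year    # years before today's 18-year-olds were born
    youngest_gap = 2005 - year  # years before today's 14-year-olds were born
    print(f"{title} came out {oldest_gap}-{youngest_gap} years before current teens were born")

# Versus the cohort who were teens on release: someone who was 13 when
# Cruel Intentions came out would be 33 in 2019.
print(2019 - (1999 - 13))
```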

To be clear, I have very little insight into Juul’s marketing plan or anything about them other than what I’ve seen on the news. What I do know, though, is that some movies appeal to broad audiences, and some appeal to a very narrow band of people who saw them at the right age. Teen movies in particular tend not to appeal endlessly to teens, but rather to keep appealing to the cohort who originally saw them.

There is an odd phenomenon with some movies where they do poorly at the box office, then pick up steam on DVD or cable broadcasts. The movie Hocus Pocus (1993) is a good example. It was a flop at the box office, but was rebroadcast on ABC Family and the Disney Channel and then landed on a kids’ “13 Nights of Halloween” special in the early 2000s. This has caused the very odd phenomenon of kids who weren’t born when it was released remembering it as a movie of their childhood more than those in the “right” cohort would.

So basically I think it can be a bit of a challenge to triangulate which pop culture appeals to which age groups, particularly once you are out of that age group. Not that I’m judging. I struggled enough to figure out what was cool with teens when I actually was one. I have no idea how I’d figure it out now.