State Level Excess Mortality Updates – Oct 6th, 2021

I can’t believe another month has gone by, but here we are! I am back to update state level excess mortality data from the CDC website, pulled on 06OCT21. See previous posts for more details about this data.

First up though, here’s an interesting gif someone made that shows the spread of COVID cases over time by region. Definitely shows some interesting seasonality, and also some interesting data anomalies.

Excess Mortality – How bad has it been?

As I’ve talked to a few people about state level data over the past few months, one of the things I’ve noticed is that some people’s perceptions of the pandemic do not match what happened in their own state. I started wondering if this has anything to do with when the peak excess mortality was, and how long each state spent at high levels of excess mortality. Using the same CDC data I’ve been using, I decided to pull the number of weeks each state had a mortality rate >50% above its average. The data goes back to 2017, so we can see that this phenomenon only happened three times between January of 2017 and March 28th, 2020: once in Puerto Rico in September 2017 (Hurricane Maria), and twice in Wyoming (October 2018 and January 2020). I’m not totally clear what happened in Wyoming those weeks.
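For anyone who wants to reproduce this kind of count, here is a minimal sketch of the idea. The row layout (state, observed deaths, expected deaths) and the sample numbers are made up for illustration; they are not the CDC file's actual column names or values.

```python
from collections import Counter

def weeks_above_threshold(rows, threshold=0.5):
    """Count, per state, the weeks where observed deaths exceed the
    expected (average) count by more than `threshold` (0.5 means 50%)."""
    counts = Counter()
    for state, observed, expected in rows:
        if expected > 0 and (observed - expected) / expected > threshold:
            counts[state] += 1
    return dict(counts)

# Made-up weekly rows: (state, observed deaths, expected deaths)
sample_rows = [
    ("State A", 320, 200),    # 60% above average: counted
    ("State A", 290, 200),    # 45% above average: not counted
    ("State A", 410, 200),    # 105% above average: counted
    ("State B", 1300, 1150),  # about 13% above average: not counted
]

print(weeks_above_threshold(sample_rows))       # weeks at >50% excess
print(weeks_above_threshold(sample_rows, 1.0))  # weeks at >100% excess
```

Raising `threshold` to 1.0 gives the "double the usual deaths" version of the same count.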

So this happened 3 times in a little over 3 years. How often has it occurred since the end of March 2020? A total of 363 times in 45 states. The only 5 states that haven’t reached that level since the pandemic began are Alaska, Hawaii, Maine, New Hampshire and Oregon. The US as a whole spent 6 weeks in that range, with 25 states exceeding the national average. Here are those states, and how many weeks they spent at that level (so far):

State | # of weeks at >50% excess mortality
Texas | 19
Mississippi | 17
DC | 16
Alabama, Arizona | 14
Nevada, North Dakota | 13
Oklahoma | 12
Georgia, Louisiana, Montana, South Dakota, Tennessee | 10
Arkansas, California, Florida | 9
Indiana, New Mexico, New York City (city only) | 8
Iowa, Michigan, New Jersey, Pennsylvania, New York (excluding city) | 7

Just a note on NYC vs NY: only one of those weeks wasn’t overlapping. If we raise the bar and look at only states that have at least one week where they had DOUBLE the number of deaths they usually do, we find only 9 states have hit that bar:

State | # of weeks at >100% excess mortality
New York City (city only) | 7
New Jersey, South Dakota | 5
California, Connecticut, Massachusetts | 3
Florida, New York (excluding city), North Dakota | 2
DC | 1

Another note on NYC vs NY: the 2 weeks for NY are also in the 7 week stretch for NYC. Not clear why the CDC reports these separately.

Excess Mortality Over Average Updates

First up, here’s the whole US. It’s worth noting that when I did this graph a month ago, the lowest value was 554 excess deaths/million. Now it’s 739 excess deaths/million. The brightest red a month ago was 4107/million, now it’s 4624/million. The greens and the reds mean more than before:

So who were the top movers this month? Let’s see:

State | Excess Deaths Above Average/Million 2/1/20-10/6/21 (change from 9/8) | Change from 9/8 rank
Mississippi | 4624 (+516) | No change
Alabama | 4000 (+559) | +1
Louisiana | 3801 (+534) | +3
Arkansas | 3783 (+379) | No change
DC | 3749 (+97) | -3
Arizona | 3597 (+251) | -1
South Carolina | 3453 (+326) | No change
Tennessee | 3381 (+326) | +3
Florida | 3365 (+538) | +3
New York | 3177 (+91) | -2

Note: the NY data here is all of NY, state and city combined. Seems incredible that New York may actually fall out of the top 10 for excess mortality since the pandemic started. To note: there were 4 states that saw substantial gains but are not yet at top 10 level. These were: Georgia (+471, 14th place), Oklahoma (+442, 13th place), Puerto Rico (+407, 37th place) and Kentucky (+390, 17th place).

Excess Mortality Over Upper Bound by State

Okay, here are the states that most exceeded 2 standard deviations from the mean mortality:

And now the top 10:
State | Excess Deaths Over Upper Bound (change from 9/8) | Change from 9/8 rank
Mississippi | 3302 (+443) | No change
Alabama | 3004 (+502) | +1
Arizona | 2659 (+210) | +1
Florida | 2647 (+499) | +5
New York | 2646 (+56) | -3
Arkansas | 2582 (+324) | No change
Louisiana | 2574 (+449) | +3
Texas | 2549 (+345) | -1
South Carolina | 2471 (+280) | -1
New Jersey | 2452 (+52) | -5

To note, there are again 4 states who had a top 10 gain in excess mortality, but didn’t make the overall top 10. These are: Tennessee (+483, 11th place), Georgia (+420, 12th place), Oklahoma (+380, 14th place), Kentucky (+327, 21st place).

As always, let me know if there are any questions and I’ll be back in a few weeks! Given seasonality, I’m going to try to keep this up monthly. I’d also ideally like to see if some states start to regress at all. There is a lot of commentary that COVID mostly killed people who were going to die anyway. If that’s true, at some point some states’ excess mortality should start to fall below the norm, but so far that is not what we are seeing. The only decreases I’m seeing are slight ones for Connecticut, Rhode Island and Minnesota, and those are small and could be adjustments.

State Level Excess Mortality Updates – Sept 8, 2021

More Explanation and Some Links

Well hello again folks! When last we left off about 4 weeks ago, I had updated the state level mortality data provided by the CDC for 2/1/2020 – 8/11/2021. Today I’m updating through 9/8/2021, about 4 extra weeks. All the caveats from my prior post still apply, so go there for any more explanation.

First though, I wanted to clarify some things from my prior post. I find excess mortality data interesting because every state counts COVID deaths differently. There are varying reasons for this, some more valid than others. There are also lots of theories about what the non-COVID excess deaths are. I like looking at state level data because it forces us to think more critically about what those deaths might be and to avoid making sweeping generalizations. In the national press, only the biggest 4 states (California, Texas, Florida and New York) seem to get any air time. Other states may occasionally be cherry picked if something interesting is going on, but otherwise they are mostly ignored.

There is some good work going on with excess mortality, both in trying to estimate it and trying to track it. First up, some good analysis of the 2020 death data, including racial breakdown. While the early phase of the pandemic (when it hit NYC hard) was very skewed towards black and Hispanic deaths, it appears things got far more even as we got towards the winter. For example, here’s the excess death incidence rate for those > 65 years old. Bars are quarters of the year 2020:

Next up is an interesting link (explanation here, site here) to someone trying to catalogue excess mortality in real time, with the concerning hypothesis that we may be seeing an uptick in other kinds of deaths too. Now there are two competing hypotheses here: people could have put off getting treated for other medical conditions due to the pandemic, or people could be more susceptible to other medical conditions after having COVID. Actually, those aren’t competing. It could be both. We know that with the flu there is a well established link between getting the flu and subsequently having a heart attack, and there’s no reason COVID-19 couldn’t act similarly. We also know that in many places hospitals are full and it makes sense people may put off care. We will know more as the data comes in I’m sure, but it’s unfortunate. On that happy note, on to the next updates!

Excess Mortality Over Average by State

I made a more multi colored graph this time:

Now here are the updates for the top 10:
State | Excess deaths above average/million 2/1/20 – 9/8/21 (change from 8/11) | Change from 8/11 rank
Mississippi | 4108 (+473) | No change
District of Columbia | 3652 (+111) | No change
Alabama | 3441 (+320) | +1 spot
Arkansas | 3404 (+392) | +2 spots
Arizona | 3346 (+208) | -2 spots
Louisiana | 3267 (+166) | -1 spot
South Carolina | 3127 (+238) | +1 spot
New York | 3086 (+76) | -1 spot
New Jersey | 2894 (+33) | No change
Nevada | 2842 (+281) | +3 spots

Mississippi’s struggling here guys.

Excess Mortality Over Upper Bound by State

Okay, here are the updated numbers for only the deaths falling outside the upper bound:

And here are the top 10:
State | Excess deaths over upper bound 2/1/20-9/8/21 (change from 8/11) | Change from 8/11 rank
Mississippi | 2859 (+379) | +1 spot
New York | 2590 (+39) | -1 spot
Alabama | 2502 (+229) | +2 spots
Arizona | 2449 (+155) | No change
New Jersey | 2400 (+5) | -2 spots
Arkansas | 2258 (+333) | +4 spots
Texas | 2204 (+181) | -1 spot
South Carolina | 2191 (+190) | No change
Florida | 2148 (+454) | +8 spots
Louisiana | 2125 (+107) | -3 spots

There was more motion on this ranking than I expected to see, which is sad because it means there are multiple places where we are seeing truly unusual death tolls.

States of Interest

Since everyone’s always interested in the Big 4, here they are. Change from 8/11 in parentheses:

State | Excess Deaths Over Upper Bound/Million | Excess Deaths Over Average/Million | Rank in Excess Deaths Over Average | Rank in Excess Deaths Over Upper Bound
New York | 2590 (+39) | 3086 (+76) | 8 | 2
Florida | 2148 (+454) | 2827 (+488) | 12 | 9
Texas | 2204 (+181) | 2723 (+216) | 15 | 7
California | 1761 (+50) | 2285 (+92) | 30 | 20

And because I’m always interested in my state and those of similar size, here they are:

State | Excess Deaths Over Upper Bound/Million | Excess Deaths Over Average/Million | Rank in Excess Deaths Over Average | Rank in Excess Deaths Over Upper Bound
Arizona | 2449 (+155) | 3346 (+208) | 5 | 4
Massachusetts | 1240 (-9) | 1642 (+25) | 40 | 34
Tennessee | 1904 (+117) | 2660 (+193) | 13 | 11

Age Adjustments?

So on my last post Kyle Watson made an interesting point that there should be some sort of age adjustment if we were going to compare things on the state level. While some of this is sort of inherent to the entire concept of excess mortality (states with older populations likely have more expected deaths in a given year), we would expect a disease like COVID to hit states with older populations harder even if everything else was equal.

Interestingly there was some work done on this by a group using the raw COVID numbers, which also looked at international data. They found that states like Texas actually had a worse pandemic than previously reported due to their young population:

I had to ponder a bit what the fairest way of doing this was though. It turns out the CDC also publishes the numbers by week by age group, so I took a look at the US as a whole from 2015-2021:

So every age group from 45 years on showed a fairly noticeable bump last year. Actually every age group showed an increase in mortality when compared to the previous years, and it wasn’t entirely the groups I expected. It’s hard to see on the graph, but here’s the increase for each age group over the average from 2015-2019:

Age Group | % Increase Over 2015-2019 Average
Under 25 years | 5%
25-44 years | 33%
45-64 years | 20%
65-74 years | 30%
75-84 years | 27%
85 years and older | 19%

I was surprised so many of these increases were so close together; it was just the starting numbers that were different. Please note that the bin sizes are different, however. There are twice as many ages contained in the 45-64 year old group as in the 65-74 group, which is how you get a similar number of deaths in the younger age category.
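The percentages above are simple increases over a baseline average. As a rough sketch of that arithmetic (the death counts below are invented round numbers for one age group, not CDC figures):

```python
def pct_increase_over_baseline(baseline_years, current):
    """Percent increase of `current` over the mean of `baseline_years`."""
    baseline = sum(baseline_years) / len(baseline_years)
    return (current - baseline) / baseline * 100

# Invented annual deaths for one age group, 2015 through 2019, then 2020
baseline_deaths = [100_000, 102_000, 101_000, 103_000, 104_000]
deaths_2020 = 132_600

increase = pct_increase_over_baseline(baseline_deaths, deaths_2020)
print(f"{increase:.0f}% over the 2015-2019 average")
```

Because the calculation is relative to each group's own baseline, groups with very different starting death counts can still land on similar percentages.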

It’s also interesting to note that while the data for this year is obviously still highly incomplete and anything could happen, there’s a chance the 85+ group may not show a large jump for 2021. Almost certainly not as large as last year.

Back to age adjustments though: I couldn’t find a great source to give me state by state age breakdowns matching the ones above, but I did find a breakdown of how many people in each state are over 65. I assumed that excess mortality followed roughly the same pattern as the overall mortality numbers, and adjusted from there. Here are the new leaders for excess deaths above average:

State | Age-Adjusted (albeit crudely) Excess Mortality above Average 2/1/20-9/8/21
DC | 4330
Mississippi | 4100
Louisiana | 3328
Alabama | 3303
Arkansas | 3253
Texas | 3183
Arizona | 3136
New York | 3009
South Carolina | 2915
New Jersey | 2864

The map overall shows there’s a pretty substantial dropoff between Mississippi, DC and everywhere else:

Now here’s the big 4:

State | Adjusted Rank for Excess Mortality Over Average (previous rank) | Adjusted Rank for Excess Mortality Over Upper Bound (previous rank)
New York | 8 (8) | 3 (2)
Florida | 24 (12) | 15 (9)
Texas | 6 (15) | 2 (7)
California | 22 (30) | 13 (20)

Interestingly, the states most helped/hurt by this adjustment aren’t necessarily the ones you’d think of. For deaths above the upper bound, 4 states added on more than 150 deaths/million and 4 states lost more than that after the adjustment. The ones that gained deaths were: Texas, DC, Georgia and Utah. The ones that lost the most post-adjustment were Arizona, Florida, Pennsylvania and West Virginia. As mentioned, only Texas and Florida receive much air time nationally, and since this worked out differently for both of them I wouldn’t expect to see much on this any time soon.

As always, let me know if there’s more you want to see! I have a lot left on spreadsheets for individual states. Stay safe out there.

State Level Excess Mortality Data

A Warm Hello!

Well hello there! It’s been a while. Unfortunately I’ve been dealing with some (non-COVID related) health issues that have made reading and writing rather difficult, so blogging has been taking a back seat to things like um, paid employment. You know how it goes. I’ve missed you guys though, and thank you to those who reached out with nice messages asking how I was doing. That was appreciated.

Anyway, for the first time in a long time I recently fell down a rabbit hole of data and started putting together an exceptionally lengthy email with graphs for a small email group, when I realized I may as well just turn this into a blog post in case anyone was still poking around here and might be interested. So here we are.

Some Background About Data That’s Currently Interesting Me

So despite the aforementioned reading/writing troubles, I have of course been interested in the data coming out of the COVID-19 pandemic. I could go on and on about many things, but one of my top fixations is the difference between the state level reported COVID deaths (compiled by the CDC here) and the overall excess mortality across the US compiled by the CDC here.

Essentially the first set of numbers is exactly what it sounds like: the number of people in a state that the state says have died of COVID-19. The second number is a little more interesting. Basically the CDC has years and years worth of data about how many people die each week in 1) the USA as a whole (51k-60k depending on the time of year) and 2) individual states. Thus they can predict each year how many people are going to die in a given week and then say if we are right on track with that number or if we are wildly above that number (95% confidence interval) for both the country as a whole and each state individually.

They published this data prior to COVID as well. If you’ve ever heard someone say we had a “really bad flu year” this data is probably why. If an outbreak of the flu (or anything) pushes the country above the 95% CI for expected deaths, the CDC will generally raise an alert. For example, the flu season in the winter of 2017/2018 pushed us above the 95% CI from December 23rd 2017 – February 3rd 2018. Currently the country has had excess deaths from all cause mortality since March 28th, 2020. We have yet to drop back below the 95% CI for more than a week. The graph for the whole US looks like this (recent weeks trail off as jurisdictions are still reporting):

Since 2/1/20 this comes out to 595,688 deaths above the 95% CI (yellow line) or 758,749 deaths above average.
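To make the difference between those two totals concrete, here is a small sketch. It assumes each week has an observed count, an expected (average) count and an upper bound, and that weeks below the reference line contribute zero rather than a negative number. The weekly values are invented, and the tuple layout is my own, not the CDC file format.

```python
def total_excess(weeks):
    """Sum weekly excess deaths two ways: above the expected (average)
    count and above the upper bound. Weeks below a line add nothing."""
    over_average = 0
    over_upper_bound = 0
    for observed, expected, upper_bound in weeks:
        over_average += max(0, observed - expected)
        over_upper_bound += max(0, observed - upper_bound)
    return over_average, over_upper_bound

# Invented weeks: (observed, expected, upper bound)
sample_weeks = [
    (58_000, 56_000, 58_500),  # above average but inside the interval
    (72_000, 57_000, 59_500),  # above both lines
    (55_000, 56_500, 59_000),  # below average, contributes nothing
]

over_average, over_upper_bound = total_excess(sample_weeks)
print(over_average, over_upper_bound)
```

The over-average total is always at least as large as the over-upper-bound total, which is why the two national numbers above differ.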

Now while seeing the entire country interests me, what really interested me about this data is that sometimes the excess mortality data from all causes and the COVID-19 reported death data for a particular state don’t match. That’s something I wanted to look into.

COVID-19 Deaths vs All Cause Excess Mortality

I first got interested in this topic because the first time I looked at excess mortality data, I noticed that my state (Massachusetts) had a MUCH higher number listed for COVID-19 deaths than it does for excess mortality. Checking the CDC website today, they have us listed at 18,131 deaths, or 236 COVID deaths/100k residents. However, our excess mortality since 2/1/20 is only listed as between 8,780 and 11,369. I started running the numbers because the overall COVID number puts us at 3rd worst in the nation. The lower number would rank us somewhere between #31 and #40.

I Googled a bit and as close as I could find, we’ve changed our counting method twice to better align with federal standards, but don’t appear to have subtracted the “overcounts” back off our total. This article suggests we were overcounting nursing home deaths (take that Cuomo!) until April of 2021 and this article suggests that we also included more “probable” deaths than other states until October 2020.

So given that every state counts COVID deaths differently but (presumably) counts all deaths, how common is it that a state’s COVID deaths exceed its excess mortality? Which states have the highest “overcounts” and “undercounts” and what does it look like if you just compare excess mortality and remove COVID classifications entirely? Well I’m glad you asked! That’s what I wanted to know too!

The Overcounts

Pulling from the CDC website here through 8/11 and taking their upper and lower guesses for excess mortality and converting to deaths/million, I found 5 states that have reported more COVID deaths (as of today 8/14) than they have excess mortality:

  1. Massachusetts (+1,015/million – #3 ranked)
  2. Rhode Island (+690/million – #5 ranked)
  3. Minnesota (+145/million – #37 ranked)
  4. New Jersey (+143/million – #1 ranked)
  5. Connecticut (+41/million – #9 ranked)

Now it is important to note that not all the death data is in. It is possible that these states are simply really good at reporting COVID deaths and less good at reporting other deaths, or that something else is going on. COVID could be killing people in these states who would have died anyway, and thus it could be failing to add to the excess mortality in the way it is in other states, or some mitigation effort the states took could be reducing other types of mortality in a way that is balancing COVID out. The CDC won’t close out this data for quite some time, but it will be interesting to see what happens when all the accounts are settled.

What is notable here though is that 4 of these 5 states are in the top 10 for worst death counts. If these are truly over-reported, that means the pandemic was not as bad there as commonly believed. Additionally, several of those states had fairly strict lockdowns. If excess mortality is caused by lockdowns, it is not showing up in these states’ data so far.

The Undercounts

Now undercounting is tricky. The CDC notes that some states have extra processes in place to ensure accurate coding of COVID deaths, so it’s possible these states are just behind. It’s also possible that excess mortality in these states is from something other than COVID, so they just wouldn’t have as many COVID deaths as they would excess deaths. Here’s the list. There were also 5 states here. Well, 4 states and DC:

  1. Washington DC (-294/million – #34 ranked)
  2. Texas (-147/million – #26 ranked)
  3. California (-76/million – #32 ranked)
  4. South Carolina (-51/million – #21 ranked)
  5. Vermont (-33/million – #50 ranked)

To my point about other causes of death, Washington DC in particular would be potentially impacted by a jump in homicides (up 19% last year) and opioid deaths (hit a record in 2020). For the other states, we’ll continue to see what happens as the data trickles in.
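The overcount/undercount sorting used in the last two sections boils down to checking where a state's reported COVID deaths fall relative to its lower and upper excess mortality estimates. A minimal sketch, using the Massachusetts figures quoted earlier in this post plus two invented states (the function name and labels are mine):

```python
def classify_reporting(covid_deaths, excess_low, excess_high):
    """Compare reported COVID deaths to the excess mortality range."""
    if covid_deaths > excess_high:
        return "overcount"      # more COVID deaths than excess deaths
    if covid_deaths < excess_low:
        return "undercount"     # fewer COVID deaths than excess deaths
    return "within range"

# Massachusetts numbers are from this post; the other two are invented
print(classify_reporting(18_131, 8_780, 11_369))   # Massachusetts
print(classify_reporting(4_000, 5_000, 7_000))     # invented state
print(classify_reporting(6_000, 5_000, 7_000))     # invented state
```

The same comparison works on per-million figures, as long as both sides use the same population.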

Overall, it’s interesting that 40 states’ COVID death counts fell somewhere in between their upper and lower bound estimates for excess mortality. So how did every state do when compared on excess mortality so far? Let’s check it out!

About the Data

A few things to keep in mind before I show state level graphs:

  1. The data is excess mortality from ALL CAUSES since 2/1/20.
  2. There’s one graph for amount above lower bound (excess above average) and amount above upper bound (excess above 95% CI). I’ll discuss the differences a bit below.
  3. Some of this is estimated. Since every state reports at a different pace, they estimate where states will be at to bring everyone up to the same level. I’ve been watching this for a few months and they rarely have to take many people away, so the estimates look pretty good.
  4. The data is from here. I download the “National and State Estimates of Excess Deaths” csv file and then use the “Total Excess Lower Estimate” and “Total Excess Higher Estimate” (Column J and K on my spreadsheet) for each state.
  5. To convert to per capita, I used the 2020 census numbers for each state. I included Puerto Rico and DC, so all rankings are out of 52.
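As a sketch of steps 4 and 5, here is the per-million conversion and ranking. The Massachusetts excess estimates and the three per-million rates are figures quoted elsewhere in this post; the population constant is what I believe the 2020 census count for Massachusetts to be, so treat it as an assumption. The function names are my own.

```python
def per_million(deaths, population):
    """Convert a raw death count to deaths per million residents."""
    return round(deaths / population * 1_000_000)

def rank_states(rates):
    """Rank states from highest rate (rank 1) to lowest."""
    ordered = sorted(rates, key=rates.get, reverse=True)
    return {state: position for position, state in enumerate(ordered, start=1)}

MA_POPULATION_2020 = 7_029_917  # assumed 2020 census count, not from the post

ma_low = per_million(8_780, MA_POPULATION_2020)    # lower excess estimate
ma_high = per_million(11_369, MA_POPULATION_2020)  # higher excess estimate
print(ma_low, ma_high)

# Three per-million rates quoted in this post, deliberately out of order
sample_rates = {"Arizona": 3138, "Mississippi": 3635, "District of Columbia": 3541}
print(rank_states(sample_rates))
```

With those inputs the Massachusetts results come out to 1249 and 1617 per million, which lines up with the Massachusetts row in the tables later in this post.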

There’s been a lot of talk about how the pandemic impacted other types of deaths, so it’s notable to see where the highest excess mortality has been.

Excess Mortality Over Average by State

Without further ado, here’s the excess mortality over the average, by state. Sorry about the small font, click on it to embiggen:

So for deaths above average from ALL CAUSES, per capita the top ten states are:
State | Excess Deaths Over Average/Million (2/1/20-8/11/21)
Mississippi | 3635
District of Columbia | 3541
Arizona | 3138
Alabama | 3121
Louisiana | 3101
Arkansas | 3012
New York | 3010
South Carolina | 2889
New Jersey | 2861
South Dakota | 2708

Now this is just deaths over average. Some states have more yearly variation than others, and thus look a little different if you only take the deaths above the 95% CI. That’s next.

Excess Mortality Over Upper Bound by State

Again, click to make that bigger.

As you can see, going over the upper bound mostly evens out the smaller states. This makes sense. For example, Massachusetts and Montana had surprisingly similar excess mortality across the timespan represented. However, Montana is 1/7th the size of Massachusetts. They typically hover around 200 deaths per week statewide, and Massachusetts generally has 1,100-1,200. With 200 deaths, slight differences in reporting (like someone in one hospital forgetting to send the numbers for a week) could skew things quite a bit. That’s less likely over larger populations. So here are the new top 10:

State | Excess Deaths Over Upper Bound/Million (2/1/20-8/11/21) | Prior Rank
New York | 2551 | 7
Mississippi | 2480 | 1
New Jersey | 2395 | 9
Arizona | 2294 | 3
Alabama | 2273 | 4
Texas | 2023 | 16
Louisiana | 2018 | 5
South Carolina | 2001 | 8
Pennsylvania | 1943 | 17
Arkansas | 1925 | 6

As expected, the two places with the smallest populations (DC and South Dakota) dropped off this list and were replaced with two much larger places: Texas and Pennsylvania.
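One way to see why small states get "evened out" by the upper bound is to look at how much pure chance moves a weekly count. The sketch below assumes weekly deaths behave roughly like a Poisson count, which is my simplifying assumption and not necessarily how the CDC models it. It uses the approximate weekly totals mentioned above (about 200 and about 1,150).

```python
import math

def relative_noise(weekly_deaths):
    """Standard deviation as a fraction of the mean for a Poisson count."""
    return math.sqrt(weekly_deaths) / weekly_deaths

for label, deaths in [("about 200 deaths/week", 200),
                      ("about 1,150 deaths/week", 1150)]:
    print(f"{label}: roughly {relative_noise(deaths):.1%} week-to-week noise")
```

Under that assumption the smaller state's counts bounce around more than twice as much in relative terms, so its 95% CI is proportionally wider and fewer of its deaths land above the upper bound.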

Other States of Interest and Possible Posts Going Forward?

Now throughout the pandemic, it seems everyone gets fixated on some subgroup of “the big four”: California, Florida, New York and Texas. If you want to know how they’re doing, here they are pulled out:

State | Excess Deaths Over Upper Bound/Million | Excess Deaths Over Average/Million | Rank in Excess Deaths Over Average | Rank in Excess Deaths Over Upper Bound
New York | 2551 | 3010 | 7 | 1
Texas | 2023 | 2507 | 16 | 6
California | 1711 | 2193 | 30 | 17
Florida | 1694 | 2339 | 23 | 18

Here are the states I track, as they are all approximately the same size as Massachusetts (around 7 million people):

State | Excess Deaths Over Upper Bound/Million | Excess Deaths Over Average/Million | Rank in Excess Deaths Over Average | Rank in Excess Deaths Over Upper Bound
Arizona | 2294 | 3138 | 3 | 4
Tennessee | 1787 | 2636 | 11 | 14
Massachusetts | 1249 | 1617 | 39 | 31

If people are interested in particular other states, I’d be happy to post them in the comments as time/health allow. Additionally, the CDC updates this data weekly. Now that I have the explanation typed out and my spreadsheet set up I can fairly easily post updates every few weeks (sans lengthy intro) if there’s interest. Let me know what you all think! Hope everyone is staying well.

New Year’s Resolutions

I don’t often make New Year’s resolutions, but this year I’ve decided to join Gretchen Rubin (of happiness project fame) in resolving to go on a 20-minute walk every day in 2020. Her theory is that if you aren’t getting much exercise, resolving to get a little bit daily will provide big benefits. She has research on her side on this one, and it doesn’t hurt that walking seems to be the only form of exercise that makes my migraines better rather than worse. We’ll see how this goes.

This got me thinking about New Year’s resolutions in general, and wondering what the most common ones were. There appears to be a lot of selection bias in the studies, but healthy eating/exercise/weight loss and saving money seem to be the most common in America.

I tried to find some from other countries, and it seems like Germans may put stress reduction and family time at the top of their list. My googling for other European and North and South American countries didn’t turn up much.

I did however, find this blog post from Duolingo, that had some really interesting insights about one particular New Year’s resolution. Duolingo is an app that helps you learn a second language, and they have a distinctive peak in sign ups and account usage just before the first of the year. They discovered that the countries their users normally came from changed a bit around the first of the year:

Apparently users who sign up around the first of the year actually are slightly more likely to continue using the app than those who sign up at other times.

Overall, I’ll admit I was a little surprised that I couldn’t find more research on the subject of New Year’s resolutions. It seems like this would be an interesting study in how priorities change across countries or time. If anyone knows of any good resources that I didn’t find, please let me know! In the meantime, happy new year everyone!

Diversity and Religion Trivia Question

Back when this blog was in its first incarnation, I used to occasionally do some challenge questions. I stumbled across one this week that seemed like a good candidate, and since my computer is still broken I figured I’d throw it out there.

As part of their religious landscape survey, Pew Research has put together a racial diversity ranking of religions and major denominations in the US. Six groups were found to be more diverse than the U.S. general population. What are they?

A few clarifications and one hint to help:

1. The diversity ranking measures the spread across 5 racial groups: White, Black, Asian, Latino and Mixed/Other. A perfect score would be 20% in each category. In other words, a group dominated by one group would not be considered diverse even if that group was a minority group in the US.

2. Pew breaks Christianity down into major denominations and includes several types of unaffiliated (aka not religious) groups in their survey. If you want to see the groups included, see this page.

3. The survey only looked at people in the US, so diversity is based solely on that. Groups may have more diversity in other countries, but only their US members were counted.

4. If you need a hint: 3 of the top 6 groups are Christian or Christian-adjacent* and 3 were other religions or unaffiliated groups.

For the answer, see the list here.

*For purposes of this question, Christian adjacent means that the members of the group might consider themselves Christians, but a majority of Christians in other denominations do not.
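If you want a feel for how a score like this behaves, here is a rough sketch of a concentration-style index where an even 20% split across the 5 groups gets the top score and a group made up of a single race gets zero. I believe Pew's ranking is built on a Herfindahl-Hirschman style index, but the 0 to 10 scaling and the function below are my own assumptions, and the example shares are invented.

```python
def diversity_score(shares):
    """Score from 0 (all one group) to 10 (perfectly even split).
    `shares` are fractions of the group's members and must sum to 1."""
    n = len(shares)
    concentration = sum(share ** 2 for share in shares)  # 1/n when even, 1 when one group
    return (1 - concentration) / (1 - 1 / n) * 10

even_split = [0.2, 0.2, 0.2, 0.2, 0.2]
one_dominant = [0.8, 0.05, 0.05, 0.05, 0.05]  # invented shares
single_group = [1.0, 0.0, 0.0, 0.0, 0.0]

for name, shares in [("even split", even_split),
                     ("one dominant group", one_dominant),
                     ("single group", single_group)]:
    print(f"{name}: {diversity_score(shares):.1f}")
```

Note that the index doesn't care which group dominates, which is the point of clarification 1 above.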

If Not Voting Were a Candidate

My computer is still having problems, so another short post today. I saw this graphic on Twitter this week, and thought it was interesting:

Our voting certainly leaves a wide margin of error with regards to public opinion.

What’s interesting of course is that we have no idea how those people would vote if they were forced to, though many people seem to think they know. From experiences in other countries it seems like compulsory voting might increase support for left leaning policies and higher tax brackets. However, in some countries it boosted fringe third parties, and doing away with it increased support for major parties. Other countries have not seen a difference.

Point being, a non-random sample doesn’t always tell you much about what’s not in the sample. Keep that in mind with any initiatives aimed at changing voting requirements.

Weather in Minneapolis vs Boston

I’m just getting back from a conference in Minneapolis, which is an interesting city to go to in November. I’m from Boston so cold doesn’t bother me, but it did strike me as interesting how much colder it seemed to be this time of year.

I did a quick Google search and found the climate data for Minneapolis and Boston and decided to do a quick comparison.

The average high temps in both cities are nearly identical (+/-4 degrees) from March to October. In November the average high drops to 10 degrees lower in Minneapolis than in Boston, then the gap widens to 12-14 degrees for Dec-Jan, then narrows back to 10 degrees for February, before the climates become similar again for the rest of the year. My guess is that’s some sort of ocean moderating effect.

The precipitation levels were even more interesting:

Note: the temperature axes are different on these graphs, with the Boston one starting at 10 degrees and going to 90, and Minneapolis going 0-90. Still, you see that Boston doesn’t get the same level of “dry winter air” that Minneapolis does. I felt that when I got my first nosebleed in years on day 3 there.

Always interesting to see the side by side.

Rotten Tomatoes and Selection Bias

The AVI sent along a link (from 2013) this week about movies that audiences love and critics hated as judged by their Rotten Tomatoes scores.

For those of you not familiar with Rotten Tomatoes, it’s a site that aggregates movie reviews so you can see overall what percentage of critics liked a movie. After a few years of that, they also allowed users to leave reviews so you can see what percentage of audience members liked a movie. This article pulled out every movie with a critic score and an audience score in their database and figured out which ones were most discordant. The top movies audiences loved/critics hated are here:

The most loved by critics/hated by audiences ones are here:

The article doesn’t offer a lot of commentary on these numbers, but I was struck by how much selection bias goes into these numbers. While movie critics are often (probably fairly) accused of favoring “art pieces” or “movies with a message” over blockbuster entertainment, I think there’s some skewing of audience reviews as well. Critic and audience scores are interesting because critics are basically assigned to everything, and are supposed to write their reviews with the general public in mind. Audience members select movies they are already interested in seeing, and then review them based solely on personal feelings.

For example, my most perfect movie going experience ever was seeing “Dude, Where’s my Car?” in the theater. I was in college when it came out, and had just finished some grueling final exams. My brain was toast. A friend suggested we go, and the theater was full of other college students who had also just finished their exams. It was a dumb movie, a complete stoner comedy from the early 2000s. We all laughed uproariously. I have very fond memories of this, and the movie in general. It was a great movie for a certain moment in my life, but I would probably never recommend it to anyone. It has a 17% critic score on Rotten Tomatoes, and a 47% audience score. This seems very right to me. No one walks into a movie with that title thinking they are about to see something highbrow, and critics were almost certainly not the target audience. Had more of the population been forced to go to that movie as part of their employment, the audience score would almost certainly dip. If only the critics who wanted to see it went, their score would go up.
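For what it’s worth, the “discordance” the article ranks by is presumably just the gap between the two scores. A tiny sketch of that, using the two scores quoted above plus two invented movies (the article’s actual method may differ):

```python
def discordance(movies):
    """Return (title, audience minus critic) pairs, with the biggest
    audience-over-critic gap first."""
    gaps = [(title, audience - critic) for title, critic, audience in movies]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)

# (title, critic score %, audience score %)
sample_movies = [
    ("Dude, Where's My Car?", 17, 47),        # scores quoted in this post
    ("Hypothetical Crowd Pleaser", 20, 85),   # invented
    ("Hypothetical Art Film", 95, 40),        # invented
]

for title, gap in discordance(sample_movies):
    print(f"{title}: {gap:+d} points")
```

The top of the sorted list is the “audiences loved/critics hated” end, and the bottom is the reverse.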

This is key with lists like this, especially when we’re looking at movies that came out before the site existed. Rotten Tomatoes started in 1998, but a quick look at the top 20 audiences loved/critics hated list shows that the top 3 most discordant movies all came out prior to that year. So essentially the user scores are all from people who cared enough about the movie to go in and rate it years after the fact.

For the critics loved/audiences hated movies, the top one came out in 1974. I was confused about the second one (Bad Biology, a sex/horror movie that came out in 2008), but noted that Rotten Tomatoes no longer assigns it a critic score. My suspicion is that “100%” might have been one review. From there, numbers 3-7 are all pre-1998 films. In the early days of Rotten Tomatoes you could sort movies by critic score, so I suspect some people decided to watch those movies based on the good critic score and got disappointed. Who knows.

It’s interesting to think about all of this and how websites can improve their counts. Rotten Tomatoes recently had to stop allowing users to rate movies before they came out, as they found too many people were using it to try to tank movies they didn’t like. I wonder if sending emails to users asking them to rate 10 random movies (or say “I haven’t seen this”) on a regular basis might help lower the bias in the audience score. I’m not sure, but as we crowdsource more and more of our rankings, bias prevention efforts may have to get a little more targeted. Interesting to think about.

 

What’s My Age Again?

One of my favorite weird genres of news story occurs when the journalist/editor/newsroom all forget how old they are in relation to the people they are writing about. This phenomenon is what often gives rise to articles about millennials that don’t actually quote millennials, or articles about millennial parents of small children that compare them to Boomer parents of teenage children. I also see this in the working world, where there are still seminars about “how to manage millennials”, even though the oldest millennials are nearing 40 (and age discrimination laws!) and new college grads are most likely “Gen Z”.

Anyway, given my love for this genre of story, I got a kick out of a Megan McArdle Tweet this week that pointed out a Mother Jones article that fell a bit into this trap.

She was pointing to this article that explained how Juul (an e-cigarette manufacturer) had been marketing to teens for several years. As proof, they cited this:

Now for many millennials, this makes perfect sense. How could you screen three teen movies like “Can’t Hardly Wait”, “SCREAM” and “Cruel Intentions” and say you were marketing to adults? Well, that depends on your perspective. Can’t Hardly Wait came out in 1998, SCREAM in 1996 and Cruel Intentions in 1999. Current 14-18 year olds were born between 2001 and 2005. Does a party featuring movies made 5 years before you were born sound like it is trying to attract current teens? Or is it more likely that it would draw those who were teens at the time they were released….i.e. those now in their mid-30s to early 40s?

As a quick experiment, subtract 5 years from your birth year, Google “movies from ______”, take out the actual classics/Oscar winners and see how many of those movies you would have gone to an event to see at age 16. I just did it for myself and I’d have gone to see Rocky (though that’s an actual classic) and that’s pretty much it. I enjoyed The Omen, but not until later in college, ditto for Murder by Death and Network. In thinking back to my teen years, I did attend an event where Jaws was screened at a pool party, but I suspect the appeal of Jaws is more widespread/durable than “Can’t Hardly Wait”.

To be clear, I have very little insight into Juul’s marketing plan or anything about them other than what I’ve seen on the news. What I do know though is that some movies appeal to broad audiences, and some appeal to a very narrow band of people who saw them at the right age. Teen movies in particular do not tend to appeal endlessly to teens, but rather continue to appeal to the cohort who originally saw them.

There is an odd phenomenon with some movies where they do poorly at the box office then pick up steam on DVD or cable broadcasts. The movie Hocus Pocus (1993) is a good example. It was a flop at the box office, but was rebroadcast on ABC Family and the Disney Channel and then landed on a kids’ “13 Nights of Halloween” special in the early 2000s. This has caused the very odd phenomenon of kids who weren’t born when it was released remembering it as a movie of their childhood more than those in the “right” cohort would have.

So basically I think it can be a bit of a challenge to triangulate what pop culture appeals to what age groups, particularly once you are out of that age group. Not that I’m judging. I struggled enough to figure out what was cool with teens when I actually was one. I have no idea how I’d figure it out now.

 

Diagnoses: Common and Uncommon

There was an interesting article in the Washington Post this week, about a man with a truly bizarre disorder. Among many other terrible symptoms, he essentially never has to go to the bathroom while he’s standing up and going about his day and appears to be dehydrated no matter how much he drinks, but the minute he lies down at night he has to urinate copiously and shows signs of being overhydrated. He has so many bizarre symptoms that he ended up in something called the Undiagnosed Diseases Program, a fascinating group run by the NIH that seeks to find diagnoses for people who have baffled other physicians. They conduct all sorts of testing and try to either find people a diagnosis or to add their information to a database in the hopes that eventually they’ll get some information that will help them figure this out. The overall goal is to both help people and add to our collective knowledge about the human body.

Outlier medical cases are truly fascinating to many people, myself included. The WaPo column is actually part of a series called “medical mysteries”. Oliver Sacks made a whole writing career out of writing books about them. These cases make it into our textbooks in school, and they are the stories that stick in our minds. These aren’t even one in a million cases, they are often one in 10 or 100 million. The guy in the WaPo story might even be 1 in a billion or 10 billion.

I am also fascinated by these stories in part because last year I started in on a medical mystery of my own. It started innocuously enough: random bouts of nausea, random bouts of extreme fatigue, then noticeable increased sensitivity to smells, tastes and pain. I assumed I was pregnant. I wasn’t.

I followed up with my doctor, who confirmed that my hormone and other blood levels were fine. She ran tests to see if I was being poisoned, if I had a weird vitamin deficiency or had ODed on something accidentally. She referred me to a couple of other doctors. The bouts came and went, but they actually started to get very disconcerting. My increased sense of smell meant that my car would frequently smell strongly of gas…something most of us take to mean there’s a problem. I couldn’t wear certain clothes because it felt like the seams or zippers were cutting my skin, but my skin showed no signs of redness. I couldn’t drink my coffee some mornings because I was convinced it was scalding my mouth. When I ate food I was convinced I could still taste the wrapper. Sensory information is supposed to help us make our way through the world, and to have it suddenly shifting around on you is incredibly disorienting.

Over the course of 6 months I saw 7 different doctors, all of whom were baffled. Since I work at a hospital I informally talked to half a dozen other NPs/PAs/MDs, and none of them had any idea either. The nausea and fatigue could come with hundreds of disorders, but nervous system hypersensitivity is a much less common symptom.

In the course of all this, the Assistant Village Idiot made a comment about how I should remember that strange symptoms were more likely to be an uncommon presentation of a common thing than an uncommon thing. The most experienced doctors I saw also mentioned the limitations of diagnosis. We build diagnoses based on the most common presentations of things, but we often don’t know if there are other possible presentations. We give names to clusters of symptoms because we see them together often, but it’s possible the biological underpinnings of the disorder could end up in different places we don’t see as often. One doctor mentioned that in 6 months or a year I might add more symptoms that made things much clearer.

After about 6 months I still had no answers, but got some relief when I discovered that a magnesium supplement I’d taken to help me sleep seemed to help my symptoms. My doctor told me I could increase the dose and take it daily, and over the course of 6 weeks it mostly worked. I had relief, even if I still had no answers.

That was in January, and for the last 8 months I’ve seen small flares of symptoms that magnesium seemed to help. Then, about a month ago a new symptom started that made the whole thing much clearer: I got a headache. A one sided, splitting “gotta go lay down in a dark room” headache. A week or two later I got another one, then I got another one. I had always gotten a handful of migraines a year, but with the sudden change in frequency I started to notice something. For two days before I would be extra sensitive to light, pain, and smell. Sound too. Then during the migraine I would be incredibly nauseous, then the day after I would be so fatigued I could barely get out of bed. I looked back at the journals of my mystery symptoms I’d started keeping last year and realized it fit the same pattern. The symptoms that seemed so mysterious were actually part of the very classic migraine prodrome/aura/postdrome pattern. It was then that I learned about the existence of acephalgic or “silent” migraines…..migraines that occur with all of the symptoms except the classic headache. My doctor confirmed my suspicions. I had been having chronic migraines without the headache, that now had developed into chronic migraines with the headache. Once the headache appeared, my case was textbook. I got prescriptions for Imitrex and Fioricet along with a prophylactic medication.

Now per the Wiki page (and everything else I’ve read), acephalgic migraines are uncommon. It’s not particularly normal to get them as badly as I did without regular migraines, though they admit the data may be flawed. Since most people wouldn’t identify those symptoms as migraines, they might have an underreporting problem. Regardless, the AVI’s point stood: this was an uncommon presentation of a common thing, not an uncommon disorder.

I like this story both because I am relieved to have a diagnosis and because it is an interesting example of the entire concept of base rate. Migraines are the third most common disease in the world, after tension-type headaches and dental caries (cavities). One out of every 7 people gets them. If we assume that my symptoms are highly unusual for migraine sufferers….say 1% of cases….that still means about 14 out of 10,000 people will get them. For comparison, schizophrenia is 1.5 out of 10,000. Epilepsy is 120 out of 10,000, or about 10% the rate of migraine sufferers. A small percentage of a big number is often still a big number. An uncommon presentation of a common disorder can often be more common than uncommon disorders.
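For anyone who wants to check the arithmetic, here is a rough sketch of the base rate comparison in Python. The rates are just the ones quoted above, the 1% "unusual presentation" share is my hypothetical rather than a measured figure, and the variable names are made up for illustration.

```python
# Base rate sketch: an uncommon presentation of a common disorder
# versus uncommon disorders, all expressed per 10,000 people.
# All inputs are the rough figures quoted in the post, not new data.

PER = 10_000

migraine_rate = 1 / 7   # "one out of every 7 people"
unusual_share = 0.01    # hypothetical: 1% of migraine cases look like mine

# Migraine sufferers per 10,000 people, and the unusual subset of them
migraine_per_10k = migraine_rate * PER
unusual_migraine_per_10k = migraine_per_10k * unusual_share

# Comparison rates per 10,000, as quoted in the post
schizophrenia_per_10k = 1.5
epilepsy_per_10k = 120

# How epilepsy compares to migraine overall
epilepsy_vs_migraine = epilepsy_per_10k / migraine_per_10k

print(f"Migraine overall:         {migraine_per_10k:.0f} per 10,000")
print(f"Unusual migraine (1%):    {unusual_migraine_per_10k:.1f} per 10,000")
print(f"Schizophrenia:            {schizophrenia_per_10k} per 10,000")
print(f"Epilepsy:                 {epilepsy_per_10k} per 10,000")
print(f"Epilepsy / migraine rate: {epilepsy_vs_migraine:.0%}")
```

Even at a 1% share, the unusual migraine presentation comes out several times more common than the schizophrenia figure. Change `unusual_share` to see how small it has to get before the "rare disorder" explanation becomes the better bet.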

See, everything’s a stats lesson if you look hard enough. While I’m relieved to have a diagnosis, the downside of this is that the more frequent headaches are impacting my ability to sit in front of a screen as often, which may impact blogging. While we figure out what works to reduce the frequency of these, I may end up doing some more archives posting, maybe a top 100 post countdown like the AVI has been doing. We’ll see. While my doctor is great, any good resources are appreciated!