Rotten Tomatoes and Selection Bias

The AVI sent along a link (from 2013) this week about movies that audiences loved and critics hated, as judged by their Rotten Tomatoes scores.

For those of you not familiar with Rotten Tomatoes, it’s a site that aggregates movie reviews so you can see overall what percentage of critics liked a movie. After a few years of that, they also allowed users to leave reviews so you can see what percentage of audience members liked a movie. This article pulled out every movie with a critic score and an audience score in their database and figured out which ones were most discordant. The top movies audiences loved/critics hated are here:

The most loved by critics/hated by audiences ones are here:

The article doesn’t offer a lot of commentary on these numbers, but I was struck by how much selection bias goes into them. While movie critics are often (probably fairly) accused of favoring “art pieces” or “movies with a message” over blockbuster entertainment, I think there’s some skewing of audience reviews as well. Critic and audience scores are interesting because critics are basically assigned to everything, and are supposed to write their reviews with the general public in mind. Audience members select movies they are already interested in seeing, and then review them based solely on personal feelings.

For example, my most perfect movie going experience ever was seeing “Dude, Where’s my Car?” in the theater. I was in college when it came out, and had just finished some grueling final exams. My brain was toast. A friend suggested we go, and the theater was full of other college students who had also just finished their exams. It was a dumb movie, a complete stoner comedy from the early 2000s. We all laughed uproariously. I have very fond memories of this, and the movie in general. It was a great movie for a certain moment in my life, but I would probably never recommend it to anyone. It has a 17% critic score on Rotten Tomatoes, and a 47% audience score. This seems very right to me. No one walks in to a movie with that title thinking they are about to see something highbrow, and critics were almost certainly not the target audience. Had more of the population been forced to go to that movie as part of their employment, the audience score would almost certainly dip. If only the critics who wanted to see it went, their score would go up.
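A toy simulation makes the mechanism easier to see. This is only a rough Python sketch under made-up assumptions, not anything derived from Rotten Tomatoes data: the enjoyment distribution, the attendance threshold, the noise level and the number of critics are all hypothetical. Critics are modeled as a random sample of everyone (they were assigned the movie), while audience reviewers are only the people who expected to enjoy it.

```python
import random


def percent_liked(scores, like_cutoff=0.5):
    """Percentage of reviewers whose enjoyment is above the cutoff."""
    return 100 * sum(s > like_cutoff for s in scores) / len(scores)


def simulate(n_people=50_000, seed=0):
    rng = random.Random(seed)

    # Hypothetical population: how much each person would enjoy one dumb
    # comedy, on a 0-1 scale. beta(2, 4) skews low: most would not love it.
    enjoyment = [rng.betavariate(2, 4) for _ in range(n_people)]

    # "Critics" are assigned the movie, so they are a random sample.
    critics = rng.sample(enjoyment, 200)

    # "Audience" members only go (and review) if they expect to like it.
    # Expectation = true enjoyment plus noise; 0.4 is an assumed threshold.
    audience = [e for e in enjoyment if e + rng.gauss(0, 0.15) > 0.4]

    return percent_liked(critics), percent_liked(audience)


critic_score, audience_score = simulate()
print(f"critic score:   {critic_score:.0f}%")
print(f"audience score: {audience_score:.0f}%")
```

Nothing about the movie differs between the two groups here. The gap comes entirely from who ends up in each sample, which is the point of the paragraph above.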

This is key with lists like this, especially when we’re looking at movies that came out before the site existed. Rotten Tomatoes started in 1998, but a quick look at the top 20 audiences loved/critics hated list shows that the top 3 most discordant movies all came out prior to that year. So essentially the user scores are all from people who cared enough about the movie to go in and rank it years after the fact.

For the critics loved/users hated movies, the top one came out in 1974. I was confused about the second one (Bad Biology, a sex/horror movie that came out in 2008), but noted that Rotten Tomatoes no longer assigns it a critic score. My suspicion is that “100%” might have been one review. From there, numbers 3-7 are all pre-1998 films. In the early days of Rotten Tomatoes you could sort movies by critic score, so I suspect some people decided to watch those movies based on the good critic score and got disappointed. Who knows.

It’s interesting to think about all of this and how websites can improve their counts. Rotten Tomatoes recently had to stop allowing users to rate movies before they came out as they found too many people were using it to try to tank movies they didn’t like. I wonder if sending emails to users asking them to rank (or say “I haven’t seen this” for) 10 random movies on a regular basis might help lower the bias in the audience score. I’m not sure, but as we crowd source more and more of our rankings, bias prevention efforts may have to get a little more targeted. Interesting to think about.

 

What’s My Age Again?

One of my favorite weird genres of news story occurs when the journalist/editor/newsroom all forget how old they are in relation to the people they are writing about. This phenomenon is what often gives rise to articles about millennials that don’t actually quote millennials, or articles about millennial parents of small children that compare them to Boomer parents of teenage children. I also see this in the working world, where there are still seminars about “how to manage millennials”, even though the oldest millennials are nearing 40 (and age discrimination laws!) and new college grads are most likely “Gen Z”.

Anyway, given my love for this genre of story, I got a kick out of a Megan McArdle Tweet this week that pointed out a Mother Jones article that fell a bit into this trap.

She was pointing to this article that explained how Juul (an ecigarette manufacturer) had been marketing to teens for several years. As proof, they cited this:

Now for many millennials, this makes perfect sense. How could you screen three teen movies like “Can’t Hardly Wait”, “SCREAM” and “Cruel Intentions” and say you were marketing to adults? Well, that depends on your perspective. Can’t Hardly Wait came out in 1998, SCREAM in 1996 and Cruel Intentions in 1999. Current 14-18 year olds were born between 2001 and 2005. Does a party featuring movies made 5 years before you were born sound like it is trying to attract current teens? Or is it more likely that it would draw those who were teens at the time they were released….i.e. those now in their mid-to-late 30s?
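The cohort arithmetic is easy to check for yourself. Here is a minimal Python sketch, assuming a current year of 2019 (inferred from the birth years quoted above) and treating “teen” as ages 14 to 18. The function names are mine, purely for illustration.

```python
CURRENT_YEAR = 2019   # assumption: implied by "14-18 year olds were born between 2001 and 2005"
TEEN_AGES = (14, 18)  # same age band the post uses for current teens

# Release years as given in the post.
MOVIES = {"Scream": 1996, "Can't Hardly Wait": 1998, "Cruel Intentions": 1999}


def years_before_birth(release_year, current_year=CURRENT_YEAR, teen_ages=TEEN_AGES):
    """How long before today's teens were born the movie came out (min, max)."""
    oldest_birth = current_year - teen_ages[1]
    youngest_birth = current_year - teen_ages[0]
    return oldest_birth - release_year, youngest_birth - release_year


def original_teen_cohort_ages(release_year, current_year=CURRENT_YEAR, teen_ages=TEEN_AGES):
    """Current ages of people who were teens when the movie was released."""
    years_since = current_year - release_year
    return teen_ages[0] + years_since, teen_ages[1] + years_since


for title, year in MOVIES.items():
    before_lo, before_hi = years_before_birth(year)
    age_lo, age_hi = original_teen_cohort_ages(year)
    print(f"{title} ({year}): released {before_lo}-{before_hi} years before "
          f"today's teens were born; original teen audience is now {age_lo}-{age_hi}")
```

Swapping in a different `CURRENT_YEAR` or set of release years is all it takes to run the same check on another event.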

As a quick experiment, subtract 5 years from your birth year, Google “movies from ______”, take out the actual classics/Oscar winners and see how many of those movies you would have gone to an event to see at age 16. I just did it for myself and I’d have gone to see Rocky (though that’s an actual classic) and that’s pretty much it. I enjoyed The Omen, but not until later in college, ditto for Murder by Death and Network. In thinking back to my teen years, I did attend an event where Jaws was screened at a pool party, but I suspect the appeal of Jaws is more widespread/durable than “Can’t Hardly Wait”.

To be clear, I have very little insight into Juul’s marketing plan or anything about them other than what I’ve seen on the news. What I do know though is that some movies appeal to broad audiences, and some appeal to a very narrow band of people who saw them at the right age. Teen movies in particular do not tend to appeal endlessly to teens, but rather continue to appeal to the cohort who originally saw them.

There is an odd phenomenon with some movies where they do poorly at the box office then pick up steam on DVD or cable broadcasts. The movie Hocus Pocus (1993) is a good example. It was a flop at the box office, but was rebroadcast on ABC Family and the Disney Channel and then landed on a kids “13 Nights of Halloween” special in the early 2000s. This has caused the very odd phenomenon of kids who weren’t born when it was released remembering it as a movie of their childhood more than those in the “right” cohort would have.

So basically I think it can be a bit of a challenge to triangulate what pop culture appeals to what age groups, particularly once you are out of that age group. Not that I’m judging. I struggled enough to figure out what was cool with teens when I actually was one. I have no idea how I’d figure it out now.

 

Diagnoses: Common and Uncommon

There was an interesting article in the Washington Post this week, about a man with a truly bizarre disorder. Among many other terrible symptoms, he essentially never has to go to the bathroom while he’s standing up and going about his day and appears to be dehydrated no matter how much he drinks, but the minute he lies down at night he has to urinate copiously and shows signs of being overhydrated. He has so many bizarre symptoms that he ended up in something called the Undiagnosed Diseases Program, a fascinating group run by the NIH that seeks to find diagnoses for people who have baffled other physicians. They conduct all sorts of testing and try to either find people a diagnosis or to add their information to a database in the hopes that eventually they’ll get some information that will help them figure this out. The overall goal is to both help people and add to our collective knowledge about the human body.

Outlier medical cases are truly fascinating to many people, myself included. The WaPo column is actually part of a series called “medical mysteries”. Oliver Sacks made a whole career out of writing books about them. These cases make it into our textbooks in school, and they are the stories that stick in our minds. These aren’t even one in a million cases, they are often one in 10 or 100 million. The guy in the WaPo story might even be 1 in a billion or 10 billion.

I am also fascinated by these stories in part because last year I started in on a medical mystery of my own. It started innocuously enough: random bouts of nausea, random bouts of extreme fatigue, then noticeable increased sensitivity to smells, tastes and pain. I assumed I was pregnant. I wasn’t.

I followed up with my doctor who confirmed that my hormone and other blood levels were fine. She ran tests to see if I was being poisoned, if I had a weird vitamin deficiency or had ODed on something accidentally. She referred me to a couple of other doctors. The bouts came and went, but they actually started to get very disconcerting. My increased sense of smell meant that my car would frequently smell strongly of gas…something most of us take to mean there’s a problem. I couldn’t wear certain clothes because it felt like the seams or zippers were cutting my skin, but my skin showed no signs of redness. I couldn’t drink my coffee some mornings because I was convinced it was scalding my mouth. When I ate food I was convinced I could still taste the wrapper. Sensory information is supposed to help us make our way through the world, and to have it suddenly shifting around on you is incredibly disorienting.

Over the course of 6 months I saw 7 different doctors, all of whom were baffled. Since I work at a hospital I informally talked to half a dozen other NPs/PAs/MDs, and none of them had any idea either. The nausea and fatigue could come with hundreds of disorders, but nervous system hypersensitivity is a much less common symptom.

In the course of all this, the Assistant Village Idiot made a comment about how I should remember that strange symptoms were more likely to be an uncommon presentation of a common thing than an uncommon thing. The most experienced doctors I saw also mentioned the limitations of diagnosis. We build diagnoses based on the most common presentations of things, but we often don’t know if there are other possible presentations. We give names to clusters of symptoms because we see them together often, but it’s possible the biological underpinnings of the disorder could end up different places we don’t see as often. One doctor mentioned that in 6 months or a year I might add more symptoms that made things much clearer.

After about 6 months I still had no answers, but got some relief when I discovered that a magnesium supplement I’d taken to help me sleep seemed to help my symptoms. My doctor told me I could increase the dose and take it daily, and over the course of 6 weeks it mostly worked. I had relief, even if I still had no answers.

That was in January, and for the last 8 months I’ve seen small flares of symptoms that magnesium seemed to help. Then, about a month ago a new symptom started that made the whole thing much clearer: I got a headache. A one sided, splitting “gotta go lay down in a dark room” headache. A week or two later I got another one, then I got another one. I had always gotten a handful of migraines a year, but with the sudden change in frequency I started to notice something. For two days before I would be extra sensitive to light, pain, and smell. Sound too. Then during the migraine I would be incredibly nauseous, then the day after I would be so fatigued I could barely get out of bed. I looked back at the journals of my mystery symptoms I’d started keeping last year and realized they fit the same pattern. The symptoms that seemed so mysterious were actually part of the very classic migraine prodrome/aura/postdrome pattern. It was then that I learned about the existence of acephalgic or “silent” migraines…..migraines that occur with all of the symptoms except the classic headache. My doctor confirmed my suspicions. I had been having chronic migraines without the headache, that now had developed into chronic migraines with the headache. Once the headache appeared, my case was textbook. I got prescriptions for Imitrex and Fioricet along with a prophylactic medication.

Now per the Wiki page (and everything else I’ve read), acephalgic migraines are uncommon. It’s not particularly normal to get them as badly as I did without regular migraines, though they admit the data may be flawed. Since most people wouldn’t identify those symptoms as migraines, they might have an underreporting problem. Regardless, the AVI’s point stood: this was an uncommon presentation of a common thing, not an uncommon disorder.

I like this story both because I am relieved to have a diagnosis and because it is an interesting example of the entire concept of base rate. Migraines are the third most common disease in the world, after tension-type headaches and dental caries (cavities). One out of every 7 people gets them. If we assume that my symptoms are highly unusual for migraine sufferers….say 1% of cases….that still means about 15 out of 10,000 people will get them. For comparison, schizophrenia is 1.5 out of 10,000. Epilepsy is 120 out of 10,000, or about 10% the rate of migraine sufferers. A small percentage of a big number is often still a big number. An uncommon presentation of a common disorder can often be more common than uncommon disorders.
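For anyone who wants to poke at the base rate arithmetic, here is a short Python sketch. It uses only the figures quoted above (1 in 7 for migraines, an assumed 1% for the unusual presentation, and the per-10,000 rates for schizophrenia and epilepsy). The variable names are mine, and the 1% is a guess for illustration, not a measured number.

```python
PER = 10_000  # express everything as cases per 10,000 people

migraine_prevalence = 1 / 7        # "one out of every 7 people" (from the post)
unusual_presentation_share = 0.01  # assumed: 1% of migraine cases look like mine

migraine_per_10k = migraine_prevalence * PER
unusual_per_10k = migraine_per_10k * unusual_presentation_share

# Comparison rates as quoted in the post, per 10,000 people.
schizophrenia_per_10k = 1.5
epilepsy_per_10k = 120

print(f"migraine overall:            {migraine_per_10k:.0f} per 10,000")
print(f"unusual migraine (1%):       {unusual_per_10k:.1f} per 10,000")
print(f"schizophrenia:               {schizophrenia_per_10k} per 10,000")
print(f"epilepsy:                    {epilepsy_per_10k} per 10,000")
print(f"unusual migraine vs schizophrenia: {unusual_per_10k / schizophrenia_per_10k:.1f}x")
print(f"epilepsy as share of migraine:     {epilepsy_per_10k / migraine_per_10k:.0%}")
```

With exactly 1 in 7, the 1% slice works out to a bit over 14 per 10,000, in line with the rough “about 15” above, and still several times the schizophrenia figure.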

See, everything’s a stats lesson if you look hard enough. While I’m relieved to have a diagnosis, the downside of this is that the more frequent headaches are impacting my ability to sit in front of a screen as often, which may impact blogging. While we figure out what works to reduce the frequency of these, I may end up doing some more archives posting, maybe a top 100 post countdown like the AVI has been doing. We’ll see. While my doctor is great, any good resources are appreciated!

Blue Zones Update: the Response

After my post last week about the pre-print paper calling the “Blue Zones” (aka areas with unusual longevity) into question, an anonymous commenter stopped by to drop the link to the Blue Zones group’s response. I thought their response was rather formidable, so I wanted to give it a whole post. They had three major points, all of which I was gratified to see I had raised in my initial read through:

  1. Being designated as a Blue Zone is no small matter, and they have well published criteria. Some places that come to their attention are invalidated. There also are some pretty extensive papers published on how they validated each of the existing 5 regions, which they linked to. Sardinia here, here and here. Okinawa had a paper written on it literally called “They really are that old: a validation study of centenarian prevalence in Okinawa“. Dr Poulain (who did much of the age validation for Sardinia) wrote a paper 10 years ago called “On the age validation of supercentenarians” where he points out that the first concerns about validating supercentenarian ages were raised in 1873. This book has more information about their methods, but notably starts with mentioning 5 regions they were unable to certify. Basically they responded with one big link dump saying “yeah, we thought of that too”. From what I can tell there actually is some really fascinating work being done here, which was very cool to read about. In every place they not only looked at individuals’ records, crosschecking them with numerous sources, but also did personal interviews with people and their families, and then calculated overall population metrics to look for evidence of fraud. In Okinawa, they mention asking people about the animal for the year of their birth, something people would be unlikely to forget or want to change. It seems pretty thorough to my eye, but I was also struck that none of the papers above were included as references in the original paper. I have no idea if he knew about them or not, but given that he made statements like “these findings raise serious questions about the validity of an extensive body of research based on the remarkable reported ages of populations and individuals.”, it seems like a gap not to include work that had been done.
  2. Supercentenarians are not the focus of the Blue Zones. Again, they publish their criteria, and this is not a focal point. They have focused much more heavily on reaching 90 or 100, particularly with limited chronic disease. As I was scanning through the papers they linked to, I noticed an interesting anecdote about an Okinawan man who for a time was thought to have lived to 120. After he got in the Guinness book of world records, it came out that he had likely been given the name of an older brother who died, and thus was actually “only” 105. This is interesting because it’s a case where his age is fraudulent, but the change wouldn’t impact the “Blue Zone” status.
  3. Relative poverty could be correlated with old age. I raised this point in my initial post, and I was glad to see they echoed it here. Again, most of the way modernity raises life expectancy is by eliminating child mortality and decreasing accidents or repairing congenital defects. Those are the things that will kill you under 55. Over 55, it’s a whole new set of issues.

Now I want to be clear, no one has questioned the fundamental mathematical findings of the paper that in the US the supercentenarian records are probably shaky before birth registration. What’s being questioned is whether that finding is generalizable to specific areas that have been heavily studied. This is important because in the US “old age” type benefits kick in at 65 and there is no level after that. So basically a random 94 year old claiming to be 111 might get a social bump out of the whole thing, but none of the type of benefits that might have caused people to really look into it. Once we start getting to things like Blue Zones or international attention though, there are actually whole groups dedicated to looking into things. One person faking their age won’t cause much of an issue, but if your claim is that a dozen people in one town are faking their ages, that’s going to start to mess up population curves and show other anomalies. The poorness of the regions actually helps with this case as well. If you’re talking to people in relative poverty with high illiteracy, it’s hard to argue that they could have somehow been criminal masterminds in their forgeries. One or two people can get away with things, but a group deception can be much harder.

I’m still going to keep an eye on this paper, and my guess is it will be published somewhere with some of the suggestions of generalizability toned down, and more references to previous work at validating ages added.

The Evangelical Voter Turnout that (Maybe) Wasn’t

There was an interesting graph in a recent New York Times article  that got Twitter all abuzz:

Visually, this graph is pretty fascinating, showing an increasingly motivated white Evangelical group, whose voter participation rates must put every other group to shame. I was so taken aback by this I actually did share it with a few people as part of a point about voter turnout.

After sharing though, I started to wonder how this turnout rate compared to other religious groups, so I went looking for the source data. A quick Google took me to this Pew Research page, which contained this table:

Two things surprised me about this:

  1. Given the way the data is presented, it appears the Evangelical question was asked by itself as a binary yes/no, as opposed to being part of a list of other options.
  2. The question was not simply “are you Evangelical” but “are you Evangelical/born again”.

Now from researching all sorts of various things for this blog, I happen to know that one of the most common ways of calculating how many white Evangelicals there are in the population is to ask people their denominational affiliation from a menu of choices, then classify those denominations into Evangelical/Catholic/etc. That’s what PRRI (the group that got the 15% number) does.

For the voting bloc question however, they were only asked if they were a “White born-again or evangelical Christian?”

Now, not to get too far into the theological nuances, but there are plenty of folks I know who would claim the “born again” label who don’t go to traditionally “Evangelical” churches. In fact, according to Mark Silk over at Religion News (who noted this discrepancy at the time), he’s been involved with research that “found that 38.6 percent of mainline Protestants and 18.4 percent of Catholics identified as ‘born again or evangelical.’” So yes, the numbers may be skewed. It’s also worth noting that Pew Research puts the number of Evangelical Protestants at 25%, in a grouping that categorizes historically black groups separately (and thus is presumably mostly white).

So is the Evangelical turnout better than other groups? Well, it might still be. However, it’s good to know that this graph isn’t strictly comparing apples to apples, but rather slightly different questions given to different groups for different purposes. As we know slight changes in questions can yield very different results, so it’s worth noting. Caveat emptor, caveats galore on this one.

Asylum Claims and Other Numbers at the Border

There’s a lot in the news right now about border crossing, immigration and asylum claims, and I’m seeing all sorts of numbers being thrown around on Twitter. I wanted to do a quick round up of some numbers/sources to help people wade through it.

First up, every month US Customs and Border Protection publishes the number of apprehensions they have at the Southern US Border and how that compares to the last 5 years. They do this relatively close to real time; we have the numbers for May, but not yet for June. If you want to know why you’re seeing so much in the news, take a look at this graph:

So with 4 months left to go in the fiscal year, we’re already 100,000 over the highest year on that chart. To give some context to that though, apprehension numbers have actually been relatively low for the last few years. They peaked at 1.6 million in the late 90s/early 2000s. However, there have been some changes to the makeup of that group….family crossings. Vox published this chart based on the CBP data that shows how this has changed:

I couldn’t find what those numbers were during the last spike, but it seems to be a record high.

So now what about asylum claims? Recently acting DHS Secretary Kevin McAleenan said that 90% of asylum seekers were skipping their hearings, but others were claiming that actually 89% show up. That’s a MASSIVE discrepancy, so I wanted to see what was going on.

First up, the 89% rate. The DOJ publishes all sorts of statistics about asylum hearings, and in this massive report they showed the “in absentia” rates for asylum decisions (page 33):

So for FY17, asylum claimants were at their decision hearing 89% of the time.

So where did the “90% don’t show up” claim come from? Reading the full context of McAleenan’s quote, it appears that he was specifically referencing a new pilot program for families claiming asylum. From what I can tell the pilot program is not published anywhere, so it’s not possible to check the numbers.

So is it plausible it jumped from 11% to 90%? I tend to doubt it, but it’s important to note the lag time here. The last published DOJ numbers are from FY2017, but those are for hearings that took place in FY2017. The average wait time for hearings in these cases is enormous….727 days so far in 2019. These wait times are climbing, but if you toggle the graph around, we can see that the wait time back in 2015 was nearly two years:

So essentially those with decisions in FY2017 probably filed in FY2015. And a lot has happened to the stats since then. First, here is the number of applications over the last few years:

So compared to 2015, the number of applications has tripled but the number of approvals has barely budged. We don’t yet know what that will do to the percentage of people who show up, but it seems very plausible that it could increase the absentee rate. Additionally, because family migration is increasing so rapidly, it’s not clear what that will do to the numbers. Regardless, McAleenan’s reference was specifically to that group, so it was only a subset of the numbers that were previously reported. Still, 90% seems awfully high.
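The lag between filing and decision mentioned above is plain date arithmetic, and a few lines of Python show roughly how it works. This is a sketch under stated assumptions: I use the 727-day figure (the 2019 average) as a stand-in for the “nearly two years” wait, the decision dates are hypothetical, and the helper names are mine.

```python
from datetime import date, timedelta

# Assumption: the 2019 average wait quoted in the post, used here as a
# stand-in for the "nearly two years" wait that applied to earlier cases.
WAIT_DAYS = 727


def fiscal_year(d):
    """US federal fiscal year: FY2017 runs 2016-10-01 through 2017-09-30."""
    return d.year + 1 if d.month >= 10 else d.year


def estimated_filing_date(decision_date, wait_days=WAIT_DAYS):
    """Back out a rough filing date by subtracting the average wait."""
    return decision_date - timedelta(days=wait_days)


# Hypothetical decision dates at the start, middle and end of FY2017.
for decision in (date(2016, 10, 1), date(2017, 4, 1), date(2017, 9, 30)):
    filed = estimated_filing_date(decision)
    print(f"decided {decision} (FY{fiscal_year(decision)}) -> "
          f"filed around {filed} (FY{fiscal_year(filed)})")
```

Decisions from most of FY2017 back out to FY2015 filings, though one on the very last day of the fiscal year lands just inside FY2016, so “filed in FY2015” is an approximation rather than an exact mapping.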

Complicating things further of course is the fact that this was a “pilot program”. That means it could have selected just one country or point of entry. One of the more interesting fact sheets from the DOJ site was the rate of asylum approvals by country. In the past few years, here are the top countries (page 29 of this report):

The rates of granting asylum from each of these countries were wildly different in 2018 though. Chinese asylum seekers were 53% granted, El Salvador was 15%, Honduras was 14%, Guatemala was 11%, Mexico was 6%. It seems plausible that a pilot program might have just been addressing those that arrive at the southern border, so it’s possible that individual countries have different profiles.

Overall, it’s clear that the data on this topic is worth watching.

Gender Ratios at Public Events

I’m out of town this weekend indulging in a very non-stats related hobby: pro wrestling.

Those of you who follow such things will know that this is WrestleMania weekend and it’s in New Jersey/just outside of NYC this year, and just so happens to coincide with my husband’s birthday. Convenient.

Given the throngs of wrestling fans converging on one spot, there were quite a few other shows put on by other groups looking to capitalize on the crowds. One such show was a New Japan pro wrestling/Ring of Honor joint venture held last night at Madison Square Garden.

We went to this one, and I was interested to note it had one of the more lopsided gender ratios of any event I’ve been to. I’ve mentioned previously that I have a habit of counting such things, and last night was no exception. Normally I like to estimate the ratio of mono-gender groups, but despite all my looking I never found a women only group at this event.

I ended up switching to the number of women I could see in each row – rows were about 15 to 20 seats each. Rows almost always had 2 to 3 women in them. I never found a row with 4 women. I’d estimate the ratio at 7 to 1 male to female.
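The row counts translate into a ratio with a little arithmetic. Here is a minimal Python sketch, assuming every seat in a row was filled (my simplification, not something I checked) and using the seat and women-per-row ranges above as the bounds.

```python
def male_to_female_ratio(seats_per_row, women_per_row):
    """Men per woman in a row, assuming every seat is occupied."""
    men = seats_per_row - women_per_row
    return men / women_per_row


# Bounds taken from the counts above: 15-20 seats, 2-3 women per row.
lowest = male_to_female_ratio(seats_per_row=15, women_per_row=3)
highest = male_to_female_ratio(seats_per_row=20, women_per_row=2)
# Midpoint of both ranges, as one rough central guess.
middle = male_to_female_ratio(seats_per_row=17.5, women_per_row=2.5)

print(f"lowest plausible ratio:  {lowest:.0f} to 1")
print(f"highest plausible ratio: {highest:.0f} to 1")
print(f"midpoint estimate:       {middle:.0f} to 1")
```

The eyeballed 7 to 1 sits inside that range, a little above the midpoint, which fits with 2 women per row being the more common count.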

Other notes:

  • There were more women in the expensive seats than the cheap seats. I’d never explicitly noted uneven distribution of gender before, but I’ll watch for it from now on.
  • The gender ratio changed as the place filled up. When we first got there it was probably closer to 10 to 1, but more women showed up the later it got.
  • Men with long hair were a major confounder. When I count rows from far away I’m mostly looking at hair first, but I had to proceed slowly here.

Obviously those first two bullet points emphasize that this is a male dominated fandom, which I’m sure is not surprising to anyone. This was not a show aimed at the more popular or mainstream fan, but the “willing to tolerate half the show being announced in Japanese” type fan.

Tonight I expect the gender ratio to be more even, as WrestleMania has a broader spectrum of appeal and their women’s division is currently on FIRE. I’ll report back with my estimate tomorrow. Go Becky Lynch!

Update: The gender ratio at WrestleMania ended up being about 4 or 5 to 1 male:female. Interestingly, the “extra” women were almost entirely younger girls, mostly there with their dads. One ahead of us walking in was even fully in costume (as Bayley), carrying a replica title belt, and started trying to lead the “Woooooooooooo” cheer on the escalator. I think the strategy of pumping up their women’s division is paying off.