What I’m Reading: February 2018

No, there haven’t been 18 school shootings so far this year. Parkland is a tragedy, but no matter what your position, spreading stats like these doesn’t help.

I’ve followed Neuroskeptic long enough to know I should be skeptical of fMRI studies, and this paper shows why: some studies trying to look at brain regions may be confounded by individual variation. In other words, what was identified as “change” may have just been individual differences.

Speaking of questionable data, I’ve posted a few times about Brian Wansink and the ever-growing scrutiny of his work. This week his marquee paper was called into question: the bottomless bowl experiment. This experiment involved diners with “self-refilling” bowls of tomato soup, and the crux of the finding is that without visual cues people tend to underestimate how much they’ve eaten. The fraud accusations were surprising, given that:

  1. This finding seems really plausible
  2. This finding pretty much kicked off Wansink’s career in the public eye

If this finding was based on fake data, it seems almost certain everything that ever came out of his lab is suspect. Up until now I think the general sense was more that things might have gotten sloppy as the fame of his research grew, but a fake paper up front would indicate a different problem.

Related: a great thread on Twitter about why someone should definitely try to replicate the soup study ASAP. Short version: the hypothesis is still plausible and your efforts will definitely get you attention.

Another follow-up to a recent post: AL.com dives into Alabama school districts to see if school secession (i.e. schools that split off from a county system to a city-controlled system) is racially motivated. While their research was prompted by a court ruling that one particular proposed split was racially motivated, they found that in general schools didn’t significantly change their racial or class makeup when they split off from larger districts. What they did find was that cities that split off their schools ended up spending more per student than they did when they were part of a county system. This change isn’t immediate, but a few years out it was almost universally true. This suggests that taxpayers are more agreeable to increasing tax rates when they have more control over where the money is going. Additionally, the new schools tend to wind up more highly rated than the districts they left, and the kids do better on standardized testing. Interesting data, and it’s nice to see a group look at the big picture.

 

Idea Selection and Survival of the Fittest

It probably won’t come as a shock to you that I spend a lot of time ruminating over why there are so many bad ideas on the internet. Between my Intro to Internet Science, my review of the Calling Bullshit class, and basically every other post I’ve written on this site, I’ve put a lot of thought into this.

One of the biggest questions that seems to come up when you talk about truth in the social media age is a rather basic “are we seeing something new here, or are we just seeing more of what’s always happened?” and what are the implications for us as humans in the long run? It’s a question I’ve struggled a lot with, and I’ve gone back and forth in my thinking. On the one hand, we have the idea that social media simply gives bigger platforms to bad actors and gives the rest of us more situations in which we may be opining about things we don’t know much about. On the other hand, there’s the idea that something is changing, and it’s going to corrupt our way of relating to each other and the truth going forward. Yeah, this and AI risk are pretty much what keeps me up at night.

Thus I was interested this week to see this Facebook post by Eliezer Yudkowsky about the proliferation of bad ideas on the internet. The post is from July, but I think it’s worth mentioning. It’s long, but in it Yudkowsky raises the theory that we are seeing the results of hypercompetition of ideas, and they aren’t pretty.

He starts by pointing out that in other fields, we’ve seen the idea that some pressure/competition is good, but too much can be bad. He uses college admissions and academic publishing as two examples. Basically, if you have 100 kids competing for 20 slots, you may get all the kids to step up their game. If you have 10,000 kids competing for 1 slot, you get widespread cheating and test prep companies that are compared to cartels. Requiring academics to show their work is good; “publish or perish” leads to shoddy practices and probably the whole replication crisis. As Goodhart’s law states, “When a measure becomes a target, it ceases to be a good measure”. In practical terms, hypercompetition ends up with a group that optimizes for one thing and only one thing, while leaving the back door completely unguarded.

Now take that entire idea and apply it to news and information in the social media age. While there are many good things about democratizing the spread of information, we have gone from moderate competition (get a local newspaper or major news network to pay attention to you, then everyone will see your story) to hypercompetition (anyone can get a story out there, you have to compete with billions of other stories to be read). With that much competition, we are almost certainly not going to see the best or most important stories rise to the top, but rather the ones that have figured out how to game the system….digital, limbic, or some combination of both. That’s what gets us to Toxoplasma of Rage territory, where the stories that make the biggest splash are the ones that play on your worst instincts. As Yudkowsky puts it “Print magazines in the 1950s were hardly perfect, but they could sometimes get away with presenting a complicated issue as complicated, because there weren’t 100 blogs saying otherwise and stealing their clicks”.

Depressed yet? Let’s keep going.

Hypercompetitive/play to your worst instincts stories clearly don’t have a great effect on the general population, but what happens to those who are raised on nothing other than that? In one of my favorite lines of the post, Yudkowsky says “If you look at how some groups are talking and thinking now, “intellectually feral children” doesn’t seem like entirely inappropriate language.” I’ve always thought of things like hyperreality in terms of virtual reality vs physical reality or artificial intelligence vs human intelligence, but what if we are kicking that off all on our own? Wikipedia defines it as “an inability of consciousness to distinguish reality from a simulation of reality, especially in technologically advanced postmodern societies”, and isn’t that exactly what we’re seeing here on many topics? People use technology, intentionally or unintentionally, to build bubbles that skew their view of how the world works, but consistently get reinforcement that they are correct?

Now of course it’s entirely possible that this is just a big “get off my lawn” post and that we’ll all be totally fine. It’s also entirely possible that I should not unwind from long weeks by drinking Pinot Noir and reading rationalist commentary on the future of everything, as it seems to exacerbate my paranoid tendencies. However, I do think that much of what’s on the internet today is the equivalent of junk food, and living in an environment full of junk food doesn’t seem to be working out too well for many of us. In physical health, we may have reached the point where our gains begin to erode, and I don’t think it’s crazy to think that a similar thing could happen intellectually. Being a little more paranoid about why we’re seeing certain stories or why we’re clicking on certain links may not be the worst thing. For those of us who have still-developing kids, making sure their ideas get challenged may be progressively more critical.

Good luck out there.

Praiseworthy Wrongness: Dr vs Ms

I ran across a pretty great blog post this week that I wanted to call attention to. It’s by PhD scientist and science communicator Bethany Brookshire, who blogs at Scicurious.com and hosts the podcast Science for the People*.

The post recounts her tale of being wrong on the internet in a Tweet that went viral.

For those too lazy to click the link, it happened like this: early one Monday morning, she checked her email and noticed that two scientists she’d reached out to for interviews had gotten back to her, one male and one female. The man had started his reply with “Dear Ms. Brookshire”, and the woman with “Dear Dr Brookshire”. She felt like this was a thing that had happened before, so she sent this Tweet:

After sending it and watching it get passed around, she started to feel uneasy. She realized that since she had actually reached out to a LOT of people for interviews over the last 2 years, she could actually pull some data on this. Her blog post is her in-depth analysis of what she found (and I recommend you read the whole thing), but basically she was wrong. While only 7% of people called her “Dr Brookshire”, men were actually slightly more likely to do so than women. Interestingly, men were also more likely to launch into their email without using any name, and women were actually more likely to use “Ms”. It’s a small sample size so you probably can’t draw any conclusions other than this: her initial Tweet was not correct. She finishes her post with a discussion of recency bias and confirmation bias, and how things went awry.
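An audit like hers is easy to sketch for any mailbox export. A minimal example, assuming you’ve already reduced each reply to a (sender gender, salutation) pair; the sample data below is made up for illustration, not her actual counts:

```python
from collections import Counter

# Hypothetical sample: (sender_gender, salutation used in the reply)
replies = [
    ("M", "Dr"), ("M", "none"), ("F", "Ms"),
    ("F", "Dr"), ("M", "Ms"), ("M", "none"),
]

# Tally salutations within each gender
by_gender = {}
for gender, salutation in replies:
    by_gender.setdefault(gender, Counter())[salutation] += 1

# Convert counts to rates so groups of different sizes are comparable
for gender, counts in sorted(by_gender.items()):
    total = sum(counts.values())
    rates = {s: round(n / total, 2) for s, n in counts.items()}
    print(gender, rates)
```

The rate conversion is the step that matters: raw counts would mislead here, since she presumably emailed far more of one group than the other.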

I kept thinking about this blog post after I read it, and I realized it’s because what she did here is so uncommon in the 2018 social media world. She got something wrong quite publicly, and she was willing to fess up and admit it. Not because she got caught or might have gotten caught (after all, no one had access to her emails) but simply because she realized she should check her own assumptions and make things right if she could. I think that’s worthy of praise, and the kind of thing we should all be encouraging of.

As part of my everyday work, I do a lot of auditing of other people’s work and figuring out where they might be wrong. This means I tend to do a lot of meditating on what it means to be wrong….how we handle it, what we do with it, and how to make it right. One of the things I always say to staff when we’re talking about mistakes is that the best case scenario is that you don’t make a mistake, but the second best case is that you catch it yourself. Third best is that we catch it here, and fourth best is someone else has to catch us. I say that because I never want staff to try to hide errors or cover them up, and I believe strongly in having a “no blame” culture in medical care. Yes, sometimes that means staff might think confessing is all they have to do, but when people’s health is at stake the last thing you want is for someone to panic and try to cover something up.

I feel similarly about social media. The internet has made it so quick and easy to announce something to a large group before you’ve thought it through, and so potentially costly to get something wrong, that I fear we’re going to lose the ability to really admit when we’ve made a mistake. Would it have been better if she had never erred? Well, yes. But once she did I think self-disclosure is the right thing to do. In our ongoing attempt to call bullshit on internet wrongness, I think giving encouragement/praise to those who own their mistakes is a good thing. Being wrong and then doubling down (or refusing to look into it) is far worse than stepping back and reconsidering your position. The rarer this gets, the more I feel the need to call attention to those who are willing to do so.

No matter what side of an issue you’re on, #teamtruth should be our primary affiliation.

*In the interest of full disclosure, Science for the People is affiliated with the Skepchick network, which I have also blogged for at their Grounded Parents site. Despite that mutual affiliation and the shared first name, I do not believe I have ever met or interacted with Dr Brookshire. Bethany’s a pretty rare first name, so I tend to remember it when I meet other Bethanys (Bethanii?).

 

What I’m Reading: January 2018

I recently finished a book called Deep Survival, and I haven’t stopped thinking about it since. If you haven’t heard of it before, it’s an examination of people who get stuck in life threatening situations (particularly in nature) and live. The book seeks to understand what behaviors are common to survivors, and what they did differently than others. While not directly about statistics (and suffering from some quite literal survivor bias), it’s a good examination of how we calculate risk and process information at critical moments.

Since I’ve brought up gerrymandering before, I thought I’d point to the ongoing 538 series on the topic, which includes this article about how fixing gerrymandering might not fix anything. That included this graphic, which shows that if you break down partisan voting by county lines (which are presumably not redrawn as often), you still see a huge jump in polarized voting: 

There was an interesting hubbub this week when California proposed a new law that would fine restaurants $1,000 if their waiters offered people “unsolicited straws”. The goal of the bill was to cut down on straw use, which the sponsors said was 500 million straws per day. Reason magazine questioned that number, and discovered that it was based on some research a 9-year-old did. To be fair to the 9-year-old (now 16), he was pretty transparent and forthcoming about how he got it (more than some adult scientists), and he had put his work up on a website for others to use. What’s alarming is how many major news outlets took what was essentially an A+ science fair project as uncontested fact. Given that there are about 350 million people in the US, a number implying we all use 1-2 straws per day should have raised some eyebrows. (h/t Gringo)
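The back-of-the-envelope check here is trivial, which is what makes the uncritical repetition so striking:

```python
straws_per_day = 500_000_000   # the widely cited figure
us_population = 350_000_000    # rough US population

# Implied per-person usage: every man, woman, and infant
per_person = straws_per_day / us_population
print(f"{per_person:.2f} straws per person per day")  # 1.43
```

Two seconds of division is often all the fact-checking a viral statistic needs.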

This piece in Nature was a pretty interesting discussion of the replication crisis and what can be done about it. The authors point out that when we replicate a finding, any biases or errors that are endemic to the whole design will remain. What they push for is “triangulation”, or trying multiple approaches and seeing what they converge on. Sounds reasonable to me.

Another topic I’ve talked about a bit in the past is the concept of food deserts, and how hard they are to measure. This article talks about the inverse concept: food swamps. While food deserts measure how far you are from a grocery store, food swamps measure how many fast food options you have nearby. Apparently living in a food swamp is a much stronger predictor of obesity than living in a food desert, especially if you don’t have a car.

The Assistant Village Idiot has had a few good posts up recently about how to lie with maps, and this Buzzfeed article actually has a few interesting factoids about how we mis-perceive geography as well. My favorite visual was this one, showing world population by latitude:

I liked this piece on the Doomsday Clock and the downsides of trying to put a number on the risk of nuclear war. Sometimes we just have to deal with a bit of uncertainty.

Speaking of uncertainty, I liked this piece from Megan McArdle about how to wrap your head around the “body cameras don’t actually change police behavior” study.  Her advice is pretty generalizable to every study we hear that counters our intuition.

Analyzing Happiness via Social Media

Happy Wednesday everyone….or is it?

I stumbled across a new-to-me study recently: “Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter“, and I’ve been fascinated by it ever since. It’s a cumbersome name for an interesting study that analyzed Twitter posts to determine if there’s any pattern to when we express certain types of feelings on social media. For example, use of the word “starving” rises as you approach typical meal times, and falls once those times pass:

This data is fascinating to me because it gives some indication of where social media reflects reality, and some ideas of where it might not. For example, it appears the word starving is not often used at breakfast, but is used quite a bit for lunch. I don’t know that people are really the hungriest right before lunch, but it appears they may be most likely to want to express feelings of hunger at that time. I am guessing this is because people may have less control over when they get to eat (being at work, running around with errands, etc) and thus may get more agitated about it.

The researchers decided specifically to look at happiness as expressed through social media posts. They tracked this on a day-to-day basis, and then decided to figure out which days of the week were the happiest ones. Turns out Wednesday’s not looking so good:

I know the running joke is about Monday, but it’s interesting to note that Tuesday fared the worst on this ranking. I suspect that’s related to the fact that Mondays may instill the most dread in people, but aggravation you want to express may need a day or two to build up. Of course if you look at the overall scale, it’s not clear how much of a difference a score of 6.025 vs 6.08 really makes, but I’ll roll with it for now.

That havg on the y-axis there is kind of an interesting number. They pulled out a lot of commonly used Twitter words, then asked people on Mechanical Turk to rate how happy each word made them on a scale of 1 to 9. Here’s how some common words fared:

I love that they ranked “the” and “of”, and was interested to see that vanity was more highly rated than greed.
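As I understand the method, a tweet’s (or day’s) score is just the average of its rated words’ havg values. Here’s my own sketch of that idea, not the authors’ code; the “lost” score of 2.76 comes from the paper’s example below, while the other ratings are illustrative:

```python
# Word happiness ratings (1-9) as crowdsourced on Mechanical Turk.
# Only "lost" (2.76) is from the paper; the rest are made-up stand-ins.
havg = {"laughter": 8.50, "food": 7.44, "party": 7.18, "the": 4.98, "lost": 2.76}

def text_happiness(words, scores):
    """Mean havg of the rated words in a text; unrated words are skipped."""
    rated = [scores[w] for w in words if w in scores]
    return sum(rated) / len(rated) if rated else None

tweet = "lost the party".split()
print(round(text_happiness(tweet, havg), 2))  # 4.97
```

Note how one low-scoring word like “lost” can drag a whole day down, which is exactly the finale-night artifact the authors describe.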

Interestingly, in order to keep their data clean, the researchers also excluded a few days that produced noticeable changes in happiness measures. Some of these were for obvious reasons (like holidays or days of natural disasters), but some were kind of funny. For example, they noted May 24, 2010 as an unusual date because of:

“…the finale of the last season of the highly rated television show ‘Lost’, marked by a drop in our time series on May 24, 2010, and in part due to the word ‘lost’ having a low happiness score of havg = 2.76, but also to an overall increase in negative words on that date.”

This of course shows an interesting weakness in social media studies….you always risk counting things that shouldn’t be counted. Additionally, you may give more credit to certain days than they deserve. For example, Saturday got a boost because of the high rankings of the words “party” and “wedding”, both of which refer to events mostly held on Saturdays.

As social media continues to dominate our lives, I’m sure we’ll see progressively more research like this. Always interesting to consider the possible insights vs ways it can be misleading. Good luck with Wednesday folks, Saturday’s right around the corner.

Penguin Awareness Day and Extinction Threat Levels

Sometimes Twitter teaches me the most interesting things. Apparently yesterday was Penguin Awareness Day, which I found out when someone I knew retweeted this:

(Link if embed doesn’t work)

I was intrigued by the color coding under each, and was rather curious what the difference between “endangered”, “vulnerable” and “near threatened” was. Since I’m always on the lookout for faux classifications, I was wondering if those were random categories, or if they had some sort of basis.

Turns out, it’s actually the latter! This is probably well known to anyone in to conservation, but the classification system is actually put out by the International Union for Conservation of Nature.  It looks like this:

They publish a rather extensive document detailing each category, and apparently they update this document every couple years.  The entire goal of this classification system was to bring some rigor to the process of assessing different species populations, and they have some interesting guidelines.

For example, if a species population drops due to known and/or reversible causes, the size of the drop dictates their status. A drop of >90% in 10 years (or 3 generations) gets you labeled critically endangered, >70% gets you labeled endangered, and a drop of >50% gets you a “vulnerable” label. “Near threatened” doesn’t have a number, but would apply if there was growing concern/problems that didn’t meet any of the other criteria. They play out some other scenarios here. All of the criteria include numbers plus ongoing threats, so there are a few different cases for each.

For example, a critically endangered species could have <250 mature individuals + a threatened habitat, or <50 mature individuals with no threat. For endangered animals, those numbers are 2500 and 500 respectively, and for vulnerable animals it’s 10,000 and 1000. I was interested to note that they include quantitative models as a valid form of forecasting extinction.
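The decline thresholds above are concrete enough to write down directly. This is a simplification of just the population-reduction criterion (the real IUCN assessment combines several criteria, so don’t treat this as the full logic):

```python
def status_from_decline(pct_decline):
    """IUCN-style label from population decline over 10 years / 3 generations.

    Simplified: covers only the population-reduction criterion for declines
    due to known and/or reversible causes.
    """
    if pct_decline > 90:
        return "Critically Endangered"
    if pct_decline > 70:
        return "Endangered"
    if pct_decline > 50:
        return "Vulnerable"
    # "Near Threatened" has no numeric cutoff; it's judgment-based
    return "Near Threatened or Least Concern"

print(status_from_decline(92))  # Critically Endangered
print(status_from_decline(60))  # Vulnerable
```

Even in this stripped-down form you can see the appeal: the labels are ordered, the cutoffs are explicit, and anyone can check the math.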

Anyway, whether you agree with the criteria or not, it was nice to know that someone’s actually tried to define these terms in a transparent way that anyone can read up on.  Hopefully that means these guys will be okay:

 

Recreational Quantification

On my recent post about hot drinks and esophageal cancer, Gringo made a comment about how quickly his Yerba Mate cooled down in the summer (30 minutes) vs winter (10-15 minutes). I was struck by this, because I find random numerical trivia about people’s daily life quite fascinating. I think this is mostly because many people don’t actually keep track of stuff like this, or if they notice it they don’t remember it.

While this phenomenon is probably related to numerical aptitude, I also think it’s related to something John Allen Paulos talks about. In an article about Stories vs Statistics, he posits that about 61% of people (update: he may have been joking with this number, there’s no source for it) see numbers as “rhetorical decoration” to stories, whereas the other 39% see numbers as “clarifying information”.

This reminded me of an exchange I had with my father last week when we were discussing how cold it was:

Dad: How are you surviving the cold down there?
Me: It’s been pretty chilly. I could tell it was cold because my walk from the train normally takes me 30 minutes, and this week I noticed it was taking 26 minutes without me consciously increasing my speed.
Dad: wow, that’s cold.
<5 more minutes of back and forth on walking speeds during various weather patterns, and how traffic lights/street crossings make the 4 minute time saving even more impressive>
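For the curious, the implied speedup is simple to compute: the distance is fixed, so speed scales with the inverse of the time.

```python
normal_min, cold_min = 30, 26  # usual walk vs cold-snap walk

# Same distance both days, so the speed ratio is the inverse time ratio
speedup = normal_min / cold_min - 1
print(f"{speedup:.1%} faster")  # 15.4% faster
```

A 15% unconscious pace increase is, as my Dad agreed, a pretty legible cold-weather statistic.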

I have come to understand that most people do not reach for anecdotes like this when they are trying to explain how cold it is, but it’s one of the best ways of communicating information like that to my Dad.

Interestingly, Paulos attributes this communication preference to our feelings towards Type 1 vs Type 2 errors. He posits that those who want to hear numbers are doing so because they are focused on avoiding Type 1 errors (seeing something that’s not there), and those who prefer stories are more interested in avoiding Type 2 errors (failing to see something that is there). I have no idea if he’s right about this, but personality typing based on statistical approaches is a thing I am totally on board with.

Anyway, I find myself counting and/or finding ways of quantifying all sorts of things as I go through life. Some of these are straightforward (I tracked my gas mileage for quite some time, I track my steps and resting heart rate, I have a particular obsession with hours of daylight), but some are a little more complex.

For example, every time I go to a concert, I always take note of the relative frequency of mixed gender groups vs male only groups vs female only groups. I started this because I attend a lot of concerts with my husband, and we got in a running discussion about “guy bands” vs “girl bands”. As I tried to quantify which was which, I realized that a strict gender breakdown sometimes hid information about the band’s core audience. AC/DC for example: the crowd there is 30-40% women, but almost all of the women are there with men. The number of male only groups was 3 to 4 times the number of female only groups. Interestingly, in many of the mixed gender groups there were more women than men, which is why the proportion was so high despite women not attending alone. Thus I put AC/DC in the category of a “guy band” that appeals to women, as opposed to a gender neutral band. In other words, it appears women are happy to attend, but only if someone else suggests it.

Since I started tracking this, I have seen two bands who appear to have truly equal gender appeal: Tom Petty and the Heartbreakers and Aerosmith.

The most male dominated concert I have ever been to was Judas Priest. The most female dominated concert was Ani Difranco. At neither of these concerts could I find a member of the minority gender unaccompanied by a member of the majority gender.

Another interesting breakdown is “couples concerts” or “date concerts” where you see very few people attending in mono-gender groups. TV on the Radio and a few other hipster bands I’ve seen appear to be like that. On the other side, when I went to see a Drag Queen Christmas, it was entirely the opposite. The audience was half male and half female, but since most of the men were (presumably) gay the groups that attended were mostly mono-gender.

All that being said, I’d be interested in hearing about random things that readers count/track/note when out and about, or your band examples. I understand I have rather idiosyncratic tastes in music, so I’d be interested in other examples.