Blood Sugar Model Magik?

An interesting new-to-me study came on my radar this week: “Personalized Nutrition by Prediction of Glycemic Responses”, published by Zeevi et al. in 2015. Now, if you’ve ever had the unfortunate experience of talking about food with me in real life, you probably know I am big on quantifying things and particularly obsessed with blood sugar numbers. The blood sugar numbers thing started when I was pregnant with my son and got gestational diabetes. Four months of sticking yourself with a needle a couple of times a day will do that to a person.

Given that a diagnosis of gestational diabetes is correlated with a much higher risk of an eventual Type 2 diabetes diagnosis, I’ve been pretty interested in what affects blood sugar numbers. One of those things is the post-prandial glucose response (PPGR), or basically how high your blood sugar numbers go after you eat a meal. Unsurprisingly, chronically high numbers after meals tend to correlate with overall elevated blood sugar and diabetes risk. To try and help people manage this response the glycemic index was created, which attempted to measure an “average” glucose response to particular foods. This sounds pretty good, but the effects of using this as a basis for food choices in non-diabetics have been kind of mixed. While it appears that eating all high glycemic index foods (aka refined carbs) is bad, it’s not clear that parsing things out further is very helpful.

There are a lot of theories about why glycemic index may not work that well: measurement issues (it measures an area under a curve without taking into account the height of the spike), the quantities of food eaten (watermelon has a high glycemic index, but it’s hard to eat too much of it calorie-wise), or the effects of mixing foods with each other (the values were determined by having people eat just one food at a time). Zeevi et al. had yet another theory: maybe the problem was taking the “average” response. Given that averages can often hide important information about the population they’re describing, they wondered if individual variability was mucking about with the accuracy of the numbers.

To test this theory, they recruited 800 people, got a bunch of information about them, hooked them up to a continuous glucose monitor and had them log what they ate. They discovered that while some foods caused a similar reaction in everyone (white bread for example), some foods actually produced really different responses (pizza or bananas for example). They then used factors like BMI, activity level, and gut microbiome data to build a model that they hoped would predict who would react to what food.

To give this study some real teeth, they then took the model they built and applied it to 100 new study participants. This is really good because it means they tested if they overfit their model, i.e. tailored it too closely to the original group to get an exaggerated correlation number. They showed that their model worked just as well on the new group as the old group (r=.68 vs r=.70). To take it a step further, they recruited 26 more people, got their data, then fed them a diet predicted to be either “good” or “bad” for them. They found overall that eating the “good” diet helped keep blood sugar in check as compared to just regular carbohydrate counting.
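If the overfitting idea feels abstract, here is a minimal sketch of why testing on new people matters. Everything in it is made up for illustration: the data is simulated, the single "carbs" predictor is a stand-in, and the deliberately bad "memorizing" model has nothing to do with the model Zeevi et al. actually built. It just shows how a model can look perfect on the group it was built from and much worse on a fresh group.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out so only the standard library is needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def make_people(n, rng):
    """Simulated (hypothetical) data: carbs in a meal and a noisy glucose response."""
    carbs = [rng.uniform(0, 100) for _ in range(n)]
    response = [0.5 * c + rng.gauss(0, 20) for c in carbs]
    return carbs, response


def memorizing_model(train_x, train_y):
    """A deliberately overfit model: predict the response of the most similar person already seen."""
    def predict(x):
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        return train_y[nearest]
    return predict


rng = random.Random(0)
train_x, train_y = make_people(800, rng)  # original group the model is built on
new_x, new_y = make_people(100, rng)      # fresh group the model has never seen

model = memorizing_model(train_x, train_y)
r_train = pearson_r([model(x) for x in train_x], train_y)
r_new = pearson_r([model(x) for x in new_x], new_y)

print(f"r on original group: {r_train:.2f}")
print(f"r on new group:      {r_new:.2f}")
```

The memorizing model simply hands back each original participant's own number, so its correlation on the original group is inflated by construction. A model that holds up about as well on new people, which is what the study reports, is the opposite situation.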

The Atlantic did a nice write up of the study here, but a few interesting/amusing things I wanted to note:

  1. Compliance was high. Nutrition research has been plagued by self-reporting bias and low compliance with various diets, but apparently that wasn’t a problem in this study. The researchers found that by emphasizing to people what the immediate benefit to them would be (a personalized list of “good” and “bad” foods), people got extremely motivated to be honest. Not sure how this could be used in other studies, but it was interesting.
  2. They were actually able to double blind the study. Another chronic issue with nutrition research is the inability to blind people to what they’re eating. However, since people didn’t know what their “good” foods were, it actually was possible to do some of that for this study. For example, some people were shocked to find that their “good” diet had included ice cream or chocolate.
  3. Carbohydrate and fat content were correlated with PPGR, but not at the same level for everyone. At least for glucose issues, it turns out the role of macronutrients was more pronounced in some people than others. This has some interesting implications for broad nutrition recommendations.
  4. Further research confirmed the issues with glycemic index. In the Atlantic article, some glycemic index proponents were cranky because this study only compared itself to carb counting, not the glycemic index. Last year some Tufts researchers decided to focus just on the glycemic index response and found that inter-person variability was high enough that they didn’t recommend using it.
  5. The long term effects remain to be seen. It’s good to note that the nutritional intervention portion of this study was just one week, so it’s not yet clear if this information will be helpful in the long run. On the one hand, it seems like personalized information could be really helpful to people: it’s probably easier to avoid cookies if you know you can still have ice cream. On the other hand, we don’t yet know how stable these numbers are. If you cut out cookies entirely but keep ice cream in your diet, will your body react to it the same way in two years?

That last question, along with “how does this work in the real world” is where the researchers are going next. They want to see if people getting personalized information are less likely to develop diabetes over the long term. I can really see this going either way. Will people get bored and revert to old eating patterns? Will they overdo it on foods they believe are “safe”? Or will finding out you can allow some junk food increase compliance and avoid disease? As you can imagine, they are having no trouble recruiting people. 4,000 people (in Israel) are already on their waiting list, begging to sign up for future studies. I’m sure we’ll hear more about this in the years to come.

Personally, I’m fascinated by the whole concept. I read about this study in Robb Wolf’s new book “Wired to Eat”, in which he proposes a way people can test their own tolerance for various carbohydrates at home. Essentially you follow a low to moderate carbohydrate paleo (no dairy, no legumes, no grain) plan for 30 days, then test your blood glucose response to a single source of carbohydrates every day for 7 days. I plan on doing this and will probably post the results here. Not sure what I’ll do with the results, but like I said, I’m a sucker for data experiments like this.

Calling BS Read-Along Week 4: Causality

Welcome to the Calling Bullshit Read-Along based on the course of the same name from Carl Bergstrom and Jevin West at the University of Washington. Each week we’ll be talking about the readings and topics they laid out in their syllabus. If you missed my intro and want the full series index, click here or if you want to go back to Week 3 click here.

Well hello week 4! We’re a third of the way through the class, and this week we’re getting a crash course in correlation/causation confusion, starting with this adapted comic:

Man, am I glad we’re taking a look at this. Correlating variables is one of the most common statistical techniques there is, but it is also one of the most commonly confused. Any time two variables are correlated, there are actually quite a few possible explanations such as:

  1. Thing A caused Thing B (causality)
  2. Thing B caused Thing A (reversed causality)
  3. Thing A causes Thing B which then makes Thing A worse (bidirectional causality)
  4. Thing A causes Thing X causes Thing Y which ends up causing Thing B (indirect causality)
  5. Some other Thing C is causing both A and B (common cause)
  6. It’s due to chance (spurious or coincidental)

You can find examples of each here, but the highlight is definitely the Spurious Correlations website.  Subjects include the theory that Nicolas Cage movies cause drownings and why you don’t want to eat margarine in Maine.
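For the "common cause" case (number 5 on the list), a quick simulation makes the point nicely. This is a minimal sketch with invented variables: Thing C drives both A and B, A and B never touch each other, and yet they come out correlated. The effect sizes and the random seed are arbitrary assumptions.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient using only the standard library."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(1)
n = 5000

# Thing C is the hidden common cause.
c = [rng.gauss(0, 1) for _ in range(n)]
# A and B each depend on C plus their own independent noise; neither depends on the other.
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

r_ab = pearson_r(a, b)

# Strip out C's contribution. Plain subtraction works only because this toy
# setup gives C a coefficient of exactly 1 in both A and B.
a_without_c = [ai - ci for ai, ci in zip(a, c)]
b_without_c = [bi - ci for bi, ci in zip(b, c)]
r_after = pearson_r(a_without_c, b_without_c)

print(f"correlation of A and B:            {r_ab:.2f}")
print(f"correlation once C is accounted for: {r_after:.2f}")
```

In real data you rarely get to measure Thing C this cleanly, which is exactly why this explanation is so easy to miss.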

With that framing, the first reading is an interesting anecdote that highlights both correlation/causation confusion AND why sometimes it’s the uncorrelated variables that matter. In Milton Friedman’s thermostat analogy, Friedman ponders what would happen if you tried to analyze the relationship between indoor temperature, outdoor temperature and energy usage in a home. He points out that indoor temperature would be correlated with neither variable, as the whole point is to keep that constant. If you weren’t familiar with the system, you could conclude that using energy caused a drop in temperatures, and that the best way to stay warm would be to turn off the furnace. A good anecdote to keep in mind as it illustrates quite a few issues all at once.
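The thermostat story is easy to simulate. What follows is a toy sketch under made-up assumptions (a 20 degree target, a furnace that adds one degree per unit of energy, a small random control error), not anything taken from Friedman's piece. Indoor temperature is physically determined by both outdoor temperature and energy use, and still ends up nearly uncorrelated with either.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient using only the standard library."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(2)
TARGET = 20.0  # assumed thermostat setting, in degrees
GAIN = 1.0     # assumed degrees of warming per unit of energy burned

outdoor = [rng.uniform(-10, 15) for _ in range(5000)]
# The thermostat burns just enough energy to close the gap, give or take a small error.
energy = [(TARGET - t) / GAIN + rng.gauss(0, 0.2) for t in outdoor]
# Indoor temperature really is caused by both outdoor temperature and energy use.
indoor = [t + GAIN * e for t, e in zip(outdoor, energy)]

r_indoor_outdoor = pearson_r(indoor, outdoor)
r_indoor_energy = pearson_r(indoor, energy)
r_outdoor_energy = pearson_r(outdoor, energy)

print(f"indoor vs outdoor: {r_indoor_outdoor:.2f}")
print(f"indoor vs energy:  {r_indoor_energy:.2f}")
print(f"outdoor vs energy: {r_outdoor_energy:.2f}")
```

The only strong correlation left is between outdoor temperature and energy use, even though the furnace is the whole reason the house stays warm.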

Next up is the awesomely named paper “Storks Deliver Babies (p = 0.008)”. In it, Robert Matthews takes the birth rates in 17 European countries and correlates them with the approximate number of storks in each country and finds a correlation coefficient of .62. As the title of the paper suggests, this correlation is statistically significant. The author uses this to show the weaknesses of some traditional statistical analyses, and how easy it is to get ridiculous results that sound impressive.
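If you want to see where a p-value like that comes from, here is a minimal sketch using the two numbers quoted above (r = .62, 17 countries) and the standard t-test for a correlation coefficient. The function names are my own, and the numerical integration is just a way to avoid needing a stats library; since .62 is a rounded value, the result should land close to the paper's title rather than match it digit for digit.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def correlation_p_value(r, n, steps=10_000):
    """Two-sided p-value for a Pearson r from n pairs, under the null of no correlation."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the area under the density between 0 and t.
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    # Whatever is not within [-t, t] sits in the two tails.
    return t, 1 - 2 * central_area


t_stat, p = correlation_p_value(0.62, 17)
print(f"t = {t_stat:.2f}, two-sided p = {p:.4f}")
```

Nothing in that arithmetic knows or cares whether storks have anything to do with babies, which is rather the point of the paper.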

Misleading statistics is also the subject of the Traffic Improvements case study, where a Seattle news station complained that a public works project cost $74 million but only made the average commute 2 seconds faster, leading to the conclusion that the spending was not correlated with any improvements. When you dig a bit deeper though, you discover that the volume the highway could accommodate rose by 30,000 cars/day. If you take cars/day as a variable, the spending was correlated with an improvement. This is a bit like the Milton Friedman thermostat example: just because a variable stays constant doesn’t mean it’s not involved. You have to look at the whole system.

Speaking of the whole system, I was interested to note that part way through the case study the Calling BS overlords cited Boston’s own Big Dig and mentioned that “Boston traffic got better”. As a daily commuter into Boston, I would like to mention that looking at the whole system here also gives a slightly more complicated picture. While it is true that the Big Dig allowed more cars to move through the city underground, a Boston Globe report noted that this only helped traffic along the route that got worked on. Traffic elsewhere in the city (like say, the area I commute to) got much worse during this time frame, and Boston never lost its ranking as one of the most congested cities. Additionally, while the improvements made it possible to handle more cars on the road, the cost overrun severely hampered the city’s ability to build or maintain its public transportation. Essentially by overspending on getting more cars through, the Big Dig made it necessary for more people to drive. Depending on which metric you pick, the Big Dig is both correlated with success AND failure…plus a tax bill I’m still chipping in money towards on TOP of what I pay for subpar commuter rail service. Not that I’m bitter or anything.

One interesting issue to note here is that sometimes even if journalists do a good job reporting on the nuances of correlation/causation, editors or headline writers can decide to muddy the issue. For example, Slate Star Codex did a great piece on how 4 different news outlets wrote a headline on the same study: 

Unless you were pretty on top of things, I don’t think most people would even recognize those middle two headlines were about the same study as the first and fourth. The Washington Post had to take a headline down after they had declared that if women wanted to stop violence against them they should get married. The new improved headline is below, but the original is circled in red:

It’s easy to think of headlines as innocuous if the text is good, but subtle shifts in headlines do color our perception of anything that comes after it. (I’ve collected most of my links on this research here)

Alright, back to the readings.

Our last piece goes back to 1897 and is written by Mr Correlation Coefficient himself: Karl Pearson. The math to work out the correlation coefficients had no sooner been done than Pearson started noticing people were misusing it. He was particularly concerned about people attributing biological causality to things that actually came from a common cause. Glad to see we’ve moved beyond that. Interestingly, history tells us that in Pearson’s day this was the fault of statisticians who used different methods to get correlations they wanted. After Pearson helped make correlations more rigorous, the problem flipped to people over-attributing meaning to correlations they generated. In other words, 100 years ago people put in correlations that didn’t belong, now they fail to take them out.

Okay, that’s it for this week! We’ll see you back here next week for Statistical traps and trickery.


Number Blindness

“When the facts are on your side, pound the facts. When the law is on your side, pound the law. When neither is on your side, pound the table.” – old legal adage of unclear origin

Recently I’ve been finding it rather hard to go on Facebook. It seems like every time I log in, someone I know has chosen that moment to start a political debate that is going poorly. It’s not that I mind politics or have a problem with strong political opinions, but what bugs me is how often suspect numbers are getting thrown out to support various points of view. Knowing that I’m a “numbers person”, I have occasionally had people reach out asking me to either support or refute whatever number it is that is being used, or use one of my posts to support/refute what is being said. While some of these requests are perfectly reasonable requests for explanations, I’ve gotten a few recently that were rather targeted “Come up with a reason why my opponent is wrong” type things, with a heavy tone of “if my opponent endorses these numbers, they simply cannot be correct”. This of course put me in a very meta mood, and got me thinking about how we argue about numbers. As a result, I decided to coin a new term for a logical fallacy I was seeing: Number Blindness.

Number Blindness: The phenomenon of becoming so consumed by an issue that you cease to see numbers as independent entities and view them only as props whose rightness or wrongness is determined solely by how well they fit your argument.

Now I want to make one thing very clear up front: the phenomenon I’m talking about is not simply criticizing or doubting numbers or statistics. A tremendous amount of my blogging time is spent writing about why you actually should doubt many of the numbers that are flashed before your eyes. Criticism of numbers is a thing I fully support, no matter whose “side” you’re on.

I am also not referring to people who say that numbers are “irrelevant” to the particular discussion or say that I missed the point. I actually like it when people say that, because it clears the way to have a purely moral/intellectual/philosophical discussion. If you don’t really need numbers for a particular discussion, go ahead and leave them out of it.

The phenomenon I’m talking about is actually when people want to involve numbers in order to bolster their argument, but take any discussion of those numbers as offensive to their main point. It’s a terrible bait and switch and it degrades the integrity of facts. If the numbers you’re talking about were important enough to be included in your argument, then they are important enough to be held up for debates about their accuracy. If you’re pounding the table, at least be willing to admit that’s what you’re doing.

Now of course all of this got inspired by some particular issues, but I want to be very clear: everyone does this. We all want to believe that every objective fact points in the direction of the conclusion that we want. While most people are acutely aware of this tendency in whichever political party they disagree with, it is much harder to see it in yourself or in your friends. Writing on the internet has taught me to think carefully about how I handle criticism, but it’s also taught me a lot about how to handle praise. Just like there are many people who only criticize you because you are disagreeing with them, there are an equal number who only praise you because you’re saying something they want to hear. I’ve written before about the idea of “motivated numeracy” (here and for the Sojourners blog here), but studies do show that ability to do math rises and falls depending on how much you like the conclusions that math provides...and that phenomenon gets worse the more intelligent you are. As I said in my piece for Sojourners “Your intellectual capacity does NOT make you less likely to make an error — it simply makes you more likely to be a hypocrite about your errors.”

Now in the interest of full disclosure, I should admit that I know number blindness so well in part because I still fall prey to it. It creeps up every time I get worked up about a political or social issue I really care about, and it can slip out before I even have a chance to think through what I’m saying. One of the biggest benefits of doing the type of blogging I do is that almost no one lets me get away with it, but the impulse still lurks around. Part of why I make up these fallacies is to remind myself that guarding against bias and selective interpretation requires constant vigilance.

Good luck out there!

Calling BS Read-Along Week 3: The Natural Ecology of BS

Welcome to the Calling Bullshit Read-Along based on the course of the same name from Carl Bergstrom and Jevin West at the University of Washington. Each week we’ll be talking about the readings and topics they laid out in their syllabus. If you missed my intro and want the full series index, click here or if you want to go back to Week 2 click here.

Well hi there! It’s week 3 of the read-along, and this week we’re diving into the natural ecology of bullshit. Sounds messy, but hopefully by the end you’ll have a better handle on where bullshit is likely to flourish.

So what exactly is the ecology of bullshit and why is it important? Well, I think it helps to think of bullshit as a two-step process. First, bullshit gets created. We set the stage for this in week one when we discussed the use of bullshit as a tool to make yourself sound more impressive or more passionate about something. However, the ecology of bullshit is really about the second step: sharing, spreading and enabling the bullshit. Like rumors in middle school, bullshit dies on the vine if nobody actually repeats it. There’s a consumer aspect to all of this, and that’s what we’re going to cover now. The readings this week cover three different-but-related conditions that allow for the growth of bullshit: pseudo-intellectual climates, pseudo-profound climates, and social media. Just like we talked about in week one, it is pretty easy to see when the unintelligent are propagating bullshit, but it is a little more uncomfortable to realize how often the more intelligent among us are responsible for their own breed of “upscale bullshit”.

And where do you start if you have to talk about upscale bullshit? By having a little talk about TED. The first reading is a Guardian article that gets very meta by featuring a TED talk about how damaging the TED talk model can be. Essentially the author argues that we should be very nervous when we start to judge the value of information by how much it entertains us, how much fun we have listening to it, or how smart we feel by the end of it. None of those things are bad in and of themselves, but they can potentially crowd out things like truth or usefulness. While making information more freely available and thinking about how to communicate it to a popular audience is an incredibly valuable skill, leaving people with the impression that un-entertaining science is less valuable or truthful is a slippery slope.1

Want a good example of the triumph of entertainment over good information? With almost 40 million views, Amy Cuddy’s Wonder Woman/power pose talk is the second most watched TED talk of all time. Unfortunately, the whole thing is largely based on a study that  has (so far) failed to replicate. The TED website makes no note of this, and even the New York Times and Time magazine fail to note this when it comes up. Now to be fair, Cuddy’s talk wasn’t bullshit when she gave it, and it may not even be bullshit now. She really did do a study (with 21 participants) that found that power posing worked. The replication attempt that failed to find an effect (with 100 participants) came a few years later, and by then it was too late, power posing had already entered the cultural imagination. The point is not that Cuddy herself should be undermined, but that we should be really worried about taking a nice presentation as the final word on a topic before anyone’s even seen if the results hold up.

The danger here of course is that people/things that are viewed as “smart” can have a much farther reach than less intellectual outlets. Very few people would repeat a study they saw in a tabloid, but if the New York Times quotes a study approvingly most people are going to assume it is true. When smart people get things wrong, the reach can be much larger. One of the more interesting examples of the “how a smart person gets things wrong” vs “how everyone else gets things wrong” phenomenon I’ve ever seen is from the 1987 documentary “A Private Universe”. In the opening scene Harvard graduates are interviewed at their commencement ceremony and asked a simple question quite relevant to anyone in Boston: why does it get colder in the winter? 21 out of 23 of them get it wrong (hint: it isn’t the earth’s orbit)...but they sound pretty convincing in their wrongness. The documentary then interviews 9th graders, who are clearly pretty nervous and stumble through their answers. About the same number get the question wrong as the Harvard grads, but they are so clearly unsure of themselves that you wouldn’t walk away convinced. The Harvard grads weren’t more correct, just more convincing.

Continuing with the theme of “not correct, but sounds convincing”, our next reading is the delightfully named  “On the reception and detection of pseudo-profound bullshit” from Gordon Pennycook.  Pennycook takes over where Frankfurt’s “On Bullshit” left off and actually attempts to empirically study our tendency to fall for bullshit. His particular focus is what others have called “obscurantism” defined as “[when] the speaker… [sets] up a game of verbal smoke and mirrors to suggest depth and insight where none exists”…..or as commenter William Newman said in response to my last post “adding zero dollars to your intellectual bank”. Pennycook proposes two possible reasons we fall for this type of bullshit:

  1. We generally like to believe things rather than disbelieve them (incorrect acceptance)
  2. Purposefully vague statements make it hard for us to detect bullshit (incorrect failure to reject)

It’s a subtle difference, but any person familiar with statistics at all will immediately recognize this as a pretty classic hypothesis test. In real life, these are not mutually exclusive. The study itself took phrases from two websites I just found out existed and am now totally amused by (Wisdom of Chopra and the New Age Bullshit Generator), and asked college students to rank how profound the (buzzword filled but utterly meaningless) sentences were2. Based on the scores, the researchers assigned a “bullshit receptivity scale” or BSR score to each participant. They then went through a series of 4 studies that related bullshit receptivity to various other cognitive features. Unsurprisingly, they found that bullshit receptivity was correlated with other potentially suspect beliefs (like paranormal activity), leading them to believe that some people have the classic “mind so open their brain falls out”. They also showed that those with good bullshit detection (i.e. those who could rank legitimate motivational quotes as profound while also ranking nonsense statements as nonsense) scored higher on analytical thinking skills. This may seem like a bit of a “well obviously” moment, but it does suggest that there’s a real basis to Sagan’s assertion that you can develop a mental toolbox to detect baloney. It also was a good attempt at separating out those who really could detect bullshit from those who simply managed to avoid it by saying nothing was profound. As with pseudo-intellectualism, the study authors hypothesized that some people are particularly driven to find meaning in everything, so they start finding it in places that it doesn’t exist.

Last but not least, we get to the mother of all bullshit spreaders: social media. While it is obvious social media didn’t create bullshit, it is undeniably an amazing bullshit delivery system. The last paper, “Rumor Cascades”, attempts to quantify this phenomenon by studying how rumors spread on Facebook. Despite the simple title, this paper is absolutely chock full of interesting information about how rumors get spread and shared on social media, and the role of debunking in slowing the spread of false information. To track this, they took rumors found on Snopes and used the Snopes links to track the spread of their associated rumors through Facebook. Along the way they pulled the number of times the rumor was shared, time stamps to see how quickly things were shared (answer: most sharing is done within 6 hours of a post going up), and if responding to a false rumor by linking to a debunking made a difference (answer: yes, if the mistake was embarrassing and the debunking went up quickly). I found this graph particularly interesting, as it showed a fast linking to Snopes (they called it being “snoped”) was actually pretty effective in getting the post taken down:

In terms of getting people to delete their posts, the most successful debunking links were things like “those ‘photos of Trayvon Martin the media doesn’t want you to see’ are not actually of Trayvon Martin”. They also found that while more false rumors are shared, true rumors spread more widely. Not a definitive paper by any means but a fascinating initial look at the new landscape. Love it or hate it, social media is not going away any time soon, and the more we understand about how it is used to spread information, the better prepared we can be3.

Okay, so what am I taking away from this week?

  1. If bullshit falls in the forest and no one hears it, does it make a sound? In order to fully understand bullshit, you have to understand how it travels. Bullshit that no one repeats does minimal damage.
  2. Bullshit can grow in different but frequently overlapping ecosystems. Infotainment, the pseudo-profound, and close social networks all can spread bullshit quickly.
  3. Analytical thinking skills and debunking do make a difference. The effect is not as overwhelming as you’d hope, but every little bit helps.

I think separating out how bullshit grows and spreads from bullshit itself is a really valuable concept. In classic epidemiology disease causation is modeled using the “epidemiologic triad”, which looks like this (source):

If we consider bullshit a disease, based on the first three weeks I would propose its triad looks something like this:


And on that note, I’ll see you next week for some causality lessons!

Week 4 is up! If you want to read it, click here.

1. If you want a much less polite version of this rant with more profanity, go here.
2. My absolute favorite part of this study is that part way through they included an “attention check” that asked the participants to skip the answers and instead write “I read the instructions” in the answer box. Over a third of participants failed to do this. However, they pretty much answered the rest of the survey the way the other participants did, which kinda calls into question how important paying attention is if you’re listening to bullshit.
3. It’s not a scientific study and not just about bullshit, but for my money the single most important blog post ever written about the spread of information on the internet is this one right here. Warning: contains discussions of viruses, memetics, and every controversial political issue you can think of. It’s also really long.

10 GIFs for Stats/Data People

Nope, this isn’t a gifts post, it is a GIFs post! It occurred to me this past week that one of the things I’m fairly well known for at work and in my personal life is my absolute dedication to gif usage. I send them as often as I can get away with at work (this shows up as “frequently employs novel communication methods to get her point across” on my review, if you’re curious), and I use them pretty regularly in personal emails, particularly around Fantasy Football/Game of Thrones Fantasy League Season. As such, it is a little weird that I almost never use them on my blog unless Ben’s involved. Well, that’s changing today! Here are 10 gifs that I use (or want to remember to use) in stats and data situations. While it will never have the market share of therapeutic geometry porn, I get a kick out of them:

  1. When you’ve been sitting through a really boring presentation full of opinions and theory, and someone finally gets to some numbers and evidence:
  2. When someone’s trying to walk you through some risk assessments, but you’re pretty sure they’re mucking with definitions, confused about probability and independence, and you just want to do the math yourself: 
  3. When you’ve been working really hard on a pet theory, and your data is on point, your effect sizes look good and… dice:  Time to run a subgroup analysis!
  4. When you see some amazing data well used and it just makes you fundamentally happy: 
  5. When someone in a meeting uses a ridiculous statistic they clearly haven’t thought through or don’t understand, and you need to send something to your coworker who you just know understands your angst: 
  6. When you’ve done every analysis possible, in every iteration possible, and you can’t find a significant correlation between two things, but then someone asks if you’re 100% sure there’s actually no relationship between the two variables and you start trying to explain p-values and all they hear is:
  7. When you import your data into a new file type and suddenly everything just goes haywire:
  8. When you’ve been working for hours on your SAS/R code and you’re waiting for it to run and goddammit this better work:
  9. When someone says “gee, I wish we had that data….” and you realize that you actually already pulled it together just for fun, and you’re so excited to say you have it:
  10. …..and then when you realize this makes you sound like an absolute crazy person: 

Got one I missed? Let me know!

Calling BS Read-Along Week 2: Spotting BS

Welcome to the Calling Bullshit Read-Along based on the course of the same name from Carl Bergstrom and Jevin West at the University of Washington. Each week we’ll be talking about the readings and topics they laid out in their syllabus. If you missed my intro, click here or if you want to go back to Week 1 click here.

Hey hey! Welcome back! It is week 2 of the read-along, and we’ve got some good stuff going on today. After spending last week learning what bullshit is, this week we’re going to focus on how to spot it in the wild. This is well timed because a few days ago I had a distressing discussion with a high school teacher-friend who had assigned her kids some of my Intro to Internet Science posts as a prelude to a section on fake news. She had asked them to write an essay about the topic of “fake stuff on the internet” before the discussion in class, and apparently more than a few of them said something to the effect of “that’s nice, but I’ve never heard of fake news so this is not a problem in my life”. Groooooooooooooooooooooooooooan 

Of course the problem with bullshit  is that no one warns you you’re going to see it, and no one slaps you afterwards and says “you just read that uncritically”.  With so much of the bullshit these days being spread by social media, inattentional blindness is in high gear. If 50% of study participants can’t see a gorilla when they’re trying to watch a bouncing ball, what makes you think you’re going to correctly spot bullshit while you’re trying to post pictures/think of funny status updates/score a political point against your uncle/see how your ex is doing????

The only hope is to teach yourself some tricks and remain eternally vigilant. In other words (and with apologies to Hunter S Thompson): I hate to advocate pessimism, skeptical toolkits, the GRIM test and constant paranoia, but they’ve always worked for me.

With that intro, let’s get to the readings! First up is Chapter 12 of Carl Sagan’s Demon Haunted World: Science as a Candle in the Dark: The Fine Art of Baloney Detection. I don’t think I’d read this chapter since I first read this book maybe 15 years ago or so, so it was a lot of fun to read again. Sagan starts by making a differentiation that will be familiar to those who read last week’s piece: those who believe common misconceptions vs those who promote them professionally. The example he uses is being able to contact the dead. He admits to his own longing to talk to his deceased parents  and how much appeal the belief that sometimes you can “feel” the dead has to most of us. As an atheist, he firmly believed the idea of life after death was baloney, but he gives a pass to the large number of people who believe in life after death or even those who believe they’ve had contact with the dead in their personal lives. To him, those beliefs are normal and even if you don’t think they are true or rational, they are hard to criticize. Where his wrath kicks in is those who seek to make money off of promoting this stuff and encouraging people to believe in irrational things, like psychics or mediums. He believes that undermining a society’s ability and desire to seek out independent truth and facts is one of the worst things a person can do. This isn’t just psychics doing this of course, but most of the advertising world as well, who will throw any “fact” at you if you just buy their product. In response to this constant barrage of misinformation and misdirection, he offers a “tool kit” for skeptical thinking. The whole thing is on the 4th and 5th page, but the short version is this:

  • Get independent confirmation of facts
  • Encourage debate
  • Don’t trust authority blindly
  • Come up with multiple possible explanations
  • Don’t stick to one explanation just because it is the one you thought of
  • Find something to quantify, which makes everything easier to compare
  • Make sure the whole chain of the argument works. Don’t let people mumble through part of it.
  • Prefer simple explanations (Occam’s razor)
  • Look for something falsifiable. If something can never be proven wrong, it is, well, never going to be proven wrong.
  • Keep a friendly statistician around at all times

Okay, fine, that last one’s mine, not Sagan’s, but he does come out swinging for well designed experiments. He also includes a really helpful list of the most common logical fallacies (if you want a nice online version, try this one). He concludes with a discussion of corporate advertising, sponsored research, and tobacco companies. Confusing science and skewed research helped promote tobacco for much longer than it should have stuck around.

With the stage set by Sagan, the rest of the readings include some specific tips and tricks to spot various issues with numbers and data. Some are basic plausibility checks, and some are more advanced. These are:

The “what does this number even mean” check: Last week we talked about bullshit as “unclarifiable unclarity”, and this case study is a good example of doing that with numbers. Written by West and Bergstrom, this example looks at a packet of hot cocoa that claims to be “99.9% caffeine free”.  It is not so much that the claim is implausible or even inaccurate, but that it is completely meaningless. If you’re measuring by weight, even a highly caffeinated drink will be mostly “caffeine free”. While it is likely the cocoa actually is low caffeine, this statistic doesn’t give you much insight. It is the appearance of information without any actual substance.
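To see why the claim is so empty, here is a quick back-of-the-envelope sketch in Python. The figures are my own rough assumptions rather than anything from the case study: about 95 mg of caffeine in an 8 oz cup of brewed coffee, and about 237 g for the weight of that cup.

```python
def percent_caffeine_free(caffeine_mg: float, drink_grams: float) -> float:
    """Share of a drink's weight that is NOT caffeine, as a percentage."""
    caffeine_grams = caffeine_mg / 1000.0
    return 100.0 * (1.0 - caffeine_grams / drink_grams)


# Assumed, illustrative figures for an ordinary cup of brewed coffee.
coffee = percent_caffeine_free(caffeine_mg=95, drink_grams=237)
print(f"Coffee is {coffee:.2f}% caffeine free by weight")
```

Under those assumptions, a regular cup of coffee clears the “99.9% caffeine free” bar too, which tells you how little the number on the cocoa packet is actually saying.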

Fermi estimations: A technique named after Enrico Fermi, its focus is on getting people to guess numbers based on the order of magnitude (ie 10 vs 100 vs 1000, etc), not the exact number. When doing rough calculations with large numbers, this can actually yield surprisingly accurate results. To play around with making these estimates, they provide a link to this game here. There’s a good book on this and how to solve problems like “how many piano tuners work in New York City?” called Guesstimation if you’re really into it.

Being able to focus in on the order of magnitude is surprisingly helpful in spotting bullshit, as is shown in the case study of food stamp fraud numbers. A news report from Fox News said that food stamp fraud costs taxpayers $70 million a year, and asked if this level of fraud means it is time to end food stamps. If we take that number at face value, is this a big deal? Using Fermi estimations, you can figure out a ballpark number for total food stamp payouts, and determine that this loss would be around 0.2% of all benefits paid. That is really close to the number you get if you dig up all the real numbers: 0.09% of all benefits paid.
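If you want to see what that kind of ballpark looks like written down, here is one possible version in Python. Every input except the $70 million fraud figure is my own order-of-magnitude guess (roughly 300 million people in the US, about 1 in 10 receiving benefits, about $100 per person per month), not the numbers used in the case study.

```python
def fraud_share_percent(fraud_dollars: float,
                        population: float,
                        share_receiving: float,
                        monthly_benefit: float) -> float:
    """Fraud as a percentage of estimated total yearly benefits paid."""
    total_benefits = population * share_receiving * monthly_benefit * 12
    return 100.0 * fraud_dollars / total_benefits


# Only the first number comes from the news report; the rest are guesses.
estimate = fraud_share_percent(
    fraud_dollars=70_000_000,    # reported yearly fraud
    population=300_000_000,      # assumed: US population, rounded
    share_receiving=0.10,        # assumed: about 1 in 10 people
    monthly_benefit=100,         # assumed: dollars per person per month
)
print(f"Fraud is roughly {estimate:.1f}% of benefits paid")
```

The point is not that any single guess is right, but that even fairly sloppy inputs put the answer at a fraction of a percent rather than anything close to a program-ending problem.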

GRIM testing: Edging into the deeper end of the pool, this is a neat little trick that mostly has applications for those reviewing studies with small sample sizes. GRIM stands for “granularity-related inconsistency of means” test, and it is a way of quickly and easily looking for data problems. The full explanation (plus the fascinating story of its development) is here, but here’s the quick version: if your sample size is small and you are counting whole numbers, your mean has to end in very predictable decimal places. If it doesn’t, something’s wrong. For example, a study saying that 10 people reported having an average of 2.24 children is bogus. Why? Because 2.24 = total number of kids/10, so the total number of kids would have to be 22.4, and a count of kids has to be a whole number. There are a lot of possible explanations for this, but most of them come down to the types of sloppiness or confusion that might make you question other parts of the paper.
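The check is simple enough to sketch in a few lines of Python. This is my own minimal version of the idea, not the GRIM authors’ code: the function name is made up, it assumes the mean was rounded half-up to the number of decimals reported, and it is only meant for the small samples of whole-number counts described above.

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int) -> bool:
    """Could some whole-number total, divided by n, round to reported_mean?"""
    mean = Decimal(reported_mean)
    # Rounding step implied by the reported decimals, e.g. 0.01 for "2.24".
    quantum = Decimal(1).scaleb(mean.as_tuple().exponent)
    # The true total must be one of the whole numbers next to mean * n.
    lower_total = int((mean * n).to_integral_value(rounding=ROUND_FLOOR))
    for total in (lower_total, lower_total + 1):
        candidate = (Decimal(total) / Decimal(n)).quantize(
            quantum, rounding=ROUND_HALF_UP)
        if candidate == mean:
            return True
    return False


print(grim_consistent("2.24", 10))  # the 10-person example from the text
print(grim_consistent("2.24", 25))  # same mean, but 25 people
```

The mean is passed in as a string so that trailing decimals like “2.20” are not lost. With 10 people the 2.24 average fails the check, while the same average from 25 people passes, since 56 kids divided by 25 is exactly 2.24.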

By the way, if you want to leave the deep end of the pool and dive right in to the ocean, the author of the GRIM test has a SPRITE test that deals with the implications of standard deviations.

Newcomb-Benford Law: This law is one of my favorites because it was spotted back in 1881 for a reason that simply wouldn’t happen today: uneven wear on books. Back when slide rules were scarce and people had to actually look through a book of numbers to figure out what a logarithm for a certain value was, an astronomer named Simon Newcomb noticed that the books were really worn out in the first sections where the numbers that started with low digits were, and rather clean in the back where the leading digits were higher. He began to wonder if “random” numbers found in nature were more likely to start with small digits than large ones, then he just decided to declare it was so and said that the probability that the leading digit was a certain value d was equal to log10((d+1)/d). Basically, a random number like the population of a country will have a 30% chance of starting with 1, and only a 5% chance of starting with a 9.

Despite having very little proof other than a worn out book, it turns out this law is actually pretty true. Machine generated data can gum up the works a bit, but natural phenomena tend to follow this rule. Benford got his name in there by pulling data from hundreds of sources: rivers, populations, physical constants, even random numbers from the pages of Reader’s Digest and categorizing them by leading digit. He got 20,000 numbers together and found that low leading digits simply WERE more common. The proposed mathematical explanations for this are not light reading no matter what they promise, but it is pretty much enough to know that it is a thing. It has been used to detect election fraud and is also used in forensic accounting, but basically all the layperson needs to know is that lists of numbers that mostly start with high digits aren’t as plausible as those that mostly start with low ones.
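If you’d like to see the law in action without digging up 20,000 numbers, here is a small Python sketch. It computes the expected share for each leading digit from the log10((d+1)/d) formula, then compares that to the leading digits of the first 1,000 powers of 2. The powers of 2 are just a convenient stand-in data set I picked because they are known to follow the law; they are not from the course readings.

```python
import math
from collections import Counter

# Expected share of each leading digit under the Newcomb-Benford law.
expected = {d: math.log10((d + 1) / d) for d in range(1, 10)}


def leading_digit_shares(numbers):
    """Observed share of each leading digit in a list of positive integers."""
    counts = Counter(int(str(n)[0]) for n in numbers)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Stand-in data: 2, 4, 8, ... up to 2**1000 (an assumption for illustration).
observed = leading_digit_shares([2 ** k for k in range(1, 1001)])

for d in range(1, 10):
    print(f"{d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

A forensic accountant would run the same kind of comparison on invoice totals or vote counts and then ask whether the gap between expected and observed is bigger than chance would explain.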

And one more for the road: It is worth noting that there is actually another Benford Law that would be not-irrelevant in a course like this. Benford’s Law of Controversy states that “passion is inversely proportional to the amount of real information available”.

All of these tricks may seem like a lot to keep in mind, so if you want some practice take the advice I give to the high school students: find a cause you really care about and go read bad arguments or propaganda from the “other side”. As I’ve mentioned before, your ability to do math improves dramatically when said math helps you prove a point you feel emotionally attached to. Using this to your advantage while learning these tricks might help you get them down a little faster. Of course the problem with learning these tricks is that unless you’re entirely hypocritical, eventually you might have to turn them around on your own side, so be forewarned of that. To this day the high point of my blogging career is when my political activist brother left me a voicemail screaming “I JUST LEFT A MEETING WITH PEOPLE I LIKE MAKING A POINT I AGREE WITH BUT THEY USED BAD STATISTICS THAT I FIGURED OUT WERE WRONG AND I COULDN’T STOP STARING AT THEM AND I HAD TO CORRECT THEM IN FRONT OF EVERYONE AND THEN THEY TOLD ME IT DIDN’T MATTER AND NOW I’M MAD AT THEM AND YOU!!!!”.

So what am I taking away from this week? A few things:

  1. Even if you’re not a “numbers person”, a good sense of how numbers work can go a long way towards checking the plausibility of a claim
  2. Paranoia is just good sense if people really are out to get you. People who are trying to sell you something are not the most trustworthy sources
  3. Math tricks are fun
  4. People named Benford come up with an unusual number of bullshit related laws

I’m still checking that last one out, but it seems plausible.

And that wraps up this week! Next week we’ll be wallowing in “the natural ecology of bullshit”, so make sure you meander back next Sunday for that. Bring boots. It’ll be fun.

Week 3 is now up! Read it here.

Moral Outrage, Cleansing Fires and Reasonable Expectations

Last week, the Assistant Village Idiot forwarded me a new paper called “A cleansing fire: Moral outrage alleviates guilt and buffers threats to one’s moral identity“. It’s behind a ($40) paywall, but Reason magazine has an interesting breakdown of the study here, and the AVI does his take here. I had a few thoughts about how to think about a study like this, especially if you don’t have access to the paper.

So first, what did the researchers look at and what did they find? Using Mechanical Turk, the researchers had subjects read articles that talked about either labor exploitation in other countries or the effects of climate change. They found that personal feelings of guilt about those topics predicted greater outrage at a third-party target and a greater desire to punish that target, and that getting a chance to express that outrage decreased guilt and increased feelings of personal morality. The conclusion being reported is (as the headline says) “Moral outrage is self-serving” and “Perpetually raging about the world’s injustices? You’re probably overcompensating.”

So that’s what’s being reported.  So how do we think through this when we can’t see the paper? Here’s 5 things I’d recommend:

  1. Know what you don’t know about sample sizes and effect sizes Neither the abstract nor the write ups I’ve seen mention how large the effects reported were or how many people participated. Since it was a Mechanical Turk study I am assuming the sample size was reasonable, but the effect size is still unknown. This means we don’t know if it’s one of those unreasonably large effect sizes that should alarm you a bit or one of those small effect sizes that is statistically but not practically significant. Given that reported effect size heavily influences the false report probability, this is relevant.
  2. Remember the replication possibilities Even if you think a study found something quite plausible, it’s important to remember that fewer than half of psychological studies end up replicating exactly as the first paper reported. There are lots of possibilities for replication, and even if the paper does replicate it may end up with lots of caveats that didn’t show up in the first paper.
  3. Tweak a few words and see if your feelings change Particularly when it comes to political beliefs, it’s important to remember that context matters. This particular study calls to mind liberal issues, but do we think it applies to conservative issues too? Everyone has something that gets them upset, and it’s interesting to think through how that would apply to what matters to us. When the commenters read the study article, some of them quickly pointed out that of course their own personal moral outrage was self serving. Free speech advocates have always been forthright that they don’t defend pornographers and offensive people because they like those people, but because they want to preserve free speech rights for themselves and others. Self serving moral outrage isn’t so bad when you put it that way.
  4. Assume the findings will get more generic In addition to the word tweaks in point #3, it’s likely that subsequent replications will tone down the findings. As I covered in my Women Ovulation and Voting post, 3 studies took findings from “women change their vote and values based on their menstrual cycle” to “women may exhibit some variation in face preference based on menstrual cycle”. This happened because some parts of the initial study failed to replicate, and some caveats got added. Every study that’s done will draw another line around the conclusions and narrow their scope.
  5. Remember the limitations you’re not seeing One of the most important parts of any paper is where the authors discuss the limitations of their own work. When you can’t read the paper, you can’t see what they thought their own limitations were. Additionally, it’s hard to tell if there were any interesting non-findings that didn’t get reported. The limitations that exist from the get go give a useful indication of what might come up in the future.

So in other words….practice reasonable skepticism. Saves time, and the fee to read the paper.