5 Things You Should Know About the “Backfire Effect”

I’ve been ruminating a lot on truth and errors this week, so it was perhaps well timed that someone sent me this article on the “backfire effect” a few days ago. The backfire effect is a name given to a psychological phenomenon in which attempting to correct someone’s facts actually increases their belief in their original error. Rather than admit they are wrong when presented with evidence, the narrative goes, people double down. Given the current state of politics in the US, this has become a popular thing to talk about. It’s popped up in my Facebook feed and is commonly cited as the cause of the “post-fact” era.

So what’s up with this? Is it true that no one cares about facts any more? Should I give up on this whole facts thing and find something better to do with my time?

Well, as with most things, it turns out it’s a bit more complicated than that. Here are a few things you should know about the state of this research:

  1. The most highly cited paper focused heavily on the Iraq War The first paper that made headlines was from Nyhan and Reifler back in 2010, and was performed on college students at a Midwest Catholic University. They presented some students with stories including political misperceptions, and some with stories that also had corrections. They found that the students that got corrections were more likely to believe the original misperception. The biggest issue this showed up with was whether or not WMDs were found in Iraq. They also tested facts/corrections around the tax code and stem cell research bans, but it was the WMD findings that grabbed all the headlines. What’s notable is that the research was performed in 2005 and 2006, when the Iraq War was heavily in the news.
  2. The sample size was fairly small and composed entirely of college students One of the primary weaknesses of the first papers (as stated by the authors themselves) is that 130 college students are not really a representative sample. The sample was half liberal and 25% conservative. It’s worth noting that they believe that was a representative sample for their campus, meaning all of the conservatives were in an environment where they were the minority. Given that one of the conclusions of the paper was that conservatives seemed to be more prone to this effect than liberals, it’s an important point.
  3. A new paper with a broader sample suggests the “backfire effect” is actually fairly rare. Last year, two researchers (Porter and Wood) polled 8,100 people from all walks of life on 36 political topics and found…..WMDs in Iraq were actually the only issue that provoked a backfire effect. A great Q&A with them can be found here. This is fascinating if it holds up because it means the original research was mostly confirmed, but any attempt at generalization was pretty wrong.
  4. When correcting facts, phrasing mattered One of the more interesting parts of the Porter/Wood study was when the researchers described how they approached their corrections. In their own words “Accordingly, we do not ask respondents to change their policy preferences in response to facts–they are instead asked to adopt an authoritative source’s description of the facts, in the face of contradictory political rhetoric“. They reject heartily “corrections” that are aimed at making people change their mind on a moral stance (like say abortion) and focus only on facts. Even with the WMD question they found that the more straightforward and simple the correction statement, the more people of all political persuasions accepted it.
  5. The 4 study authors are now working together In an exceptionally cool twist, the authors who came to slightly different conclusions are now working together. The Science of Us gives the whole story here, but essentially Nyhan and Reifler praised Porter and Wood’s work, then said they should all work together to figure out what’s going on. They apparently gathered a lot of data during the height of election season and hopefully we will see those results in the near future.

I think this is an important set of points. First, it’s heartwarming (and intellectually awesome!) to see senior researchers accepting that some of their conclusions may be wrong and actually working with others to improve their own work. Next, I think it’s important because I’ve heard a lot of people in my personal life commenting that “facts don’t work” so they basically avoid arguing with those who don’t agree with them. If it’s true that facts DO work as long as you’re not focused on getting someone to change their mind on the root issue, then it’s REALLY important that we know that. It’s purely anecdotal, but I can note that this has been my experience with political debates. Even the most hardcore conservatives and liberals I know will make concessions if you clarify you know they won’t change their mind on their moral stance.

Calling BS Read-Along Week 7: Big Data

Welcome to the Calling Bullshit Read-Along based on the course of the same name from Carl Bergstrom and Jevin West at the University of Washington. Each week we’ll be talking about the readings and topics they laid out in their syllabus. If you missed my intro and want the full series index, click here or if you want to go back to Week 6 click here.

Well hello week 7! This week we’re taking a look at big data, and I have to say this is the week I’ve been waiting for. Back when I first took a look at the syllabus, this was the topic I realized I knew the least about, despite the fact that it is rapidly becoming one of the biggest issues in bullshit today. I was pretty excited to get in to this week’s readings, and I was not disappointed. I ended up walking away with a lot to think about, another book to read, and a decent amount to keep me up at night.

Ready? Let’s jump right in to it!

First, I suppose I should start with at least an attempt at defining “big data”. I like the phrase from the Wiki page here “Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time.” Forbes goes further and compiles 12 definitions here. If you come back from that rabbit hole, we can move in to the readings.

The first reading for the week is “Six Provocations for Big Data” by danah boyd and Kate Crawford. The paper starts off with a couple of good quotes (my favorite: ” Raw data is both an oxymoron and a bad idea; to the contrary, data should be cooked with care”) and a good vocab word/warning for the whole topic: apophenia, the tendency to see patterns where none exist. There’s a lot in this paper (including a discussion about what Big Data actually is), but the six provocations the title talks about are:

  1. Automating Research Changes the Definition of Knowledge Starting with the example of Henry Ford using the assembly line, boyd and Crawford question how radically Big Data’s availability will change what we consider knowledge. If you can track everyone’s actual behavior moment by moment, will we end up de-emphasizing the why of what we do or broader theories of development and behavior? If all we have is a (big data) hammer, will all human experience end up looking like a (big data) nail?
  2. Claims to Objectivity and Accuracy are Misleading I feel like this one barely needs to be elaborated on (and is true of most fields), but it also can’t be said often enough. Big Data can give the impression of accuracy due to sheer volume, but every researcher will have to make decisions about data sets that can introduce bias. Data cleaning, decisions to rely on certain sources, and decisions to generalize are all prone to bias and can skew results. An interesting example given was the original Friendster (Facebook before there was Facebook for the kids, the Betamax to Facebook’s VHS for the non-kids). The developers had read the research that people in real life have trouble maintaining social networks of over 150 people, so they capped the friend list at 150. Unfortunately for them, they didn’t realize that people wouldn’t use online networks the same way they used networks in real life. Perhaps unfortunately for the rest of us, Facebook did figure this out, and the rest is (short term) history.
  3. Bigger Data are Not Always Better Data Guys, there’s more to life than having a large data set. Using Twitter data as an example, they point out that large quantities of data can be just as biased (one person having multiple accounts, non-representative user groups) as small data sets, while giving some people false confidence in their results.
  4. Not all Data are Equivalent With echoes of the Friendster example from the second point, this point flips the script and points out that research done using online data doesn’t necessarily tell us how people interact in real life. Removing data from its context loses much of its meaning.
  5. Just Because it’s Accessible Doesn’t Make it Ethical The ethics of how we use social media isn’t limited to big data, but it definitely has raised a plethora of questions about consent and what it means for something to be “public”. Many people who would gladly post on Twitter might resent having those same Tweets used in research, and many have never considered the implications of their Tweets being used in this context. Sarcasm, drunk tweets, and tweets from minors could all be used to draw conclusions in a way that wouldn’t be okay otherwise.
  6. Limited Access to Big Data Creates New Digital Divides In addition to all the other potential problems with big data, the other issue is who owns and controls it. Data is only as good as your access to it, and of course nothing obligates companies who own it to share it, or share it fairly, or share it with people who might use it to question their practices. In assessing conclusions drawn from big data, it’s important to keep all of those issues in mind.

The general principles laid out here are a good framing for the next reading, the Parable of the Google Flu, an examination of why Google’s Flu Trends algorithm consistently overestimated influenza rates in comparison to CDC reporting. This algorithm was set up to predict influenza rates based on the frequency of various search terms in different regions, but over the 108 weeks examined it overestimated rates in 100 of them, sometimes by quite a bit. The paper contains a lot of interesting discussion about why this sort of analysis can err, but one of the most interesting factors was Google’s failure to account for Google itself. The algorithm was created/announced in 2009, and some updates were announced in 2013. Lazer et al point out that over that time period Google was constantly refining its search algorithm, yet the model appears to assume that all Google searches are done only in response to external events like getting the flu. Basically Google was attempting to change the way you search, while assuming that no one could ever change the way you search. They call this internal software tinkering “blue team” dynamics, and point out that it’s going to be hell on replication attempts. How do you study behavior across a system that is constantly trying to change behavior? Also considered are “red team” dynamics, where external parties try to “hack” the algorithm to produce results they want.

Finally we have an opinion piece from a name that seems oddly familiar, Jevin West, called “How to improve the use of metrics: learn from game theory“. It’s short, but got a literal LOL from me with the line “When scientists order elements by molecular weight, the elements do not respond by trying to sneak higher up the order. But when administrators order scientists by prestige, the scientists tend to be less passive.” West points out that when attempting to assess a system that can respond immediately to your assessment, you have to think carefully about what behavior your chosen metrics reward. For example, currently researchers are rewarded for publishing a large volume of papers. As a result, there is concern over the low quality of many papers, since researchers will split their findings in to the “least publishable unit” to maximize their output. If the incentives were changed to instead have researchers judged based on only their 5 best papers, one might expect the behavior to change as well. By starting with the behaviors you want to motivate in mind, you can (hopefully) create a system that encourages those behaviors.

In addition to those readings, there are two recommended readings that are worth noting. The first is Cathy O’Neil’s Weapons of Math Destruction (a book I’ve started but not finished), which goes in to quite a few examples of problematic algorithms and how they affect our lives. Many of O’Neil’s examples get back to point #6 from the first paper in ways most of us don’t consider. Companies maintaining control over their intellectual property seems reasonable, but what if you lose your job because your school system bought a teacher ranking algorithm that said you were bad? What’s your recourse? You may not even know why you got fired or what you can do to improve. What if the algorithm is using a characteristic that it’s illegal or unethical to consider? Here O’Neil points to sentencing algorithms that give harsher jail sentences to those with family members who have also committed a crime. Because the algorithm is supposedly “objective”, it gets away with introducing facts (your family members’ involvement in crimes you didn’t take part in) that a prosecutor would have trouble getting by a judge under ordinary circumstances. In addition, some algorithms can help shape the very future they say they are trying to predict. Why are Harvard/Yale/Stanford the best colleges in the US News rankings? Because everyone thinks they’re the best. Why do they think that? Look at the rankings!

Finally, the last paper is from Peter Lawrence with “The Mismeasurement of Science”. In it Lawrence lays out an impassioned case that the current structure around publishing causes scientists to spend too much time on the politics of publication and not enough on actual science. He also questions heavily who is rewarded by such a system, and if those are the right people. It reminded me of another book I’ve started but not finished yet, “Originals: How Non-Conformists Move the World”. In that book Adam Grant argues that if we use success metrics based on past successes, we will inherently miss those who might have a chance at succeeding in new ways. Nassim Nicholas Taleb makes a similar case in Antifragile, where he argues that some small percentage of scientific funding should go to “Black Swan” projects….the novel, crazy, controversial destined-to-fail type research that occasionally produces something world-changing.

Whew! A lot to think about this week and these readings did NOT disappoint. So what am I taking away from this week? A few things:

  1. Big data is here to stay, and with it come ethical and research questions that may require new ways of thinking about things.
  2. Even with brand new ways of thinking about things, it’s important to remember the old rules and that many of them still apply
  3. A million plus data points =/= scientific validity
  4. Measuring systems that can respond to being measured should be approached with some idea of what you’d like that response to be, along with some plans for change if you have unintended consequences
  5. It is increasingly important to scrutinize sources of data, and to remember what might be hiding in “black box” algorithms
  6. Relying too heavily on the past to measure the present can increase the chances you’ll miss the future.

That’s all for this week, see you next week for some publication bias!

Week 8 is up! Read it here.

I Got a Problem, Don’t Know What to do About It

Help and feedback request! This past weekend I encountered an interesting situation where I discovered that a study I had used to help make a point in several posts over the years has come under some scrutiny (full story at the bottom of the post). I have often blogged about meta-science, but this whole incident got me thinking about meta-blogging, and what the responsibility of someone like me is when you find out a study you’ve leaned on may not be as good as you thought it was. I’ve been poking around the internet for a few days, and I really can’t find much guidance on this.

I decided to put together a couple quick poll questions to gauge people’s feelings on this. Given that I tend to have some incredibly savvy readers, I would also love to hear more lengthy opinions either in the comments or sent to me directly.  The polls will stay open for a month, and I plan on doing a write up of the results. The goal of these poll questions is to assess a starting point for error correction, as I completely acknowledge the specifics of a situation may change people’s views. If you have strong feelings about what would make you take error correction more or less seriously, please leave it in the comments!

Why I’m asking (aka the full story)

This past weekend I encountered a rather interesting situation that I’m looking for some feedback on. I was writing my post for week 6 of the Calling BS read-along, and remembered an interesting study that found that  people were more likely to find stories with “science pictures” or graphs credible than those that were just text. It’s a study I had talked about in one of my Intro to Internet Science posts  and I have used it in presentations to back up my point that graphs are something you should watch closely. Since the topic of the post was data visualization and the study seemed relevant, I included it in the intro to my write up.

The post had only been up for a few hours when I got a message from someone tipping me off that the lab the study was from was under some scrutiny for some questionable data/research practices. They thought I might want to review the evidence and consider removing the reference to the study from my post. While the study I used doesn’t appear to be one of the ones being reviewed at the moment, I did find the allegations against the lab concerning. Since the post didn’t really change without the citation, I edited the post to remove the citation and replaced it with a note alerting people the paragraph had been modified. I put a full explanation at the bottom of the post that included the links to a summary of the issue and the research lab’s response.

I didn’t stop thinking about it though. There’s not much I could have done about using the study originally….I started citing it almost a full year before concerns were raised, and the “visuals influence perception” point seemed reasonable. I’ll admit I missed the story about the concerns with the research group, but even if I’d seen it I don’t know if I would have remembered that they were the ones who had done that study. Now that I know though, I’ve been mulling over what the best course of action is in situations like this. As someone who at least aspires to blog about truth and accuracy, I’ve always felt that I should watch my own blogging habits pretty carefully. I didn’t really question removing the reference, as I’ve always tried to update/modify things when people raise concerns. I also don’t modify posts after they’ve been published without noting that I’ve done so, other than fixing small typos. I feel good about what I did with that part.

What troubled me more was the question of “how far back do I go?” As I mentioned, I know I’ve cited that study previously. I know of at least one post where I used it, and there may be more. Given that my Intro to Internet Science series is occasionally assigned by high school teachers, I feel I have some obligation to go a little retro on this.

 

Current hypothesis (aka my gut reaction)

My gut reaction here is that I should probably start keeping an updates/corrections/times I was wrong page just to discuss these issues. While I think notations should be made in the posts themselves, some of them warrant their own discussion. If I’m going to blog about where others go wrong, having a dedicated place to discuss where I go wrong seems pretty fair. I also would likely put some links to my “from the archives” columns there, to have a repository for posts that have more updated versions. Not only would this give people somewhere easy to look for updates and give some transparency to my own process and weaknesses, but it would also probably give me a better overview of where I tend to get tripped up and help me improve. If I get really crazy I might even start doing root cause analysis investigations in to my own missteps. Thoughts on this or examples of others doing this would be appreciated.

 

Calling BS Read-Along Week 6: Data Visualization

Welcome to the Calling Bullshit Read-Along based on the course of the same name from Carl Bergstrom and Jevin West at the University of Washington. Each week we’ll be talking about the readings and topics they laid out in their syllabus. If you missed my intro and want the full series index, click here or if you want to go back to Week 5 click here.

Oh man oh man, we’re at the half way point of the class! Can you believe it? Yup, it’s Week 6, and this week we’re going to talk about data visualization. Data visualization is an interesting topic because good data with no visualization can be pretty inaccessible, but a misleading visualization can render good data totally irrelevant. Quite the conundrum. [Update: a sentence that was originally here has been removed. See bottom of the post for the original sentence and the explanation] It’s easy to think of graphics as “decorations” for the main story, but as we saw last week with the “age at death graph”, sometimes those decorations get far more views than the story itself.

Much like last week, there’s a lot of ground to cover here, so I’ve put together a few highlights:

Edward Tufte The first reading is the (unfortunately not publicly available) Visual Display of Quantitative Information by the godfather of all data viz Edward Tufte.  Since I actually own this book I went and took a look at the chapter, and was struck by how much of his criticism was really a complaint about the same sort of “unclarifiable unclarity” we discussed in Week 1 and 2. Bad charts can arise because of ignorance of course, but frequently they exist for the same reason verbal or written bullshit does. Sometimes people don’t care how they’re presenting data as long as it makes their point, and sometimes they don’t care how confusing it is as long as they look impressive. Visual bullshit, if you will. Anything from Tufte is always worth a read, and this book is no exception.

Next up are the “Tools and Tricks” readings which are (thankfully) quite publicly available. These cover a lot of good ground themselves, so I suggest you read them.

Misleading axes The first reading goes through the infamous but still-surprisingly-commonly-used case of the chopped y-axis. Bergstrom and West put forth a very straightforward rule that I’d encourage the FCC to make standard in broadcasting: bar charts should have a y-axis that starts at zero, line charts don’t have to. Their reasoning is simple: bar charts are designed to show magnitude, line charts are designed to show variation, therefore they should have different requirements. A chart designed to show magnitude needs to show the whole picture, whereas one designed to show variation can just show variation. There’s probably a bit of room to quibble about this in certain circumstances, but most of the time I’d let this bar chart be your guide:

They give several examples of charts, sometimes published or endorsed by fairly official sources, that screw this up, just to show us that no one’s immune. While the y-axis gets most of the attention, it’s worth noting the x-axis should be double checked too. After all, even the CDC has been known to screw that up. Also covered are the problems with multiple y-axes, which can give impressions about correlations that aren’t there or have been scaled-for-drama. Finally, they cover what happens when people invert axes and just confuse everybody.

Proportional Ink The next tool and trick reading comes with a focus on “proportional ink” and is similar to the “make sure your bar chart axis includes zero” rule the first reading covered. The proportional ink rule is taken from the Tufte book and it says: “The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented”. 

[Added for clarity: While Tufte’s rule here can refer to all sorts of design choices, the proportional ink rule hones in on just one aspect: the shaded area of the graph.] This rule is pretty handy because it gives some credence to the assertion made in the misleading axes case study: bar charts need to start at zero, line charts don’t. The idea is that since bar charts are filled in, not starting them at zero violates the proportional ink rule and is misleading visually. To show they are fair about this, the case study also asserts that if you fill in the space under a line graph you should be starting at zero. It’s all about the ink.
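For the code-inclined, here is a rough sketch of why a chopped axis breaks the proportional ink rule for bars. The function names and the numbers (50 vs. 55 with the axis starting at 45) are made up for illustration and assume positive values; this is just arithmetic on bar heights, not anything taken from the case study itself.

```python
def bar_ink_ratio(a, b, baseline=0.0):
    """Ratio of the DRAWN heights of two bars when the axis starts at `baseline`.

    Assumes positive values and a baseline below both of them.
    """
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)


def distortion(a, b, baseline=0.0):
    """How much the drawn ratio exaggerates the true ratio b / a.

    A result of 1.0 means the ink is proportional to the numbers.
    """
    return bar_ink_ratio(a, b, baseline) / (b / a)


# Hypothetical values: 55 is only 10% bigger than 50...
honest = bar_ink_ratio(50, 55)                # axis starts at zero
chopped = bar_ink_ratio(50, 55, baseline=45)  # axis starts at 45
print(f"zero-based axis: second bar is {honest:.2f}x as tall")
print(f"axis chopped at 45: second bar is {chopped:.2f}x as tall")
print(f"exaggeration factor: {distortion(50, 55, baseline=45):.2f}")
```

With a zero baseline the ink matches the numbers; once the axis is chopped, the filled-in bar overstates the difference. A line chart has no filled area, so the same calculation doesn't apply, which is the asymmetry the rule is getting at.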

Next, we dive in to the land of bubble charts, and then things get really murky. One interesting problem they highlight is that in this case following the proportional ink rule can actually lead to some visual confusion, as people are pretty terrible at comparing the sizes of circles. Additionally, there are two different ways to scale circles: area and radius. Area is probably the fairer one, but there’s no governing body enforcing one way or the other. Basically, if you see a graph using circles, make sure you read it carefully. This goes double for doughnut charts. New rule of thumb: if your average person can’t remember how to calculate the area of a shape, any graph made with said shape will probably be hard to interpret. Highly suspect shapes include:

  • Circles
  • Anything 3-D
  • Pie charts (yeah, circles with angles)
  • Anything that’s a little too clever

To that last point, they also cover some of the more dense infographics that have started popping up in recent years, and how carefully you must read what they are actually saying in order to judge them accurately. While I generally applaud designers who take on large data sets and try to make them accessible, sometimes the results are harder to wade through than a table might have been. My dislike for infographics is pretty well documented, so I feel compelled to remind everyone of this one from Think Brilliant:

Lots of good stuff here, and every high school math class would be better off if they taught a little bit more of this right from the start. Getting good numbers is one thing, but if they’re presented in a deceptive or difficult to interpret way, people can still be left with the wrong impression.
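Going back to the bubble chart point for a second, the area-versus-radius problem is easy to see with a little arithmetic. This is a minimal sketch with hypothetical function names and a made-up example (one value four times the other); it isn't how any particular charting tool sizes its circles.

```python
import math


def radius_scaled_by_area(value, ref_value, ref_radius=1.0):
    """Radius that makes the circle's AREA proportional to the value."""
    return ref_radius * math.sqrt(value / ref_value)


def radius_scaled_by_radius(value, ref_value, ref_radius=1.0):
    """Radius that is itself proportional to the value (the misleading option)."""
    return ref_radius * (value / ref_value)


def circle_area(radius):
    return math.pi * radius ** 2


# Hypothetical data: the second value is 4x the first.
ref, big = 1.0, 4.0
for name, scale in [("area", radius_scaled_by_area),
                    ("radius", radius_scaled_by_radius)]:
    ink_ratio = circle_area(scale(big, ref)) / circle_area(scale(ref, ref))
    print(f"scaled by {name}: bigger circle has {ink_ratio:.0f}x the ink")
```

Scaling by area keeps the ink proportional to the data, while scaling by radius squares the difference. Neither fixes the fact that people are bad at comparing circles in the first place.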

Three things I would add:

  1. Track down the source if possible One of the weird side effects of social media is that pictures are much easier to share now, and very easy to detach from their originators. As we saw last week with the “age at death” graph, sometimes graphs are created to accompany nuanced discussions and then the graph gets separated from the text and all context is lost. One of the first posts I ever had go somewhat viral had a graph in it, and man did that thing travel. At some point people stopped linking to my original article and started reporting that the graph was from published research. Argh! It was something I threw together in 20 minutes one morning! It even had axis/scale problems that I pointed out in the post and asked for more feedback! I gave people the links to the raw data! I’ve been kind of touchy about this ever since….and I DEFINITELY watermark all my graphs now. Anyway, my personal irritation aside, this happens to others as well. In my birthday post last year I linked to a post by Matt Stiles who had put together what he thought was a fun visual (now updated) of the most common birthdays. It went viral and quite a few people misinterpreted it, so he had to put up multiple amendments. The point is it’s a good idea to find the original post for any graph you find, as frequently the authors do try to give context to their choices and may provide other helpful information.
  2. Beware misleading non-graph pictures too I talk about this more in this post, but it’s worth noting that pictures that are there just to “help the narrative” can skew perception as well. For example, one study showed that news stories that carry headlines like “MAN MURDERS NEIGHBOR” while showing a picture of the victim cause people to feel less sympathy for the victim than headlines that say “LOCAL MAN MURDERED”. It seems subconsciously people match the picture to the headline, even if the text is clear that the picture isn’t of the murderer. My favorite example (and the one that the high school students I talk to always love) is when the news broke that only .2% of Tennessee welfare applicants tested under a mandatory drug testing program tested positive for drug use. Quite a few news outlets published stories talking about how low the positive rate was, and most of them illustrated the story with a picture of a urine sample or blood vial. The problem? The .2% positive rate came from a written drug test. The courts in Tennessee had ruled that taking blood or urine would violate the civil rights of welfare applicants, and since lawmakers wouldn’t repeal the law, they had to test them somehow. More on that here. I will guarantee you NO ONE walked away from those articles realizing what kind of drug testing was actually being referenced.
  3. A daily dose of bad charts is good for you Okay, I have no evidence for that statement, I just like looking at bad charts. Junk Charts by Kaiser Fung and the WTF VIZ tumblr and Twitter feed are pretty great.

Okay, that’s all for Week 6! We’re headed in to the home stretch now, hang in there kids.

Week 7 is up! Read it here.

Update from 4/10/17 3:30am ET (yeah, way too early): This post originally contained the following sentence in the first paragraph: “Anyway it’s an important issue to keep in mind since there’s evidence that suggests that merely seeing a graph next to text can make people perceive a story as more convincing and data as more definitive, so this is not a small problem.”  After I posted, it was pointed out to me that the study I linked to in that  sentence is from a lab whose research/data practices have recently come in for some serious questioning.  The study I mentioned doesn’t appear to be under fire at the moment, but the story is still developing and it seems like some extra skepticism for all of their results is warranted. I moved the explanation down here so as to not interrupt the flow of the post for those who just wanted a recap. The researcher under question (Brian Wansink) has issued a response here.

5 Things You Should Know About Statistical Process Control Charts

Once again I outdo myself with the clickbait-ish titles, huh? Sorry about that, I promise this is actually a REALLY interesting topic.

I was preparing a talk for a conference this week (today actually, provided I get this post up when I plan to), and I realized that statistical process control charts (or SPC charts for short) are one of the tools I use quite often at work but don’t really talk about here on the blog. Between those and my gif usage, I think you can safely guess why my reputation at work is a bit, uh, idiosyncratic. For those of you who have never heard of an SPC chart, here’s a quick orientation. First, they look like this:

(Image from qimacros.com, an excellent software for generating these)

The chart is used for plotting something over time….hours, days, weeks, quarters, years, or “order in line”…take your pick.  Then you map some ongoing process or variable you are interested in…..say employee sick calls. You measure employee sick calls in some way (# of calls or % of employees calling in) in each time period. This sets up a baseline average, along with “control limits”, which are basically 1, 2 and 3 standard deviation ranges. If at some point your rate/number/etc starts to go up or down, the SPC chart can tell you if the change is significant or not based on where it falls on the plot.  For example, if you have one point that falls outside the 3 standard deviation line, that’s significant. If two in a row fall outside the 2 standard deviation line, that’s significant as well. The rules for this vary by industry, and Wiki gives a pretty good overview here. At the end of this exercise you have a really nice graph of how you’re doing with a good visual of any unusual happenings, all with some statistical rigor behind it. What’s not to love?
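If you’d rather see the logic than the picture, here is a minimal sketch of the two rules mentioned above. The sick-call percentages are made up, the function names are my own, and I’m estimating the standard deviation straight from a baseline period; real SPC software often estimates it differently (from moving ranges, for example) and applies a longer list of rules, so treat this as an illustration only.

```python
from statistics import mean, stdev


def control_limits(baseline):
    """Center line and standard deviation estimated from a baseline period."""
    center = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has no variation, so limits are undefined")
    return center, sigma


def flag_signals(points, center, sigma):
    """Flag points using the two simplified rules described in the post.

    Rule 1: a single point more than 3 standard deviations from the center.
    Rule 2: two points in a row more than 2 standard deviations out,
            both on the same side of the center line.
    """
    signals = []
    prev_side = 0  # +1 / -1 if the previous point was beyond 2 sigma, else 0
    for i, x in enumerate(points):
        z = (x - center) / sigma
        if abs(z) > 3:
            signals.append((i, "beyond 3 sigma"))
        side = (1 if z > 0 else -1) if abs(z) > 2 else 0
        if side != 0 and side == prev_side:
            signals.append((i, "two in a row beyond 2 sigma"))
        prev_side = side
    return signals


# Hypothetical data: % of employees calling in sick each month.
baseline = [10, 12, 8, 11, 9, 10, 12, 8, 10, 10]
new_months = [11, 13, 13.5, 9, 15]

center, sigma = control_limits(baseline)
for index, reason in flag_signals(new_months, center, sigma):
    print(f"month {index}: {new_months[index]}% flagged ({reason})")
```

The baseline period sets the limits and the new months are judged against them. Limits recalculated from the same points you’re testing tend to hide the very shifts you’re looking for.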

Anyway, I think because they take a little bit of getting used to,  SPC charts do not always get the love they deserve. I would like to rectify this travesty, so here’s 5 things you should know about them to tempt you to go learn more about them:

  1. SPC charts are probably more useful for most businesses than hypothesis testing While most high school level statistics classes at least take a stab at explaining p-values and hypothesis testing to kids, almost none of them even show an example of a control chart. And why not? I think it’s a good case of academia favoring itself. If you want to test a new idea against an old idea or to compare two things at a fixed point in time, p-values and hypothesis testing are pretty good. That’s why they’re used in most academic research. However, if you want to see how things are going over time, you need statistical process control. Since this is more relevant for most businesses, people who are trying to keep track of any key metric should DEFINITELY know about these. Six Sigma and many process improvement classes teach statistical process control, but these charts still don’t seem widely used outside of those settings. Too bad. These graphs are practical, they can be updated easily, and they give you a way of monitoring what’s going on and a lot of good information about how your processes are going. Like what? Well, like #2 on this list:
  2. SPC charts track two types of variation Let’s get back to my sick call example. Let’s say that in any given month, 10% of your employees call in sick. Now most people realize that not every month will be exactly 10%. Some months it’s 8%, some months it’s 12%. What statistical process control charts help calculate is when those fluctuations are most likely just random (known as common cause variation) and the point at which they are probably not so random (special cause variation). It sets parameters that tell you when you should pay attention. They are better than p-values for this because you’re not really running an experiment every month….you just want to make sure everything’s progressing as it usually does. The other nice part is this translates easily in to a nice visual for people, so you can say with confidence “this is how it’s always been” or “something unusual is happening here” and have more than your gut to rely on.
  3. SPC charts help you test new things, or spot concerning trends quickly SPC charts were really invented for manufacturing plants, and were perfected and popularized in post-WWII Japan. One of the reasons for this is that they really loved having an early warning about when a machine might be breaking down or an employee might not be following the process. If the process goes above or below a certain red line (aka the “upper/lower control limit”) you have a lot of confidence something has gone wrong and can start investigating right away. In addition to this, you can see if a change you made helps anything. For example, if you do a handwashing education initiative, you can see what percentage of your employees call in sick the next month. If it’s below the lower control limit, you can say it was a success, just like with traditional p-values/hypothesis testing. HOWEVER, unlike p-values/hypothesis testing, SPC charts make allowances for time. Let’s say you drop the sick calls to 9% per month, but then they stay down for 7 months. Your SPC chart rules now tell you you’ve made a difference. SPC charts don’t just take in to account the magnitude of the change, but also the duration. Very useful for any metric you need to track on an ongoing basis.
  4. They encourage you not to fix what isn’t broken One of the interesting reasons SPC charts caught on so well in the manufacturing world is that the idea of “opportunity cost” was well established. If your assembly line puts out a faulty widget or two, it’s going to cost you a lot of money to shut the whole thing down. You don’t want to do that unless it’s REALLY broken. For our sick call example, it’s possible that what looks like an increase (say to 15% of your workforce) isn’t a big deal and that trying to interfere will cause more harm than good. Always good to remember that there are really two ways of being wrong: missing a problem that does exist, and trying to fix one that doesn’t.
  5. There are quite a few different types One of the extra nice things about SPC charts is that there are actually 6 types to choose from, depending on what kind of data you are working with. There’s a helpful flowchart to pick your type here, but a good computer program (I use QI macros) can actually pick for you. One of the best parts of this is that some of them can deal with small and varying sample sizes, so you can finally show that going from 20% to 25% isn’t really impressive if you just lowered your volume from 5 to 4.
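Two of the points in that list are easy to sketch in Python. The first function uses the standard p-chart formula for proportions, where the limits widen as the sample size shrinks. The second is a simple run rule for a sustained shift. The function names are mine, and the run length of 7 is an assumption taken from the sick call example in #3, since different rule sets use different lengths.

```python
from math import sqrt

def p_chart_limits(p_bar, n):
    """3-sigma limits for a proportion measured on a sample of size n."""
    half_width = 3 * sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)

def sustained_shift(values, center, run_length=7):
    """True if the most recent `run_length` points all fall below the center line."""
    recent = values[-run_length:]
    return len(recent) == run_length and all(v < center for v in recent)

# The 20% -> 25% example: 1 event out of 5, then 1 event out of 4
events, volumes = [1, 1], [5, 4]
p_bar = sum(events) / sum(volumes)
for e, n in zip(events, volumes):
    low, high = p_chart_limits(p_bar, n)
    print(f"{e}/{n} = {e / n:.0%}, limits for n={n}: {low:.0%} to {high:.0%}")

# Hypothetical sick-call rates: three ordinary months, then 9% for 7 months
rates = [0.11, 0.10, 0.12] + [0.09] * 7
print(sustained_shift(rates, center=0.10))
```

With volumes that small the limits cover most of the chart, so neither 20% nor 25% counts as a signal. The run rule is what lets a small but persistent drop register even though no single month crosses a control limit.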

So those are some of my reasons you should know about these magical little charts. I do wish they’d get used more often because they are a great way of visualizing how you’re doing on an ongoing basis.

If you want to know more about the math behind them and more uses (especially in healthcare), try this presentation. And wish me luck on my talk! Pitching this stuff right before lunch is going to be a challenge.

Calling BS Read-Along Week 5: Statistical Traps and Trickery

Welcome to the Calling Bullshit Read-Along based on the course of the same name from Carl Bergstrom and Jevin West at the University of Washington. Each week we’ll be talking about the readings and topics they laid out in their syllabus. If you missed my intro and want the full series index, click here or if you want to go back to Week 4 click here.

Well hi there! Welcome to week 5 of the Calling Bullshit Read-Along. An interesting program note before we get started: there is now a “suitable for high school students” version of the Calling Bullshit website here. Same content, less profanity.

This week we dive in to a topic that could be its own semester long class “Statistical Traps and Trickery“. There are obviously a lot of ways of playing around with numbers to make them say what you want, so there’s not just one topic for the week. The syllabus gives a fairly long list of tricks, and the readings hit some highlights and link to some cool visualizations. One at a time these are:

Simpson’s Paradox This is a bit counterintuitive, so this visualization of the phenomenon is one of the more helpful ones I’ve seen. Formally, Simpson’s paradox is when “the effect of the observed explanatory variable on the explained variable changes directions when you account for the lurking explanatory variable”. Put more simply, it is when the numbers look like there is bias in one direction, but when you control for another variable the bias goes in the other direction. The most common real life example of this is when UC Berkeley got sued for discriminating against women in grad school admissions, only to have the numbers show they actually slightly favored women. While it was true they admitted more men than women, when you controlled for individual departments a higher proportion of women were getting in to those programs. Basically a few departments with lots of female applicants were doing most of the rejecting, and their numbers were overshadowing the other departments. If you’re still confused, check out the visual, it’s much better than words.
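For those who prefer numbers to pictures, here is a small Python sketch of how the reversal happens. The two departments and every count in them are invented for illustration. These are not the actual Berkeley figures.

```python
# (applied, admitted) per group. All numbers are made up for illustration.
applications = {
    "Dept A (easy to get in to)": {"men": (80, 48), "women": (20, 14)},
    "Dept B (hard to get in to)": {"men": (20, 2), "women": (80, 12)},
}

def rate(applied, admitted):
    return admitted / applied

def overall_rate(group):
    """Admission rate for a group with all departments pooled together."""
    applied = sum(dept[group][0] for dept in applications.values())
    admitted = sum(dept[group][1] for dept in applications.values())
    return rate(applied, admitted)

for name, dept in applications.items():
    print(name, {group: f"{rate(*counts):.0%}" for group, counts in dept.items()})
print("Overall", {group: f"{overall_rate(group):.0%}" for group in ("men", "women")})
```

Women have the higher admission rate inside each department, but because most of them applied to the department that rejects nearly everyone, their pooled rate comes out lower.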

The Will Rogers Phenomenon I love a good pop culture reference in my statistics (see here and here), and thus have a soft spot for the Will Rogers Phenomenon. Based on the quote “When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states”, this classic paper points to an interesting issue raised by improvements in diagnostic technology. In trying to compare outcomes for cohorts of lung cancer patients from different decades, Feinstein realized that new imaging techniques were resulting in more patients being classified as having severe disease. While these patients were actually more severe than their initial classification, they were also less severe than their new classification. In other words, the worst grade 1 patients were now the best grade 3 patients, making it look like survival rates were improving for both the grade 1 group (who lost their highest risk patients) and the grade 3 group (who gained less severe patients). Unfortunately for all of us, none of this represented a real change in treatment, it was just numerical reshuffling.
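The reshuffling is easy to reproduce with a few made-up numbers. In this Python sketch the survival times (in months) are invented, and the only thing that changes between "before" and "after" is which label two patients carry.

```python
from statistics import mean

# Hypothetical survival times in months (made up for illustration)
grade_1 = [60, 55, 50, 20, 18]
grade_3 = [10, 8, 6, 4]

# Better imaging moves the two sickest grade 1 patients in to grade 3
migrated = [20, 18]
grade_1_after = [m for m in grade_1 if m not in migrated]
grade_3_after = grade_3 + migrated

print("Grade 1 average:", mean(grade_1), "->", mean(grade_1_after))
print("Grade 3 average:", mean(grade_3), "->", mean(grade_3_after))
print("Everyone:", mean(grade_1 + grade_3), "->", mean(grade_1_after + grade_3_after))
```

Both group averages go up while the average for everyone combined does not move at all, because nobody actually lived a day longer.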

Lead time bias Also mentioned in the reading above, this is the phenomenon of “improving” survival rates simply by catching diseases earlier. For example, let’s say you were unfortunate enough to get a disease that would absolutely kill you 10 years from the day you got it. If you get diagnosed 8 years in, it looks like you survived for 2 years. If everyone panics about it and starts testing everyone for this disease, they might start catching it earlier. If improved testing now means the disease is caught at the 4 year mark instead of the 8 year mark, it will appear survival has improved by 4 years. In some cases though, this doesn’t represent a real increase in the length of survival, just an increase in the length of time you knew about it.
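The arithmetic from that example, written out as a tiny Python sketch (the function name is just for illustration):

```python
def survival_after_diagnosis(years_until_death, diagnosed_at_year):
    """Years a patient appears to survive, counted from the diagnosis."""
    return years_until_death - diagnosed_at_year

late = survival_after_diagnosis(10, 8)   # diagnosed 8 years in
early = survival_after_diagnosis(10, 4)  # diagnosed 4 years in
print(late, early, early - late)
```

The years until death never change between the two calls. Only the starting point of the clock does.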

Case Study: Musicians and mortality This case study combines a few interesting issues, and examines a graph of musician “average age at death” which went viral.

As the case study covers, there are a few issues with this graph, most notably that it right-censors the data. Basically, musicians in newer genres die young because they still are young. While you can find Blues artists in their 80s, there are no octogenarian rappers. Without context though, this graph is fairly hard to interpret correctly. Most notably quite a few people (including the Washington Post) confused “average age at death” with “life expectancy”, which both appear on the graph but are very different things when you’re discussing a cohort that is still mostly alive. While reviewing what went wrong in this graph is interesting, the best part of this case study comes at the end where the author of the original study steps in to defend herself. She points out that she herself is the victim of a bit of a bullshit two step. In her paper and the original article, she included all the proper caveats and documented all the shortcomings of her data analysis, only to have the image go viral without any of them. At that point people stopped looking at the source and misreported things, and she rightly objects to being blamed for that. This reminds me of something someone sent me a few years ago:

Case Study: On Track Stars Cohort Effects and Not Getting Cocky In this case study, Bergstrom quite politely takes aim at one of his own graphs, and points out a time he missed a caveat for some data. He had created a graph that showed how physical performance for world record holders declines with age:

He was aware of two possible issues in the data: 1) that it represents only the world records, not how individuals vary and 2) that it only showed elite athletes. What a student pointed out to him is that there was probably a lot of sample size variation in here too.  The cohort going for the record in the 95-100 year old age group is not the same size as the cohort going for the record in the 25-30 year old age group. It’s not an overly dramatic oversight, but it does show how data issues can slip in without you even realizing it.

Well those are all the readings for the week, but there were a few other things mentioned in the list of stats tricks that I figured I’d point to my own writings on:

Base Rate Fallacy: A small percentage of a large number is often larger than a large percentage of a small number. I wrote about this in “All About that Base Rate“.

Means vs Medians: It truly surprises me how often I have to point out to people that an average might be lying to them.
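A quick Python illustration with made-up incomes, where a single large value drags the mean far away from what a typical person in the group earns:

```python
from statistics import mean, median

# Hypothetical incomes: four ordinary earners and one millionaire
incomes = [30_000, 35_000, 40_000, 45_000, 1_000_000]

print("Mean:  ", mean(incomes))
print("Median:", median(incomes))
```

Nobody in this group earns anything close to the mean, which is why the median is usually the better summary when the data are skewed.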

Of course the godfather of all of this is How to Lie With Statistics, which should be recommended reading for every high school student in the country.

While clearly I could go on and on about this, I will stop here. See you next week when we dive in to visualizations!

Week 6 is up, read it here!

Blood Sugar Model Magik?

An interesting new-to-me study came on my radar this week “Personalized Nutrition by Prediction of Glycemic Responses” published by Zeevi et al in 2015. Now, if you’ve ever had the unfortunate experience of talking about food with me in real life, you probably know I am big on  quantifying things and particularly obsessed with blood sugar numbers. The blood sugar numbers thing started when I was pregnant with my son and got gestational diabetes. 4 months of sticking yourself with a needle a couple of times a day will do that to a person.

Given that a diagnosis of gestational diabetes is correlated with a much higher risk of an eventual Type 2 diabetes diagnosis, I’ve been pretty interested in what affects blood sugar numbers. One of those things is the post-prandial glucose response (PPGR) or basically how high your blood sugar numbers go after you eat a meal. Unsurprisingly, chronically high numbers after meals tend to correlate with overall elevated blood sugar and diabetes risk. To try and help people manage this response the glycemic index was created, which attempted to measure an “average” glucose response to particular foods. This sounds pretty good, but the effects of using this as a basis for food choices in non-diabetics have been kind of mixed. While it appears that eating all high glycemic index foods (aka refined carbs) is bad, it’s not clear that parsing things out further is very helpful.

There are a lot of theories about why glycemic index may not work that well: measurement issues (it measures an area under a curve without taking in to account the height of the spike), the quantities of food eaten (watermelon has a high glycemic index, but it’s hard to eat too much of it calorie-wise), or the effects of mixing foods with each other (the values were determined by having people eat just one food at a time). Zeevi et al had yet another theory: maybe the problem was taking the “average” response. Given that averages can often hide important information about the population they’re describing, they wondered if individual variability was mucking about with the accuracy of the numbers.
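The first measurement issue in that list is easy to show with a small Python sketch. The two glucose curves below are invented, and the trapezoid-rule area is a simplified stand-in for how a glycemic index value actually gets calculated, but it shows how two very different responses can produce the same area.

```python
def area_under_curve(minutes, rise):
    """Trapezoid-rule area under a curve of glucose rise above baseline."""
    area = 0.0
    for i in range(1, len(minutes)):
        width = minutes[i] - minutes[i - 1]
        area += width * (rise[i] + rise[i - 1]) / 2
    return area

# Hypothetical glucose rise above baseline (mg/dL) at each time point
minutes = [0, 30, 60, 90, 120]
sharp_spike = [0, 60, 20, 0, 0]
slow_rise = [0, 30, 30, 20, 0]

for name, curve in (("sharp spike", sharp_spike), ("slow rise", slow_rise)):
    print(name, "area:", area_under_curve(minutes, curve), "peak:", max(curve))
```

The areas match exactly even though one peak is twice as high as the other, so an area-only measure can't tell these two responses apart.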

To test this theory, they recruited 800 people, got a bunch of information about them, and hooked them up to a continuous glucose monitor and had them log what they ate. They discovered that while some foods caused a similar reaction in everyone (white bread for example), some foods actually produced really different responses (pizza or bananas for example). They then used factors like BMI, activity level, gut microbiome data to build a model that they hoped would predict who would react to what food.

To give this study some real teeth, they then took the model they built and applied it to 100 new study participants. This is really good because it means they tested if they overfit their model….i.e. tailored it too closely to the original group to get an exaggerated correlation number. They showed that their model worked just as well on the new group as the old group (r=.68 vs r=.70). To take it a step further, they recruited 26 more people, got their data, then fed them a diet predicted to be either “good” or “bad” for them. They found overall that eating the “good” diet helped keep blood sugar in check as compared to just regular carbohydrate counting.

The Atlantic did a nice write up of the study here, but a few interesting/amusing things I wanted to note:

  1. Compliance was high Nutrition research has been plagued by self reporting bias and low compliance to various diets, but apparently that wasn’t a problem in this study. The researchers found that by emphasizing to people what the immediate benefit to them would be (a personalized list of “good” and “bad” foods), people got extremely motivated to be honest. Not sure how this could be used in other studies, but it was interesting.
  2. They were actually able to double blind the study Another chronic issue with nutrition research is the inability to blind people to what they’re eating. However, since people didn’t know what their “good” foods were, it actually was possible to do some of that for this study. For example, some people were shocked to find that their “good” diet had included ice cream or chocolate.
  3. Carbohydrates  and fat content were correlated with PPGR, but not at the same level for everyone At least for glucose issues, it turns out the role of macronutrients was more pronounced in some people than others. This has some interesting implications for broad nutrition recommendations.
  4. Further research confirmed the issues with glycemic index  In the Atlantic article, some glycemic index proponents were cranky because this study only compared itself to carb counting, not the glycemic index. Last year some Tufts researchers decided to focus just on the glycemic index response and found that inter-person variability was high enough that they didn’t recommend using it.
  5. The long term effects remain to be seen It’s good to note that the nutritional intervention portion of this study was just one week, so it’s not yet clear if this information will be helpful in the long run. On the one hand, it seems like personalized information could be really helpful to people…it’s probably easier to avoid cookies if you know you can still have ice cream. On the other hand, we don’t yet know how stable these numbers are. If you cut out cookies entirely but keep ice cream in your diet, will your body react to it the same way in two years?

That last question, along with “how does this work in the real world” is where the researchers are going next. They want to see if people getting personalized information are less likely to develop diabetes over the long term. I can really see this going either way. Will people get bored and revert to old eating patterns? Will they overdo it on foods they believe are “safe”? Or will finding out you can allow some junk food increase compliance and avoid disease? As you can imagine, they are having no trouble recruiting people. 4,000 people (in Israel) are already on their waiting list, begging to sign up for future studies. I’m sure we’ll hear more about this in the years to come.

Personally, I’m fascinated by the whole concept. I read about this study in Robb Wolf’s new book “Wired to Eat“, in which he proposes a way people can test their own tolerance for various carbohydrates at home. Essentially you follow a low to moderate carbohydrate paleo (no dairy, no legumes, no grain) plan for 30 days, then test your blood glucose response to a single source of carbohydrates every day for 7 days. I plan on doing this and will probably post the results here. Not sure what I’ll do with the results, but like I said, I’m a sucker for data experiments like this.

Calling BS Read-Along Week 4: Causality

Welcome to the Calling Bullshit Read-Along based on the course of the same name from Carl Bergstrom and Jevin West at the University of Washington. Each week we’ll be talking about the readings and topics they laid out in their syllabus. If you missed my intro and want the full series index, click here or if you want to go back to Week 3 click here.

Well hello week 4! We’re a third of the way through the class, and this week we’re getting a crash course in correlation/causation confusion, starting with this adapted comic:

Man, am I glad we’re taking a look at this. Correlating variables is one of the most common statistical techniques there is, but it is also one of the most commonly confused. Any time two variables are correlated, there are actually quite a few possible explanations such as:

  1. Thing A caused Thing B (causality)
  2. Thing B caused Thing A (reversed causality)
  3. Thing A causes Thing B which then makes Thing A worse (bidirectional causality)
  4. Thing A causes Thing X causes Thing Y which ends up causing Thing B (indirect causality)
  5. Some other Thing C is causing both A and B (common cause)
  6. It’s due to chance (spurious or coincidental)

You can find examples of each here, but the highlight is definitely the Spurious Correlations website.  Subjects include the theory that Nicolas Cage movies cause drownings and why you don’t want to eat margarine in Maine.
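Explanation #5 (common cause) is the one I find easiest to see in a simulation. In this Python sketch everything is made up: Thing C drives both A and B, A and B never touch each other, and the random seed is fixed so the run is repeatable.

```python
import random
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(0)  # fixed seed so the example is repeatable
n = 2000
c = [rng.gauss(0, 1) for _ in range(n)]  # the common cause
a = [ci + rng.gauss(0, 1) for ci in c]   # A depends only on C plus noise
b = [ci + rng.gauss(0, 1) for ci in c]   # B depends only on C plus noise

r_ab = pearson_r(a, b)
# Subtract C's contribution from each and the relationship should vanish
r_after_removing_c = pearson_r(
    [ai - ci for ai, ci in zip(a, c)],
    [bi - ci for bi, ci in zip(b, c)],
)
print(round(r_ab, 2), round(r_after_removing_c, 2))
```

A and B come out clearly correlated, and the correlation drops to roughly zero once C is accounted for. Of course, in real data you rarely get to know what C is, which is the whole problem.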

With that framing, the first reading is an interesting anecdote that highlights both correlation/causation confusion AND why sometimes it’s the uncorrelated variables that matter. In Milton Friedman’s thermostat analogy, Friedman ponders what would happen if you tried to analyze the relationship between indoor temperature, outdoor temperature and energy usage in a home. He points out that indoor temperature would be correlated with neither variable, as the whole point is to keep that constant. If you weren’t familiar with the system, you could conclude that using energy caused a drop in temperatures, and that the best way to stay warm would be to turn off the furnace. A good anecdote to keep in mind as it illustrates quite a few issues all at once.
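Here is a toy Python version of the thermostat, under some loud assumptions of my own: the furnace burns energy in direct proportion to the gap between the target temperature and the outdoor temperature, and the indoor temperature sits at the target with a small random wobble. None of these numbers come from Friedman.

```python
import random
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(1)  # fixed seed so the example is repeatable
target = 20.0           # thermostat setting in degrees C (hypothetical)

outdoor = [rng.uniform(-10, 15) for _ in range(1000)]
energy = [2.0 * (target - t) for t in outdoor]          # colder day, more fuel
indoor = [target + rng.gauss(0, 0.2) for _ in outdoor]  # held near the target

r_indoor_energy = pearson_r(indoor, energy)
r_indoor_outdoor = pearson_r(indoor, outdoor)
r_outdoor_energy = pearson_r(outdoor, energy)
print(round(r_indoor_energy, 2), round(r_indoor_outdoor, 2), round(r_outdoor_energy, 2))
```

Indoor temperature shows essentially no correlation with either variable, even though the furnace is the only reason it stays put. Meanwhile energy use and outdoor temperature are almost perfectly negatively correlated, which is exactly the pattern that could fool you in to thinking the furnace makes it cold.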

Next up is the awesomely named paper “Storks Deliver Babies (p = 0.008)“. In it, Robert Matthews takes the birth rates in 17 European countries and correlates them with the approximate number of storks in each country and finds a correlation coefficient of .62. As the title of the paper suggests, this correlation is statistically significant. The author uses this to show the weaknesses of some traditional statistical analyses, and how easy it is to get ridiculous results that sound impressive.
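If you want to see where a p-value like that comes from, here is a Python sketch that starts from just the two numbers above (r = .62 and 17 countries). I'm assuming the usual t-test for a correlation coefficient, and I'm integrating the t-distribution numerically with Simpson's rule to avoid needing anything outside the standard library. The function names are mine.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(r, n, steps=10_000):
    """Two-sided p-value for a correlation r observed on n pairs."""
    df = n - 2
    t_stat = abs(r) * sqrt(df / (1 - r * r))
    # Simpson's rule for the area between 0 and t_stat (steps must be even)
    h = t_stat / steps
    total = t_pdf(0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    return 2 * (0.5 - central_area)

p_value = two_sided_p(0.62, 17)
print(round(p_value, 3))
```

The result should land close to the p = 0.008 in the paper's title, which is rather the point: the math works perfectly well on storks.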

Misleading statistics is also the subject of the Traffic Improvements case study, where a Seattle news station complained that a public works project cost $74 million but only made the average commute 2 seconds faster, leading to the conclusion that the spending was not correlated with any improvements. When you dig a bit deeper though, you discover that the volume the highway could accommodate rose by 30,000 cars/day. If you take cars/day as a variable, the spending was correlated with an improvement. This is a bit like the Milton Friedman thermostat example: just because a variable stays constant doesn’t mean it’s not involved. You have to look at the whole system.

Speaking of the whole system, I was interested to note that part way through the case study the Calling BS overlords cited Boston’s own Big Dig and mention that “Boston traffic got better”. As a daily commuter in to Boston, I would like to mention that looking at the whole system here also gives a slightly more complicated picture. While it is true that the Big Dig allowed more cars to move through the city underground, a Boston Globe report noted that this only helped traffic along the route that got worked on. Traffic elsewhere in the city (like say, the area I commute to) got much worse during this time frame, and Boston never lost its ranking as one of the most congested cities. Additionally, while the improvements made it possible to handle more cars on the road, the cost overrun severely hampered the city’s ability to build or maintain its public transportation. Essentially by overspending on getting more cars through, the Big Dig made it necessary for more people to drive. Depending on which metric you pick, the Big Dig is both correlated with success AND failure…plus a tax bill I’m still chipping in money towards on TOP of what I pay for subpar commuter rail service. Not that I’m bitter or anything.

One interesting issue to note here is that sometimes even if journalists do a good job reporting on the nuances of correlation/causation, editors or headline writers can decide to muddy the issue. For example, Slate Star Codex did a great piece on how 4 different news outlets wrote a headline on the same study: 

Unless you were pretty on top of things, I don’t think most people would even recognize those middle two headlines were about the same study as the first and fourth. The Washington Post had to take a headline down after they had declared that if women wanted to stop violence against them they should get married. The new improved headline is below, but the original is circled in red:

It’s easy to think of headlines as innocuous if the text is good, but subtle shifts in headlines do color our perception of anything that comes after it. (I’ve collected most of my links on this research here)

Alright, back to the readings.

Our last piece goes back to 1897 and is written by Mr Correlation Coefficient himself: Karl Pearson. The math to work out the correlation coefficients had no sooner been done than Pearson started noticing people were misusing it. He was particularly concerned about people attributing biological causality to things that actually came from a common cause. Glad to see we’ve moved beyond that. Interestingly, history tells us that in Pearson’s day this was the fault of statisticians who used different methods to get correlations they wanted. After Pearson helped make correlations more rigorous, the problem flipped to people over-attributing meaning to correlations they generated. In other words, 100 years ago people put in correlations that didn’t belong, now they fail to take them out.

Okay, that’s it for this week! We’ll see you back here next week for Statistical traps and trickery.

Week 5 is up! Read it here.

 

Number Blindness

“When the facts are on your side, pound the facts. When the law is on your side, pound the law. When neither is on your side, pound the table.” – old legal adage of unclear origin

Recently I’ve been finding it rather hard to go on Facebook. It seems like every time I log in, someone I know has chosen that moment to start a political debate that is going poorly. It’s not that I mind politics or have a problem with strong political opinions, but what bugs me is how often suspect numbers are getting thrown out to support various points of view. Knowing that I’m a “numbers person”, I have occasionally had people reach out asking me to either support or refute whatever number it is that is being used, or use one of my posts to support/refute what is being said. While some of these requests are perfectly reasonable requests for explanations, I’ve gotten a few recently that were rather targeted “Come up with a reason why my opponent is wrong” type things, with a heavy tone of “if my opponent endorses these numbers, they simply cannot be correct”. This of course put me in a very meta mood, and got me thinking about how we argue about numbers. As a result, I decided to coin a new term for a logical fallacy I was seeing: Number Blindness.

Number Blindness: The phenomenon of becoming so consumed by an issue that you cease to see numbers as independent entities and view them only as props whose rightness or wrongness is determined solely by how well they fit your argument

Now I want to make one thing very clear up front: the phenomenon I’m talking about is not simply criticizing or doubting numbers or statistics. A tremendous amount of my blogging time is spent writing about why you actually should doubt many of the numbers that are flashed before your eyes. Criticism of numbers is a thing I fully support, no matter whose “side” you’re on.

I am also not referring to people who say that numbers are “irrelevant” to the particular discussion or say that I missed the point. I actually like it when people say that, because it clears the way to have a purely moral/intellectual/philosophical discussion. If you don’t really need numbers for a particular discussion, go ahead and leave them out of it.

The phenomenon I’m talking about is actually when people want to involve numbers in order to bolster their argument, but take any discussion of those numbers as offensive to their main point. It’s a terrible bait and switch and it degrades the integrity of facts. If the numbers you’re talking about were important enough to be included in your argument, then they are important enough to be held up for debates about their accuracy. If you’re pounding the table, at least be willing to admit that’s what you’re doing.

Now of course all of this got inspired by some particular issues, but I want to be very clear: everyone does this. We all want to believe that every objective fact points in the direction of the conclusion that we want. While most people are acutely aware of this tendency in whichever political party they disagree with, it is much harder to see it in yourself or in your friends. Writing on the internet has taught me to think carefully about how I handle criticism, but it’s also taught me a lot about how to handle praise. Just like there are many people who only criticize you because you are disagreeing with them, there are an equal number who only praise you because you’re saying something they want to hear. I’ve written before about the idea of “motivated numeracy” (here and for the Sojourners blog here), but studies do show that ability to do math rises and falls depending on how much you like the conclusions that math provides….and that phenomenon gets worse the more intelligent you are. As I said in my piece for Sojourners “Your intellectual capacity does NOT make you less likely to make an error — it simply makes you more likely to be a hypocrite about your errors.”

Now in the interest of full disclosure, I should admit that I know number blindness so well in part because I still fall prey to it. It creeps up every time I get worked up about a political or social issue I really care about, and it can slip out before I even have a chance to think through what I’m saying. One of the biggest benefits of doing the type of blogging I do is that almost no one lets me get away with it, but the impulse still lurks around. Part of why I make up these fallacies is to remind myself that guarding against bias and selective interpretation requires constant vigilance.

Good luck out there!

Calling BS Read-Along Week 3: The Natural Ecology of BS

Welcome to the Calling Bullshit Read-Along based on the course of the same name from Carl Bergstrom and Jevin West at the University of Washington. Each week we’ll be talking about the readings and topics they laid out in their syllabus. If you missed my intro and want the full series index, click here or if you want to go back to Week 2 click here.

Well hi there! It’s week 3 of the read-along, and this week we’re diving in to the natural ecology of bullshit. Sounds messy, but hopefully by the end you’ll have a better handle on where bullshit is likely to flourish.

So what exactly is the ecology of bullshit and why is it important? Well, I think it helps to think of bullshit as a two step process. First, bullshit gets created. We set the stage for this in week one when we discussed the use of bullshit as a tool to make yourself sound more impressive or more passionate about something. However, the ecology of bullshit is really about the second step: sharing, spreading and enabling the bullshit. Like rumors in middle school, bullshit dies on the vine if nobody actually repeats it. There’s a consumer aspect to all of this, and that’s what we’re going to cover now. The readings this week cover three different-but-related conditions that allow for the growth of bullshit: pseudo-intellectual climates, pseudo-profound climates, and social media. Just like we talked about in week one, it is pretty easy to see when the unintelligent are propagating bullshit, but it is a little more uncomfortable to realize how often the more intelligent among us are responsible for their own breed of “upscale bullshit”.

And where do you start if you have to talk about upscale bullshit? By having a little talk about TED. The first reading is a Guardian article that gets very meta by featuring a TED talk about how damaging the TED talk model can be. Essentially the author argues that we should be very nervous when we start to judge the value of information by how much it entertains us, how much fun we have listening to it, or how smart we feel by the end of it. None of those things are bad in and of themselves, but they can potentially crowd out things like truth or usefulness. While making information more freely available and thinking about how to communicate it to a popular audience is an incredibly valuable skill, leaving people with the impression that un-entertaining science is less valuable or truthful is a slippery slope.1

Want a good example of the triumph of entertainment over good information? With almost 40 million views, Amy Cuddy’s Wonder Woman/power pose talk is the second most watched TED talk of all time. Unfortunately, the whole thing is largely based on a study that has (so far) failed to replicate. The TED website makes no note of this [Update: After one of the original co-authors publicly stated they no longer supported the study in Oct 2016, TED added the following note to the summary of the talk: “Note: Some of the findings presented in this talk have been referenced in an ongoing debate among social scientists about robustness and reproducibility. Read Amy Cuddy’s response under ‘Learn more’ below.”], and even the New York Times and Time magazine fail to mention the failed replication when the talk comes up. Now to be fair, Cuddy’s talk wasn’t bullshit when she gave it, and it may not even be bullshit now. She really did do a study (with 21 participants) that found that power posing worked. The replication attempt that failed to find an effect (with 100 participants) came a few years later, and by then it was too late: power posing had already entered the cultural imagination. The point is not that Cuddy herself should be undermined, but that we should be really worried about taking a nice presentation as the final word on a topic before anyone’s even seen if the results hold up.

The danger here of course is that people/things that are viewed as “smart” can have a much farther reach than less intellectual outlets. Very few people would repeat a study they saw in a tabloid, but if the New York Times quotes a study approvingly most people are going to assume it is true. When smart people get things wrong, the reach can be much larger. One of the more interesting examples of the “how a smart person gets things wrong” vs “how everyone else gets things wrong” phenomenon I’ve ever seen is from the 1987 documentary “A Private Universe”. In the opening scene Harvard graduates are interviewed at their commencement ceremony and asked a simple question quite relevant to anyone in Boston: why does it get colder in the winter? 21 out of 23 of them get it wrong (hint: it isn’t the earth’s orbit)... but they sound pretty convincing in their wrongness. The documentary then interviews 9th graders, who are clearly pretty nervous and stumble through their answers. About the same proportion get the question wrong as the Harvard grads, but they are so clearly unsure of themselves that you wouldn’t walk away convinced. The Harvard grads weren’t more correct, just more convincing.

Continuing with the theme of “not correct, but sounds convincing”, our next reading is the delightfully named “On the reception and detection of pseudo-profound bullshit” from Gordon Pennycook. Pennycook takes over where Frankfurt’s “On Bullshit” left off and actually attempts to empirically study our tendency to fall for bullshit. His particular focus is what others have called “obscurantism”, defined as “[when] the speaker… [sets] up a game of verbal smoke and mirrors to suggest depth and insight where none exists”... or, as commenter William Newman said in response to my last post, “adding zero dollars to your intellectual bank”. Pennycook proposes two possible reasons we fall for this type of bullshit:

  1. We generally like to believe things rather than disbelieve them (incorrect acceptance)
  2. Purposefully vague statements make it hard for us to detect bullshit (incorrect failure to reject)

It’s a subtle difference, but any person familiar with statistics at all will immediately recognize this as a pretty classic hypothesis test. In real life, these are not mutually exclusive. The study itself took phrases from two websites I just found out existed and am now totally amused by (Wisdom of Chopra and the New Age Bullshit Generator), and asked college students to rate how profound the (buzzword filled but utterly meaningless) sentences were.[2] Based on the scores, the researchers assigned each participant a score on a “bullshit receptivity scale”, or BSR. They then went through a series of 4 studies that related bullshit receptivity to various other cognitive features. Unsurprisingly, they found that bullshit receptivity was correlated with other potentially suspect beliefs (like belief in paranormal activity), leading them to believe that some people have the classic “mind so open their brain falls out”. They also showed that those with good bullshit detection (i.e. those who could rate legitimate motivational quotes as profound while also rating nonsense statements as nonsense) scored higher on analytical thinking skills. This may seem like a bit of a “well obviously” moment, but it does suggest that there’s a real basis to Sagan’s assertion that you can develop a mental toolbox to detect baloney. It also was a good attempt at separating out those who really could detect bullshit from those who simply managed to avoid it by saying nothing was profound. Like with pseudo-intellectualism, the study authors hypothesized that some people are particularly driven to find meaning in everything, so they start finding it in places where it doesn’t exist.
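To make the difference between “receptive to bullshit” and “good at detecting bullshit” a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the 1 to 5 profundity scale, the function names, the three made-up participants, and the choice to score detection as the mean rating for real quotes minus the mean rating for nonsense. It is not the paper’s actual scoring code or data.

```python
from statistics import mean

# Hypothetical profundity ratings on an assumed 1-5 scale (1 = not profound).
# These participants and numbers are invented purely for illustration.
participants = {
    "believer": {"nonsense": [4, 5, 4, 5], "quotes": [5, 4, 5, 4]},
    "skeptic":  {"nonsense": [1, 2, 1, 2], "quotes": [1, 2, 2, 1]},
    "detector": {"nonsense": [1, 2, 1, 2], "quotes": [4, 5, 4, 5]},
}

def receptivity(ratings):
    """Assumed receptivity score: mean profundity given to nonsense statements."""
    return mean(ratings["nonsense"])

def detection(ratings):
    """Assumed detection score: how much more profound real quotes are rated
    than nonsense. Near zero means the two are not being told apart."""
    return mean(ratings["quotes"]) - mean(ratings["nonsense"])

for name, ratings in participants.items():
    print(f"{name}: receptivity={receptivity(ratings):.1f}, "
          f"detection={detection(ratings):.1f}")
```

In this toy data the skeptic and the detector get the same low receptivity score, but only the detector rates real quotes above nonsense. That gap is what lets you tell someone who actually spots bullshit apart from someone who just calls everything unprofound.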

Last but not least, we get to the mother of all bullshit spreaders: social media. While it is obvious social media didn’t create bullshit, it is undeniably an amazing bullshit delivery system. The last paper, “Rumor Cascades”, attempts to quantify this phenomenon by studying how rumors spread on Facebook. Despite the simple title, this paper is absolutely chock full of interesting information about how rumors get spread and shared on social media, and the role of debunking in slowing the spread of false information. To track this, the authors took rumors found on Snopes.com and used the Snopes links to track the spread of their associated rumors through Facebook. Along the way they pulled the number of times the rumor was shared, time stamps to see how quickly things were shared (answer: most sharing is done within 6 hours of a post going up), and whether responding to a false rumor by linking to a debunking made a difference (answer: yes, if the mistake was embarrassing and the debunking went up quickly). I found this graph particularly interesting, as it showed that quickly linking to Snopes (they called it being “snoped”) was actually pretty effective in getting the post taken down:

[Figure: Snopetoreshare.png]

In terms of getting people to delete their posts, the most successful debunking links were things like “those ‘photos of Trayvon Martin the media doesn’t want you to see’ are not actually of Trayvon Martin”. They also found that while more false rumors are shared, true rumors spread more widely. Not a definitive paper by any means, but a fascinating initial look at the new landscape. Love it or hate it, social media is not going away any time soon, and the more we understand about how it is used to spread information, the better prepared we can be.[3]

Okay, so what am I taking away from this week?

  1. If bullshit falls in the forest and no one hears it, does it make a sound? In order to fully understand bullshit, you have to understand how it travels. Bullshit that no one repeats does minimal damage.
  2. Bullshit can grow in different but frequently overlapping ecosystems. Infotainment, the pseudo-profound, and close social networks all can spread bullshit quickly.
  3. Analytical thinking skills and debunking do make a difference. The effect is not as overwhelming as you’d hope, but every little bit helps.

I think separating out how bullshit grows and spreads from bullshit itself is a really valuable concept. In classic epidemiology, disease causation is modeled using the “epidemiologic triad”, which looks like this (source):

[Figure: epidemiologic triad]

If we consider bullshit a disease, based on the first three weeks I would propose its triad looks something like this:

[Figure: triad of bullshit]

And on that note, I’ll see you next week for some causality lessons!

Week 4 is up! If you want to read it, click here.

1. If you want a much less polite version of this rant with more profanity, go here.
2. My absolute favorite part of this study is that partway through they included an “attention check” that asked the participants to skip the answers and instead write “I read the instructions” in the answer box. Over a third of participants failed to do this. However, they pretty much answered the rest of the survey the way the other participants did, which kinda calls into question how important paying attention is if you’re listening to bullshit.
3. It’s not a scientific study and not just about bullshit, but for my money the single most important blog post ever written about the spread of information on the internet is this one right here. Warning: contains discussions of viruses, memetics, and every controversial political issue you can think of. It’s also really long.