Calling BS Read-Along Week 1: Intro to BS

Welcome to the Calling Bullshit Read-Along, based on the course of the same name from Carl Bergstrom and Jevin West at the University of Washington. Each week we’ll be talking about the readings and topics they laid out in their syllabus. If you missed my intro, click here.

Well hello hello and welcome to Week 1 of the Read-Along! Before we get started I wanted to give a shout out to the Calling Bullshit Twitter feed, and not just because they informed me yesterday that they are jealous of my name. They post some useful stuff over there, so check them out.

We’re kicking off this thing with an Introduction to Bullshit. Now you may think you and bullshit are already well acquainted, but it never hurts to set some definitions up front. The first reading is a quick blog post that explains what is commonly known as either “Brandolini’s Law” or “The Bullshit Asymmetry Principle”, which states that “The amount of energy needed to refute bullshit is an order of magnitude bigger than to produce it”.

Even if you didn’t know there was a name for this, you know the feeling: you’re in a political discussion when someone decides to launch into something absolutely crazy about “the other side”. Feeling defensive, you look up whatever it is they’re talking about to find evidence to refute it. Even with a smartphone this can take a few minutes. You find one source disagreeing with them; they declare it biased. You find another; it’s not well sourced enough. One more, from a credible person/publication who is normally on “their side”, aaaaand…they drop it with a shrug and mumble that it wasn’t that important anyway. That’s 5-10 minutes of your life gone over something it took them less than 30 seconds to blurt out. Ugh.

Okay, so we all know it when we see it…but what is bullshit? The obvious answer is to go with the precedent established in Jacobellis v. Ohio and merely declare “I know it when I see it”, but somehow I doubt that will get you full credit on the exam. If we’re going to spend a whole semester looking at this, we’re going to have to get more specific. Luckily, since bullshit is not a new phenomenon, there’s actually some pre-existing literature on the topic. One of the better known early meditations on the topic is from 1986 and is simply called “On Bullshit“. For all my readers who are pedantic word nerds (and I know there’s more than one of you!) I recommend it, if only for the multiple paragraphs examining whether “humbug!” and “bullshit!” are interchangeable or not. That discussion led me to the transcript of the 1980 lecture “On the Prevalence of Humbug” by Max Black, which is not in the course but is also worth a read.

Now “humbug” isn’t used commonly enough for me to have a real opinion about what it means, but Frankfurt uses it to set an important stage: “humbug” is not just about misrepresenting something, but also about your reasons for doing the misrepresenting. In his essay, Black asserts that “humbug” misrepresentations are not actually so much about trying to get someone to believe something untrue as about making yourself look better. This isn’t the “yeah I have a girlfriend, but she’s in Canada” version of looking better either, but a version where you come across as more passionate, more dedicated and more on board with your cause than anyone else. The intent is not to get someone to believe that what you are saying is the literal truth, but to leave them with a certain impression about your feelings on some matter, and about you in general. In other words, there’s an inherently social component to the whole thing.

After the humbug meditations, Frankfurt moves into the actual term bullshit and how it compares to regular old lying. The social aspect remains, he claims: we might stay friends with a bullshitter, but not with a liar. In Frankfurt’s view, a lie seeks to alter one particular fact, while bullshit seeks to alter the whole landscape. A liar also has some idea of where the truth is and is trying to veer away from it, but a bullshitter just picks and chooses facts, half facts, and lies as they fit or don’t fit a purpose. In other words, bullshit is not necessarily an intent to subvert truth, but an indifference to truth. He also looks at why bullshit has been proliferating: we have more chances to communicate, and more topics to communicate about. Even if our percentage of bullshit stays steady, today’s communication overload means there will be more of it in absolute terms, and the number of complex topics we’re confronted with encourages us to bullshit even further. The essay ends on a fairly philosophical note, concluding that bullshit proliferates the more we doubt that we can ever know the objective reality of anything. Well then.

I liked the essay overall, as I hadn’t really thought of the social component of bullshit in these terms before. The idea that there’s some sort of philosophical underpinning to the whole endeavor is a little interesting as well. But bullshit in the regular world has been around forever, and we mostly know how to cope with it. What happens when it moves into academia or other “higher” sources? That’s the subject of the next essay, “Deeper into Bullshit” by G. A. Cohen. Cohen takes issue with Frankfurt’s focus on the intent of the talker, and wants to focus on the idea of things that are pure nonsense. In his world, it is not the lying/bluffing/indifference to truth that is the essence of bullshit, but rather the lack of sense, or “unclarifiable unclarity”. You know, the famous “if you can’t dazzle them with brilliance, baffle them with bullshit” line of thought. Cohen also separates producers of this kind of bullshit into two subcategories: those who aim to do this, and those who just happen to do it a lot. Fantastically, Cohen includes a little chart to clarify his version of bullshit vs Frankfurt’s:

[Chart: Cohen’s version of bullshit vs. Frankfurt’s]

So academia gets its own special brand of bullshit, but we’re not done yet. Going even further into this topic, we get Eubanks and Schaeffer’s “A kind word for bullshit: The problem of academic writing“. Starting with the scholarly work of one Dave Barry, they point out the deep ambivalence about bullshit present in many parts of the academy. On the one hand, academics are acutely aware of the problem of bullshit and the corrosive nature of ignorance; on the other hand, they are deeply afraid that much of what they produce may actually be bullshit. To quote Barry:

Suppose you are studying Moby-Dick. Anybody with any common sense would say that Moby-Dick is a big white whale, since the characters in the book refer to it as a big white whale roughly eleven thousand times. So in your paper, you say Moby-Dick is actually the Republic of Ireland. . . . If you can regularly come up with lunatic interpretations of simple stories, you should major in English.

This of course is especially common in the humanities and social sciences due to physics envy.

Eubanks and Schaeffer go on to split bullshitters into two categories of their own: “prototypical” bullshitters, like the original type Frankfurt described, and academic bullshitters. Academic bullshit does, of course, share some qualities with prototypical bullshit, namely that it aims to enhance the reputation of the author at the expense of clear communication. They point out that this starts infecting academics while they are still students, when they have every incentive to make themselves look good to the professor, and barely any incentive to make themselves intelligible to the average person.

So with these four essays, what are my major takeaways?

  1. Bullshit must be understood in a social context. To put it on the same level as “lying” is to miss a major motivation.
  2. Due to point #1, challenging bullshit can take tremendous effort. You not only have to challenge the lack of truth, but also might be undermining someone’s sense of self-importance. That second part tends to make the first part look like a cake walk.
  3. Academia, which should be one of our primary weapons against bullshit, has succeeded in creating its own special breeding ground for bullshit.
  4. Undoing point #3 faces all the challenges previously stated in #2, with “truth” swapped for “clarity” and “and career” tacked on at the end: you not only have to challenge the lack of clarity, but also might be undermining someone’s sense of self-importance and career.
  5. I need to start using the word “humbug” more often.

The points about academics are particularly well taken, as there seems to be a common misconception that intelligence inoculates you against bullshit and self-deception. When I give my talk about internet science to high school kids, it’s almost always to AP classes, and I have to REALLY emphasize the whole “don’t get cocky, kid” point. That’s why I love showing them the motivated numeracy study I talk about here. They are always visibly alarmed that high math ability actually makes you more prone to calculation errors if making an error will confirm a pre-existing belief you find important. As we examine bullshit and how to refute it, it’s important to note that preventing yourself from spreading bullshit is a great first step.

That does it for this week. See you next week, when we move on to “Spotting Bullshit”!

Week 2 is up! Go straight to it here.

Surveys, Privacy and the Usefulness of Lies

I’ve been thinking a lot about surveys this week (okay, I’m boring, I think a lot about them every week), but this week I have a particularly good reason. A few years ago, I wrote about a congressman named Daniel Webster and his proposal to eliminate the American Community Survey. I’ve been a little fascinated with the American Community Survey ever since, and last week I opened my mailbox to discover that we’d been selected to take it this year.

For those unfamiliar with the American Community Survey, it’s an ongoing survey by the Census Bureau that asks people lots of information about their houses, income, disability and employment status. Almost every time you see a chart that shows you “income by state” or “which county is the richest” or “places in the US with the least internet access”, the raw data came from the American Community Survey. This obviously provides lots of good and useful information to many people and businesses, but it’s not without its critics. People like Congressman Webster object to the survey over concerns like government overreach, cost, and possible privacy issues with the mandatory* survey.

While I’ve written about this for years, I had actually never taken it, so I was fairly excited to see what all the fuss was about. Given the scrutiny that’s been placed on the cost, I was interested to see that the initial mailing strongly encouraged me to take the survey online (using a code on the mailing) and cited all the cost savings associated with my doing so. Filling out surveys online almost certainly reduces cost, but in this day and age it also tends to increase the possible privacy issues. While the survey doesn’t ask for sensitive information like Social Security numbers, it does ask lots of detailed information about salary, work status, the status of your house, mortgage payments and electricity usage. I wouldn’t particularly want a hacker getting hold of this, nor, I suspect, would most others.

I don’t particularly know how the Census Bureau should proceed with this survey or what Congress will decide to do, but it did get me thinking about privacy issues with online surveys and how to balance the need for data with these concerns. I work in an industry (healthcare) that is actually required by regulation to get feedback on how we’re doing and make changes accordingly, yet we also must balance privacy concerns and people who don’t want to give us information. Many people who have no problem calling you up and lecturing you about everything that went wrong while they were in the hospital absolutely freeze when you ask them to fill out a survey: they find it invasive. It’s a struggle. One of my favorite post-election moments actually reflected this phenomenon, in the form of a Chicago Tribune letter to the editor from a guy who said he’d never talked to a pollster in the run-up to the election. His issue? He hates pollsters because they want to capture your every thought AND they never listen to people like him. While many people like and appreciate services that reflect their perspective, are friendlier, more usable, and more tailored to their needs, many of us don’t want to be the person whose data gets taken to get there. For good reason, too: our privacy is disappearing at an alarming rate, and data hacks are pretty much weekly news.

So how do survey purveyors get the trust back? One of the newest frontiers in this whole balancing act is actually coming from Silicon Valley, where tech companies are as desperate for user data as users are concerned about keeping it private. They have been advancing something called “differential privacy”, or the quest to use statistical techniques to render a data set collectively useful while rendering individual data points useless and unidentifiable. So how would this work?

My favorite of the techniques is something called “noise injection”, where fake results are inserted into the sample at a known rate. For example: a survey asks you if you’ve ever committed a crime. Before you answer, you are told to flip a coin. If the coin says heads, you answer truthfully. If the coin says tails, you flip the coin again. If it says heads this time, you say “yes, I’ve committed a crime”; tails, you say you haven’t. When the researchers go back in, they can take out the predicted fake answers to find the real number. For example, let’s say you started with 100 people. At the end of the test, you find that 35 say they committed a crime, and 65 say they haven’t. You know that, on average, 25 of those 35 “yes” answers came from the coin flips, so you have 10 people who really said “yes”. You can also subtract the 25 expected fake “no” answers from the 65 to get 40 real “no”s.

The researchers now know the approximate real percentage of respondents who have committed a crime (20% in this example), but they can’t know whether any individual response is true or not. This technique has possible holes in it (what if people don’t follow instructions?) and you effectively cut your sample size in half, but just asking people to admit to a crime directly with a “we promise not to share your data” pitch doesn’t work so well either. Additionally, the beauty of this technique is that it works better the larger your sample is.
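If you want to see the arithmetic in action, here’s a quick Python sketch of the coin-flip scheme described above. (This is just my own toy simulation for illustration; the function names and the simulated 20% “true” rate are made up, not from any real survey.)

```python
import random

def randomized_response(truthful_answer):
    """One respondent's answer under the coin-flip protocol."""
    if random.random() < 0.5:      # first flip heads: answer truthfully
        return truthful_answer
    return random.random() < 0.5   # tails: second flip decides the answer

def estimate_true_rate(responses):
    """Back out the approximate real 'yes' rate from the noisy answers.

    Half the sample answers randomly, and half of those say yes, so we
    expect 25% of all responses to be fake 'yes' answers."""
    n = len(responses)
    real_yes = sum(responses) - 0.25 * n   # remove the expected fake yeses
    return real_yes / (0.5 * n)            # divide by the truthful half

# Simulate 100,000 respondents, 20% of whom have really committed a crime
random.seed(42)
answers = [randomized_response(random.random() < 0.20) for _ in range(100_000)]
print(f"Estimated true rate: {estimate_true_rate(answers):.3f}")  # ~0.200
```

Run it with a sample of 100 instead and you’ll watch the estimate bounce around, which is the “cut your sample size in half” cost showing up in practice.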

Going forward we may see more efforts like this, even within the same survey or data set. While 20 years ago people may have been annoyed to fill out a section of a survey with fake data, today’s more privacy conscious consumers may be okay with it if it means their responses can’t be tied to them directly. I don’t know that the Census Bureau would ever use anything like this, but as we head towards the  2020 census, there will definitely be more talk about surveys, privacy and methodology.

*The survey is mandatory, but it appears the Census Bureau is prohibited by Congress from actually enforcing this.

Calling BS Read-Along: Series Introduction

Well hello hello! A few weeks ago, someone forwarded me a syllabus for a new class being offered at the University of Washington this semester: Info 198, Calling Bullshit. The synopsis is simple: “Our world is saturated with bullshit. Learn to detect and defuse it.” Obviously I was intrigued. The professors (Carl T. Bergstrom and Jevin West) have decided to put their entire syllabus online along with links to weekly readings, and are planning to add some of the lectures once the semester concludes. Of course this interested me greatly, and I was excited to see that they pointed to some resources I was really familiar with, and some I wasn’t.

Given that I’m in the middle of a pretty grueling semester of my own, I thought this might be a great time to follow along with their syllabus, week by week, and post my general thoughts and observations as I went along. I’m very interested in how classes like this get thought through and executed, and what topics different people find critical in sharpening their BS detectors. Hopefully I’ll find some new resources for my own classroom talks, and see if there’s anything I’d add or subtract.

I’ll start with their introduction next week, but I’ll be following the schedule of lectures posted in the syllabus for each week:

  1. Introduction to bullshit
  2. Spotting bullshit
  3. The natural ecology of bullshit
  4. Causality
  5. Statistical traps
  6. Visualization
  7. Big data
  8. Publication bias
  9. Predatory publishing and scientific misconduct
  10. The ethics of calling bullshit
  11. Fake news
  12. Refuting bullshit

I’ll be reading through each of the readings associated with each lecture, summarizing, adding whatever random thoughts I have, and making sure the links are posted. I’ll be adding a link for the next week’s reading as well. Anyone who’s interested can of course read along and add their own commentary, or just wait for my synopsis.

Happy debunking! (And go straight to Week 1 here)

Immigration, Poverty and Gumballs Part 2: The Amazing World of Gumball

Welcome to “From the Archives”, where I dig up old posts and see what’s changed in the years since I originally wrote them.

I’ve had a rather interesting couple of weeks here in my little corner of the blogosphere. A little over a year ago, a reader asked me to write a post about a video he had seen kicking around that used gumballs to illustrate world poverty. With the renewed attention to immigration issues over the last few weeks, that video apparently went viral and brought my post with it. My little blog got an avalanche of traffic, and with it came a new series of questions, comments and concerns about my original post. The comments on the original post closed after 90 days, so I was pondering whether I should do another post to address some of the questions and concerns I was being sent directly. A particularly long and thoughtful comment from someone named bluecat57 convinced me that was the way to go, and almost 2,500-something words later, here we are. As a friendly reminder, this is not a political blog and I am not out to change your mind on immigration toward any particular stance. I actually just like talking about how we use numbers to discuss political issues and the fallacies we may encounter there.

Note to bluecat57: A lot of this post will be based on various points you sent me in your comment, but I’m throwing a few other things in there based on things other people sent me, and I’m also heavily summarizing what you said originally. If you want me to post your original comment in the comments section (or if you want to post it yourself) so the context is preserved, I’m happy to do so.

Okay, with that out of the way, let’s take another look at things!

First, a quick summary of my original post: the original post was a review of a video by a man named Roy Beck. The video in question (watch it here) was a demonstration centered on whether or not immigration to the US could reduce world poverty. In it, Beck pulls out a huge number of gumballs, each one representing 1 million poor people in the world as defined by the World Bank’s cutoff of living on less than $2/day, and demonstrates that the number of poor people is growing faster than we could possibly curb through immigration. The video is from 2010. My criticisms of the video fell into three main categories:

  1. The numbers of poor people were not accurate. I believe they may have been at one point, but since the video is seven years old and world poverty has been falling rapidly, they are now wildly out of date. I don’t blame Beck for his video aging, but I do get irritated that his group continues to post it with no disclaimer.
  2. The argument the video starts with, “some people say that mass immigration into the United States can help reduce world poverty”, was not a primary argument of pro-immigration groups, and using it was a straw man.
  3. People liked, shared, and found this video more convincing than they should have because of the colorful/mathematical demonstration.

My primary reason for posting about the video at all was actually point #3, as talking about how mathematical demonstrations can be used to address various issues is a bit of a hobby of mine.  However, it was my commentary on #1 and #2 that seemed to attract most of the attention. So let’s take a look at each of my points, shall we?

Point 1: Poverty measures and their issues. First things first: when I started writing the original post and realized I couldn’t verify Beck’s numbers, I reached out to him directly through the NumbersUSA website to ask for a source for them. I never received a response. Despite a few people finding old sources that back Beck up, I stand by the assertion that those numbers are not currently correct as he cites them. It is possible to find websites quoting those numbers from the World Bank, but as I mentioned previously, the World Bank itself does not give those numbers. While those numbers may have come from the World Bank at some point, he’s out of date by nearly a decade, and it’s a decade in which things have changed rapidly.

Now this isn’t necessarily his fault. One of the reasons Beck’s numbers were rendered inaccurate so quickly was because reducing extreme world poverty has actually been a bit of a global priority for the last few years. If you were going to make an argument about the number of people living in extreme poverty going up, 2010 was a really bad year to make that argument:

[Chart: world population living in extreme poverty, in absolute numbers]

Link to source

Basically he made the argument in the middle of an unprecedented fall in world poverty. Again, not his fault, but it does suggest why he’s not updating the video. The argument would seem a lot weaker starting out with “there’s 700 million desperately poor people in the world and that number falls by 137,000 people every day”.

Moving on though…is the $2/day measure of poverty a valid one? Since both the World Bank and Beck used it, I didn’t question it much up front, but at the prompting of commenters, I went looking. There’s an enormously helpful breakdown of global poverty measures here, but here’s the quick version:

  1. The $2/day metric is a measure of consumption, not income, and thus is very sensitive to price inflation. Consumption is used because it attempts to account for agrarian societies where people may grow their own food but not earn much money.
  2. The numbers are based on individual countries self-reporting, which puts some serious holes in the data.
  3. The definition is set based on what it takes to be considered poor in the poorest countries in the world. This caused its own problems.

That last point is important enough that the World Bank revised its calculation method in 2015, which explains why I couldn’t find Beck’s older numbers anywhere on the World Bank website. Prior to that, it set the benchmark for extreme poverty based on the average poverty line used by the 15 poorest countries in the world. The trouble with that measure is that someone will always be the poorest, and therefore we would never be rid of poverty. This is what is known as “relative poverty”.

Given that one of the Millennium Development Goals focused on eliminating world poverty, the World Bank decided to update its estimates to simply adjust a fixed line for inflation. This shifts the focus to absolute poverty, or the number of people living below a single dollar amount. Neither method is perfect, but something had to be picked.

It is worth noting that country self-reports can vary wildly, and asking the World Bank to put together a single number is no small task. Even small revisions to the definitions could cause huge changes in the numbers presented. Additionally, none of these numbers address country stability, and it is quite likely that unstable countries with violent conflicts won’t report their numbers at all. It’s also unclear to me where charity or NGO activity is counted (likely it varies by country).

Interestingly, Politifact looked into a few other ways of measuring global poverty and found that all of them have shown a reduction over the past two decades, though not as large as the World Bank’s. Beck could change his demonstration to use a different metric, but I think the point remains: if his demonstration showed the number of poor people falling rather than rising, it would not be very compelling.

Edit/update: It’s been pointed out to me that at the 2:04 mark he changes from using the $2/day standard to “poorer than Mexico”, so it’s possible the numbers after that point actually work better than I thought they would. It’s hard to tell without him giving a firm number. For reference, it looks like the average income in Mexico in 2016 was $12,800/year. In terms of a poverty measure, the relative rank of one country against others can be really hard to pin down. If anyone has more information about the state of Mexico’s relative rank in the world, I’d be interested in hearing it.

Point 2: Is it a straw man or not? When I posted my initial piece, I mentioned right up front that I don’t debate immigration that often. Thus, when Beck started his video with “Some people say that mass immigration into the United States can help reduce world poverty. Is that true? Well, no it’s not. And let me show you why…”, I took him very literally. His demonstration supported that first point, so that’s what I focused on. When I mentioned that I didn’t think that was the primary argument being made by pro-immigration groups, it was because I had gone to their mission pages to see what their arguments actually were. None mentioned “solving world poverty” as a goal. Thus, I called Beck’s argument a straw man, as it seemed to be refuting an argument that wasn’t being made.

Unsurprisingly, I got a decent amount of pushback over this. Many people far more involved in the immigration debates than I am informed me that this is exactly what pro-immigration people argue, if not directly then indirectly. One of the reasons I liked bluecat57’s comment so much is that he gave perhaps the best explanation of this. To quote directly from one message:

“The premise is false. What the pro-immigration people are arguing is that the BEST solution to poverty is to allow people to immigrate to “rich” countries. That is false. The BEST way to end poverty is by helping people get “rich” in the place of their birth.

That the “stated goals” or “arguments” of an organization do not promote immigration as a solution to poverty does NOT mean that in practice or in common belief that poverty reduction is A solution to poverty. That is why I try to always clearly define terms even if everyone THINKS they know what a term means. In general, most people use the confusion caused by lack of definition to support their positions.”

Love the last sentence in particular, and I couldn’t agree more. My “clear definitions” tag is one of my most frequently used for a reason.

In that spirit, I wanted to explain further why I saw this as a straw man, and what my actual definition of a straw man is. Merriam-Webster defines a straw man as “a weak or imaginary argument or opponent that is set up to be easily defeated“. If I had ever heard someone arguing for immigration say “well, we need it to solve world poverty”, I would have thought that was an incredibly weak argument, for all the reasons Beck goes into, i.e. there are simply more poor people than can ever reasonably be absorbed by any one developed country, or even several. Given this, I believe (though haven’t confirmed) that every developed/rich country places a cap on immigration at some point. Thus most of the debates I hear and am interested in are about where to place that cap in specific situations and what to do when people circumvent it. The causes of immigration requests seem mostly debated in specific contexts, not a general world-poverty one.

For example, here are the three main reasons I’ve seen immigration issues hit the news in the last year:

  1. Illegal immigration from Mexico (too many mentions to link)
  2. Refugees from violent conflicts such as Syria
  3. Immigration bans from other countries

Now there are a lot of issues at play with all of these, depending on who you talk to: general immigration policy, executive power, national security, religion, international relations, the feasibility of building a border wall; the list goes on and on. Poverty and economic opportunity are heavily at play in the first one, but so is the issue of what we do when people circumvent existing procedures. In all cases, if someone had told me that we should provide amnesty/take in more refugees/lift a travel ban for the purpose of solving world poverty, I would have thought that was a pretty broad/weak argument that didn’t address those issues specifically enough. In other words, my characterization of this video as a straw man argument was more about its weakness as a pro-immigration argument than a knock against the anti-immigration side. That’s why I went looking for the major pro-immigration organizations’ official stances: I actually couldn’t believe they would use an argument that weak. I was relieved when I didn’t see any of them advocating this point, because it’s really not a great point. (Happy to update with examples of major players using this argument if you have them, btw.)

In addition to the weaknesses of this argument as a pro-immigration point, it’s worth noting that from the “cure world poverty” side it’s pretty weak as well. I mentioned previously that huge progress has been made in reducing world poverty, and the credit for that is primarily given to individual countries boosting their GDP and reducing their internal inequality. Additionally, even given the financial situation in many countries, most people in the world don’t actually want to emigrate. This makes sense to me. I wouldn’t move out of New England unless there was a compelling reason to. It’s home. Thus I would conclude that helping poor countries get on their feet would be a FAR more effective way of eradicating global poverty than allowing more immigration, if one had to pick between the two. There’s some debate over the effect of healthy/motivated people emigrating and sending money back to their home countries (it drains the country of human capital vs. it brings in three times more money than foreign aid), but since that wasn’t demonstrated with gumballs I’m not wading into it.

So yeah, if someone on the pro-immigration side says mass immigration can cure world poverty, go ahead and use this video, keeping in mind of course the previously stated issues with the numbers he quotes. If they’re using a better or more country- or situation-specific argument though (and good glory I hope they are), then you may want to skip this one.

Now, this being a video, I am mindful that Beck has little control over how it gets used, and thus may not be at fault for possible straw-manning, any more than I am responsible for the people posting my post on Twitter with Nicki Minaj gifs (though I do love a good Nicki Minaj gif).

Point 3: The Colorful Demonstration. I stand by this point. Demonstrations with colorful balls of things are just entrancing. That’s why I’ve watched this video like 23 times:

Welp, this went on a little longer than I thought. Even so, I’m sure I missed a few things, so feel free to drop them in the comments!

Does Popularity Influence Reliability? A Discussion

Welcome to the “Papers in Meta Science” where we walk through published papers that use science to scrutinize science. At the moment we’re taking a look at the paper “Large-Scale Assessment of the Effect of Popularity on the Reliability of Research” by Pfeiffer and Hoffman. Read the introduction here, and the methods and results section here.

Well hi! Welcome back to our review of how scientific popularity influences the reliability of results. When last we left off, we had established that the popularity of protein interactions did not affect the reliability of results for pairings initially, but did affect the reliability of results involving those popular proteins. In other words, you can identify the popular kids pretty well, but figuring out who they are actually connected to gets a little tricky. People like being friends with the popular kids.

Interestingly, the overall results showed a much stronger effect for the “multiple testing” hypothesis than for the “inflated error” hypothesis, meaning that many of the false positive results seem to come from the extra teams running many different experiments and getting a predictable number of false positives. More overall tests = more overall false positives. This effect was 10 times stronger than the inflated error effect, though that effect was still present.
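To see why sheer testing volume does this, here’s a toy Python simulation (entirely my own illustration, not the paper’s actual method): suppose several teams each test a protein pairing where no real interaction exists, using the conventional 5% false positive rate.

```python
import random

ALPHA = 0.05  # conventional per-experiment false positive rate

def chance_of_false_discovery(n_teams, trials=10_000):
    """Estimate how often at least one of n_teams studying the same
    nonexistent interaction reports a (false) positive result."""
    hits = 0
    for _ in range(trials):
        # each team independently has a 5% chance of a false positive
        if any(random.random() < ALPHA for _ in range(n_teams)):
            hits += 1
    return hits / trials

random.seed(1)
for teams in (1, 5, 20):
    print(f"{teams:>2} teams: {chance_of_false_discovery(teams):.0%}")
# Roughly 5%, 23%, and 64%: popular topics attract more teams, so the
# literature about them accumulates more false positives.
```

No individual team is doing anything wrong here; the false positives pile up purely because more tests are being run.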

So what should we do here? Well, a few things:

  1. Awareness: Researchers should be extra aware that running lots of tests on a new and interesting protein could result in less accurate results.
  2. Encourage novel testing: Continue to encourage people to branch out in their research, as opposed to giving more funding to those researching more popular topics.
  3. Informal research wikis: This was an interesting idea I hadn’t seen before: use the Wikipedia model to let researchers note things they had tested that didn’t pan out. As I mentioned when I reviewed the Ioannidis paper, there’s no easy way of knowing how many teams are working on a particular question at any given time. Setting up a less formal place for people to check what other teams are doing may give researchers better insight into how many false positives they can expect to see.

Overall, it’s also important to remember that this is just one study and that findings in other fields may be different. It would be interesting to see a similar study repeated in a social science field or something similar, to see if public interest makes results better or worse.

Got another paper you’re interested in? Let me know!

The White Collar Paradox

A few weeks back I blogged about what I am now calling “The Perfect Metric Fallacy“. If you missed it, here’s the definition:

“The Perfect Metric Fallacy: the belief that if one simply finds the most relevant or accurate set of numbers possible, all bias will be removed, all stress will be negated, and the answer to complicated problems will become simple, clear and completely uncontroversial.”

As I was writing that post, I realized that there was an element I wasn’t paying enough attention to. I thought about adding it in, but upon further consideration, I realized that it was big enough to deserve its own post. I’m calling it “The White Collar Paradox”. Here’s my definition:

The White Collar Paradox: requiring that numbers and statistics be used to guide all decisions due to their ability to quantify truth and overcome bias, while simultaneously only giving attention to those numbers created to cater to one’s social class, spot in the workplace hierarchy, education level, or general sense of superiority.

Now of course I don’t mean to pick on just white collar folks here, though almost all offenders are white collar somehow. This could just as easily have been called the “executive paradox” or the “PhD paradox” or lots of other things. I want to be clear about who this is aimed at, because plenty of white collar workers have been on the receiving end of this phenomenon as well, in the form of their boss writing checks to expensive consulting firms just to have those folks tell them the same stuff their employees did, only on prettier paper and using more buzzwords. Essentially, anyone who prioritizes numbers that make sense to them out of their own sense of ego, despite having the education to know better, is a potential perpetrator of this fallacy.

Now of course wanting to understand the problem is not a bad thing, and quite frequently busy people do not have the time to sort through endless data points. Showing your work gets you lots of credit in class, but in front of the C-suite it loses everyone’s attention in less than 10 seconds (ask me how I know this). There is a value in learning how to get your message to match the interests of your audience. However, if the audience really wants to understand the problem, sometimes they will have to get a little uncomfortable. Sometimes the problem is arising precisely because they overlooked something that’s not very understandable to them, and preferring explanations that cater to what you already know is just using numbers to pad the walls of your echo chamber.

A couple other variations I’ve seen:

  1. The novel metric preference: as in “my predecessor didn’t use this metric, therefore it has value”.
  2. The trendy metric: “Prestigious institution X has promoted this metric, therefore we also need this metric”.
  3. The “tell me what I want to hear” metric: otherwise known as the drunk with a lamppost, using data for support, not illumination.
  4. The “emperor has no clothes” metric: the one that is totally unintelligible but stated with such confidence that no one questions it.

That last one is the easiest to compensate for. For every data set I run, I always run it by someone actually involved in the work. The number of data problems that can be spotted by almost any employee when you show them your numbers and say “hey, does this match what you see every day?” is enormous. Even if there are no problems with your data, those employees can almost always tell you where your balance metrics should be, though normally that comes in the form of “you’re missing the point!” (again, ask me how I know this).

For anyone who runs workplace metrics, I think it’s important to note that every person in the organization is going to see the numbers differently and that’s incredibly valuable. Just like high level execs specialize in forming long term visions that day to day workers might not see, those day to day workers specialize in details the higher ups miss. Getting numbers that are reality checked by both groups isn’t easy, but your data integrity will improve dramatically and the decisions you can make will ultimately improve.

Hans Rosling and Some Updates

I’ve been a bit busy with an exam, snow shoveling and a sick kiddo this week, so I’m behind on responding to emails and a few post requests I’ve gotten. Bear with me.

I did want to mention that Hans Rosling died, which is incredibly sad. If you’ve never seen his work with statistics or his amazing presentations, please check them out. His one-hour “Joy of Stats” documentary is particularly recommended. For something a little shorter, try his famous “washing machine” TED talk.

I also wanted to note that, due to some recent interest, I have updated my “About” page with a little bit more of the story about how I got into statistics in the first place. I’ve mentioned a few times that I took the scenic route, so I figured I’d put the story all in one place. Click on over and find out how the accidental polymath problems began.

As an added bonus, there are also some fun illustrations from my awesome cousin Jamison, who was kind enough to make some for me.  This is my favorite pair:

[Illustration pair: “true positive” and “false positive”]

See more of his work here.

Finally, someone sent me this syllabus for a new class called “Calling Bullshit” that’s being offered at the University of Washington this semester. I started reading through it, but I’m thinking it might be more fun as a whole series. It covers some familiar ground, but they have a few topics I haven’t talked about much on this blog. I’ll likely start that up by the end of February, so keep an eye out for that.