Calling BS Read Along: Series Introduction

Well hello hello! A few weeks ago, someone forwarded me a syllabus for a new class being offered at the University of Washington this semester: Info 198: Calling Bullshit. The synopsis is simple: “Our world is saturated with bullshit. Learn to detect and defuse it.” Obviously I was intrigued. The professors (Carl T. Bergstrom and Jevin West) have decided to put their entire syllabus online along with links to weekly readings, and are planning to add some of the lectures once the semester concludes. This interested me greatly, and I was excited to see that they pointed to some resources I was really familiar with, and some I wasn’t.

Given that I’m in the middle of a pretty grueling semester of my own, I thought this might be a great time to follow along with their syllabus, week by week, and post my general thoughts and observations as I went along. I’m very interested in how classes like this get thought through and executed, and what topics different people find critical in sharpening their BS detectors. Hopefully I’ll find some new resources for my own classroom talks, and see if there’s anything I’d add or subtract.

I’ll start with their introduction next week, and then follow the schedule of lectures posted in the syllabus:

  1. Introduction to bullshit
  2. Spotting bullshit
  3. The natural ecology of bullshit
  4. Causality
  5. Statistical traps
  6. Visualization
  7. Big data
  8. Publication bias
  9. Predatory publishing and scientific misconduct
  10. The ethics of calling bullshit
  11. Fake news
  12. Refuting bullshit

I’ll be reading through the readings associated with each lecture, summarizing, adding whatever random thoughts I have, and making sure the links are posted. I’ll also add a link for the next week’s reading. Anyone who’s interested can of course read along and add their own commentary, or just wait for my synopsis.

Happy debunking!

Immigration, Poverty and Gumballs Part 2: The Amazing World of Gumball

Welcome to “From the Archives”, where I dig up old posts and see what’s changed in the years since I originally wrote them.

I’ve had a rather interesting couple of weeks here in my little corner of the blogosphere. A little over a year ago, a reader asked me to write a post about a video he had seen kicking around that used gumballs to illustrate world poverty. With the renewed attention to immigration issues over the last few weeks, that video apparently went viral and brought my post with it. My little blog got an avalanche of traffic, and with it came a new series of questions, comments and concerns about my original post. The comments on the original post closed after 90 days, so I was pondering whether I should do another post to address some of the questions and concerns I was being sent directly. A particularly long and thoughtful comment from someone named bluecat57 convinced me that was the way to go, and almost 2,500 words later, here we are. As a friendly reminder, this is not a political blog and I am not out to push you toward any particular stance on immigration. I actually just like talking about how we use numbers to talk about political issues and the fallacies we may encounter there.

Note to bluecat57: A lot of this post will be based on various points you sent me in your comment, but I’m throwing a few other things in there based on things other people sent me, and I’m also heavily summarizing what you said originally. If you want me to post your original comment in the comments section (or if you want to post it yourself) so the context is preserved, I’m happy to do so.

Okay, with that out of the way, let’s take another look at things!

First, a quick summary of my original post: The original post was a review of a video by a man named Roy Beck. The video in question (watch it here) was a demonstration centered around whether or not immigration to the US could reduce world poverty. In it, Beck pulls out a huge number of gumballs, with each one representing 1 million poor people in the world, defined by the World Bank’s cutoff of “living on less than $2/day”, and demonstrates that the number of poor people is growing faster than we could possibly curb through immigration. The video is from 2010. My criticisms of the video fell into three main categories:

  1. The number of poor people was not accurate. The figures may have been accurate at one point, but since the video is 7 years old and world poverty has been falling rapidly, they are now wildly out of date. I don’t blame Beck for his video aging, but I do get irritated that his group continues to post it with no disclaimer.
  2. That the argument the video starts with, “some people say that mass immigration in to the United States can help reduce world poverty”, was not a primary argument of pro-immigration groups, and that using it was a straw man.
  3. That people liked, shared and found this video more convincing than they should have because of the colorful/mathematical demonstration.

My primary reason for posting about the video at all was actually point #3, as talking about how mathematical demonstrations can be used to address various issues is a bit of a hobby of mine.  However, it was my commentary on #1 and #2 that seemed to attract most of the attention. So let’s take a look at each of my points, shall we?

Point 1: Poverty measures and their issues: First things first: when I started writing the original post and realized I couldn’t verify Beck’s numbers, I reached out to him directly through the NumbersUSA website to ask for a source. I never received a response. Despite a few people finding old sources that back Beck up, I stand by the assertion that those numbers are not currently correct as he cites them. It is possible to find websites quoting those numbers from the World Bank, but as I mentioned previously, the World Bank itself does not give those numbers. While those numbers may have come from the World Bank at some point, he’s out of date by nearly a decade, and it’s a decade in which things have changed rapidly.

Now this isn’t necessarily his fault. One of the reasons Beck’s numbers were rendered inaccurate so quickly is that reducing extreme world poverty has actually been a bit of a global priority for the last few years. If you were going to argue that the number of people living in extreme poverty is going up, 2010 was a really bad year to do it:

[Chart: world population living in extreme poverty, absolute numbers]

Link to source

Basically, he made the argument in the middle of an unprecedented fall in world poverty. Again, not his fault, but it does suggest why he’s not updating the video. The argument would seem a lot weaker starting out with “there are 700 million desperately poor people in the world, and that number falls by about 137,000 people every day”.
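Just to put that in gumball terms, here’s a quick back-of-envelope sketch. The 137,000/day and 700 million figures are the rough ballpark numbers from the sentence above, not anything from Beck’s video:

```python
# Back-of-envelope: how fast would the gumball jar shrink if extreme
# poverty were falling by roughly 137,000 people per day?
people_per_gumball = 1_000_000   # Beck's scale: 1 gumball = 1 million people
daily_decline = 137_000          # rough ballpark figure used above
poor_now = 700_000_000           # rough ballpark count of people in extreme poverty

yearly_decline = daily_decline * 365
print(f"Gumballs removed per year: {yearly_decline / people_per_gumball:.0f}")   # ~50
print(f"Years to empty the jar at this rate: {poor_now / yearly_decline:.0f}")   # ~14
```

About 50 gumballs coming out of the jar every year is not exactly the visual Beck was going for.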

Moving on though…is the $2/day measure of poverty a valid one? Since the World Bank and Beck both used it, I didn’t question it much up front, but at the prompting of commenters, I went looking. There’s an enormously helpful breakdown of global poverty measures here, but here’s the quick version:

  1. The $2/day metric is a measure of consumption, not income, and thus is very sensitive to price inflation. Consumption is used because it attempts to account for agrarian societies where people may grow their own food but not earn much money.
  2. Numbers are based on individual countries self-reporting. This puts some serious holes in the data.
  3. The definition is set based on what it takes to be considered poor in the poorest countries in the world. This caused its own problems.

That last point is important enough that the World Bank revised its calculation method in 2015, which explains why I couldn’t find Beck’s older numbers anywhere on the World Bank website. Prior to that, it set the benchmark for extreme poverty based off the average poverty line used by the 15 poorest countries in the world. The trouble with that measure is that someone will always be the poorest, and therefore we would never be rid of poverty. This is what is known as “relative poverty”.

Given that one of the Millennium Development Goals focused on eliminating world poverty, the World Bank decided to update its estimates to simply adjust a fixed line for inflation. This shifts the focus to absolute poverty, or the number of people living below a single dollar amount. Neither method is perfect, but something had to be picked.
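To make the relative-versus-absolute distinction concrete, here’s a minimal sketch. The national poverty lines and the inflation figure are made up for illustration; only the general mechanics (average of the poorest countries’ own lines vs. a fixed, inflation-adjusted line) come from the description above:

```python
# Toy comparison of relative vs. absolute poverty lines (illustrative numbers only).

# Hypothetical national poverty lines ($/day) for the 15 poorest countries.
national_lines = [1.2, 1.3, 1.4, 1.5, 1.5, 1.6, 1.7, 1.8,
                  1.9, 2.0, 2.0, 2.1, 2.2, 2.3, 2.4]

# Relative approach (pre-2015 style): peg the global line to the average of
# the poorest countries' own lines. As those countries get richer and raise
# their lines, the global line drifts upward with them, so someone is always
# counted as poor.
relative_line = sum(national_lines) / len(national_lines)

# Absolute approach (post-2015 style): fix a dollar amount and only adjust it
# for inflation, so genuine progress can show up as a falling headcount.
base_line = 1.90               # $/day in base-year prices (illustrative)
cumulative_inflation = 0.10    # 10% price growth since the base year (illustrative)
absolute_line = base_line * (1 + cumulative_inflation)

print(f"Relative line: ${relative_line:.2f}/day")
print(f"Absolute line: ${absolute_line:.2f}/day")
```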

It is worth noting that country self-reports can vary wildly, and asking the World Bank to put together a single number is no small task. Even small revisions to the definitions could cause huge changes in the headline figures. Additionally, none of these numbers address country stability, and it is quite likely that unstable countries with violent conflicts won’t report their numbers at all. It’s also unclear to me where charity or NGO activity is counted (likely it varies by country).

Interestingly, Politifact looked into a few other ways of measuring global poverty and found that all of them have shown a reduction over the past two decades, though not as large as the World Bank’s. Beck could change his demonstration to use a different metric, but I think the point remains: if his demonstration showed the number of poor people falling rather than rising, it would not be very compelling.

Point 2: Is it a straw man or not? When I posted my initial piece, I mentioned right up front that I don’t debate immigration that often. Thus, when Beck started his video with “Some people say that mass immigration in to the United States can help reduce world poverty. Is that true? Well, no it’s not. And let me show you why…..”, I took him very literally. His demonstration supported that first claim, so that’s what I focused on. When I mentioned that I didn’t think that was the primary argument being made by pro-immigration groups, I had first gone to their mission pages to see what their arguments actually were. None mentioned “solving world poverty” as a goal. Thus, I called Beck’s argument a straw man, as it seemed to be refuting an argument that wasn’t being made.

Unsurprisingly, I got a decent amount of pushback over this. Many people far more involved in the immigration debates than I am informed me that this is exactly what pro-immigration people argue, if not directly then indirectly. One of the reasons I liked bluecat57’s comment so much is that he gave perhaps the best explanation of this. To quote directly from one message:

“The premise is false. What the pro-immigration people are arguing is that the BEST solution to poverty is to allow people to immigrate to “rich” countries. That is false. The BEST way to end poverty is by helping people get “rich” in the place of their birth.

That the “stated goals” or “arguments” of an organization do not promote immigration as a solution to poverty does NOT mean that in practice or in common belief that poverty reduction is A solution to poverty. That is why I try to always clearly define terms even if everyone THINKS they know what a term means. In general, most people use the confusion caused by lack of definition to support their positions.”

Love the last sentence in particular, and I couldn’t agree more. My “clear definitions” tag is one of my most frequently used for a reason.

In that spirit, I wanted to explain further why I saw this as a straw man, and what my actual definition of a straw man is. Merriam-Webster defines a straw man as “a weak or imaginary argument or opponent that is set up to be easily defeated”. If I had ever heard someone arguing for immigration say “well, we need it to solve world poverty”, I would have thought that was an incredibly weak argument, for all the reasons Beck goes into…i.e., there are simply more poor people than can ever reasonably be absorbed by one (or even several) developed countries. Given this, I believe (though haven’t confirmed) that every developed/rich country places a cap on immigration at some point. Thus most of the debates I hear and am interested in are about where to place that cap in specific situations and what to do when people circumvent it. The causes of immigration seem to be debated mostly in specific contexts, not a general world-poverty one.

For example, here are the three main reasons I’ve seen immigration issues hit the news in the last year:

  1. Illegal immigration from Mexico (too many mentions to link)
  2. Refugees from violent conflicts such as Syria
  3. Immigration bans from other countries

Now there are a lot of issues at play with all of these, depending on who you talk to: general immigration policy, executive power, national security, religion, international relations, the feasibility of building a border wall, the list goes on and on. Poverty and economic opportunity are heavily at play for the first one, but so is the issue of “what do we do when people circumvent existing procedures”. In all cases, if someone had told me that we should provide amnesty/take in more refugees/lift a travel ban for the purpose of solving world poverty, I would have thought that was a pretty broad/weak argument that didn’t address those issues specifically enough. In other words, my characterization of this video as a straw man argument was more about its weakness as a pro-immigration argument than a knock against the anti-immigration side. That’s why I went looking for the major pro-immigration organizations’ official stances…I actually couldn’t believe they would use an argument that weak. I was relieved when I didn’t see any of them advocating this point, because it’s really not a great point. (Happy to update with examples of major players using this argument if you have them, btw.)

In addition to its weaknesses as a pro-immigration point, it’s worth noting that from the “cure world poverty” side it’s pretty weak as well. I mentioned previously that huge progress has been made in reducing world poverty, and the credit for that is primarily given to individual countries boosting their GDP and reducing their internal inequality. Additionally, even given the financial situation in many countries, most people in the world don’t actually want to emigrate. This makes sense to me. I wouldn’t move out of New England unless there was a compelling reason to. It’s home. Thus I would conclude that helping poor countries get on their feet would be a FAR more effective way of eradicating global poverty than allowing more immigration, if one had to pick between the two. It’s worth noting that there’s some debate over the effect of healthy/motivated people emigrating and sending money back to their home country (it drains the country of human capital vs it brings in 3 times more money than foreign aid), but since that wasn’t demonstrated with gumballs I’m not wading into it.

So yeah, if someone on the pro-immigration side says mass immigration can cure world poverty, go ahead and use this video…keeping in mind, of course, the previously stated issues with the numbers he quotes. If they’re using a better or more country- or situation-specific argument though (and good glory I hope they are), then you may want to skip this one.

Now, this being a video, I am mindful that Beck has little control over how it gets used and thus may not be at fault for possible straw-manning, any more than I am responsible for the people posting my post on Twitter with Nicki Minaj gifs (though I do love a good Nicki Minaj gif).

Point 3: The Colorful Demonstration: I stand by this point. Demonstrations with colorful balls of things are just entrancing. That’s why I’ve watched this video like 23 times:

 

Welp, this went on a little longer than I thought. Despite that, I’m sure I missed a few things, so feel free to drop them in the comments!

Does Popularity Influence Reliability? A Discussion

Welcome to the “Papers in Meta Science” where we walk through published papers that use science to scrutinize science. At the moment we’re taking a look at the paper “Large-Scale Assessment of the Effect of Popularity on the Reliability of Research” by Pfeiffer and Hoffman. Read the introduction here, and the methods and results section here.

Well hi! Welcome back to our review of how scientific popularity influences the reliability of results. When last we left off, we had established that the popularity of an interaction itself did not hurt the reliability of the results reporting it, but results involving those popular proteins were less reliable. In other words, you can identify the popular kids pretty well, but figuring out who they are actually connected to gets a little tricky. People like being friends with the popular kids.

Interestingly, the overall results showed a much stronger effect for the “multiple testing” hypothesis than for the “inflated error” hypothesis, meaning that many of the false positive results seem to come from the extra teams running many different experiments and getting a predictable number of false positives. More overall tests = more overall false positives. This effect was 10 times stronger than the inflated error effect, though that effect was still present.
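Here’s a minimal simulation of that multiple testing effect (my own sketch, not code from the paper): every team tests a truly non-existent interaction with an honest 5% false positive rate, and the only thing that changes is how many teams are chasing the same popular protein:

```python
import random

random.seed(0)
alpha = 0.05     # per-experiment false positive rate
trials = 10_000  # simulated non-existent interactions

# For each hypothetical level of popularity (number of independent teams
# testing interactions with the protein), estimate the chance that at least
# one team reports a false positive.
for teams in (1, 5, 20):
    false_hits = sum(
        any(random.random() < alpha for _ in range(teams))
        for _ in range(trials)
    )
    print(f"{teams:>2} teams -> P(at least one false positive) = {false_hits / trials:.2f}")
```

No one is doing anything wrong in this toy version; the spurious “discoveries” pile up simply because more people are looking.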

So what should we do here? Well, a few things:

  1. Awareness: Researchers should be extra aware that running lots of tests on a new and interesting protein could result in less accurate results.
  2. Encourage novel testing: Continue to encourage people to branch out in their research as opposed to giving more funding to those researching more popular topics.
  3. Informal research wikis: This was an interesting idea I hadn’t seen before: use the Wikipedia model to let researchers note things they had tested that didn’t pan out. As I mentioned when I reviewed the Ioannidis paper, there’s no easy way of knowing how many teams are working on a particular question at any given time. Setting up a less formal place for people to check what other teams are doing may give researchers better insight into how many false positives they can expect to see.

Overall, it’s also important to remember that this is just one study and that findings in other fields may be different. It would be interesting to see the same approach repeated in a social science field, to see if public interest makes results better or worse.

Got another paper you’re interested in? Let me know!

The White Collar Paradox

A few weeks back I blogged about what I am now calling “The Perfect Metric Fallacy”. If you missed it, here’s the definition:

The Perfect Metric Fallacy: the belief that if one simply finds the most relevant or accurate set of numbers possible, all bias will be removed, all stress will be negated, and the answer to complicated problems will become simple, clear and completely uncontroversial.

As I was writing that post, I realized that there was an element I wasn’t paying enough attention to. I thought about adding it in, but upon further consideration, I realized that it was big enough to deserve its own post. I’m calling it “The White Collar Paradox”. Here’s my definition:

The White Collar Paradox: Requiring that numbers and statistics be used to guide all decisions due to their ability to quantify truth and overcome bias, while simultaneously only giving attention to those numbers created to cater to one’s social class, spot in the workplace hierarchy, education level, or general sense of superiority.

Now of course I don’t mean to pick on just white collar folks here, though almost all offenders are white collar in some form. This could just as easily have been called the “executive paradox” or the “PhD paradox” or lots of other things. I do want to be clear who this is aimed at, because plenty of white collar workers have been on the receiving end of this phenomenon as well, in the form of their boss writing checks to expensive consulting firms just to have those folks tell them the same stuff their employees did, only on prettier paper and with more buzzwords. Essentially, anyone who prioritizes numbers that make sense to them out of their own sense of ego, despite having the education to know better, is a potential perpetrator of this fallacy.

Now of course wanting to understand the problem is not a bad thing, and quite frequently busy people do not have the time to sort through endless data points. Showing your work gets you lots of credit in class, but in front of the C-suite it loses everyone’s attention in less than 10 seconds (ask me how I know this). There is value in learning how to match your message to the interests of your audience. However, if the audience really wants to understand the problem, sometimes they will have to get a little uncomfortable. Sometimes the problem arises precisely because they overlooked something that doesn’t come easily to them, and preferring explanations that cater to what you already know is just using numbers to pad the walls of your echo chamber.

A couple other variations I’ve seen:

  1. The novel metric preference: As in “my predecessor didn’t use this metric, therefore it has value”.
  2. The trendy metric: “Prestigious institution X has promoted this metric, therefore we also need this metric”.
  3. The “tell me what I want to hear” metric: Otherwise known as the drunk with a lamp post…using data for support, not illumination.
  4. The “emperor has no clothes” metric: The one that is totally unintelligible but stated with confidence, so no one questions it.

That last one is the easiest to compensate for. For every data set I run, I always run it by someone actually involved in the work. The number of data problems that can be spotted by almost any employee if you show them your numbers and say “hey, does this match what you see every day?” is enormous. Even if there are no problems with your data, those employees can almost always tell you where your balance metrics should be, though normally that comes in the form of “you’re missing the point!” (again, ask me how I know this).

For anyone who runs workplace metrics, I think it’s important to note that every person in the organization is going to see the numbers differently, and that’s incredibly valuable. Just like high level execs specialize in forming long term visions that day to day workers might not see, those day to day workers specialize in details the higher ups miss. Getting numbers that are reality-checked by both groups isn’t easy, but your data integrity will improve dramatically and the decisions you make will ultimately be better.

Hans Rosling and Some Updates

I’ve been a bit busy with an exam, snow shoveling and a sick kiddo this week, so I’m behind on responding to emails and a few post requests I’ve gotten. Bear with me.

I did want to mention that Hans Rosling died, which is incredibly sad. If you’ve never seen his work with statistics or his amazing presentations, please check them out. His one-hour “Joy of Stats” documentary is particularly recommended. For something a little shorter, try his famous “washing machine” TED talk.

I also wanted to note that due to some recent interest, I have updated my “About” page with a little bit more of the story about how I got into statistics in the first place. I’ve mentioned a few times that I took the scenic route, so I figured I’d put the story all in one place. Click on over and find out how the accidental polymath problems began.

As an added bonus, there are also some fun illustrations from my awesome cousin Jamison, who was kind enough to make some for me.  This is my favorite pair:

[Illustrations: gpd_true_positive and gpd_false_positive]

See more of his work here.

Finally, someone sent me this syllabus for a new class called “Calling Bullshit” that’s being offered at the University of Washington this semester. I started reading through it, but I’m thinking it might be more fun as a whole series. It covers some familiar ground, but they have a few topics I haven’t talked about much on this blog. I’ll likely start that up by the end of February, so keep an eye out for that.

 

Stats in the News: February 2017

I’ve had a couple of interesting stats-related news articles forwarded to me recently, both of which are worth a look for those interested in the way data and stats shape our lives.

First they came for the guys with the data

This one comes from the confusing world of European economics, and is accompanied by the rather alarming headline “Greece’s Response to Its Resurgent Debt Crisis: Prosecute the Statistician” (note: WSJ articles are behind a paywall; Google the first sentence of the article to access it for free). The article covers the rather concerning story of how Greece attempted to clean up its (notoriously wrong) debt estimates, only to turn around and prosecute the statistician it hired to do so. Unsurprisingly, things soured when his calculations showed the country looked even worse than it had claimed and were used to justify austerity measures. He’s been tried 4 times with no mathematical errors found, and it appears that he adhered to general EU accounting conventions in all cases. Unfortunately he still has multiple cases pending, and in at least one he’s facing life in prison.

Now, I am not particularly a fan of economic data. Partially that’s because I’m not trained in that area, and partially because it appears to be some of the most easily manipulated data there is. The idea that someone could come up with a calculation standard that was unfair or favored one country over others is not crazy. There are a million ways of saying “this assumption here is minor and reasonable, but that assumption there is crazy and you’re deceptive for making it”. There’s nothing that guarantees the EU-recommended way of doing things was fair or reasonable, other than the EU’s claim that it is. Greece could have been screwed by German recommendations for debt calculations, I don’t know. However, prosecuting the person who did the calculations, as opposed to vigorously protesting the accounting conventions, is NOT the way to make your point…especially when he was literally hired to clean up known accounting tricks you never prosecuted anyone for.

Again, I have no idea who’s right here, but I do tend to believe (with all due respect to Popehat) that vagueness in data complaints is the hallmark of meritless thuggery. If your biggest complaint about a statistic is its outcome, then I begin to suspect your complaint is not actually a statistical one.

Safety and efficacy in Phase 1 clinical trials

The second article I got forwarded was an editorial from Nature calling for an increased focus on efficacy in Phase 1 clinical trials. For those of you not familiar with the drug development world, Phase 1 trials currently only look at drug safety, without considering whether or not the drug works. Currently about half of all drugs that proceed to Phase 2 or Phase 3 end up failing to demonstrate ANY efficacy.

The Nature editorial was spurred by a safety trial that went terribly wrong and ended up damaging almost all of the previously healthy volunteers. Given that there are a limited number of people willing to sign up to be safety test subjects, this is a big issue. Previously the general consensus had been to let companies decide what was and was not worth proceeding with, believing that market forces would get companies to screen the drugs they were testing. However, some recent safety failures and recent publications showing how often statistical manipulations are used to push drugs along have called this into question. As we saw in our “Does Popularity Influence Reliability” series, this effect will likely be worse the more widely studied the topic is.

It should be noted that major safety failures and/or damage from experimental drugs is fairly rare, so much of this is really a resource or ethics debate. Statistically though, it also speaks to increasing the pre-study odds we talked about in the “Why Most Published Research Findings are False” series. If we know that low pre-study odds are likely to lead to many false positives, then raising the bar for pre-study odds seems pretty reasonable. At the very least, companies should have to submit a pre-study odds calculation along with their rationale. I still maintain this should be a public function of professional associations.
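For the curious, here’s a small sketch of the pre-study odds math from that series, using the standard positive predictive value formula with illustrative numbers of my own:

```python
def positive_predictive_value(pre_study_odds, alpha=0.05, power=0.8):
    """Chance that a statistically significant finding reflects a true effect,
    given the pre-study odds that the hypothesis being tested is true."""
    return (power * pre_study_odds) / (power * pre_study_odds + alpha)

# A long-shot drug candidate vs. one with stronger prior evidence (illustrative odds).
for odds in (0.05, 0.5):
    print(f"Pre-study odds {odds:.2f} -> PPV of a positive result: "
          f"{positive_predictive_value(odds):.2f}")
```

With long-shot odds, well under half of the “significant” results reflect a real effect, which is exactly why asking companies to show their pre-study reasoning doesn’t seem like an unreasonable bar.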

Does Popularity Influence Reliability? Methods and Results

Welcome to the “Papers in Meta Science” where we walk through published papers that use science to scrutinize science. At the moment we’re taking a look at the paper “Large-Scale Assessment of the Effect of Popularity on the Reliability of Research” by Pfeiffer and Hoffman. Read the introduction here.

Okay, so when we left off last time, we were discussing the idea that findings in (scientifically) popular fields were less likely to be reliable than those in less popular fields.  The theory goes that popular fields would have more false positives (due to an overall higher number of experiments being run) or that increased competition would increase things like p-hacking and data dredging on the part of research teams, or both.

Methods: To test this hypothesis empirically, the researchers decided to look at the exciting world of protein interactions in yeast. While this is not what most people think about when they think of “popular” research, it’s actually a great choice. Since the general public probably is mostly indifferent to protein interactions, all the popularity studied here will be purely scientific. Any bias the researchers picked up will be from their scientific training, not their own pre-conceived beliefs.

To get data on protein interactions, the researchers pulled large data sets that cast a wide net and smaller data sets that looked for specific proteins, then compared the results between the two. The thought was that the large data sets tested large numbers of interactions using the same algorithm, making them less likely to be biased by human judgement, and could therefore be used to confirm or cast doubt on the smaller experiments that required more human intervention.

Thanks to the wonders of text mining, the sample size here was HUGE – about 60,000 statements/conclusions made about 30,000 hypothesized interactions. The smaller data sets had about 6,000 statements/conclusions about 4,000 interactions.

Results: The overall results showed some interesting differences in confirmation rates:

Basically, the more popular an interaction, the more often the interaction was confirmed. However, the more popular an interaction partner was, the less often interactions involving it were confirmed. Confused? Try this analogy: think of the popular proteins as the popular kids in school. The popular kids were fairly easy to identify, and researchers got the popular kids right a lot of the time. However, once they tried to turn that around and figure out who interacted with the popular kids, they started getting a lot of false positives. Just like the less-cool kids in high school might overplay their relationship to the cooler kids, many researchers tried to tie their new findings to previously recognized popular findings.

This held true for both the “inflated error effect” and the “multiple testing effect”. In other words, having a popular protein involved both made individual statements or conclusions less likely to be validated and resulted in more interactions that were found once but then never replicated. This held true across all types of experimental techniques, and it held true whether the databases were curated by experts or built from broader searches.

We’ll dive in to the conclusions we can draw from this next week.