State Level Representation: Graphed

I got into an interesting email discussion this past weekend about a recent Daily Beast article, “The Republican Lawmaker Who Secretly Created Reddit’s Women-Hating ‘Red Pill’”, that ended up sparking a train of thought mostly unrelated to the original topic (not uncommon for me). The story is an investigation into a previously anonymous user who started an infamous subreddit, and the Daily Beast’s discovery that he was actually an elected official in the New Hampshire House of Representatives.

Given that I am originally from New Hampshire and all my family still lives there, I was intrigued by the story both for the “hey! that’s my state!” factor and the “oh man, the New Hampshire House of Representatives is really hard to explain to a national audience” factor. Everyone I was emailing with either lives in New Hampshire or grew up there (as I did), so the topic quickly switched to how unusual the New Hampshire state legislature is, and how hard it is for a national news outlet to truly capture that. For starters, the NH state House of Representatives has nearly as many seats (400) as the US House of Representatives (435), and roughly double the seats of the next-largest state house (Pennsylvania, with 203), all in a state with a population of a little over 1.3 million people. Next is the low pay. For their service, those 400 people make a whopping $200 for a two-year term. Some claim this is not the lowest-paying gig in the state-level representation game, since other states like New Mexico pay no salary, but a quick look at this page shows that those states pay a daily per diem that would quickly exceed $200. New Hampshire has no per diem, meaning most members of the House will spend more in gas money than they make during their term.

As you can imagine, this setup does not pull from a random sample of the population.

This conversation got me thinking about how often state level politicians get quoted in news articles, and got me wondering about how we interpret what those officials do. Growing up in NH gave me the impression that most state level representatives didn’t have much power, but in my current state (Massachusetts) they actually do have some clout and frequently move on to higher posts.

This of course got me curious about how other states do things. When lawmakers from individual states make the news, I suspect most of us assume that they operate much the same way as lawmakers in our own state do, and that could lead to confusion about how powerful/not powerful the person we’re talking about really is. Ballotpedia breaks state legislatures down into 3 categories: full-time or close (10 states), high part-time (23 states), and low part-time (17 states). A lot of that appears to have to do with the number of people you are representing. I decided to do a few graphs to illustrate.

First, here is the size of each state’s “lower house” vs the number of people each lower house member represents:

Note: Nebraska doesn’t have a lower house, at least according to Wikipedia. NH and CA are pretty clear outliers in terms of size and population, respectively.
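If you want to reproduce this sort of graph yourself, here’s a minimal sketch of how I’d build it, assuming a hypothetical CSV called state_legislatures.csv with state, lower_house_seats, and population columns pulled from the data linked at the end of the post:

```python
# Minimal sketch of a seats-vs-constituents plot.
# Assumes a hypothetical CSV "state_legislatures.csv" with columns:
# state, lower_house_seats, population (Nebraska omitted, since it is unicameral).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("state_legislatures.csv")
df["people_per_member"] = df["population"] / df["lower_house_seats"]

fig, ax = plt.subplots()
ax.scatter(df["lower_house_seats"], df["people_per_member"])
for _, row in df.iterrows():
    # Label each point with its state so the outliers (NH, CA) are easy to spot
    ax.annotate(row["state"], (row["lower_house_seats"], row["people_per_member"]),
                fontsize=7)
ax.set_xlabel("Seats in state lower house")
ax.set_ylabel("Residents per lower-house member")
ax.set_title("State lower house size vs. constituents per member")
plt.show()
```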

State senates appear much less variable:

So next time you read an article about a state level representative doing something silly, keep this graph in mind. For some states, you are talking about a fairly well compensated person with lots of constituents, who probably had to launch a coordinated campaign to get their spot and may have higher ambitions. For other states, you’re talking about someone who was willing to show up.

Here’s the data if you’re into that sort of thing. I got the salary data here, the state population data here, and the number of seats in the house here. As always, please update me if you see any errors!

Calling BS Read-Along Week 9: Predatory Publishing and Scientific Misconduct

Welcome to the Calling Bullshit Read-Along based on the course of the same name from Carl Bergstrom and Jevin West at the University of Washington. Each week we’ll be talking about the readings and topics they laid out in their syllabus. If you missed my intro and want the full series index, click here, or if you want to go back to Week 8 click here.

Welcome back to Week 9 of the Calling Bullshit Read-Along! This week our focus is on predatory publishing and scientific misconduct. Oh boy. This is a slight change of focus from what we’ve been talking about up until now, and not for the better. In week one we established that in general, bullshit is different from lying in that it is not solely attempting to subvert the truth. Bullshit may be characterized by a (sometimes reckless) disregard for truth, but most bullshitters would be happy to stick to the truth if it fit their agenda. The subjects of this week’s readings are not quite so innocent, as most of our focus is going to be on misconduct by people who should have known better. Of course sometimes the lines between intentional and unintentional misconduct are a little less clear than one would hope, but for our purposes the outcome (less reliable research) is the same. Let’s take a look.

To frame the topic this week, we start with a New York Times article, “A Peek Inside the Strange World of Fake Academia”, which takes a look at, well, fake academia. In general “fake academia” refers to conferences and journals set up with very little oversight (one man runs 17 of them) or review but high price tags. The article looks at a few examples, including journals and conferences that agreed to publish abstracts created using the iPhone autocomplete feature or that “featured” keynote speakers who never agreed to speak. Many of these are run by a group called OMICS International, which has gotten into legal trouble over its practices. However, some groups/conferences are much harder to classify. As the article points out, there’s a supply and demand problem here. More PhDs need publication credits than can get their work accepted by legitimate journals or conferences, so anyone willing to loosen the standards can make some money.

To show how bad the problem of “pay to play” journals/conferences is, the next article (by the same guys who brought us Retraction Watch) talks about a professor who decided to make up some scientists just to see if there was any credential checking going on at these places. My favorite of these was his (remarkably easy) quest to get Borat Sagdiyev (a senior researcher at the University of Kazhakstan) on the editorial board of the journal Immunology and Vaccines. Due to the proliferation of journals and conferences with low quality control, these fake people ended up with surprisingly impressive-sounding resumes. The article goes on to talk about researchers who make up co-authors, and comes to the troubling conclusion that fake co-authors seemed to help publication prospects. There are other examples provided of “scientific identity fraud”: researchers finding their data has been published by other scientists (none of whom are real), researchers recommending that made-up scientists review their work (the email addresses route back to themselves), and the previously mentioned pay-for-publication journals. The article wraps up with a discussion of even harder-to-spot chicanery: citation stuffing and general metrics gaming. As we discussed in Week 7 with Jevin West’s article, attempting to rank people in a responsive system will create incentives to maximize your ranking. If there is an unethical way of doing this, at least some people will find it.

That last point is also the focus of one of the linked readings, “Academic Research in the 21st Century: Maintaining Scientific Integrity in a Climate of Perverse Incentives and Hypercompetition”. The focus of this paper is on the current academic climate and its negative effect on research practices. To quote Goodhart’s law: “when a measure becomes a target, it ceases to be a good measure”. It covers a lot of ground, including increased reliance on performance metrics, decreased access to funding, and an oversupply of PhDs. My favorite part of the paper was this table:

That’s a great overview of how the best intentions can go awry. This is all a setup to get to the meat of the paper: scientific misconduct. In a high-stakes competitive environment, the question is not if someone will try to game the system, but how often it’s already happening and what you’re going to do about it. Just like in sports, you need to acknowledge the problem (*cough* steroid era *cough*) and then come up with a plan to address it.

Of course the problem isn’t all on the shoulders of researchers, institutions or journals. Media and public relations departments tend to take the problem and run with it, as this Simply Statistics post touches on. According to them, the three stories that seem to get the most press are:

  1. The exaggerated big discovery
  2. Over-promising
  3. Science is broken

Sounds about right to me. They then go on to discuss how the search for the sensational story or the sensational shaming seems to be the bigger draw at the moment. If everyone focuses on short-term attention to a problem rather than the sometimes boring work of making actual tiny advancements or incremental improvements, what will we have left?

With all of this depressing lead in, you may be wondering how you can tell if any study is legitimate. Well, luckily the Calling Bullshit overlords have a handy page of tricks dedicated to just that! They start with the caveat that any paper anywhere can be wrong, so no list of “things to watch for” will ever catch everything. However, that doesn’t mean that every paper is at equal risk, so there are some ways to increase your confidence that the study you’re seeing is legitimate:

  1. Look at the journal. As we discussed earlier, some journals are more reputable than others. Unless you really know a field pretty well, it can be pretty tough to tell a prestigious legitimate journal from a made-up but official-sounding journal. That’s where journal impact factor can help…it gives you a sense of how important the journal is to the scientific community as a whole. There are different ways of calculating impact factors (see the rough sketch after this list for the classic two-year version), but they all tend to focus on how often the articles published in the various journals end up being cited by others, which is a pretty good way of figuring out how others view the journal. Bergstrom and West also give a link to their Google Chrome extension, the Eigenfactorizer, which color codes journals that appear in PubMed searches based on their Eigenfactor ranking. I downloaded this and spent more time playing around with it than I probably want to admit, and it’s pretty interesting. To give you a sense of how it works, I typed in a few key words from my own field (stem cell transplant/cell therapies) and took a look at the results. Since it’s kind of a niche field, it wasn’t terribly surprising to see that most of the published papers are in yellow or orange journals. The journals under those colors are great and very credible, but most of the papers have little relevance to anyone not in the hem malignancies/transplant world. A few recent ones on CAR-T therapy showed up in red journals, as that’s still pretty groundbreaking stuff. That leads us to the next point…
  2. Compare the claim to the venue. As I mentioned, the most exciting new thing going in the hematologic malignancies world right now is CAR-T and other engineered cell therapies. These therapies hold promise for previously incurable leukemias and lymphomas, and research institutions (including my employer) are pouring a lot of time, money and effort into development. Therefore it’s not surprising that big discoveries are getting published in top-tier journals, as everyone’s interested in seeing where this goes and what everyone else is doing. That’s the normal pattern. Thus, if you see a “groundbreaking” discovery published in a journal that no one’s ever heard of, be a little skeptical. It could be that the “establishment” is suppressing novel ideas, or it could be that the people in the field thought something was off with the research. Spoiler alert: it’s probably that second one.
  3. Are there retractions or questions about the research? Just this week I got pointed to an article about 107 retracted papers from the journal Tumor Biology due to a fake peer review scandal, the second time this has happened to this journal in the last year. No matter what the field, retractions are worth keeping an eye on.
  4. Is the publisher predatory? This can be hard to figure out without some inside knowledge, so check out the resources they link to.
  5. Preprints, Google Scholar, and finding out who the authors are. Good tips and tricks about how to sort through your search results. Could be helpful the next time someone tells you it’s a good idea to eat chocolate ice cream for breakfast.
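A quick aside on that first point: there are several flavors of impact factor, but as a rough sketch, the classic two-year version works like this (the numbers below are made up purely for illustration):

```python
# Rough sketch of a classic two-year journal impact factor.
# All numbers below are made up for illustration.
def two_year_impact_factor(citations_this_year_to_prior_two_years,
                           citable_items_prior_two_years):
    """Citations received this year to articles the journal published in the
    previous two years, divided by the number of citable items it published
    in those two years."""
    return citations_this_year_to_prior_two_years / citable_items_prior_two_years

# e.g. a journal that published 150 citable articles over 2015-2016 and whose
# 2015-2016 articles were cited 600 times during 2017:
print(two_year_impact_factor(600, 150))  # -> 4.0
```

The Eigenfactor rankings the extension uses are more sophisticated than this (they weight citations by where they come from), but the basic idea is the same: journals that get cited more are treated as more central to the field.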

Whew, that’s a lot of ground covered. While it can be disappointing to realize how many instances/ways of committing scientific misconduct there are, it’s worth noting that we currently have more access to more knowledge than at nearly any other time in human history. In week one, we covered that the more opportunities for communication there are, the more bullshit we will encounter. Science is no different.

At the same time, the sciences have a unique opportunity to lead the way in figuring out how to correct for the bullshit being created within their own ranks and to teach the public how to interpret what gets reported. Some of the tools provided this week (not to mention the existence of this class!) point in a hopeful direction and are a great step forward.

Well, that’s all I have for this week! Stay tuned for next week when we cover some more ethics, this time from the perspective of the bullshit-callers as opposed to the bullshit producers.

Week 10 is up! Read it here.

The Bullshit Two-Step

I’ve been thinking a lot about bullshit recently, and I’ve started to notice a bit of a pattern in the way bullshit gets relayed on social media. These days, it seems like bullshit is turning into a multi-step process that goes a little something like this: someone posts/publishes something with lots of nuances and caveats. Someone else translates that thing for more popular consumption, and loses quite a bit of the nuance. This happens with every share until the finished product is almost completely unrecognizable. Finally the story encounters someone who doesn’t agree with it, who then points out there should be more caveats. The sharers/popularizers promptly point at the original creator, and the creator throws their hands up and says “but I clarified those points in the original!!!!”. In other words:

The Bullshit Two-Step: A dance in which a story or research with nuanced points and specific parameters is shared via social media. With each share some of the nuance or specificity is eroded, finally resulting in a story that is almost total bullshit but that no one individually feels responsible for.

Think of this as the science social media equivalent of the game of telephone.

This is a particularly challenging problem for people who care about truth and accuracy, because so often the erosion happens one word at a time. Here’s an example of this happening with a Census Bureau statistic I highlighted a few years ago. Steps 1 and 2 are where the statistic started, step 4 is how it ended up in the press:

  1. The Census Bureau reports that half of all custodial (single) parents have court ordered child support.
  2. The Census Bureau also states (when talking about just the half mentioned in #1) that “In 2009, 41.2 percent of custodial parents received the full amount of child support owed them, down from 46.8 percent in 2007, according to a report released today by the U.S. Census Bureau. The proportion of these parents who were owed child support payments and who received any amount at all — either full or partial — declined from 76.3 percent to 70.8 percent over the period.”
  3. That got published in the New York Times as “In 2009, the latest year for which data are available, only about 41 percent of custodial parents (predominantly women) received the child support they were owed. Some biological dads were deadbeats.” No mention that this only covered half of custodial parents.
  4. This ended up in Slate (citing the Times) as “…. in a substantial number of cases, the men just quit their families. That’s why only 41 percent of custodial parents receive child support.” The “full amount” part got lost, along with all those with no court mandate who may or may not be getting money.

As you can see, very little changed between each piece, but a lot changed by the end. We went from “Half of all custodial parents receive court-ordered child support. Of that half, only 41% have received the full amount this year.” to “only 41% of custodial parents receive child support at all”. We didn’t get there all at once, but we got there. No one’s fully responsible, but no one’s innocent either. It’s the bullshit two-step.
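To make the erosion concrete, here’s the back-of-the-envelope arithmetic using the Census figures quoted above (treating “half” as exactly 50% for simplicity):

```python
# Back-of-the-envelope arithmetic from the Census figures quoted above.
share_with_court_order = 0.5   # about half of custodial parents have a court order
share_receiving_full = 0.412   # 41.2% of *those* received the full amount (2009)

share_of_all_receiving_full = share_with_court_order * share_receiving_full
print(f"{share_of_all_receiving_full:.1%} of all custodial parents received the full amount")
# -> roughly 20.6%, versus the ~41% "of all custodial parents" figure the caveats eroded into
```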

I doubt there’s any one real source for this….sometimes I think these are legitimate errors in interpretation, sometimes people were just reading quickly and missed the caveat, sometimes people are just being sloppy. Regardless, I think it’s interesting to track the pathway and see how easy it is to lose meaning one or two words at a time. It’s also a good case for only citing primary sources for statistics, as it makes it harder to carry over someone else’s error.

 

Calling BS Read-Along Week 8: Publication Bias

Welcome to the Calling Bullshit Read-Along based on the course of the same name from Carl Bergstrom and Jevin West at the University of Washington. Each week we’ll be talking about the readings and topics they laid out in their syllabus. If you missed my intro and want the full series index, click here, or if you want to go back to Week 7 click here.

Well hello Week 8! How’s everyone doing this week? A quick programming note before we get going: the videos for the lectures for the Calling Bullshit class are starting to be posted on the website here. Check them out!

This week we’re taking a look at publication bias and all the problems it can cause. And what is publication bias? As one of the readings so succinctly puts it, publication bias “arises when the probability that a scientific study is published is not independent of its results.” This is a problem both because it skews our view of what the science actually says and because most of us have no way of gauging how extensive an issue it is. How do you go about figuring out what you’re not seeing?

Well, you can start with the first reading, the 2005 John Ioannidis paper “Why Most Published Research Findings are False”. This provocatively titled yet stats-heavy paper does a deep dive into the math behind publication and why our current research practices/statistical analysis methods may lead to lots of false positives reported in the literature. I find this paper so fascinating/important that I actually did a seven-part deep dive into it a few months ago, because there’s a lot of statistical meat in there that I think is important. If that’s TL;DR for you though, here’s the recap: the statistical methods we use to control for false positives and false negatives (alpha and beta) are insufficient to capture all the factors that might make a paper more or less likely to reach an erroneous conclusion. Ioannidis lays out quite a few factors we should be looking at more closely, such as:

  1. Prior probability of a positive result
  2. Sample size
  3. Effect size
  4. “Hotness” of field
  5. Bias

Ioannidis also flips the typical calculation of “false positive rate” or “false negative rate” to one that’s more useful for those of us reading a study: positive predictive value. This is the chance that any given study with a “positive” finding (as in a study that reports a correlation/significant difference, not necessarily a “positive” result in the happy sense) is actually correct. He adds all of the factors above (except hotness of field) into the typical p-value calculation and gives an example table of results (1 − beta is study power, which reflects sample size and effect size; R is the pre-study odds that the relationship being probed is actually true; u is the bias factor):
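If you want to play with the numbers yourself, here’s a small sketch of the positive predictive value formula as I read it from the paper (double-check against Ioannidis’s own tables before leaning on it):

```python
# Sketch of the positive predictive value (PPV) calculation from Ioannidis (2005),
# as I read it -- check the paper's tables before relying on these numbers.
def ppv(alpha, beta, R, u=0.0):
    """Probability that a claimed ("positive") finding is actually true.
    alpha: significance level (type I error rate)
    beta:  type II error rate (power = 1 - beta)
    R:     pre-study odds that a probed relationship is true
    u:     bias -- fraction of would-be-negative analyses reported as positive
    """
    numerator = (1 - beta) * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator

# Well-powered study (power 0.8), alpha 0.05, 1:1 pre-study odds, no bias:
print(round(ppv(0.05, 0.20, 1.0), 2))          # ~0.94
# Same study with modest bias:
print(round(ppv(0.05, 0.20, 1.0, u=0.10), 2))  # ~0.85
# Underpowered long-shot hypothesis with more bias -- PPV falls well below 0.5:
print(round(ppv(0.05, 0.80, 0.1, u=0.20), 2))  # ~0.13
```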

Not included is the “hotness” factor, where he points out that multiple research teams working on the same question will inevitably produce more false positives than just one team will. This is likely true even if you only consider volume of work, before you even get to corner cutting due to competition.

Ultimately, Ioannidis argues that we need bigger sample sizes, more accountability aimed at reducing bias (such as telling others your research methods up front or trial pre-registration), and to stop rewarding researchers only for being the first to find something (this is aimed at both the public and at journal editors). He also makes a good case that fields should be setting their own “pre-study odds” numbers and that researchers should have to factor in how often they should be getting null results.

It’s a short paper that packs a punch, and I recommend it.

Taking the issues a step further is a real-life investigation contained in the next reading, “Selective Publication of Antidepressant Trials and Its Influence on Apparent Efficacy” from Turner et al in the New England Journal of Medicine. They reviewed all the industry-sponsored antidepressant trials that had pre-registered with the FDA, and then reviewed journals to see which ones got published. Since the FDA gets the results regardless of publication, this was a chance to see what made it to press and what didn’t. The results were disappointing, but probably not surprising:

Positive results that showed the drugs worked were almost always published; negative results that showed no difference from placebo often went unpublished. Now the study authors did note they don’t know why this is: they couldn’t differentiate between the “file drawer” effect (where researchers put negative findings in their drawer and don’t publish them) and journals rejecting papers with null results. It seems likely that both are at play. The study authors also found that the positive papers were presented as very positive, whereas some of the negative papers had “bundled” their results.

In defense of the antidepressants and their makers, the study authors did find that a meta-analysis of all the results generally showed the drugs were superior to placebo. Their concern was that the magnitude of the effect may have been overstated: when negative results never see the light of day, the positive results are never balanced out and the drugs appear much more effective than they actually are.

The last reading is “Publication bias and the canonization of false facts” by Nissen et al, a pretty in-depth look at the effects of publication bias on our ability to distinguish between true and false facts. They set out to create a model of how we move an idea from theory to “established fact” through scientific investigation and publication, and then test what publication bias would do to that process. A quick caveat from the end of the paper that I want to give up front: this model is supposed to represent the trajectory of investigations into “modest” facts, not highly political or big/sticky problems. Those beasts have their own trajectory, much of which has little to do with publication issues. What we’re talking about here is the type of fact that would get included in a textbook with no footnote/caveat after 12 or so supportive papers.

They start out by looking at the overwhelming bias towards publishing “positive” findings. Those papers that find a correlation, reject the null hypothesis, or find statistically significant differences are all considered “positive” findings. Almost 80% of all published papers are “positive” findings, and in some fields this is as high as 90%. While hypothetically this could mean that researchers just pick really good questions, the Turner et al paper and the Ioannidis analysis suggest that this is probably not the full story. “Negative” findings (those that fail to reject the null or find no correlation or difference) just aren’t published as often as positive ones. Now again, it’s hard to tell if this is the journals not publishing or researchers not submitting, or a vicious circle where everyone blames everyone else, but here we are.

The paper goes on to develop a model to test how often this type of bias may lead to the canonization of false facts. If negative studies are rarely published and almost no one knows how many might be out there, it stands to reason that at least some “established facts” are merely those theories whose counter-evidence is sitting in a file drawer. The authors base their model on the idea that every positive publication will increase belief, and negative ones will decrease it, but they ALSO assume we are all Bayesians about these things and constantly updating our priors. In other words, our chances of believing in a particular fact as more studies get published probably look a bit like that line in red:

This is probably a good time to mention that the initial model was designed only to look at publication bias; they get to other biases later. They assumed that the outcomes of studies that reach erroneous conclusions are all due to random chance, and that the beliefs in question were based only on the published literature.
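To get a feel for the mechanism, here’s a toy simulation in the same spirit (emphatically not the paper’s actual model; the parameters and thresholds are my own inventions): a Bayesian observer updates belief in a false claim from published results only, while negative results only make it into print some fraction of the time.

```python
# Toy simulation of the mechanism (NOT the paper's exact model): a Bayesian
# observer updates belief in a claim from *published* results only, and
# negative results only get published some fraction of the time.
import random

def simulate(claim_is_true, alpha=0.05, power=0.8, p_publish_negative=0.2,
             prior=0.5, canonize_at=0.99, reject_at=0.01, max_studies=200,
             seed=None):
    rng = random.Random(seed)
    belief = prior
    for _ in range(max_studies):
        # A lab runs a study; the chance of a "positive" result depends on truth.
        positive = rng.random() < (power if claim_is_true else alpha)
        # Positive results always get published; negatives only sometimes.
        if not positive and rng.random() > p_publish_negative:
            continue  # into the file drawer it goes
        # Naive Bayesian update on the published result (ignoring the filter).
        like_true = power if positive else 1 - power
        like_false = alpha if positive else 1 - alpha
        belief = belief * like_true / (belief * like_true + (1 - belief) * like_false)
        if belief >= canonize_at:
            return "canonized"
        if belief <= reject_at:
            return "rejected"
    return "undecided"

# How often does a *false* claim get canonized at different negative-publication rates?
for rate in (0.05, 0.2, 0.5):
    runs = [simulate(claim_is_true=False, p_publish_negative=rate) for _ in range(2000)]
    print(rate, runs.count("canonized") / len(runs))
```

Even in this crude version, dropping the negative-publication rate pushes the false claim toward canonization, which is the qualitative pattern the paper reports with its much more careful model.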

The building of the model was pretty interesting, so you should definitely check that out if you like that sort of thing. Overall though, it is the conclusions that I want to focus on. A few things they found:

  1. True findings were almost always canonized
  2. False findings were canonized more often if the “negative” publication rate was low
  3. High standards for evidence and well-designed experiments are not enough to overcome publication bias against negative results

That last point is particularly interesting to me. We often ask for “better studies” to establish certain facts, but this model suggests that even great studies are misleading if we’re seeing a non-random sample. Indeed, their model showed that if we have a negative publication rate of under 20%, false facts would be canonized despite high evidence standards. This is particularly alarming since the antidepressant study found around a 10% negative publication rate.

To depress us even further, the authors then decided to add researcher bias into the mix and put some p-hacking into play. Below is their graph of the likelihood of canonizing a false fact vs the actual false positive rate (alpha). The lightest line is what happens when alpha = .05 (a common cutoff), and each darker line shows what happens if people are monkeying around to get more positive results than they should:

Figure 8 from “Research: Publication bias and the canonization of false facts”

Well that’s not good.

On the plus side, the paper ends by throwing yet another interesting parameter into the mix. What happens if people start publishing contradictory evidence when a fact is close to being canonized? While it would be ideal if negative results were published in large numbers up front, does last-minute pushback work? According to the model, yes, though not perfectly. This is a ray of hope, because it seems like in at least some fields this is what happens. Negative results that may have been put in the file drawer or considered uninteresting when a theory was new can suddenly become quite interesting if they contradict the current wisdom.

After presenting all sorts of evidence that publishing more negative findings is a good thing, the discussion section of the paper goes into some of the counterarguments. These are:

  1. Negative findings may lead to more true facts being rejected
  2. Publishing too many papers may make the scientific evidence really hard to wade through
  3. Time spent writing up negative results may take researchers away from other work

The model created here predicts that #1 is not true, and #2 and #3 are still fairly speculative. On the plus side, the researchers do point to some good news about our current publication practices that may make the situation better than the model predicts:

  1. Not all results are binary positive/negative. They point out that if results are continuous, you could get “positive” findings that contradict each other. For example, if a correlation was positive in one paper and negative in another paper, it would be easy to conclude later that there was no real effect, even without any “negative” findings to balance things out.
  2. Researchers drop theories on their own. Even if there is publication bias and p-hacking, most researchers are going to figure out that they are spending a lot more time getting some positive results than others, and may drop lines of inquiry on their own.
  3. Symmetry may not be necessary. The model assumes that we need equal certainty to reject or accept a claim, but this may not be true. If we reject facts more easily than we accept them, the model may look different.
  4. Results are interconnected. The model here assumes that each “fact” is independent and only reliant on studies that specifically address it. In reality, many facts have related/supporting facts, and if one of those supporting facts gets disproved it may cast doubt on everything around it.

Okay, so what else can we do? Well, first, recognize the importance of “negative” findings. While “we found nothing” is not exciting, it is important data. They call on journal editors to consider the damage done by dismissing such papers as uninteresting. Next, they point to new journals springing up dedicated just to “negative results” as a good trend. They also suggest that perhaps some negative findings should be published as pre-prints without peer review. This wouldn’t help settle questions, but it would give people a sense of what else might be out there, and it would ease some of the time-commitment problems.

Finally a caveat which I mentioned at the beginning but is worth repeating: this model was created with “modest” facts in mind, not huge sticky social/public health problems. When a problem has a huge public interest/impact (like say smoking and lung cancer links) people on both sides come out of the woodwork to publish papers and duke it out. Those issues probably operate under very different conditions than less glamorous topics.

Okay, over 2000 words later, we’re done for this week! Next week we’ll look at an even darker side of this topic: predatory publishing and researcher misconduct. Stay tuned!

Week 9 is up! Read it here.

5 Things You Should Know About the “Backfire Effect”

I’ve been ruminating a lot on truth and errors this week, so it was perhaps well timed that someone sent me this article on the “backfire effect” a few days ago. The backfire effect is the name given to a psychological phenomenon in which attempting to correct someone’s facts actually increases their belief in their original error. Rather than admit they are wrong when presented with evidence, the narrative goes, people double down. Given the current state of politics in the US, this has become a popular thing to talk about. It’s popped up in my Facebook feed and is commonly cited as the cause of the “post-fact” era.

So what’s up with this? Is it true that no one cares about facts any more? Should I give up on this whole facts thing and find something better to do with my time?

Well, as with most things, it turns out it’s a bit more complicated than that. Here are a few things you should know about the state of this research:

  1. The most highly cited paper focused heavily on the Iraq War. The first paper that made headlines was from Nyhan and Reifler back in 2010, and was performed on college students at a Midwestern Catholic university. They presented some students with stories including political misperceptions, and some with stories that also had corrections. They found that the students who got corrections were more likely to believe the original misperception. The biggest issue this showed up with was whether or not WMDs were found in Iraq. They also tested facts/corrections around the tax code and stem cell research bans, but it was the WMD findings that grabbed all the headlines. What’s notable is that the research was performed in 2005 and 2006, when the Iraq War was heavily in the news.
  2. The sample size was fairly small and composed entirely of college students. One of the primary weaknesses of the first papers (as stated by the authors themselves) is that 130 college students are not really a representative sample. The sample was half liberal and 25% conservative. It’s worth noting that they believe that was a representative sample for their campus, meaning all of the conservatives were in an environment where they were the minority. Given that one of the conclusions of the paper was that conservatives seemed to be more prone to this effect than liberals, that’s an important point.
  3. A new paper with a broader sample suggests the “backfire effect” is actually fairly rare. Last year, two researchers (Porter and Wood) polled 8,100 people from all walks of life on 36 political topics and found that WMDs in Iraq were actually the only issue that provoked a backfire effect. A great Q&A with them can be found here. This is fascinating if it holds up, because it means the original research was mostly confirmed, but any attempt at generalization was pretty wrong.
  4. When correcting facts, phrasing mattered. One of the more interesting parts of the Porter/Wood study was how the researchers described their approach to corrections. In their own words: “Accordingly, we do not ask respondents to change their policy preferences in response to facts–they are instead asked to adopt an authoritative source’s description of the facts, in the face of contradictory political rhetoric”. They heartily reject “corrections” that are aimed at making people change their mind on a moral stance (like, say, abortion) and focus only on facts. Even with the WMD question, they found that the more straightforward and simple the correction statement, the more people of all political persuasions accepted it.
  5. The four study authors are now working together. In an exceptionally cool twist, the authors who came to slightly different conclusions are now working together. The Science of Us gives the whole story here, but essentially Nyhan and Reifler praised Porter and Wood’s work, then said they should all work together to figure out what’s going on. They apparently gathered a lot of data during the height of election season, and hopefully we will see those results in the near future.

I think this is an important set of points, first because it’s heartwarming (and intellectually awesome!) to see senior researchers accepting that some of their conclusions may be wrong and actually working with others to improve their own work. Next, I think it’s important because I’ve heard a lot of people in my personal life commenting that “facts don’t work,” so they basically avoid arguing with those who don’t agree with them. If it’s true that facts DO work as long as you’re not focused on getting someone to change their mind on the root issue, then it’s REALLY important that we know that. It’s purely anecdotal, but I can note that this has been my experience with political debates. Even the most hardcore conservatives and liberals I know will make concessions if you clarify that you know they won’t change their mind on their moral stance.

Calling BS Read-Along Week 7: Big Data

Welcome to the Calling Bullshit Read-Along based on the course of the same name from Carl Bergstrom and Jevin West at the University of Washington. Each week we’ll be talking about the readings and topics they laid out in their syllabus. If you missed my intro and want the full series index, click here, or if you want to go back to Week 6 click here.

Well hello week 7! This week we’re taking a look at big data, and I have to say this is the week I’ve been waiting for. Back when I first took a look at the syllabus, this was the topic I realized I knew the least about, despite the fact that it is rapidly becoming one of the biggest issues in bullshit today. I was pretty excited to get into this week’s readings, and I was not disappointed. I ended up walking away with a lot to think about, another book to read, and a decent amount to keep me up at night.

Ready? Let’s jump right into it!

First, I suppose I should start with at least an attempt at defining “big data”. I like the phrase from the Wiki page here: “Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time.” Forbes goes further and compiles 12 definitions here. If you come back from that rabbit hole, we can move into the readings.

The first reading for the week is “Six Provocations for Big Data” by danah boyd and Kate Crawford. The paper starts off with a couple of good quotes (my favorite: “Raw data is both an oxymoron and a bad idea; to the contrary, data should be cooked with care”) and a good vocab word/warning for the whole topic: apophenia, the tendency to see patterns where none exist. There’s a lot in this paper (including a discussion about what Big Data actually is), but the six provocations the title talks about are:

  1. Automating Research Changes the Definition of Knowledge. Starting with the example of Henry Ford using the assembly line, boyd and Crawford question how radically Big Data’s availability will change what we consider knowledge. If you can track everyone’s actual behavior moment by moment, will we end up de-emphasizing the why of what we do or broader theories of development and behavior? If all we have is a (big data) hammer, will all human experience end up looking like a (big data) nail?
  2. Claims to Objectivity and Accuracy are Misleading. I feel like this one barely needs to be elaborated on (and is true of most fields), but it also can’t be said often enough. Big Data can give the impression of accuracy due to sheer volume, but every researcher will have to make decisions about data sets that can introduce bias. Data cleaning, decisions to rely on certain sources, and decisions to generalize are all prone to bias and can skew results. An interesting example given was the original Friendster (Facebook before there was Facebook for the kids, the Betamax to Facebook’s VHS for the non-kids). The developers had read the research that people in real life have trouble maintaining social networks of over 150 people, so they capped the friend list at 150. Unfortunately for them, they didn’t realize that people wouldn’t use online networks the same way they used networks in real life. Perhaps unfortunately for the rest of us, Facebook did figure this out, and the rest is (short term) history.
  3. Bigger Data are Not Always Better Data. Guys, there’s more to life than having a large data set. Using Twitter data as an example, they point out that large quantities of data can be just as biased (one person having multiple accounts, non-representative user groups) as small data sets, while giving some people false confidence in their results.
  4. Not all Data are Equivalent. With echoes of the Friendster example from the second point, this point flips the script and points out that research done using online data doesn’t necessarily tell us how people interact in real life. Removing data from its context loses much of its meaning.
  5. Just Because it’s Accessible Doesn’t Make it Ethical. The ethics of how we use social media isn’t limited to big data, but it definitely has raised a plethora of questions about consent and what it means for something to be “public”. Many people who would gladly post on Twitter might resent having those same Tweets used in research, and many have never considered the implications of their Tweets being used in this context. Sarcasm, drunk tweets, and tweets from minors could all be used to draw conclusions in a way that wouldn’t be okay otherwise.
  6. Limited Access to Big Data Creates New Digital Divides. In addition to all the other potential problems with big data, the other issue is who owns and controls it. Data is only as good as your access to it, and of course nothing obligates companies who own it to share it, or share it fairly, or share it with people who might use it to question their practices. In assessing conclusions drawn from big data, it’s important to keep all of those issues in mind.

The general principles laid out here are a good framing for the next reading, “The Parable of Google Flu”, an examination of why Google’s Flu Trends algorithm consistently overestimated influenza rates in comparison to CDC reporting. This algorithm was set up to predict influenza rates based on the frequency of various search terms in different regions, but over the 108 weeks examined it overestimated rates 100 times, sometimes by quite a bit. The paper contains a lot of interesting discussion about why this sort of analysis can err, but one of the most interesting factors was Google’s failure to account for Google itself. The algorithm was created/announced in 2009, and some updates were announced in 2013. Lazer et al point out that over that time period Google was constantly refining its search algorithm, yet the model appears to assume that all Google searches are done only in response to external events like getting the flu. Basically, Google was attempting to change the way you search while assuming that no one could ever change the way you search. They call this internal software tinkering “blue team” dynamics, and point out that it’s going to be hell on replication attempts. How do you study behavior across a system that is constantly trying to change behavior? Also considered are “red team” dynamics, where external parties try to “hack” the algorithm to produce the results they want.

Finally we have an opinion piece from a name that seems oddly familiar, Jevin West, called “How to improve the use of metrics: learn from game theory”. It’s short, but got a literal LOL from me with the line “When scientists order elements by molecular weight, the elements do not respond by trying to sneak higher up the order. But when administrators order scientists by prestige, the scientists tend to be less passive.” West points out that when attempting to assess a system that can respond immediately to your assessment, you have to think carefully about what behavior your chosen metrics reward. For example, researchers are currently rewarded for publishing a large volume of papers. As a result, there is concern over the low quality of many papers, since researchers will split their findings into the “least publishable unit” to maximize their output. If the incentives were changed to instead judge researchers on only their 5 best papers, one might expect the behavior to change as well. By starting with the behaviors you want to motivate in mind, you can (hopefully) create a system that encourages those behaviors.
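As a toy illustration of that point (with entirely made-up “quality scores”), compare how a count-everything metric and a best-five metric score a researcher who publishes a few substantial papers versus one who slices the same work into least publishable units:

```python
# Toy illustration of West's point, with made-up quality scores.
# Strategy A: publish 4 substantial papers. Strategy B: split the same work
# into 12 "least publishable units" of lower individual quality.
substantial = [9, 8, 8, 7]
salami_sliced = [3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2]

def count_metric(papers):
    return len(papers)                            # rewards sheer volume

def best_five_metric(papers):
    return sum(sorted(papers, reverse=True)[:5])  # rewards a researcher's best work

for name, papers in [("substantial", substantial), ("salami-sliced", salami_sliced)]:
    print(name, "count:", count_metric(papers), "best-5:", best_five_metric(papers))
# Under the count metric, slicing wins (12 vs 4); under best-5, it loses (14 vs 32).
```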

In addition to those readings, there are two recommended readings that are worth noting. The first is Cathy O’Neil’s Weapons of Math Destruction (a book I’ve started but not finished), which goes into quite a few examples of problematic algorithms and how they affect our lives. Many of O’Neil’s examples get back to point #6 from the first paper in ways most of us don’t consider. Companies maintaining control over their intellectual property seems reasonable, but what if you lose your job because your school system bought a teacher-ranking algorithm that said you were bad? What’s your recourse? You may not even know why you got fired or what you can do to improve. What if the algorithm is using a characteristic that it’s illegal or unethical to consider? Here O’Neil points to sentencing algorithms that give harsher jail sentences to those with family members who have also committed a crime. Because the algorithm is supposedly “objective”, it gets away with introducing facts (your family members’ involvement in crimes you didn’t take part in) that a prosecutor would have trouble getting by a judge under ordinary circumstances. In addition, some algorithms can help shape the very future they say they are trying to predict. Why are Harvard/Yale/Stanford the best colleges in the US News rankings? Because everyone thinks they’re the best. Why do they think that? Look at the rankings!

Finally, the last paper is from Peter Lawrence: “The Mismeasurement of Science”. In it, Lawrence lays out an impassioned case that the current structure around publishing causes scientists to spend too much time on the politics of publication and not enough on actual science. He also questions heavily who is rewarded by such a system, and whether those are the right people. It reminded me of another book I’ve started but not finished yet, “Originals: How Non-Conformists Move the World”. In that book Adam Grant argues that if we use success metrics based on past successes, we will inherently miss those who might have a chance at succeeding in new ways. Nassim Nicholas Taleb makes a similar case in Antifragile, where he argues that some small percentage of scientific funding should go to “Black Swan” projects….the novel, crazy, controversial, destined-to-fail type research that occasionally produces something world-changing.

Whew! A lot to think about this week and these readings did NOT disappoint. So what am I taking away from this week? A few things:

  1. Big data is here to stay, and with it come ethical and research questions that may require new ways of thinking about things.
  2. Even with brand new ways of thinking about things, it’s important to remember the old rules and that many of them still apply
  3. A million-plus data points does not equal scientific validity
  4. Measuring systems that can respond to being measured should be approached with some idea of what you’d like that response to be, along with a plan for what to change if you get unintended consequences
  5. It is increasingly important to scrutinize sources of data, and to remember what might be hiding in “black box” algorithms
  6. Relying too heavily on the past to measure the present can increase the chances you’ll miss the future.

That’s all for this week, see you next week for some publication bias!

Week 8 is up! Read it here.

I Got a Problem, Don’t Know What to do About It

Help and feedback request! This past weekend I encountered an interesting situation: I discovered that a study I had used to help make a point in several posts over the years has come under some scrutiny (full story at the bottom of the post). I have often blogged about meta-science, but this whole incident got me thinking about meta-blogging, and what the responsibility of someone like me is when a study I’ve leaned on turns out not to be as good as I thought it was. I’ve been poking around the internet for a few days, and I really can’t find much guidance on this.

I decided to put together a couple of quick poll questions to gauge people’s feelings on this. Given that I tend to have some incredibly savvy readers, I would also love to hear more lengthy opinions, either in the comments or sent to me directly. The polls will stay open for a month, and I plan on doing a write-up of the results. The goal of these poll questions is to assess a starting point for error correction, as I completely acknowledge the specifics of a situation may change people’s views. If you have strong feelings about what would make you take error correction more or less seriously, please leave them in the comments!

Why I’m asking (aka the full story)

This past weekend I encountered a rather interesting situation that I’m looking for some feedback on. I was writing my post for week 6 of the Calling BS read-along, and remembered an interesting study that found that people were more likely to find stories with “science pictures” or graphs credible than those that were just text. It’s a study I had talked about in one of my Intro to Internet Science posts, and I have used it in presentations to back up my point that graphs are something you should watch closely. Since the topic of the post was data visualization and the study seemed relevant, I included it in the intro to my write-up.

The post had only been up for a few hours when I got a message from someone tipping me off that the lab the study was from was under some scrutiny for some questionable data/research practices. They thought I might want to review the evidence and consider removing the reference to the study from my post. While the study I used doesn’t appear to be one of the ones being reviewed at the moment, I did find the allegations against the lab concerning. Since the post didn’t really change without the citation, I edited the post to remove the citation and replaced it with a note alerting people the paragraph had been modified. I put a full explanation at the bottom of the post that included the links to a summary of the issue and the research lab’s response.

I didn’t stop thinking about it though. There’s not much I could have done about using the study originally….I started citing it almost a full year before concerns were raised, and the “visuals influence perception” point seemed reasonable. I’ll admit I missed the story about the concerns with the research group, but even if I’d seen it I don’t know if I would have remembered that they were the ones who had done that study. Now that I know though, I’ve been mulling over what the best course of action is in situations like this. As someone who at least aspires to blog about truth and accuracy, I’ve always felt that I should watch my own blogging habits pretty carefully. I didn’t really question removing the reference, as I’ve always tried to update/modify things when people raise concerns. I also don’t modify posts after they’ve been published without noting that I’ve done so, other than fixing small typos. I feel good about what I did with that part.

What troubled me more was the question of “how far back do I go?” As I mentioned, I know I’ve cited that study previously. I know of at least one post where I used it, and there may be more. Given that my Intro to Internet Science series is occasionally assigned by high school teachers, I feel I have some obligation to go a little retro on this.

 

Current hypothesis (aka my gut reaction)

My gut reaction here is that I should probably start keeping an updates/corrections/times-I-was-wrong page just to discuss these issues. While I think notations should be made in the posts themselves, some of them warrant their own discussion. If I’m going to blog about where others go wrong, having a dedicated place to discuss where I go wrong seems pretty fair. I also would likely put in some links to my “from the archives” columns to have a repository for posts that have more updated versions. Not only would this give people somewhere easy to look for updates and give some transparency to my own process and weaknesses, but it would also probably give me a better overview of where I tend to get tripped up and help me improve. If I get really crazy I might even start doing root cause analysis investigations into my own missteps. Thoughts on this or examples of others doing this would be appreciated.