Who Votes When? Untangling Non-Citizen Voting

Right after the election, most people in America saw or heard about this Tweet from then President elect Trump:

I had thought this was just random bluster (on Twitter????? Never!), but then someone sent me  this article. Apparently that comment was presumably based on an actual study, and the study author is now giving interviews. It turns out he’s pretty unhappy with everyone….not just with Trump, but also with Trump’s opponents who claim that no non-citizens voted. So what did his study actually say? Let’s take a look!

Some background: The paper this is based on is called “Do Non-Citizens Vote in US Elections” by Richman et all and was published back in 2014. It took data from a YouGov survey and found that 6.4% of non-citizens voted in 2008 and 2.2% voted in 2010. Non-citizenship status was based on self report, as was voting status, though the demographic data of participants was checked with that of their stated voting district to make sure the numbers at least made sense.

So what stood out here? A few things:

  1. The sample size While the initial survey of voters was pretty large (over 80,000 between the two years) the number of those identifying themselves as non-citizens was rather low: 339 and 489 for the two years. There were a total of 48 people who stated that they were not citizens and that they voted. As a reference, it seems there are about 20 million non-citizens currently residing in the US.
  2. People didn’t necessarily know they were voting illegally One of the interesting points made in the study was that some of this voting may be unintentional. If you are not a citizen, you are never allowed to vote in national elections even if you are a permanent resident/have a green card. The study authors wondered if some people  didn’t know this, so they analyzed the education levels of those non-citizens who voted. It turns out non-citizens with less than a high school degree are more likely to vote than those with more education. This actually is the opposite trend seen among citizens AND naturalized citizens, suggesting that some of those voters have no idea what they’re doing is illegal.
  3. Voter ID checks are less effective than you might think If you’re first question up on reading #2 was “how could you just illegally vote and not know it?” you may be presuming your local polling place puts a little more in to screening people than they do. According to the participants in this study, not only were non-citizens allowed to register and cast a vote, but a decent number of them actually passed an ID check first. About a quarter of non-citizen voters said they were asked for ID prior to voting, and 2/3rds of those said they were then allowed to vote. I suspect this issue is that most polling places don’t actually have much to check their information against. Researching citizenship status would take time and money that many places just don’t have. Another interesting twist to this is that social desirability bias may kick in for those who don’t know voting is illegal. Voting is one of those things more people say they do than actually do, so if someone didn’t know they couldn’t legally vote they’d be more likely to say they did even if they didn’t. Trying to make ourselves look good is a universal quality.
  4. Most of the illegal voters were white Non-citizen voters actually tracked pretty closely with their proportion of the population, and about 44% of them were white. The next most common demographic was Hispanic at 30%, then black, then Asian. In terms of proportion, the same percent of white non-citizens voted as Hispanic non-citizens.
  5. Non-citizens are unlikely to sway a national election, but could sway state level elections When Trump originally referenced this study, he specifically was using it to discuss national popular vote results. In the Wired article, they do the math and find that even if all of the numbers in the study bear out it would not sway the national popular vote. However, the original study actually drilled down to a state level and found that individual states could have their results changed by non-citizen voters. North Carolina and Florida would both have been within the realm of mathematical possibility for the 2008 election, and for state level races the math is also there.

Now, how much confidence you place in this study is up to you. Given the small sample size, things like selection bias and non-response bias definitely come in to play. That’s true any time you’re trying to extrapolate the behavior of 20 million people off of the behavior of a few hundred. It is important to note that the study authors did a LOT of due diligence attempting to verifying and reality check the numbers they got, but it’s never possible to control for everything.

If you do take this study seriously, it’s interesting to note what the authors actually thought the most effective counter-measure against non-citizen voting would be: education. Since they found that low education levels were correlated with increased voting and that poll workers rarely turned people away, they came away from this study with the suggestion that simply doing a better job of notifying people of voting rules might be just as effective (and cheaper!) than attempting to verify citizenship. Ultimately it appears that letting individual states decide on their own strategies would also be more effective than anything on the federal level, as different states face different challenges. Things to ponder.


Men, Masculinity Threats and Voting in 2016

Back in February I did a post called Women, Ovulation and Voting in 2016, about various researchers attempts to prove or disprove a link between menstrual cycles and their voting preferences. As part of that critique, I had brought up a point that Andrew Gelman made about the inherently dubious nature of anyone claiming to find a 20+ point swing in voting preference. People just don’t tend to vary their party preference that much over anything, so they claim on it’s face is suspect.

I was thinking of that this week when I saw a link to this HBR article from back in April that sort of gender-flips the ovulation study.  In this research (done in March), they asked men whether they would vote for Trump or Clinton if the election were today. For half of the men they first asked them a question about how much their wives made in comparison to them. For the other half, they got that question after they’d stated their political preference. The question was intended to be a “gender prime” to get men thinking about gender and present a threat to their sense of masculinity. Their results showed that men who had to think about gender roles prior to answering political preference showed a 24 point shift in voting patterns. The “unprimed” men (who were asked about income after they were asked about political preference) had preferred Clinton by 16 points, and the “primed” men preferred Trump by 8 points. If the question was changed to Sanders vs Trump, the priming didn’t change the gap at all. For women, being “gender primed” actually increased support for Clinton and decreased support for Trump.

Now given my stated skepticism of 20+ point swing claims, I decided to check out what happened here. The full results of the poll are here, and when I took a look at the data there was one thing that really jumped out at me: a large percent of the increased support for Trump came from people switching from “undecided/refuse to answer/don’t know” to “Trump”.  Check it out, and keep in mind the margin of error is +/-3.9:


So basically men who were primed were more likely to give an answer (and that answer was Trump) and women who were primed were less like to answer at all. For the Sanders vs Trump numbers, that held true for men as well:


In both cases there was about a 10% swing in men who wouldn’t answer the question when they were asked candidate preference first, but would answer the question if they were “primed” first. Given the margin of error was +/-3.9 overall, this swing seems to be the critical factor to focus on…..yet it was not mentioned in the original article. One could argue that hearing about gender roles made men get more opinionated, but isn’t it also plausible the order of the questions caused a subtle selection bias? We don’t know how many men hung up on the pollster after being asked about their income with respect to their wives, or if that question incentivized other men to stay on the line. It’s interesting to note that men who were asked about their income first were more likely to say they outearned their wives, and less likely to say they earned “about the same” as them…..which I think at least suggests a bit of selection bias.

As I’ve discussed previously, selection bias can be a big a big deal…and political polls are particularly susceptible to it. I mentioned Andrew Gelman previously, and he had a great article this week about his research on “systemic non-response” in political polling. He took a look at overall polling swings, and used various methods to see if he could differentiate between changes in candidate perception and changes in who picked up the phone. His data suggests that about 66-85% of polling swings are actually due to a change in the number of Republicans and Democrats who are willing to answer pollsters questions as opposed to a real change in perception. This includes widely reported on phenomena such as “post convention bounce” or “post debate effects”. This doesn’t mean the effects studied in these polls (or the studies I covered above) don’t exist at all, but that they may be an order of magnitude more subtle than suggested.

So whether you’re talking about ovulation or threats to male ego, I think it’s important to remember that who answers is just as important as what they answer. In this case 692 people were being used to represent the 5.27 million New Jersey voters, so any the potential for bias is, well, gonna be yuuuuuuuuuuuuuuuuuuge.

Probability Paper and Polling Corrections

This is another post from my grandfather’s newsletter (intro to that here). When I first mentioned his newsletter, I mentioned that he manufactured probability paper for people who needed to do advanced calculations in the days before computers. I found some cool examples while looking through the 1975 issues recently, so I thought I’d show them off here.  First was this paper, used to determine what the “true” polling percentage is when you have a lot of undecided voters. He was using an equation he called Seder’s method to adjust the pollsters predictions:


To use it, you find the percent of people who responded to the survey with a definite answer as the x-axis, then look to the right to find the percentage of people who made a particular choice. Once you have that data point, you draw a line to the left (the traditional y-axis to find out how many people will probably end up going with a particular choice once they have to make one.

I decided to try it based on a recent Quinnipiac presidential election poll (from June 29th, 2016). This has Clinton polling at 39%, Trump at 37%, Johnson at 8% and Stein at 4%, with 12% answering some combination of Unknown/Undecided/Maybe Won’t Vote/Maybe someone else. Here what this would look like filled out:


As you can see, it adjusts everyone a little upward, with a little more going toward those polling with the low numbers. Whether or not this is the correct adjustment is up for debate, but it’s a fun little tool to use for those who don’t like equations.

This particular one was actually one of his easy ones. Here’s the paper for getting confidence intervals for Bernoulli probabilities:


It looks complicated, but compared to doing it by hand, this was MUCH easier. To show how much time we have on our hands now that computers do the complicated stuff, check out my take on the Bernoulli distribution here. That’s what I do while SAS is importing my files. Ah, technology.