Christmas Gifts For Stats and Data People

Well howdy! Only 11 days until Christmas, and I have in no way, shape, or form finished my shopping. I’m only ever good at coming up with gift ideas when they’re not actually needed, so I thought now was the perfect time to make a list of things you could get a statistician/data person in your life, if you were so inclined. Of course any of the books on my reading list here are pretty good, but if you’re looking for something more specific, read on!

  1. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. This was my December book, and it was phenomenal. An amazing history of statistical thought in science and the personalities that drove it. If you ever wanted to know who the “student” was in “Student’s t-distribution”, this is the book for you. Caveat: if you don’t understand that previous sentence, I’d skip this one.
  2. Statistical dinosaurs. You had me at “Chisquareatops“. Or maybe “Stegonormalus“.
  3. A pound of dice. Or cards. Or bingo balls.  Because you never know when you may have to illustrate probability theory on the fly. (Bonus: these “heroes of science” playing cards are extra awesome)
  4. For the music lover: prints of pop song infographics. Data visualization taken to the next level.
  5. Art supplies: maybe an x-y axis stamp or grid Post-it notes?
  6. 2016 year in review: the best infographics of 2016 or the best mathematics writing of 2016. The first one is out already, but you’ll have to wait until March for that second one.

So Why ARE Most Published Research Findings False? Bias bias bias

Welcome to “So Why ARE Most Published Research Findings False?”, a step by step walk through of the John Ioannidis paper “Why Most Published Research Findings Are False”. It probably makes more sense if you read this in order, so check out the intro here, Part 1 here, Part 2 here, and Part 3 here.

Alright folks, we’re almost there. We’ve covered a lot of mathematical ground here, and last week we ended with a few corollaries. We’ve seen the effects of sample size, study power, effect size, pre-study odds, bias and the work of multiple teams. We’ve gotten thoroughly depressed, and we’re not done yet. There’s one more conclusion we can draw, and it’s a scary one. Ioannidis holds nothing back, and he just flat out calls this section “Claimed Research Findings May Often Be Simply Accurate Measures of the Prevailing Bias“. Okay then.

To get to this point, Ioannidis lays out a couple of things:

  1. Throughout history, there have been scientific fields of inquiry that later proved to have no basis….like phrenology for example. He calls these “null fields”.
  2. Many of these null fields had positive findings at some point, and in a number great enough to sustain the field.
  3. Given the math around positive findings, the effect sizes in false positives due to random chance should be fairly small.
  4. Therefore, large effect sizes discovered in null fields pretty much just measure the bias present in those fields….aka that “u” value we talked about earlier.

You can think about this like a coin flip. If you flip a fair coin 100 times, you know you should get about 50 heads and 50 tails. Given random fluctuations, you probably wouldn’t be too surprised if you ended up with a 47-53 split or even a 40-60 split. If you ended up with an 80-20 split however, you’d get uneasy. Was the coin really fair?

The same goes for scientific studies. Typically we look at large effect sizes as a good thing. After all, where there’s smoke there’s fire, right? However, Ioannidis points out that large effect sizes are actually an early warning sign for bias. For example, let’s say you think that your coin is weighted a bit, and that you will actually get heads 55% of the time you flip it. You flip it 100 times and get 90 heads. You can react in one of 3 ways:

  1. Awesome, 90 is way more than 55 so I was right that heads comes up more often!
  2. Gee, there’s a 1 in 73 quadrillion chance that 90 heads would come up if this coin were fairly weighted. With the slight bias I thought was there, the chance of getting the results I did is still about 1 in 5 trillion. I must have underestimated how biased that coin was. (There’s a quick check of these numbers below.)
  3. Crap. I did something wrong.

You can guess which ones most people go with. Spoiler alert: it’s not #3.
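For the curious, here’s that quick sanity check. This is just a minimal sketch using Python’s scipy; the exact “1 in X” figures depend a bit on how you set up the tail probability, so treat them as ballpark:

```python
from scipy.stats import binom

n, observed = 100, 90

# Chance of seeing 90 or more heads in 100 flips of a fair coin (p = 0.5)
p_fair = binom.sf(observed - 1, n, 0.5)

# Same question for the slightly weighted coin we hypothesized (p = 0.55)
p_weighted = binom.sf(observed - 1, n, 0.55)

print(f"fair coin:    about 1 in {1 / p_fair:,.0f}")      # on the order of 1 in 73 quadrillion
print(f"55% weighted: about 1 in {1 / p_weighted:,.0f}")  # still vanishingly small
```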

The whole “an unexpectedly large effect size should make you nervous” phenomenon is counterintuitive, but I’ve actually blogged about it before. It’s what got Andrew Gelman upset about that study that found that 20% of women were changing their vote around their menstrual cycle, and it’s something I’ve pointed out about the whole “25% of men vote for Trump if they’re primed to think about how much money their wives make” claim. Effect sizes of that magnitude shouldn’t be cause for excitement, they should be cause for concern. Unless you are truly discovering a previously unknown and overwhelmingly large phenomenon, there’s a good chance some of that number was driven by bias.

Now of course, if your findings replicate, this is all fine, you’re off the hook. However, if they don’t, the size of your effect is really just a measure of your own bias. Put another way, you can accidentally find a 5% vote swing that doesn’t exist just because random chance is annoying like that, but to get numbers in the 20-25% range you had to put some effort in.

As Ioannidis points out, this isn’t even a problem with individual researchers so much as with how we all view science. Big splashy new results are given a lot of attention, and there is very little criticism if the findings fail to replicate at the same magnitude. This means that researchers have nothing but incentives to make the effect sizes they report as big as possible. In fact, Ioannidis has found (in a different paper) that about half the time the first paper published on a topic shows the most extreme value ever found. That is far more often than we would expect if it were up to random chance. Ioannidis argues that by figuring out exactly how far these effect sizes deviate from chance, we can actually measure the level of bias.

Again, not a problem for those researchers whose findings replicate, but something to consider for those whose findings don’t. We’ll get into that next week, in our final segment: “So What Can Be Done?”.

5 Things to Remember About Prescription Drugs

This past week, I had the tremendous pleasure of seeing one of my brother’s articles on the cover of the December issue of Christianity Today as part of a feature on painkillers. While my brother has done a lot of writing for various places over the years, his article “How Realizing My Addiction Had Chosen Me Began My Road to Recovery” was particularly special to see. In it, he recounts his story of getting addicted to painkillers after a medical crisis, and details his road to recovery. Most of the story is behind a paywall, but if you want a full copy leave me a comment or use the get in touch form and I’ll send you the Word document.

Since I was intimately involved with all of the events relayed in the article, it’s pretty self-evident why I enjoyed reading it as much as I did. On a less personal note though, I thought he did a great job bringing awareness to an often overlooked pathway to addiction: a legitimate medical crisis. My brother’s story didn’t start at a party or with anything even remotely approaching “a good time”. His story started in the ER, moved to the ICU, and had about 7 months of not being able to eat food by mouth at the end. His bout with necrotizing pancreatitis was brutal, and we were on edge for several months as his prognosis shifted between “terrible” and “pretty bad”.

Through all of that, the doctors had made the decision to put him on some major painkillers. Months later, when things were supposed to be improving, he found that his stomach was still having trouble, and went back to his doctor for more treatment. It was only then that he was told he had become an addict. The drugs that had helped save his life were now his problem.

Obviously he tells the rest of the story (well, all of the story) better than I do, so you should really go read it if you’re interested. What I want to focus on is the prescribing part of this. When talking about things like “the opioid crisis”, it’s tempting for many people to label these drugs as “good” or “bad”, and I think that misses the point (note to my brother, who will read this: you didn’t make this mistake. I’m just talking in general here. Don’t complain about me to Mom. That whole “stop sciencing at your brother” lecture is getting old). There’s a lot that goes into the equation of whether or not a drug should be prescribed or even approved by the FDA, and a shift in one factor can change the whole equation. Also, quick note: I’m covering ideal situations here. I am not covering when someone just plain screws up, though that clearly does happen:

  1. Immediate risk (acute vs chronic condition). In the middle of a crisis when life is on the line, it shouldn’t surprise anyone that “keeping you alive” becomes the primary endpoint. This should be obvious, but after a few years of working in an ER, I realized it’s not always so clear to people. For example, you would not believe the number of people who come into the ER unconscious after a car accident or something and later come back to complain that their clothes were cut off. In retrospect it feels obvious to them that a few extra minutes could have been taken to preserve their clothing, but the doctors who saw them in that moment almost always feel differently. Losing even one life because you were attempting to preserve a pair of jeans is not something most medical people are willing to risk. A similar thing happens with medications. If there is a concern your life is in danger, the math is heavily weighted in favor of throwing the most powerful stuff we have at the problem and figuring out the consequences later. This is what kicked off the situation my brother went through. At points in his illness they put his odds of making it through the night at 50-50. Thinking about long term consequences was a luxury he wasn’t always able to afford.
  2. Side effects vs effect of condition. The old saying “the cure is worse than the disease” speaks to this one, and sometimes it’s unfortunately true. Side effects of a drug always have to be weighed against the severity of the condition they are treating. The more severe the condition, the more severe the allowable side effects. A medication that treats the common cold has to be safer than a medication that treats cancer. However, just because the side effects are less severe than the condition doesn’t mean they are okay or can’t be dangerous themselves (again, think chemotherapy for cancer), but for severe conditions trade-offs are frequently made. My brother had the misfortune of having one of the most painful conditions we know of, and the pain would have literally overwhelmed his system if nothing had been done. Prescription drugs don’t appear out of nowhere, and always must be compared to what they are treating when deciding if they are “good” or “bad”.
  3. Physical vs psychological/emotional consequences. One of the more interesting grey areas of prescription drug assessment is the trade-off between physical consequences and psychological/emotional consequences. For better or worse, physical consequences are almost always given a higher weight than psychological/emotional ones. This is one of the reasons we don’t have male birth control. For women, pregnancy is a physical health risk; for men, it’s not. If hormonal birth control increases a woman’s chances of getting blood clots, that’s okay as long as it’s still less impactful than pregnancy. For men however, there’s no such physical consequence and therefore the safety standards are higher. The fact that many people might actually be willing to risk physical consequences to prevent the emotional/psychological/financial consequences isn’t given as much weight as you might think. The fact that my brother got a doctor who helped him manage both of these was fantastic. His physical crutch had become a mental and emotional crutch, and the beauty of his doctor was that he didn’t underestimate the power of that.
  4. Available alternatives. Drugs are not prescribed in a vacuum, and it’s important to remember they are not the be-all and end-all of care. If other drugs (or lifestyle changes) are proven to work just as well with fewer side effects, those may be recommended. In the case of my brother, his doctor helped him realize that mild pain was actually better than the side effects of the drugs he was taking. For those with chronic back pain, yoga may be preferable. This of course is also one of the arguments for things like legalized marijuana, as it’s getting harder to argue that its side effects are worse than those of opioids.
  5. Timing (course of condition and life span). As you can see from 1-4 above, there are lots of balls in the air when it comes to prescribing various drugs. Most of these factors actually vary over time, so a decision that is right one day may not be right the next. This was the crux of my brother’s story. Prescribing him high doses of narcotics was unequivocally the right choice when he initially got sick. However, as time went on the math changed and the choice became different. One of the keys to his recovery was having his doctor clearly explain that this was not a binary….the choice to take the drug was right for months, and then it became wrong. No one screwed up, but his condition got better and the balance changed. This can also come into play over the broader lifespan….treatments given to children are typically screened more carefully for long term side effects than those given to the elderly.

Those are the basic building blocks right there. As I said before, when one shifts, the math shifts. For my brother, I’m just glad the odds all worked in his favor.

So Why ARE Most Published Research Findings False? The Corollaries

Welcome to “So Why ARE Most Published Research Findings False?”, a step by step walk through of the John Ioannidis paper “Why Most Published Research Findings Are False”. It probably makes more sense if you read this in order, so check out the intro here, Part 1 here, and Part 2 here.

Okay, first a quick recap: Up until now, Ioannidis has spent most of the paper providing a statistical justification for considering not just study power and p-values, but also pre-study odds, bias measures, and the number of teams working on a problem when trying to figure out if a published finding is true or not. Because he was writing a scientific paper and not a blog post, he did a lot less editorializing than I did when I was breaking down what he did. In this section he changes all that, and he goes through a point by point breakdown of what this all means with a set of 6 corollaries. The words here in bold are his, but I’ve simplified the explanations. Some of this is a repeat from the previous posts, but hey, it’s worth repeating.

Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true. In part 1 and part 2, we saw a lot of graphs that showed good study power had a huge effect on result reliability. Larger sample sizes = better study power.

Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. This is partially just intuitive, but also part of the calculation for study power. Larger effect sizes = better study power. Interestingly, Ioannidis points out here that given all the math involved, any field looking for effect sizes smaller than 5% is pretty much never going to be able to confirm their results.

Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true. That R value we talked about in part 1 is behind this one. Pre-study odds matter, and fields that are generating new hypotheses or exploring new relationships are always going to have more false positives than studies that replicate others or meta-analyses.

Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. This should be intuitive, but it’s often forgotten. I work in oncology, and we tend to use a pretty clear cut end point for many of our studies: death. Our standards around this are so strict that if you die in a car crash less than 100 days after your transplant, you get counted in our mortality statistics. Other fields have more wiggle room. If you are looking for mortality OR quality of life OR reduced cost OR patient satisfaction, you’ve quadrupled your chance of a false positive.

Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. This one’s pretty obvious. Worth noting: he points out “trying to get tenure” and “trying to preserve one’s previous findings” are both sources of potential bias.

Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true. This was part of our discussion last week. Essentially it’s saying that if you have 10 people with tickets to a raffle, the chances that one of you wins is higher than the chances that you personally win. If every test of a relationship that isn’t really there has a 5% chance of coming up positive, having multiple teams work on the same question will inevitably lead to more false positives.

Both before and after listing these 6 things out, Ioannidis reminds us that none of these factors are independent or isolated. He gives some specific examples from genomics research, but then also gives this helpful table. To refresh your memory, the 1-beta column is study power (influenced by sample size and effect size), R is the pre-study odds (which vary by field), u is bias, and the “PPV” column over on the side there is the chance that a paper with a positive finding is actually true. Oh, and “RCT” is “Randomized Controlled Trial”:

I feel a table of this sort should hang over the desk of every researcher and/or science enthusiast.
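If you’re wondering where the PPV numbers in a table like that come from, here’s a minimal sketch of the with-bias PPV formula as I read it off the paper’s tables. The specific alpha, beta, R and u values below are just one illustrative row (a well-powered RCT with 1:1 pre-study odds and a little bias):

```python
def ppv(alpha, beta, R, u=0.0):
    """Chance a claimed positive finding is true, given:
    alpha = type 1 error rate, beta = type 2 error rate (power = 1 - beta),
    R = pre-study odds of a true relationship, u = bias."""
    numerator = (1 - beta) * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator

# Well-powered RCT: power = 0.80, pre-study odds 1:1, bias = 0.10
print(round(ppv(alpha=0.05, beta=0.20, R=1.0, u=0.10), 2))  # roughly 0.85

# Same study with no bias at all
print(round(ppv(alpha=0.05, beta=0.20, R=1.0), 2))  # roughly 0.94
```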

Now all this is a little bleak, but we’re still not entirely at the bottom. We’ll get to that next week.

Part 4 is up! Click here to read it.

5 Examples of Linear Relationships (That are Only Partially Linear)

Man, that title probably isn’t winning me any clickbait awards.

Anyway, I was catching up on my blog reading this past weekend, and I was intrigued by the Assistant Village Idiot’s “Conservation of Fear” post. In it, he mentions the idea that most of us probably have some sort of baseline disposition towards the world, and that circumstances aren’t always as important as we think they are. We frequently assume that as good things increase so does our mood, and as bad things increase our mood goes lower, but he asserts this may not always be the case. Of course, this being the AVI, he immediately walks that assertion back a bit and points out that some circumstances really are important, and that fixing those can make a big difference in mood. So basically as some good circumstances increase we could get a nice linear gain in happiness, but at a certain point the relationship probably cuts out.

This uneven effect issue is not actually all that uncommon in human behavior. While generally people want to find (or recite) nice linear relationships between things (i.e. x causes y), we often run into situations where things aren’t that simple. Sometimes x makes y go up….but then you get to a certain level of x and suddenly x is totally irrelevant to y. Sometimes above a certain level x makes y go down. You get the picture. Or maybe you don’t. Regardless, here are some examples!

  1. Income and Personal Happiness. We all know the famous saying “money can’t buy happiness”. However, as anyone who has ever gone without money can tell you, that’s crap. Well, partial crap. A few years ago an investment group did some analysis and figured out that more money does make you happier, but only up to a certain household income. After that, it’s pretty much a wash. Overall for the US the cutoff was $75K. Basically an increase in salary from $30K to $40K will make you happier, but one from $110K to $120K doesn’t have the same effect. The linear relationship holds for low incomes, but not high ones. For the curious, here’s the state-to-state breakdown. If you think about it, this makes a lot of sense. If money is a struggle, it affects your happiness. Once you’ve stopped struggling, it stops having the same effect. So basically it’s more accurate to say that money can’t buy happiness, but a lack of money sure can stress you out.
  2. GDP and Subjective Well Being. Related to #1, but slightly different: it’s not just your personal income that helps your well being, your country’s GDP can play a role too. Again though, only to a point. Check out this graph from Our World in Data. So countries that struggle to develop do take a toll on their citizens, but at some point development stops yielding returns in well being. It would be interesting to see if the effect of personal wealth varied with country GDP, but alas I can’t find that data.
  3. Sexual frequency and housework divisions. If my ranting about linear relationships that aren’t entirely linear sounds familiar, it’s because I’ve brought this up before in my (oft Googled, less often read) Sex, Models and Housework post and the follow up. My first post was about a study that caused a stir when it claimed that men who did more housework had less sex. The follow up covered a study that rejected a linear model, and instead grouped respondents into “traditional”, “egalitarian” and “counter-cultural” couples. Despite the claims of the original study, they found that the relationship was only really linear within the groups: three different linear relationships in all. Egalitarian couples had the most sex and satisfaction, traditional couples had slightly less, and counter-cultural couples did the worst. The model worked much better when the three groups were treated separately than when they were treated as one continuous group.
  4. Age at first marriage. The conventional wisdom states that waiting a bit to get married is good for you. It turns out that’s true, up to a point. For each year you wait to get married past the age of 20, your chance of divorce goes down 11%. However, once you get to 32, your chance of divorce actually starts going back up. Basically the divorce risk curve is now a parabola:
  5. Expenses and income. I found a couple of examples of this in this technical paper for statisticians on how to handle partially linear logistic regressions. Basically, the consumption of many household items goes up with household income until a certain point, after which it stays pretty steady. Things like gas, electricity, and many consumer goods fall in this category. Interestingly, overall income and expenses actually increase sort of linearly with age from 20-44, then decrease sort of linearly with age from 45-75+.

This is a good thing to watch out for in general, as it makes summarizing the trends a little trickier. If you leave out a key modifier or the limits, you could end up giving someone a wrong impression or encouraging people to extrapolate beyond the scope of the model, and that will make the statistician in your life very sad. Know your limits, people, and the limits of your data set!
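If you want to see what “linear up to a threshold, then flat” looks like in model form, here’s a minimal sketch. The data and the $75K-style knot point are completely made up for illustration; the idea is just that you fit one slope below the threshold and let the slope change above it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: "happiness" rises with income up to about $75K, then flattens out
income = rng.uniform(20, 150, 200)                       # household income, in $K
happiness = np.minimum(income, 75) / 10 + rng.normal(0, 0.5, 200)

knot = 75  # assumed threshold; in practice you'd estimate this or scan for it

# Hinge (piecewise linear) regression: intercept, slope below the knot,
# and a change in slope that kicks in above the knot
X = np.column_stack([np.ones_like(income), income, np.maximum(income - knot, 0)])
coefs, *_ = np.linalg.lstsq(X, happiness, rcond=None)
intercept, slope_below, slope_change = coefs

print(f"slope below ${knot}K: {slope_below:.3f}")
print(f"slope above ${knot}K: {slope_below + slope_change:.3f}")  # near zero if the effect really flattens
```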

So Why ARE Most Published Research Findings False? Bias and Other Ways of Making Things Worse

Welcome to “So Why ARE Most Published Research Findings False?”, a step by step walk through of the John Ioannidis paper “Why Most Published Research Findings Are False”. It probably makes more sense if you read this in order, so if you missed the intro, check it out here and check out Part 1 here.

First, a quick recap: Last week we took a look at the statistical framework that helps us analyze the chances that any given paper we are reading found a relationship that actually exists. This first involves turning the study design (assumed Type 1 and Type 2 error rates) into a positive predictive value….aka, given the assumed error rates, what is the chance that a positive result is actually true? We then added in a variable R, or “pre-study odds”, which sought to account for the fact that some fields are simply more likely to find true results than others due to the nature of their work. The harder it is to find a true relationship, the less likely it is that any apparently true relationship you do find is actually true. This is all just basic math (well, maybe not basic math), and provides us the coat hook on which to hang some other issues that muck things up even further.

Like bias.

Oh, bias: Yes, Ioannidis talks about bias right up front. He gives it the letter “u” and defines it as “the proportion of probed analyses that would not have been ‘research findings,’ but nevertheless end up presented and reported as such, because of bias”. Note that he is specifically focusing on research that is published claiming to have found a relationship between two things. He does mention that bias could be used to bury true findings, but that is beyond the current scope. It’s also probably less common, simply because positive findings are less common. Anyway, he doesn’t address reasons for bias at this point, but he does add it into his table to show how much it mucks about with the equations:

This pretty much confirms our pre-existing belief that bias makes everything messy. Nearly everyone knows that bias screws things up and makes things less reliable, but Ioannidis goes a step further and seeks to answer the question “how much less reliable?” He helpfully provides these graphs (blue line is low bias of .05, yellow is high bias of .8):

Eesh. What’s interesting to note here is that good study power (the top graph) has a pretty huge moderating effect on all levels of bias compared to studies with low power (bottom graph). This makes sense, since study power is influenced by sample size and the size of the effect you are looking for. While even small levels of bias (the blue line) influence the chance of a paper being correct, it turns out good study design can do wonders for your work. To put some numbers on this, a well powered study with 30% pre-study odds and a positive finding has an 83% chance of being correct with no bias. If that bias is 5%, the chances drop to about 80%. If the study power is dropped, you have about a 70% chance of a positive finding being real. Drop the study power further and you’re under 60%. Keep your statisticians handy folks.

Independent teams, or yet another way to muck things up: Now when you think about bias, the idea of having independent teams work on the same problems sounds great. After all, they’re probably not all equally biased, and they can confirm each other’s findings, right?

Well, sometimes.

It’s not particularly intuitive to think that having lots of people working on a research question would make results less reliable, but it makes sense. For every additional independent team working on the same research question, the chance that one of them gets a false positive finding goes up. This is a more complicated version of the replication crisis, because none of these teams necessarily have to be trying the same method to address the question. Separating out what’s a study design issue and what’s a false positive is more complicated than it seems. Mathematically, the implications of this are kind of staggering. The number of teams working on a problem (n) actually enters some of the factors as an exponent. Even if you leave bias out of the equation, this can have an enormous impact on the believability of positive results:

If you compare this to the bias graph, you’ll note that having 5 teams working on the same question actually decreases the chances of having a true positive finding more than having a bias rate of 20% does….and that’s for well designed studies. This is terrible news, because while many people have some insight into how biased a field might be and how to correct for it, you rarely hear people discuss how many teams are working on the same problem. Indeed, researchers themselves may not know how many people are researching their question. I mean, think about how this is reported in the press: “previous studies have not found similar things”. Some people take that as a sign of caution, but many more take that as “this is groundbreaking”. Only time can tell which one is which, and we are not patient people.
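For the curious, here’s a minimal sketch of the multiple-teams version of the PPV formula as I read it from the paper, with purely illustrative numbers (alpha = .05, power = .8, pre-study odds of 0.3):

```python
def ppv_n_teams(alpha, beta, R, n):
    """Chance a claimed positive finding is true when n independent teams
    probe the same question and any one positive result gets reported."""
    return R * (1 - beta**n) / (R + 1 - (1 - alpha)**n - R * beta**n)

# PPV drops as more teams chase the same question, even with no bias at all
for n in (1, 2, 5, 10):
    print(n, round(ppv_n_teams(alpha=0.05, beta=0.20, R=0.3, n=n), 2))
# prints roughly 0.83, 0.75, 0.57, 0.43
```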

Now we have quite a few factors to take into account. Along with the regular alpha and beta, we’ve added R (pre-study odds), u (bias) and n (number of teams). So far we’ve looked at them all in isolation, but next week we’re going to review what the practical outcomes are of each and how they start to work together to really screw us up. Stay tuned.

Part 3 is up! Click here to read “The Corollaries”

You Are Number Six

Happy almost Thanksgiving everyone! I hope your travels are safe, your turkey is well done, and that your refrigerator doesn’t die with all of your Thanksgiving meal contributions in it like mine did yesterday.

If the thought of me cleaning a Popsicle sludge flood off my floor doesn’t cheer you, perhaps this will:

At a recent family party, an older relative asked my son how old he was. My son, currently 4, looked at her and said “I am NOT a number!”

My husband is convinced he’s quoting the Prisoner, which is a pretty advanced pop culture reference for a 4 year old IMHO, even if there is an Iron Maiden song that uses the clip. I personally think he’s either lashing out at me or following in my footsteps. One of the two.

Regardless, that kid has a way with words. Happy Thanksgiving!

And for those of you getting shown up on your pop culture knowledge by someone born in 2012, here you go:

So Why ARE Most Published Research Findings False? A Statistical Framework

Welcome to “So Why ARE Most Published Research Findings False?”, a step by step walk through of the John Ioannidis paper bearing that name. If you missed the intro, check it out here.

Okay, so last week I gave you the intro to the John Ioannidis paper Why Most Published Research Findings are False. This week we’re going to dive right in with the first section, which is excitingly titled “Modeling the Framework for False Positive Findings“.

Ioannidis opens the paper with a review of the replication crisis (as it stood in 2005 that is) and announces his intention to particularly focus on studies that yield false positive results….aka those papers that find relationships between things where no relationship exists.

To give a framework for understanding why so many of these false positive findings exists, he creates a table showing the 4 possibilities for research findings, and how to calculate how large each one is. We’ve discussed these four possibilities before, and they look like this:

Now that may not look too shocking off the bat, and if you’re not into this sort of thing you’re probably yawning a bit. However, for those of us in the stats world, this is a paradigm shift. See, historically stats students and researchers have been taught that the table looks like this:

[Image: basic2by2]

This table represents a lot of the decisions you make right up front in your research, often without putting much thought into it. Those values are used to drive error rates, study power and confidence intervals:

[Image: type1andtype2]

The alpha value is used to drive the notorious “.05” level used in p-value testing, and is the chance that you would see a relationship at least as extreme as the one you’re seeing due to random chance alone.

What Ioannidis is adding in here is c, the overall number of relationships you are looking at, and R, which is the ratio of true relationships to non-relationships among those being tested in the field. Put another way, this is the “Pre-Study Odds”. It asks researchers to think about it up front: if you took your whole field and every relationship ever tested in it, what would you say the odds are that any given one of them is real?

Obviously R would be hard to calculate, but it’s a good addition for all researchers. If you have some sense that your field is error prone or that it’s easy to make false discoveries, you should be adjusting your calculations accordingly. Essentially he is asking people to consider the base rate here, and to keep it front and center. For example, a drug company that has carefully vetted its drug development process may know that 30% of the drugs that make it to phase 2 trials will ultimately prove to work. On the other hand, a psychologist attempting to create a priming study could expect a much lower rate of success. The harder it is for everyone to find a real relationship, the greater the chance that any relationship you do find is actually a false positive. I think requiring every field to come up with an R would be an enormously helpful step in and of itself, but Ioannidis doesn’t stop there.

Ultimately, he ends up with an equation for the Positive Predictive Value (aka the chance that a positive result is true aka PPV aka the chance that a paper you read is actually reporting a real finding) which is PPV = (1 – β)R/(R – βR + α). For a study with a typical alpha and a good beta (.05 and .2, respectively), here’s what that looks like for various values of R:

[Image: prestudyvspoststudy]

So the lower the pre-study odds of success, the more likely it is that a finding is a false positive rather than a true positive. Makes sense right?

Now most readers will very quickly note that this graph shows that you have a 50% chance of being able to trust the result at a fairly low level of pre-study odds, and that is true. Under this model, the study is more likely to be true than false if (1 – β)R > α. In the case of my graph above, this translates into pre-study odds greater than 1/16. So where do we get the “most findings are false” claim?
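To put some numbers on that, here’s a minimal sketch of the formula, using the same alpha = .05 and beta = .2 as the graph above:

```python
def ppv(alpha, beta, R):
    """Chance that a claimed positive finding is true: (1 - beta) * R / (R - beta * R + alpha)."""
    return (1 - beta) * R / (R - beta * R + alpha)

alpha, beta = 0.05, 0.20

# A positive finding is more likely true than false once (1 - beta) * R > alpha
print("break-even pre-study odds:", alpha / (1 - beta))  # 0.0625, i.e. 1/16

for R in (0.01, 1 / 16, 0.1, 0.3, 1.0):
    print(f"R = {R:.3f} -> PPV = {ppv(alpha, beta, R):.2f}")
# PPV climbs from about 0.14 at R = 0.01, to 0.50 at R = 1/16, to 0.94 at R = 1
```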

Enter bias.

You see, Ioannidis was setting this framework up to remind everyone what the best case scenario was. He starts here to establish that even within a perfect system, some fields are going to be more accurate than others simply due to the nature of the investigations they do, and that no field should ever expect 100% accuracy as its ceiling. This is baked into the statistical methods used, but it is frequently forgotten when people actually sit down to review the literature. Most researchers would not even think of claiming that their pre-study odds were more than 30%, yet very few would say off the top of their heads that “17% of studies finding significant results in my field are wrong”. But that’s what the math tells us. And again, that’s in a perfect system. Going forward we’re going to add more terms to the statistical models, and those odds will never get better.

In other words, see you next week folks, it’s all downhill from here.

Click here to go straight to part 2.

Voter Turnout vs Closeness of Race

I’ve seen a lot of talk about the Electoral College this past week, and discussion about whether or not the system is fair. I’m not particularly going to wade into this one, but I did get curious whether the closeness of the presidential race in a state influenced voter turnout overall. Under the current system, it would stand to reason that voters in states that have large gaps between the two parties (and thus know ahead of time which way their state is going to go) would be less motivated to vote than those living in states with close races. While other races that could drive turnout are typically happening in most states, we know that elections held in presidential election years have better turnout than midterm elections by a significant margin. The idea that being able to cast a vote for president is a big driver of turnout seems pretty solid.

What I wanted to know is whether the belief that you’re casting a potentially “meaningful” vote in an election is an even further driver of turnout. With all the commentary about the popular vote vs the Electoral College, and with some petitioning to retroactively change the way we count the votes, it seemed relevant to know if the system we went into voting day with had a noticeable impact on who voted.

While not all the numbers are final yet, I found voter turnout by state here, and the state results here.  I took the percent of the vote of the winning candidate and subtracted the percent of the vote of the second place candidate to get the % lead number, and plotted that against the voter turnout. Here’s the graph:

[Image: votingdiff]

The r-squared is about 26.5%, for an r of .5. I didn’t take into account any other races on the ballot, but I think it’s safe to at least theorize that believing your state is a lock in one direction or the other influences voter turnout. Obviously this provides no comment on how other systems would change things from here, only how people behave under the system we have today.
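For anyone who wants to redo this at home, the calculation itself is pretty simple. Here’s a rough sketch; the file and column names are made up, since I pulled turnout and results from the two sources linked above and merged them by hand:

```python
import pandas as pd
from scipy import stats

# Hypothetical layout: one row per state, with the top two candidates' vote shares and turnout
df = pd.read_csv("state_results.csv")  # columns: state, winner_pct, second_pct, turnout_pct

# The "% lead" number: winner's share minus the runner-up's share
df["lead_pct"] = df["winner_pct"] - df["second_pct"]

# Simple linear fit of turnout against the winner's lead
fit = stats.linregress(df["lead_pct"], df["turnout_pct"])
print(f"r = {fit.rvalue:.2f}, r-squared = {fit.rvalue ** 2:.1%}")
```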

For those curious, here’s an uglier version of the same graph with state names:

[Image: votediffstatenames]

It’s interesting to note that the Utah vote got split by McMullin, so the percent lead there is a bit skewed.

A few other fun facts:

  • The average turnout in states where the presidential race was close (<5% between the winning candidate and second place) was 65% vs 58% for all other states. A quick ANOVA tells me this is a statistically significant difference.
  • Once the gap between the winner and second place gets over 10%, things even out. States with a gap of 10-20% have about 58% voter turnout, and those with an over-20% gap have about 57% voter turnout. Some of these numbers may still shift, as states with large gaps also seem likely to take their time counting their votes.
  • My state (Massachusetts) is one of the weird lopsided-but-high-turnout states, and we had some really contentious ballot questions: charter school expansion and recreational marijuana.

Again, none of this speaks to whether or not the process we have is a good one, but it’s important to remember that the rules in play at the time people make a decision tend to influence that decision.

I’ll update the post if these margins change significantly as more votes are counted.

What I’m Reading: November 2016

Like everyone else in the US, I’ve been reading a decent amount about the election. I have a few links of interest on that topic, but out of respect for the totally burned out folks, I have put those together at the end. I will however reiterate that I think this 2014 post from Slate Star Codex remains the most important blog post about the current political climate I have ever read.

Speaking of important blog posts, the Assistant Village Idiot’s “Underground DSM” post has been updated for 2016 and it continues to be one of the best pieces on mental health I have read. This needs to be a whole book, AVI.

This month my book is “The Joy of X“, which I haven’t started reading yet. I’m hopping on a plane tomorrow morning though, so I plan on getting through most of it then. Also, I’m trying to put together a list of math or stats related books I want to read in 2017 (like this one from 2016), so if you have any recommendations I want to hear them!

An interesting piece on testing for fake data in research. The testing exploits the fact that making up realistic “random” data is a hell of a lot harder than it sounds.

For my teacher friends: the Mathematica Policy Research group took a look at teacher quality to see if that drove performance differences between low income and high income students, and it doesn’t. Poor kids and rich kids are actually equally likely to have good or bad math teachers.

On an academic note, here’s a ranking of colleges and their acceptance of viewpoint diversity.

I have no idea how accurate this primate hands family tree is, but it’s kind of awesome.

Ben, who I collaborated with on a Pop Science series earlier this year, did a series on the (possible?) death knell of the Pumpkin Spice craze by buying every Pumpkin Spice product his store had to offer.

Okay, now below this line be politics:

*********************************************************************************************

This NYT article is from August, but it covers a lot of interesting ground about how Facebook is skewing the way we talk about politics. I’ve put myself on FB timeouts more than once this election season, and I’ve enjoyed it every time.

There’s been a lot of talk about the electoral college this week, and whether or not it’s fair. This is one of those discussions that is sort of about numbers, but really about something else, so I’m not going into that here. What I am interested in is Maine’s new experiment with ranked choice voting. It’s more labor-intensive to tally, but it’s got some interesting quirks that may change incentives for campaigns. A full explanation of how it works is here.

A second Slate Star Codex link, but it’s too good not to share. Written the night before the election, it reminds everyone that in a close election, over-interpretation of the outcome is a dangerous game.

Also, I’ve gotten a request to start holding “Controversial Opinion” dinner parties. I kind of want to do this. There will be wine.