GPD Most Popular Posts of 2016

Well hello hello and happy (almost) New Year! As 2016 winds down and I recover from my 3 straight days of Christmas parties, I thought I’d take a look at my stats and do a little recap of the most popular posts on this blog for 2016.  To be clear, these are posts that were written during the year 2016 and the traffic is for 2016. I lumped some series together because otherwise it got a little too confusing, so technically a few 2015 posts snuck in here. I hope you’ll forgive me. And links are included, in case you want to catch up on something you missed.

  1. How Do They Call Elections So Early? A simple question sent to me by a family member on Facebook quickly became my most popular post of the year. Got posted by someone on Twitter with the caveat “Trigger Warning: Math”, which is definitely my favorite intro ever.
  2. Immigration, Poverty, and Gumballs Another response to a reader question (“hey, what do you think of this video?”) that turned into a meta-rant about mathematical demonstrations that distract from the real issues being addressed. This post briefly ended up on Roy Beck’s Wikipedia page as “notable criticism” of his video, but he still never returned my email. Bummer.
  3. Intro to Internet Science (series) While technically this series started in 2015, the bulk of it (Parts 3-10) were posted in 2016 so I’m letting it make the list. This got a boost from a few teachers assigning it to their classes, which was pretty awesome.
  4. 6 Examples of Correlation/Causation Confusion Read the post that got the comment “a perfect example of what makes this blog so great”. Sure, it was my brother who said that, but I still take it as a win.
  5. 5 Examples of Bimodal Distributions (None of Which Are Human Height) Pretty sure this one is just getting cribbed for homework assignments.
  6. 5 Things You Should Know About the Great Flossing Debate of 2016 This post got me my favorite compliment of the year, when my own dentist told me I “did a pretty good job” with this. I get weirdly overexcited about praise from my dentist.
  7. Pop Science (series) Probably my favorite thing I (co) wrote in 2016, the Pop Science series with Ben was REALLY fun.
  8. 5 Things You Should Know About Medical Errors and Mortality Because a good cause o’ death discussion always brings the clicks.
  9. 5 Studies About Politics and Bias to Get You Through Election Season I am told this was fairly helpful.
  10. More Sex, More Models, More Housework I think it’s my witty writing style that makes this one so popular.

While they didn’t get the same type of traffic, I have to say I enjoyed writing/inventing the Tim Tebow Fallacy and the Forrest Gump Fallacy, and working through my feelings in Accidental Polymath Problems.

Of course I would love to hear your favorites/comments/suggestions for 2017, and if you have any fun reader questions, feel free to send them my way as well.

Happy new year everyone, and thanks for reading!

So Why AREN’T Most Published Research Findings False? The Rebuttal

Welcome to “So Why ARE Most Published Research Findings False?”, a step by step walk through of the John Ioannidis paper “Why Most Published Research Findings Are False”. It probably makes more sense if you read this in order, so check out the intro here, Part 1 here, Part 2 here, Part 3 here, Part 4 here, and Part 5 here.

Okay people, we made it! All the way through one of the most cited research papers of all time, and we’ve all lost our faith in everything in the process. So what do we do now? Well, let’s turn the lens around on Ioannidis. What, if anything, did he miss and how do we digest this paper? I poked around for a few critiques of him, just to give a flavor. This is obviously not a comprehensive list, but it hits the major criticisms I could find.

The Title While quite a few people had no problem with the contents of Ioannidis’s paper, some took real umbrage with the title, essentially accusing it of being clickbait before clickbait had really gotten going. Additionally, since many people never read anything more than the title of a paper, a title that blunt is easily used as a mallet by anyone trying to disprove any study they choose. Interestingly, there’s apparently some question regarding whether Ioannidis actually wrote the title or if it was the editors at PLOS Medicine, but the point stands. Given that misleading headlines and reporting are widely blamed (including by yours truly) for popular misunderstanding of science, that would be a fascinating irony.

Failing to reject the null hypothesis does not mean accepting the null hypothesis This is not so much a criticism of Ioannidis as it is of those who use his work to promote their own causes. There is a rather strange line of thought out there that seems to believe that life, or science, is a courtroom. Under this way of thinking, when you undermine a scientist and their hypothesis, your client is de facto not guilty. This is not true. If you somehow prove that chemotherapy is less effective than previously stated, that doesn’t actually mean that crystals cure cancer. You never prove the null hypothesis, you only fail to reject it.

The definition of bias contained more nuance In a paper written in response to the Ioannidis paper, some researchers from Johns Hopkins took umbrage with the presentation of “bias” in the paper. Their primary grouse seemed to be intent vs consequence. Ioannidis presents bias as a factor based on consequence, i.e. the way it skews the final results. They disliked this and believed bias should be based on intent, pointing out numerous ways in which things Ioannidis calls “bias” could creep in innocently. For example, if you are looking for a drug that reduces cardiac symptoms but you also find that mortality goes down for patients who take the medication, are you really not going to report that because it’s not what you were originally looking for? By the strictest definition this is “data dredging”, but is it really? Humans aren’t robots. They’re going to report interesting findings where they see them.

The effect of multiple teams This is one of the more interesting quibbles with the initial paper. Mathematically, Ioannidis proved that having multiple teams working on the same research question would increase the chances of a false result. In the same Hopkins paper, the researchers question the math behind the “multiple teams lead to more false positives” assertion. They mention that for any one study, the odds stay the same as they always have been. Ioannidis counters with an argument that boils down to “yes, if you assume those in competition don’t get more biased”.  Interestingly, later research has shown that this effect does exist and is much worse in fields where the R factor (pre-study odds) is low.
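To put a rough number on the multiple-teams effect, here’s a back-of-the-envelope sketch (my own simplification, not the full model from either paper): if each of n independent teams tests a true null hypothesis at α = 0.05, the chance that at least one of them lands a “significant” false positive grows quickly with n.

```python
# Rough illustration (my simplification, not Ioannidis's full model):
# probability that at least one of n independent teams testing a true
# null hypothesis at alpha = 0.05 reports a "significant" false positive.

def any_false_positive(n_teams, alpha=0.05):
    """P(at least one team out of n gets a false positive)."""
    return 1 - (1 - alpha) ** n_teams

for n in (1, 5, 10, 20):
    print(f"{n:2d} teams: {any_false_positive(n):.3f}")
```

With 20 teams chasing the same question, the field as a whole has better-than-even odds of producing at least one spurious “finding”, even though each individual study’s odds never changed — which is exactly the Hopkins objection, and exactly why Ioannidis’s rebuttal hinges on what competition does to bias.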

So overall, what would I say are the major criticisms or cautions around this paper that I personally will employ?

  1. If you’re citing science, use scientific terms precisely. Don’t get sloppy with verbiage just to make your life easier.
  2. Remember, scientific best practices all feed off each other Getting a good sample size and promoting caution can reduce both overall bias and the effect of bias that does exist. The effect of multiple team testing can be partially negated by high pre-study odds. If a team or researcher employs most best practices but misses one, that may not be a death blow to their research. Look at the whole picture before dismissing the research.
  3. New is exciting, but not always reliable We all like new and quirky findings, but we need to let that go. New findings are the least likely to play out later, and that’s okay. We want to cast a broad net, but for real progress we need a longer attention span.
  4. Bias takes many forms When we mention “bias” we often jump right to financial motivations. But intellectual and social pressure can be bias, competing for tenure can cause bias, and confirming one’s own findings can cause bias.
  5. There are more ways of being wrong than there are ways of being right Every researcher wants a true finding. They really do. No one wants their life’s work undone. While some researchers may be motivated by results they like, I do truly believe that the majority of problems are caused by the whole “needle in a haystack” thing more than the “convenient truth” thing.

Alright, that wraps us up! I enjoyed this series, and may do more going forward. If you see a paper that piques your interest, let me know and I’ll look into it. Happy holidays everyone!


In Defense of Fake News

Fake news is all the rage these days. We’re supposed to hate it, to loathe it, to want it forever banned from our Facebook feeds, and possibly give it credit for/blame it for the election results. Now, given my chosen blog topics and my incessant preaching of internet skepticism, you would think I would be all in on hating fake news.

Nah,  too easy.

Instead I’m going to give you 5 reasons why I think the hate for fake news is overblown. Ready? Here we go!

  1. Fake news doesn’t have a real definition Okay, yeah I know. Fake news is clear. Fake news is saying that Hillary Clinton ran a child prostitution ring out of a DC pizza place. That’s pretty darn clear, right? Well, is it? The problem is that while there are a few stories that are clearly “fake news”, other things aren’t so clear. One man’s “fake news” is another man’s “clear satire”, and one woman’s “fake news” is another’s “blind item”. Much of the controversy around fake news seems to center around the intent of the story (ie to deceive or make a particular candidate look bad), but that intention is quite often a little opaque. No matter what standard you set, someone will find a way to muddy the water.
  2. Fake news is just one gullible journalist away from being a “hoax” Jumping off point #1, let’s remember that even if Facebook bans “fake news” you still are going to be seeing fake news in your news feed. Why? Because sometimes journalists buy it. See, if you or I believe a fake story, we “fell for fake news”. If a journalist with an established audience does it, it’s a “hoax”. Remember Jackie from Rolling Stone? Dan Rather and the Killian documents? Or Yasmin Seweid from just last month? All were examples of established journalists getting duped by liars and reporting those lies as news. You don’t always even need a person to spearhead the whole thing. For example, not too long ago a research study made headlines because it claimed eating ice cream for breakfast made you smarter. Now skeptical readers (ie all of you) will doubt that the finding was sound, but you’d be reasonable in assuming the study at least existed. Unfortunately your faith would be unfounded, as Business Insider pointed out that no one reporting on this had ever seen the study. Every article pointed to an article in the Telegraph which pointed to a website that claimed the study had been done, but the real study was not locatable. It may still be out there somewhere, but it was ludicrously irresponsible of so many news outlets to publish it without even making sure it existed.
  3. Fake news can sometimes be real news In point #1, I mentioned that it was hard to actually put a real definition on “fake news”. If one had to try however, you’d probably say something like “a maliciously false story by a disreputable website or news group that attempts to discredit someone they don’t like”. That’s not a bad definition, but it is how nearly every politician initially categorizes every bad story about themselves. Take John Edwards for example, whose affair was exposed by the National Enquirer in 2007. At the time, his attorney said “The innuendos and lies that have appeared on the internet (sic) and in the National Enquirer concerning John Edwards are not true, completely unfounded and ridiculous.” It was fake news, until it wasn’t. Figuring out what’s fake and who’s really hiding something isn’t always as easy as it looks.
  4. Fake news probably doesn’t change minds Now fake news obviously can be a huge problem. Libel is against the law for a reason, and no one should knowingly make a false claim about someone else. It hurts not just the target, but can hurt innocent bystanders as well. But aside from that, people get concerned that these stories are turning people against those they would otherwise be voting for. Interestingly, there’s not a lot of good evidence for this. While the research is still new, the initial results suggest that people who believe bad things about a political candidate probably already believed those things, and that seeing the other side actually makes them more adamant about what they already believed. In other words, fake news is more a reflection of pre-existing beliefs than a creator of those beliefs.
  5. Fake news might make us all more cautious There’s an interesting paradox in life that sometimes by making things safer you actually make them more dangerous. The classic example is roads: the “safer” and more segregated (transportation mode-wise) roads are, the more people get into accidents. In areas where there is less structure, there are fewer accidents. Sometimes a little paranoia can go a long way. I think a similar effect could be caused by fake news. The more we suspect someone might be lying to us, the more we’ll scrutinize what we see. If Facebook starts promising that they’ve “screened” fake news out, it gives everyone an excuse to stop approaching the news with skepticism. That’s a bad move. While I reiterate that I never support libel, lying or “hoaxes”, I do support constant vigilance against “credible” news sources. With the glut of media we are exposed to, this is a must.

To repeat for the third time, I don’t actually support fake news.  Mostly this was just an exercise in contrarianism. But sometimes bad things can have upsides, and sometimes paranoia is just good sense.

So Why ARE Most Published Research Findings False? A Way Forward

Welcome to “So Why ARE Most Published Research Findings False?”, a step by step walk through of the John Ioannidis paper “Why Most Published Research Findings Are False”. It probably makes more sense if you read this in order, so check out the intro here, Part 1 here, Part 2 here, Part 3 here, and Part 4 here.

Alright guys, we made it! After all sorts of math and bad news, we’re finally at the end. While the situation Ioannidis has laid out up until now sounds pretty bleak, he doesn’t let us end there. No, in this section “How Can We Improve the Situation” he ends with both hope and suggestions.  Thank goodness.

Ioannidis starts off with the acknowledgement that we will never really know which research findings are true and which are false. If we had a perfect test, we wouldn’t be in this mess to begin with. Therefore, anything we do to improve the research situation will be guessing at best. However, there are things that it seems would likely do some good. Essentially they are to improve the values of each of the “forgotten” variables in the equation that determines the positive predictive value of findings. These are:

  1. Beta/study power: Use larger studies or meta-analyses aimed at testing broad hypotheses
  2. n/multiple teams: Consider a totality of evidence or work done before concluding any one finding is true
  3. u/Bias: Register your study ahead of time, or work with other teams to register your data to reduce bias
  4. R/Pre-study Odds: Determine the pre-study odds prior to your experiment, and publish your assessment with your results

If you’ve been following along so far, none of those suggestions should be surprising to you. Let’s dive into each, though:

First, we should be using larger studies or meta-analyses that aggregate smaller studies. As we saw earlier, larger sample sizes mean higher study power, which blunts the impact of bias. That’s a good thing. This isn’t foolproof though, as bias can still slip through and a large sample size means very tiny effect sizes can be ruled “statistically significant”. These studies are also hard to do because they are so resource intensive. Ioannidis suggests that large studies be reserved for large questions, though without a lot of guidance on how to do that.

Second, the totality of the evidence. We’ve covered a lot about false positives here, and Ioannidis of course reiterates that we should always keep them in mind. One striking finding should almost never be considered definitive, but rather compared to other similar research.

Third, steps must be taken to reduce bias. We talked about this a lot with the corollaries, but Ioannidis advocates hard that groups should tell someone else up front what they’re trying to do. This would (hopefully) reduce the tendency to say “hey, we didn’t find an effect for just the color red, but if you include pink and orange as a type of red, there’s an effect!”. Trial pre-registration gets a lot of attention in the medical world, but may not be feasible in other fields. At the very least, Ioannidis suggests that research teams share their strategy with each other up front, as a sort of “insta peer review” type thing. This would allow researchers some leeway to report interesting findings they weren’t expecting (ie “red wasn’t a factor, but good golly look at green!”) while reducing the aforementioned “well if you tweak the definition of red a bit, you totally get a significant result”.

Finally, the pre-study odds. This would be a moment up front for researchers to really assess how likely they are to find anything, and a number for others to use later to judge the research team by. Almost every field has a professional conference, and one would imagine determining pre-study odds for different lines of inquiry would be an interesting topic for many of them. Encouraging researchers to think up front about their odds of finding something interesting would be an interesting framing for everything yet to come.
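To tie the four variables together, here’s a sketch in Python of the positive predictive value calculation. The formula is my own rearrangement derived from the paper’s 2×2 table (true/false relationship vs. reported positive/negative), so check the original for the canonical form:

```python
# A sketch of Ioannidis's positive predictive value (PPV) of a claimed
# finding, including the bias term u. Derived from the paper's 2x2 table;
# my rearrangement, so treat as illustrative rather than canonical.

def ppv(R, beta, u, alpha=0.05):
    """R: pre-study odds, beta: type II error rate (power = 1 - beta),
    u: bias, alpha: significance threshold."""
    true_pos = (1 - beta) * R + u * beta * R   # true relationships reported positive
    false_pos = alpha + u * (1 - alpha)        # null relationships reported positive
    return true_pos / (true_pos + false_pos)

# Well-powered, unbiased study of a plausible (1:1 odds) hypothesis:
print(round(ppv(R=1.0, beta=0.2, u=0.0), 2))   # 0.94
# Same study with modest bias:
print(round(ppv(R=1.0, beta=0.2, u=0.2), 2))
# Long-shot hypothesis (1-in-100 pre-study odds) with the same bias:
print(round(ppv(R=0.01, beta=0.2, u=0.2), 2))
```

Plugging in numbers makes the suggestions above concrete: the same 80%-power study swings from very reliable to near-worthless as R drops and u climbs, which is exactly why registering intentions (to lower u) and stating pre-study odds (to pin down R) matter.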

None of this would fix everything, but it would certainly inject some humility and context into the process from the get go. Science in general is supposed to be a way of objectively viewing the world and describing what you find. Turning that lens inward should be something researchers welcome, though obviously that is not always the case.

In that vein, next week I’ll be rounding up some criticisms of this paper along with my wrap up to make sure you hear the other side. Stay tuned!

Christmas Gifts For Stats and Data People

Well howdy! Only 11 days until Christmas, and I have in no way, shape, or form finished my shopping. I’m only ever good at coming up with gift ideas when they’re not actually needed, so I thought now was the perfect time to make a list of things you could get a statistician/data person in your life, if you were so inclined. Of course any of the books on my reading list here are pretty good, but if you’re looking for something more specific, read on!

  1. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century This was my December book, and it was phenomenal. An amazing history of statistical thought in science and the personalities that drove it. If you ever wanted to know who the “student” was in “student’s t distribution”, this is the book for you. Caveat: If you don’t understand that previous sentence, I’d skip this one.
  2. Statistical dinosaurs. You had me at “Chisquareatops“. Or maybe “Stegonormalus“.
  3. A pound of dice. Or cards. Or bingo balls.  Because you never know when you may have to illustrate probability theory on the fly. (Bonus: these “heroes of science” playing cards are extra awesome)
  4. For the music lover Prints of pop song infographics. Data visualization taken to the next level.
  5. Art supplies Maybe an x-y axis stamp or grid post it notes?
  6. 2016 year in review The best infographics of 2016 or the best mathematics writing of 2016. The first one is out already, but you’ll have to wait until March for that second one.

So Why ARE Most Published Research Findings False? Bias bias bias

Welcome to “So Why ARE Most Published Research Findings False?”, a step by step walk through of the John Ioannidis paper “Why Most Published Research Findings Are False”. It probably makes more sense if you read this in order, so check out the intro here, Part 1 here, Part 2 here, and Part 3 here.

Alright folks, we’re almost there. We covered a lot of mathematical ground here and last week ended with a few corollaries. We’ve seen the effects of sample size, study power, effect size, pre-study odds, bias and the work of multiple teams. We’ve gotten thoroughly depressed, and we’re not done yet. There’s one more conclusion we can draw, and it’s a scary one. Ioannidis holds nothing back, and he just flat out calls this section “Claimed Research Findings May Often Be Simply Accurate Measures of the Prevailing Bias“. Okay then.

To get to this point, Ioannidis lays out a couple of things:

  1. Throughout history, there have been scientific fields of inquiry that later proved to have no basis….like phrenology for example. He calls these “null fields”.
  2. Many of these null fields had positive findings at some point, and in a number great enough to sustain the field.
  3. Given the math around positive findings, the effect sizes in false positives due to random chance should be fairly small.
  4. Therefore, large effect sizes discovered in null fields pretty much just measure the bias present in those fields….aka that “u” value we talked about earlier.

You can think about this like a coin flip. If you flip a fair coin 100 times, you know you should get about 50 heads and 50 tails. Given random fluctuations, you probably wouldn’t be too surprised if you ended up with a 47-53 split or even a 40-60 split. If you ended up with an 80-20 split however, you’d get uneasy. Was the coin really fair?

The same goes for scientific studies. Typically we look at large effect sizes as a good thing. After all, where there’s smoke there’s fire, right? However, Ioannidis points out that large effect sizes are actually an early warning sign for bias. For example, let’s say you think that your coin is weighted a bit, and that you will actually get heads 55% of the time you flip it. You flip it 100 times and get 90 heads. You can react in one of 3 ways:

  1. Awesome, 90 is way more than 55 so I was right that heads comes up more often!
  2. Gee, there’s a 1 in 73 quadrillion chance that 90 heads would come up if this coin were fairly weighted. With the slight bias I thought was there, the chances of getting the results I did are still about 1 in 5 trillion. I must have underestimated how biased that coin was.
  3. Crap. I did something wrong.

You can guess which ones most people go with. Spoiler alert: it’s not #3.
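If you want to check the coin-flip numbers above yourself, the exact binomial probabilities are a one-liner in Python. (The “1 in X” figures depend a bit on whether you count exactly 90 heads or 90-or-more, so treat the orders of magnitude as the point.)

```python
from math import comb

# Exact binomial check of the coin-flip numbers above: the probability
# of seeing exactly 90 heads in 100 flips, under a fair coin and under
# the slightly biased (55% heads) coin.

def prob_exact_heads(k, n=100, p=0.5):
    """P(exactly k heads in n flips with per-flip heads probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

fair = prob_exact_heads(90, p=0.50)     # about 1 in 73 quadrillion
slight = prob_exact_heads(90, p=0.55)   # still vanishingly unlikely
print(f"fair coin: 1 in {1/fair:.2e}")
print(f"55% coin:  1 in {1/slight:.2e}")
```

Either way the lesson holds: a 55% coin makes 90 heads astronomically unlikely, so option #2 is really just option #1 with extra steps.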

The whole “an unexpectedly large effect size should make you nervous” phenomenon is counterintuitive, but I’ve actually blogged about it before. It’s what got Andrew Gelman upset about that study that found that 20% of women were changing their vote around their menstrual cycle, and it’s something I’ve pointed out about the whole “25% of men vote for Trump if they’re primed to think about how much money their wives make” thing. Effect sizes of that magnitude shouldn’t be cause for excitement, they should be cause for concern. Unless you are truly discovering a previously unknown and overwhelmingly large phenomenon, there’s a good chance some of that number was driven by bias.

Now of course, if your findings replicate, this is all fine, you’re off the hook. However if they don’t, the largeness of your effect size is really just a measure of your own bias. Put another way, you can accidentally find a 5% vote swing that doesn’t exist just because random chance is annoying like that, but to get numbers in the 20-25% range you had to put some effort in.

As Ioannidis points out, this isn’t even a problem with individual researchers, but with how we all view science. Big splashy new results are given a lot of attention, and there is very little criticism if the findings fail to replicate at the same magnitude. This means that researchers have nothing but incentives to make the effect sizes they’re seeing look as big as possible. In fact Ioannidis has found (in a different paper) that about half the time the first paper published on a topic shows the most extreme value ever found. That is way more than what we would expect to see if it were up to random chance. Ioannidis argues that by figuring out exactly how far these effect sizes deviate from chance, we can actually measure the level of bias.

Again, not a problem for those researchers who replicate, but something to consider for those who don’t. We’ll get into that next week, in our final segment: “So What Can Be Done?”.

5 Things to Remember About Prescription Drugs

This past week, I had the tremendous pleasure of seeing one of my brother’s articles on the cover of the December issue of Christianity Today as part of a feature on pain killers. While my brother has done a lot of writing for various places over the years, his article “How Realizing My Addiction Had Chosen Me Began My Road to Recovery” was particularly special to see. In it, he recounts his story of getting addicted to pain killers after a medical crisis, and details his road to recovery. Most of the story is behind a paywall, but if you want a full copy leave me a comment or use the get in touch form and I’ll send you the Word document.

As someone who was intimately involved with all of the events relayed in the article, it’s pretty self evident why I enjoyed reading it as much as I did. On a less personal note though, I thought he did a great job bringing awareness to an often overlooked pathway to addiction: a legitimate medical crisis. My brother’s story didn’t start at a party or with anything even remotely approaching “a good time”. His story started in the ER, moved to the ICU, and had about 7 months of not being able to eat food by mouth at the end. His bout with necrotizing pancreatitis was brutal, and we were on edge for several months as his prognosis shifted between “terrible” and “pretty bad”.

Through all that, the doctors had made decisions to put him on some major pain killers. Months later, when things were supposed to be improving, he found that his stomach was still having trouble, and went back to his doctor for more treatment. It was only then that he was told he had become an addict. The drugs that had helped save his life were now his problem.

Obviously he tells the rest of the story (well, all of the story) better than I do, so you should really go read it if you’re interested. What I want to focus on is the prescribing part of this. When talking about things like “the opioid crisis”, it’s tempting for many people to label these drugs as “good” or “bad”, and I think that misses the point (note to my brother who will read this: you didn’t make this mistake. I’m just talking in general here. Don’t complain about me to mom. That whole “stop sciencing at your brother” lecture is getting old). There’s a lot that goes into the equation of whether or not a drug should be prescribed or even approved by the FDA, and a shift in one factor can change the whole equation. Also, quick note, I’m covering ideal situations here. I am not covering when someone just plain screws up, though that clearly does happen:

  1. Immediate risk (acute vs chronic condition) In the middle of a crisis when life is on the line, it shouldn’t surprise anyone that “keeping you alive” becomes the primary endpoint. This should be obvious, but after a few years of working in an ER, I realized it’s not always so clear to people. For example, you would not believe the number of people who come into the ER unconscious after a car accident or something, then later come back and complain that their clothes were cut off. In retrospect it feels obvious to them that a few extra minutes could have been taken to preserve their clothing, but the doctors who saw them in that moment almost always feel differently. Losing even one life because you were attempting to preserve a pair of jeans is not something most medical people are willing to do. A similar thing happens with medications. If there is a concern your life is in danger, the math is heavily weighted in favor of throwing the most powerful stuff we have at the problem and figuring out the consequences later. This is what kicked off the situation my brother went through. At points in his illness they put his odds of making it through the night at 50-50. Thinking about long term consequences was a luxury he wasn’t always able to afford.
  2. Side effects vs effect of condition The old saying “the cure is worse than the disease” speaks to this one, and sometimes it’s unfortunately true. Side effects of a drug always have to be weighed against the severity of the condition they are treating. The more severe the condition, the more severe the allowable side effects. A medication that treats the common cold has to be safer than a medication that treats cancer. However, just because the side effects are less severe than the condition doesn’t mean they are okay or can’t be dangerous themselves (again, think chemotherapy for cancer), but for severe conditions trade offs are frequently made.  My brother had the misfortune of having one of the most painful conditions we know of, and the pain would have literally overwhelmed his system if nothing had been done. Prescription drugs don’t appear out of nowhere, and always must be compared to what they are treating when deciding if they are “good” or “bad”.
  3. Physical vs psychological/emotional consequences One of the more interesting grey areas of prescription drug assessment is the trade off between physical consequences vs psychological and emotional consequences. For better or worse, physical consequences are almost always given a higher weight than psychological/emotional consequences. This is one of the reasons we don’t have male birth control. For women, pregnancy is a physical health risk, for men, it’s not. If hormonal birth control increases a woman’s chances of getting blood clots, that’s okay as long as it’s still less impactful than pregnancy. For men however, there’s no such physical consequence and therefore the safety standards are higher. The fact that many people might actually be willing to risk physical consequences to prevent the emotional/psychological/financial consequences isn’t given as much weight as you might think. The fact that my brother got a doctor who helped him manage both of these was fantastic. His physical crutch had become a mental and emotional crutch, and the beauty of his doctor was that he didn’t underestimate the power of that.
  4. Available alternatives Drugs are not prescribed in a vacuum, and it’s important to remember they are not the end-all, be-all of care. If other drugs (or lifestyle changes) are proven to work just as well with fewer side effects, those may be recommended. In the case of my brother, his doctor helped him realize that mild pain was actually better than the side effects of the drugs he was taking. For those with chronic back pain, yoga may be preferable. This of course is also one of the arguments for things like legalized marijuana, as it’s getting harder to argue that those side effects are worse than those of opioids.
  5. Timing (course of condition and life span) As you can see from 1-4 above, there are lots of balls in the air when it comes to prescribing various drugs. Most of these factors actually vary over time, so a decision that is right one day may not be right the next. This was the crux of my brother’s story. Prescribing him high doses of narcotics was unequivocally the right choice when he initially got sick. However as time went on the math changed and the choice became different. One of the keys to his recovery was having his doctor clearly explain that this was not a binary….the choice to take the drug was right for months, and then it became wrong. No one screwed up, but his condition got better and the balance changed. This also can come into play in the broader lifespan….treatments given to children are typically screened more carefully for long term side effects than those given to the elderly.

Those are the basic building blocks right there. As I said before, when one shifts, the math shifts. For my brother, I’m just glad the odds all worked in his favor.

So Why ARE Most Published Research Findings False? The Corollaries

Welcome to “So Why ARE Most Published Research Findings False?”, a step by step walk through of the John Ioannidis paper “Why Most Published Research Findings Are False”. It probably makes more sense if you read this in order, so check out the intro here, Part 1 here, and Part 2 here.

Okay, first a quick recap: Up until now, Ioannidis has spent most of the paper providing a statistical justification for considering not just study power and p-values, but also pre-study odds, bias measures, and the number of teams working on a problem when trying to figure out if a published finding is true or not. Because he was writing a scientific paper and not a blog post, he did a lot less editorializing than I did when I was breaking down his work. In this section he changes all that, and goes through a point-by-point breakdown of what this all means with a set of 6 corollaries. The words here in bold are his, but I’ve simplified the explanations. Some of this is a repeat from the previous posts, but hey, it’s worth repeating.

Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true. In part 1 and part 2, we saw a lot of graphs showing that good study power has a huge effect on result reliability. Larger sample sizes = better study power.

Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. This is partially just intuitive, but effect size is also part of the calculation for study power. Larger effect sizes = better study power. Interestingly, Ioannidis points out here that given all the math involved, any field looking for effect sizes smaller than 5% is pretty much never going to be able to confirm its results.
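To see how corollaries 1 and 2 play out numerically, here’s a quick sketch of my own (not from Ioannidis’s paper): a normal-approximation power calculation for a simple two-sided z-test, using an illustrative sample size of 100. Shrink the effect size and power collapses; the same happens if you shrink the sample size.

```python
# Rough illustration: approximate power of a two-sided one-sample z-test
# via the normal distribution, showing how effect size and sample size
# drive study power. Numbers here are hypothetical examples.
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power(effect_size, n):
    """Approximate power to detect effect_size (in standard-deviation
    units, i.e. Cohen's d) with n subjects at alpha = 0.05 two-sided."""
    z_crit = 1.96  # critical value for alpha = 0.05, two-sided
    return norm_cdf(effect_size * sqrt(n) - z_crit)

# Same sample size (n = 100), shrinking effect: power falls off fast.
for d in (0.5, 0.2, 0.05):
    print(f"d = {d}: power = {power(d, 100):.2f}")
```

With n = 100, a medium effect (d = 0.5) gives near-certain detection, a small effect (d = 0.2) is a coin flip, and a tiny effect (d = 0.05) is nearly hopeless without an enormous sample.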

Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true. That R value we talked about in part 1 is behind this one. Pre-study odds matter, and fields that are generating new hypotheses or exploring new relationships are always going to have more false positives than replication studies or meta-analyses.

Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. This should be intuitive, but it’s often forgotten. I work in oncology, and we tend to use a pretty clear-cut endpoint for many of our studies: death. Our standards around this are so strict that if you die in a car crash less than 100 days after your transplant, you get counted in our mortality statistics. Other fields have more wiggle room. If you are looking for mortality OR quality of life OR reduced cost OR patient satisfaction, you’ve quadrupled your chance of a false positive.
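If you want to check that “quadrupled” claim yourself, here’s the arithmetic (my own illustration): with four independent outcomes each tested at alpha = 0.05, the chance of at least one false positive is nearly four times the single-outcome rate.

```python
# Testing four independent outcomes at alpha = 0.05 nearly quadruples
# the chance of at least one false positive under the null.
alpha = 0.05
outcomes = 4
familywise = 1 - (1 - alpha) ** outcomes
print(f"Chance of >=1 false positive across {outcomes} outcomes: {familywise:.3f}")
# 1 - 0.95**4 = 0.185, versus 0.05 for a single pre-specified outcome
```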

Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. This one’s pretty obvious. Worth noting: he points out “trying to get tenure” and “trying to preserve one’s previous findings” are both sources of potential bias.

Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true. This was part of our discussion last week. Essentially it’s saying that if you have 10 people with tickets to a raffle, the chances that one of you wins is higher than the chances that you personally win. If we assume 5% of positive findings happen due to chance, having multiple teams work on a question will inevitably lead to more false positives.
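The raffle analogy is easy to simulate. Here’s a small sketch of my own (the 10-team, 5% numbers are just the illustrative figures from above): each team independently has a 5% chance of a false positive on a null effect, and we count how often at least one team “wins.”

```python
# Simulate the raffle analogy: 10 teams each have a 5% chance of a
# false positive under the null; how often does at least one team
# report a positive result?
import random

random.seed(0)  # fixed seed so the simulation is reproducible
teams, trials, alpha = 10, 100_000, 0.05
hits = sum(
    any(random.random() < alpha for _ in range(teams))
    for _ in range(trials)
)
print(f"P(at least one team gets a positive result): {hits / trials:.3f}")
# Analytically: 1 - 0.95**10, or about 0.40
```

So even though any individual team’s false positive risk is 5%, the field as a whole generates a spurious positive finding about 40% of the time.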

Both before and after listing these 6 corollaries out, Ioannidis reminds us that none of these factors are independent or isolated. He gives some specific examples from genomics research, but then also gives this helpful table. To refresh your memory, the 1-beta column is study power (influenced by sample size and effect size), R is the pre-study odds (varies by field), u is bias, and the “PPV” column over on the side there is the chance that a paper with a positive finding is actually true. Oh, and “RCT” is “Randomized Controlled Trial”:

I feel a table of this sort should hang over the desk of every researcher and/or science enthusiast.
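For anyone who wants to poke at the numbers behind that table themselves, here’s the PPV formula from Ioannidis’s paper (his version that folds in bias u), wrapped in a little function of my own. The example values match the paper’s “adequately powered RCT with little bias and 1:1 pre-study odds” row.

```python
# PPV formula from Ioannidis (2005), including bias u:
# PPV = ((1 - beta)R + u*beta*R) / (R + alpha - beta*R + u - u*alpha + u*beta*R)
def ppv(power, R, u, alpha=0.05):
    """Chance that a positive published finding is true, given study
    power (1 - beta), pre-study odds R, and bias u."""
    beta = 1 - power
    num = power * R + u * beta * R
    den = R + alpha - beta * R + u - u * alpha + u * beta * R
    return num / den

# Adequately powered RCT, 1:1 pre-study odds, little bias:
print(round(ppv(power=0.80, R=1.0, u=0.10), 2))  # 0.85, matching the table
```

Plugging in a hot, underpowered, biased field instead (low power, low R, high u) is a sobering exercise I’ll leave to the reader.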

Now all this is a little bleak, but we’re still not entirely at the bottom. We’ll get to that next week.

Part 4 is up! Click here to read it.