What I’m Reading: Farming, the Future, Risk, AI, and Other Things Keeping Me Up At Night

Note: I started trying to do a regular reading list here, but my reading list has sent me into an existential tailspin this month, so I’m going to just reflect a little on all of that, talk about some farming, and then I’m going to remind you that artificial intelligence is probably the biggest threat to humanity you haven’t bothered worrying about today. Figured you might want the heads-up.

I don’t normally read novels, but my farmer brother loves Wendell Berry and has been encouraging me to read Jayber Crow for quite some time, and a few weeks ago I actually got around to it. It’s one of his “Port William” novels, which are all told from the perspectives of various residents of a fictitious town in Kentucky, starting in the 1920s and ending in the 1970s. There are a lot of religious and theological themes in the novel that quite a few of my readers will probably have opinions about, but that’s not what caught my eye. What intrigued me was how, in 300 pages or so, the novel takes the main character from young man to old man, and how it reflects on the way his slice of America and its approach to land changed in that time, and the impact that had on the community. If you don’t have a farming philosophy (or didn’t spend most of your childhood around people who had one, whether they called it that or not), this may not strike you as much as it did me. I grew up hearing about my grandfather’s approach (he would have been about Jayber’s age) and how my uncle tried to change it, how my other uncle took it over when my grandfather died, and then how my brother continues the tradition. Land use as a reflection of greater social change is kind of a thing in my family, and it was interesting to see that captured in novel form. The subtle influence of technology on perceptions of land and farming was also rather fascinating. Also, at this point it’s kind of nice to read a recounting of the 20th century that’s not entirely Boomer-centric.

Concurrent with that book, I also read “But What If We’re Wrong? Thinking About the Present As If It Were the Past” by Chuck Klosterman. In it, Klosterman reflects on all the ways we talk about the past, and continually reminds us that people in 100 years will remember us quite differently than we like to think. He points out that we all know this in general, but if you point to anything specific, our first reaction is to get defensive and explain that whatever particular idea we’re discussing is one of the ones that will endure. We’re willing to acknowledge that the future will be different, but only if that difference is familiar.

To further mess with my self-perception, I still haven’t entirely recovered from reading Antifragile a few months ago. There’s a lot of good stuff in that book (including a discussion of the Lindy Effect, a helpful rule of thumb for which ideas will actually persist in 100 years), but what Taleb is really famous for is his concern about Black Swans. Black Swans are events that are unexpected, hard to predict, and not really what we were focusing on. They shape history dramatically, but we all forget about them because we focus our statistical predictions on things that have a prior probability (if you want to blame the Bayesians) or things that have larger risks (if you want to blame the frequentists).

So to this mix of risk and uncertainty and reflections on the past and future, I decided to add some reading about artificial intelligence (AI) risk. For the sake of my insomnia, this was an error. However, now that I’m here I’d like to share the pain. If you want an exceptionally good comprehensive overview of where we’re at, try the Wait But Why post on the topic, or for something shorter try this. If you’re feeling really lazy, let me summarize:

  1. We are racing like hell to create something smarter than ourselves
  2. We will probably succeed, and sooner than you might think
  3. Once we do that, pretty much by definition whatever we create will start improving itself faster than we can, towards goals that are not the same as ours
  4. The ways this could go horribly wrong are innumerable
  5. Almost no one appears worried about this

By #5 of course I mean no one on my Facebook feed. Bill Gates, Stephen Hawking, and Elon Musk are actually all pretty damn concerned about this. The problem is that something like this is so far outside our current experience that it’s hard for most people to even conceive of it being a risk… but that lack of familiarity doesn’t actually translate into a lack of risk. If you want the full blow-by-blow I suggest you go back and read the articles I suggested, but here’s a quick story to illustrate why AI is so risky:

You know that old joke where someone starts pouring you water or serving you food and says “say when”, then fails to stop when you say things like “stop” or “enough” or “WHAT ARE YOU DOING YOU MANIAC” because “you didn’t say when!”? Well, I know people who will take that joke pretty far. They will spill water on the table or overflow your plate or whatever they need to sell the joke. However, I have never met a person who will go back to the faucet and get more water just to come back to the table and keep pouring. That’s a line that all humans, no matter how dedicated to their joke, understand. It wouldn’t even occur to most people. When we’re talking about computers though, that line doesn’t exist. Computers keep going. Anyone who’s ever accidentally crashed a program by creating an infinite while loop knows this. The oldest joke in the coding world is “the problem with computers is that they do exactly what you tell them to”. Even the most malicious humans have a fundamental bias towards keeping humanity in existence. Maybe not individuals, but the species as a whole is normally not a target. AI won’t have this bias.
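
Here’s that “say when” problem as a toy code sketch (my own example, not anything from the articles above): the loop stops on the exact condition it was given, not on what the speaker meant.

```python
def pour_water(responses):
    """Pour 50 ml at a time until we hear the literal word "when"."""
    glass_ml = 0
    for response in responses:
        if response == "when":  # the ONLY stop condition we coded
            break
        glass_ml += 50          # otherwise, keep pouring
    return glass_ml

# "stop", "enough", and panic don't match the condition, so the pouring
# blows right past the rim of the glass. A while loop with no "when" in
# sight would simply never stop.
print(pour_water(["", "stop", "enough", "WHAT ARE YOU DOING YOU MANIAC"]))  # 200
```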

Now, I’m not saying we’re all doomed, but I’m definitely on Anxious Avenue here, to borrow Wait But Why’s map of the possible outcomes.

The fact that almost everyone I know has spent more time thinking about their opinion on Donald Trump’s hair than AI risk doesn’t help. At a bare minimum, this should register on the national list of things people talk about, and I don’t even think it’s in the top 1000.

On the minus side, this reading list has made me a little pensive. On the plus side, I’m kind of like that anyway. On the DOUBLE plus side, bringing up AI risk in the middle of any political conversation is an incredibly useful tool for getting people to stop talking to you about opinions you’re sick of hearing.

Anyway, if you’d like to send me some lighter and happier reading for February, I might appreciate it.

The Perfect Metric Fallacy

“The first step is to measure whatever can be easily measured. This is OK as far as it goes. The second step is to disregard that which can’t be easily measured or to give it an arbitrary quantitative value. This is artificial and misleading. The third step is to presume that what can’t be measured easily really isn’t important. This is blindness. The fourth step is to say that what can’t be easily measured really doesn’t exist. This is suicide.” – Daniel Yankelovich

“Andy Grove had the answer: For every metric, there should be another ‘paired’ metric that addresses the adverse consequences of the first metric.” – Marc Andreessen

“I didn’t feel the ranking system you created adequately captured my feelings about the vendors we’re looking at, so instead I assigned each of them a member of the Breakfast Club. Here, I made a poster.” – me

I have a confession to make: I don’t always like metrics. There. I said it. Most people wouldn’t hesitate to make a declaration like that, but for someone who spends a good chunk of her professional and leisure time playing around with numbers, it’s kind of a sad thing to have to say. Some metrics are totally fine, of course, and super useful. On the other hand, there are times when the numbers seem to subsume the actual goal and become front and center themselves. This is bad. In statistics, numbers are a means to an end, not the end. I need a name for this flip-flop, so from here on out I’m calling it “The Perfect Metric Fallacy”.

The Perfect Metric Fallacy: The belief that if one simply finds the most relevant or accurate set of numbers possible, all bias will be removed, all stress will be negated, and the answer to complicated problems will become simple, clear and completely uncontroversial.

As someone who tends to blog about numbers and such, I see this one a lot. On the one hand, data and numbers are wonderful because they help us identify reality, improve our ability to compare things, spot trends, and overcome our own biases. On the other hand, picking the wrong metric out of convenience or bias and relying too heavily on it can make everything I just named worse, plus piss off everyone around you.

Damn.

While I have a decent number of my own stories about this, what frustrates me is how many I hear from others. When I tell people these days that I’m into stats and data, almost a third respond with some sort of horror story about how data or metrics are making their professional lives miserable. When I talk to teachers, this number goes up to 100%.

This really bums me out.

It seems that after years of disconnected individuals going with their guts and kind of screwing everything up, people decided we should now put numbers on our grand ideas to prove they were going to work. When these ideas fail, people either blame the numbers (if you’re the person who made the decision) or the people who like the numbers (if you’re everybody else). So why do we let this happen? Almost everyone knows up front that numbers are really just there to guide decision making, so why do we get so obsessed with them?

  1. Math class teaches us that if you play with numbers long enough, there will be a right answer. There are lots of times in life when your numbers have to be perfect: math class, your tax return. You know the drill. Endless calculations, significant figures, etc, etc. In statistics, that’s not true. Pretending otherwise leads to a phenomenon known as “false precision”, where you present data in a way that makes it look more accurate than it really can be. My favorite example of this is a clinic I worked with at one point. They reported weight to two decimal places (as in 130.45 lbs), but didn’t have a standard around whether or not people had to take their coats off before being weighed. At the beginning of this post, I put a blurb about converting a ranking system into a Breakfast Club poster. This came up after I was presented with a 100-point scale to rank 7 vendors against each other in something like 16 categories. When you have 3 days to read through over 1000 pages of documentation and assign scores, your eyes start to blur a little and you start getting a little existential about the whole thing. Are these 16 categories really the right categories? Do they cover everything I’m getting out of this? Do I really feel 5 points better about this vendor than that other one, and are both of them really 10 points better than that 3rd one? Or did I just start increasing the strictness of my rankings as I went along, or did I get nicer as I had to go faster, or what? It wasn’t a bad ranking system; the problem was me. If I can’t promise I stayed consistent in my rankings over 3 days, how can I attest to my numbers at the end?
  2. We want numbers to take the hit for unpleasant truths. A few years ago someone sent me a comic strip that I have since sent along to nearly everyone who complains to me about bad metrics in the workplace. It almost always gets a laugh, and most people then admit that it’s not the numbers they have a problem with, it’s the way they’re being used. There’s a lot of unpleasant news to deliver in this world, and people love throwing up numbers to absorb the pain. See, I would totally give you a raise or more time to get things done, but the numbers say I can’t. When people know you’re doing exactly what you were going to do to begin with, they don’t trust any number you put up. This gets even worse in political situations. So please, for the love of God, if the numbers you run sincerely match your pre-existing expectations, let people look over your methodology, or show where you really tried to prove yourself wrong. Failing to do this gives all numbers a bad rap.
  3. Good Data is Hard to Find. One of the reasons “statistician” continues to be a profession is that good data is really, really, really hard to find, and good methods of analysis actually require a lot of legwork. Over the course of trying to find a “perfect metric”, many people end up believing that part of being “perfect” is being easily obtainable. As the first quote above points out, this is ridiculous. This trap even has a name, the McNamara Fallacy, and it warns us that the easiest things to quantify are not always the most important.
  4. Our social problems are complicated. The power of numbers is strong. Unfortunately, the power of some social problems is even stronger. Most of our worst problems are multifaceted, which of course is why they haven’t been solved yet. When I decided to use metrics to address my personal weight problem, I came up with 10 distinct categories to track for one primary outcome measure. That’s 3,650 data points a year, and that’s just for me. Scaling that up is immensely complicated, and introduces all sorts of issues of variability among individuals that don’t exist when you’re looking at just one person. Even if you do luck out and find a perfect metric, in a constantly shifting system there is a good chance that improving that metric will cause a problem somewhere else. Social structures are like Jenga towers, and knocking one piece out of place can have unforeseen consequences. Proceed with caution, and don’t underestimate the value of small successes.

Now again, I do believe metrics are incredibly valuable and, used properly, can generate good insights. However, in order to prevent your perfect metric from turning into a numerical bludgeon, you have to keep an eye on what your goal really is. Are you trying to set kids up for success in life or get them to score well on a test? Are you trying to maximize employee productivity or keep employees over the long term? Are you looking for a number or a fall guy? Can you even measure what you’re looking to find out with any sort of accuracy? Things to ponder.


Fun With Funnel Plots

During my recent series on “Why Most Published Research Findings Are False“, we talked a lot about bias and how it affects research. One of the classic ways of overcoming bias in research is either to 1) do a very large, well-publicized study that definitively addresses the question or 2) pull together all of the smaller studies that have been done and analyze their collective results. Option #2 is what’s referred to as a meta-analysis, because we are basically analyzing a whole bunch of analyses.

Now those of you who are paying attention may wonder how effective that whole meta-analysis thing is. If there’s some sort of bias in either what gets published or all of the studies being done, wouldn’t a study of the studies show the same bias?

Well, yeah, it most certainly would. That’s why there’s a kind of cool visual tool available to people conducting these studies to take a quick look at the potential for bias. It’s called a funnel plot, and it looks exactly as you would expect it to:

Basically you take every study you can find about a topic and plot each one’s effect size on the x-axis and its sample size on the y-axis. With random variation, the studies should look like a funnel: studies with small numbers of people/data points will vary a lot more than larger studies, and both will converge on the true effect size. This technique has been used since the 80s, but was popularized by the excitingly titled paper “Bias in Meta-Analysis Can Be Detected by a Simple Graphical Test”. That paper pointed out that if you gather all the studies together and don’t get a funnel shape, you may be looking at some bias. This bias doesn’t have to be on the part of the researchers, by the way… publication bias would cause part of the funnel to go missing as well.

The principle behind all this is pretty simple: if what we’re looking at is a true effect size, our experiments will swing a bit around the middle. To use the coin toss analogy, a fairly weighted coin tossed 10 times will sometimes come up 3 heads, 7 tails or vice versa, but if you toss it 100 times it will probably be much closer to 50-50. The increased sample size increases the accuracy, but everything should be centered around the same number… the “true” effect size.
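
To see the funnel emerge, here’s a quick simulation sketch (entirely made up, with a “true” effect of 0.5): each pretend study estimates the effect with noise that shrinks as its sample size grows.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
true_effect = 0.5

# 200 studies of wildly varying size
sample_sizes = rng.integers(20, 2000, size=200)
# The standard error of an estimate scales roughly with 1/sqrt(n),
# so small studies scatter widely while big ones hug the true value
estimates = true_effect + rng.normal(0, 1, size=200) / np.sqrt(sample_sizes)

plt.scatter(estimates, sample_sizes, alpha=0.5)
plt.axvline(true_effect, color="red", linestyle="--", label="true effect")
plt.xlabel("estimated effect size")
plt.ylabel("sample size")
plt.legend()
plt.show()
```

Drop the small studies that happened to land near zero (the ones least likely to get published) and half of the funnel goes missing.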

To give an interesting real-life example, take the gender wage gap. Most people know (or should know) that the commonly quoted “women earn 77 cents on the dollar” stat is misleading. The best discussion of this I’ve seen is Megan McArdle’s article here, and in it an interesting fact emerges: even controlling for everything possible, no study has found that women outearn men. The American Enterprise Institute and the Manhattan Institute both put the gap at 94 to 97 cents on the dollar for women. At one point in the AEI article, they opine that such a small gap “may not be significant at all”, but that’s not entirely true. The fact that no one seems to find a small gap going the other direction actually suggests the gap may be real. In other words, if the true gap were zero, about half of the studies should show women outearning men. If the mid-line is zero, we only have half the funnel. Now this doesn’t tell us what the right number is or why it’s there, but it is a pretty good indication that the gap is something other than zero. Please note: the McArdle article is from 2014, so if there’s new data that shows women outearning men in a study that controls for hours worked and education level, send it my way.

Anyway, the funnel plot is not without its problems. Unfortunately there aren’t a lot of standards around how to use it, and changing the scale of the axes can make it look more or less convincing than it really should be. Additionally, if the number of studies is small, it is not as accurate. Finally, it should be noted that missing part of the funnel is not definitive proof that publication or other bias exists. It could be that those compiling the meta-analysis had a hard time finding all the studies done, or even that the effect size varies based on methodology.

Even with those problems, it’s an interesting tool to at least be aware of, as it is fairly frequently used and is not terribly hard to understand once you know what it is. You’re welcome.

How To Read a Headline: Are Female Physicians Better?

Over the years I’ve spilled a lot of (metaphorical) ink on how to read science on the internet. At this point almost everyone who encounters me frequently IRL has heard my spiel, and few things give me greater pleasure than hearing someone say “you changed the way I read about science”. While I’ve written quite a few longer pieces on the topic, recently I’ve been thinking a lot about what my “quick hits” list would be. If people could only change a few things in the way they read science stories, what would I put on the list?

Recently, a story hit the news about how you might live longer if your doctor is a woman and it got me thinking. As someone who has worked in hospitals for over a decade now, I had a strong reaction to this headline. I have to admit, my mind started whirring ahead of my reading, but I took the chance to observe what questions I ask myself when I need to pump the brakes. Here they are:

  1. What would you think if the study had said the opposite of what it says? As I admitted up front, when I first heard about this study, I reacted. Before I’d even made it to the text of the article, I had theories forming. The first thing I did to slow myself down was to think “wait, how would you react if the headline said the opposite? What if the study found that patients of men did better?” When I ran through those thoughts, I realized they were basically the same theories. Well, not the same… more like mirror images, but they led to the same conclusion. That’s when I realized I wasn’t thinking through the study and its implications, I was trying to make the study fit what I already believed. I admit this because I used this knowledge to mentally hang a big “PROCEED WITH CAUTION” sign on the whole topic. To note, it doesn’t matter what my opinion was here; what matters is that it was strong enough to muddy my thoughts.
  2. Is the study linked to? My first reaction (see #1) kicked in before I had even finished the headline, so unfortunately “is this real” comes second. In my defense, I was already seeing the headlines on NPR and such, but of course that doesn’t always mean there’s a real study. Anyway, in the case of this study, there is a real, identified study (with a link!) in JAMA. As a note, even if the study is real, I distrust any news coverage that doesn’t provide a link to the source. In 2017, that’s completely inexcusable.
  3. Do all the words in the headline mean what you think they mean? Okay, I’ve covered headlines here, but it bears repeating: headlines are a marketing tool. This study appeared under several headlines such as “You Might Live Longer if Your Doctor is a Woman“. What’s important to note here is that by “live longer” they meant “slightly lower 30-day mortality after discharge from the hospital”, by “doctor” they meant “hospitalist”, and by “you” they meant “people over 65 who have been hospitalized”. Primary care doctors and specialists were not covered by this study.
  4. What’s the sample size and effect size? Okay, once we have the definitions out of the way, now we can start with the numbers. For this study, the sample size was fantastic… about 1.5 million hospital admissions. The effect size, however… not so much. For patients treated by female physicians vs. male physicians, the 30-day mortality dropped from 11.49% to 11.07%. That’s not nothing (about a 4% relative drop), but mathematically speaking it’s really hard to reliably measure effect sizes of under 5% (Corollary #2) even when you have a huge sample size. To their credit, the study authors do include the “number needed to treat”, and note that 233 patients would have to be treated by female rather than male physicians in order to save one life (see the quick arithmetic check after this list). That’s a better stat than the one this article tried to use: “Put another way – if you were to replace all the male doctors in the study with women, 32,000 fewer people would die a year.” I am going to bet that wouldn’t actually work out that way. Throw “of equal quality” in there next time, okay?
  5. Is this finding the first of its kind? As I covered recently in my series on “Why Most Published Research Findings Are False“, first-of-their-kind exploratory studies are some of the least reliable types of research we have. Even when they have good sample sizes, they should be taken with a massive grain of salt. As a reference, Ioannidis puts the chances that a positive finding is true for a study like this at around 20%. Even if subsequent research confirms the hypothesis, it’s likely that the effect size will diminish considerably along the way. For a study that starts off with an effect size this small, that could be a pretty big hit. It’s not bad to continue researching the question, but drawing conclusions or changing practice over one paper is a dangerous game, especially when the study was observational.
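
For the curious, here’s the arithmetic behind point #4, using the rounded mortality rates quoted above (the paper’s own figure of 233 comes from the unrounded adjusted risk difference):

```python
male_mortality = 0.1149    # 30-day mortality, patients of male hospitalists
female_mortality = 0.1107  # 30-day mortality, patients of female hospitalists

absolute_risk_reduction = male_mortality - female_mortality  # 0.0042
relative_drop = absolute_risk_reduction / male_mortality     # ~3.7%
number_needed_to_treat = 1 / absolute_risk_reduction         # ~238

print(f"absolute risk reduction: {absolute_risk_reduction:.4f}")
print(f"relative drop: {relative_drop:.1%}")
print(f"number needed to treat: {number_needed_to_treat:.0f}")
```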

So after all this, do I believe this study? Well, maybe. It’s not implausible that personal characteristics of doctors can affect patient care. It’s also very likely that the more data we have, the more we’ll find associations like this. However, it’s important to remember that proving causality is a long and arduous process, and that reacting to new findings with “well, it’s probably more complicated than that” is an answer that’s not often wrong.

New Year’s Resolution: Stats/Data Book List 2017

Last year I started a bit of a tradition with my “to read” list of numbers/stats/math books for 2016. This year I wanted to continue the tradition by putting together a list of books I’m reading in 2017.

January: The Numerati 
This promises to be an interesting and potentially frightening look at how people are collecting your data and using it to attempt to predict behavior.

February: The Mathematics of Love
I mean, how better to say “I love you” on Valentine’s Day than to learn all the mathematical models and patterns that are being used to predict/describe romantic behavior?

March: The Seven Pillars of Statistical Wisdom
Like “The Lady Tasting Tea” last year, this promises to be an interesting history of statistical thought.

April: Good Charts
From Harvard Business Review, a simple guide to good charts.

May: Bad Pharma
I loved Ben Goldacre’s “Bad Science”, and I’m hoping “Bad Pharma” is just as good.

June: A Numerate Life
I’ve enjoyed John Allen Paulos and his books, so I’m interested to read his autobiography…hopefully on the beach.

July: Thinking in Numbers
Daniel Tammet, the autistic savant who wrote “Born on a Blue Day”, describes how he sees the world in numbers.

August: Winning With Data  
Another “big data” book, this one looks to teach people how to take advantage of the information that’s out there.

September: The Theory that Would Not Die
A history of Bayes’ Theorem, and why it won’t go away.

October: Naked Statistics
I’ve heard this book recommended a few times, so I figured I’d give it a whirl.

November: Best Mathematics Writing 2016
This book isn’t even out yet, but once it is I’m all over it.

December: The Truthful Art
Another data viz book that looks promising.

Of course if I missed anything good, let me know!

Data Driven Weight Loss: A Tale of Scales and Spreadsheets

In honor of the New Year and New Year’s resolutions and such, I’m trying out a different type of post today. This post isn’t about statistical theory or stats in the news, but about how I personally use data in my daily life. If you’re not particularly interested in messy data, personal data, or weight loss, I’d skip this one.

Ah, it’s that time of year again! The first day of 2017, a time of new beginnings and resolutions we might give up by February. Huzzah!

I don’t mean to be snarky about New Year’s resolutions. I’m actually a big fan of them, and tend to make them myself. I’m still trying to figure out some for 2017, but in the meantime I wanted to comment on the most common resolutions out there: health, fitness, and weight loss. If Mr. Nielsen there is to be believed, a third of Americans resolve to lose weight every year, and another third want to focus on staying fit and healthy. It’s a great goal, and one that is unfortunately challenging for many people. There seem to be a million systems to advise people on nutrition, exercise plans, and other such things, and I am not about to add anything to that mix. What I do have, however, is my own little homegrown data project I’ve been tinkering with for the last 9 months or so. This is the daily system I use to work on my own health and fitness, using my own data to identify challenges and drive improvements. While it certainly isn’t everyone’s cup of tea, I’d gotten a few questions about it IRL, so I thought I’d put it up for anyone interested in either losing weight or just seeing the process.

First, some personal background: Almost exactly 2 years ago (literally: December 31st, 2014), I decided to start meeting with a nutritionist to help me figure out my diet and lose some weight. Like lots of people who have a lot on their plate (pun intended), I had a ridiculous amount of trouble keeping my weight in a healthy range. The nutritionist helped quite a bit and I made some good progress (and lost half the weight I wanted to!), but I realized at some point I would have to learn how to manage things on my own.  Having an actual person track what you are doing and hold you accountable is great and was working well, but I wanted something I could keep up without having to make an appointment.

Now, the math background: Around the same time I was pondering my weight loss/nutritionist dilemma, I got asked to give a talk at a conference on the topic “What Gets Measured Gets Managed”. One of the conference organizers had worked with me a few years earlier and said “I know you were always finding ways of pulling data to fix interesting problems, do you have anything recent you’d like to present?” This got me thinking about my weight. How was it that I could always find a data-driven way to address a work problem, but couldn’t quite pull it together for something important to me in my personal life? I had tried calorie counting in the past, and I had always gotten frustrated with the time it took and the difficulty of obtaining precise measurements, but what if I could come up with some simpler alternative metrics? With my nutritionist’s blessing (she had a remarkable tolerance for my love of stats), I decided to work on it.

The General Idea: Since calories were out, I decided to play around with the idea of giving myself a general “score” for each day. If I could somehow capture a broad range of behaviors that contributed to weight gain and the frequency with which I engaged in them, I figured I could pinpoint exactly what my trouble spots were, troubleshoot more effectively, and make sure I stayed on track. At the end of each week I’d add up my weekly score and weigh myself. If I lost weight, no problem. If I gained weight, I’d tweak things.

The Categories: The first step was to come up with categories that covered every part of my day or decision I felt contributed noticeably to my weight. I aimed for 10 because base 10 rules our lives. My categories fell into four types:

  1. Meals and snacks I eat 3 meals and 2 snacks each day, so each got its own category.
  2. Treat foods Foods I need to watch: sweets/desserts, chips, and alcohol each got their own category.
  3. Health specific issues I have celiac disease and have to avoid gluten. Since eating gluten seems to make me either ridiculously sick or ravenously hungry, I gave it a category so I could note if I thought I got exposed.
  4. Healthy behaviors I ultimately only track exercise here, but I have considered adding sleep or other non-food behaviors too.

The Scores:  Each score ranges from 0 to 5, with zero meaning “perfect, wouldn’t change a thing” and five meaning “gosh, that was terribly ill-advised”.  Between those two extremes, I came up with a slightly different scoring system for each category.

  1. Meals and snacks Basically how full I feel after I eat.  I lay out a reasonable serving or meal beforehand, and then index the score from there. If I take an extra bite or two because the food just tastes good, I give myself a 1. If I was totally stuffed, it’s a 5. Occasionally I’ve even changed my ranking after the fact when I get to the next meal and discover I’m not hungry.
  2. Treat foods One serving = 1 point, 2 servings = 3 points, more than that is 4 or 5. The key here is serving. Eating a bunch of tortilla chips before a meal at a Mexican restaurant is almost never one serving, and a margarita at the same restaurant is probably both alcohol and sugar. It helps to research serving sizes for junk food before attempting this one.
  3. Health specific issues For gluten, if I think I got a minor exposure, it’s a 1. The larger the exposure, the higher the ranking I give it. The day I got served a hamburger on what was supposed to be a gluten-free bun, only to discover it wasn’t? Yeah, that’s a 5.
  4. Exercise I generally map out my workouts for the week, then my score is based on how much I complete. A zero means I did the whole thing, a 5 means I totally skipped it. I like this because it incentivizes me to start a workout, even if I don’t finish it.

With 10 categories ranked 0 to 5, my daily score would be somewhere between 0 (“perfect, wouldn’t change a thing”) and 50 (“trying to re-enact the gluttony scene from Seven“). To start, I figured I’d want to stay below a score of 5 per day, or 35 per week. Since I am not built for suffering, that seemed manageable.

Obviously all of those scores are a bit of a judgment call. If I lose track of what I ate or feel unhappy with it, I give myself a 5. I try not to overthink these rankings too much, and just go with my gut. That Mexican meal with the chips and margarita, for example, was a 5 for the chips, 3 for feeling full after dinner, 1 for the alcohol, and 1 for the sugary margarita. Is that 100% accurate? Doubtful, but does a score of 10 seem about right for that meal? Sure. Will my scale be lower the day after a meal like that? No. A score of 10 works. With the categories and the scores, my weekly spreadsheets end up looking like this:

[Screenshot: one week of the tracking spreadsheet]
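
For the spreadsheet-averse, here’s the same bookkeeping as a minimal code sketch. The category names and example numbers are mine, but the 0–5 scale and the daily/weekly goals are the ones described above.

```python
DAILY_GOAL = 5
WEEKLY_GOAL = 35

# One day's scores: roughly that Mexican dinner from the example above.
day = {
    "breakfast": 0, "snack_1": 0, "lunch": 0, "snack_2": 0,
    "dinner": 3,             # fullness after the meal
    "sweets": 1,             # the sugary margarita
    "chips": 5,              # definitely more than one serving
    "alcohol": 1,            # the margarita again
    "gluten_exposure": 0,    # celiac-specific category
    "exercise": 0,           # 0 = completed the planned workout
}

daily_score = sum(day.values())
print(f"daily score: {daily_score} (goal: <= {DAILY_GOAL})")   # 10 -- over goal

# At the end of the week, the same idea one level up:
week = [4, 3, 10, 2, 5, 6, 4]
print(f"weekly score: {sum(week)} (goal: <= {WEEKLY_GOAL})")   # 34 -- under goal
```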

How I use this data: Okay, so remember how I started this with “what gets measured gets managed”? I use this data to find weak spots, figure out where I’m having the most trouble, and come up with solutions. For example, every month I add my scores up and figure out which category is my worst one. When I first started, I realized that I actually skipped a lot of workouts. When I looked at the data, I noticed that I would have one good week of working out followed by one bad week. When I thought about it, I realized I was trying to complete really intense workouts, and that I was basically burning myself out and needing to take a week off to regroup. When I decreased the intensity of my workout plan, I stopped skipping days. Since the workout you actually do tends to be better than the one you only aspire to do, this was a win. Another trend I noted was that I frequently overate at dinner. This was solved by packing a bigger lunch. There have been a few other realizations like this, and they all had pretty simple fixes. For January I’m working on reducing the number of chips I eat, because damn can I not eat chips in moderation.

The results: So has this worked? Yes! Since April, I’ve lost almost 5 points off my BMI, which takes me from the obese/overweight line to the healthy/overweight line. Here’s my 7-day moving average score (the last 7 days averaged together) plotted against a once-a-week weigh-in. The red line is my goal of 5:

[Chart: 7-day moving average score vs. weekly weigh-ins, with the goal line at 5]

Note: There are some serious jumps on this chart, mostly because I can retain a crazy amount of water if I eat too much sodium.
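
The moving average itself is nothing fancy. Here’s a sketch with pandas, on stand-in numbers rather than my real tracker data:

```python
import pandas as pd

daily_scores = pd.Series([4, 3, 10, 2, 5, 6, 4, 7, 3, 2, 5, 4, 6, 3])

# Each point is the mean of the most recent 7 daily scores, which
# smooths out one-off bad days so the underlying trend is visible.
moving_avg = daily_scores.rolling(window=7).mean()
print(moving_avg.round(2))
```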

At the moment I’ve decided to give myself a month off from weigh-ins since the holidays can be so crazy, but at that last weigh-in I was only 3 lbs away from being in the normal BMI range. As I mentioned, I’m not built for suffering. Slow weight loss is fine with me.

It’s interesting to note that I actually don’t make my goal of 5 or less per day all the time. Over the 274 days I’ve been tracking, I was only at 5 or under about 70% of the time. I still lost weight. I’ve thought about raising my limit and trying to stay under it all the time, but as long as this is still working I’m going to stick with it.

General thoughts: Much of the philosophy behind how I pulled this data actually comes from the quality improvement “good enough” world, as opposed to the hard research “statistical significance” world. The weigh-in data is always there to test my hypotheses. If my scoring system said I was fine but my weight was going up, I would change it. I’m sure I have not accurately categorized every day I’ve had since April, but as long as my daily scores are close enough to reality, it works. It’s the general trend of healthy behaviors that matters, not any individual day. The most important information I’ve gotten out of this process is what small tweaks I can make to help myself be more healthy. Troubleshooting the life I actually have, and getting specific feedback about which areas I have problems with, has been immensely helpful. Too often health and diet advice imposes Draconian limits that set us up for failure. By tracking specific behaviors and tweaks over the course of months, it’s a lot easier to figure out the high-impact changes we can make.

If I had any advice for anyone wanting to try a similar system, it would be to really customize the categories you track and to think through a ranking system that makes sense to you. Once I invented my system, I actually only have to spend about 45 seconds a day ranking myself. I only change things if I see the weight creeping up or if some piece seems to not be working. At this point I review the categories and scores monthly to see if any new patterns are emerging. In the quality improvement world, we call this a PDSA cycle: Plan, Do, Study, Act. Plan what you want to do, do what you said you would do, study what you did, act on the new knowledge. By having data on individual aspects of my daily life, this process became more manageable.

Happy tracking!

GPD Most Popular Posts of 2016

Well hello hello and happy (almost) New Year! As 2016 winds down and I recover from my 3 straight days of Christmas parties, I thought I’d take a look at my stats and do a little recap of the most popular posts on this blog for 2016.  To be clear, these are posts that were written during the year 2016 and the traffic is for 2016. I lumped some series together because otherwise it got a little too confusing, so technically a few 2015 posts snuck in here. I hope you’ll forgive me. And links are included, in case you want to catch up on something you missed.

  1. How Do They Call Elections So Early? A simple question sent to me by a family member on Facebook quickly became my most popular post of the year. It got posted by someone on Twitter with the caveat “Trigger Warning: Math”, which is definitely my favorite intro ever.
  2. Immigration, Poverty, and Gumballs Another response to a reader question (“hey, what do you think of this video?”) that turned into a meta-rant about mathematical demonstrations that distract from the real issues being addressed. This post briefly ended up on Roy Beck’s Wikipedia page as “notable criticism” of his video, but he still never returned my email. Bummer.
  3. Intro to Internet Science (series) While technically this series started in 2015, the bulk of it (Parts 3-10) was posted in 2016, so I’m letting it make the list. This got a boost from a few teachers assigning it to their classes, which was pretty awesome.
  4. 6 Examples of Correlation/Causation Confusion Read the post that got the comment “a perfect example of what makes this blog so great”. Sure, it was my brother who said that, but I still take it as a win.
  5. 5 Examples of Bimodal Distributions (None of Which Are Human Height) Pretty sure this one is just getting cribbed for homework assignments.
  6. 5 Things You Should Know About the Great Flossing Debate of 2016 This post got me my favorite compliment of the year, when my own dentist told me I “did a pretty good job” with this. I get weirdly overexcited about praise from my dentist.
  7. Pop Science (series) Probably my favorite thing I (co) wrote in 2016, the Pop Science series with Ben was REALLY fun.
  8. 5 Things You Should Know About Medical Errors and Mortality Because a good cause o’ death discussion always brings the clicks.
  9. 5 Studies About Politics and Bias to Get You Through Election Season I am told this was fairly helpful.
  10. More Sex, More Models, More Housework I think it’s my witty writing style that makes this one so popular.

While they didn’t get the same type of traffic, I have to say I enjoyed writing/inventing the Tim Tebow Fallacy and the Forrest Gump Fallacy, and working through my feelings in Accidental Polymath Problems.

Of course I would love to hear your favorites/comments/suggestions for 2017, and if you have any fun reader questions, feel free to send them my way as well.

Happy new year everyone, and thanks for reading!

So Why AREN’T Most Published Research Findings False? The Rebuttal

Welcome to “So Why ARE Most Published Research Findings False?”, a step-by-step walkthrough of the John Ioannidis paper “Why Most Published Research Findings Are False”. It probably makes more sense if you read this in order, so check out the intro here, Part 1 here, Part 2 here, Part 3 here, Part 4 here, and Part 5 here.

Okay people, we made it! All the way through one of the most cited research papers of all time, and we’ve all lost our faith in everything in the process. So what do we do now? Well, let’s turn the lens around on Ioannidis. What, if anything, did he miss and how do we digest this paper? I poked around for a few critiques of him, just to give a flavor. This is obviously not a comprehensive list, but it hits the major criticisms I could find.

The Title While quite a few people had no problem with the contents of Ioannidis’s paper, some took real umbrage at the title, essentially accusing it of being clickbait before clickbait had really gotten going. Additionally, since many people never read anything more than the title of a paper, a title that blunt is easily used as a mallet by anyone trying to disprove any study they choose. Interestingly, there’s apparently some question regarding whether Ioannidis actually wrote the title or if it was the editors at PLOS Medicine, but the point stands. Given that misleading headlines and reporting are hugely blamed by many (including yours truly) for popular misunderstanding of science, that would be a fascinating irony.

Failing to reject the null hypothesis does not mean accepting the null hypothesis This is not so much a criticism of Ioannidis as it is of those who use his work to promote their own causes. There is a rather strange line of thought out there that seems to believe that life, or science, is a courtroom. Under this way of thinking, when you undermine a scientist and their hypothesis, your client is de facto not guilty. This is not true. If you somehow prove that chemotherapy is less effective than previously stated, that doesn’t mean that crystals cure cancer. You never prove the null hypothesis; you only fail to reject it.

The definition of bias contained more nuance In a paper written in response to Ioannidis, some researchers from Johns Hopkins took umbrage at the presentation of “bias”. Their primary grouse seemed to be intent vs. consequence. Ioannidis presents bias as a factor based on consequence, i.e., the way it skews the final results. They disliked this and believed bias should be based on intent, pointing out numerous ways in which things Ioannidis calls “bias” could creep in innocently. For example, if you are looking for a drug that reduces cardiac symptoms but you also find that mortality goes down for patients who take the medication, are you really not going to report that because it’s not what you were originally looking for? By the strictest definition this is “data dredging”, but is it really? Humans aren’t robots. They’re going to report interesting findings where they see them.

The effect of multiple teams This is one of the more interesting quibbles with the initial paper. Mathematically, Ioannidis proved that having multiple teams working on the same research question would increase the chances of a false result. In the same Hopkins paper, the researchers question the math behind the “multiple teams lead to more false positives” assertion. They mention that for any one study, the odds stay the same as they always have been. Ioannidis counters with an argument that boils down to “yes, if you assume those in competition don’t get more biased”.  Interestingly, later research has shown that this effect does exist and is much worse in fields where the R factor (pre-study odds) is low.

So overall, what would I say are the major criticisms or cautions around this paper that I personally will employ?

  1. If you’re citing science, use scientific terms precisely. Don’t get sloppy with verbiage just to make your life easier.
  2. Remember, scientific best practices all feed off each other. Getting a good sample size and promoting caution can reduce both overall bias and the effect of the bias that does exist. The effect of multiple-team testing can be partially negated by high pre-study odds. If a team or researcher employs most best practices but misses one, that may not be a death blow to their research. Look at the whole picture before dismissing the research.
  3. New is exciting, but not always reliable. We all like new and quirky findings, but we need to let that go. New findings are the least likely to play out later, and that’s okay. We want to cast a broad net, but for real progress we need a longer attention span.
  4. Bias takes many forms. When we mention “bias” we often jump right to financial motivations. But intellectual and social pressure can be bias, competing for tenure can cause bias, and confirming one’s own findings can cause bias.
  5. There are more ways of being wrong than there are ways of being right. Every researcher wants a true finding. They really do. No one wants their life’s work undone. While some researchers may be motivated by results they like, I do truly believe that the majority of problems are caused by the whole “needle in a haystack” thing more than the “convenient truth” thing.

Alright, that wraps us up! I enjoyed this series, and may do more going forward. If you see a paper that piques your interest, let me know and I’ll look into it. Happy holidays everyone!


In Defense of Fake News

Fake news is all the rage these days. We’re supposed to hate it, to loathe it, to want it forever banned from our Facebook feeds, and possibly give it credit for/blame it for the election results. Now, given my chosen blog topics and my incessant preaching of internet skepticism, you would think I would be all in on hating fake news.

Nah, too easy.

Instead I’m going to give you 5 reasons why I think the hate for fake news is overblown. Ready? Here we go!

  1. Fake news doesn’t have a real definition Okay, yeah, I know. Fake news is clear. Fake news is saying that Hillary Clinton ran a child prostitution ring out of a DC pizza place. That’s pretty darn clear, right? Well, is it? The problem is that while there are a few stories that are clearly “fake news”, other things aren’t so clear. One man’s “fake news” is another man’s “clear satire”, and one woman’s “fake news” is another’s “blind item”. Much of the controversy around fake news seems to center on the intent of the story (i.e., to deceive or to make a particular candidate look bad), but that intention is quite often a little opaque. No matter what standard you set, someone will find a way to muddy the water.
  2. Fake news is just one gullible journalist away from being a “hoax” Jumping off point #1, let’s remember that even if Facebook bans “fake news”, you are still going to see fake news in your feed. Why? Because sometimes journalists buy it. See, if you or I believe a fake story, we “fell for fake news”. If a journalist with an established audience does it, it’s a “hoax”. Remember Jackie from Rolling Stone? Dan Rather and the Killian documents? Or Yasmin Seweid from just last month? All were examples of established journalists getting duped by liars and reporting those lies as news. You don’t always even need a person to spearhead the whole thing. For example, not too long ago a research study made headlines because it claimed eating ice cream for breakfast made you smarter. Now, skeptical readers (i.e., all of you) will doubt the finding was well-founded, but you’d be reasonable in assuming the study at least existed. Unfortunately your faith would be misplaced, as Business Insider pointed out that no one reporting on this had ever seen the study. Every article pointed to an article in the Telegraph, which pointed to a website that claimed the study had been done, but the real study could not be located. It may still be out there somewhere, but it was ludicrously irresponsible of so many news outlets to publish it without even making sure it existed.
  3. Fake news can sometimes be real news In point #1, I mentioned that it’s hard to put a real definition on “fake news”. If one had to try, however, you’d probably say something like “a maliciously false story by a disreputable website or news group that attempts to discredit someone they don’t like”. That’s not a bad definition, but it is how nearly every politician initially categorizes every bad story about themselves. Take John Edwards, for example, whose affair was exposed by the National Enquirer in 2007. At the time, his attorney said “The innuendos and lies that have appeared on the internet (sic) and in the National Enquirer concerning John Edwards are not true, completely unfounded and ridiculous.” It was fake news, until it wasn’t. Figuring out what’s fake and who’s really hiding something isn’t always as easy as it looks.
  4. Fake news probably doesn’t change minds Now, fake news obviously can be a huge problem. Libel is against the law for a reason, and no one should knowingly make a false claim about someone else. It hurts not just the target, but innocent bystanders as well. Aside from that, though, people worry that these stories are turning readers against candidates they would otherwise be voting for. Interestingly, there’s not a lot of good evidence for this. While the research is still new, the initial results suggest that people who believe bad things about a political candidate probably already believed those things, and that seeing the other side actually makes them more adamant about what they already believed. In other words, fake news is more a reflection of pre-existing beliefs than a creator of those beliefs.
  5. Fake news might make us all more cautious There’s an interesting paradox in life: sometimes by making things safer, you actually make them more dangerous. The classic example is roads: the “safer” and more segregated (transportation mode-wise) roads are, the more people get into accidents. In areas where there is less structure, there are fewer accidents. Sometimes a little paranoia can go a long way. I think a similar effect could be caused by fake news. The more we suspect someone might be lying to us, the more we’ll scrutinize what we see. If Facebook starts promising that it has “screened” fake news out, it gives everyone an excuse to stop approaching the news with skepticism. That’s a bad move. While I reiterate that I never support libel, lying, or “hoaxes”, I do support constant vigilance toward “credible” news sources. With the glut of media we are exposed to, this is a must.

To repeat for the third time, I don’t actually support fake news.  Mostly this was just an exercise in contrarianism. But sometimes bad things can have upsides, and sometimes paranoia is just good sense.

So Why ARE Most Published Research Findings False? A Way Forward

Welcome to “So Why ARE Most Published Research Findings False?”, a step-by-step walkthrough of the John Ioannidis paper “Why Most Published Research Findings Are False”. It probably makes more sense if you read this in order, so check out the intro here, Part 1 here, Part 2 here, Part 3 here, and Part 4 here.

Alright guys, we made it! After all sorts of math and bad news, we’re finally at the end. While the situation Ioannidis has laid out until now sounds pretty bleak, he doesn’t let us end there. No, in this section, “How Can We Improve the Situation?”, he ends with both hope and suggestions. Thank goodness.

Ioannidis starts off with the acknowledgement that we will never really know which research findings are true and which are false. If we had a perfect test, we wouldn’t be in this mess to begin with. Therefore, anything we do to improve the research situation will be guessing at best. However, there are things that seem likely to do some good. Essentially, they are to improve the values of each of the “forgotten” variables in the equation that determines the positive predictive value (PPV) of findings (see the sketch after the list). These are:

  1. Beta/study power: Use larger studies or meta-analyses aimed at testing broad hypotheses
  2. n/multiple teams: Consider a totality of evidence or work done before concluding any one finding is true
  3. u/Bias: Register your study ahead of time, or work with other teams to register your data to reduce bias
  4. R/Pre-study Odds: Determine the pre-study odds prior to your experiment, and publish your assessment with your results
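
As a refresher, here’s the positive predictive value formula from earlier in the series, as I read it in the paper, sketched as a function (the example inputs are made up):

```python
def ppv(R, beta, u, alpha=0.05):
    """Positive predictive value of a claimed finding, per Ioannidis:
    R = pre-study odds, beta = type II error rate (1 - power),
    u = bias factor, alpha = significance level."""
    true_positives = (1 - beta) * R + u * beta * R
    all_positives = R + alpha - beta * R + u - u * alpha + u * beta * R
    return true_positives / all_positives

# Each suggestion above nudges one input: more power (lower beta),
# less bias (lower u), or better-chosen questions (higher R).
print(round(ppv(R=0.25, beta=0.40, u=0.30), 2))  # sloppy exploratory study: 0.35
print(round(ppv(R=0.25, beta=0.20, u=0.10), 2))  # same question, better practices: 0.59
```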

If you’ve been following along so far, none of those suggestions should be surprising to you. Let’s dive into each, though:

First, we should be using larger studies, or meta-analyses that aggregate smaller studies. As we saw earlier, a large sample size means higher study power, which blunts the impact of bias. That’s a good thing. This isn’t foolproof though, as bias can still slip through, and a large sample size means very tiny effect sizes can be ruled “statistically significant”. These studies are also hard to do because they are so resource-intensive. Ioannidis suggests that large studies be reserved for large questions, though without a lot of guidance on how to do that.

Second, the totality of the evidence. We’ve covered a lot about false positives here, and Ioannidis of course reiterates that we should always keep them in mind. One striking finding should almost never be considered definitive, but rather compared to other similar research.

Third, steps must be taken to reduce bias. We talked about this a lot with the corollaries, but Ioannidis advocates hard that groups should tell someone else up front what they’re trying to do. This would (hopefully) reduce the tendency to say “hey, we didn’t find an effect for just the color red, but if you include pink and orange as a type of red, there’s an effect!”. Trial pre-registration gets a lot of attention in the medical world, but may not be feasible in other fields. At the very least, Ioannidis suggests that research teams share their strategy with each other up front, as a sort of “insta peer review” type thing. This would allow researchers some leeway to report interesting findings they weren’t expecting (ie “red wasn’t a factor, but good golly look at green!”) while reducing the aforementioned “well if you tweak the definition of red a bit, you totally get a significant result”.

Finally, the pre-study odds. This would be a moment up front for researchers to really assess how likely they are to find anything, and a number for others to use later to judge the research team by. Almost every field has a professional conference, and one would imagine determining pre-study odds for different lines of inquiry would be an interesting topic for many of them. Encouraging researchers to think up front about their odds of finding something would be a useful framing for everything yet to come.

None of this would fix everything, but it would certainly inject some humility and context into the process from the get-go. Science in general is supposed to be a way of objectively viewing the world and describing what you find. Turning that lens inward should be something researchers welcome, though obviously that is not always the case.

In that vein, next week I’ll be rounding up some criticisms of this paper along with my wrap up to make sure you hear the other side. Stay tuned!