On Wansink, the Joy of Cooking, and Multiple Comparisons

I’m down with a terrible cold this week, so I figured I’d just do a short update on the Buzzfeed article everyone’s sending me about the latest on Brian Wansink. The article does a pretty good job of recapping the situation up until now, so feel free to dive on in to the drama.

The reason this update is particularly juicy is that somehow Buzzfeed got a hold of a whole bunch of emails from within the lab, and it turns out a lot of the chicanery was a feature, not a bug. The whole thing is so bad that even the Joy of Cooking went after Wansink today on Twitter, and DAMN is that a cookbook that can hold a grudge. Posting the whole thread because it’s not every day you see a cookbook publisher get into it about research methodology:

Now normally I would think this was a pretty innocuous research methods dispute, but given Wansink’s current situation, it’s hard not to wonder if the cookbook has a point. With what we now know about him, the idea that he was chasing headlines seems like a pretty reasonable charge.

However, in the (rightful) rush to condemn Wansink, I do want to make sure we don’t get too crazy here. For example, the Joy of Cooking complains that Wansink only picked out 18 recipes to look at out of 275. In and of itself, that is NOT a problem. Sampling from a larger group is how almost all research is done. The problem only arises if those samples aren’t at least somewhat random, or if they’re otherwise cherry-picked. If he really did use recipes with no serving sizes to prove that “serving sizes have increased,” that’s pretty terrible.
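
To make the distinction concrete, here’s a minimal sketch (my own illustration, nothing from the paper) of what a defensible selection of 18 recipes out of 275 could look like versus hand-picking the ones that fit your story:

```python
import random

recipes = [f"recipe_{i}" for i in range(275)]  # stand-ins for the 275 recipes

# Defensible: draw 18 recipes at random, with a documented seed so others can check it.
random.seed(42)
sampled = random.sample(recipes, 18)
print(sampled)

# Not defensible: quietly keeping only the recipes that happen to support your
# hypothesis (cherry-picking), which no amount of statistics downstream can fix.
```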

Andrew Gelman makes a similar point about one of the claims in the Buzzfeed article. Lee (the author) stated that “Ideally, statisticians say, researchers should set out to prove a specific hypothesis before a study begins.” While Gelman praises Lee for following the story and says Wansink’s work is “….the most extraordinary collection of mishaps, of confusion, that has ever been gathered in the scientific literature – with the possible exception of when Richard Tol wrote alone,” he also gently cautions that we shouldn’t go too far. The problem, he says, is not that Wansink didn’t start out with a specific hypothesis or that he ran 400 comparisons; it’s that he didn’t include that part in the paper.

I completely agree with this, and it’s a point everyone should remember.

For example, when I wrote my thesis paper, I did a lot of exploratory data analysis. I had 24 variables, and I compared all of them to obesity rates and/or food insecurity status. I didn’t have a specific hypothesis about which ones would be significant; I just ran all the comparisons. When I put the paper together though, I included every comparison in the Appendix, stated how many comparisons I had run, and then focused on discussing the ones whose p values were particularly low. My overall alpha was .05, but I used the Bonferroni correction to figure out which ones to talk about. That method is pretty simple….if you do 20 comparisons and want an overall alpha of .05, you divide .05 by 20 and use .0025 as your per-comparison cutoff. I still got significant results, and I had the bonus of giving everyone all the information. If anyone ever wanted to replicate any part of what I did, or compare a different study to mine, they could do so.
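
For anyone who wants to see that in code, here’s a minimal sketch of the Bonferroni correction; the comparison names and p values are made up for illustration, not numbers from my thesis:

```python
# A minimal sketch of the Bonferroni correction; the comparisons and p-values
# below are hypothetical, not the ones from my thesis.
alpha = 0.05
p_values = {
    "income_vs_obesity": 0.0004,
    "education_vs_food_insecurity": 0.003,
    "veg_intake_vs_obesity": 0.02,
    "exercise_vs_food_insecurity": 0.21,
}

# Bonferroni: divide the overall alpha by the number of comparisons you ran.
threshold = alpha / len(p_values)  # with 20 comparisons this would be .05 / 20 = .0025

for name, p in sorted(p_values.items(), key=lambda kv: kv[1]):
    verdict = "significant" if p < threshold else "not significant"
    print(f"{name}: p = {p} -> {verdict} at corrected threshold {threshold}")
```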

Gelman goes on to point out that in many cases there really isn’t one “right” way of doing stats, so the best we can do is be transparent. He sums up his stance like this: “Ideally, statisticians say, researchers should report all their comparisons of interest, as well as as much of their raw data as possible, rather than set out to prove a specific hypothesis before a study begins.”

This strikes me as a very good lesson. Working with uncertainty is hard and slow going, but we have to make do with what we have. Otherwise we’ll be throwing out every study that doesn’t live up to some sort of hyper-perfect ideal, which will make it very hard to do any science at all. Questioning is great, but believing nothing is not the answer. That’s a lesson we all could benefit from. Well, that and “don’t piss off a cookbook with a long memory.” That burn’s gonna leave a mark.

 

Biology/Definitions Throwback: 1946

It’s maple syrup season up here in New England, which means I spent most of yesterday in my brother’s sugar shack watching sap boil and teaching my son things like how to carry logs, stack wood, and sample the syrup. Maple syrup making is a fun process, mostly because there’s enough to do to make it interesting, but not so much to do that you can’t be social while doing it.

During the course of the day, my brother mentioned that he had found an old stack of my grandfather’s textbooks, published in the 1940s:

Since he’s a biology teacher, he was particularly interested in that last one. When he realized there was a chapter on “heredity and eugenics” he of course had to start there. There were a few interesting highlights of this section. For example, like most people in 1946, the authors were pretty convinced that proteins were responsible for heredity. This wasn’t overly odd, since even the guy who discovered DNA thought proteins were the real workhorses of inheritance. Still, it was interesting to read such a wrong explanation for something we’ve been taught our whole lives.

Another interesting part was where they reminded readers that, despite the focus on the father’s role in heredity, there was scientific consensus that children also inherited traits from their mother. Thanks for the reassurance.

Then there were their descriptions of mental illness. It was interesting that some disorders (manic depression, schizophrenia) were clearly being recognized and diagnosed in a way that is at least recognizable today, while others (autism) were not mentioned at all. Then there were entire categories we’ve since done away with, such as feeblemindedness, along with the “technical” definitions for terms like idiot and moron:

I have no idea how commonly those were used in real life, but it was an odd paragraph to read.

Of course this is the sort of thing that tends to make me reflective. What are we convinced of now that will look sloppy and crude in the year 2088?

Personalized Nutrition: Blood Sugar Testing Round 1

Quite some time ago I did a blog post about some research out of Israel on personalized glucose responses. The study helped people test their own unique glucose response to various types of foods, then created personalized diet plans based on those responses. They found that some foods normally considered “healthy” caused extreme responses in some people, while some foods often deemed “unhealthy” were fine. I had mentioned that I wanted to try a similar experiment on myself, and yet I never did…..until now.

Recently the authors of the study came out with a book called “The Personalized Diet,” which goes over their research and how someone could apply it to their life. While they recommend testing various meals, I decided to start by testing specific carbohydrate sources (fruits, grains, and starches) to see what things looked like. I adhered to a few rules to keep things fair:

  1. I tested all of these at midday (either noon or 2pm). Glucose response can vary over the course of the day, so I decided to try midday on weekends. The tests below happened over a series of weeks, but all at the same general time of day.
  2. Each serving was 50g of carbohydrates. I’ve seen people do this with “effective carbs” (carbs minus fiber) or with portion size, but I decided to do this with just plain old carbs (the portion math is sketched after this list).
  3. I ate all the food in 15 minutes. The times below represent “time from first bite,” but I tried to make sure I had finished off the portion at around the 15-minute mark.
  4. I tried not to do much for the first 60 minutes, just to make sure the readings wouldn’t be changed by exercise/movement.
  5. I’ve only tried each food item once. Technically I should do everything twice to confirm, but I haven’t had time yet.
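
Here’s that portion math, as a minimal sketch; the carb densities below are rough placeholder numbers I’m using for illustration, so check a real nutrition database before trusting them:

```python
# Sketch of sizing each test portion to hit 50 g of carbohydrate.
# Carb densities are rough, hypothetical grams of carbs per 100 g of food.
TARGET_CARBS_G = 50

carbs_per_100g = {
    "cherries": 16,
    "pineapple": 13,
    "white rice (cooked)": 28,
    "sweet potato (cooked)": 21,
    "lentils (cooked)": 20,
}

for food, carb_density in carbs_per_100g.items():
    portion_g = TARGET_CARBS_G / carb_density * 100
    print(f"{food}: roughly {portion_g:.0f} g to get {TARGET_CARBS_G} g of carbs")
```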

Now some results! I don’t have a ton of data yet, but I thought my haul so far was pretty interesting. Note: my fasting glucose is running a bit high, which is why I’m interested in this experiment to begin with. First, my fruit experiments:

Wow, cherries, what did I ever do to you? I was interested to see that so many fruits were identical, with one major outlier. I would never have thought something like pineapple would be so different from cherries. While I should probably repeat the measurement, it kind of supports the idea that some food responses are a bit unusual.

Next, my starches/grains. Note the shift of the peak from 30 minutes for fruit to 60 minutes for most of these foods:

A couple of interesting things here (plus, after the list, a rough sketch of how you might quantify curves like these):

  1. I was surprised how much higher the variability was.
  2. Sweet potatoes apparently are not my friend.
  3. Kind of surprising that white rice and brown rice were almost exactly the same.
  4. Beans and lentils produced the lowest reaction of any food I tested.
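
Here’s the sketch I mentioned above of how one might boil a single test down to a couple of numbers; the readings are hypothetical, not my actual data:

```python
# Summarize one glucose test: readings are hypothetical (minutes from first bite, mg/dL).
readings = [(0, 95), (30, 140), (60, 160), (90, 130), (120, 105)]

baseline = readings[0][1]
peak_time, peak_value = max(readings, key=lambda r: r[1])

# Incremental area under the curve (iAUC) above baseline, via the trapezoid rule.
iauc = 0.0
for (t0, g0), (t1, g1) in zip(readings, readings[1:]):
    iauc += (max(g0 - baseline, 0) + max(g1 - baseline, 0)) / 2 * (t1 - t0)

print(f"Peak {peak_value} mg/dL at {peak_time} min (rise of {peak_value - baseline} over baseline)")
print(f"Incremental AUC above baseline: {iauc:.0f} mg/dL*min")
```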

Overall I thought these results were very interesting, and I’ll have to consider how to use them going forward. My next step is going to be to try to expand this list, and then move on to some junk food/fast food/takeout meals to see if my response is significantly different to things like burgers and fries vs pizza. That should be interesting. I’ll probably base that on a serving (or at least the serving I eat) as opposed to carb count. If I get really crazy I may try some desserts or maybe some alcohol. You know, for science.

Let it never be said I’m not willing to suffer for my art.

What I’m Reading: February 2018

No, there haven’t been 18 school shootings so far this year. Parkland is a tragedy, but no matter what your position, spreading stats like these doesn’t help.

I’ve followed Neuroskeptic long enough to know I should be skeptical of fMRI studies, and this paper shows why: some studies trying to look at brain regions may be confounded by individual variation. In other words, what was identified as “change” may have just been individual differences.

Speaking of questionable data, I’ve posted a few times about Brian Wansink and the ever-growing scrutiny of his work. This week his marquee paper was called into question: the bottomless bowl experiment. This experiment involved diners with “self-refilling” bowls of tomato soup, and the crux of the finding is that without visual cues people tend to underestimate how much they’ve eaten. The fraud accusations were surprising, given that:

  1. This finding seems really plausible
  2. This finding pretty much kicked off Wansink’s career in the public eye

If this finding was based on fake data, it seems almost certain everything that ever came out of his lab is suspect. Up until now I think the general sense was more that things might have gotten sloppy as the fame of his research grew, but a fake paper up front would indicate a different problem.

Related: a great thread on Twitter about why someone should definitely try to replicate the soup study ASAP. Short version: the hypothesis is still plausible and your efforts will definitely get you attention.

Another follow-up to a recent post: AL.com dives into Alabama school districts to see if school secessions (i.e. schools splitting off from a county system to a city-controlled system) are racially motivated. While their research was prompted by a court ruling that one particular proposed split was racially motivated, they found that in general schools didn’t change their racial or class makeup all that much when they split off from larger districts. What they did find was that cities that split off their schools ended up spending more per student than they did when they were part of a county system. This change isn’t immediate, but a few years out it was almost universally true. This suggests that taxpayers are more agreeable to increasing tax rates when they have more control over where the money is going. Additionally, the new schools tend to wind up more highly rated than the districts they left, and the kids do better on standardized testing. Interesting data, and it’s nice to see a group look at the big picture.

 

5 Things About that “Republicans are More Attractive than Democrats” Study

Happy Valentine’s Day everyone! Given the spirit of the day, I thought it was a good time to post about a study Korora passed along a few days ago called “Effects of physical attractiveness on political beliefs,” which garnered a few headlines for its finding that being attractive was correlated with being a Republican. For all of you interested in what was actually going on here, I took a look at the study and here’s what I found out:

  1. The idea behind the study was not entirely flattering. Okay, while the whole “my party is hotter than your party” thing sounds like a compliment, the premise of this study was actually a bit less than rosy. Essentially the researchers hypothesized that since attractive people are known to be treated better in many aspects of life, those who are more attractive may get a skewed view of how the world works. Their belief/experience that others are there to help them and will treat them fairly may lead them to develop a “blind spot” that causes them to believe people don’t need social programs/welfare/anti-discrimination laws as much as less attractive people might think.
  2. Three hypotheses were tested. Based on that premise, the researchers decided to test three distinct hypotheses. First, that attractive people were more likely to believe things like “my vote matters” and “I can make a difference,” regardless of political party. Second, that attractiveness was associated with ideology, and third, with partisanship. I thought that last split was interesting, as it separates the intellectual undertones of ideology from straight party affiliation.
  3. Partisans are more attractive than ideologues. To the shock of no one, better-looking people were much more likely to believe they would have a voice in the political process, even after controlling for education and income. When it came to ideology vs partisanship though, things got a little interesting. Attractive people were more likely to rate themselves as strong Republicans, but not necessarily as strong conservatives. In fact, in the first data set they used (from the years 1972, 1974 and 1976), only one year showed any association between conservatism and attractiveness, but all 3 sets showed a strong relationship between being attractive and saying you were a Republican. The later data sets (2004 and 2011) show the same thing, with the OLS coefficient for being conservative about half (around .30) of what the coefficient for Republicanism was (around .60). This struck me as interesting because the first headline I saw specifically said “conservatives” were more attractive, but that actually wasn’t the finding. Slight wording changes matter.
  4. We can’t rule out age cohort effects. When I first saw the data sets, I was surprised to see some of the data was almost 40 years old. Then I saw they used data from 2004 and 2011 and felt better. Then I noticed that the 2004 and 2011 data was actually taken from the Wisconsin Longitudinal Study, whose participants were in high school in 1957 and have been interviewed every few years ever since. Based on the age ranges given, the people in this study were born between 1874 and 1954, with the bulk being 1940-1954. While the Wisconsin study controlled for this by using high school yearbook photos rather than current-day photos, the fact remains that we only know where the subjects’ politics ended up (not what they might have been when they were young), and we don’t know if this effect persists in Gen X or millennials. It also seems a little suspect to me that one data set came during the Nixon impeachment era, as strength of Republican partisanship dropped almost a whole point over the course of those 4 years. Then again, I suppose lots of generations could claim a confounder.
  5. Other things are still stronger predictors of affiliation. While the study looked at the effect of attractiveness by controlling for things like age and gender, the authors wanted to note that those other factors still played a huge role. The coefficients for the association of Republican leanings with age (1.08) and education (.57), for example, were much higher than the coefficient for attractiveness (.33). Affinity for conservative ideology/Republican partisanship was driven by attractiveness (.37/.72), but also by income (.60/.62), being non-white (-.59/-1.55), and age (.99/1.45). Education was a little all over the place…it didn’t have an association with ideology (-.06), but it did with partisanship (.94). In every sample, attractiveness was one of the smallest of the statistically significant associations.

While this study is interesting, I would like to see it replicated with a younger cohort to see if this was a reflection of an era or a persistent trend. Additionally, I would be interested to see some more work around the specific beliefs that might support the initial hypothesis that this is about social programs. With the noted difference between partisanship and ideology, it might be hard to hang your hat on any particular belief as the driver.

Regardless, I wouldn’t use it to start a conversation with your Tinder date. Good luck out there.

Idea Selection and Survival of the Fittest

It probably won’t come as a shock to you that I spend a lot of time ruminating over why there are so many bad ideas on the internet. Between my Intro to Internet Science, my review of the Calling Bullshit class, and basically every other post I’ve written on this site, I’ve put a lot of thought into this.

One of the biggest questions that seems to come up when you talk about truth in the social media age is a rather basic one: are we seeing something new here, or are we just seeing more of what’s always happened? And what are the implications for us as humans in the long run? It’s a question I’ve struggled a lot with, and I’ve gone back and forth in my thinking. On the one hand, we have the idea that social media simply gives bigger platforms to bad actors and gives the rest of us more situations in which we may be opining about things we don’t know much about. On the other hand, there’s the idea that something is changing, and it’s going to corrupt our way of relating to each other and the truth going forward. Yeah, this and AI risk are pretty much what keeps me up at night.

Thus I was interested this week to see this Facebook post by Eliezer Yudkowsky about the proliferation of bad ideas on the internet. The post is from July, but I think it’s worth mentioning. It’s long, but in it Yudkowsky raises the theory that we are seeing the results of hypercompetition of ideas, and they aren’t pretty.

He starts by pointing out that in other fields, we’ve seen that some pressure/competition is good, but too much can be bad. He uses college admissions and academic publishing as two examples. Basically, if you have 100 kids competing for 20 slots, you may get all the kids to step up their game. If you have 10,000 kids competing for 1 slot, you get widespread cheating and test prep companies that get compared to cartels. Requiring academics to show their work is good; “publish or perish” leads to shoddy practices and probably the whole replication crisis. As Goodhart’s law states, “When a measure becomes a target, it ceases to be a good measure.” In practical terms, hypercompetition ends up with a group that optimizes for one thing and only one thing, while leaving the back door completely unguarded.

Now take that entire idea and apply it to news and information in the social media age. While there are many good things about democratizing the spread of information, we have gone from moderate competition (get a local newspaper or major news network to pay attention to you, then everyone will see your story) to hypercompetition (anyone can get a story out there, you have to compete with billions of other stories to be read). With that much competition, we are almost certainly not going to see the best or most important stories rise to the top, but rather the ones that have figured out how to game the system….digital, limbic, or some combination of both. That’s what gets us to Toxoplasma of Rage territory, where the stories that make the biggest splash are the ones that play on your worst instincts. As Yudkowsky puts it “Print magazines in the 1950s were hardly perfect, but they could sometimes get away with presenting a complicated issue as complicated, because there weren’t 100 blogs saying otherwise and stealing their clicks”.

Depressed yet? Let’s keep going.

Hypercompetitive, play-to-your-worst-instincts stories clearly don’t have a great effect on the general population, but what happens to those who are raised on nothing other than that? In one of my favorite lines of the post, Yudkowsky says “If you look at how some groups are talking and thinking now, ‘intellectually feral children’ doesn’t seem like entirely inappropriate language.” I’ve always thought of things like hyperreality in terms of virtual reality vs physical reality or artificial intelligence vs human intelligence, but what if we are kicking that off all on our own? Wikipedia defines hyperreality as “an inability of consciousness to distinguish reality from a simulation of reality, especially in technologically advanced postmodern societies,” and isn’t that exactly what we’re seeing here on many topics? People using technology, intentionally or unintentionally, to build bubbles that skew their view of how the world works, while consistently getting reinforcement that they are correct?

Now of course it’s entirely possible that this is just a big “get off my lawn” post and that we’ll all be totally fine. It’s also entirely possible that I should not unwind from long weeks by drinking Pinot Noir and reading rationalist commentary on the future of everything, as it seems to exacerbate my paranoid tendencies. However, I do think that much of what’s on the internet today is the equivalent of junk food, and living in an environment full of junk food doesn’t seem to be working out too well for many of us. In physical health, we may have reached the point where our gains begin to erode, and I don’t think it’s crazy to think that a similar thing could happen intellectually. Being a little more paranoid about why we’re seeing certain stories or why we’re clicking on certain links may not be the worst thing. For those of us who have still-developing kids, making sure their ideas get challenged may be progressively more critical.

Good luck out there.

Tidal Statistics

I’m having a little too much fun lately with my “name your own bias/fallacy/data error” thing, so I’ve decided I’m going to make it a monthly-ish feature. I’m gathering the full list up under the “GPD Lexicon” tab.

For this month, I wanted to revisit a phrase I introduced back in October: buoy statistic. At the time I defined the term as:

Buoy statistic: A statistic that is presented on its own as free-floating, while the context and anchoring data is hidden from initial sight.

This was intended to cover a pretty wide variety of scenarios, such as when we hear things like “women are more likely to do thing x” without being told that the “more likely” is 3 percentage points over men.

While I like this term, today I want to narrow it down to a special subcase: tidal statistics. I’m defining those as…..

Tidal Statistic: A metric that is presented as evidence of the rise or fall of one particular group, subject, or issue during a time period when related groups also rose or fell on the same metric.

So for example, if someone says “after the CEO said something silly, that company’s stock went down on Monday” but they don’t mention that the whole stock market went down on Monday, that’s a tidal statistic. The statement by itself could be perfectly true, but the context changes the meaning.
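
As a quick sketch of how you’d strip the tide out of that example (the numbers here are invented for illustration), compare the company’s move to the market’s move rather than quoting it alone:

```python
# Hypothetical numbers: the company fell 3% on Monday, but the whole market fell 2.5%.
company_return = -0.03
market_return = -0.025

# The "de-tided" number is the move relative to the market.
excess_return = company_return - market_return
print(f"Headline number: {company_return:.1%}")
print(f"Relative to the market: {excess_return:.1%}")  # -0.5%, a much less dramatic story
```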

Another example: recently Vox.com did an article about racial segregation in schools in which they presented this graph:

Now this graph caught my eye because they had initially labeled it as being representative of the whole US (they later went back and corrected it to clarify that this was just for the South), and I started to wonder how it was affected by changing demographic trends. I remembered seeing some headlines a few years back that white students now made up less than half of school-age children, which means at least some of that drop is likely due to a decrease in the number of schools whose student populations are > 50% white.

Turns out my memory was correct: according to the National Center for Education Statistics, in the fall of 2014 white students dropped below half of the school system, at 49.5% of the school-age population. For context, when the graph starts (1954) the US was about 89% white. I couldn’t find what that number was for just school-age kids, but it was likely much higher than 49.5%. So basically, if you drew a similar graph for any other race, including white kids, you would see a drop. When the tide goes down, every related metric goes down with it.
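
To see how much work the demographics alone can do, here’s a toy model of my own (not anything from the Vox piece or NCES): assume students are assigned to schools completely at random, with zero segregation, and ask how often a school would still end up more than 50% white. It assumes scipy is available and reuses the 89% and 49.5% figures from above, even though the 1954 number is for the whole population rather than just students.

```python
# Toy model: with purely random assignment, how many schools are majority white?
from scipy.stats import binom

SCHOOL_SIZE = 500  # hypothetical enrollment per school

for year, white_share in [(1954, 0.89), (2014, 0.495)]:
    # P(more than half the students in a randomly mixed school are white)
    p_majority_white = binom.sf(SCHOOL_SIZE // 2, SCHOOL_SIZE, white_share)
    print(f"{year}: white share {white_share:.1%} -> "
          f"{p_majority_white:.0%} of randomly mixed schools majority white")
```

Even in this toy world with no segregation at all, the share of majority-white schools collapses simply because the overall white share fell below 50%. That’s the tide.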

Now to be clear, I am not saying that school segregation isn’t a problem or that the Vox article gets everything wrong. My concern is that the graph was used as one of their first images in a very lengthy article, and they don’t mention this context or what it might mean for advocacy efforts. Looking at that graph, we have no idea what percentage of that drop is due to a shrinking white population and what is due to intentional or de facto segregation. It’s almost certainly not possible to substantially raise the number of kids going to schools that are more than 50% white, simply because the number of schools like that is shrinking. Vox has other, better measures of success further down in the article, but I’m disappointed they chose to lead with one that has a major confounder baked in.

This is of course the major problem with tidal statistics. The implication tends to be “this trend is bad, and following our advice can turn it around.” However, if the trend is driven by something much broader than what’s being discussed, any results you get will be skewed. Some people exploit this fact, some step into it accidentally, but it is an interesting way that you can tell the truth and mislead at the same time.

Stay safe out there.

 

Praiseworthy Wrongness: Dr vs Ms

I ran across a pretty great blog post this week that I wanted to call attention to. It’s by PhD/science communicator Bethany Brookshire, who blogs at Scicurious.com and hosts the podcast Science for the People*.

The post recounts her tale of being wrong on the internet in a Tweet that went viral.

For those too lazy to click the link, it happened like this: early one Monday morning, she checked her email and noticed that two scientists she’d reached out to for interviews had gotten back to her, one male and one female. The man had started his reply with “Dear Ms. Brookshire,” and the woman with “Dear Dr. Brookshire.” She felt like this was a thing that had happened before, so she sent this Tweet:

After sending it and watching it get passed around, she started to feel uneasy. She realized that since she had actually reached out to a LOT of people for interviews over the last 2 years, she could pull some data on this. Her blog post is her in-depth analysis of what she found (and I recommend you read the whole thing), but basically she was wrong. While only 7% of people called her “Dr Brookshire,” men were actually slightly more likely to do so than women. Interestingly, men were also more likely to launch into their email without using any name, and women were actually more likely to use “Ms.” It’s a small sample size, so you probably can’t draw any conclusions other than this: her initial Tweet was not correct. She finishes her post with a discussion of recency bias and confirmation bias, and how things went awry.
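
As an aside on the “small sample size” point, here’s a minimal sketch of the kind of check you could run on data like this; the counts are entirely hypothetical (they are NOT Dr. Brookshire’s numbers) and just illustrate why small counts rarely support a firm conclusion. It assumes scipy is available.

```python
# Hypothetical counts: rows are men / women, columns are "used Dr" / "used something else".
from scipy.stats import fisher_exact

table = [[5, 45],   # made up: 5 of 50 men wrote "Dr"
         [3, 47]]   # made up: 3 of 50 women wrote "Dr"

odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.2f}")
# With counts this small the p-value usually lands well above .05,
# so an apparent difference like this could easily be noise.
```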

I kept thinking about this blog post after I read it, and I realized it’s because what she did here is so uncommon in the 2018 social media world. She got something wrong quite publicly, and she was willing to fess up and admit it. Not because she got caught or might have gotten caught (after all, no one had access to her emails) but simply because she realized she should check her own assumptions and make things right if she could. I think that’s worthy of praise, and the kind of thing we should all encourage.

As part of my every day work, I do a lot of auditing of other people’s work and figuring out where they might be wrong. This means I tend to do a lot of meditating on what it means to be wrong….how we handle it, what we do with it, and how to make it right. One of the things I always say to staff when we’re talking about mistakes is that the best case scenario is that you don’t make a mistake, but the second best case is that you catch it yourself. Third best is that we catch it here, and fourth best is someone else has to catch us. I say that because I never want staff to try to hide errors or cover them up, and I believe strongly in having a “no blame” culture in medical care. Yes, sometimes that means staff might think confessing is all they have to do, but when people’s health is at stake the last thing you want is for someone to panic and try to cover something up.

I feel similarly about social media. The internet has made it so quick and easy to announce something to a large group before you’ve thought it through, and so potentially costly to get something wrong, that I fear we’re going to lose the ability to really admit when we’ve made a mistake. Would it have been better if she had never erred? Well, yes. But once she did, I think self-disclosure is the right thing to do. In our ongoing attempt to call bullshit on internet wrongness, I think giving encouragement/praise to those who own their mistakes is a good thing. Being wrong and then doubling down (or refusing to look into it) is far worse than stepping back and reconsidering your position. The rarer this gets, the more I feel the need to call attention to those who are willing to do so.

No matter what side of an issue you’re on, #teamtruth should be our primary affiliation.

*In the interest of full disclosure, Science for the People is affiliated with the Skepchick network, which I have also blogged for at their Grounded Parents site. Despite that mutual affiliation and the shared first name, I do not believe I have ever met or interacted with Dr. Brookshire. Bethany’s a pretty rare first name, so I tend to remember it when I meet other Bethanys (Bethanii?).