GPD Year In Review: Top Posts of 2017

As the year winds down, it’s always a good time to take a look back at what’s happened and ahead at the hopes for 2018. My capstone class has finished up and been graded, and in a few days I’ll get the official piece of paper that says I’ve finished my studies. While I’m not planning on leaving my current job at the moment, I have some consulting work lined up and want to spend some time pulling together some of my writing from over the years. I’ll keep myself busy, don’t you worry.

While I’m working on that, I figured I’d continue my tradition of reviewing my most popular posts of the year. It’s always fun to see what was popular at the time I wrote it and what continued in Google popularity after it went up. While some of my most popular posts continue to be old ones (Correlation/Causation Confusion, Bimodal Distributions, and Immigration, Poverty and Gumballs are perennial favorites), this list is only posts that were written in 2017:

  1. Calling BS Read-Along: This series, which followed along with the syllabus for the Calling Bullshit class at the University of Washington, was far and away my most popular of the year, helped out quite a bit by the professors tweeting out links to my posts. Definitely one of my more fun blogging experiences.
  2. The Real Dunning-Kruger Graph: At first I was sort of surprised to see this one this high on the list, but I realized that its popularity has been boosted by a steady stream of Google traffic. It appears that quite a few people had the same question I did, and hopefully my post helped them out.
  3. Immigration, Poverty and Gumballs Part 2: The Amazing World of Gumball: After my original Immigration, Poverty and Gumballs post went mini-viral, I put together this post to address some of the responses. Not quite as popular, but still gets some traffic.
  4. 10 Gifs for Stats/Data People: In my perfect world, this would be my most popular post. In this world, stats and data gifs are still pretty niche.
  5. Using Data to Fight Fraud: the Carlisle Method: Doing a follow-up to this post is definitely on my 2018 to-do list.
  6. 5 Things You Should Know About Orchestras and Blind Auditions: I am glad to know that there are people in this world who were as interested in the study behind the anecdote as I was.
  7. A Loss for so Many: A post I wish I never had to write, but one I was happy got shared.
  8. 5 Things You Should Know About the Backfire Effect: With political polarization all around us, this post may be one of the more important ones I did this year….how to actually convince people of facts when they don’t agree with you.
  9. How You Ask the Question: NFL Edition: More examples of how asking questions differently can generate different responses. This one produced some debate on other blogs about whether this was really an example of “the same question” or whether slight wording changes had changed the question entirely, but regardless I think it’s a good example of how small word choices make a difference.
  10. Perfect Metric Fallacy: This post was probably one of my most gratifying, as it got passed around an office that was going through this exact issue. Hearing people’s reactions made me laugh.

Here’s to 2018!


5 Things About Personality and Cold Weather

As I mentioned on Sunday, I’ve been itching to do a deep dive into this new paper about how people who grow up in cold regions tend to have different personalities than those who don’t. As someone who grew up in the New England area, it’s pretty striking to me how every warmer-weather city in the US seems more outgoing than what I’m used to. Still, despite my inclination to believe the finding, I was curious how one goes about proving that people in cold-weather cities are less agreeable. While the overall strategy is pretty simple (give personality tests to different people in different climates, compare answers), I figured there’d likely be some interesting nuance in the details.

Now that I’ve finally read the paper, here’s what I found out:

  1. To make the findings more broadly applicable, study multiple countries: One of the first things I noticed when I pulled up the paper was the surprising number of Chinese names on the author list. I had assumed this was just a US-based study, but it turns out it was actually a cross-cultural study using data sets from both the US and China. This makes the findings much stronger than they would be otherwise.
  2. There are 3 possible mechanisms for climate affecting personality: I’ve talked about the rules for proving causality before, and the authors wasted no time in introducing potential mechanisms to explain a cold weather/agreeableness link. There are three main theories: people in cold climates were historically more likely to be herders, which requires less cooperation than farming or fishing; people in cold weather are more susceptible to pathogens, so they unconsciously avoid each other; and people may migrate to areas that fit their (group) personalities. Thus, it’s possible that the cold doesn’t make people disagreeable, but rather that disagreeable people move to cold climates. Insert joke about Bostonians here.
  3. The personality differences were actually present for every one of the Big 5 traits: Interestingly, every one of the Big 5 personality traits was higher in those who lived in nicer climates: extraversion, agreeableness, openness to new experience, conscientiousness and emotional stability. The difference in agreeableness was not statistically significant for the Chinese group. Here are the differences, along with which variables appear to have made a difference (note: “temperature clemency” means how far off the average temperature is from 72 degrees):
  4. Reverse causality was controlled for: One of the interesting things about the findings is that the authors controlled for the factors listed in #2 to determine what was causing what. They specifically asked people about where they grew up to control for selective (adult) migration, and in the Chinese part of the study they actually asked about prior generations as well. They also controlled for things like influenza incidence (as a proxy for pathogen presence). Given that the findings persisted after these controls, it seems more likely that the weather itself is driving the personality differences, rather than migration or pathogen exposure.
  5. Only cold climates were examined: One of the more interesting parts of this to me is what wasn’t studied: uncomfortably warm temperatures. Both China and the US are more temperate to the south and colder to the north. The “temperature clemency” variable looked specifically at temperatures that deviated from 72 degrees, but only in the low temperature direction. It would be interesting to see what unreasonably hot temperatures did to personalities….is it a linear effect? Do some personality traits drop off again? I’d be curious.
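
To make the coding distinction concrete, here’s a small sketch of two ways a “clemency” variable could be built. This is just my reading of the variable as described above, not the paper’s actual code:

```python
def clemency_symmetric(avg_temp_f: float) -> float:
    """Penalize any deviation from 72°F, hot or cold."""
    return -abs(avg_temp_f - 72.0)

def clemency_cold_only(avg_temp_f: float) -> float:
    """Penalize only cold deviations; 72°F and above score 0."""
    return min(avg_temp_f - 72.0, 0.0)

for temp in [40.0, 72.0, 95.0]:
    print(temp, clemency_symmetric(temp), clemency_cold_only(temp))
```

Under the second coding, a 95-degree city looks just as “clement” as a 72-degree one, which is exactly why the hot end of the question stays open.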

Overall I thought this was an interesting study. I always appreciate it when multiple cultures are considered, and the findings seemed pretty robust. Within the paper and in the notes at the end, the authors repeatedly mentioned that they ran most of the calculations a few different ways to make sure their findings didn’t collapse with minor changes. That’s a great step in the right direction for all studies. Stay warm everyone!

What I Wish I Was Reading: December 2017

With guests at the house, a sick kiddo and snow in the forecast, I have had no time to read this new paper on how regional temperature affects population agreeableness. I will be doing so soon, however, because as someone who’s heard a lot about how unfriendly Boston is, I’d like some validation for my go-to “we’re rude because we’re cold” excuse.

Funny story: when my out-of-town guests picked up their (4 wheel drive) rental car, the lady behind the counter mocked them and said “Expecting some snow or something?” When they got to my house and we confirmed that there was actually snow in the forecast, they wondered why she was so condescending about it. We explained that for Bostonians, a forecast of 4-6 inches over 20 hours isn’t really “snow”. They informed me that in Seattle, they’d be calling out the National Guard.

Also, my sister-in-law (married to my teacher/farmer brother) has informed me her new parenting slogan is “There’s no such thing as bad weather, only bad clothes”, which she apparently got from this book of the same name. I like this theory. It goes nicely with my adulthood slogan of “There’s no such thing as strong coffee, only weak people.”

I hope to have a review of the paper up on Wednesday this week, so stay tuned.

The Assistant Village Idiot linked to this article (via Lelia) about those with no visual memory. I’ve been pondering this, as I’m pretty sure my visual memory has some gaps. I can’t read facial expressions at baseline, and one of my recurring stress nightmares is being handed documents/books that I recognize but can’t decipher the text of. I feel like something’s related here, but I have to reread the article before I comment further.

Also, I know I always chide people to read behind the headline, but this headline’s so good I’m pretty sure I’ll love it when I finally get to read it: 5 Sport Science Studies that “Failed”. The author specifically took note of studies he saw that asked interesting questions and got negative results. He wanted to talk about this to fight the impression that the only interesting findings were positive findings.

Work Hours and Denominators

I was talking to a few folks about work recently, and an interesting issue came up that I’d never particularly thought about before. I’ve mentioned in the past the power of denominators to radically change what we’re looking at, how averages might be lying to you, and how often people misreport “hours worked”….but I don’t know that I’d ever put all 3 together.

When answering the question “how many hours do you work per day”, most full-time workers generally name a number between 8 and 10 hours a day. Taken literally though, it occurred to me that the answer is really somewhere between 6 and 7 hours on average, as most people aren’t working on the weekends. So basically, when asked “average hours per day” we assume “average hours per working day” and answer accordingly.

This is a minor thing, but it’s part of why the official “hours worked” numbers and the self-reported “hours worked” numbers don’t always add up. When researchers try to figure out how much the average American is working, they take things like vacation and holiday weeks into account. The rest of us don’t tend to do that, and instead report on how much we worked during our full weeks. A slight shift in denominator, but it can end up shifting the results by a decent amount.
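
The arithmetic of the shift is easy to sketch out (the numbers here are just illustrative):

```python
hours_per_working_day = 9      # the kind of number people report
working_days_per_week = 5

weekly_hours = hours_per_working_day * working_days_per_week   # 45

# Same hours, divided over all 7 days instead of 5 working days
print(round(weekly_hours / 7, 1))       # ~6.4 hours "per day"

# Fold in ~3 weeks of vacation/holiday and the annual average drops again
annual_hours = weekly_hours * 49
print(round(annual_hours / 365, 1))     # ~6.0 hours "per day"
```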

Not the most groundbreaking insight ever, but I always get a little interested when I realize we’ve all been assuming a denominator that’s not explicitly clarified.

I’m Thinking of a Word That Starts With a…..

I’ve mentioned before that I like to try to find unusual ways of teaching my 5 year old son statistical concepts by relating them to things he likes. This pretty much doesn’t work, but this week I tried it again and attempted to use a discussion about letters to segue into a discussion about perception vs data. He’s getting into some reading fundamentals now, and is incredibly curious about which words start with which letters. This leads to our new favorite game “Let’s talk about ____ words!”, where we name a letter and then just think of as many words as we can that start with that letter.

This game is fun, but he’s a little annoyed at letters that make more than one sound. This week he got particularly irritated at the letter “c”, which he felt was hogging all the words while leaving “k” and “s” with none. I started trying to explain to him that “s” in particular was doing pretty alright for itself, but after discussing “cereal” and “circus” he was pretty convinced that “s” was in trouble.

As I was defending the English language’s treatment of the letter “s”, I started to wonder which letter actually is the most common first letter of words. I also wondered if the answer was different for “kids’ words” vs “all words”. After some poking around on the internet, I discovered that there’s a decent amount of variation depending on what word list you go with. I decided to take a look at three lists:

  1. All unique words appearing more than 100,000 times in all books on Google ngrams (Note: I had to go to the original file here. The list provided on that site and the Wiki page actually gives the most common first letters for all words used, not just unique words. That’s why “t” is the most common there….it’s counting every instance of “the” separately.)
  2. The 1,000 most commonly used English language words (of Up-Goer 5 fame)
  3. The Dolch sight words list, used to teach kids to read

Comparing the percent of words starting with each letter on each list got me this graph:
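
For anyone who wants to reproduce the tally, here’s a minimal sketch in Python; the file names are hypothetical stand-ins for the three lists above:

```python
from collections import Counter

# Hypothetical file names, each a plain text file with one word per line
files = {
    "ngrams": "ngram_unique_words.txt",
    "common_1000": "up_goer_5_words.txt",
    "dolch": "dolch_sight_words.txt",
}

for name, path in files.items():
    with open(path) as f:
        words = [w.strip().lower() for w in f if w.strip()]
    counts = Counter(w[0] for w in words if w[0].isalpha())
    total = sum(counts.values())
    # Percent of words starting with each letter, most common first
    print(name)
    for letter, n in counts.most_common(5):
        print(f"  {letter}: {100 * n / total:.1f}%")
```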

As I suspected, “s” does quite well for itself across the board, though it really shines in the “core words” list. “K” on the other hand is definitely being left out. It’s interesting to see which letters do well in the bigger word sets (like c, p and m), and which ones are only in the smaller sets (b, t, o and w). “W” seems very popular for early reading lists because of words like “what”, “where”, “why”. “S” is actually really interesting, as it appears to kick off lots of common-but-not-basic words. My guess is this is because of its participation in letter combinations like “sh” and “sch”.

Anyway, my son didn’t really seem to grasp the “the plural of anecdote is not data” lesson, so I pointed out to him that both “Spiderman” and “superhero” started with “S”. At that point he agreed that yes, lots of words started with “s”, and went back to feeling bad for “K”. At least we can agree on that.

Now please enjoy my favorite Sesame Street alphabet song ever: ABCs of the Swamp

Cornell Food and Brand Lab: an Update

After mentioning the embattled Brian Wansink and the Cornell Food and Brand Lab in my post last week, a friend passed along the most recent update on this story. Interestingly, it appears Buzzfeed is the news outlet doing the most reporting on this story as it continues to develop.

A quick review:
Brian Wansink is the head of the Cornell Food and Brand Lab, which publishes all sorts of interesting research about the psychology behind eating and how we process information on health. Even if you’ve never heard his name, you may have heard about his work….studies like “people eat more soup if their bowl refills so they don’t know how much they’ve eaten” or “kids eat more vegetables when the veggies have fun names” tend to be from his lab.

About a year ago, he published a blog post where he praised one of his grad students for taking a data set that didn’t really show much and turning it into 5 publishable papers. This turned into an enormous scandal, as many people quickly pointed out that a feat like that almost certainly involved lots of data tricks that would make the papers’ results very likely to be false. As the scrutiny went up, things got worse, as people began poring over his previous work.

Not only did this throw Wansink’s work into question, but a lot of people (myself included) who had cited his work in their own writing now had to figure out whether or not to retract or update what they had written. Ugh.

So where are we now?
Well, as I mentioned, Buzzfeed has been making sure this doesn’t drop. In September, they reported that the aforementioned “veggies with fun names” study had a lot of problems. Worse yet, Wansink couldn’t produce the data when asked. What was incredibly concerning is that this particular paper was part of a program Wansink was piloting for school lunches. With his work under scrutiny, over $20 million in research and training grants may have gone towards strategies that may not actually be effective. To be clear, the “fun veggie name” study wasn’t the only part of this program, but it’s definitely not encouraging to find out that parts of it are so shaky.

To make things even worse, they are now reporting that several of his papers, supposedly covering three different topics in three different years and sent to three different sample populations, report the exact same number of survey respondents: 770. Those papers are being reviewed.

Finally, they report he has a 4th paper being retracted, this one on WWII veterans and cooking habits. An interview with the researcher who helped highlight the issues with the paper is up here at Retraction Watch, and some of the problems with the paper are pretty amazing. When asked where he first noted problems, he said: “First there is the claim that only 80% of people who saw heavy, repeated combat during WW2 were male.” Yeah, that seems a little off. Wansink has responded to the Buzzfeed report to say that this was due to a spreadsheet error.

Overall, the implications of this are going to be felt for a while. While only 4 papers have been retracted so far, Buzzfeed reports that 8 more have planned corrections, and over 50 are being looked at. With such a prolific lab and results that are used in so many places, this story could go on for years. I appreciate the journalists keeping up on this story as it’s an incredibly important cautionary tale for members of the scientific community and the public alike.

Food Insecurity: A Semester in Review

I mentioned a few months back that I was working on my capstone project for my degree this semester. I’ve mostly finished it up (just adjusting some formatting), so I thought it would be a good moment to post on my project and some of my findings. Since I have to present this all in a week or two, it’s also a chance to gather my thoughts.

Background:
The American Time Use Survey is a national survey carried out by the Bureau of Labor Statistics that asks Americans how they spend their time. From 2014 to 2016, it included a module that asked specifically about health status and behaviors. They make the questionnaire and data files publicly available here.

What interested me about this data set is that they asked specifically about food insecurity….i.e. “Which of the following statements best describes the amount of food eaten in your household in the last 30 days – enough food to eat, sometimes not enough to eat, or often not enough to eat?” Based on that data, I was able to compare those who were food secure (those who said “I had enough food to eat”) vs the food insecure (those who said they “sometimes” or “frequently” did not have enough to eat).

This is an interesting comparison to make, because there’s some evidence that in the US these two groups don’t always look like what you’d expect. Previous work has found that people who report being food insecure actually tend to weigh more than those who are food secure. I broke my research down into three categories:

  1. Confirmation of BMI differences
  2. Comparison of health habits between food secure and food insecure people
  3. Correlation of specific behaviors with BMI within the food insecure group

Here’s what I found:

Confirmation of BMI differences:
Yes, the paradox is true for this data set. Those who were “sometimes” or “frequently” food insecure were almost 2 BMI points heavier than those who were food secure…around 10-15 pounds for most height ranges. Level of food insecurity didn’t seem to matter, and the effect persisted even after controlling for public assistance and income.
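
For the curious, a check like that is usually set up as a regression with controls. Here’s a minimal sketch of the approach with hypothetical file and column names (this isn’t my actual capstone code):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names, one row per survey respondent
df = pd.read_csv("atus_health_module.csv")

# Does the ~2 point BMI gap survive controls for income and public assistance?
model = smf.ols(
    "bmi ~ food_insecure + C(income_bracket) + public_assistance",
    data=df,
).fit()
print(model.summary())
```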

Interestingly, my professor asked me if the BMI difference was due more to food insecure people being shorter (indicating a possible nutritional deficiency) or from being heavier, and it turns out it’s both. The food insecure group was about an inch shorter and 8 lbs heavier than the food secure group.
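
To put numbers on that, here’s a quick worked example with the standard BMI formula (the heights and weights are illustrative, not pulled from my data):

```python
def bmi(pounds: float, inches: float) -> float:
    """Standard US-units BMI formula."""
    return 703 * pounds / inches ** 2

# Illustrative: a food secure respondent at 5'6" and 165 lbs
# vs a food insecure respondent an inch shorter and 8 lbs heavier
print(round(bmi(165, 66), 1))   # 26.6
print(round(bmi(173, 65), 1))   # 28.8 -> roughly a 2 point gap
```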

Differences in health behaviors or status:
Given my sample size (over 20,000), most of the questions they asked ended up having statistically significant differences. The ones that seemed to be both practically and statistically significant were:

  1. Health status: People who were food insecure were WAY more likely to say they were in poor health. This isn’t terribly surprising, since disability would impact both people’s assessment of their health status and their ability to work/earn a living.
  2. Shopping habits: While most people from both groups did their grocery shopping at grocery stores, food insecure people were more likely to use other stores like “supercenters” (i.e. Walmart or Target), convenience stores, or “other” types of stores. Food secure people were more likely to use places like Costco or Sam’s Club. Unsurprisingly, people who were food insecure were much more likely to say they selected their stores based on the prices. My brother had asked specifically up front if “food deserts” were an issue, so I did note that the two groups said “location” was a factor in their shopping at equal rates.
  3. Soda consumption: Food insecure people were much more likely to have drunk soda in the last 7 days (50% vs 38%) and much less likely to say it was a diet soda (21.5% vs 40%) than the food secure group.
  4. Exercise: Food insecure people were much less likely to have exercised in the last 7 days (50.5%) than food secure people were (63.9%). Given the health status ranking, this doesn’t seem surprising.
  5. Food shopping/preparation: Food insecure people were much more likely to be the primary food shopper and preparer. This makes sense when you consider that food insecurity is a self-reported metric: if you’re the one looking at the bills, you’re probably more likely to feel insecure than if you’re not. Other researchers have noted that many food stamp recipients will also cut their own intake to make sure their children have enough food.
Yes, I have confidence intervals for all of these, but I’m sparing you.
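
If you want a feel for how one of those intervals comes together, here’s a sketch using the soda numbers above; the group sizes are made up for illustration:

```python
from math import sqrt

# Drank-soda-in-last-7-days rates from above; group sizes are invented
p1, n1 = 0.50, 2_000    # food insecure
p2, n2 = 0.38, 18_000   # food secure

diff = p1 - p2
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(f"difference: {diff:.2f}")
print(f"95% CI: ({diff - 1.96 * se:.3f}, {diff + 1.96 * se:.3f})")
```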

BMI correlation within the food insecure group:
Taking just the group that said they were food insecure, I then took a look at which factors were most associated with higher BMIs. These were:

  1. Time spent eating: Interestingly, increased time spent eating was actually associated with lower BMIs. This may indicate that people who can plan regular meal times are healthier than those eating while doing other things (the survey asked about both).
  2. Drinking beverages other than water: Those who regularly drank beverages other than water were heavier than those who didn’t.
  3. Lack of exercise: No shock here.
  4. Poor health: The worse the self-assessed health, the higher the BMI. It’s hard to tease out the correlation/causation here. Are people in bad health due to an obesity-related illness (like diabetes), or are they obese because they have an issue that makes it hard for them to move (like a back injury)? Regardless, this correlation was QUITE strong: people in “excellent” health had BMIs almost 5 points lower than those in “poor” health.
  5. Being the primary shopper: I’m not clear on why this association exists, but primary shoppers were 2 BMI points heavier than those who shared shopping duties.
  6. Public assistance: Those who were food insecure AND received public assistance were heavier than those who were just food insecure.

It should be noted that I did nothing to establish causality here; everything reported is just an association. Additionally, it’s interesting to note a few things that didn’t show up: fast food consumption, shopping locations and snacking all didn’t make much of a difference.

While none of this is definitive, I thought it was an interesting exploration into the topic. I have like 30 pages of this stuff, so I can definitely clarify anything I didn’t go into. Now to put my presentation together and be done with this!


Eating Season

Happy almost Thanksgiving! Please enjoy this bit of trivia I recently stumbled on about American food consumption patterns during this time of year! It’s from the book “Devoured: From Chicken Wings to Kale Smoothies – How What We Eat Defines Who We Are” by Sophie Egan.

From page 173:

A few paragraphs later, she goes a bit more in depth about what happens to shopping habits (note: she quotes the embattled Cornell Food and Brand Lab, but since their data matches another group’s data on this, I’m guessing it’s pretty solid):

I had no idea that “eating season” had gone so far outside the bounds of what I think of as the holiday season. Kinda makes you wonder if this is all just being driven by winter and the holidays are just an excuse.

On a related note, my capstone project is done/accepted with no edits, and I will probably be putting up some highlights from my research into food insecurity and health habits on Sunday.

Happy Thanksgiving!

5 Interesting Things About IQ Self-Estimates

After my post last week about what goes wrong when students self-report their grades, the Assistant Village Idiot left a comment wondering about how this would look if we changed the topic to IQ. He wondered specifically about Quora, a question asking/answering website that has managed to spawn its own meta-genre of questions asking “why is this website so obsessed with IQ?”.

Unsurprisingly, there is no particular research done on specific websites and IQ self-reporting, but there is actually some interesting literature on people’s ability to estimate their own IQ and that of those around them. Most of this research comes from a British researcher at University College London, Adrian Furnham. Studying how well people actually know themselves kinda sounds like a dream job to me, so kudos to you, Adrian. Anyway, ready for the highlights?

  1. IQ self-estimates are iffy at best: One of the first things that surprised me about IQ self-estimates vs actual IQ was how weak the correlation was. One study found an r=.3, another r=.19. This data was gathered from people who first took a test, then were asked to estimate their results prior to actually getting them. In both cases, it appears that people are sort of on the right track, but not terrific at pinpointing how smart they are. One wonders if this is part of the reason for the IQ test obsession….we’re rightfully insecure about our ability to figure this out on our own.
  2. There’s a gender difference in predictions: Across cultures, men tend to rank their own IQ higher than women do, and both genders consistently rank their male relatives (fathers, grandfathers and sons) as smarter than their female relatives (mothers, grandmothers and daughters). This often gets reported as male hubris vs female humility (indeed, that’s the title of the paper), but I note they didn’t actually compare the estimates to measured scores. Given that many of these studies are conducted on psych undergrad volunteers, is it possible that men are more likely to self-select when they know IQ will be measured? Some of these studies had average IQ guesses of 120 (for women) and 127 (for men)….that’s not even remotely an average group, and I’d caution against extrapolation.
  3. Education may be a confounding factor in how we assess others: One of the other interesting findings in the “rate your family member” game is that people rank previous generations as half a standard deviation less intelligent than they rank themselves. This could be due to the Flynn effect, but the other suggestion is that it’s hard to rank IQ accurately when educational achievement is discordant. Within a cohort, educational achievement is actually pretty strongly correlated with IQ, so re-calibrating for other generations could be tricky. In other words, if you got a master’s degree and your grandmother only graduated high school, you may think your IQs are further apart than they really are. Somewhat supporting this theory, the gap between self rankings and grandparent rankings has closed as time has progressed. It’s interesting to think how this could also affect some of the gender effects seen in #2, particularly for prior generations.
  4. Being smart may not be the same as avoiding stupidity: One of the more interesting studies I read looked at the correlation between IQ self-reports and personality traits, and found that some traits made you more likely to think you had a high IQ. One of these traits was stability, which confused me, because you don’t normally think of stable people as being overly high on themselves. When I thought about it for a bit though, I wondered if stable people were defining being “smart” as “not doing stupid things”. Given that many stupid actions are probably more highly correlated with impulsiveness (as opposed to low IQ), this could explain the difference. I don’t have proof, but I suspect a stable person A with an IQ of 115 will mostly do better than an unstable person B with an IQ of 115, and person A may attribute this difference to intelligence rather than impulse control. It’s an academic distinction more than a practical one, but it could be confusing things a bit.
  5. Disagreeableness is associated with higher IQs, and with self-perception of higher IQs: Here’s an interesting chicken and egg question for you: does having a high IQ make you more disagreeable, or does being disagreeable make you think you have a higher IQ? Alternative explanation: is some underlying factor driving both? It turns out having a high IQ is associated with being disagreeable, and being disagreeable is associated with ranking your IQ as higher than others’. This probably affects some of the IQ discussions to a certain degree….the “here’s my high IQ, now let’s talk about it” crowd probably really isn’t as agreeable as those who want to talk about sports or exchange recipes.

So there you have it! My overall impression from reading this is that IQ is one of those things where people don’t appreciate or want to acknowledge small differences. In looking at some of the studies where people ranked their parents against each other, I was surprised how many were pointing to a 15-point gap between parents, or a 10-point gap between siblings. Additionally, it’s interesting that we appear to have a pretty uneasy relationship with IQ tests in general. Women in the US, for example, are more likely to take IQ tests than men are, but less likely to trust their validity. To confuse things further, they are also more likely to believe the tests are useful in educational settings. Huh? I’d be interested to see a self-estimated IQ compared to an actual understanding of what IQ is/is not, and then compare that to an actual scored IQ test. That might flesh out where some of these conflicting feelings are coming from.

Grade Prediction by Subject

I saw an interesting study this week that seems to play into two different topics I’ve talked about here: self-reporting bias and the Dunning-Kruger effect.

The study was “Examining the accuracy of students’ self-reported academic grades from a correlational and a discrepancy perspective: Evidence from a longitudinal study”, and it took a look at how accurate students’ self-reported grades were. This is not the first time someone has looked at this, but it did add two key things to the mix: non-US test scoring and different academic subjects over different years of school. The students surveyed were Swiss, and they were asked to report their most recent grade in 4 different subjects. This was then compared to their actual most recent grade. The results were pretty interesting (UR=under-report, OR=over-report, T1-T3 are years of school):

Unsurprisingly, kids were much more likely to over-report than under-report. Since most of the differences were adding a half point or so (out of 6), one wonders if this is just a tendency to round up in our own favor. Interestingly, a huge majority of kids were actually quite honest about their ability….about 70% for most years. The authors also noted that the younger kids were more likely to be honest than the older kids.

I think this is a really interesting example of how self-reporting biases can play out. It’s easy to think of bias as something that’s big and overwhelming, but studies like this suggest that most bias is small for any given individual. A rounding error here, an accidental report of your grade from last semester….those are tiny for each person, but they can add up over a group. I suspect if we looked at those older students who reported their grades as inaccurately high, we would discover that they had gotten high grades in previous years. There does seem to be a bias towards reporting your high-water mark rather than your current status….kinda like the jock who continues to claim he can run a 5-minute mile long after he ceases to be able to do so.
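
To see how those tiny individual nudges add up, here’s a quick simulation (all numbers invented): everyone is honest except a minority who round up half a point, and the group average still shifts.

```python
import random

random.seed(0)

# Simulated "actual" grades on the Swiss 1-6 scale
n_students = 10_000
actual = [random.uniform(3.0, 6.0) for _ in range(n_students)]

# 25% of students round their report up by half a point; the rest are honest
reported = [g + 0.5 if random.random() < 0.25 else g for g in actual]

bias = sum(r - a for r, a in zip(reported, actual)) / n_students
print(f"average over-report across the group: {bias:.3f} points")  # ~0.125
```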

The phenomenon is pretty well known, but it’s always interesting to see the hard numbers.