New Year’s Updates and Experiments

Hey hey! Happy (almost) new year!

New experiment: My Year in Surveys

I don’t typically do new year’s resolutions, but I do like this time of year for sitting back and taking stock of where things are and what I might like to focus on going forward.  I have a few ideas, but they’re really pretty vague to the point of being useless: be healthier, manage stress better, spend more time with my loved ones. As often happens when I am faced with vague requests, the idea floated through my mind that I wish I had some more data about what was going on.

It occurred to me that there’s actually no good reason I can’t get the data I’m looking for. I design surveys for people as part of my side work, why not design one for myself to gather data around what I’m up to on a daily basis? Thus my 2018 self-survey was born. It’s a work in progress, but here’s the general set-up:

  1. Every day I take the same survey (built in Google Forms) that asks questions about my health, stress status, and family life.
  2. Every time I encounter a need for a new answer (or new question), I add it and track it going forward.
  3. Snarky/ridiculous answers are allowed and are built into the survey. In fact, my stress management section is kind of based on this and asks about both “Psychodrama of the Moment” and “What are you obsessing about today?”

So far it takes me <5 minutes to complete, and asks some questions I’m pretty interested to trend out. For example, on the stress management page I have a list of different feelings I might be having that day. I’ll be fascinated to see which feelings I report most commonly. I’ll be interested to see how the survey changes over the year, and I’ll probably be doing some sort of reports about what I’m finding. It’s possible I’ll end up abandoning the whole thing by mid-January, but I think the active updating part will appeal to my tinkering/don’t-like-to-get-bored side.

Top posts update:

I did my “top posts of 2017” post 2 weeks ago thinking nothing could really change between then and the end of the year. Then my Dunning-Kruger post got linked in an apparently popular Reddit thread and claimed the top spot for the year. That’ll teach me to jump the gun.

Read the headline update: 

I’ve talked before about how you should always read more than the headline on an article, and I’ve also pointed out how every time an article is posted on social media people seem to feel freer to move away from the original source material. This article captured a new phenomenon that’s related to both: news outlets posting their own stories to Twitter under different (and misleading) text that doesn’t jibe with their own headlines/articles. The Hill is apparently quite terrible about this, though I’m sure with Twitter moving to 280 characters they’ll clear this right up, right guys?

That’s all I got, see you all in 2018!

5 Interesting Resources for Snowflake Math Lessons

Happy National Make a Paper Snowflake Day (or National Make Cut Out Snowflakes Day for the purists)!

I don’t remember why I stumbled on this holiday this year, but I thought it would be a really good time to remind everyone that snowflakes are a pretty cool (no pun intended) basis for a math lesson. My sister-in-law teaches high school math and informs me that this is an excellent thing to give kids to do right before winter break. I’m probably a little late for that, but just in case you’re looking for some resources, here are some good ones I’ve found:

  1. Khan Academy Math for Fun and Glory If you ever thought the problem with snowflake cutting was that it wasn’t technical enough, then this short video is for you. Part of a bigger series that is pretty fun to work through, this video is a great intro to how to cut a mathematically/anatomically(?) correct snowflake.
  2. Computer snowflake models There’s some interesting science behind computer snowflake models, and this site takes you through some of the most advanced programs for doing so. It seems like a fun exercise, but apparently modeling crystal growth has some pretty interesting applications. Gallery of images here, and an overview of the mathematical models here.
  3. Uniqueness of snowflakes Back in the real world, there’s an interesting and raging debate over the whole “no two snowflakes are alike” thing. According to this article, “Yes, with a caution”, “Likely but unprovable” or “it depends on what you mean by unique” are all acceptable answers.
  4. Online snowflake maker If you’re desperate to try out some of the math lessons you just learned but can’t find your scissors, this online snowflake generator has you covered.
  5. Other winter math If you’re still looking for more ideas, check out this list of winter related math activities. In addition to snowflake lessons around symmetry, patterns and Koch snowflakes, they have penguin and snowman math.
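The Koch snowflake lesson mentioned in that last item comes down to one neat fact: each iteration replaces every segment with four segments a third as long, so the perimeter grows by a factor of 4/3 each time while the enclosed area stays bounded. A minimal sketch (the numbers are just a worked example, not from any of the linked lessons):

```python
# Perimeter of the Koch snowflake after n iterations: each pass replaces every
# segment with 4 segments a third as long, so the perimeter grows by 4/3 per pass.

def koch_perimeter(side, iterations):
    """Perimeter after n iterations, starting from an equilateral triangle."""
    return 3 * side * (4 / 3) ** iterations

for n in range(5):
    print(n, round(koch_perimeter(1, n), 3))  # 3.0, 4.0, 5.333, 7.111, 9.481
```

Run it out far enough and the perimeter exceeds any number you pick, which tends to be the part that gets kids’ attention.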

Happy shoveling!


Pictures of the Season

Happy day before Christmas/3 days after the solstice!

This time of year I’m always on the lookout for interesting seasonal data sets/visualizations, and I’ve found some good ones this year.

The first is this really cool visual of how long the shortest day of the year is across the US (original at the WaPo here):

It’s interesting that moving a little over an hour south of where I grew up appears to mean I gained an extra 15-20 minutes of daylight on the darkest day of the year. My sister on the other hand is currently in Juneau, Alaska and is ruing her short days.
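The differences in that map fall out of the standard sunrise equation, which you can sketch in a few lines. This ignores atmospheric refraction (which adds a few minutes of apparent daylight), and the latitudes below are rough stand-ins for the places mentioned:

```python
# Approximate daylight at the winter solstice via the standard sunrise equation.
# Ignores atmospheric refraction; latitudes are rough stand-ins for illustration.
import math

def daylight_hours(latitude_deg, declination_deg=-23.44):
    """Approximate daylight hours at the winter solstice for a given latitude."""
    cos_h = -math.tan(math.radians(latitude_deg)) * math.tan(math.radians(declination_deg))
    cos_h = max(-1.0, min(1.0, cos_h))  # clamp for polar day/polar night
    return 2 * math.degrees(math.acos(cos_h)) / 15  # Earth rotates 15 degrees per hour

for city, lat in [("Boston", 42.4), ("Juneau", 58.3)]:
    print(city, round(daylight_hours(lat), 1))
```

Moving south shrinks the latitude term, which is why an hour’s drive can buy you those extra minutes of daylight.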

There’s also this Vox article from last year that shows how many Christmas trees are grown by state. There’s an (unlinkable) interactive map that I found really interesting, as it shows how many trees each state cuts. 4 states produce over a million trees, and I have to admit I could not have named them. If you’re bored, try to name them before you check it out. I’ll leave the answer in the comments.

The last interesting data set is from a paper a commenter left on my Eating Season post. It’s called “The Seasonal Periodicity of Healthy Contemplations About Exercise and Weight Loss: Ecological Correlational Study” and it’s a study that looks at Google search trends for the topics of “exercise” and “weight loss”. Apparently searches for “weight loss” and some related terms peak twice a year (in the winter and summer months) and searches for “exercise” peak in the winter. My first thought with the exercise searches was that the winter drove people to search for ways of exercising indoors. The authors apparently had the same thought, since they also decided to see if a rise in Google searches for “exercise” correlated with the latitude the search was coming from. It did:

Coming from a city that’s had freezing rain/snow for the last several days, I can say I’m all in on indoor exercising. Alternatively, if someone wants to suggest we just all hibernate for the next few months, I’d be good with that too.

GPD Year In Review: Top Posts of 2017

As the year winds down, it’s always a good time to take a look back at the year and see what’s happened and what the hopes are for 2018. My capstone class has finished up and been graded, and in a few days I’ll get the official piece of paper that says I’ve finished my studies. While I’m not planning on leaving my current job at the moment, I have some consulting work lined up and want to spend some time pulling together some of my writing from over the years. I’ll keep myself busy, don’t you worry.

While I’m working on that, I figured I’d continue my tradition of reviewing my most popular posts of the year. It’s always fun to see what was popular at the time I wrote it and what continued in Google popularity after it went up. While some of my most popular posts continue to be old ones (Correlation/Causation Confusion, Bimodal Distributions, and Immigration, Poverty and Gumballs are perennial favorites), this list is only posts that were written in 2017:

  1. Calling BS Read-Along This series that followed along with the syllabus for the Calling Bullshit class at the University of Washington was far and away my most popular of the year, helped out quite a bit by the professors tweeting out links to my posts. Definitely one of my more fun blogging experiences.
  2. The Real Dunning-Kruger Graph At first I was sort of surprised to see this one this high on the list, but I realized that its popularity has been boosted by a steady stream of Google traffic. It appears that quite a few people had the same question I did, and hopefully my post helped them out.
  3. Immigration, Poverty and Gumballs Part 2: The Amazing World of Gumball  After my original Immigration, Poverty and Gumballs post went mini-viral, I put together this post to address some of the responses. Not quite as popular, but still gets some traffic.
  4. 10 Gifs for Stats/Data People In my perfect world, this would be my most popular post. In this one, stats and data gifs are still pretty niche.
  5. Using Data to Fight Fraud: the Carlisle Method Doing a follow up to this post is definitely on my 2018 to do list.
  6. 5 Things You Should Know About Orchestras and Blind Auditions I am glad to know that there are people in this world who were as interested in the study behind the anecdote as I was.
  7. A Loss for so Many A post I wish I never had to write, but one I was happy got shared.
  8. 5 Things You Should Know About the Backfire Effect With political polarization all around us, this post may be one of the more important ones I did this year….how to actually convince people of facts when they don’t agree with you.
  9. How You Ask the Question: NFL Edition More examples of how asking questions differently can generate different responses. This one produced some debate on other blogs about whether or not this was really an example of “the same question” or whether it was an example of slight wording changes changing the question entirely, but regardless I think it’s a good example of how little word choices make a difference.
  10. Perfect Metric Fallacy This post probably was one of my most gratifying, as it got passed around an office that was going through this exact issue. Hearing people’s reactions made me laugh.

Here’s to 2018!


5 Things About Personality and Cold Weather

As I mentioned on Sunday, I’ve been itching to do a deep dive into this new paper about how people who grow up in cold regions tend to have different personalities than those who don’t. As someone who grew up in the New England area, it’s pretty striking to me how every warmer weather city in the US seems more outgoing than what I’m used to. Still, despite my initial belief, I was curious how one goes about proving that people in cold-weather cities are less agreeable. While the overall strategy is pretty simple (give personality tests to different people in different climates, compare answers), I figured there’d likely be some interesting nuance in the details.

Now that I’ve finally read the paper, here’s what I found out:

  1. To make the findings more broadly applicable, study multiple countries One of the first things I noticed when I pulled up the paper is that there were a surprising number of Chinese names among the author list. I had assumed this was just a US based study, but it turns out it was actually a cross-cultural study using both the US and China for data sets. This makes the findings much stronger than they would be otherwise.
  2. There are 3 possible mechanisms for climate affecting personality I’ve talked about the rules for proving causality before, and the authors wasted no time in introducing a potential mechanism to explain a cold weather/agreeableness link. There are three main theories: people in cold weather were more likely to be herders which requires less cooperation than farming or fishing, people in cold weather are more susceptible to pathogens so they unconsciously avoid each other, and people may migrate to areas that fit their (group) personalities. Thus, it’s possible that the cold doesn’t make people disagreeable, but rather that disagreeable people move to cold climates. Insert joke about Bostonians here.
  3. The personality differences were actually present for every one of the Big 5 traits. Interestingly, every one of the Big 5 personality traits was higher in those who lived in nicer climates: extraversion, agreeableness, openness to new experience, conscientiousness and emotional stability. The difference in agreeableness was not statistically significant for the Chinese group. Here are the differences, along with what variables appear to have made a difference (note: “temperature clemency” means how far off the average temperature is from 72 degrees):
  4. Reverse causality was controlled for One of the interesting things about the findings is that the authors decided to control for the factors listed in #2 to determine what was causing what. They specifically asked people about where they grew up to control for selective (adult) migration, and in the Chinese part of the study actually asked about prior generations as well. They controlled for things like influenza incidence (as a proxy for pathogen presence) as well. Given that the finding persisted after these controls, it seems more likely that the weather itself is driving the personality differences.
  5. Only cold climates were examined One of the more interesting parts of this to me is what wasn’t studied: uncomfortably warm temperatures. Both China and the US are more temperate to the south and colder to the north. The “temperature clemency” variable looked specifically at temperatures that deviated from 72 degrees, but only in the low temperature direction. It would be interesting to see what unreasonably hot temperatures did to personalities….is it a linear effect? Do some personality traits drop off again? I’d be curious.
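As a side note, the “temperature clemency” variable from point 3 is simple enough to write down. This is just my reading of the paper’s definition: distance of average temperature from 72°F, with the sign flipped so higher means nicer. The sign convention and the city temperatures below are my own assumptions for illustration:

```python
# One reading of the paper's "temperature clemency" variable: how far the
# average temperature falls from 72F, negated so higher = more clement.
# Sign convention and city temperatures are my own assumptions.

def clemency(avg_temp_f, ideal=72):
    """Temperature clemency: 0 is ideal, more negative is less clement."""
    return -abs(avg_temp_f - ideal)

for city, temp in [("Phoenix", 75), ("Boston", 51), ("Juneau", 42)]:
    print(city, clemency(temp))  # Phoenix -3, Boston -21, Juneau -30
```

Note the paper only looked at the cold side of 72, so a definition like this one would treat a brutally hot city more harshly than their data actually did.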

Overall I thought this was an interesting study. I always appreciate it when multiple cultures are considered, and I thought the findings seemed pretty robust. Within the paper and in the notes at the end, the authors repeatedly mentioned that they tried most of the calculations a few different ways to make sure that their findings were robust and didn’t collapse with minor changes. That’s a great step in the right direction for all studies. Stay warm everyone!

What I Wish I Was Reading: December 2017

With guests at the house, a sick kiddo and snow in the forecast, I have had no time to read this new paper on how regional temperature affects population agreeableness. I will be doing so soon however, because as someone who’s heard a lot about how unfriendly Boston is, I’d like some validation for my go-to “we’re rude because we’re cold” excuse.

Funny story: when my out of town guests picked up their (4 wheel drive) rental car, the lady behind the counter mocked them and asked, “Expecting some snow or something?” When they got to my house and we confirmed that there is actually snow in the forecast, they wondered why she was so condescending about it. We explained that for Bostonians, a forecast of 4-6 inches over 20 hours isn’t really “snow”. They informed me that in Seattle, they’d be calling out the National Guard.

Also, my sister-in-law (married to my teacher/farmer brother) has informed me her new parenting slogan is “There’s no such thing as bad weather, only bad clothes,” which she apparently got from this book of the same name. I like this theory. It goes nicely with my adulthood slogan of “There’s no such thing as strong coffee, only weak people.”

I hope to have a review of the paper up on Wednesday this week, stay tuned.

The Assistant Village Idiot linked to this article (via Lelia) about those with no visual memory. I’ve been pondering this as I’m pretty sure my visual memory has some gaps. I can’t read facial expressions at baseline, and one of my recurring stress nightmares is being handed documents/books that I recognize but whose text I can’t decipher. I feel something’s related here, but I have to reread the article before I comment further.

Also, I know I always chide people to read behind the headline, but this headline’s so good I’m pretty sure I’ll love it when I finally get to read it: 5 Sport Science Studies that “Failed”. The author specifically took note of studies he saw that asked interesting questions and got negative results. He wanted to talk about this to fight the impression that the only interesting findings were positive findings.

Work Hours and Denominators

I was talking to a few folks about work recently, and an interesting issue came up that I’d never particularly thought about before. I’ve mentioned in the past the power of denominators to radically change what we’re looking at, how averages might be lying to you, and how often people misreport “hours worked”….but I don’t know that I’d ever put all 3 together.

When answering the question “how many hours do you work per day”, most full time workers generally name a number between 8 and 10 hours a day. Taken literally though, the answer is really somewhere between 6 and 7 hours on average, as most people aren’t working on the weekends. So basically when asked “average hours per day” we assume “average hours per working day” and answer accordingly.

This is a minor thing, but it actually is part of why the actual “hours worked” numbers and the reported “hours worked” numbers don’t always add up. When people try to figure out how much the average American is working, they take things like vacation/holiday weeks into account. The rest of us don’t tend to do that, and instead report on how much we worked when we worked full weeks. A slight shift in denominator, but it can end up shifting the results by a decent amount.
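The denominator shift is easy to see with a quick back-of-the-envelope calculation. The vacation and holiday numbers below are illustrative assumptions, not survey figures:

```python
# Back-of-the-envelope version of the denominator shift: "hours per working day"
# vs "hours per calendar day". Vacation weeks and holidays are assumed values.

def avg_hours_per_day(hours_per_workday, workdays_per_week=5,
                      vacation_weeks=3, holidays=8):
    """Average hours worked per calendar day over a full year."""
    workdays = 52 * workdays_per_week - vacation_weeks * workdays_per_week - holidays
    return hours_per_workday * workdays / 365

# "I work 9 hours a day" turns into a much smaller per-calendar-day average:
print(round(avg_hours_per_day(9), 1))  # → 5.8
```

Same person, same job, but the answer swings by about a third depending on which denominator you quietly assumed.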

Not the most groundbreaking insight ever, but I always get a little interested when I realize we’ve all been assuming a denominator that’s not explicitly clarified.

I’m Thinking of a Word That Starts With a…..

I’ve mentioned before that I like to try to find unusual ways of teaching my 5 year old son statistical concepts by relating them to things he likes. This pretty much doesn’t work, but this week I tried it again and attempted to use a discussion about letters to segue into a discussion about perception vs data. He’s getting into some reading fundamentals now, and is incredibly curious about which words start with which letters. This leads to our new favorite game “Let’s talk about ____ words!” where we name a letter and then just think of as many words as we can that start with that letter.

This game is fun, but he’s a little annoyed at letters that make more than one sound. This week he got particularly irritated at the letter “c”, which he felt was hogging all the words while leaving “k” and “s” with none. I started trying to explain to him that “s” in particular was doing pretty alright for itself, but after discussing “cereal” and “circus” he was pretty convinced that “s” was in trouble.

As I was defending the English language’s treatment of the letter “s”, I started to wonder what the most common first letter of words actually was. I also wondered if it was different for “kids words” vs “all words”. After some poking around on the internet, I discovered that there’s a decent amount of variation depending on what word list you go with. I decided to take a look at three lists:

  1. All unique words appearing more than 100,000 times in all books on Google ngrams (Note: I had to go to the original file here. The list they provide on that site and the Wiki page is actually the most common first letters for all words used, not just unique words. That’s why “t” is the most common….it’s counting every instance of “the” separately)
  2. The 1,000 most commonly used English language words (of Up-Goer 5 fame)
  3. The Dolch sight words list, used to teach kids to read

Comparing the percent of words starting with each letter on each list got me this graph:

As I suspected, “s” does quite well for itself across the board, though it really shines in the “core words” list. “K” on the other hand is definitely being left out. It’s interesting to see what letters do well in bigger word sets (like c, p and m), and which ones are only in the smaller sets (b, t, o and w). “W” seems very popular for early reading lists because of words like “what”, “where”, “why”. “S” actually is really interesting, as it appears to kick off lots of common-but-not-basic words. My guess is this is because of its participation in letter combinations like “sh” and “sch”.
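The comparison behind that graph boils down to a simple tally. Here’s a minimal sketch; the sample list is a tiny stand-in, since the real comparison used the ngrams, Up-Goer 5, and Dolch lists:

```python
# A tiny stand-in for the first-letter tally; the real comparison used the
# Google ngrams, Up-Goer 5, and Dolch word lists.
from collections import Counter

def first_letter_pcts(words):
    """Percent of unique words starting with each letter, most common first."""
    unique = {w.lower() for w in words if w and w[0].isalpha()}
    counts = Counter(w[0] for w in unique)
    return {letter: 100 * n / len(unique) for letter, n in counts.most_common()}

sample = ["what", "where", "why", "spider", "super", "kite", "circus"]
print(first_letter_pcts(sample))
```

The deduplication step is the important one, per the note in #1: tally every occurrence instead of unique words and “t” wins on the strength of “the” alone.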

Anyway, my son didn’t really seem to grasp the “the plural of anecdote is not data” lesson, so I pointed out to him that both “Spiderman” and “superhero” started with “S”. At that point he agreed that yes, lots of words started with “s” and went back to feeling bad for “K”. At least that we can agree upon.

Now please enjoy my favorite Sesame Street alphabet song ever: ABCs of the Swamp

Cornell Food and Brand Lab: an Update

After mentioning the embattled Brian Wansink and the Cornell Food and Brand Lab in my post last week, a friend passed along the most recent update on this story. Interestingly it appears Buzzfeed is the news outlet doing the most reporting on this story as it continues to develop.

A quick review:
Brian Wansink is the head of the Cornell Food and Brand Lab, which publishes all sorts of interesting research about the psychology behind eating and how we process information on health. Even if you’ve never heard his name, you may have heard about his work….studies like “people eat more soup if their bowl refills so they don’t know how much they’ve eaten”  or “kids eat more vegetables when they have fun names” tend to be from his lab.

About a year ago, he published a blog post where he praised one of his grad students for taking a data set that didn’t really show much and turning it into 5 publishable papers. This turned into an enormous scandal, as many people quickly pointed out that a feat like that almost certainly involved lots of data tricks that would make the papers’ results very likely to be false. As the scrutiny went up, things got worse as people began poring over his previous work.

Not only did this throw Wansink’s work into question, but a lot of people (myself included) who had used his work in their own work now had to figure out whether or not to retract or update what they had written. Ugh.

So where are we now?
Well, as I mentioned, Buzzfeed has been making sure this doesn’t drop. In September, they reported that the aforementioned “veggie with fun names” study had a lot of problems. Worse yet, Wansink couldn’t produce the data when asked. What was incredibly concerning is that this particular paper is part of a program Wansink was piloting for school lunches. With his work under scrutiny, over $20 million in research and training grants may have gone towards strategies that may not actually be effective. To be clear, the “fun veggie name study” wasn’t the only part of this program, but it’s definitely not encouraging to find out that parts of it are so shaky.

To make things even worse, they are now reporting that several of his papers, allegedly done on three different topics in three different years and sent to three different sample populations, show the exact same number of survey respondents: 770. Those papers are being reviewed.

Finally, they report he has a 4th paper being retracted, this one on WWII veterans and cooking habits. An interview with the researcher who helped highlight the issues with the paper is up here at Retraction Watch, and some of the problems with the paper are pretty amazing. When asked where he first noted problems, he said: “First there is the claim that only 80% of people who saw heavy, repeated combat during WW2 were male.” Yeah, that seems a little off. Wansink has responded to the Buzzfeed report to say that this was due to a spreadsheet error.

Overall, the implications of this are going to be felt for a while. While only 4 papers have been retracted so far, Buzzfeed reports that 8 more have planned corrections, and over 50 are being looked at. With such a prolific lab and results that are used in so many places, this story could go on for years. I appreciate the journalists keeping up on this story as it’s an incredibly important cautionary tale for members of the scientific community and the public alike.

Food Insecurity: A Semester in Review

I mentioned a few months back that I was working on my capstone project for my degree this semester. I’ve mostly finished it up (just adjusting some formatting), so I thought it would be a good moment to post on my project and some of my findings. Since I have to present this all in a week or two, it’s a good moment to gather my thoughts as well.

Background:
The American Time Use Survey is a national survey carried out by the Bureau of Labor Statistics that surveys Americans about how they spend their time. From 2014-2016 they administered a survey module that asked specifically about health status and behaviors. They make the questionnaire and data files publicly available here.

What interested me about this data set is that they asked specifically about food insecurity….i.e. “Which of the following statements best describes the amount of food eaten in your household in the last 30 days – enough food to eat, sometimes not enough to eat, or often not enough to eat?” Based on that data, I was able to compare those who were food secure (those who said “I had enough food to eat”) vs the food insecure (those who said they “sometimes” or “frequently” did not have enough to eat).

This is an interesting comparison to make, because there’s some evidence that in the US these two groups don’t always look like what you’d expect. Previous work has found that people who report they are food insecure actually tend to weigh more than those who are food secure. I broke my research down into three categories:

  1. Confirmation of BMI differences
  2. Comparison of health habits between food secure and food insecure people
  3. Correlation of specific behaviors with BMI within the food insecure group

Here’s what I found:

Confirmation of BMI differences:
Yes, the paradox is true for this data set. Those who were “sometimes” or “frequently” food insecure were almost 2 BMI points heavier than those who were food secure…around 10-15 pounds for most height ranges. Level of food insecurity didn’t seem to matter, and the effect persisted even after controlling for public assistance and income.

Interestingly, my professor asked me if the BMI difference was due more to food insecure people being shorter (indicating a possible nutritional deficiency) or from being heavier, and it turns out it’s both. The food insecure group was about an inch shorter and 8 lbs heavier than the food secure group.

Differences in health behaviors or status:
Given my sample size (over 20,000), most of the questions they asked ended up having statistically significant differences. The ones that seemed to be both practically and statistically significant were:

  1. Health status People who were food insecure were WAY more likely to say they were in poor health. This isn’t terribly surprising since disability would impact people’s assessment of their health status and ability to work/earn a living.
  2. Shopping habits While most people from both groups did their grocery shopping at grocery stores, food insecure people were more likely to use other stores like “supercenters” (i.e. Walmart or Target) and convenience stores or “other” types of stores. Food secure people were more likely to use places like Costco or Sam’s Club. Unsurprisingly, people who were food insecure were much more likely to say they selected their stores based on the prices. My brother had asked specifically up front if “food deserts” were an issue, so I did note that the two groups answered “location” was a factor in their shopping at equal rates.
  3. Soda consumption Food insecure people were much more likely to have drunk soda in the last 7 days (50% vs 38%) and much less likely to say it was a diet soda (21.5% vs 40%) than the food secure group.
  4. Exercise Food insecure people were much less likely to have exercised in the last 7 days (50.5%) than food secure people were (63.9%). Given the health status ranking, this doesn’t seem surprising.
  5. Food shopping/preparation Food insecure people were much more likely to be the primary food shopper and preparer. This makes sense when you consider that food insecurity is a self reported metric. If you’re the one looking at the bills, you’re probably more likely to feel insecure than if you’re not. Other researchers have noted that many food stamp recipients will also cut their own intake to make sure their children have enough food.

Yes, I have confidence intervals for all of these, but I’m sparing you.
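For the curious, here’s the flavor of those intervals, using the soda numbers from above (50% vs 38%). The group sizes are invented for illustration; the real ones come from the ATUS sample:

```python
# Normal-approximation 95% CI for a difference in proportions, using the soda
# numbers above (50% vs 38%). Group sizes here are invented for illustration.
import math

def diff_prop_ci(p1, n1, p2, n2, z=1.96):
    """Confidence interval for p1 - p2 under the normal approximation."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se

low, high = diff_prop_ci(0.50, 2000, 0.38, 18000)
print(round(low, 3), round(high, 3))  # roughly 0.097 to 0.143
```

With samples this size, even a 12 point gap gets a pretty tight interval that stays well clear of zero, which is why so many of the comparisons came out statistically significant.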

BMI correlation within the food insecure group:
Taking just the group that said they were food insecure, I then took a look at which factors were most associated with higher BMIs. These were:

  1. Time spent eating Interestingly, increased time spent eating was actually associated with lower BMIs. This may indicate that people who can plan regular meal times might be healthier than those eating while doing other things (the survey asked about both).
  2. Drinking beverages other than water Those who regularly drank beverages other than water were heavier than those who didn’t.
  3. Lack of exercise No shock here
  4. Poor health The worse the self assessed health, the higher the BMI. It’s hard to tease out the correlation/causation here. Are people in bad health due to an obesity related illness (like diabetes) or are they obese because they have an issue that makes it hard for them to move (like a back injury)? Regardless, this correlation was QUITE strong: people in “excellent” health had BMIs almost 5 points lower than those in “poor” health.
  5. Being the primary shopper I’m not clear on why this association exists, but primary shoppers were 2 BMI points heavier than those who shared shopping duties.
  6. Public assistance  Those who were food insecure AND received public assistance were heavier than those who were just food insecure.

It should be noted that I did nothing to establish causality here; everything reported is just an association. Additionally, it’s interesting to note a few things that didn’t show up here: fast food consumption, shopping locations and snacking all didn’t make much of a difference.

While none of this is definitive, I thought it was an interesting exploration into the topic. I have like 30 pages of this stuff, so I can definitely clarify anything I didn’t go into. Now to put my presentation together and be done with this!