The Real Dunning-Kruger Graph

I’m off camping this weekend, so you’re getting a short but important PSA.

If you’ve hung out on the internet for any length of time or in circles that talk about psych/cognitive biases a lot, you’ve likely heard of the Dunning-Kruger effect. Defined by Wiki as “a cognitive bias wherein persons of low ability suffer from illusory superiority, mistakenly assessing their cognitive ability as greater than it is”, it’s often cited to explain why people who know very little about something get so confident in their ignorance.

Recently, I’ve seen a few references to it accompanied by a graph that looks like this (one example here):

While that chart is rather funny, you should keep in mind it doesn’t really reflect the graphs Dunning and Kruger actually obtained in their study. There were 4 graphs in that study (each one from a slightly different version of the study) and they looked like this:


Logic and reasoning (first of two):


And one more logic and reasoning (performed under different conditions):

So based on the actual graphs, Dunning and Kruger did not find that the lowest quartile thought they did better than the highest quartile; they found that the lowest quartile just thought they were more average than they actually were. Additionally, it appears the 3rd quartile (above average but not quite the top) is the group most likely to be clear-sighted about their own performance.

Also, in terms of generalizability, it should be noted that the participants in this study were all Cornell undergrads being ranked against each other. Those bottom quartile kids for the grammar graph are almost certainly not bottom quartile in comparison to the general population, so their overconfidence likely has at least some basis. It’s a little like if I asked readers of this blog to rank their math skills against other readers of this blog…even the bottom of the pack is probably above average. When you’re in a self-selected group like that, your ranking mistakes may be due more to misjudging those around you than to simple overconfidence in yourself.
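To put rough numbers on that intuition, here is a minimal Python sketch. Every figure in it is invented for illustration (the two normal distributions, their means and their spreads); none of it comes from the Dunning and Kruger data. It simply asks where the bottom-quartile cutoff of a high-scoring, self-selected group would land in the general population.

```python
from statistics import NormalDist

# Hypothetical numbers, not taken from the study: a general population and a
# self-selected group whose scores sit well above the general average.
general = NormalDist(mu=100, sigma=15)
selected = NormalDist(mu=120, sigma=10)

# Score that marks the top of the bottom quartile within the selected group.
bottom_quartile_cutoff = selected.inv_cdf(0.25)

# Where that same score falls in the general population (0 to 1).
general_percentile = general.cdf(bottom_quartile_cutoff)

print(f"Cutoff score: {bottom_quartile_cutoff:.1f}")
print(f"General-population percentile: {general_percentile:.0%}")
```

With these assumed numbers the cutoff lands well above the general-population median, which is the sense in which a "bottom quartile" student in a select group can still have some basis for thinking they are above average.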

I don’t mean to suggest the phenomenon isn’t real (follow-up studies suggest it is), but it’s worth keeping in mind that the effect is more “subpar people thinking they’re middle of the pack” than “ignorant people thinking they’re experts”. For more interesting analysis, see here, and remember that graphs drawn in MS Paint rarely reflect actual published work.


5 Things You Should Know About Orchestras and Blind Auditions

Unless you were going completely off the grid this week, you probably heard about the now-infamous “Google memo”. Written by a (since fired) 28-year-old software engineer at Google, the memo is a ten-page document where the author lays out his beliefs about why gender gaps in tech fields continue to exist. While the author did not succeed in getting any policies at Google changed, he did manage to kick off an avalanche of hot takes examining whether the gender/tech gap is due to nature (population level differences in interests/aptitude) or nurture (embedded social structures that make women unwelcome in certain spaces). I have no particular interest in adding another take to the pile, but I did see a few references to the “blind orchestra auditions study” that reminded me I had been wanting to write about that one for a while, to dive deep into a few things it did or did not say.

For those of you who don’t know what I’m talking about, here’s the rundown: back in the 1970s, top orchestras in the US were 5% female. By the year 2000, they were up to almost 30% female. Part of the reason for the change was the introduction of “blind auditions”, where the people who were holding tryouts couldn’t see the identity of the person trying out. This finding normally gets presented without a lot of context, but it’s good to note someone actually did decide to study this phenomenon to see if the two things really were related or not. They got their hands on all of the tryout data for quite a few major orchestras (they declined to name which ones, as it was part of the agreement of getting the data) and tracked what happened to individual musicians as they tried out. This led to a data set that had overall population trends, but also could be used to track individuals. You can download the study here, but these are my highlights:

  1. Orchestras are a good place to measure changing gender proportions, because orchestra jobs don’t change. Okay, first up is an interesting “control your variables” moment. One of the things I didn’t realize about orchestras (though maybe should have) is that the top ones have not changed in size or composition in years. So basically, if you suddenly are seeing more women, you know it’s because the proportion of women overall is increasing across many instruments. In the words of the authors, “An increase in the number of women from, say, 1 to 10, cannot arise because the number of harpists (a female-dominated instrument), has greatly expanded. It must be because the proportion female within many groups has increased.”
  2. Blind auditions weren’t necessarily implemented to cut down on sexism. Since this study is so often cited in the context of sexism and bias, I had not actually ever read why blind auditions were implemented in the first place. Interestingly, according to the paper written about it, the actual initial concern was nepotism. Basically, orchestras were filled with their conductors’ students, and other potentially better players were shut out. When they opened the auditions up further, they discovered that when people could see who was auditioning, they still showed preferential treatment based on resume. This is when they decided to blind the audition, to make sure that all preconceived notions were controlled for. The study authors chose to focus on the impact this had on women. In their words: “Because we are able to identify sex, but no other characteristics for a large sample, we focus on the impact of the screen on the employment of women.”
  3. Blinding can help women out. Okay, so first up, the most often reported findings: blind auditions appear to account for about 25% of the increase in women in major orchestras. When they studied individual musicians, they found that women who tried out in blind and non-blind auditions were more successful in the blinded auditions. They also found that having a blind final round increased the chances a woman was picked by about 33%. This is what normally gets reported, and it is a correct reporting of the findings.
  4. Blinding doesn’t always help women out. One of the more interesting findings of the study that I have not often seen reported: overall, women did worse in the blinded auditions. As I mentioned up front, the study authors had the data for groups and for individuals, and the findings from #3 were pulled from the individual data. When you look at the group data, you actually see the opposite effect. The study authors suggest one possible explanation for this: adopting a “blind” process dropped the quality of the female candidates. This makes a certain amount of sense. If you sense you are a borderline candidate, but also think there may be some bias against you, you would be more likely to put your time into an audition where you knew the bias factor would be taken out. Still, that result interested me.
  5. The effects of blinding can depend on the point in the process. Even after controlling for all sorts of factors, the study authors did find that bias was not equally present in all moments. For example, they found that blind auditions seemed to help women most in preliminary and final rounds, but it actually hurt them in the semi-final rounds. This would make a certain amount of sense…presumably people doing the judging may be using different criteria in each round, and some of those may be biased in different ways than others. Assuming that all parts of the process work the same way is probably a bad assumption to make.
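The gap between the individual data (#3) and the group data (#4) is easier to see with a toy example. The sketch below uses entirely hypothetical success rates and applicant counts, not figures from the paper. It assumes, as the authors’ suggested explanation does, that blind auditions draw in more borderline candidates, and it shows how every type of candidate can do better behind the screen while the pooled success rate still drops.

```python
# Hypothetical per-audition success rates, invented for illustration.
# Every type of candidate does better behind the screen.
success_rate = {
    "strong":     {"non_blind": 0.10, "blind": 0.12},
    "borderline": {"non_blind": 0.02, "blind": 0.03},
}

# Hypothetical applicant pools: more borderline candidates try out when
# the audition is blind.
pool = {
    "non_blind": {"strong": 80, "borderline": 20},
    "blind":     {"strong": 40, "borderline": 60},
}


def pooled_rate(audition_type):
    """Overall success rate across everyone who tried out."""
    applicants = pool[audition_type]
    expected_hires = sum(
        count * success_rate[kind][audition_type]
        for kind, count in applicants.items()
    )
    return expected_hires / sum(applicants.values())


for audition_type in ("non_blind", "blind"):
    print(audition_type, round(pooled_rate(audition_type), 3))
```

In this made-up setup the pooled rate is lower for blind auditions even though no individual candidate is worse off, purely because the mix of who shows up has changed. That is one way the two halves of the study’s findings can both be true at once.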

Overall, while the study is potentially outdated (from 2001…using data from 1950s-1990s), I do think it’s an interesting frame of reference for some of our current debates. One article I read about it talked about the benefit of industries figuring out how to blind parts of their interview process because it gets them to consider all sorts of different people….including those lacking traditional educational requirements. With many industries dominated by those who went to exclusive schools, hiding identity could have some unexpected benefits for all sorts of people. However, as this study also shows, it’s probably a good idea to keep the limitations of this sort of blinding in mind. Even established bias is not a consistent force that produces identical outcomes at all time points, and any measure you institute can quickly become a target that changes behavior.  Regardless, I think blinding is a good thing. All of us have our own pitfalls, and we all might be a little better off if we see our expectations toppled occasionally.

5 Nutrition and/or Food Research Blogs I Like to Read

I’ve written a bit here over the years about nutrition research and my own general journey with weight management, but I realized I’ve only really referred in passing to the people who I read when I want to catch up on the field. I figured this was a pretty good time to do a post on that.

  1. For all things vegan: Anyone who followed my old old blog knows that I actually spent several years as a vegan. I eventually gave it up, but I still like to read up on what’s going on in the world of plant based nutrition. Ginny Messina (aka the Vegan RD) is a registered dietitian who is a vegan primarily for ethical reasons. As such, she uses her dietitian training to help vegans be as healthy as possible, while also helping lead the charge for vegans to be more evidence based when they stray out of ethics and into nutrition claims. She writes critiques of other vegans’ work if she feels they overstate the evidence, and she even coauthored a book called “Even Vegans Die”. Overall a pretty awesome example of someone who advocates for a particular diet while also adhering to evidence.
  2. For the ancestral health crowd: If you’re paleo or just interested in how our evolutionary history influences how we think about food, Stephan Guyenet is a must read. A neuroscientist who specializes in obesity related research, his research focus is on why we overeat and what we can do about it. His book The Hungry Brain is one of the most well balanced science based nutrition books I’ve ever read, and has received quite a bit of praise for being honest and evidence based.
  3. For deep dives in to the science: There are not many bloggers that I read that make me go “holy crap did this person dig deep in to this paper”, but CarbSane is one blogger who gets that reaction from me on nearly every post. She doesn’t just read papers and give you the gist, she posts tables, cites other literature, and is basically a blog equivalent of a nutritional biochemistry class. She is probably the person most responsible for making me aware of the problem of misreprecitation in nutrition science, because she has the patience, knowledge and wherewithal to figure out exactly what commonly cited papers do and do not say. Oh, and she’s lost over 100 lbs too, so she actually has a good eye for what is and isn’t useful for real people to know. For a taste of what she does, try her posts on the “Biggest Loser Regain Study” that made headlines.
  4. For weight management and health policy: There’s really a bit of a tie here, as I really like both Yoni Freedhoff’s Weighty Matters blog and Darya Rose’s Summer Tomato for this topic.  Freedhoff is a Canadian MD who runs a weight loss center, and he blogs from the medical/health policy perspective. His book “The Diet Fix” covers evidence based ways of making any diet more effective, and he encourages people to take the approach (vegan, low carb, paleo, etc etc) that they enjoy the most. Darya Rose is a neuroscientist who also gives advice about how to make your “healthstyle” more practical and easier to stick to,  and her book “The Foodist” is on the same topic. I like them because they both continuously emphasize that anything too difficult or complicated is ultimately going to be tough to maintain. It’s all about making things easier on yourself.
  5. For those in the greater New Hampshire area: Okay, this one’s pretty region specific, but the Co-op News blog from the Hanover Co-op has a special place in my heart. An online version of a newsletter that’s been going since 1936, it features frequent posts from my (dietitian) cousin and my (incredible chef) uncle. It’s a great place to check out if you need advice on anything from using up summer vegetables to figuring out if macaroni and cheese is okay to eat. It also serves to remind me that I should invite myself over to their house more often. That food looks awesome.

Bonus round: if you’re looking for some one-off reads, this week I read this takedown of the science in the vegan documentary “What the Health” and enjoyed it. I also liked this paper that reviewed the (now infamous) Ancel Keys “7 Countries Study” and shed some light on what the original study did and did not say.

Of course if you have a favorite resource, I’d love to hear it!

Follow Up Gazette – the Science Section

James over at “I Don’t Know, But” had a brilliant idea this week for a journal called “The Follow Up Gazette” (motto: all the things we found out later), that would re-report the news after all the facts were in. His examples were mostly local news, but I would like to throw my hat in the ring to be the editor of the science section. James is of course fully capable of this job himself, but DAMN do I want to do something like this. Let me help James, please.

I’ve been  thinking a lot about this topic, as I had yet another run in with a TED talk recently. We got a question at work from a prospective bone marrow donor asking if we were using a particular collection device in our harvests. None of us had ever even heard of this device, and we were all rather confused where she had gotten her information. A quick google search gave us the answer: there is a TED talk from 8 years ago explaining the device and promising to revolutionize the way marrow harvests are done. Investigating further, we discovered that while this device had gained FDA approval for use in humans,  we couldn’t find any research in humans proving its efficacy, or really any mention of it in the literature past 2009. It’s clear something didn’t go quite as planned, though I’ll be damned if I can find a publicly available record of what. Calling around various people in the field confirmed that no one was using it and that it was not being actively marketed, but we found very few details as to why.

This got me thinking: how would the content of TED talks change if everyone who gave one was required to give an update 5 or 10 years later?

This may seem like a minor point, but I do think it skews our view of science and development of products to hear all of the hype and none of the follow up. Seeing the headline “Brand new drug promises 5 years of extra life for people with horrible disease” juxtaposed with “Actually in practice it was only like 3 months” might help temper our expectations. Alternatively, it may reveal that some things turned out to be better/safer/whatever than originally thought. My mother recently mentioned that she saw a beautiful house built under power lines, and it hit her that she hadn’t heard a “living under power lines is unhealthy” reference in years. She mentioned that she assumed that meant that the evidence had shown otherwise, and indeed it has. The Follow-Up Gazette science section would address both sides of this coin, the over hype and the fear mongering. Ideally this would not only educate people in how to consume media, but also encourage media to be slightly more circumspect in their reporting.

James: consider this my application, and thank you in advance for your consideration.



A few weeks ago, I wrote a post about a phenomenon I had started seeing that I ended up dubbing premature expostulation. I defined this phenomenon as “The act of claiming definitively that a person, group or media outlet has not reported on, responded to or commented on an event or topic, without first establishing whether or not this is true.” Since writing that post, I have been seeing mention of a related phenomenon that I felt was distinct enough to merit its own term. In this version, you actually have checked to see what various sources say, enough that you cite them directly, but you misrepresent what they actually say anyway. More formally, we have:

Misreprecitation: The act of directly citing a piece of work  to support your argument, when even a cursory reading of the original work shows it does not actually support your argument.

Now this does not necessarily have to be done with nefarious motives, but it is hard to think of a scenario in which this isn’t incredibly sketchy. Where premature expostulation is mostly due to knee-jerk reactions, vagueness and a failure to do basic fact checking, misreprecitation requires a bit more thought and planning. In some cases it appears to be a pretty direct attempt to mislead, in others it may be due to copying someone else’s interpretation without checking it out yourself, but it’s never good for your argument.

Need some examples? Let’s go!

The example that actually made me think of this was the recent kerfuffle over Nancy MacLean’s book “Democracy in Chains”. Initially met with praise as a leftist takedown of right wing economic thought, the book quickly got embroiled in controversy when (as far as I can tell) actual right wing thinkers started reading it. At that point several of them who were familiar with the source material noted that quotes were chopped up in ways that dramatically changed the meaning, along with other contextual problems. You can read a pretty comprehensive list of issues here, an overview of the problems and links to all the various responses here, and Vox’s (none too flattering) take here. None of it makes MacLean look particularly good, most specifically because this was supposed to be a scholarly work. When your citations are your strong point, your citations better be correct.

I’ve also seen this happen quite a bit with books that endorse popular diets. CarbSane put together a list of issues in the citations of the low carb book “Big Fat Surprise”, and others have found issues with vegan promoting books. While some of these seem to be differences in interpretation of evidence, some are a little sketchier. Now, as with premature expostulation, some of these issues don’t change the fundamental point…but some do. Overall a citation avalanche is no good if it turns out you had to tweak the truth to get there.

I think there are three things that create a particularly fertile breeding ground for misreprecitation: 1) an audience who is sympathetic to your conclusions, 2) an audience who is unlikely to be familiar with the source documents, and 3) difficulty accessing source documents. That last point may be why books are particularly prone to this error, since you’d have to actually put the book down and go look up a reference. This also may be a case where blogs have the accuracy advantage due to being so public. I know plenty of people who read blogs they don’t agree with, but I know fewer who would buy a whole book dedicated to discrediting their ideas. That increases the chances that no critical person will read your book, gives those who do read it less recourse (notes in the margin aren’t as good as a comments section), and makes it harder for anyone to fact check. Not saying bloggers can’t do it, just thinking they’d be called on it faster.

Overall it’s a pretty ridiculous little trick, as the entire point of citing others’ work should be to strengthen your argument. In the best case scenario, people could be confused because they misread or failed to understand the work, or copied an interpretation of it they read someone else make. In the worst case scenario, they know what they are doing and are counting on their in-group not actually checking their work. Regardless, it needed a name, and now it has one.

Short Takes: Gerrymandering, Effect Sizes, Race Times and More

I seem to have a lot of articles piling up that I have something to say about, but not enough for a full post. Here are 4 short takes on 4 current items:

Did You Hear the One About the Hungry Judges?
The AVI sent me an article this week about a hungry judge study I’ve heard referenced multiple times in the context of willpower and food articles. Basically, the study shows that judges rule in favor of prisoners requesting parole 65% of the time at the beginning of the day and 0% of the time right before lunch. The common interpretation is that we are so driven by biological forces that we override our higher order functioning when they’re compromised. The article rounds up some of the criticisms of the paper, and makes a few of its own…namely that an effect size that large could never have gone unnoticed. It’s another good example of “this psychological effect is so subtle we needed research to tease it out, but so large that it noticeably impacts everything we do” type research, and that should always raise an eyebrow. Statistically, the difference in rulings is as profound as the difference between male and female height. The point is, everyone would know this already if it were true. So what happened here? Well, this PNAS paper covers it nicely, but here’s the short version: 1) the study was done in Israel, 2) this court does parole hearings by prison, 3 prisons a day with a break in between each, 3) prisoners who have legal counsel go first, 4) lawyers often represent multiple people, and they choose the order of their own cases, and 5) the original authors lumped “case deferred” and “parole denied” together as one category. So basically the cases are roughly ordered from best to worst up front, and each break starts the process over again. Kinda makes the results look a little less impressive, huh?
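For anyone who wants to see how ordering alone can produce that pattern, here is a rough Python simulation. It is a sketch under made-up assumptions, not a reconstruction of the study or of the PNAS critique: the session length, number of sessions and approval threshold are all hypothetical, and the docket is sorted strictly from strongest case to weakest, which is a simplification of “roughly ordered”. The simulated judge never gets hungry; the ruling depends only on the strength of the case.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

CASES_PER_SESSION = 9   # hypothetical session length
SESSIONS = 2000         # hypothetical number of sessions
THRESHOLD = 0.6         # judge grants parole when case strength exceeds this


def simulate():
    """Return approval rates by docket position for a judge with no fatigue."""
    approvals = [0] * CASES_PER_SESSION
    for _ in range(SESSIONS):
        # Case strength is random, but the docket runs strongest first.
        docket = sorted(
            (random.random() for _ in range(CASES_PER_SESSION)), reverse=True
        )
        for position, strength in enumerate(docket):
            # The decision depends only on the case, never on the position.
            if strength > THRESHOLD:
                approvals[position] += 1
    return [count / SESSIONS for count in approvals]


rates = simulate()
print("first case of a session:", rates[0])
print("last case of a session: ", rates[-1])
```

Even though position never enters the decision, the approval rate falls steadily from the first slot to the last and resets with each new session, which looks a lot like a judge getting hungrier as the break approaches.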

On Inter-Country Generalization and Street Harassment
I can’t remember who it was, but I saw someone recently suggest that biology or nutrition papers in PubMed or other journal listings should have to include a little icon/picture at the top that indicated what animal the study was done on. They were attempting to combat the whole “Chemical X causes cancer!” hoopla that arises when we’re overdosing mice on something. I would like to suggest we actually do the same thing with countries, maybe use their flags or something. Much like with the study above, I think it’s worth tipping people off that we can’t assume things work the same way they work in the US or whatever country you hail from. I was thinking about that when I saw this article from Slate with the headline “Do Women Like Being Sexually Harassed? Men in a New Survey Say Yes”. The survey has some disturbing statistics about how often men admit to harassing or groping women on the street (31-64%) and why they do it (90% say “it’s fun”), but it’s important to note it surveyed men exclusively in the Middle East and Northern Africa. Among the 4 countries, results and attitudes varied quite a bit, making it pretty certain that there’s a lot of cultural variability at play here. While I thought the neutral headline was a little misleading on this point, the author gets some points for illustrating the story with signs (in Arabic) from a street harassment protest in Cairo. I only hope other stories reporting surveys from other countries do the same.

Gerrymandering Update: Independent Commissions May Not be That Great (or Computer Models Need More Validating)
In my last post about gerrymandering, I mentioned that some computer models showed that independent commissions did a much better job of redrawing districts than state legislatures did. Yet another computer model is disputing this idea, showing that they don’t. To be honest I didn’t read the working paper here and I’m a little unclear on what they compared to what, but it may lend credibility to the Assistant Village Idiot’s comment that those drawing district maps may be grouping together similar types of people rather than focusing on political party. That’s the sort of thing that humans of all sorts would do naturally and computers would call biased. Clearly we need a few more checks here.

Runner Update: They’re still slow and my treadmill is wrong
As an update to my marathon times post, I recently got sent this website’s report that showed that US runners for all distances are getting slower. They sliced and diced the data a bit and found some interesting patterns: men are slowing down more than women and slower runners are getting even slower. However, even the fastest runners have slowed down about 10% in the last two decades. They pose a few possible reasons: increased obesity in the general population, elite runners avoiding races due to the large numbers of slower runners, or in general leaving to do ultras/trail races/other activities. On an only tangentially related plus side, I thought I was seriously slowing down in my running until I discovered that my treadmill was incorrectly calibrated to the tune of over 2 min/mile. Yay for data errors in the right direction.



A Pie Chart Smorgasbord

This past week I was complaining about pie charts to a friend of mine, and I was trying to locate this image to show what I was complaining about:


I knew I had come across this on Twitter, and in finding the original thread, I ALSO discovered all sorts of people defending/belittling the lowly pie chart. Now I generally fall in the anti-pie chart camp, but these made me happy. I sourced what I could find a source on, but will update if anyone knows who I should credit.

First, we have the first and best use for a pie chart:

No other chart represents that data set quite as well.

Sometimes though, you feel like people are just using them to mess with you:


Sometimes the information they convey can be surprising:

But sometimes the conclusions are just kind of obvious:

And you have to know how to use them correctly:

They’re not all useless; there are some dramatic exceptions:

If you want more on pie charts, try these 16, uh, creative combinations, or read why they’re just the worst here.