# 5 Ways that Average Might Be Lying to You

One of the very first lessons every statistics student learns in class is how to use measures of central tendency to assess data. While in theory this means most people should have at least a passing familiarity with the terms “average” or “mean, median and mode”, the reality is often quite different. For whatever reason, when presented with a statement about an average, we seem to forget the profound vulnerabilities of the “average”. Here are some of the most common:

1. Leaving a relevant confounder out of your calculations. Okay, so maybe we can never get rid of all the confounders we should, but that doesn’t mean we can’t try at least a little. The most commonly quoted statistic I hear that leaves out relevant confounders is the “Women make 77 cents for every dollar a man earns” claim. Now this is a true statement IF you are comparing all men in the US to all women in the US, but it gets more complicated if you want to compare male/female pay by hours worked or within occupations. Of course “occupation and hours worked” are two things most people actually tend to assume are included in the original statistic, but they are not. The whole calculation can get really tricky (Politifact has a good breakdown here), but I have heard MANY people tag “for the exact same work” onto that sentence without missing a beat. Again, it’s not possible to control for every confounder, but your first thought when you hear a comparison of averages should be to make sure your assumptions about the conditions are accurate.
2. A subset of the population could be influencing the value of the whole population. Most people are at least somewhat familiar with the idea of outlier type values and “if Bill Gates walks into a bar, the average income goes way up” type issues. What we less often consider is how different groups being included/excluded from a calculation can influence things. For example, in the US we are legally required to educate all children through high school. The US often does not do well when it comes to international testing results. However, in this review by the Economic Policy Institute, they note that in some of the countries (Germany and Poland for example) certain students are assigned to a “vocational track” quite early and may not end up getting tested at all. Since those children likely got put on that track because they weren’t good test takers, the average scores go up simply by removing the lowest performers. We saw a similar phenomenon within the US when more kids started taking the SATs. While previous generations bemoaned the lower SAT scores of “kids these days”, the truth was those were being influenced by expanding the pool of test takers to include a broader range of students. Is that the whole explanation? Maybe not, but it’s worth keeping in mind.
3. The values could be bimodal (or another non-standard distribution). One of my first survey consulting gigs consisted of taking a look at some conference attendee survey data to try and figure out what the most popular sessions/speakers were. One of the conference organizers asked me if he could just get a list of the sessions with the highest average ranking. That sounded reasonable, but I wasn’t sure that was what they really wanted. You see, this organization actually kind of prided itself on challenging people and could be a little controversial. I was fairly sure that they’d feel very differently about a session that had been ranked mostly 1’s and 10’s, as opposed to a session that had gotten all 5’s and 6’s. To distill the data to a simple average would be to lose a tremendous amount of information about the actual distribution of the ratings. It’s like asking how tall the average human is…..you get some information, but lose a lot in the process. Neither the mean nor the median accounts for this.
4. The standard deviations could be different. Look, I get why people don’t always report on standard deviations….the phrase itself probably causes you to lose at least 10% of readers automatically. However, just because two data sets have the same average doesn’t mean the members of those groups look the same. In #3 I was referring to those groups that have two distinct peaks on either side of the average, but even less dramatic spreads can cause the reality to look very different than the average suggests.
5. It could be statistically significant but not practically significant. This one comes up all the time when people report research findings. You find that one group does “more” of something than another. Group A is happier than Group B.  When you read these, it’s important to remember that given a sample size large enough ANY difference can become statistically significant. A good hint this may be an issue is when people don’t tell you the effect size up front. For example, in this widely reported study it was shown that men with attractive wives are more satisfied with their marriages in the first 4 years. The study absolutely found a correlation between attractiveness of the wife and the husband’s marital satisfaction….a gain of .36 in satisfaction (out of a possible 45 points) for every 1 point increase in attractiveness (on a scale of 1 to 10). That’s an interesting academic finding, but probably not something you want to knock yourself out worrying about.
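
To put some rough numbers on point 5, here is a minimal sketch of how sample size alone can push a tiny gap into “statistically significant” territory. Everything in it is an assumption for illustration: it treats a 0.36-point gap on a 45-point scale as a simple difference between two group means (which is not how the study above calculated its number), assumes a standard deviation of 5, and uses a plain two-sample z-test. The function name `two_sample_z_pvalue` is made up for this example.

```python
import math
from statistics import NormalDist


def two_sample_z_pvalue(diff, sd, n_per_group):
    """Two-sided p-value for a difference in means between two equal-sized groups.

    Simplifying assumption: both groups share the same known standard deviation.
    """
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical inputs: a 0.36-point gap on a 45-point scale, assumed sd of 5.
DIFF, SD, SCALE_MAX = 0.36, 5.0, 45

for n in (50, 500, 5000):
    p = two_sample_z_pvalue(DIFF, SD, n)
    print(f"n per group = {n:>4}: p = {p:.4f} (gap is {DIFF / SCALE_MAX:.1%} of the scale)")
```

The gap never changes, only the number of people measured. Under these assumptions it is nowhere near significant with 50 people per group and comfortably significant with 5,000, which is why the effect size is the number to ask for.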

Beware the average.
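
If you want to see points 3 and 4 in action, here is a small sketch using made-up session ratings (these are not the real conference data). One hypothetical session gets only 1’s and 10’s, the other only 5’s and 6’s.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical ratings on a 1-10 scale, invented purely for illustration.
polarizing = [1, 1, 1, 1, 1, 10, 10, 10, 10, 10]
lukewarm = [5, 5, 5, 5, 5, 6, 6, 6, 6, 6]

for name, ratings in (("polarizing", polarizing), ("lukewarm", lukewarm)):
    counts = sorted(Counter(ratings).items())  # (rating, how many people gave it)
    print(
        f"{name:>10}: mean={mean(ratings)}, median={median(ratings)}, "
        f"sd={pstdev(ratings):.1f}, counts={counts}"
    )
```

Both sessions come out with the same mean and the same median, so a “highest average rating” list could not tell them apart. The standard deviation and the raw counts are what show that one room was split and the other was merely content.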

# 5 Ways to Statistically Analyze Your Fantasy Football League

For the past few years I’ve been playing in a fantasy football league with a few folks I grew up with. One of the highlights of the league is the weekly recap/power rankings sent out by our league commissioner. Recently I had to fill in for him, and it got me thinking about how to use various statistical analysis methods to figure out who the best team was overall and who was doing better as the season progressed. I figured since I put the work in, I might as well put a post together going over what I did. Also, I’m completely tanking this year, so this gives me something a little more fun to focus on¹. Our league is a ten team, head to head match-up, PPR league, for what it’s worth.

1. Mean Comparison Using Tukey’s Method: The first and most obvious question I had when looking at the numbers was who was really better than who to a statistically significant level? ESPN provides a good running total of points scored by each team, but I was curious at what level those differences were statistically significant. The Tukey method lets you calculate a number that shows how far apart average scores have to be before the difference is significant. I had Minitab help me out and got that 36 points was the critical difference in our league at this point in the season. It also gave me this nifty table, with my score and feelings in red: So really there are three distinct groups, each denoted with a different letter. Kyle is showing a bit of spunk here though and rising a bit above the rest, while Heidi is drifting towards the bottom. I also did this analysis using the scores of each person’s opponents, and despite the perception that some people have gotten lucky, none of the means were significantly different when it came to opponent score.
2. Box Plot or Box and Whisker Diagram: So if the mean comparison gives us a broad picture of the entire season’s performance, with many teams clumped together, how do we tease out some further detail? I decided to use a box plot, because using quartiles and medians rather than averages helps account for fluky games. As anyone who has ever played fantasy sports knows, even the worst team can have a player explode one week…or have normally good players tank completely. Showing the median performance is more informative of how the player is doing week to week, and how likely they are to outscore opponents. Since I did this at week 11, the box represents about 6 games, and each tail represents about 3.

The worst part about this graph is it called my best game an outlier.  Why you gotta be so negative there box plot? What did I ever do to you?

This shows a few interesting things, namely that three players in our league (Ryan, David and JA) have nearly the same median but are having wildly different seasons. It also is one of the clearest ways of putting all the data on one graph. I tried a histogram, and boy did that get messy with 10 different people to keep track of.
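
For anyone curious what the box plot is doing under the hood, here is a rough sketch of the quartile math and the usual 1.5 × IQR outlier rule. The eleven weekly scores are invented for illustration, the function name `box_plot_summary` is hypothetical, and different software packages use slightly different quartile conventions, so the exact cutoffs may not match what your tool reports.

```python
from statistics import quantiles


def box_plot_summary(scores):
    """Return the quartiles, fences, and flagged outliers behind a box plot."""
    q1, med, q3 = quantiles(scores, n=4)  # default "exclusive" quartile method
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [s for s in scores if s < low_fence or s > high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "low_fence": low_fence,
        "high_fence": high_fence,
        "outliers": outliers,
    }


# Hypothetical scores for one team across 11 weeks, with one monster game.
week_scores = [98, 105, 110, 112, 118, 121, 124, 127, 131, 135, 190]

summary = box_plot_summary(week_scores)
for key, value in summary.items():
    print(f"{key:>10}: {value}")
```

Notice that the one huge game gets flagged as an outlier while the median stays put, which is exactly why the box plot is less impressed by fluky weeks than the average is.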

3. Regression Lines/Line of Best Fit: Okay, so now that we have a good picture of the season, let’s see some trends! Because of course fantasy football, like all sports, cares a lot more about where you end than where you start. Players get injured, people have weak benches, people come back from suspensions, etc etc. By fitting a regression line we can see where everyone started and where they’re headed: Now this shows us some interesting patterns. I checked the significance levels on these, and 7 of them actually had significant patterns (my scores, David and Jonathan’s were not significant at the .05 level). This is how I ultimately determined the rankings I sent out. Amusingly, one of our most all over the place players didn’t actually get a linear relationship as the best fitting model. I ignored that, but it made me laugh.
4. Games over League Median (GOLM): This is one I’m working on just for giggles. Basically it’s the number of games each player has played where they scored over the median number of points our league scores. For example, out of the 110 individual performances so far in our league this year, the median score is 133.² I then calculated the percentage of games each team scored above that number. I was hoping to figure out something a little more accurate than just wins and losses, because of course it doesn’t matter what the league scores…only what your opponent scores. Here’s what I got: I added a line that I will dub “the line of fairness”. Basically, this is where everyone should be based on their scores. If you’re above the line, you’ve actually had a lucky season with more wins than scores over the median. If you’re below the line, you’ve had an unlucky season. On the line is a perfectly fair season. The further away from the line, the more out of range your season has been.
5. Normal Distribution Comparisons: This one isn’t for the overall league, but does give you a good picture of your weekly competition. I wasn’t actually sure I could do this one because I wasn’t sure my data was normally distributed, but Ryan-Joiner assured me that was an okay assumption to make in this case. Basically, I wanted to see what my chances were of beating my opponent (Ryan) this week. I wasn’t expecting much, and I didn’t get it:  I did the math to figure out my exact chances, but gave up when it got too depressing. Let’s just say my chances are rather, um, slim. Svelte even. Sigh.
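
The math behind that last comparison can be sketched in a few lines. This is only one way to do it, not necessarily what Minitab does. It assumes each team’s weekly scores are roughly normal and independent of each other, in which case the margin between the two teams is also normal and the chance of winning is the chance that the margin is above zero. The scores below are made up, and `win_probability` is a hypothetical helper name.

```python
from statistics import NormalDist


def win_probability(my_scores, opponent_scores):
    """Estimate P(my score > opponent's score), assuming independent normal scores."""
    mine = NormalDist.from_samples(my_scores)
    theirs = NormalDist.from_samples(opponent_scores)
    margin = mine - theirs  # difference of two independent normal variables
    return 1 - margin.cdf(0)


# Hypothetical weekly scores, invented for illustration only.
my_scores = [101, 118, 95, 124, 110, 99, 131, 105, 112, 97, 108]
ryan_scores = [142, 155, 138, 161, 149, 133, 158, 146, 151, 140, 153]

print(f"Chance of beating Ryan this week: {win_probability(my_scores, ryan_scores):.1%}")
```

With only eleven games per team the fitted means and standard deviations are pretty noisy, so treat the result as a ballpark figure and not a betting line.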

So that’s that! Got any interesting ways of looking at small sample sizes like this? Let me know! I’ll need something to keep me entertained during the games tomorrow, as I certainly won’t be enjoying watching my team.

1. I renamed my team the Sad Pandas. That’s how bad it is. I grabbed Peyton with my first pick and everything has been downhill from there.
2. I also checked the medians for each week, then took the median of that to see if there was a significant difference on a week to week basis. That number was 135, so I didn’t worry about it.

# Political ages…mean vs median?

I just found out The Economist has a daily chart feature!

Today’s graph about age of population vs age of cabinet ministers is pretty fascinating:

It did leave me with a few questions though…..who did they count as cabinet ministers? I don’t know enough about the governments in these countries to know what that equates to. Also, why average vs median?

I initially thought this chart might have been representing Congress, not the Cabinet. I took a look at my old friend the Congressional Research Service Report and discovered that at the beginning of the 112th Congress in 2011, the average age was 57.7 years, which would make this chart about right. I had to dig a bit further to get the ages of the Cabinet, but it turns out their average age is 59.75. I was surprised the data points would be so close together actually….especially since that 57.7 was for Jan 2011, so it’s actually 59.2 or so now.

In case you’re curious, 7 members of the cabinet are under 60. The youngest is Shaun Donovan (46), Department of Housing and Urban Development. The oldest is Leon Panetta (74), Department of Defense. Panetta is actually the only member over 70. Half of them are in their 60s, 5 in the 50s, and 2 in their 40s.

I felt a little ashamed I only could have given name/position to 5 of them before looking them all up. That’s not great, especially when you realize I’m counting Biden. Still, I comforted myself with the fact that I bet that beats a very large percentage of Americans.

A quick look for other data suggests that median age of populations is the more commonly reported value. The median age of the cabinet was actually 61, in case you’re curious.