5 Fun Intersections of Math and Birthdays

Well hi there! Guess what? Today’s my birthday! Hurray for another trip around the sun! This of course reminded me of all the great birthday related math things that are out there, so I thought I’d go ahead and put a few together just for kicks:

  1. The birthday paradox A classic problem in statistics, the birthday paradox asks some form of the question “If you have 23 people in a room, what are the chances at least two of them have the same birthday?”. The answer is about 50%, and it’s not really a paradox but just a thing people have trouble understanding. Better Explained has a nice breakdown of the problem here, which reminds readers that exponents are hard and that part of you immediately focused on your own birthday. As always, the chances of something happening somewhere are higher than the chances of any particular thing happening to you. My favorite way of viewing this problem came from the Assistant Village Idiot, who explains it by asking people to imagine they’re throwing darts randomly at squares and inquiring how long they think it would take before two darts wind up in the same square.
  2. Cheryl’s birthday I love when math puzzles go viral, and Cheryl’s birthday was a pretty good one. If you missed it, here it is: And here is the Guardian’s explanation of the answer.
  3. Common birthdays This graphic of common birthdays is both an interesting infographic and a cautionary tale of using ordinal data on the uninitiated. Basically, the author put together a visual representation/heat map of the most common birthdays by rank (as in 1-366), had it go viral, then had people complaining to him that it “wasn’t accurate”.  He was rightfully irritated since it was just something he’d done for fun, but it’s a good reminder to fully think through visuals you see on the internet and to read the original sources for proper context.
  4. When is the old/young tipping point? Well, if you define “old” vs “young” as “when is over half the global population younger than me”, the 538 says the tipping point is somewhere in your late 20s. If you’re limiting yourself to just the US though, you have until you’re 37. Nathan Yau has a great visual here, and you can break it down by gender.
  5. And of course, one of the most compelling statistical truths of all time: 

Can’t argue with that one. I’m gonna go have some cake.
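For anyone who wants to check the birthday paradox number from #1 themselves, here is a minimal sketch in Python. It assumes 365 equally likely birthdays and ignores leap years (which the real world does not quite honor), and the function name is just something I made up for illustration. The trick is to work out the chance that everyone has a different birthday, then subtract that from 1.

```python
def prob_shared_birthday(n, days=365):
    """Chance that at least two of n people share a birthday.

    Assumption: every day is equally likely and there are no leap years.
    """
    p_all_different = 1.0
    for k in range(n):
        # Person k+1 has to avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different


for n in (10, 23, 50):
    print(n, "people:", round(prob_shared_birthday(n), 3))
```

With 23 people this comes out just over one half, which is where the "50%" answer comes from. Each new person is like another dart thrown at the squares: they have more and more occupied squares to avoid.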

Internet Science: Some Updates

Well, it’s that time of year again: back to school. Next week I am again headed back to high school to give the talk that spawned my Intro to Internet Science series last year.

This is my third year talking to this particular class, and I have a few updates that I thought folks might be interested in. It makes more sense if you read the series (or at least the intro to what I try to do in this talk), so if you missed that you can check it out here.

Last year, the biggest issue we ran into was kids deciding they can’t believe ANY science, which I wrote about here. We’re trying to correct that a bit this year, without losing the “be skeptical” idea. Since education research kinda has a replication problem, all the things we’re trying are generally just a discussion between the teacher and me.

  1. Skin in the game/eliminating selection bias In order to make the class a little more interactive, I’ve normally given the kids a quiz to kick things off. We’ve had some trouble over the years getting the kids’ answers compiled, so this year we’re actually giving them the quiz ahead of time. This means I’ll be able to have the results available before the talk, so I can show them during the talk. I’m hoping this will help me figure out my focus a bit. When I only know the feedback of the kids who want to raise their hands, it can be hard to know which issues really trip the class up.
  2. Focus on p-values and failure to replicate In the past during my Crazy Stats Tricks part, I’ve tried to cram a lot in. I’ve decided this is too much, so I’m just going to include a bit about failed replications. Specifically, I’m going to talk about how popular studies get repeated even when it turns out they weren’t true. Talking about Wonder Woman and power poses is a pretty good attention getter, and I like to point out that the author’s TED talk page contains no disclaimer that her study failed to replicate (Update: As of October 2016, the page now makes note of the controversy). It does however tell us it’s been viewed 35,000,000 times.
  3. Research checklist As part of this class, these kids are eventually going to have to write a research paper. This is where the whole “well we can’t really know anything” issue got us last year. So to end the talk, we’re going to give the kids this research paper checklist, which will hopefully help give them some guidance. Point #2 on the checklist is “Be skeptical of current findings, theories, policies, methods, data and opinions” so our thought is to basically say “okay, I got you through #2….now you have the rest of the year to work through the rest”. I am told that many of the items on that list meet the learning objectives for the class, so this should give the teacher something to go off of for the rest of the year as well.

Any other thoughts or suggestions (especially from my teacher readers!) are more than welcome. Wish me luck!

The Signal and the Noise: Chapter 10

Chapter 10 was about poker, and how to make money playing poker. Apparently the key is to make it easy for lots of inexperienced people to play. When websites that made it easy to play got shut down, fewer inexperienced people made the effort and many previously “successful” players discovered they were now the fish. It’s a good reminder to keep an eye on the skill level of your competition in addition to your own.

SignalNoiseCh10

5 Interesting Things Research Tells Us About Internet Trolls

I got in an interesting discussion this weekend with some folks about internet trolls. One person had made an offhanded comment about “anonymity bringing out the worst in people”, and was surprised when I informed them that the current research didn’t really support them in that. Depending on your perspective I am either the best or worst person to ever get in one of these discussions with, because I decided to do a little roundup of the current research on internet trolls. Hang in there with me, and you get a bonus SHEEPLE at the end:

  1. Defining trolling is actually kind of hard While most of us would say that trolling is a sort of “you know it when you see it” issue, the definitions used actually vary a bit. For example this study found trolls by asking participants directly “do you like to troll”, this study just counted Tweets that contained “bad words”, this study had researchers read through individual posts and had them rank their “trollishness”, and this study had researchers track whole comment histories of banned forum users. None of those are necessarily wrong, but they all will catch slightly different sets of behavior and groups of commenters.
  2. Trolls who cause chaos online also like to cause chaos for researchers. To the surprise of no one, those who admit they like to troll online also get a kick out of messing with researchers. When Whitney Phillips was writing her book about trolling, she tried to interview self-professed trolls to see what motivated them. Unfortunately they kept making up stories then hanging up on her. That makes you wonder about research where people had to self-define as trolls, like this widely reported study that said trolls tend to be sadists. Are trolls really sadists, or do those who say “yes I like to troll” also like to answer “yes” to questions on sadism quizzes? And is answering “yes” as a joke substantively different from answering “yes” in all seriousness?
  3. Who gets targeted is a complicated question One of the issues that arises due to the different definitions of trolls (see #1) is the question of “who gets targeted”. At this point “trolling” can be used to define anything from irritating but benign behavior to criticism of all types to abuse, threats and harassment. With so many varieties of trolling, figuring out who the targets are can be more difficult than it first appears. For example, this British marketing group found that male celebrities got twice as much Twitter harassment as female celebrities. To note, the standard for “harassment” used there was a “bad word” filter, and neither the number nor the content of the celebrities’ Tweets was rated. Given that Piers Morgan, Ricky Gervais, and Katie Hopkins ended up as the top three receivers of abuse, content appears to matter. Anyway, Cathy Young has this to say about the gender breakdown and Amanda Hess replied with this. We do know that young people (18-24) are the most likely to have problems, and there is a gender difference in type of harassment. From Pew Research: This is all age ranges. Note: all of those terms were self defined and self reported, and there was no controlling for where those things occurred. In other words, people being called offensive names out of the blue in an innocuous situation were counted the same as someone calling you a name in the middle of a heated debate.
  4. Real names don’t necessarily help. Nearly as long as trolls have been discussed, people have been mentioning the enabling role of anonymity. A recent study of German social media suggests that anonymity may be less important than we think: it showed that using real names frequently made people more hostile, not less. It turns out that the social signaling/credibility gained from online posts actually can empower people to get meaner. Oh boy.
  5. Controlling your own emotions might actually help The most common advice dispensed on this whole topic is of course “don’t feed the trolls”. However, it can be a little tough figuring out what that means. When these researchers tried to create a predictive algorithm to see if they could identify trolls by their first ten posts on a site, they discovered that trolls tend to escalate when they have posts unfairly deleted. In order to find “unfair” deletions, they blinded an assessor to the identity of the poster, and asked them if the post was offensive or not. It turns out that trolls really were more likely to have inoffensive posts deleted, and that their postings worsened significantly after that happened. Now of course this may have been the goal….moderators who are sick of someone’s posts entirely may get capricious in the hope that the poster will get so mad they’ll leave, but it is an interesting insight. Also interesting from the paper: trolls comment more often but in fewer threads, they have worse overall writing quality, and they get more responses than other users. Yup, designed to irritate.

Unfortunately none of my research turned up any guidance on how likely this is to happen:

Stay safe out there.

5 Examples of Bimodal Distributions (None of Which Are Human Height)

One of the strange things about statistics education in the US (and other countries for all I know) is the way we teach kids about the bimodal distribution. A bimodal distribution is a set of data that has two peaks (modes) that are at least as far apart as the sum of the standard deviations. It looks like this:

It’s an important distribution to know about, because if your data looks like this, your calculations for the average are going to be totally useless. For the distribution above for example, we’d get an average of (around) zero, which would tell us nearly nothing about the data itself, and would completely miss both peaks. So far so good. However, when this is taught in stats classes, the “real world” example most kids are given is human height….and human height is not bimodal. Bummer.
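If you’d like to see the “useless average” problem for yourself, here is a small sketch in Python. The two groups are made up purely for illustration (one centered at -3, one at +3, each with a standard deviation of 1, so the peaks are well past the “sum of the standard deviations” cutoff); nothing here comes from real data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Two made-up groups: one centered at -3, one at +3, each with standard deviation 1.
low_group = [random.gauss(-3, 1) for _ in range(5000)]
high_group = [random.gauss(3, 1) for _ in range(5000)]
data = low_group + high_group

mean = statistics.mean(data)

# Share of points that actually sit within half a unit of the overall mean.
near_mean = sum(1 for x in data if abs(x - mean) < 0.5) / len(data)

print("overall mean:", round(mean, 2))
print("share of data within 0.5 of the mean:", round(near_mean, 4))
```

The mean lands close to zero, but only a tiny fraction of the data is anywhere near it. The “average” describes a value that almost nothing in the data set actually has.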

Given that it’s the start of the school year and all, I thought it would be a good time to provide teachers with some new examples. Now, depending on the underlying data set you might use, some of these examples may not make the “peaks separated by the length of the combined standard deviations” cutoff either…..but at least you’ll be wrong in new ways. That’s got to count for something, right?

  1. Starting salaries for lawyers On average new lawyers do well. In reality there are big winners and losers in the whole “getting a good job after graduation” game, and it shows in the salary distributions. Read the Above The Law complaint here.
  2. Book prices Book prices cluster around different price points, depending on whether you’re looking at paperbacks or hardcovers as God Plays Dice explains. If the gap between paperbacks and hardcovers isn’t wide enough for you, imagine you could pull price data for every book available on Amazon.com. You’d end up with two modes, one for regular books and one for textbooks.
  3. Peak restaurant hours If you plotted a histogram of when every customer entered a restaurant on a given day, you’d end up with a bimodal distribution around 2 points: lunch and dinner. This type of histogram also tends to appear when you map road usage (morning and afternoon rush hours) and residential water/electricity usage (before and after work).
  4. Speed limits This one I actually couldn’t find much data on, but I’m guessing if you mapped out all the speed limits on every mile of road in the US (or maybe just your state), your distribution would end up clustered around 30/35 and then again around 60/65. Basically highways or regular roads. This distribution would also have the additional wrinkle of skewing differently based on whether we used miles of road or number of roads, but that’s a different matter entirely.
  5. Disease patterns There’s a rather fascinating two part blog post by Jules J Berman that discusses bimodal cancer patterns here and here. Basically these are cancers that appear similar but tend to hit rather different age groups. For example Kaposi’s sarcoma hits young men with AIDS and older men who do not have AIDS, and Berman argues that seeing these patterns should give us important clues about the diseases themselves. Possible explanations from Berman’s post: 1. Multiple environmental causes targeting different ages 2. Multiple genetic causes with different latencies 3. Multiple diseases classified under one name 4. Faulty or insufficient data 5. Combinations of 1, 2, 3 and 4.

Bimodal distributions are also a great reason why the number one rule of data analysis is to ALWAYS take a quick look at a graph of your data before you do anything. As you can see from the above examples, the peaks almost always contain their own important sets of information, and must be understood both separately and together in order to be understood at all.

So what’s your favorite non-human height example?

The Signal and the Noise: Chapter 9

Chapter 9 has some interesting anecdotes about the quest to create a chess program that could beat Garry Kasparov. It covers some of the limits of humans and machines, and how they are almost better when used in tandem.

SignalNoiseCh9

Team USA, Women and the Olympics

With the Olympics officially coming to a close this past Sunday, a reader contacted me and asked about the performance of the female athletes of Team USA. He was curious if the number of medals won by US women in the Olympics had increased as a percentage, absolute count or both since the passing of Title IX. In a year that female athletes got a substantial amount of coverage, this seemed like an interesting question so I ran a few numbers.

Some caveats: Figuring out how many events there are each year is tougher than I thought, especially for the early Olympics. Because some of my data sources disagreed, some of these percentages might be off. Additionally, I may be off on the percent won by women by a few points. In both the winter and summer Olympics, there are some mixed gender events…think paired figure skating. I couldn’t figure out how the data I pulled below was counting that, so it could vary a bit. Since there’s only 3 of those events in the winter Olympics and 9 in the summer, I decided to let it go. Finally, this only counts events, not athletes. Michael Phelps counts as a medal in each of his events, but the relay team also only counts as one. So basically, this reflects the gender breakdown by medal count, not by the number of male or female medalists we have. So Team USA basketball is one medal for each gender, despite making quite a few people “gold medalists”. All data sources are at the bottom of the post.

Okay, so let’s take a look!

First, how has the percent of Team USA medals won by women changed over time?

percentofmedals

Each of the lines is 8 years, if you’re trying to orient yourself. For the youngsters, the dip in 1980 is because we boycotted that year. As you can see though, the percentage has gone steadily up.

But what was the driver of this? The initial asker suggested the driver was Title IX, but I wondered if it might be more closely correlated with the expansion of women’s events. Of course neither of these would be entirely independently causal….we know the social forces that drove one likely drove the other. Anyway, here’s how the percentage of medals available to women varied with the percentage of medals won by women on Team USA for the Summer Olympics:

Summermedals

And winter:

wintermedals

The winter medals variability is almost all because of the low medal counts. The two years they were high were actually not very high medal count years (5 and 9), but basically the men only got 2. I ran a quick regression and the r-squared for the Summer Olympics is around .75, and the Winter about .4.
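For anyone curious what “ran a quick regression” boils down to, here is a minimal sketch of the r-squared calculation for a one-variable least-squares fit. The function name and the toy numbers below are hypothetical placeholders, not the Olympic data; to reproduce the figures above you would feed in the percent of medals available to women as `xs` and the percent of Team USA medals won by women as `ys`.

```python
def r_squared(xs, ys):
    """R-squared for a simple least-squares line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope and intercept of the best-fit line.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Variation left over after the fit vs. total variation in ys.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical toy numbers, for illustration only (NOT the medal data).
toy_xs = [10, 20, 30, 40, 50]
toy_ys = [12, 18, 33, 38, 52]
print(round(r_squared(toy_xs, toy_ys), 3))
```

An r-squared of .75 means roughly three quarters of the year-to-year variation in one percentage lines up with the other, which is a decent fit but says nothing by itself about which one is driving which.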

For juxtaposition, here’s the number of female NCAA Div 1 athletes superimposed on a different scale:

NCAA athletes

I’m not going to do the overall regression because correcting for the multicollinearity (aka,  a regression with two factors that are correlated) can be a bit of a hassle, but I’m guessing it’s the expansion of events driving the medal count more than the number of D1 female athletes. However, it may be the increased number of athletes allowed the US to immediately take advantage of every expansion in medal events. Additionally having more talented female athletes probably incentivized the IOC to add more events.

Confusing correlation, but a great question!

The Team USA Medal Count came from here. The count for female athletes came from here. The number of events came from here for winter and here for summer. The number of events available to women is here. NCAA athlete counts are here.

Accidental Polymath Problems: 10 Subjects You Study Before You Find the One

After my comment last week that I’d sort of friend-zoned physics, I got to thinking about how many different subjects/career choices I stumbled through during my 20s. It’s incredibly interesting to me that even though society has started allowing (and frequently even encouraging) people to wait longer and longer before finding “the one” for marriage, we still put a lot of pressure on people to know exactly what they’re interested in by the age of 18…or 22 if you’re a little behind. Clearly college debt is a huge driver of this, but I do meet a bizarre number of high school students who really think most people figure out “their passion” before they’re even old enough to drink. While clearly there are plenty of people who find what topics they want to study early, I’d like to propose that the whole thing is a little more like dating than we normally think of it.

When I mention “subjects” here and “study”, I am covering a lot of ground. Studying could mean formally studying in school, or getting books out of the library, watching documentaries or talking to a lot of people in the field. While I mention careers, I’m not directly equating intellectual pursuits to careers or work because some people really don’t get to equate those two. It’s an unfortunate reality that many of us have to prioritize paying the bills over feeding our minds, and if you ever find yourself doing both at once you are incredibly lucky. With those caveats, and the knowledge that this is based on nothing but my own experience and that of my friends, here’s the 10 types of subjects you study before you find “the one”:

  1. The Celebrity Crush Grey’s Anatomy. ER. Bones. CSI. Law and Order. Let’s face it, some professions get all the girls. No seriously, who among us with a television set hasn’t at some point developed a crush on an entire profession/field of study? When I was 4 years old I watched a PBS special and spent months telling everyone I wanted to be a paleontologist and begging my parents for dinosaur books. Two decades or so later I binge watched the first 5 seasons of Bones and spent a solid 3 weeks desperately wanting to be a forensic anthropologist and reading every book my library had on the topic. While sometimes these can spark real career choices, most of the time the fantasy is better than the reality. I mean, I still adore dinosaurs but I would NEVER have the patience to catalogue a dig site. Some things just look better from afar.
  2. The Challenge Subject Similar to the celebrity crush, but you actually encountered it in your real life. This is the subject or path you pursue because you’re not sure you can actually get it. It’s not that you’re not legitimately interested, but if you’re honest with yourself it’s really your competitive streak that’s pushing you through. The truth will hit you when you finally master the subject, only to promptly realize that now you really never want to talk about it again. Want an example? Ask me about my biomedical engineering degree.
  3. The One that Requires Way Too Much Commitment Okay, so you found a subject you really like, and you think “hey, maybe I’d like to consider this as a profession”…and then you realize exactly how much work that would take. You like the subject, but the idea of working hundreds of hours or going to school for a decade to study it further strikes you as waaaaay too much commitment.  It’s ready to settle down, and it looks nice, but you just can’t be tied down like that. You have too many other interests, and there’s only so many hours in the day.
  4. The Summer Fling This is the subject you absolutely love, but only because of the setting you encountered it in. Maybe you got to learn about archeology while studying abroad in Egypt or you had an amazing professor who made an otherwise boring subject unbelievably interesting. When you try to pick this subject back up again, you realize that in a more mundane environment it actually is kind of boring. Ah well, at least you have the memories.
  5. The Artist This is the subject you love with all your heart, but you realize it will always be a bit of a free spirit. Maybe it’s literally an artistic field mashed up with another topic, or maybe it’s a subject you’re just kind of making up as you go along (like, say teaching people how to read science on the internet) but it doesn’t fit neatly in any sort of traditional box. It’s more exciting to you than almost any other topic, but no one else understands what you see in it and it’s DEFINITELY not a program of study anywhere.
  6. The Friend With Benefits This is the subject that comes really naturally to you without ever really having to put much effort in. It doesn’t excite you much, but people will pay you to do it and the effort is minimal. For me, this is quality and regulatory. You want someone to memorize obscure regulations, recite them at you when you step out of line, check your work and tell you your faults? I’m your girl. I can do that in my sleep. Ask anyone who’s ever lived with me. Anyway, this one doesn’t require a lot of investment either because it comes so naturally or you’re already qualified for it, but you know you could walk away at any time and never think about it again.
  7. The Safe One Related to the friend with benefits, but you committed to this one. It doesn’t excite you, but you think you can always find work in this field and it’s not terribly stressful. You sometimes think about leaving, but everything else seems less certain. Tends to work out pretty well unless the field totally collapses on you.
  8. The Friend Zoned This is the thing you always enjoy hearing about, but simply never want to commit to doing much reading about…..despite a bit of a feeling you should give it a chance. Maybe it’s a field where you could make a lot of money, or something your parents think you should try, but you just can’t bring yourself to try it out.
  9. The “We’re Better As Friends” A little like the friend zone, but this one is a mutual decision. It’s the subject you like studying and love to be around, but as soon as anything formal or structured was required you did terribly and bailed. Still, having it in your life makes your life richer, as long as it’s on low pressure terms. Interestingly, I try to convince many people that statistics should fall in this category for them. You don’t have to like studying math formally in order to benefit from having a little more statistics in your life.
  10. The “why did we never work out” This is the subject you always think is pretty great, but really spend very little time studying. You like it, but every time you find a free moment, you forget it exists, or it’s only offered as a class the one semester you’re already overloaded, etc etc. For me this is epidemiology. I’ve taken classes in it, it’s a natural fit, but I never quite seem to follow up. I really should give it a call sometime soon.

Of course the nice thing about intellectual pursuits is that you actually can juggle multiple different subjects at once with a lot less potential for drama than if you tried that while dating. For example, my current job is a mash up of my true love (statistics, analytics and process improvement), my friend with benefits (quality and regulatory), and the safe one (computer systems). My blogging is The Artist, and it gives me a place to research all my thoughts that don’t fit in any other box. I think acknowledging how many different types of intellectual pursuits there are (and how much you can learn from all of them!) could be useful for kids still trying to figure things out. Just like dating can help you hone in on what you want in a spouse, studying a lot of subjects can help you find that sweet spot of “things you want to talk about” and “things people want to pay you to talk about”.

Plus, isn’t the world a little more fun when you consider every new book a low key blind date?

5 Things I’ve Learned From Reading About Problems in Physics

One of my favorite things about getting an engineering degree was the number of basic science classes I had to take. It gave me at least a dilettante’s knowledge of quite a few scientific fields, and I’ve always enjoyed using that background to keep at least half an eye on other scientific fields. Of all of those fields, my particular favorite is physics. I always loved physics in that “I’m so glad to see you, but let’s just be friends” kind of way, and I try to make sure I read at least a book or two a year about it.

A few months ago I read Lee Smolin’s book “The Trouble With Physics”, and was intrigued to read a breakdown of some of the current (well, ten years ago now) problems in the field. It got me pretty stressed out about string theory, which is not a problem I had expected to have that week. I digress. Anyway, this physics anxiety got a little worse when James over at I Don’t Know But posted about how physics needed some new ideas, and then he left me this link about the rather embarrassing 750 GeV diphoton excess incident. He compared the whole debacle to priming studies, which seemed fair. Anyway, since blogging is the primary way I deal with my science and statistics related anxiety problems, I thought I’d put together a post on why I actually love reading about issues in physics. Ready? Let’s go!

  1. Reading outside your field gives you a new perspective on errors  Most of my working experience is in healthcare, and one of my degrees is in a psych field. When you’re familiar enough with your field, it can be pretty easy to figure out what all the most common errors are. Since professions tend to attract people who think similarly, it stands to reason that fields will all have certain errors they are particularly susceptible to. Reading outside your normal field is a good way of realizing what problems are actually pretty universal, which ones you may never have thought of, and (ideally!) how other fields have dealt with some problems. Additionally, it’s really easy to see the issues in fields that tend to capture headlines (psych, nutrition, etc), while other fields that are less accessible can seem like they don’t have any problems. Reminding yourself this isn’t true is kind of reassuring.
  2. Statistical noise is a problem for everyone One of the reasons I went into statistics in the first place was the allure of how many different fields had to use it. At the time I loved the idea of learning a topic that basically every single discipline had to use. I still do. The link James mentioned originally was about a topic I won’t even pretend I can explain (750 GeV diphoton excess) but focused on a problem I’m REALLY familiar with: over-interpretation of statistical noise. Yeah, basically theoretical physicists published about 500 papers on a phenomenon that appeared to be true but then didn’t replicate. Oops. In their defense though, it was a really large anomaly in an area that was theoretically plausible and that they’d had success with before, which is pretty much the perfect storm for confirmation bias.
  3. So is failing to check basic assumptions. If I had to make a complaint about the way we teach  statistics to kids, I would argue that the biggest error we make is not emphasizing to them how important it is to check basic assumptions. Textbooks are always reminding you that you have to make sure assumptions x, y and z hold true before you can use certain equations….then they just let you assume all those things for the rest of the class and send you on your merry way. The real world doesn’t work like this.  That was evident back in May when a couple of retraction notices came out from the New Journal of Physics. There was no intentional misconduct, but the authors had assumed the data was symmetrical without checking that assumption. In Smolin’s book, he discusses a few fundamental string theory assumptions (mentioned in the second column on page 2 of this review) that didn’t actually have experimental evidence behind them, despite most people assuming otherwise.
  4. The goal is to push the limits. In my priming studies post, I mentioned that pushing the limits and studying the fringes of a field is a feature, not a bug. That sentiment is echoed in this interesting article about “The Data That Threatened to Break Physics”. It discusses the struggle of a researcher to cope with completely unexpected results that run contrary to conventional wisdom. In the case of superluminal neutrinos, the results turned out to be caused by a faulty cable, but the lead researcher quite rightly asks what people thought he should have done differently. Suppressing a potentially controversial result is not really something we want to encourage, and the upshot of that may be that we end up with retractions. To quote the lead researcher: “The worst data are better than the best theory. If you look for reasonable results, you would never make a discovery, or at least you will never make an unexpected discovery”.
  5. Even when you stop studying people, you can’t get out of dealing with people. At its heart, science is as much about bias management as it is about discovery. It is really difficult to do much of the latter if you don’t do the former. In Smolin’s book, two of the most fascinating chapters were “How Do You Fight Sociology?” and “How Science Really Works” (covered a bit in this review). Smolin reviews how tenure, grant related politics and even just plain old ego and groupthink can influence what scientific theories get money and attention. All this occurs without any outside social pressure, since of course it doesn’t matter to most lay people if string theory is true or not. Smolin proposes that to counter this, universities should reserve some money/positions for those who are actually quite polarizing in their work. He proposes that we invest in scientific ideas the way many stock market investors do: put most of your money in safe things, but put some of it on ideas that look a little nuts. Nassim Nicholas Taleb famously calls this “the black swan approach”. I’ve heard worse ideas.

So there you have it, and if you have any good physics book recommendations, I’m always looking!