Supplementary Calculations: June 18th, 2015

These are the supplementary calculations for this post.  Read that first or you’ll be confused.

Okay, you clicked.  Wow.  Good for you.  Alright….so to solve this problem we have to do a few different things:

1. First, we have to get our two distributions.  Normal distributions are written as N(average, variance).  Since the original paper gave the standard deviation for each, I had to square it to get the variance.  That gives us two distributions, one for men and one for women.

2. Next we use these two distributions to make a new one that represents the distribution of the difference between the two.  The nifty thing about normal distributions is this is weirdly easy to do, as long as we treat the two heights as independent.  We just subtract the averages to get the new average, then add the variances.  Not all transformations are this easy, but this one is nice and simple.

3. We use my Yugo/Cumberbatch equation!  Yay!  Basically what we are doing here is taking our newly acquired normal distribution and rejiggering it into the little black dress of distributions….the N(0, 1) distribution.  Every stats book you will ever find has probabilities for this distribution, so it’s the gold standard and makes our lives easier to boot.  When we do our math, we get that we’re looking for anything over .95.  Our handy dandy book tells me that’s .17 or a 17% chance.
[Image: the calculations for the steps above]
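
If you'd rather let a computer do the book lookup, here is a minimal Python sketch of the three steps above.  It assumes the two heights are independent, which is the same assumption the "add the variances" trick relies on.  The function names (upper_tail, prob_difference_exceeds) and the second set of inputs are made up for illustration; they are not the numbers from the original paper.  Only the .95 cutoff comes from the steps above.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def prob_difference_exceeds(mean_a, sd_a, mean_b, sd_b, threshold):
    """P(A - B > threshold), assuming A and B are independent normal variables."""
    diff_mean = mean_a - mean_b        # subtract the averages
    diff_var = sd_a ** 2 + sd_b ** 2   # square each SD to get variances, then add them
    z = (threshold - diff_mean) / math.sqrt(diff_var)  # rescale to N(0, 1)
    return upper_tail(z)


# The cutoff from step 3: anything over .95 on the N(0, 1) scale.
print(round(upper_tail(0.95), 2))  # 0.17

# Hypothetical inputs only (NOT the paper's values): means 10 and 5, SDs 3 and 4.
print(round(prob_difference_exceeds(10.0, 3.0, 5.0, 4.0, 9.0), 3))  # 0.212
```

The first line printed is the same 17% the stats book gives.  To rerun the real calculation, you would swap the made-up inputs for the means and standard deviations from the paper.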

No One Asked Me: Height Differences

Do you think the height difference is too much if a girl is 4’11 and a guy is 5’8?
Just wondering what other people’s thoughts are… My friend thinks it’s too much of a difference.
-Anonymous

Found on Yahoo! Answers

Okay Anonymous, let’s talk this out.  There are a few different ways of looking at this type of problem, and figuring out how much of a height difference is “too much”.  The first thing you should know is that heights for males and females follow two different, but similar, normal (Gaussian) distributions.  They look like this:

Aw cute, they’re holding hands…er, tails!

That’s what it looks like when two normal distributions are similar in shape but have different averages.  For the most part they stay on their own sides, but there’s some overlap.  Now, some people[1] will tell you this is a good example of a bimodal distribution, but there’s some controversy about that[2], so tread lightly, Anonymous. Those same people will also tell you this means it looks like a camel….

This camel would like to point out that comparisons to his humps are ALSO a potentially inaccurate analogy and yet no one’s writing papers about that issue. He has feelings too you know.

….maybe you should avoid this analogy too. Regardless of what we call it or how we describe it, you’ve noticed this before and you know what it means – most women will wind up with men who are taller than them.  Quite handily for your question, people love to track these distributions as much as Stats 101 profs love to use them as examples, so we have a nice data set from 2007-2008 here.  We’ll work off of that.  I like this data set because it’s kind of cute, and height is measured rather than self-reported, which means it’s likely more accurate than a lot of the other more shifty-looking data sets out there.

Now Anon (can I call you Anon?), I’m going to assume from your question that you’re on the young side.  I don’t know if you’re pre-pubescent or post, but just know that if either of you is younger than 15 or 16, there’s still a shot one of you could grow a bit more.  For the purposes of this exercise though, let’s assume you’re both in the 20-29 range.  While technically adult height ranges could be anywhere from 21 inches to 8’3”, we know that in reality over 99%[3] of adult men you meet will be between 5’3” and 6’4”, and 99% of adult women will be between 4’10” and 6′ tall[4].

Since you didn’t give your gender, we’re going to look at this from two different angles.

If you are the girl:

If you’re our 4’11” girl, you may not want to go ruling out a 9 inch height difference so quickly.  Only 33.1% of men are shorter than 5’8”, so it’s actually more likely than not that you’ll meet a guy who’s even taller than 5’8”….and thus more than 9” taller than you.

If you are the guy:

Well now the story kind of changes.  If you’re our guy here, your chances of meeting a girl shorter than 4’11” are actually pretty small….only 2.6% of women are shorter than this.  Thus nearly every woman you meet will be less than 9 inches shorter than you.  You’ll actually meet women taller than you 5 times as often as you’ll meet a woman shorter than 4’11”.

If you just want to fit in:

Alright, so now that we told you your chances, let’s take this from another angle.  What happens if you are not concerned about you personally, so much as you’re concerned about everyone else?  How often do people, in general, wind up with someone 9 inches taller than them or more?  Well, for that we have to use some slightly different data.  When I mentioned the bimodal controversy up there a while back, I linked to another nifty data set that gave me some more information:  mean and standard deviation of male and female heights.  Those are cool because now I can blatantly exploit their good nature to calculate how likely it is that we will find a woman 9″ shorter than her male partner.  To do this we….you know what?  You are not going to be interested.  If you want to see the calculations, go here.  I want to show you one of the equations I used though, because it’s initially kind of funny looking but when you see it in action you find yourself oddly attracted to it, like a Yugo or Benedict Cumberbatch:

The Yugo literally looks like a bad drawing of a car came to life

Take that baby out for a spin and it tells you that there’s a 17% chance that he’s more than 9 inches taller than her.

BUT YOU’VE BEEN LIED TO, BY ME AND POSSIBLY BY EVERYONE:

Alright Anon, I gotta come clean.  I’ve been lying to you, and I’m sorry.  In my haste to impress you, I totally made a few things up that I shouldn’t have.  It’s hard for me to say this, but here we go: I never should have presumed that any of this was random. I was trying to make life easier on myself by making some assumptions, but I can’t do that to you now that we’re friends and all.  All the calculations I just did presume that men and women just get thrown together without anyone ever thinking anything or having preferences.  That’s completely not true and you and I both know it.  People like what they like, and what they like frequently includes height preferences, at least enough to mess with my calculations.  As a little bonus bit of life advice, any time anyone shows you a statistic, it’s a good idea to try and figure out what assumptions they made on their way to getting it.  Everyone makes assumptions to get their math to be a little easier[5], but sometimes those assumptions make our results way less useful.

So what’s the real story if we don’t presume everything’s random? Well, Fivethirtyeight.com did some good math on this a few months ago while answering a question about how often in real life a man would be shorter than his partner here, and they linked to an interesting study that suggested in the real world, 9 inch (or close) height differences would occur in 30% of couples. That’s even more than the 17% we came up with earlier.  So you’re not alone Anon, not even close.  Everything else being equal, the height difference shouldn’t be a problem, and it definitely won’t be that unusual.  In fact, given how common it is, you may want to consider that one of your friends has a crush on whoever you’ve got your eye on, and is trying to talk you out of it so they can have him/her to themselves.  I’m unimpressed with their advice here, and the math agrees with me.  Good luck and Godspeed you crazy kids you.

 

1. Up to and including every stats 101 prof ever.
2. This paper ends with one of the most amusing conclusions of all time. Essentially they conclude that a. male/female height distributions are not really technically bimodal but b. there’s no other good quick classroom demonstration they can think of to illustrate the concept and c. readers should think of one and tell them so we can stop using this.
3. 99% sounds like a lot, but keep in mind that at least in the USA this leaves nearly 2.5 million adults outside of these ranges. 1% of a large number is a LOT.
4. Tangentially related bonus fact: Right around 5’7” we hit a kind of magical crossover place, where it’s equally likely that men and women will be that height.  Put another way, if I were to say something like “my cousin is 5’7””, and you were trying to guess my cousin’s gender based only on that statement, you’d have to guess completely at random.
5. If you think I’m bad, don’t even get me started on physicists.

R&C: I Have a Cold

It never fails: every time I work more than 9 days in a row I come down with a cold.  This latest run was no exception, and the worst part was I didn’t get a day off until day 13.  Let me tell you, it was a good time.

It’s been a tough couple of months at work, and it’s been ages since I had a real vacation.  This has not gone unnoticed, so the most common thing I heard last week from coworkers/friends/family/everyone was something along the lines of “I’m not surprised you got sick, you’ve been so stressed out!”  Naturally, I started to wonder how much validity there was to this statement, what the mechanism was, and how big the effect size might be, because that’s how I deal with things.  I wondered if stress in and of itself was a factor, or if it was really the unhealthy behaviors that came with stress.  Luckily for me, some researchers out of Carnegie Mellon were all ready with my answer in their paper Psychological Stress and Susceptibility to the Common Cold.  It’s a pretty good paper, so I sketchnoted it while still sick, which is why I accidentally misspelled “susceptibility” in the title and gave my sick man 4 arms.  Ah well.

[Sketchnote: Psychological Stress and Susceptibility to the Common Cold]

 

So overall, some good proof that the presence of stress can actually increase your cold risk.  My coworkers were right, and I’m taking some Nyquil and going to bed.  Goodnight everyone.

Gone but not forgotten

It’s been a long summer.  In April, I wrote to let you know my uncle had died unexpectedly.  A few weeks later, a different uncle of the same name also passed away.  On Tuesday, my grandfather died.  It’s an interesting coincidence that these three men were all named James, and that despite their disparate ages (56, 60, 89) all died within such a short time period.

I’ve done a lot of reflecting over the past several days, and I wanted to say a few words about my grandfather, then write a few things about where I go from here.  I’ve subdivided this so you can skip parts you’re not interested in.

James R King
My grandfather was the original stat-man in our family.  He quite literally wrote the book on it.  As we went through his stuff this weekend, I was amused to find that he had also been the original stats blogger in the family.  Apparently he had spent years running a stats newsletter where he wrote about stats topics that interested him and then sent it out to those who paid him $10 or so for the privilege of reading his thoughts. Judging by his archives, it seems to me quite a few people were interested in what he had to say.

My grandfather was truly a man of his time in many, many ways.  He was hard working, hard drinking, driven by duty to God, country, family, intellectual curiosity and a deep desire to see things work correctly.  He served in two wars (WWII, Korea), helped put a man on the moon, and had a deep disdain for stupidity.  As recently as a few months ago, he was grilling me about how to apply quality principles to health services environments.  He was annoyed that the administration of his assisted living facility wouldn’t take him on as an operational consultant.  He wasn’t trying to get money, he was just annoyed that things could be done better.  I’m not sure they ever knew how much free brain power they lost.

Since I got the news on Tuesday, I’ve been reflecting on what it means to watch another member of the Greatest Generation slip away.  For me, I have lost not only a grandfather, but someone who understood my way of viewing the world.  For all that “geek culture” has become mainstream, it’s still a bit of a lonely life for those of us who prefer to view the world through numbers and systems, and my grandfather was one of the few people I could count on to always know how I felt.  I’ll be raising a martini or two over a spreadsheet or three in his memory, I’m sure.

The Future of this Blog
Three deaths in 5 months is a lot, especially when the people involved were meaningful to your family structure.  I’ve been slow in posting this summer, and at this point, I’ve realized I need a complete break.  I started this blog as a fun project to work out some frustrations I had about political campaigns, and it worked well for that.  I’ve loved the readers I’ve had and the conversations that took place here.  I hope to get back to this at some point, to renew those conversations, but right now I don’t have it in me.

On the other hand…
I have some projects in the works you all might be interested in.  First and foremost, this blog has helped start an ongoing conversation with my (science teacher) brother about what it would take to give kids a good sense of how to apply math and science to the media that bombards them, and give them a good sense of practical scientific literacy.  These discussions have led us to start collaborating on an e-book/curriculum guide of sorts.  The idea is it would be a bit like this blog adapted for a classroom setting….a sort of “here’s how you take the dry concepts you’re hearing and here’s when you should use them in the real world”.  I’ll be posting periodic updates on this project, so you can check back for those.

Also, I know many of my readers have pretty awesome blogs of their own.  I’m always available for guest posts and/or random stats commentary if you miss me :).

Again, I want to thank everyone who has made this blog such a fun place for me to write.  The internet certainly has its ups and downs, but (in the words of the AVI) I have been happy to be part of this “small but excellent corner” of it.

Keep being 2SD above the norm, and good luck out there.

Autism and Labor

Commenter Erin brought up the recent hubbub regarding induced labor and autism, and while I’d like to comment on it, Science Based Medicine has already done a pretty thorough job.  They put the breakdown quite succinctly:

In the case of this study, either inducing/augmenting labor triggers autism in some children, children with autism are more likely to require induced labor, or some other factor(s) is a risk factor for both developing autism and needing to induce or augment labor. This current study does not contain data that can differentiate among these possibilities.

Induced labor is a hard thing to study because (unlike c-sections) induction is very rarely completely elective. It is almost always precipitated by some other complication.  It’s an interesting study though, and definitely indicates a need for more research.  Anything that gets people off the vaccine thing makes me happy.

30 Days of Data Storytelling: Day 4 and 5

Two videos for today, a long-ish one that gives more details about how to do things, and a Hans Rosling video that is a great example of a story with data.  I’ve seen the Rosling video a few times, but it’s worth a look just to see how he shows his data off.

The other video is a good primer of what to do and what not to do when presenting data.  If you have time, worth a watch.

Literally Unbelievable

In 2013, I’m pretty sure it’s a near-universal experience to have at least one Facebook friend who is a bit of a train wreck.  I have one such person on my list, and for a variety of reasons I cannot delete him.  He is quite prone to daily postings of dozens of ridiculous political comments/links/cartoons that range from condescendingly disagreeable to outright offensive.  A large part of this offensiveness, IMHO, comes from the fact that a decent amount of what he posts isn’t actually true.


He seems to be a deep sucker for a story that fits his pre-existing narrative, and at least twice a day I see something out of him that doesn’t even pass a basic sniff test.  To be fair, he at least occasionally gets called out on this.  Apparently this has been getting to him though (the “hey that story’s not true” part), because last night he posted quite the disclaimer that let everyone know that he “thoroughly researches” every story he posts.  
A mere 10 hours later, with no irony and lots of anger, he posted this article: Lance Armstrong Fails Drug Test for Job at Target.
On the bright side, just a few posts down on my newsfeed, a different friend posted this list chronicling the 35 best times someone on Facebook thought The Onion was real.  These two friends don’t know each other, so it was pretty serendipitous.
It’s a great list, and apparently it’s drawn from a whole website of this sort of thing called Literally Unbelievable.
Check your sources people, check your sources.  

30 Days of Data Storytelling: Day 3

Doubling up on the posts since I got behind.

Today’s entry was this awesome data simulation/graph/narrative about Olympic long jumping.

I remember watching a few of these around the Olympics last year, and it was pretty cool.  It’s a good overview of raw data, with visuals and comparisons to put it in context.  Context is one of the most underutilized aspects of data presentation.  Hearing “he jumped 26 feet” is impressive, but hearing “he jumped from the edge of the court past the 3 point line” gives context.

It’s a short video, definitely worth a watch if you have the time.

30 Days of Storytelling: Day 2 (Pixar version)

So after posting the first two articles last week, I realized those were supposed to be a combined Day 1, making this the real day 2.

Day 2 was two interesting Pixar-related things…one a list of their rules for great storytelling and the other a short (about 3 minutes) video where they tell a story with no words.  If you’ve ever seen a Pixar movie, you know they can tell a fantastic story, so it was interesting to read their take on the craft.

A few of their rules particularly stood out as relevant to data stories:

#2 You gotta keep in mind what’s interesting to you as an audience, not what’s fun to do as a writer. They can be very different.

#11 Putting it on paper lets you start fixing it. If it stays in your head, a perfect idea, you’ll never share it with anyone.

#17 No work is ever wasted. If it’s not working, let go and move on – it’ll come back around to be useful later.

I’m sure there are others that could apply, but those are the 3 that really struck me.  Sometimes I find fun and funky data that no one else is interested in.  I’m always having to refocus on the question at hand.  When you analyze data a lot, the “normal stuff” can get boring, but normal is interesting to someone who’s seeing it for the first time.  That bleeds into #11….you can’t always know what’s interesting to people until you start to share it.  Testing reactions and assessing opinion is valuable.

When something flops, that’s when #17 comes in.  I store all the data I come across for future use.  It’s interesting how often something no one was interested in can later become critical.

The video’s just cute.  Show it to the small child in your life.