I’m Thinking of a Word That Starts With a…..

I’ve mentioned before that I like to try to find unusual ways of teaching my 5-year-old son statistical concepts by relating them to things he likes. This pretty much doesn’t work, but this week I tried it again and attempted to use a discussion about letters to segue into a discussion about perception vs data. He’s getting into some reading fundamentals now, and is incredibly curious about what words start with which letters. This leads to our new favorite game, “Let’s talk about ____ words!”, where we name a letter and then just think of as many words as we can that start with that letter.

This game is fun, but he’s a little annoyed at letters that make more than one sound. This week he got particularly irritated at the letter “c”, which he felt was hogging all the words while leaving “k” and “s” with none. I started trying to explain to him that “s” in particular was doing pretty alright for itself, but after discussing “cereal” and “circus” he was pretty convinced that “s” was in trouble.

As I was defending the English language’s treatment of the letter “s”, I started to wonder what the most common first letter of words actually was. I also wondered if it was different for “kids words” vs “all words”. After some poking around on the internet, I discovered that there’s a decent amount of variation depending on what word list you go with. I decided to take a look at three lists:

  1. All unique words appearing more than 100,000 times in all books on Google ngrams (Note: I had to go to the original file here. The list they provide on that site and the Wiki page is actually the most common first letters for all words used, not just unique words. That’s why “t” is the most common….it’s counting every instance of “the” separately)
  2. The 1,000 most commonly used English language words (of Up-Goer 5 fame)
  3. The Dolch sight words list, used to teach kids to read

Comparing the percent of words starting with each letter on each list got me this graph:

As I suspected, “s” does quite well for itself across the board, though it really shines in the “core words” list. “K” on the other hand is definitely being left out. It’s interesting to see what letters do well in bigger word sets (like c, p and m), and which ones are only in the smaller sets (b, t, o and w). “W” seems very popular for early reading lists because of words like “what”, “where”, “why”. “S” actually is really interesting, as it appears to kick off lots of common-but-not-basic words. My guess is this is because of its participation in letter combinations like “sh” and “sch”.
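For anyone who wants to try this comparison on their own word list, here is a minimal Python sketch of the counting step. It is not the code behind the graph above: the function name `first_letter_percentages` and the tiny sample list are made up for illustration, and it assumes each word is counted once no matter how often it appears.

```python
from collections import Counter
import string


def first_letter_percentages(words):
    """Return {letter: percent of unique words starting with that letter}."""
    # Deduplicate case-insensitively so repeated words (like "the") count once.
    unique = {w.strip().lower() for w in words if w.strip()}
    # Keep only words whose first character is a letter a-z.
    initials = [w[0] for w in unique if w[0] in string.ascii_lowercase]
    counts = Counter(initials)
    total = len(initials)
    if total == 0:
        return {}
    return {letter: 100 * counts[letter] / total for letter in sorted(counts)}


# Hypothetical mini word list, invented purely for illustration.
sample = ["the", "The", "cat", "circus", "cereal", "kite",
          "sun", "superhero", "ship", "what"]
percentages = first_letter_percentages(sample)
for letter, pct in percentages.items():
    print(f"{letter}: {pct:.1f}%")
```

Running the same function over each of the three lists and plotting the results side by side would give a comparison like the one described here. Dropping the deduplication step would instead count every use of a word, which is how “t” ends up on top.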

Anyway, my son didn’t really seem to grasp the “the plural of anecdote is not data” lesson, so I pointed out to him that both “Spiderman” and “superhero” started with “S”. At that point he agreed that yes, lots of words started with “s” and went back to feeling bad for “K”. At least that we can agree upon.

Now please enjoy my favorite Sesame Street alphabet song ever: ABCs of the Swamp

Baby, You Can’t Drive My Car: Part 2

I didn’t get back to all the comments on my previous “Baby, You Can’t Drive My Car” post, but there were some good ones. Between those and some emails I got, there were some interesting statistical/follow-up issues raised that I felt deserved their own post.

First, the general “not getting driver’s licenses” trend among young people seems to actually be reflective of some larger delayed adulthood trends among “iGen”, or those born in 1995-2005. This Wall Street Journal article by a researcher studying the newest up and coming generation says the following: “As I found in analyzing seven large national surveys of teens, today’s adolescents are less likely to drive, drink, work, date, go out and have sex than were teens just 10 years ago. Today’s 18-year-olds look like 15-year-olds used to.” This may not be a bad thing. Per the article, this group is less likely to get in car accidents than teens in previous years, which is obviously a good thing. On the other hand, on my previous post commenter Michael pondered if delaying getting your driver’s license is good for your long term driving skills. Will the group that failed to get a license at 16 be getting in more accidents at 30 as a result? TBD.

Second point, related to the first, is the success of teen driver’s license initiatives. The initial results on these look good, but most of those figures are raw crash counts rather than rates per licensed driver. In other words, we don’t know if those licensing laws improved the driving of 16 and 17 year olds, or if the reductions in crashes were due to fewer of them being able to drive at all. Not a bad outcome either way, but possibly a misleading one if the numbers pick back up at a later date. While I can definitely see some types of impulsiveness associated with teen drivers ebbing with age, there are certain errors inexperienced drivers might make regardless. Again, TBD.
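To make the raw-count problem concrete, here is a small Python sketch. Every number in it is hypothetical and invented only to show the arithmetic; none of it comes from actual licensing or crash data, and the function name is my own.

```python
def crashes_per_1000_drivers(crashes, licensed_drivers):
    """Crash rate per 1,000 licensed drivers."""
    if licensed_drivers <= 0:
        raise ValueError("licensed_drivers must be positive")
    return 1000 * crashes / licensed_drivers


# Hypothetical numbers, invented only to illustrate the arithmetic.
before = {"crashes": 500, "licensed_drivers": 10_000}
after = {"crashes": 400, "licensed_drivers": 8_000}

# Fractional change in the raw crash count.
raw_change = (after["crashes"] - before["crashes"]) / before["crashes"]
rate_before = crashes_per_1000_drivers(**before)
rate_after = crashes_per_1000_drivers(**after)

print(f"Raw crash count change: {raw_change:.0%}")
print(f"Crashes per 1,000 licensed drivers: {rate_before:.1f} -> {rate_after:.1f}")
```

In this made-up scenario the raw count falls by a fifth while the per-driver rate does not move at all, which is the pattern you would expect if the only thing that changed was how many teens were licensed.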

Third point I started to wonder about….are we seeing a corresponding uptick in non-driver’s license IDs? If not, how are people without licenses identifying themselves? I can see people not wanting the hassle of getting a driver’s license, but it seems awfully hard to function without an ID. Even buying alcohol would be a challenge, at least in my area.

Fourth, related to #3, is the low driving rate in some states being driven by a population that can’t get a license? I mentioned in the first post that states with higher populations of those under 18 would likely have lower rates, but someone also noted that it’s not clear where the population numbers came from. Since some population counts include all residents regardless of citizenship, and since states vary wildly in allowing those without documentation of citizenship to get licenses, it’s possible that accounts for some of the differences.

What I’m Reading: November 2016

Like everyone else in the US, I’ve been reading a decent amount about the election. I have a few links of interest on that topic, but out of respect for the totally burned out folks, I have put those together at the end. I will however reiterate that I think this 2014 post from Slate Star Codex remains the most important blog post about the current political climate I have ever read.

Speaking of important blog posts, the Assistant Village Idiot’s “Underground DSM” post has been updated for 2016 and it continues to be one of the best pieces on mental health I have read. This needs to be a whole book, AVI.

This month my book is “The Joy of X”, which I haven’t started reading yet. I’m hopping on a plane tomorrow morning though, so I plan on getting through most of it then. Also, I’m trying to put together a list of math or stats related books I want to read in 2017 (like this one from 2016), so if you have any recommendations I want to hear them!

An interesting piece on testing for fake data in research. The testing exploits the fact that making up realistic “random” data is a hell of a lot harder than it sounds.

For my teacher friends: the Mathematica Policy Research group took a look at teacher quality to see if that drove performance differences between low-income and high-income students, and it doesn’t. Poor kids and rich kids are actually equally likely to have good or bad math teachers.

On an academic note, here’s a ranking of colleges and their acceptance of viewpoint diversity.

I have no idea how accurate this primate hands family tree is, but it’s kind of awesome.

Ben, who I collaborated with on a Pop Science series earlier this year, did a series on the (possible?) death knell of the Pumpkin Spice craze by buying every Pumpkin Spice product his store had to offer.

Okay, now below this line be politics:

*********************************************************************************************

This NYT article is from August, but it covers a lot of interesting ground about how Facebook is skewing the way we talk about politics. I’ve put myself on FB timeouts more than once this election season, and I’ve enjoyed it every time.

There’s been a lot of talk about the electoral college this week, and whether or not it’s fair. This is one of those discussions that is sort of about numbers, but really about something else, so I’m not going into that here. What I am interested in is Maine’s new experiment with ranked choice voting. It’s more labor intensive to tally, but it’s got some interesting quirks that may change incentives for campaigns. A full explanation of how it works here.
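If you would rather see the mechanics than read about them, here is a simplified Python sketch of an instant-runoff count, the general method behind ranked choice voting. It is only an illustration: the ballots are hypothetical, the function name is my own, and the alphabetical tie-break is an assumption I made for the sketch, not a description of Maine’s actual rules.

```python
from collections import Counter


def instant_runoff(ballots):
    """Return the winner of a simplified instant-runoff count.

    Each ballot is a list of candidate names in order of preference.
    """
    remaining = {c for ballot in ballots for c in ballot}
    while remaining:
        # Count each ballot for its highest-ranked candidate still in the race.
        tally = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        active = sum(tally.values())
        if active == 0:
            return None
        leader, leader_votes = tally.most_common(1)[0]
        # A candidate with a majority of the still-active ballots wins.
        if leader_votes * 2 > active:
            return leader
        # Otherwise eliminate the candidate with the fewest votes.
        # Assumption: ties for last place are broken alphabetically.
        fewest = min(tally.get(c, 0) for c in remaining)
        loser = sorted(c for c in remaining if tally.get(c, 0) == fewest)[0]
        remaining.remove(loser)
    return None


# Hypothetical ballots: A leads on first choices but has no majority.
ballots = (
    [["A", "B", "C"]] * 4
    + [["B", "C", "A"]] * 3
    + [["C", "B", "A"]] * 2
)
print(instant_runoff(ballots))
```

In this made-up race, A has the most first choices, but once C is eliminated those ballots move to B, who then has a majority. That is the quirk that changes campaign incentives: being a lot of people’s second choice suddenly matters.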

A second Slate Star Codex link, but it’s too good not to share. Written the night before the election, it reminds everyone that in a close election, overinterpretation of the outcome is a dangerous game.

Also, I’ve gotten a request to start holding “Controversial Opinion” dinner parties. I kind of want to do this. There will be wine.

3 More Examples of Self Reporting Bias

Right after I put up my self reporting bias post last week, I saw a few more examples that were too good not to share. Some came from commenters, some were random stories I came across, but all of them could have made the original list. Here you go:

  1. Luxury good ratings: Commenter Uncle Bill brought this one up in the comments section on the last post, and I liked it. The sunk cost fallacy says that we have a hard time abandoning money we’ve already spent, and this kicks in when we have to say how satisfied we are with our luxury goods. No one wants to admit a $90,000 vehicle actually kind of sucks, so it can be hard to figure out if the self reported reliability ratings reflect reality or a desired reality.
  2. Study time: Right after I put my last self reporting bias post up, this study came across my Twitter feed. It was a study looking into “time spent on homework” vs grades, and initially it found that there was no correlation between the two. However, the researchers had given the college students involved pens that actually tracked what they were doing, so they double checked the students’ reports. With the pen-measured data, there actually was a correlation between time on homework and performance in the class. It turned out that many of the low performing kids wildly overestimated how much time they were actually spending on their homework, much more so than the high performing kids. This bias is quite possibly completely unintentional….kids who were having a tough time with the material probably felt like they were spending more time than they were.
  3. Voter preference: I mentioned voter preference in my Forrest Gump Fallacy post, and I wanted to specifically call out Independent voters here. Despite the name and the large number of those who self identify as such, when you look at voting patterns many independent voters are actually what they call “closet partisans”. Apparently someone who identifies as Independent but has a history of voting Democrat is actually less likely to ever vote GOP than someone who identifies as a “weak Democrat”. So Independent is a tricky group of Republicans who don’t want to say they’re Republicans, Democrats who don’t want to say they’re Democrats, 3rd party voters, voters who don’t care, and voters who truly have no party affiliation. I’m sure I left someone out, but you can see where it gets messy. This actually also affects how we view Republicans and Democrats, as those groups are normally polled based on self identification. By removing the Independents, it can make one or both parties look like their views are changing, even if the only change is who checked the box on the form.

If you see any more good ones, feel free to send them my way!

Gone but not forgotten

It’s been a long summer.  In April, I wrote to let you know my uncle had died unexpectedly.  A few weeks later, a different uncle of the same name also passed away.  On Tuesday, my grandfather died.  It’s an interesting coincidence that these three men were all named James, and that despite their disparate ages (56, 60, 89) all died within such a short time period.

I’ve done a lot of reflecting over the past several days, and I wanted to say a few words about my grandfather, then write a few things about where I go from here.  I’ve subdivided this so you can skip parts you’re not interested in.

James R King
My grandfather was the original stat-man in our family.  He quite literally wrote the book on it.  As we went through his stuff this weekend, I was amused to find that he had also been the original stats blogger in the family.  Apparently he had spent years running a stats newsletter where he wrote about stats topics that interested him and then sent it out to those who paid him $10 or so for the privilege of reading his thoughts. Judging by his archives, it seems to me quite a few people were interested in what he had to say.

My grandfather was truly a man of his time in many, many ways.  He was hard working, hard drinking, driven by duty to God, country, family, intellectual curiosity and a deep desire to see things work correctly.  He served in two wars (WWII, Korea), helped put a man on the moon, and had a deep disdain for stupidity.  As recently as a few months ago, he was grilling me about how to apply quality principles to health services environments.  He was annoyed that the administration of his assisted living facility wouldn’t take him on as an operational consultant.  He wasn’t trying to get money, he was just annoyed that things could be done better.  I’m not sure they ever knew how much free brain power they lost.

Since I got the news on Tuesday, I’ve been reflecting on what it means to watch another member of the greatest generation slip away.  For me, I have lost not only a grandfather, but someone who understood my way of viewing the world.  For all that “geek culture” has become mainstream, it’s still a bit of a lonely life for those of us who prefer to view the world through numbers and systems, and my grandfather was one of the few people I could count on to always know how I felt.  I’ll be raising a martini or two over a spreadsheet or three in his memory, I’m sure.

The Future of this Blog
Three deaths in 5 months is a lot, especially when the people involved were meaningful to your family structure.  I’ve been slow in posting this summer, and at this point, I’ve realized I need a complete break.  I started this blog as a fun project to work out some frustrations I had about political campaigns, and it worked well for that.  I’ve loved the readers I’ve had and the conversations that took place here.  I hope to get back to this at some point, to renew those conversations, but right now I don’t have it in me.

On the other hand…
I have some projects in the works you all might be interested in.  First and foremost, this blog has helped start an ongoing conversation with my (science teacher) brother about what it would take to give kids a good sense of how to apply math and science to the media that bombards them, and give them a good sense of practical scientific literacy.  These discussions have led us to start collaborating on an e-book/curriculum guide of sorts.  The idea is it would be a bit like this blog adapted for a classroom setting….a sort of “here’s how you take the dry concepts you’re hearing and here’s when you should use them in the real world”.  I’ll be posting periodic updates on this project, so you can check back for those.

Also, I know many of my readers have pretty awesome blogs of their own.  I’m always available for guest posts and/or random stats commentary if you miss me :).

Again, I want to thank everyone who has made this blog such a fun place for me to write.  The internet certainly has its ups and downs, but (in the words of the AVI) I have been happy to be part of this “small but excellent corner” of it.

Keep being 2SD above the norm, and good luck out there.