Biology/Definitions Throwback: 1946

It’s maple syrup season up here in New England, which means I spent most of yesterday in my brother’s sugar shack watching sap boil and teaching my son things like how to carry logs, stack wood, and sample the syrup. Maple syrup making is a fun process, mostly because there’s enough to do to make it interesting, but not so much to do that you can’t be social while doing it.

During the course of the day, my brother mentioned that he had found an old stack of my grandfather’s textbooks, published in the 1940s:

Since he’s a biology teacher, he was particularly interested in that last one. When he realized there was a chapter on “heredity and eugenics”, he of course had to start there. There were a few interesting highlights in this section. For example, like most people in 1946, the authors were pretty convinced that proteins were responsible for heredity. This wasn’t overly odd, since even the guy who discovered DNA thought proteins were the real workhorses of inheritance. Still, it was interesting to read such a wrong explanation for something we’ve been taught our whole lives.

Another interesting part was where they reminded readers that, despite the focus on the father’s role in heredity, there was scientific consensus that children also inherited traits from their mother. Thanks for the reassurance.

Then there were their descriptions of mental illness. It was interesting that some disorders (manic depression, schizophrenia) were clearly being diagnosed in a way that would at least be recognizable today, while others (autism) were not mentioned at all. Then there were entire categories we’ve done away with, such as feeblemindedness, along with the “technical” definitions for terms like idiot and moron:

I have no idea how commonly those were used in real life, but it was an odd paragraph to read.

Of course this is the sort of thing that tends to make me reflective. What are we convinced of now that will look sloppy and crude in the year 2088?

I’m Thinking of a Word That Starts With a…..

I’ve mentioned before that I like to try to find unusual ways of teaching my 5-year-old son statistical concepts by relating them to things he likes. This pretty much doesn’t work, but this week I tried it again and attempted to use a discussion about letters to segue into a discussion about perception vs data. He’s getting into some reading fundamentals now, and is incredibly curious about which words start with which letters. This has led to our new favorite game, “Let’s talk about ____ words!”, where we name a letter and then think of as many words as we can that start with that letter.

This game is fun, but he’s a little annoyed at letters that make more than one sound. This week he got particularly irritated at the letter “c”, which he felt was hogging all the words while leaving “k” and “s” with none. I started trying to explain to him that “s” in particular was doing pretty alright for itself, but after discussing “cereal” and “circus” he was pretty convinced that “s” was in trouble.

As I was defending the English language’s treatment of the letter “s”, I started to wonder what the most common first letter of words actually was. I also wondered if it was different for “kids’ words” vs “all words”. After some poking around on the internet, I discovered that there’s a decent amount of variation depending on which word list you go with. I decided to take a look at three lists:

  1. All unique words appearing more than 100,000 times in all books on Google Ngrams (Note: I had to go to the original file here. The first-letter frequencies given on that site and on the Wiki page are for every word used, not just unique words. That’s why “t” comes out on top: it’s counting every instance of “the” separately.)
  2. The 1,000 most commonly used English language words (of Up-Goer 5 fame)
  3. The Dolch sight words list, used to teach kids to read
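For anyone who wants to try this at home, here’s a minimal Python sketch of the counting, using a made-up toy sentence rather than the real lists. The function name and the sample sentence are hypothetical; the point is the difference the note in item 1 describes, between counting each unique word once and counting every occurrence (where “the” piles up under “t”).

```python
import string
from collections import Counter

def first_letter_percentages(words, unique=True):
    """Percent of words starting with each letter a-z.

    unique=True counts each distinct word once; unique=False counts every
    occurrence, so very common words like "the" pile up under their letter.
    """
    cleaned = [w.lower() for w in words if w and w[0].lower() in string.ascii_lowercase]
    if unique:
        cleaned = set(cleaned)  # keep each distinct word only once
    counts = Counter(w[0] for w in cleaned)
    total = sum(counts.values())
    return {letter: 100 * counts[letter] / total for letter in string.ascii_lowercase}

sample = "the cat and the dog saw the cereal at the circus".split()
all_tokens = first_letter_percentages(sample, unique=False)
unique_words = first_letter_percentages(sample, unique=True)
print(max(all_tokens, key=all_tokens.get))      # t ("the" is counted 4 times)
print(max(unique_words, key=unique_words.get))  # c (cat, cereal, circus)
```

Same sentence, two different “most common first letters”, depending only on whether repeats count.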

Comparing the percent of words starting with each letter on each list got me this graph:

As I suspected, “s” does quite well for itself across the board, though it really shines in the “core words” list. “K”, on the other hand, is definitely being left out. It’s interesting to see which letters do well in bigger word sets (like c, p and m), and which ones only do well in the smaller sets (b, t, o and w). “W” seems very popular for early reading lists because of words like “what”, “where” and “why”. “S” is actually really interesting, as it appears to kick off lots of common-but-not-basic words. My guess is that this is because of its participation in letter combinations like “sh” and “sch”.

Anyway, my son didn’t really seem to grasp the “the plural of anecdote is not data” lesson, so I pointed out to him that both “Spiderman” and “superhero” start with “s”. At that point he agreed that yes, lots of words start with “s”, and went back to feeling bad for “k”. At least that’s something we can agree on.

Now please enjoy my favorite Sesame Street alphabet song ever: ABCs of the Swamp

Baby, You Can’t Drive My Car: Part 2

I didn’t get back to all the comments on my previous “Baby, You Can’t Drive My Car” post, but there were some good ones. Between those and some emails I got, there were some interesting statistical and follow-up issues raised that I felt deserved their own post.

First, the general trend of young people not getting driver’s licenses actually seems to reflect some larger delayed-adulthood trends among “iGen”, or those born in 1995-2005. This Wall Street Journal article by a researcher studying the newest up-and-coming generation says the following: “As I found in analyzing seven large national surveys of teens, today’s adolescents are less likely to drive, drink, work, date, go out and have sex than were teens just 10 years ago. Today’s 18-year-olds look like 15-year-olds used to.” This may not be a bad thing. Per the article, this group is less likely to get in car accidents than teens in previous years, which is obviously good. On the other hand, on my previous post commenter Michael wondered whether delaying getting your driver’s license is good for your long-term driving skills. Will the group that failed to get a license at 16 be getting in more accidents at 30 as a result? TBD.

Second, and related to the first, is the success of teen driver’s license initiatives. The initial results on these look good, but most of them are based on raw crash counts rather than crash rates. In other words, we don’t know if those licensing laws improved the driving of 16- and 17-year-olds, or if the reductions in crashes were due to fewer of them being able to drive at all. Not a bad outcome either way, but possibly a misleading one if the numbers pick back up at a later date. While I can definitely see some types of impulsiveness associated with teen drivers ebbing with age, there are certain errors inexperienced drivers might make regardless. Again, TBD.
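To make the raw-numbers problem concrete, here’s a quick Python sketch with completely hypothetical numbers (not from any real licensing study). If the number of licensed teens drops by 40% and crashes also drop by 40%, the raw crash count looks like a big win even though the crash rate per licensed driver hasn’t moved at all.

```python
def crashes_per_1000_drivers(crashes, licensed_drivers):
    """Crash rate per 1,000 licensed drivers."""
    return 1000 * crashes / licensed_drivers

# Hypothetical before/after numbers for one age group, invented for illustration
before = {"licensed": 10_000, "crashes": 500}
after = {"licensed": 6_000, "crashes": 300}

drop_in_crashes = (before["crashes"] - after["crashes"]) / before["crashes"]
rate_before = crashes_per_1000_drivers(before["crashes"], before["licensed"])
rate_after = crashes_per_1000_drivers(after["crashes"], after["licensed"])

print(f"Raw crashes fell {drop_in_crashes:.0%}")                 # Raw crashes fell 40%
print(f"Rate per 1,000 drivers: {rate_before} -> {rate_after}")  # 50.0 -> 50.0
```

Both numbers are “true”, but only the rate tells you whether the individual drivers got any safer.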

Third, something I started to wonder about: are we seeing a corresponding uptick in non-driver’s license IDs? If not, how are people without licenses identifying themselves? I can see people not wanting the hassle of getting a driver’s license, but it seems awfully hard to function without an ID. Even buying alcohol would be a challenge, at least in my area.

Fourth, and related to #3: is the low licensing rate in some states being driven by a population that can’t get a license? I mentioned in the first post that states with larger populations of people under 18 would likely have lower rates, but someone also noted that it’s not clear where the population numbers came from. Since some population counts include all residents regardless of citizenship, and since states vary wildly in whether they allow people without documentation of citizenship to get licenses, it’s possible that this accounts for some of the differences.




What I’m Reading: November 2016

Like everyone else in the US, I’ve been reading a decent amount about the election. I have a few links of interest on that topic, but out of respect for the totally burned-out folks, I have put those together at the end. I will, however, reiterate that I think this 2014 post from Slate Star Codex remains the most important blog post about the current political climate I have ever read.

Speaking of important blog posts, the Assistant Village Idiot’s “Underground DSM” post has been updated for 2016 and it continues to be one of the best pieces on mental health I have read. This needs to be a whole book, AVI.

This month my book is “The Joy of X“, which I haven’t started reading yet. I’m hopping on a plane tomorrow morning though, so I plan on getting through most of it then. Also, I’m trying to put together a list of math or stats related books I want to read in 2017 (like this one from 2016), so if you have any recommendations I want to hear them!

An interesting piece on testing for fake data in research. The testing exploits the fact that making up realistic “random” data is a hell of a lot harder than it sounds.

For my teacher friends: the Mathematica Policy Research group took a look at whether teacher quality drove performance differences between low-income and high-income students, and found that it doesn’t. Poor kids and rich kids are actually equally likely to have good or bad math teachers:

On an academic note, here’s a ranking of colleges and their acceptance of viewpoint diversity.

I have no idea how accurate this primate hands family tree is, but it’s kind of awesome.

Ben, who I collaborated with on a Pop Science series earlier this year, did a series on the (possible?) death knell of the Pumpkin Spice craze by buying every Pumpkin Spice product his store had to offer.

Okay, now below this line be politics:


This NYT article is from August, but it covers a lot of interesting ground about how Facebook is skewing the way we talk about politics. I’ve put myself on FB timeouts more than once this election season, and I’ve enjoyed it every time.

There’s been a lot of talk about the Electoral College this week, and whether or not it’s fair. This is one of those discussions that is sort of about numbers, but really about something else, so I’m not going into that here. What I am interested in is Maine’s new experiment with ranked choice voting. It’s more labor intensive to tally, but it’s got some interesting quirks that may change incentives for campaigns. A full explanation of how it works is here.
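For the curious, here’s a minimal Python sketch of an instant-runoff count, the basic idea behind ranked choice voting. This is a simplified illustration, not Maine’s official procedure: the ballots are made up, the alphabetical tie-breaking rule is an arbitrary assumption, and real rules for things like exhausted ballots may differ.

```python
from collections import Counter

def instant_runoff(ballots):
    """Return (winner, rounds) for ranked ballots (lists, most preferred first).

    Each round, every ballot counts for its highest-ranked candidate still in
    the race. A candidate with more than half of the ballots still counting
    wins; otherwise the last-place candidate is eliminated and their ballots
    move to the next choice. Ties for last place are broken alphabetically
    (an assumption made for this sketch).
    """
    remaining = {c for ballot in ballots for c in ballot}
    rounds = []
    while True:
        tally = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        rounds.append(dict(tally))
        active = sum(tally.values())
        leader, leader_votes = max(tally.items(), key=lambda kv: kv[1])
        if leader_votes * 2 > active or len(remaining) == 1:
            return leader, rounds
        # Candidates with no votes this round count as 0 and go first
        loser = min(sorted(remaining), key=lambda c: tally.get(c, 0))
        remaining.remove(loser)

# Hypothetical election with nine voters
ballots = (
    [["A", "B", "C"]] * 4
    + [["B", "C", "A"]] * 3
    + [["C", "B", "A"]] * 2
)
winner, rounds = instant_runoff(ballots)
print(winner)  # B
print(rounds)  # [{'A': 4, 'B': 3, 'C': 2}, {'A': 4, 'B': 5}]
```

In this made-up example, A has the most first-choice votes and would win a plain plurality count, but B wins once C’s ballots move to their second choice. That’s one of the quirks that might push campaigns to care about being voters’ second choice, and it also shows why the tally takes more work: it can need several rounds.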

A second Slate Star Codex link, but it’s too good not to share. Written the night before the election, it reminds everyone that in a close election, over-interpreting the outcome is a dangerous game.

Also, I’ve gotten a request to start holding “Controversial Opinion” dinner parties. I kind of want to do this. There will be wine.

3 More Examples of Self Reporting Bias

Right after I put up my self reporting bias post last week, I saw a few more examples that were too good not to share. Some came from commenters, some were random stories I came across, but all of them could have made the original list. Here you go:

  1. Luxury good ratings: Commenter Uncle Bill brought this one up in the comments section on the last post, and I liked it. The sunk cost fallacy says that we have a hard time abandoning money we’ve already spent, and this kicks in when we have to say how satisfied we are with our luxury goods. No one wants to admit a $90,000 vehicle actually kind of sucks, so it can be hard to figure out if the self-reported reliability ratings reflect reality or a desired reality.
  2. Study time: Right after I put my last self reporting bias post up, this study came across my Twitter feed. It looked into “time spent on homework” vs grades, and initially found no correlation between the two. However, the researchers had given the college students involved pens that tracked what they were actually doing, so they could double-check the students’ reports. With the pen-measured data, there actually was a correlation between time on homework and performance in the class. It turned out that many of the low-performing kids wildly overestimated how much time they were spending on their homework, much more so than the high-performing kids. This bias is quite possibly completely unintentional: kids who were having a tough time with the material probably felt like they were spending more time than they were.
  3. Voter preference: I mentioned voter preference in my Forrest Gump Fallacy post, and I wanted to specifically call out Independent voters here. Despite the name and the large number of people who self-identify as such, when you look at voting patterns many Independent voters are actually what researchers call “closet partisans”. Apparently someone who identifies as Independent but has a history of voting Democrat is actually less likely to ever vote GOP than someone who identifies as a “weak Democrat”. So “Independent” is a tricky mix of Republicans who don’t want to say they’re Republicans, Democrats who don’t want to say they’re Democrats, 3rd party voters, voters who don’t care, and voters who truly have no party affiliation. I’m sure I left someone out, but you can see where it gets messy. This also affects how we view Republicans and Democrats, as those groups are normally polled based on self-identification. Because Independents are left out of those groups, one or both parties can look like their views are changing, even if the only change is who checked which box on the form.
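Here’s a toy Python illustration of how the study-time bias in item 2 can hide a real relationship. The six students and all their numbers are invented, not taken from the study: their actual hours track grades perfectly, but the lowest-performing students over-report their hours the most, and the correlation on the self-reported numbers drops to zero.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical class of six students, sorted from lowest to highest grade
grades = [60, 65, 70, 80, 85, 90]
actual_hours = [2, 3, 4, 6, 7, 8]    # what a tracking pen might record
reported_hours = [8, 7, 6, 6, 7, 8]  # lowest three over-report by 6, 4 and 2 hours

print(round(pearson(actual_hours, grades), 2))    # 1.0
print(round(pearson(reported_hours, grades), 2))  # 0.0
```

Nobody in this toy class has to be lying on purpose; a lopsided error (bigger for the struggling students) is enough to erase the pattern.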

If you see any more good ones, feel free to send them my way!