I’ve mentioned before that I like to try to find unusual ways of teaching my 5-year-old son statistical concepts by relating them to things he likes. This pretty much doesn’t work, but this week I tried it again and attempted to use a discussion about letters to segue into a discussion about perception vs data. He’s getting into some reading fundamentals now, and is incredibly curious about what words start with which letters. This leads to our new favorite game “Let’s talk about ____ words!” where we name a letter and then just think of as many words as we can that start with that letter.
This game is fun, but he’s a little annoyed at letters that make more than one sound. This week he got particularly irritated at the letter “c”, which he felt was hogging all the words while leaving “k” and “s” with none. I started trying to explain to him that “s” in particular was doing pretty alright for itself, but after discussing “cereal” and “circus” he was pretty convinced that “s” was in trouble.
As I was defending the English language’s treatment of the letter “s”, I started to wonder what the most common first letter of words actually was. I also wondered if it was different for “kids words” vs “all words”. After some poking around on the internet, I discovered that there’s a decent amount of variation depending on what word list you go with. I decided to take a look at three lists:
- All unique words appearing more than 100,000 times in all books on Google ngrams (Note: I had to go to the original file here. The list they provide on that site and the Wiki page is actually the most common first letters for all words used, not just unique words. That’s why “t” is the most common there: it’s counting every instance of “the” separately.)
- The 1,000 most commonly used English language words (of Up-Goer 5 fame)
- The Dolch sight words list, used to teach kids to read
Comparing the percent of words starting with each letter on each list got me this graph:
As I suspected, “s” does quite well for itself across the board, though it really shines in the “core words” list. “K” on the other hand is definitely being left out. It’s interesting to see which letters do well in the bigger word sets (like c, p and m), and which ones only do well in the smaller sets (b, t, o and w). “W” seems very popular for early reading lists because of words like “what”, “where” and “why”. “S” actually is really interesting, as it appears to kick off lots of common-but-not-basic words. My guess is this is because of its participation in letter combinations like “sh” and “sch”.
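For anyone who wants to try this with their own word lists, here is a minimal Python sketch of the percent-by-first-letter calculation. It is not the exact process I used for the graph. The function name `first_letter_percentages` and the tiny `toy_words` list are made up for illustration, and it assumes you want each unique word counted once, ignoring capitalization.

```python
from collections import Counter
import string


def first_letter_percentages(words):
    """Return {letter: percent of unique words starting with that letter}.

    Assumption: words are deduplicated case-insensitively, and anything
    that does not start with a-z is skipped.
    """
    # Normalize and deduplicate so "Cat" and "cat" count once.
    unique = {w.strip().lower() for w in words}
    unique = {w for w in unique if w and w[0] in string.ascii_lowercase}

    counts = Counter(w[0] for w in unique)
    total = len(unique)

    # Report every letter, including the ones with no words at all.
    return {
        letter: (100 * counts[letter] / total if total else 0.0)
        for letter in string.ascii_lowercase
    }


# Hypothetical toy list, just to show the shape of the result.
toy_words = ["cat", "Cat", "sun", "sock", "kite"]
pct = first_letter_percentages(toy_words)
print(pct["s"], pct["c"], pct["k"])
```

Running the same function over each list and plotting the three results side by side would give a comparison along the lines of the one described above. Counting unique words rather than every occurrence is the choice that keeps “the” from handing the win to “t”.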
Anyway, my son didn’t really seem to grasp the “the plural of anecdote is not data” lesson, so I pointed out to him that both “Spiderman” and “superhero” started with “S”. At that point he agreed that yes, lots of words started with “s” and went back to feeling bad for “K”. At least that we can agree upon.
Now please enjoy my favorite Sesame Street alphabet song ever: ABCs of the Swamp
4 thoughts on “I’m Thinking of a Word That Starts With a…..”
The Dolch list overlaps with the Swadesh Lists used by (some) linguists to look for relationships between languages, because both are made up of the most basic words, which tend to be more stable. https://en.wikipedia.org/wiki/Swadesh_list
The Swadesh list in English is disproportionately Germanic, which is why we call English a Germanic language even though more words overall come from the Romance languages, French & Latin.
Relatedly, the reason for the variation in some letters, such as f (Germanic words) and p (Latin words), is Grimm’s Law. Yes, the same Grimm as in the fairy tales. https://en.wikipedia.org/wiki/Grimm%27s_law
Just some fun for you.
He got a whole book of fairy tales and a law named after him? Talk about a life well spent.
And K and S are “in trouble” in English because:
1. Old Norman, from whence much English vocabulary comes, was a Romance language, therefore its spelling reflected the Vulgar Latin palatalization; and in the Langues d’Oïl soft C merged with S.
2. When English orthography was standardized in the 17th century, the printers who did so were Flemish. That’s why English spelling has such weird rules to begin with and why one word in six has irregular pronunciation.
See this is why I post this stuff. I learn so much from you all.