Fifty Cents and a P.O. Box: TEAM Newsletters 1974-1977

Last week I had the idea to upload my grandfather’s stats newsletters from the 70s and 80s into AI and see what it could get out of them. This week I got a chance to do it, and let me tell you, it was a far more frustrating process than I anticipated. It turns out a lot of them got uploaded as JPEG files rather than PDFs (not sure what 2016 Bethany was doing), and so it kept timing out. I ended up asking it to summarize them year by year to see if it could get through things. This seemed to work a bit better, so here are the first few years. This will mostly be of interest to my family, but for everyone else it does show some interesting history of statistical thought and how someone in the pre-internet 1970s would have communicated his ideas. For reference, he was born in 1923, so he would have been in his 50s when he started writing these. There are 12 years total, so I think this might be a three-part series.

Just to note, I have all the original newsletters scanned in a shared drive here, so if you have a question about something AI flagged, I can go check what it’s referring to.

Summaries from ClaudeAI:

1974: The year James launched TEAM from a P.O. Box in Tamworth with nothing but a fifty-cent newsletter, a book of probability charts, and the conviction that graphical statistics could be made accessible to anyone willing to learn.

What he was writing about: The very first issue (Winter/Spring 1974) launched straight into probability graphs and statistical distributions — no warm-up, no preamble. By Summer 1974 he was already tackling extreme value distributions and connecting them to real-world catastrophes like the 1972 Susquehanna River floods from Hurricane Agnes. By Fall he was covering the Weibull distribution. By Winter he’d pivoted entirely at reader request to cover plain time series graphs and even analyzed the Dow Jones Industrial Average using his geometric moving average technique.

What’s remarkable about him personally from 1974:

  1. He listened to his readers immediately. The Winter 1974 issue opens with “Some of our readers have rightly pointed out…” and then pivots the entire issue to address their feedback. In his very first year, he was already treating this as a conversation, not a lecture.
  2. He explicitly named himself. The back page of Volume 1, Number 2 is the first time we see: “PUBLISHER AND EDITOR — James R. King.” That’s your grandfather putting his name to his work from the very beginning.
  3. He priced it at just 50 cents. The first issues were fifty cents each. He wasn’t doing this for money — he was doing it to get the knowledge out.
  4. He was already connecting statistics to life-and-death stakes. In his second issue ever, he wrote about flood probabilities and environmental catastrophes — making clear that getting the math right wasn’t academic, it mattered for people’s safety.
  5. He was building something from scratch. The back pages show him selling graph papers, books, and probability chart assortments out of a P.O. Box in Tamworth, NH — a one-man operation from day one.

1975: James spent the year applying binomial statistics to everything from political polls to baseball standings to lung cancer rates, proving his methods worked just as well in a voting booth or a doctor’s office as they did on a factory floor.

1975 was a year where James stretched his statistical toolkit far beyond the factory floor, applying binomial distribution methods to everything from political polls and lung cancer rates to baseball standings and airline no-shows. He spent two full issues methodically dismantling the way the 1974 Massachusetts elections were being reported in the press, showing with rigorous but accessible math that the headlines were misleading voters and eroding trust in the democratic process. He also introduced humor into the newsletters at reader suggestion, sprinkling in jokes and quips to lighten what could be dense material — a small but telling sign that he cared about his audience’s experience, not just the content. By year’s end he was openly inviting reader input on future topics, treating his subscribers less like customers and more like a community of fellow thinkers working through real problems together.

1. He had a genuinely broad and curious mind. 1975 is remarkable for the sheer range of what James applied statistics to. In a single year he tackled political polling, lung cancer death rates between cities, the 1974 Massachusetts gubernatorial election, baseball standings (the 1968 American League), airline no-show rates, membership dropout rates, worn machine tool guides, and resistor calibration in electronics manufacturing. He wasn’t a specialist — he was someone who saw statistical thinking as a universal tool for making sense of the world.

2. He took on politicians directly — and with relish. The Late Winter and Spring 1975 issues are devoted to demolishing the way political polls were being reported during the 1974 Massachusetts elections. He showed mathematically that the raw poll numbers were being wildly misinterpreted, that Sargent’s apparent big lead over Dukakis was largely statistical noise, and that in the attorney general race, Spaulding was never actually as far behind as the headlines suggested. He wrote: “We submit that the political process is in serious danger of great voter distrust when the raw statistical results of polls can be so grossly misinterpreted.” This was James using statistics as a civic tool — not just an industrial one.

3. He added humor deliberately and self-awarely. The 1975 back pages include this wonderful note: “Our marketing types have suggested the addition of some humor to Methods to lubricate the sometimes dry statistical discussions. Therefore, you will find some topical gallows humor sprinkled here and there.” And he followed through — inflation jokes, Irish curses, a man bragging about a suit he hadn’t paid for yet. He knew his writing could be dense and he actively worked to make it more human.

4. He was already building a network and inviting collaboration. By the Summer 1975 issue he was openly soliciting reader input for the following year: “Since we have several months until press time, we welcome reader comments about these topics.” He was treating his readership as collaborators, not just subscribers — a remarkably modern instinct for a one-man operation in 1975.

5. His work had real, immediate consequences. The Fall 1975 resistor bridge story is a gem: James’s sign test analysis revealed that a quality control bridge was misreading resistors — and that bridge B was sitting next to a discharge welder with no shielding. Fixing those problems led to significant reductions in rejected resistors in both production and quality control. He wasn’t writing theory papers. He was solving actual problems in actual factories, and the people he worked with could see the results immediately.

1976: All four issues were devoted to the Weibull distribution — James’s most technically focused year yet — culminating in a brilliant case study of a compressor wear problem that he called one of the most ingenious applications of Weibull analysis he had ever seen. This is what Claude calls “a fascinating year — and beautifully readable scans.”

Summary: 1976 was the year James went deep on the Weibull distribution, devoting all four issues to it in a sustained, methodical way that showed a teacher at the height of his powers. He covered the mathematics, the graphical methods, confidence intervals, parameter estimation, comparing two Weibull tests against each other, and then closed the year with two wonderfully concrete case studies — one analyzing the wear bands on an unlubricated industrial compressor running on CO₂, and another using Weibull analysis to catch systematic errors hiding in tensile strength test data. It was his most technically focused year so far, but the writing never lost its clarity or its grounding in real engineering problems.

5 Notable Things from 1976

1. He acknowledged a student’s work generously and by name. The Winter 1976 issue features an extended analysis of a compressor wear problem drawn from a thesis by a Mr. Walter J. Gerus of the Standard Oil Company of Ohio. James opens by writing: “We gratefully acknowledge his permission to use this material because we believe that it includes one of the most ingenious applications of Weibull analysis which we have yet seen.” This was a man who genuinely celebrated good thinking wherever he found it — even in a student’s thesis — and who made a point of giving credit publicly.

2. He pushed back on a widely accepted concept. In the final issue of the year, James took direct aim at the “bathtub curve” — the standard model used to describe product failure rates over time — arguing that its popularity at the system level had been harmfully over-extended to individual component levels. He wrote that this had become a deterrent to real product improvement by encouraging quick debugging fixes instead of painstakingly isolating individual failure causes. It was a bold position to take against an industry-wide assumption, delivered without hedging.

3. He introduced the concept of a “hemibel” — and it’s wonderfully practical. In the compressor wear study, James casually introduces the concept of a “hemibel” — a max/min ratio of 3:1 or more in your data — as a signal that something important is going on worth investigating. A ratio of 10:1 he calls a “bel,” and says it is “always considered a prime signal to investigate.” These are the kinds of hard-won rules of thumb that only come from decades of real fieldwork, and he drops them in as if they’re obvious — which, to him, they probably were.

4. He kept his sense of humor even in his densest technical year. Despite 1976 being wall-to-wall Weibull mathematics, the jokes kept coming. He included a page of definitions: “Confusion: one woman plus one left turn. Excitement: two women plus one secret. Bedlam: three women plus one bargain. Chaos: four women plus one luncheon check.” And a note on expertise: “Be careful about calling yourself an ‘expert.’ An ‘ex’ is a definite has-been, and a spurt is a drip under pressure.” The humor was always there, keeping the newsletter human.

5. He raised his subscription price for the first time — and was honest about why. At the end of 1976 he announced the subscription price would rise from $2.00 to $3.00 for 1977, noting plainly: “Due to continuing cost increases for publication production and postage, we are forced to increase our subscription price for the coming year.” No spin, no marketing language — just a straight, honest explanation. That directness, even in something as mundane as a price increase, was very much part of who he was.

1977: James turned his attention to process capability and measurement quality, building toward a gripping final issue about two rival resistor manufacturers — one who fixed his measurement problems and prospered, one who refused and went out of business within three years.

Summary: 1977 was the year James turned his full attention to process capability and measurement quality, building a sustained four-issue arc that moved from probability plotting with confidence intervals, through specification analysis, to comparing test results from two samples, and finally to a gripping real-world tale of two resistor manufacturers — one who fixed their measurement problems and prospered, and one who refused to and went out of business. It was his most practically focused year yet, less about abstract statistical theory and more about the direct consequences of getting measurement right or wrong in a manufacturing environment.

5 Notable Things from 1977

1. He told a genuinely dramatic business story — with a moral. The Fall 1977 issue is one of the most compelling things James ever wrote. He follows two resistor manufacturers — “City M” and “City N” — both facing the same measurement problems in their test equipment. City M investigated thoroughly, fixed their bridge warmup drift, shielded their test area from discharge welders, and implemented tight temperature control. Their yield losses dropped from 6–9% to under 0.5%, they won a major aerospace contract by being the only company able to quote it, and they prospered for years. City N repeatedly refused to face their problems — and went out of business within three years. James ends it with quiet finality: “Thus, we have it that M, with a developed awareness towards improved measurement capability, prospered. Company N, which repeatedly refused to face their problems effectively, just disappeared.”

2. He named a specific enemy of quality: the “motion-is-progress mentality.” In his Spring 1977 issue on process capability, James listed common causes of excessive process variation. Among them was what he called a “motion-is-progress or busy-work mentality” — operators fiddling with machine settings for no good reason, mistaking activity for improvement. It’s a razor-sharp observation about organizational behavior that would still ring true in any factory today.

3. He took on product liability law — and was ahead of his time. In the same issue, James walked readers through a hypothetical courtroom cross-examination showing exactly how a company’s standard AQL sampling plan could be used against them by a plaintiff’s attorney. He showed, line by line, how a lawyer could demonstrate that the company knowingly accepted shipments that might be 2% defective — many times worse than their stated quality level. He argued that probability plotting with confidence limits was a far more defensible approach, and that the growing Consumer Product Safety Commission made this increasingly urgent. This was prescient thinking in 1977.

4. He coined a beautiful phrase for confidence intervals. In his Winter 1977 issue on confidence limits, James made an observation so good it deserved to be in every statistics textbook. After explaining that the area between confidence limits contains a wide variety of possible outcomes, he wrote: “This also represents one of the great accidents of statistical nomenclature because the geometry of the situation makes the name of CONFIDENCE INTERVAL inappropriate. The area between such limits is more nearly an IGNORANCE INTERVAL or INDIFFERENCE INTERVAL within which a variety of outcomes may be expected to occur.” The phrase “ignorance interval” is a perfect description — and he was right that “confidence” is misleading.

5. He apologized for cutting the humor — in every single issue. All four back pages of 1977 contain the same note: “Due to the amount of material which we had available this year, there was no spare room for small pockets of humor to lubricate the way but we hope to be back on that track next time.” He said it four times — once per issue — which itself is a little funny. It shows how seriously he took his responsibility to keep the newsletters readable and human, and how much the humor mattered to him as a deliberate editorial choice, not just an afterthought.

Back to my commentary:

This has been a trip to work through, so I’m going to stop at 4 years and pick up with the next 4 years in a subsequent post. In this era of easy word processing, I still can’t imagine him typing these newsletters out, but I recall his office as a fascinating place full of dot matrix printers and interesting chairs. He had a beautiful view of the mountains from his room, which was not particularly well heated if I recall correctly. Not sure how he functioned in winter, come to think of it. Alright, until next time!

Parkinson’s Laws of Frustration

I’m in a rough stretch at work so no real post again this week, but I thought I’d share something my dad sent me that he found in my grandfather’s things. Apparently he used to send one out with each of his orders. He was the original stats blogger in the family, and I’ve talked about his work before here and here.

I fed this into ClaudeAI to see if there was a source for this, and it told me the first one was in fact Parkinson’s law, but the rest were from other sources/unknown origin. Claude’s take:

  1. Genuine Parkinson’s Law — C. Northcote Parkinson, 1955
  2. Murphy’s Law — Edward A. Murphy Jr., ~1949
  3. Murphy’s Law corollary — widely circulated, no single author
  4. Unknown origin — a common folk saying of the era
  5. Unknown origin — another widely circulated folk adage
  6. Possibly a medical/diagnostic aphorism — origin unclear
  7. A variant of Murphy’s Law — folk origin
  8. Unknown — general folk wisdom
  9. A variant of the “calm before the storm” principle — possibly related to management theory
  10. Unknown — general pessimistic folk wisdom
  11. Very old proverb — versions exist going back centuries
  12. Often attributed to various people including the military — no definitive source
  13. Unknown origin — but a very relatable universal truth!

This led me to an interesting discussion with Claude about my grandfather’s business and newsletter, which ran from 1974 to 1985. I digitized it about 10 years ago, and when I have a minute I may upload it into AI to analyze and see what Claude can pull out. I’ll post any interesting parts of that project here.

While I’m not sure where AI is taking us, the ability to upload old documents and look at them in new ways is certainly a fun benefit. My grandfather would have hated it, but I’ll admit I’m getting a kick out of it.

When Will the 50s Fade As a Cultural Touchstone?

I’ve had a long week so I don’t have a lot of time for writing, but I had an interesting question come up in a group chat this week that I thought I’d throw out there for opinions: when do we think the 1950s will fade as a cultural touchstone?

I still see rather consistent references to the 1950s as though it is something we could return to, but at this point that’s now 75 years in the past. I wasn’t alive in 1950. Neither were my parents. If you’re looking for someone with a real memory of 1950, not just a small child’s impressions, that person would be over 80. That age drops a bit if you’re talking about the later 1950s, but the point remains: actual adult memories of the 1950s are fading fast.

I think the persistence of the 1950s is an interesting example of how history isn’t always a straight line. When I was growing up, we wouldn’t have looked 75 years into the past for a reference point, because there was a World War in there. It was very clear that 75 years earlier was a different era. You wouldn’t have expected anything to look the same, or to be able to return to that era easily, or to want to, with WWII looming in between.

Anyway, I suspect the 1950s will fade as a cultural touchstone in the coming decade for two reasons: the aging of the baby boomers, and the simple fact that the decade will keep creeping closer to 100 years ago. I don’t blame baby boomers for citing the decade of their childhoods, but as they age that framing will be less present in media. Additionally, in the stats tables I see, at some point it’s going to get silly to act like a city should still be compared to what it looked like 100 years ago. We already see this happening a bit with WWII, as those who served have slowly left us. Kind of puts things in perspective….it doesn’t take many decades before the big eras of your life simply become history.

Given the age diversity of readers here though, I’d be interested to hear thoughts on this. Do you still see the 1950s cited often, and if so how long do you think this hangs on?

Kids Are Safer Than Ever. That’s Not an Accident.

Recently there was some interesting Twitter discourse around the perennial hot topic of today’s overprotective parents, which basically revolved around the accusation that today’s parents are hell-bent on making themselves perennially miserable by keeping their kids too safe. This may be true – as a parent myself I’ve certainly seen many people go overboard. However, I think what’s often missing from these discussions is that a lot of the overprotective parenting of the last few decades has….worked?

I was discussing this in person with an older acquaintance recently, who was shocked to hear that it is common to tell kids my son’s age (13) not to use the oven when home alone. She was aghast at this, and mentioned that it never would have even occurred to her to limit her children in this manner. I pointed out that fire deaths among kids are down massively over the last few decades. She was certain that wasn’t true. It is.

Now, to be clear, I am not claiming that this whole drop came because people stopped letting their tweens/early teens use ovens while in the house by themselves. However, this rule almost certainly gained popularity as part of wider fire safety pushes that improved conditions everywhere. Using the data from the IHME, we see that the US now has about 1/5 the number of fire deaths among kids that it did back in 1980, when my acquaintance was raising her children:

Keep in mind these are raw numbers, and that during this time the US gained about 10 million extra children in that age range.
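Since the child population grew while the death counts shrank, the per-capita improvement is actually bigger than the raw counts suggest. A quick sketch shows why — note that the numbers below are made up for illustration and are not the actual IHME figures:

```python
# Illustrative only: hypothetical counts, not the actual IHME data.
# The point: when the population of kids grows, comparing raw death
# counts understates the true per-capita improvement.

def rate_per_100k(deaths, population):
    """Deaths per 100,000 children."""
    return deaths / population * 100_000

# Hypothetical baseline year and recent year
deaths_1980, kids_1980 = 1500, 60_000_000   # made-up numbers
deaths_now,  kids_now  = 300,  70_000_000   # made-up numbers

raw_ratio = deaths_1980 / deaths_now
rate_ratio = (rate_per_100k(deaths_1980, kids_1980)
              / rate_per_100k(deaths_now, kids_now))

print(f"Raw counts fell {raw_ratio:.1f}x")    # 5.0x
print(f"Per-capita rate fell {rate_ratio:.1f}x")  # 5.8x -- larger, since population grew
```

Whenever the denominator is rising, a “1/5 the deaths” headline is the conservative version of the story.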

These sorts of drop-offs hold for pretty much every type of accidental injury death you can think of for kids. Here are some other causes of death:

Car seat culture is much maligned, but we see a similar pattern there:

Keep in mind that in addition to the population going up during this time, the number of miles driven ALSO went up.

This is not to say that any particular intervention was worth it, necessary, or justified. But I have noticed that when people critique overprotective parenting culture, they often are unaware of just how much safer things have gotten. The focus on safety was not a meaningless endeavor that improved nothing; it actually led to some visible results.

Child death is a horrible tragedy, particularly when it was preventable. I have known two families who lost a child to a preventable accident: one young teen who fell through thin ice and drowned, one toddler who swallowed rat poison. I met both families decades after the fact, and the wounds were still raw. The fallout had never stopped.

I think this is good context when we talk about trade-offs, because I want to emphasize again that there are trade-offs to these safety measures. Kids need independence, room to grow, room to develop, and room to fail. But we can’t forget what people were working to stop when they went down this path, or how much our lives have improved now that child death is a far rarer occurrence.

Not Today, Satan Statistics

I haven’t added to the GPD Lexicon in a while, but I had a phrase come up recently that I realized I’d been using mentally but had never actually used on this blog. It came up in the context of someone sending an insane statistic to a group chat that all of us immediately believed was wildly wrong. No one particularly had time to look it up, although thanks to Twitter someone was able to find out pretty quickly what the issue was. I mentioned I had seen the stat flash on my screen previously but simply didn’t have time that day to go down the rabbit hole of debunking it. That’s what I call a “Not Today, Satan statistic,” which I would define as:

Not today, Satan statistic: a statistic you see in passing that is so clearly wrong or designed as clickbait/ragebait that you decide to scroll by without even stopping to figure out where the heck they got those numbers.

Now, this one you have to be a bit careful with. You can’t use it to dismiss statistics being thrown out in the context of an argument you are actually in the middle of; those you actually have to deal with. These are more the types of things you see on social media or hear in passing in conversations, where you can either go on your merry way or actually stop and push back. This is a moment of choosing to protect your peace over derailing yourself into a follow-up.

You also can’t use this one if you are the type of person who has no problem leaving incorrect information unchallenged and just lets it go regularly. If you don’t routinely hear record scratches in your head when people say wrong or unlikely things in conversations, this phrase is not for you. This is for those of us who regularly derail discussions into a “wait, is that true?” and find that a noble calling. For those people, this is an occasional permission slip to let it go sometimes. In 2026 there are any number of trash websites generating ridiculous statistics just to get you riled up.

In the immortal words of the AVI to my younger brother when he was in middle school: you know you don’t have to die on every hill.

Empathizing and Systematizing: More Than You Ever Wanted to Know

The AVI put up a post last week about the Extreme Male Brain Theory of Autism that had some interesting commentary about the differences between the sexes on tests measuring tendencies towards empathizing (female-linked) and systematizing (male-linked). Skipping over the relationship to autism theories, I was particularly interested in one part of his post, and it ended up sending me down a bit of a rabbit hole looking up information on it:

How would Empathising and Systematising be opposite ends of a spectrum? I don’t think they are. I think two separate things are being measured that both sexes have in different proportions. To take the hormonal stereotype, I don’t think that testosterone and estrogen are “opposite” chemicals. I think they are different chemicals. I don’t think the data shows that one extra bit of empathy means a complementary drop in systematising.

I liked this part because I have been rather fascinated by people I encounter who appear to have neither of these traits, and I wondered how common that was. So I went digging, and it turns out there’s actually quite a bit of research around the whole thing that I thought was interesting. Because he will read this, I will note that none of this is a particular commentary on what AVI wrote in his original post, nor should it be taken as a criticism of anything he said. His comments just sent my thoughts off in a different direction from what he was focused on, so I wrote my own post rather than continue to respond to his. That’s a major plus side of keeping up your own site, IMHO. Anyway, let’s kick it off with some clarifications and some background!

The empathizing and systematizing idea was first investigated by Simon Baron-Cohen as part of his work on autism. It was developed a lot by various researchers over the years, but the original intent was really to see if people with autism were deficient in empathy in some way and if that could help classify the disorder. Typically empathizing and systematizing are measured by giving two different tests, one for each, and then assigning you an “EQ” score and an “SQ” score. For a lot of Baron-Cohen’s work he then subtracts one from the other to figure out the gap between the two.
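That gap calculation is simple enough to sketch in a few lines. To be clear, the scores, the threshold, and the labels below are hypothetical illustrations of the idea, not Baron-Cohen’s actual published scoring norms:

```python
# Hypothetical sketch of the EQ/SQ "gap" idea: score two questionnaires,
# then subtract to see which drive dominates. Scales, cutoff, and labels
# here are invented for illustration, not Baron-Cohen's real norms.

def brain_type(eq: float, sq: float, threshold: float = 10.0) -> str:
    """Classify by the gap between empathizing (EQ) and systematizing (SQ)."""
    gap = eq - sq
    if gap > threshold:
        return "E-dominant"   # empathizing well ahead of systematizing
    if gap < -threshold:
        return "S-dominant"   # systematizing well ahead of empathizing
    return "Balanced"

print(brain_type(eq=55, sq=30))  # gap = +25 -> "E-dominant"
print(brain_type(eq=30, sq=52))  # gap = -22 -> "S-dominant"
print(brain_type(eq=45, sq=40))  # gap = +5  -> "Balanced"
```

Notice the side effect of this approach: by collapsing two scores into one difference, you lose whether someone was high on both, low on both, or somewhere in between — which is exactly the issue that comes up later in this post.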

Now before we go on, I want to point out that when talking about empathizing and systematizing type tendencies, it’s really easy to start swapping those words out for the shorter and more familiar “feelings” and “logic.” Those are close to what we mean when we say empathizing and systematizing, but not actually the same thing.

In the article AVI linked to where Simon Baron-Cohen wrote about his research, he described empathy this way: “Empathy is the drive to recognise another person’s state of mind and to respond to another person’s state of mind with an appropriate emotion.” While this relates to feelings, it is not the same thing as having a lot of feelings yourself. In fact, when used appropriately, it can actually help you manage your own feelings.

Imagine a circumstance in which you find out a friend or family member was up all night with a sick child, and that person now snaps at you over something minor. The vast majority of people will probably let that go or choose to deal with it at some other time. “In their shoes, I would also be short tempered,” we think, and we move on with our day. In that case, empathy reduced the number of our own feelings we had to deal with. If you’ve ever dealt with a person who had no ability to do this, you probably ended up annoyed at that person pretty quickly. “Good grief man, the poor guy’s been up all night, could you really not let that one comment go? You had to get into it right now? Really?” Lack of empathy can absolutely stir up a lot of feelings. None of this is to say that empathy always causes you to make the right call or can’t be taken too far, just that it is a somewhat different concept from that of having “feelings.” Everyone has feelings, and people vary in their emotional regulation. All of that is different from empathy, which is how you understand and respond to other people’s states of mind.

Conversely, systematizing is also not the same as “logic.” Per the same article as above: “Systemising is the drive to analyse or build a system where a system is defined as anything that follows rules or patterns.” Again, that might be close to logic, but it is not identical. As anyone who has ever dealt with bureaucracy knows, just because you have a system doesn’t mean you have logic. A system that doesn’t bend with reality or is impenetrable to others may follow an internal logic, but it may not get you very far if you have to interact with anyone outside your own head.

All that may sound like I’m saying empathy is better than systematizing, which is not my intention. I did want to provide a slight bit of balance to the feelings/logic shorthand, which unfortunately can unintentionally do the opposite at times, if one assumes that logic is a moral good. However, this does lead in to another distinction I was surprised to learn: good empathizing scores are more common in everyone than good systematizing scores. This gets confused easily because of a misreading of what Baron-Cohen says in his write-up: “The first theory, known as the empathising-systemising theory of typical sex differences, posits that, on average, females will score higher on tests of empathy than males, and that, on average, males will score higher on tests of systemising than females.” This has led many people to say that men are higher in systematizing than empathizing, but that’s actually not the case. Here’s the data from the first paper on this. The paper’s in Japanese, but the numbers are still understandable:

As you can see, women are higher than men in empathizing, but men are higher in empathizing than in systematizing. Men are also higher than women in systematizing, meaning women have a bigger gap between the two. But again, people as a group are higher in empathizing traits than systematizing traits. I checked out a few different papers and this is a pretty consistent finding. It might be a measurement tool artifact, but it turns out that however you measure it, men score higher on empathy than on systematizing. So it is not strictly true to say that men have a systematizing brain and women have an empathizing brain; we really all have an empathizing brain, but some people have that balanced with systematizing and some not so much.

I pondered this for a bit and went down a side road about how they were measuring this, but I came away suspecting this is measuring something real. While systematizing is an incredibly useful skill, any group of animals that lives together has to learn how to work as a group. Learning to respond to others’ mental states is a big part of that. Many men might say they are not great at reading others’ emotional states, but all the best businessmen who have ever lived were actually fantastic at figuring out how others are feeling and what to do about it. Ditto for any man working under a boss/superior officer/etc., or pursuing a woman he really wants. An inability to figure out how to work in those circumstances is actually a pretty good sign that you have a real deficit here.

Which brings me to what I was really curious about: how many people really score very low in one or both of these categories? Baron-Cohen’s traditional research doesn’t actually tell us, since he normally just merges the two metrics into one by subtracting one from the other and getting the gap, but thankfully for me someone else decided to look at just that. In the paper “Measuring Empathizing and Systemizing with a Large US Sample” some researchers decided to measure a couple thousand people for empathizing and systematizing and slice up the data a few different ways. When I read through the paper I was very excited to see that all the same questions that had occurred to me were the ones they were going after, including what this data really looked like when you drilled down into it. That led them to go beyond the traditional “extreme S/extreme E” framing and to try to map people onto this graph:

Now this looks exactly like what I’ve been trying to figure out. So where did people fall?

Well as you can see, just about half of people of both genders are actually moderate at both empathizing and systematizing. The population I was initially curious about, those low in both, are unusual but not unheard of…a little over 2% of women and just under 4% of men. People who are high in both are about 5% of each gender. There actually weren’t a lot of huge discrepancies on the diagonals here, where you see big differences is in things like the High S/Medium E combo (3 times more common in men than women) and the Medium S/High E combo (around 4 times more common in women than men). We see similar discrepancies with the Medium S/Low E category and the Medium E/Low S category.

So overall, we see that for a majority of the population these two traits move in tandem. However, when they get discordant, they tend to skew in a predictable way for your gender. This is summarized by a different table in the paper, based on a slightly different way of calculating things, the one Baron-Cohen used originally:

Here, “extreme” comes from having a large gap between EQ and SQ; it does not necessarily mean that one was the highest in E or S overall. As we saw in the data above, there are a good chunk of Medium S/Low E men and Medium E/Low S women. Having a gap does not connote hyper-competence in either trait. There were also High S/Medium E men and High E/Medium S women, so the gap also doesn’t necessarily mean a deficiency either.

Similarly, “balance” here could mean those people were equally poor on both or high on both; they just had to be equal. Interestingly, this was part of why the paper authors wanted to go through and break out the scores further. The original idea of taking two different measures and subtracting one from the other to create one measure was designed by Baron-Cohen to try to suss out an “empathy gap” and figure out if that was related to autism. For that purpose it was an interesting idea. But for your everyday person, it collapsed several extremely different categories of people into one big group of “balanced” people that covers 70% of the population. Given that those people represent the large majority of the population and likely have radically different experiences of life, this is a questionable choice.
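The collapse is easy to see with made-up numbers: subtract the empathizing score from the systemizing score and everyone with equal scores lands at zero, no matter how high or low they actually sit. A toy sketch (the scores below are invented for illustration, not from the paper):

```python
def d_score(s, e):
    """Baron-Cohen-style gap score: systemizing minus empathizing."""
    return s - e

# Three hypothetical people: low on both, medium on both, high on both.
people = {"low/low": (10, 10), "medium/medium": (35, 35), "high/high": (60, 60)}
gaps = {name: d_score(s, e) for name, (s, e) in people.items()}
# All three land at a gap of 0 -- indistinguishable on the D score alone,
# despite presumably very different experiences of life.
```

Which is exactly the 70% “balanced” bucket problem: the single number can’t tell a low/low person from a high/high one.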

It also appears to have left people with the impression that empathizing and systematizing are more closely related traits than they really are. By choosing to highlight the 30% of cases where they are discordant, you can give the impression that there’s some sort of trade-off between these two traits. But that’s not the typical experience at all, at least according to this data. The typical experience is that you will be either low, medium or high in both. Based on your gender, you will then see 25ish% of your group be more extreme on one trait than the other, with 5% going in the other direction.

To put this in probability of superiority terms, if you pick a random male and female out of the population and make them take these tests, the authors say the female would have the higher empathizing score about 67% of the time. Males would have the higher systematizing score about 62% of the time. For comparison, if you did the same experiment with human height, men would be taller 92% of the time. So again, to call this a definitive “male brain” vs “female brain” trait you are looking at something about a third less compelling than human height. So overall, there’s evidence of a sex linkage here, but it’s good to keep in mind exactly what that looks like in practice.
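For the curious, probability of superiority and Cohen’s d are interconvertible if you’re willing to assume normal distributions with equal variance. A minimal sketch of that conversion (my own illustration of the general relationship, not the paper authors’ actual computation):

```python
from math import sqrt
from statistics import NormalDist

def prob_superiority(d):
    """Probability a random draw from group A beats a random draw from
    group B, given Cohen's d between two equal-variance normals."""
    return NormalDist().cdf(d / sqrt(2))

def cohens_d_from_ps(p):
    """Invert: what effect size would produce this probability of superiority?"""
    return NormalDist().inv_cdf(p) * sqrt(2)

# The ~67% and ~92% figures in the text correspond to roughly:
d_empathy = cohens_d_from_ps(0.67)  # ~0.62, a medium-sized effect
d_height = cohens_d_from_ps(0.92)   # ~2.0, the classic height sex difference
```

So a 67% probability of superiority is about a d of 0.6, versus roughly 2.0 for height, which is another way of seeing how much overlap the empathizing distributions have.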

Overall, this was a fun little side road to go down. I continue to be rather fascinated by people who appear to have neither empathy nor systematizing tendencies and think we need more research on how these people end up navigating the world. Are they more or less happy than the rest of us? Do they end up with some other trait they are compensating with? Do they have personality disorders? How about on the other end: is being really high on both of these an advantage? Or are these traits unconnected to life success at all?

Many questions here, I’ll keep poking around.

Diagnosing the Literature: When Case Reports Are the Disease

A few months ago I, like many others, was surprised to read the article about how Oliver Sacks had made up many of his most famous stories about his patients. Printed in the New Yorker, this article hit many in the scientific community where it hurt. A lot of us had read Sacks’s books in undergrad or grad school; he was considered an inspiration in both neurology and science communication. Discovering he had made up some of what he wrote threw his whole body of work into question.

Unfortunately, the problem didn’t stop there. Two months later the New Yorker published another investigative report on a different famous case study, this one about a baby who allegedly showed signs of opioid exposure from breastfeeding from a mother who was taking codeine. The problem? It was made up. But wait, it gets worse. The original case report was followed by another one published in the Canadian journal Paediatrics & Child Health, and this week they put a label on ALL their case studies in the series stating that they were fictional.

Retraction Watch has the full story here, but it turns out the story for this journal is even more confusing than you’d think. At some points (but not all) the journal actually asked authors to submit fictional case studies for learning purposes. Unfortunately this was never flagged on the published items themselves, so readers were left with the impression they were reading real stories. Adding to the confusion, the instructions didn’t always say this, so some of these reports were real but are now marked as fictional, and that can’t be undone. Confused? Me too.

So why does this matter?

Back when the Sacks thing first came up, the AVI posted on it and Grim asked a really good question in the comments:

It’s far enough outside my area that I’ve never heard of Oliver Sacks before today. Nevertheless, I believe you and bs king when you say that this sort of fraud affects the whole society. How would you characterize the effects it has had, which someone like me might be affect by without being aware of them? It obviously affected your profession and several others, which each exercise influence, for example.

My reply at that time (including some commentary on an earlier Dante comment I’d made) with some parts I think are important bolded:

From my perspective, I took Dante to mean that being a fraud victim (or even merely aware of fraud) erodes trust in society in general, so each new case will keep chipping away at how we all view each other. So if I took Sacks writing at his word and found I was deceived, I will subsequently distrust your writing more merely to ensure I’m not deceived again. It’s a reality of life I guess, but it just strikes me as sad that everything I read my first thought needs to be “is this person fabricating this”? So my comment mostly wasn’t about Sacks himself, but about how much skepticism we need to carry to operate in the world. There’s downsides to that.

For Sacks specifically, case studies are often the first step in medical research. Some diseases or problems are simply uncommon enough that the first person to notice them can’t do a whole study, so they write up a “here’s a weird thing I saw in a patient” report and then see if anyone else has seen it too. This was the first step in discovering HIV, for example. Sacks’s writing was basically case reports, and they influenced a lot of people’s thinking about where the margins of brain function were. These are supposed to be teaching tools, so you wouldn’t particularly expect people to make them up…there’s no blockbuster finding on the other end, it’s just one person’s “hey, I saw something weird others should be aware of in case they see it too”. Sacks’s books were often assigned in classes because they were so accessible (that’s where I first read him), so a huge number of people were fed erroneous information as part of their learning. Even if it just impacts their idea of how the fringes operate, it’s still incorrect information. That it was done knowingly is even sadder. So overall I agree with AVI, it’s likely not a direct effect, but it means the foundation is shakier than we suspected.

To elaborate further: after working in hospitals for years, I can’t overstate the importance of the anecdotal “you know I saw something like this once”. It’s what separates really experienced clinicians from new ones. Published case reports tend to help that process, and people pass them around to help with their own sense of what to look for. Knowing that an anecdote is a made up example vs a real thing someone saw is pretty critical in clinical practice. We’ve all seen made up examples in textbooks that help explain concepts to us, and those are a fine teaching tool in context. But as the Retraction Watch article ends: “A narrative that is fictional but published in the format of a genuine case report, without disclosure at the time of publication, is functionally indistinguishable from fabrication in the scientific record.”

I will only add to this that case reports are the single hardest thing to fact check. There’s no statistical method to check, no data anomaly to detect. By definition you are talking about something people haven’t seen very often, and the exact details are often covered by medical privacy. Not a great situation.

Alonso de Salazar Frias: Help Me Out Here

Someone recently forwarded me Jonah Goldberg’s column on Alonso de Salazar Frias. That link is probably paywalled, but Wiki has a pretty good summary here. The short version: Frias was a priest and canon lawyer who ended up with a role in the Spanish Inquisition, and expressed extreme skepticism over the witchcraft accusations he was hearing. Rather than buy into the hysteria, he decided to try to use empirical methods to test the witchcraft accusations, looking for consistency in statements and corroborating evidence. Ultimately, he concluded “I have not found one single proof nor even the slightest indication from which to infer that one act of witchcraft has actually taken place…the testimony of accomplices alone without further support from external facts substantiated by persons who are not witches is insufficient to warrant even one arrest.” This ultimately got him the nickname “the Witches’ Advocate” and he seemed to be rather successful in getting people freed.

For obvious reasons I am quite taken by this story, but my attempts to locate much information have failed. There appears to be one book written about this from 1980, and it’s $100+ on Amazon and totally unavailable in my local library. So I figured I’d throw it out here and see if anyone knew of any good general books on the Spanish Inquisition that might help me find more information, or if anyone had any other ideas of where to find information. Any help appreciated. Update 3/2/26: My lovely brother informed me he’s developed a talent for finding rare/old books and was able to locate a free digitized copy of the specific book I was looking for online! The comments helped tremendously though, as I was able to find several other relevant resources and am also going to inquire with my local library about some others. Many thanks to all who weighed in!

While we’re on the topic though, I figured I’d link to another story of false accusations. A TikTok psychic from Texas has been ordered to pay a college professor from Idaho $10 million for falsely accusing her of the murders of 4 college students in Moscow, Idaho. This story may sound obvious to the point of being silly, but in light of my true crime series I think it’s worth looking at the details on this one. When I talk about how out of control true crime has gotten, a lot of people ask me rather skeptically “so what if some idiots say dumb stuff on the internet? People have always said dumb stuff”. This is true, but I think this case is a good example of how dark this stuff can get. The full case docket is here, but I’ll point out a few highlights:

  1. By her own admission, the TikTok psychic (Ashley Guillard) had no evidence that the woman she accused had anything to do with the brutal murder of 4 college students in her town. In fact, there was no evidence this professor even knew the murdered students. Per the defendant’s own admission, she pointed the finger at this professor (Rebecca Scofield) because when she heard about the murder she went to the college website and did a psychic reading over each member of the history department, and Scofield had bad vibes.
  2. Guillard had 80k+ followers on TikTok when this started and gained 20k more when she started making these accusations. She put up over 100 videos, repeatedly sharing this professor’s contact information and home address with her fans. While you and I may not believe that a psychic can solve crimes, I will note that people who sought one out on TikTok probably do, and thus took these accusations quite seriously. Scofield experienced harassment, had to install all sorts of security systems at her house, and had her workplace flooded with calls and its social media flooded with comments about her, with anyone connected to her harassed about her being “the real killer”.
  3. Part of Guillard’s claim about Scofield was an extremely elaborate story about how she had been having an affair with one of the female victims, who was a student. Guillard then claimed Scofield had the 4 students murdered to cover this up. One of the sad parts of the case was that Scofield pointed out that while most people at her workplace discarded the murder claim, many more seemed to give some credence to the affair claim. There did seem to be an air of “well, there must be something to this, maybe the truth is in the middle”. But again, by Guillard’s own admission, she got this information from doing a psychic reading off a website photo. Sometimes where there’s smoke there is no fire.
  4. Scofield filed this lawsuit on December 21st, 2022. It took over 3 years for her to get a judgment against someone who admitted she got her information by picking her photo off a website. Just imagine that was you or someone you love, or heck, even someone you just feel friendly towards. Guillard lives off disability and meager TikTok revenue. Scofield will likely never get back money for her legal bills, will always have people somewhat believing she slept with a student, and for what? The crime of living in the same town a murder happened in, and catching the wrong person’s eye at the wrong moment.
  5. All of this occurred against the backdrop of a crime for which someone else ultimately pled guilty, and an accusation for which the defendant admitted she had zero actual evidence. Now imagine if the real killer hadn’t been found, or if Scofield had had a more plausible connection to the victims, like if she were a neighbor. There still wouldn’t have been any evidence beyond the psychic reading, but one imagines the damage inflicted could have been far greater.

Demanding empirical evidence for serious accusations, even those “just” made on TikTok, should never have gone out of style.

Carlisle Method Take 3: Carlisle Harder

While looking for something else this week I found my old posts on the Carlisle method (2017) and the one year later follow up (2018). Seemed time for yet another update, so here we are.

For those of you without a photographic memory for random data controversies from 9 years ago, the Carlisle method was a statistical method developed by researcher John Carlisle, who was attempting to figure out a way to identify potentially fraudulent papers more quickly than undertaking laborious investigations. His idea was to actually look at the baseline data for control groups and intervention groups and to try to detect data anomalies there, assuming that authors would have focused much more on their results than on their baseline data, so anomalies would be easier to spot. He named a bunch of studies that appeared to have skewed baseline data, and others took it from there.
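To give a flavor of the idea, here is my own much-simplified sketch, not Carlisle’s actual method (which worked with t-distributions and combined p-values across many baseline variables): under honest randomization, p-values for baseline differences between trial arms should be roughly uniform, so an excess of very high p-values, meaning baselines that are “too similar,” is one kind of red flag.

```python
from math import sqrt
from statistics import NormalDist

def baseline_p(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample z-test on one baseline variable from summary statistics
    (normal approximation; adequate for the large n typical of trials)."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = (mean1 - mean2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def excess_high_p(pvals, cutoff=0.95):
    """Fraction of baseline p-values above the cutoff. Under uniformity we
    expect ~5%; a much larger fraction suggests baselines too good to be true."""
    return sum(p > cutoff for p in pvals) / len(pvals)
```

Perfectly identical baseline means give a p-value of 1.0, and a body of trials where most baseline comparisons land near 1.0 is exactly the kind of anomaly this style of screening is meant to surface.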

Interestingly, while some studies did end up having to adjust, it did become clear this method was not always detecting fraud. In a few cases some of the statistics were actually just mislabeled. In the most notable case, it turned out the study authors had not been clear on how their samples were selected, and they had to update their results without some of their original data.

So what’s happened since then? Well, in 2021 Carlisle decided to use his prior method and standing as a journal editor to take things up a notch. While his initial method was a quick screening, he decided to develop a screening tool to flag papers that might have a problem. This included “previous false data from one or more authors or the research institute; inconsistencies in registered protocols; content copied from published papers, including tables, figures and text; unusually dissimilar or unusually similar mean (SD) values for baseline variables; or incredible results”. If a paper was flagged as having these risk factors, he would ask for a spreadsheet with the patient-level data in it so he could look at it more closely to ensure it was OK.

Unsurprisingly, he found problems. But what happened next was even worse.

When Carlisle followed up with the universities these papers were coming from, he discovered they were not overly anxious to investigate the particular concerns he was raising, which concerned him even more. So starting in 2019, Carlisle decided the journal would ask for patient-level data from everyone from the countries that submitted the most papers: Egypt, China, India, Iran, Japan, South Korea, and Turkey. The results were not encouraging:

Basically, when Carlisle screened for high risk papers, he found about 10 “false” papers in 2 years. When he screened everyone, he found 60+ papers in the next 2 years. Yikes. Just to clarify what he means by “false” or “zombie”, here it is in his words:

Data I categorised as false included: the duplication of figures, tables and other data from published work; the duplication of data in the rows and columns of spreadsheets; impossible values; and incorrect calculations. I have chosen the word ‘zombie’ to indicate trials where false data were sufficient that I think the trial would have been retracted had the flaws been discovered after publication. The varied reasons for declaring data as false precluded a single threshold for declaring the falsification sufficient to deserve the name ‘zombie’, although I have explicitly stated my reasoning for each trial in the online Supporting Information (Appendix S1).

So overall 14% of papers submitted had substantial flaws and 8% were retraction worthy, but that rate went way up after they started requesting data from everyone. Unfortunately Carlisle ended by mentioning a few fairly discouraging things:

  1. He has no reason to believe his journal was attracting particularly bad papers. One might actually assume the opposite given that he very publicly was out fighting fraud for several years before this.
  2. It took him a really long time to look through the spreadsheets, and sometimes he only caught the fake data on the 2nd, 3rd or 4th look.
  3. Fraud can actually happen at any level of research, which makes it scary. In one case he mentions, the researchers discovered it was a med student they were working with who had made up the data. We think of scientific fraud as the big name getting the credit, but you can see how it’s actually quite likely to be an overwhelmed lower-level person trying to deliver results to them who provides the fake data.
  4. Nothing stops people from submitting these papers to other journals that don’t have this level of scrutiny.

In the end Carlisle concludes that these types of data errors or fraud are so common that developing screening tools for them should be a primary goal of journals, lest they risk up to a quarter of their studies being retraction worthy. Not great, but thank God for people like Carlisle.

New Substack Who Dis

So after my therapy session a few weeks ago about why I liked writing here so much and how hard it was to translate the norms I was used to into other places, I kept thinking about how to do what I was trying more effectively. After mulling it for a few days more, I decided that it might be worth trying a different writing project that I controlled, written more in my own voice, but aimed at a different audience.

So I now have a bright and shiny new Substack to fool around with: Exhibit Asterisk (or Exhibit A*). The focus is going to be my ideas about how statistical thinking can apply to the world of true crime, so basically a more topic-specific version of what I do here, with the assumption that the audience will be a bit less familiar with stats concepts. For now I plan to keep posting in both places, though there may be some cross-posting. I am not doing paid subscriptions, because I do not need that kind of pressure in my life, and this is entirely experimental. For all that I rant about this topic, it may turn out I have very little left to say. We’ll see! Anyway, if you’d like to join me over there, the link to the first post is here.

Hope to see you there!