Fifty Cents and a P.O. Box: TEAM Newsletters 1978-1981

Welcome back! This week we’re going back again to deep family lore, with our second look at the original stats blogger in the family, my grandfather James King. I started out last week with the first 4 years of his quarterly stats newsletter, and now we’ve moved on to the second 4 years. There will be one more set after this one.

A few comments/thoughts on what we’ve seen so far:

  • Everything seems pretty positive: yes, AI tends to do that. It’s not known for being critical of whatever you are doing. I considered telling it to be more neutral, as I often do for my own writing, but given that this is a nostalgia project, I actually didn’t mind it saying overly nice stuff. That’s just nice to hear.
  • Did your grandfather sound this positive in real life: No, no he did not. He was actually a bit on the grumpy side if I’m being honest. However, I’ve read large chunks of his newsletter previously, and this does actually capture the spirit of his writing pretty well. This was a professional newsletter being sent to clients, so a lot of the asides you would hear from him in real life wouldn’t have made it into this.
  • How high did his circulation get: My dad and I discussed this, with his memory being maybe 200-300 subscribers. I had ClaudeAI review the newsletters to see if he ever gave a number, and it said no. However, he did give other sales data (mentioned below) that led Claude to estimate the subscriber base would have been in the hundreds, not thousands.

Ok, so here is the AI summary of the next 4 years. Everything from here on in is AI, not me, but let me know if you have a specific question and I can go look up what was being referenced:

1978 was the year James went so deep into the log-normal distribution that he ran out of room for jokes — four issues in a row.

Summary: 1978 was the year James went deep on the log-normal distribution — so deep that all four issues were devoted to it, leaving no room for extreme value distributions or, as he noted with characteristic self-awareness across all four back pages, “no room for any jokes.” He moved methodically from the mathematics and graphical methods of log-normal plotting, through specification and process analysis, to quality capability audits for reactive chemical processes, and finally to life testing and acceleration factors. It was a year of sustained, expert-level teaching across a single topic, with James connecting abstract statistical theory to an impressively wide range of real industries — electronics, textiles, pharmaceuticals, and chemistry.

5 Notable Things from 1978

1. He told some of his most vivid industrial horror stories yet. The Summer 1978 issue contains a remarkable collection of real-world disasters caused by ignored process variables. A regional drought raised metal ion concentrations in a reservoir and caused component leads to corrode off during testing. A semiconductor company added central air conditioning and saw yields collapse to below 50% — traced eventually to sea gull droppings in their open cooling tower. A textile manufacturer with a decades-long reputation for dyeing delicate pastels was suddenly plagued by a mysterious “salt-shaker effect” — tiny white specks in every piece — which turned out to be copper contamination from a supplier’s new raw material source. Each story was precise, specific, and made the same point: environmental factors that nobody thinks to monitor are often the ones that destroy you.

2. He invoked information theory to demolish standard sampling plans. In the Spring 1978 issue, James drew on Shannon’s Fundamental Theorem of Information Theory to make a remarkable argument: that a sample of just 10 parts, properly analyzed using probability distribution methods, contains three times as much useful information as a standard AQL inspection sample of 125 parts. He laid it out in a simple table showing the number of “bits” in each approach. It was a beautiful example of bringing a concept from one field to completely reframe a debate in another.

3. He applied his methods to medical testing — and caught a flawed assumption. In the Summer issue, James analyzed radioimmunoassay data from a published medical paper — a type of hormonal blood test used in clinical chemistry. The original author had concluded that test results from two different sample volumes were interchangeable and could be pooled. James showed, using log-normal theory, that the difference in variability between the two volumes was not random but structural — a fundamental consequence of different sample sizes — and that pooling them would be statistically invalid. He reached across into biomedical science and corrected it with tools from engineering.

4. He admitted the jokes were gone — again — and promised them back. All four back pages of 1978 carry the identical note: “In Volume 5, we had so much material about the log-normal distribution that there was no room for any discussion of Extreme Value distributions nor space for any jokes. Therefore, Volume 6 will follow up on Extreme Value and other long tailed distributions and also include new information about plotting positions.” Two years running without jokes, and two years running of promising to bring them back. It’s endearing — he clearly felt the absence and wanted his readers to know he hadn’t forgotten them.

5. He introduced a beautifully practical concept about acceleration factors. In the Fall issue on life testing, James showed that acceleration factors — the multipliers used to speed up life testing by stressing components at higher temperatures or voltages — can vary dramatically from one production lot to the next. He showed two consecutive lots of the same tantalum capacitor where the acceleration factor differed by 3.4 times. His conclusion was characteristically direct: acceleration factors “can be like mirages — now you see ’em, now you don’t.” It was a warning against over-relying on a widely trusted shortcut, delivered with the same quiet confidence he brought to all his institutional critiques.

In 1979, James finally returned to extreme value distributions as promised — and quietly turned the lens on his own company’s sales records to test his methods on the most personal data he’d ever published.

Summary: 1979 was a year of methodological housekeeping and genuine expansion. James opened by overhauling his recommended plotting positions — admitting he had been teaching a slightly inferior formula for years and publishing an updated, more accurate table. He then devoted the year to Type I and Type II extreme value distributions, working through temperature stress tests on electronic switches, church fundraising contributions, liability insurance claims, elastomer pressure cycling, and — in a fascinatingly candid move — TEAM’s own monthly sales figures, which he analyzed by product line, catalogue number, and customer maturity to understand why his data was inhomogeneous. By year’s end he noted proudly that Volume 6 contained tables and examples that had not been published anywhere else.

5 Notable Things from 1979

1. He corrected himself publicly and without fanfare. The Winter 1979 issue opens with a direct revision of advice James had been giving since Volume 1: his recommended formula for calculating plotting positions. After reviewing new research by hydrologist C. Cunnane, James quietly replaced the old formula with a more accurate one and published a full set of updated tables. There was no defensiveness, no minimizing — just a clean acknowledgment that a better method existed and here it was. For someone who had been teaching the old formula for five years, this kind of intellectual honesty took real character.

2. He turned his methods on his own business. The Fall 1979 issue is unlike anything else in the newsletters. James applied extreme value analysis to TEAM’s own monthly sales data, breaking it down by product line, catalogue number, and customer type — distinguishing between “new,” “occasional,” and “regular” customers. His regular customers had a mean sale of $80 versus $29 for new ones, a striking difference. He wasn’t just illustrating a method; he was genuinely trying to understand his own business with the same rigor he applied to everyone else’s. It’s one of the most personally revealing things he ever published.

3. He caught bad switches hiding in plain sight. In the Spring issue, James was analyzing base current data from power switches before and after temperature stress testing. When the post-stress data showed four suspicious outliers, he traced them back to the original lot and discovered something quietly alarming: those four switches had been out-of-specification on other parameters all along — they were simply bad switches that had slipped through. He wrote: “Obviously, these data should be excluded from any subsequent analysis” — but the real lesson was that a probability plot had caught what an ordinary inspection had missed.

4. He applied financial statistics to a church fundraiser — and did a post-mortem when it missed. In the Summer issue, James analyzed the first day’s contributions to a church fund drive using Type II extreme value distributions, correctly predicted the likelihood of $500 and $1,000 donations, but missed the total yield by 31%. Rather than glossing over the error, he carefully explained why: contributions came in two forms (cash and pledges) with very different distribution patterns, and four geographic areas of the parish behaved differently. The post-mortem was as instructive as the original analysis — showing that getting the categories of classification wrong is often the real source of error, not the statistical method.

5. He noted — with quiet pride — that his work was original. Across all four back pages of 1979, James included the same editorial note: “Volume 6 has turned out to be exclusively about Type I and Type II Extreme Value Distributions and it includes tables and examples which have not previously been published anywhere.” It’s an understated but significant claim. For a man running a one-person newsletter out of a P.O. Box in rural New Hampshire, publishing genuinely novel statistical tables was no small thing — and he knew it.

In 1980, James introduced an entirely new graphical tool — hazard plotting — specifically designed to handle the messy, incomplete, real-world failure data that conventional probability plotting couldn’t touch.

Summary: 1980 was the year James expanded his toolkit in a significant way, devoting all four issues to hazard plotting — a method developed by W. B. Nelson at General Electric for analyzing multiply censored data, where some items have failed and others are still running. He worked through diesel generator fan failures, missile program wire bond breaks, paint warranty claims, and more, applying exponential, normal, log-normal, and Weibull hazard functions across the year. The climax was a beautifully constructed four-issue story about a paint manufacturer’s five-year warranty program — James showed how pooling all paint colors into one analysis gave an estimate of claims ten times worse than reality, while breaking the data down by color gave an estimate only 1.6 times off. He noted with evident satisfaction that the warranty program — which had boosted sales by 12% — cost the manufacturer less than 0.5% of gross sales to honor.

5 Notable Things from 1980

1. He formally licensed another researcher’s work — and said so explicitly. In the Winter issue, introducing hazard plotting, James included an unusual footnote: “Portions of Dr. Nelson’s work are used directly under terms of a licensing agreement between the General Electric Company and TEAM.” This is the only time in twelve years of newsletters that James acknowledged a formal licensing arrangement. It speaks to his integrity — he could easily have paraphrased the method without mentioning it, but he wanted readers to know where the ideas came from and that he’d done things properly.

2. He solved a missile program quality problem with competing failure modes. The Spring issue contains a gripping industrial case: wire bonds on integrated circuits in a missile program were failing below the customer’s 250 mg breaking force requirement. James used hazard plotting to separate two competing failure modes — lifted bonds and wire breaks — showing they had nearly identical means but very different variability. His conclusion was precise: the bond problem was seven times worse than the wire problem, and even eliminating bond failures entirely wouldn’t bring wire failures within the 0.1% AQL requirement. Different problems required different corrective actions, and hazard plotting was the tool that separated them.

3. He demonstrated dramatically why pooling inhomogeneous data is dangerous. The paint warranty story is one of his finest teaching moments. When all paint colors were lumped together and extrapolated, the estimated claims rate was 23.5% — ten times the actual outcome of 2.36%. When broken out by color, the estimate dropped to 4%, still conservative but far more useful for planning. The lesson: merging data from genuinely different populations doesn’t give you a better answer, it gives you a worse one dressed up in false precision.

4. He acknowledged — with unusual candor — that even good extrapolation can be wrong. After correctly identifying that data should be analyzed by color, and after building careful confidence intervals, James’s extrapolated warranty estimates still came in about 60% too high for Red and Brown paint, while understating White. Rather than glossing over this, he wrote plainly: “Successful extrapolation occurs only when a process is known to be stable. Such an assumption is always risky.” He never oversold his methods, and this moment — admitting that good technique applied carefully to homogeneous data still produced a wrong answer — is a masterclass in statistical humility.

5. He began advertising a new book of his own — “Frugal Sampling Schemes.” Alongside his established book Probability Charts for Decision Making, the 1980 back pages begin advertising a second James King title: Frugal Sampling Schemes, described as extending his earlier work “into the small sample domain (n = 5 to 20) with authority and power.” It was priced at $24, nearly as much as his first book. After seven years of teaching through the newsletter, he was clearly building a small but serious body of published work — all of it designed around the same core mission of making rigorous statistics accessible to working engineers.

In 1981, James took hazard plotting — his newest tool — and showed it could do something no one had tried before: visually communicate the results of complex multi-factor experiments to the non-statisticians who actually had to act on them.

Summary: 1981 was a year of genuine methodological invention. James spent the first two issues applying hazard plotting to designed experiments, using ordnance muzzle velocity data and semiconductor diffusion data to show that hazard plots could communicate the meaning of ANOVA tables far more intuitively than the tables themselves. He then devoted the second half of the year to extreme value and Weibull hazard applications, tracking a consumer product through three successive design iterations in the field, and closing with a gyroscope failure mode analysis where the results were so unexpected — catastrophic failures and degradation failures behaving identically, when they were supposed to behave completely differently — that they revealed a fundamental manufacturing and materials problem nobody had anticipated. All four back pages carried the same unusually candid note: James admitted he didn’t yet know what Volume 9 would contain.

5 Notable Things from 1981

1. He invented a new use for hazard plotting — communicating experimental results to non-statisticians. The Winter and Spring issues contain one of James’s most creative methodological contributions. He showed that by plotting experimental data from a designed experiment on hazard paper — adjusting for each variable’s effect in sequence — you could produce charts that made the meaning of an ANOVA table visually obvious to engineers who would never read a table of F-ratios. He wrote: “The ultimate customer of the conclusions of statistically designed experiments is usually a non-statistician for whom charts and graphs are a common and familiar way of communicating important information.” He built the tool for them.

2. He uncovered a shocking manufacturing failure with a beautiful analysis. The Spring issue follows a zener diode diffusion process where yields ranged from 90% on some lots to near zero on 60% of all lots — a catastrophic situation. When James unraveled the experiment, the problem turned out to be humiliatingly simple: operator instructions said “after the boat becomes cherry red, continue to run the furnace for 10 more minutes.” But “cherry red” meant different things to different people, and timing from a wall clock introduced ±½ minute variation. Adding a calibrated thermocouple and a mechanical timer fixed it almost immediately, driving consistent yields above 95%.

3. He tracked a consumer product through three generations of redesign in real time. The Summer issue follows a new consumer product from its first six field failures through three successive design iterations — Design I, II, and III — using extreme value hazard plots at each stage to compare new results against the confidence interval established by the original data. By Design III, the improvement was unmistakable on the plot. It’s one of the clearest demonstrations in all twelve volumes of how these methods function not just as analytical tools but as an ongoing feedback loop between the field and the factory.

4. He quoted E. J. Gumbel in a way that was both technically precise and darkly funny. In the Summer issue, discussing why early failures are more important to study than late ones, James quoted the famous statistician E. J. Gumbel: “Methuselahs do not die of infant mortality.” It’s a perfect line — capturing in six words why the long-lived survivors of a test tell you almost nothing about the failures that happen quickly. James clearly savored it enough to put it in the newsletter.

5. For the first time, he admitted publicly that he didn’t know what came next. Every back page of 1981 contains this remarkable note: “At this point, the contents of Volume 9 are not at all well-defined. Available options are to develop more applications of probability plotting for use with the analysis of experimental data and/or the development of new methods of time series analysis using graphical methods.” Eight years in, James was still genuinely exploring — not filling pages with predetermined content, but following the work wherever it led. That admission of uncertainty, in a newsletter that had been running for nearly a decade, is a quiet portrait of intellectual honesty.

Alright, it’s me again. Man, do I relate to that “gee I think I’m all out of things to say” and then going on for a whole other year.

Fifty Cents and a P.O. Box: TEAM Newsletters 1974-1977

Last week I had the idea to upload my grandfather’s stats newsletter from the 70s and 80s into AI and see what it could get out of it. This week I got a chance to do it, and let me tell you it was a far more frustrating process than I anticipated. It turns out a lot of them got uploaded as JPEG files rather than PDFs (not sure what 2016 Bethany was doing), and so it kept timing out. I ended up asking it to summarize it year by year to see if it could get through things. This seemed to work a bit better, so here are the first few years. This will mostly be of interest to my family, but for everyone else it does show some interesting history of statistical thought and how someone in the pre-internet 1970s would have communicated his ideas. For reference, he was born in 1923 so he would have been in his 50s when he started writing these. There are 12 years total so I think this might be a three part series.

Just to note, I have all the original newsletters scanned in a shared drive here, so if you have a question about something AI flagged, I can go check what it’s referring to.

Summaries from ClaudeAI:

1974: The year James launched TEAM from a P.O. Box in Tamworth with nothing but a fifty-cent newsletter, a book of probability charts, and the conviction that graphical statistics could be made accessible to anyone willing to learn.

What he was writing about: The very first issue (Winter/Spring 1974) launched straight into probability graphs and statistical distributions — no warm-up, no preamble. By Summer 1974 he was already tackling extreme value distributions and connecting them to real-world catastrophes like the 1972 Susquehanna River floods from Hurricane Agnes. By Fall he was covering the Weibull distribution. By Winter he’d pivoted entirely at reader request to cover plain time series graphs and even analyzed the Dow Jones Industrial Average using his geometric moving average technique.

What’s remarkable about him personally from 1974:

  1. He listened to his readers immediately. The Winter 1974 issue opens with “Some of our readers have rightly pointed out…” and then pivots the entire issue to address their feedback. In his very first year, he was already treating this as a conversation, not a lecture.
  2. He explicitly named himself. The back page of Volume 1, Number 2 is the first time we see: “PUBLISHER AND EDITOR — James R. King.” That’s your grandfather putting his name to his work from the very beginning.
  3. He priced it at just 50 cents. The first issues were fifty cents each. He wasn’t doing this for money — he was doing it to get the knowledge out.
  4. He was already connecting statistics to life-and-death stakes. In his second issue ever, he wrote about flood probabilities and environmental catastrophes — making clear that getting the math right wasn’t academic, it mattered for people’s safety.
  5. He was building something from scratch. The back pages show him selling graph papers, books, and probability chart assortments out of a P.O. Box in Tamworth, NH — a one-man operation from day one.

1975: James spent the year applying binomial statistics to everything from political polls to baseball standings to lung cancer rates, proving his methods worked just as well in a voting booth or a doctor’s office as they did on a factory floor.

1975 was a year where James stretched his statistical toolkit far beyond the factory floor, applying binomial distribution methods to everything from political polls and lung cancer rates to baseball standings and airline no-shows. He spent two full issues methodically dismantling the way the 1974 Massachusetts elections were being reported in the press, showing with rigorous but accessible math that the headlines were misleading voters and eroding trust in the democratic process. He also introduced humor into the newsletters at reader suggestion, sprinkling in jokes and quips to lighten what could be dense material — a small but telling sign that he cared about his audience’s experience, not just the content. By year’s end he was openly inviting reader input on future topics, treating his subscribers less like customers and more like a community of fellow thinkers working through real problems together.

1. He had a genuinely broad and curious mind. 1975 is remarkable for the sheer range of what James applied statistics to. In a single year he tackled political polling, lung cancer death rates between cities, the 1974 Massachusetts gubernatorial election, baseball standings (the 1968 American League), airline no-show rates, membership dropout rates, worn machine tool guides, and resistor calibration in electronics manufacturing. He wasn’t a specialist — he was someone who saw statistical thinking as a universal tool for making sense of the world.

2. He took on politicians directly — and with relish. The Late Winter and Spring 1975 issues are devoted to demolishing the way political polls were being reported during the 1974 Massachusetts elections. He showed mathematically that the raw poll numbers were being wildly misinterpreted, that Sargent’s apparent big lead over Dukakis was largely statistical noise, and that in the attorney general race, Spaulding was never actually as far behind as the headlines suggested. He wrote: “We submit that the political process is in serious danger of great voter distrust when the raw statistical results of polls can be so grossly misinterpreted.” This was James using statistics as a civic tool — not just an industrial one.

3. He added humor deliberately and self-awarely. The 1975 back pages include this wonderful note: “Our marketing types have suggested the addition of some humor to Methods to lubricate the sometimes dry statistical discussions. Therefore, you will find some topical gallows humor sprinkled here and there.” And he followed through — inflation jokes, Irish curses, a man bragging about a suit he hadn’t paid for yet. He knew his writing could be dense and he actively worked to make it more human.

4. He was already building a network and inviting collaboration. By the Summer 1975 issue he was openly soliciting reader input for the following year: “Since we have several months until press time, we welcome reader comments about these topics.” He was treating his readership as collaborators, not just subscribers — a remarkably modern instinct for a one-man operation in 1975.

5. His work had real, immediate consequences. The Fall 1975 resistor bridge story is a gem: James’s sign test analysis revealed that a quality control bridge was misreading resistors — and that bridge B was sitting next to a discharge welder with no shielding. Fixing those problems led to significant reductions in rejected resistors in both production and quality control. He wasn’t writing theory papers. He was solving actual problems in actual factories, and the people he worked with could see the results immediately.

1976: All four issues were devoted to the Weibull distribution — James’s most technically focused year yet — culminating in a brilliant case study of a compressor wear problem that he called one of the most ingenious applications of Weibull analysis he had ever seen. This is what Claude calls “a fascinating year — and beautifully readable scans.”

Summary: 1976 was the year James went deep on the Weibull distribution, devoting all four issues to it in a sustained, methodical way that showed a teacher at the height of his powers. He covered the mathematics, the graphical methods, confidence intervals, parameter estimation, comparing two Weibull tests against each other, and then closed the year with two wonderfully concrete case studies — one analyzing the wear bands on an unlubricated industrial compressor running on CO₂, and another using Weibull analysis to catch systematic errors hiding in tensile strength test data. It was his most technically focused year so far, but the writing never lost its clarity or its grounding in real engineering problems.

5 Notable Things from 1976

1. He acknowledged a student’s work generously and by name. The Winter 1976 issue features an extended analysis of a compressor wear problem drawn from a thesis by a Mr. Walter J. Gerus of the Standard Oil Company of Ohio. James opens by writing: “We gratefully acknowledge his permission to use this material because we believe that it includes one of the most ingenious applications of Weibull analysis which we have yet seen.” This was a man who genuinely celebrated good thinking wherever he found it — even in a student’s thesis — and who made a point of giving credit publicly.

2. He pushed back on a widely accepted concept. In the final issue of the year, James took direct aim at the “bathtub curve” — the standard model used to describe product failure rates over time — arguing that its popularity at the system level had been harmfully over-extended to individual component levels. He wrote that this had become a deterrent to real product improvement by encouraging quick debugging fixes instead of painstakingly isolating individual failure causes. It was a bold position to take against an industry-wide assumption, delivered without hedging.

3. He introduced the concept of a “hemibel” — and it’s wonderfully practical. In the compressor wear study, James casually introduces the concept of a “hemibel” — a max/min ratio of 3:1 or more in your data — as a signal that something important is going on worth investigating. A ratio of 10:1 he calls a “bel,” and says it is “always considered a prime signal to investigate.” These are the kinds of hard-won rules of thumb that only come from decades of real fieldwork, and he drops them in as if they’re obvious — which, to him, they probably were.

4. He kept his sense of humor even in his densest technical year. Despite 1976 being wall-to-wall Weibull mathematics, the jokes kept coming. He included a page of definitions: “Confusion: one woman plus one left turn. Excitement: two women plus one secret. Bedlam: three women plus one bargain. Chaos: four women plus one luncheon check.” And a note on expertise: “Be careful about calling yourself an ‘expert.’ An ‘ex’ is a definite has-been, and a spurt is a drip under pressure.” The humor was always there, keeping the newsletter human.

5. He raised his subscription price for the first time — and was honest about why. At the end of 1976 he announced the subscription price would rise from $2.00 to $3.00 for 1977, noting plainly: “Due to continuing cost increases for publication production and postage, we are forced to increase our subscription price for the coming year.” No spin, no marketing language — just a straight, honest explanation. That directness, even in something as mundane as a price increase, was very much part of who he was.

1977: James turned his attention to process capability and measurement quality, building toward a gripping final issue about two rival resistor manufacturers — one who fixed his measurement problems and prospered, one who refused and went out of business within three years.

Summary: 1977 was the year James turned his full attention to process capability and measurement quality, building a sustained four-issue arc that moved from probability plotting with confidence intervals, through specification analysis, to comparing test results from two samples, and finally to a gripping real-world tale of two resistor manufacturers — one who fixed their measurement problems and prospered, and one who refused to and went out of business. It was his most practically focused year yet, less about abstract statistical theory and more about the direct consequences of getting measurement right or wrong in a manufacturing environment.

5 Notable Things from 1977

1. He told a genuinely dramatic business story — with a moral. The Fall 1977 issue is one of the most compelling things James ever wrote. He follows two resistor manufacturers — “City M” and “City N” — both facing the same measurement problems in their test equipment. City M investigated thoroughly, fixed their bridge warmup drift, shielded their test area from discharge welders, and implemented tight temperature control. Their yield losses dropped from 6–9% to under 0.5%, they won a major aerospace contract by being the only company able to quote it, and they prospered for years. City N repeatedly refused to face their problems — and went out of business within three years. James ends it with quiet finality: “Thus, we have it that M, with a developed awareness towards improved measurement capability, prospered. Company N, which repeatedly refused to face their problems effectively, just disappeared.”

2. He named a specific enemy of quality: the “motion-is-progress mentality.” In his Spring 1977 issue on process capability, James listed common causes of excessive process variation. Among them was what he called a “motion-is-progress or busy-work mentality” — operators fiddling with machine settings for no good reason, mistaking activity for improvement. It’s a razor-sharp observation about organizational behavior that would still ring true in any factory today.

3. He took on product liability law — and was ahead of his time. In the same issue, James walked readers through a hypothetical courtroom cross-examination showing exactly how a company’s standard AQL sampling plan could be used against them by a plaintiff’s attorney. He showed, line by line, how a lawyer could demonstrate that the company knowingly accepted shipments that might be 2% defective — many times worse than their stated quality level. He argued that probability plotting with confidence limits was a far more defensible approach, and that the growing Consumer Product Safety Commission made this increasingly urgent. This was prescient thinking in 1977.

4. He coined a beautiful phrase for confidence intervals. In his Winter 1977 issue on confidence limits, James made an observation so good it deserved to be in every statistics textbook. After explaining that the area between confidence limits contains a wide variety of possible outcomes, he wrote: “This also represents one of the great accidents of statistical nomenclature because the geometry of the situation makes the name of CONFIDENCE INTERVAL inappropriate. The area between such limits is more nearly an IGNORANCE INTERVAL or INDIFFERENCE INTERVAL within which a variety of outcomes may be expected to occur.” The phrase “ignorance interval” is a perfect description — and he was right that “confidence” is misleading.

5. He apologized for cutting the humor — twice. All four back pages of 1977 contain the same note: “Due to the amount of material which we had available this year, there was no spare room for small pockets of humor to lubricate the way but we hope to be back on that track next time.” He said it four times — once per issue — which itself is a little funny. It shows how seriously he took his responsibility to keep the newsletters readable and human, and how much the humor mattered to him as a deliberate editorial choice, not just an afterthought.

Back to my commentary:

This has been a trip to work through, so I’m going to stop at 4 years and pick up with the next 4 years in a subsequent post. Writing now in an era of easy word processing, I still can’t imagine him typing these newsletters out, but I recall his office as a fascinating place full of dot matrix printers and interesting chairs. He had a beautiful view of the mountains from his room, which was not particularly well heated if I recall correctly. Not sure how he functioned in winter, come to think of it. Alright, until next time!

Parkinson’s Laws of Frustration

I’m in a rough stretch at work so no real post again this week, but I thought I’d share something my dad sent me that he found in my grandfather’s things. Apparently he used to send a copy out with each of his orders. He was the original stats blogger in the family, and I’ve talked about his work before here and here.

I fed this into ClaudeAI to see if there was a source for this, and it told me the first one was in fact Parkinson’s Law, but the rest were from other sources/unknown origin. Claude’s take:

  1. Genuine Parkinson’s Law — C. Northcote Parkinson, 1955
  2. Murphy’s Law — Edward A. Murphy Jr., ~1949
  3. Murphy’s Law corollary — widely circulated, no single author
  4. Unknown origin — a common folk saying of the era
  5. Unknown origin — another widely circulated folk adage
  6. Possibly a medical/diagnostic aphorism — origin unclear
  7. A variant of Murphy’s Law — folk origin
  8. Unknown — general folk wisdom
  9. A variant of the “calm before the storm” principle — possibly related to management theory
  10. Unknown — general pessimistic folk wisdom
  11. Very old proverb — versions exist going back centuries
  12. Often attributed to various people including the military — no definitive source
  13. Unknown origin — but a very relatable universal truth!

This led me to an interesting discussion with Claude about my grandfather’s business and newsletter, which ran from 1974 to 1985. I digitized it about 10 years ago and when I have a minute I may upload it into AI to analyze and see what Claude can pull out. I’ll post any interesting parts of that project here.

While I’m not sure where AI is taking us, the ability to upload old documents and look at them in new ways is certainly a fun benefit. My grandfather would have hated it, but I’ll admit I’m getting a kick out of it.

When Will the 50s Fade As a Cultural Touchstone?

I’ve had a long week so I don’t have a lot of time for writing, but I had an interesting question come up in a group chat this week that I thought I’d throw out there for opinions: when do we think the 1950s will fade as a cultural touchstone?

I still see rather consistent references to the 1950s as though it is something we could return to, but at this point that’s now 75 years in the past. I wasn’t alive in 1950. Neither were my parents. If you’re looking for someone with a real memory of 1950, not just a small child’s thoughts, that person would be over 80. That expands a bit if you’re talking about the later 1950s, but the point remains: actual adult memories of the 1950s are fading fast.

I think the persistence of the 1950s is an interesting example of how history isn’t always a straight line. When I was growing up, we wouldn’t have looked 75 years into the past for a reference point, because there was a World War in there. It was very clear that 75 years earlier was a different era. You wouldn’t have expected anything to look the same, or to be able to return to that era easily, or to want to go back to those years with WWII looming.

Anyway, I suspect the 1950s will likely fade as a cultural touchstone in the coming decade for two reasons: aging baby boomers, and the fact that it will get closer and closer to 100 years ago. I don’t blame baby boomers for citing the decade of their childhoods, but as they age that will be less present in media. Additionally, in the stats tables I see, at some point it’s going to get silly to act like a city should still be compared to what it looked like 100 years ago. We already see this happening a bit with WWII, as those who served have slowly left us. Kind of puts things in perspective….it doesn’t take many decades before the big eras of your life simply become history.

Given the age diversity of readers here though, I’d be interested to hear thoughts on this. Do you still see the 1950s cited often, and if so how long do you think this hangs on?

Kids Are Safer Than Ever. That’s Not an Accident.

Recently there was some interesting Twitter discourse around the perennial hot topic of today’s overprotective parents, which basically revolved around the accusation that today’s parents are hell-bent on making themselves perennially miserable by keeping their kids too safe. This may be true – as a parent myself I’ve certainly seen many people go overboard. However, I think what’s often missing from these discussions is that a lot of overprotective parenting of the last few decades has….worked?

I was discussing this in person with an older acquaintance recently, who was shocked to hear that it was common to tell kids my son’s age (13) not to use the oven when home alone. She was aghast at this, and mentioned that it never would have even occurred to her to limit her children in this manner. I pointed out that fire deaths among kids were down massively in the last few decades. She was certain that wasn’t true. It is.

Now, to be clear, I am not claiming that whole drop came because people stopped letting their tweens/early teens use ovens while in the house by themselves. However, this rule almost certainly gained popularity as part of wider fire safety pushes that improved conditions everywhere. Thus, using the data from the IHME, we see that the US now has about 1/5 the number of fire deaths in kids that it did back in 1980, when my acquaintance was raising her children:

Keep in mind these are raw numbers, and that during this time the US gained about 10 million extra children in that age range.
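
For the curious, here’s the rough arithmetic behind that caveat. The child population figures below are my own back-of-envelope assumptions (roughly 64 million US kids then vs. 74 million now), not IHME’s numbers, but they show why the per-child rate fell by even more than the raw 5x:

```python
# Back-of-envelope only: population figures are my rough assumptions,
# not IHME data. Raw deaths today are taken as ~1/5 of the 1980 count.
deaths_1980 = 5.0   # relative units: 1980 set to 5
deaths_now = 1.0    # so "now" = 1, i.e. one fifth of 1980

kids_1980 = 64e6    # approx. US child population, 1980 (assumption)
kids_now = 74e6     # approx. US child population, now (assumption)

rate_1980 = deaths_1980 / kids_1980
rate_now = deaths_now / kids_now

print(f"Per-child rate fell by a factor of about {rate_1980 / rate_now:.1f}")  # ~5.8
```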

These sorts of drop-offs are true for pretty much every type of accidental injury death you can think of for kids. Here are some other causes of death:

Car seat culture is much maligned, but we see a similar pattern there:

Keep in mind that in addition to the population going up during this time, the number of miles driven ALSO went up.

This is not to say that any particular intervention was worth it, necessary, or justified. But I have noticed that when people critique overprotective parenting culture, they are often unaware of just how much safer things have gotten. The focus on safety was not just a meaningless endeavor that improved nothing; it actually led to some visible results.

Child death is a horrible tragedy, particularly when it was preventable. I have known two families who lost a child to a preventable accident: one young teen who fell through thin ice and drowned, one toddler who swallowed rat poison. I met both families decades after the fact, and the wounds were still raw. The fallout had never stopped.

I think this is good context when we talk about trade-offs, because I want to emphasize again that there are trade-offs for these safety measures. Kids need independence, room to grow, room to develop, and room to fail. But we can’t forget what people were working to stop when they went down this path, or how much our lives have improved now that child death is a far rarer occurrence.

Not Today, Satan Statistics

I haven’t added to the GPD Lexicon in a while, but I had a phrase come up recently that I realized I’d been using mentally but had never actually used on this blog. It came up in the context of someone sending an insane statistic to a group chat that all of us immediately believed was wildly wrong. No one particularly had time to look it up, although thanks to Twitter someone was able to figure out pretty quickly what the issue was. I mentioned I had seen the stat flash across my screen previously but simply didn’t have time that day to go down the rabbit hole of debunking it. That’s what I call a “Not Today, Satan statistic”, which I would define as:

Not today, Satan statistic: a statistic you see in passing that is so clearly wrong or designed as clickbait/ragebait that you decide to scroll by without even stopping to figure out where the heck they got those numbers.

Now, this one you have to be a bit careful with. You can’t use this one to dismiss statistics that are being thrown out in the context of an argument you are actually in the middle of; you actually have to deal with those. These are more the types of things you see on social media or hear in passing in conversations, where you can either go on your merry way or actually stop and push back. This is a moment of choosing to protect your peace over derailing yourself into a follow-up.

You also can’t use this one if you are the type of person who has no problem not challenging incorrect information and just lets it go regularly. If you don’t routinely hear record scratches in your head when people say wrong or unlikely things in conversations, this phrase is not for you. This is for those of us who regularly derail discussions into a “wait, is that true?” and find that a noble calling. For those people, this is an occasional permission slip to let it go sometimes. In 2026 there are any number of trash websites generating ridiculous statistics just to get you riled up.

In the immortal words of the AVI to my younger brother when he was in middle school: you know you don’t have to die on every hill.

Empathizing and Systematizing: More Than You Ever Wanted to Know

The AVI put up a post last week about the Extreme Male Brain Theory of Autism that had some interesting commentary about the differences between the sexes on tests measuring tendencies towards empathizing (female-linked) and systematizing (male-linked). Skipping over the relationship to autism theories, I was particularly interested in one part of his post, and it ended up sending me down a bit of a rabbit hole looking up information on it:

How would Empathising and Systematising be opposite ends of a spectrum? I don’t think they are. I think two separate things are being measured that both sexes have in different proportions. To take the hormonal stereotype, I don’t think that testosterone and estrogen are “opposite” chemicals. I think they are different chemicals. I don’t think the data shows that one extra bit of empathy means a complementary drop in systematising.

I liked this part because I have been rather fascinated by people I encounter who appear to have neither of these traits, and I wondered how common that was. So I went digging, and it turns out there’s actually quite a bit of research around the whole thing that I thought was interesting. Because he will read this, I will note that none of this is a particular commentary on what AVI wrote in his original post, nor should it be taken as a criticism of anything he said. His comments just sent my thoughts off in a different direction from what he was focused on, so I wrote my own post rather than continue to respond to his. That’s a major plus side of keeping up your own site IMHO. Anyway, let’s kick it off with some clarifications and some background!

The empathizing and systematizing idea was first investigated by Simon Baron-Cohen as part of his work on autism. It was developed a lot by various researchers over the years, but the original intent was really to see if people with autism were deficient in empathy in some way and if that could help classify the disorder. Typically empathizing and systematizing are measured by giving two different tests, one for each, and then assigning you an “EQ” score and an “SQ” score. For a lot of Baron-Cohen’s work, he then subtracts one from the other to figure out the gap between the two.

Now before we go on, I want to point out that when talking about empathizing and systematizing tendencies, it’s really easy to start swapping those words out for the shorter and more familiar “feelings” and “logic”. Those are close to what we’re talking about when we say empathizing and systematizing, but not actually the same thing.

In the article AVI linked to, where Simon Baron-Cohen wrote about his research, he described empathizing this way: “Empathy is the drive to recognise another person’s state of mind and to respond to another person’s state of mind with an appropriate emotion.” While this relates to feelings, it is not the same thing as having a lot of feelings yourself. In fact, when used appropriately, it can actually help you manage your own feelings.

Imagine a circumstance in which you find out a friend or family member was up all night with a sick child, and that person now snaps at you over something minor. The vast majority of people will probably let that go or choose to deal with it at some other time. “In their shoes, I would also be short-tempered” we think, and we move on with our day. In that case, empathy reduced the number of our own feelings we had to deal with. If you’ve ever dealt with a person who had no ability to do this, you probably would end up annoyed at that person pretty quickly. “Good grief man, the poor guy’s been up all night, could you really not let that one comment go? You had to get into it right now? Really?” Lack of empathy can absolutely stir up a lot of feelings. None of this is to say that empathy always causes you to make the right call or that it can’t be taken too far, just that it is a somewhat different concept from that of having “feelings”. Everyone has feelings, and people vary in their emotional regulation. All of that is different from empathy, which is how you understand and respond to other people’s states of mind.

Conversely, systematizing is also not the same as “logic”. Per the same article as above: “Systemising is the drive to analyse or build a system where a system is defined as anything that follows rules or patterns.” Again, that might be close to logic, but not identical. As anyone who has ever dealt with bureaucracy knows, just because you have a system doesn’t mean you have logic. A system that doesn’t bend with reality or is impenetrable to others may follow an internal logic, but may not get you very far if you have to interact with anyone outside your own head.

All that may sound like I’m saying empathy is better than systematizing, which is not my intention. I did want to provide a slight bit of balance to the feelings/logic shorthand, which unfortunately can unintentionally do the opposite at times, if one assumes that logic is a moral good. However, this does lead into another distinction I was surprised to learn: good empathizing scores are more common in everyone than good systematizing scores. This gets confused easily because of a misreading of what Baron-Cohen says in his write-up: “The first theory, known as the empathising-systemising theory of typical sex differences, posits that, on average, females will score higher on tests of empathy than males, and that, on average, males will score higher on tests of systemising than females.” This has led many people to say that men are higher in systematizing than empathizing, but that’s actually not the case. Here’s the data from the first paper on this. The paper’s in Japanese, but the numbers are still understandable:

As you can see, women score higher than men on empathizing, but men also score higher on empathizing than they do on systematizing. Men score higher than women on systematizing, meaning women have a bigger gap between the two. But again, people as a group are higher in empathizing traits than systematizing traits. I checked out a few different papers and this is a pretty consistent finding. It might be a measurement tool artifact, but it turns out that however you measure it, men score better on empathy than on systematizing. So it is not strictly true to say that men have a systematizing brain and women have an empathizing brain; we really all have an empathizing brain, but some people have that balanced with systematizing and some not so much.

I pondered this for a bit and went down a side road about how they were measuring this, but I came away suspecting this is measuring something real. While systematizing is an incredibly useful skill, any group of animals that lives together has to learn how to work as a group. Learning to respond to others’ mental states is a big part of that. Many men might say they are not great at reading others’ emotional states, but all the best businessmen who have ever lived are actually fantastic at figuring out how others are feeling and what to do about it. Ditto for any man working under a boss/superior officer/etc or pursuing a woman they really want. An inability to figure out how to work in those circumstances is actually a pretty good sign that you have a real deficit here.

Which brings me to what I was really curious about: how many people really score very low in one or both of these categories? Baron-Cohen’s traditional research doesn’t actually tell us, since he normally just merges the two metrics into one by subtracting one from the other and getting the gap, but thankfully for me someone else decided to look at just that. In the paper “Measuring Empathizing and Systemizing with a Large US Sample,” some researchers decided to measure a couple thousand people for empathizing and systematizing and slice up the data a few different ways. When I read through the paper I was very excited to see that all the same questions that had occurred to me were the ones they were going after, including what this data really looked like when you drilled down into it. That led them to go beyond the traditional “extreme S/extreme E” framing and to try to map people onto this graph:

Now this looks like exactly what I’ve been trying to figure out. So where did people fall?

Well, as you can see, just about half of people of both genders are actually moderate at both empathizing and systematizing. The population I was initially curious about, those low in both, are unusual but not unheard of…a little over 2% of women and just under 4% of men. People who are high in both are about 5% of each gender. There actually weren’t a lot of huge discrepancies on the diagonals here; where you see big differences is in things like the High S/Medium E combo (3 times more common in men than women) and the Medium S/High E combo (around 4 times more common in women than men). We see similar discrepancies with the Medium S/Low E category and the Medium E/Low S category.

So overall, we see that for a majority of the population these two traits move in tandem. However, when they get discordant, they tend to skew in a predictable way for your gender. This is summarized by a different table in the paper, based on a slightly different way of calculating things, the one Baron-Cohen used originally:

Here, “extreme” comes from having a large gap between EQ and SQ; it does not necessarily mean that one was the highest in E or S overall. As we saw in the data above, there is a good chunk of Medium S/Low E men and Medium E/Low S women. Having a gap does not connote hyper-competence in either trait. There were also High S/Medium E men and High E/Medium S women, so the gap doesn’t necessarily mean a deficiency either.

Similarly, “balance” here could mean those people were equally poor on both, or high on both….they just had to be equal. Interestingly, this was part of why the paper authors wanted to go through and break out the scores further. The original idea of taking two different measures and subtracting one from the other to create one measure was designed by Baron-Cohen to try to suss out an “empathy gap” to figure out if that was related to autism. For that purpose it was an interesting idea. But for your everyday person, it collapsed several extremely different categories of people into one big group of “balanced” people that covers 70% of the population. Given that those people represent the large majority of the population and likely have radically different experiences of life, this is a questionable choice.

It also appears to have left people with the impression that empathizing and systematizing are more closely related traits than they really are. By choosing to highlight the 30% of cases where they are discordant, you can give the impression that there’s some sort of trade-off between these two traits. But that’s not the typical experience at all, at least according to this data. The typical experience is that you will be either low, medium, or high in both. Based on your gender, you will then see 25ish% of your group be more extreme on one trait than the other, with 5% going in the other direction.
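
If it helps to see the two ways of slicing side by side, here’s a minimal sketch. Everything in it is made up for illustration (simulated scores, arbitrary cutoffs, not the paper’s data or thresholds); it just shows how the same EQ/SQ numbers look when you bin each trait into low/medium/high versus when you classify people only by the gap between the two, roughly in the spirit of Baron-Cohen’s original approach:

```python
# Illustration only: simulated scores and arbitrary cutoffs, not the paper's
# data or thresholds. Two ways of classifying the same people on EQ and SQ.
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical standardized EQ and SQ scores, positively correlated so that
# most people come out low, medium, or high on both.
eq = rng.normal(0, 1, n)
sq = 0.5 * eq + rng.normal(0, np.sqrt(1 - 0.25), n)

# Scheme 1: bin each trait into low/medium/high (tertile cutoffs, my choice).
def tertile(x):
    lo, hi = np.quantile(x, [1 / 3, 2 / 3])
    return np.where(x < lo, "low", np.where(x < hi, "med", "high"))

cells = np.array([f"E={e}/S={s}" for e, s in zip(tertile(eq), tertile(sq))])
print("Grid classification:")
for label in sorted(set(cells)):
    print(f"  {label:15s} {np.mean(cells == label):5.1%}")

# Scheme 2: ignore the overall level and classify only by the S-E gap.
gap = sq - eq
brain_type = np.where(gap > 1.0, "Extreme S",
             np.where(gap > 0.3, "S",
             np.where(gap > -0.3, "Balanced",
             np.where(gap > -1.0, "E", "Extreme E"))))
print("\nGap-based classification:")
for label in ["Extreme E", "E", "Balanced", "S", "Extreme S"]:
    print(f"  {label:10s} {np.mean(brain_type == label):5.1%}")

# Note how someone low on both and someone high on both can each land in
# "Balanced" under scheme 2 -- exactly the collapsing described above.
```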

To put this in probability of superiority terms, if you pick a random male and a random female out of the population and have them take these tests, the authors say the female would have the higher empathizing score about 67% of the time. The male would have the higher systematizing score about 62% of the time. For comparison, if you did the same experiment with human height, men would be taller 92% of the time. So again, to call this a definitive “male brain” vs “female brain” trait, you are looking at something about a third less compelling than human height. So overall, there’s evidence of a sex linkage here, but it’s good to keep in mind exactly what that looks like in practice.
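
Since “probability of superiority” isn’t an everyday term, here’s the textbook conversion under a simple normal-scores assumption (this is the standard formula, not necessarily the exact calculation the paper used): for two groups separated by d standard deviations, the chance a random member of the higher group outscores a random member of the lower group is the normal CDF evaluated at d/√2. The height gap of roughly 2 standard deviations is my own approximation:

```python
# Textbook normal-model conversion between effect size (Cohen's d) and
# probability of superiority; not necessarily the paper's exact method.
from math import sqrt
from statistics import NormalDist

def prob_superiority(d):
    """P(X > Y) for X ~ N(d, 1) and Y ~ N(0, 1), independent."""
    return NormalDist().cdf(d / sqrt(2))

def d_from_prob(p):
    """Invert: what gap (in SDs) corresponds to this probability?"""
    return NormalDist().inv_cdf(p) * sqrt(2)

# Height: a male-female gap of roughly 2 SDs (my approximation) gives ~92%.
print(f"d = 2.0 -> {prob_superiority(2.0):.0%}")

# Working backwards from the percentages quoted above:
print(f"67% -> d = {d_from_prob(0.67):.2f}")  # empathizing, women vs. men (~0.6 SD)
print(f"62% -> d = {d_from_prob(0.62):.2f}")  # systematizing, men vs. women (~0.4 SD)
```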

Overall, this was a fun little side road to go down. I continue to be rather fascinated by people who appear to have neither empathy nor systematizing tendencies and think we need more research on how these people end up navigating the world. Are they more or less happy than the rest of us? Do they end up with some other trait they are compensating with? Do they have personality disorders? How about on the other end….is being really high on both of these an advantage? Or are these traits unconnected to life success at all?

Many questions here, I’ll keep poking around.

Diagnosing the Literature: When Case Reports Are the Disease

A few months ago I, like many others, was surprised to read the article about how Oliver Sacks had made up many of his most famous stories about his patients. Printed in the New Yorker, this article hit many in the scientific community where it hurt. A lot of us had read Sacks’ books in undergrad or grad school; he was considered an inspiration in both neurology and science communication. Discovering he had made up some of what he wrote about threw his whole body of work into question.

Unfortunately, the problem didn’t stop there. Two months later the New Yorker published another investigative report on a different famous case study, this one about a baby who allegedly showed signs of opioid exposure after breastfeeding from a mother who was taking codeine. The problem? It was made up. But wait, it gets worse. The original case report was followed by another one published in the Canadian journal Paediatrics & Child Health, and this week that journal put a label on ALL the case studies in the series stating that they were fictional.

Retraction Watch has the full story here, but it turns out the situation at this journal is even more confusing than you’d think. At some points (but not all), the journal actually asked authors to submit fictional case studies for learning purposes. Unfortunately this was never flagged on the published items themselves, so readers were left with the impression they were reading real stories. Adding to the confusion, the instructions didn’t always ask for fiction, so some of these reports were real but are now marked as fictional, and that can’t be undone. Confused? Me too.

So why does this matter?

Back when the Sacks thing first came up, the AVI posted on it and Grim asked a really good question in the comments:

It’s far enough outside my area that I’ve never heard of Oliver Sacks before today. Nevertheless, I believe you and bs king when you say that this sort of fraud affects the whole society. How would you characterize the effects it has had, which someone like me might be affect by without being aware of them? It obviously affected your profession and several others, which each exercise influence, for example.

My reply at that time (including some commentary on an earlier Dante comment I’d made) with some parts I think are important bolded:

From my perspective, I took Dante to mean that being a fraud victim (or even merely aware of fraud) erodes trust in society in general, so each new case will keep chipping away at how we all view each other. So if I took Sacks’ writing at his word and found I was deceived, I will subsequently distrust your writing more, merely to ensure I’m not deceived again. It’s a reality of life I guess, but it just strikes me as sad that with everything I read, my first thought needs to be “is this person fabricating this?” So my comment mostly wasn’t about Sacks himself, but about how much skepticism we need to carry to operate in the world. There are downsides to that.

For Sacks specifically, case studies are often the first step in medical research. Some diseases or problems are simply uncommon enough that the first person to notice them can’t do a whole study, so they write up a “here’s a weird thing I saw in a patient” report and then see if anyone else has seen it too. This was the first step in discovering HIV, for example. Sacks’ writing was basically case reports, and they influenced a lot of people’s thinking about where the margins of brain function were. These are supposed to be teaching tools, so you wouldn’t particularly expect people to make them up….there’s no blockbuster finding on the other end, it’s just one person’s “hey, I saw something weird others should be aware of in case they see it too”. Sacks’ books were often assigned in classes because they were so accessible (that’s where I first read him), so a huge number of people were fed erroneous information as part of their learning. Even if it just impacts their idea of how the fringes operate, it’s still incorrect information. That it was done knowingly is even sadder. So overall I agree with AVI: it’s likely not a direct effect, but it means the foundation is shakier than we suspected.

To elaborate further: after working in hospitals for years, I can’t overstate the importance of the anecdotal “you know, I saw something like this once”. It’s what separates really experienced clinicians from new ones. Published case reports tend to help that process, and people pass them around to help with their own sense of what to look for. Knowing whether an anecdote is a made-up example or a real thing someone saw is pretty critical in clinical practice. We’ve all seen made-up examples in textbooks that help explain concepts to us, and those are a fine teaching tool in context. But as the Retraction Watch article ends: “A narrative that is fictional but published in the format of a genuine case report, without disclosure at the time of publication, is functionally indistinguishable from fabrication in the scientific record.”

I will only add to this that case reports are the single hardest thing to fact check. There’s no statistical method to check, no data anomaly to detect. By definition you are talking about something people haven’t seen very often, and the exact details are often covered by medical privacy. Not a great situation.

Alonso de Salazar Frias: Help Me Out Here

Someone recently forwarded me Jonah Goldberg’s column on Alonso de Salazar Frias. That link is probably paywalled, but Wiki has a pretty good summary here. The short version: Frias was a priest and canon lawyer who ended up with a role in the Spanish Inquisition and expressed extreme skepticism over the witchcraft accusations he was hearing. Rather than buy into the hysteria, he decided to test the accusations empirically, looking for consistency in statements and corroborating evidence. Ultimately, he concluded: “I have not found one single proof nor even the slightest indication from which to infer that one act of witchcraft has actually taken place…the testimony of accomplices alone without further support from external facts substantiated by persons who are not witches is insufficient to warrant even one arrest.” This ultimately got him the nickname “the Witches’ Advocate”, and he seems to have been rather successful in getting people freed.

For obvious reasons I am quite taken by this story, but my attempts to locate much information have failed. There appears to be one book on the subject, written in 1980, and it’s $100+ on Amazon and totally unavailable at my local library. So I figured I’d throw it out here and see if anyone knew of any good general books on the Spanish Inquisition that might help me find more information, or if anyone had any other ideas of where to look. Any help appreciated. Update 3/2/26: My lovely brother informed me he’s developed a talent for finding rare/old books and was able to locate a free digitized copy of the specific book I was looking for online! The comments helped tremendously though, as I was able to find several other relevant resources and am also going to inquire with my local library about some others. Many thanks to all who weighed in!

While we’re on the topic though, I figured I’d link to another story of false accusations. A Tiktok psychic from Texas has been ordered to pay a college professor from Idaho $10 million for falsely accusing her of the murders of 4 college students in Moscow, Idaho. This story may sound obvious to the point of being silly, but in light of my true crime series I think it’s worth looking at the details. When I talk about how out of control true crime has gotten, a lot of people ask me rather skeptically, “So what if some idiots say dumb stuff on the internet? People have always said dumb stuff.” This is true, but I think this case is a good example of how dark this stuff can get. The full case docket is here, but I’ll point out a few highlights:

  1. By her own admission, the Tiktok psychic (Ashley Guillard) had no evidence that the woman she accused had anything to do with the brutal murder of 4 college students in the professor’s town. In fact, there was no evidence the professor even knew the murdered students. Per the defendant’s own admission, she pointed the finger at this professor (Rebecca Scofield) because when she heard about the murders she went to the college website, did a psychic reading over each member of the history department, and decided Scofield had bad vibes.
  2. Guillard had 80k+ followers on Tiktok when this started and gained 20k more once she began making these accusations. She put up over 100 videos, repeatedly sharing the professor’s contact information and home address with her fans. While you and I may not believe a psychic can solve crimes, I will note that people who sought one out on Tiktok probably do, and thus took these accusations quite seriously. Scofield experienced harassment, had to install all sorts of security systems at her house, had her workplace flooded with calls and its social media flooded with comments about her, and saw anyone connected to her harassed about her being “the real killer”.
  3. Part of Guillard’s claim about Scofield was an extremely elaborate story about how she had been having an affair with one of the female victims, who was a student. Guillard then claimed Scofield had the 4 students murdered to cover this up. One of the sad parts of the case was that Scofield pointed out that while most people at her workplace dismissed the murder claim, many seemed to give some credence to the affair claim. There was an air of “well, there must be something to this, maybe the truth is in the middle”. But again, by Guillard’s own admission, she got this information from doing a psychic reading off a website photo. Sometimes where there’s smoke there is no fire.
  4. Scofield filed this lawsuit on December 21st, 2022. It took over 3 years for her to get a judgment against someone who admitted she got her information by picking her photo off a website. Just imagine that was you, or someone you love, or heck, even someone you just feel friendly towards. Guillard lives off disability and meager Tiktok revenue, so Scofield will likely never recover the money for her legal bills, will always have people somewhat believing she slept with a student, and for what? The crime of living in the same town a murder happened in and catching the wrong person’s eye at the wrong moment.
  5. All of this occurred against the backdrop of a crime for which someone else ultimately pled guilty, and an accusation for which the defendant admitted she had zero actual evidence. Now imagine if the real killer hadn’t been found, or if Scofield had had a more plausible connection to the victims, say as a neighbor. There still wouldn’t have been any evidence beyond the psychic reading, but one imagines the damage inflicted could have been far greater.

Demanding empirical evidence for serious accusations, even those “just” made on TikTok, should never have gone out of style.

Carlisle Method Take 3: Carlisle Harder

While looking for something else this week I found my old posts on the Carlisle method (2017) and the one-year-later follow-up (2018). It seemed time for yet another update, so here we are.

For those of you without a photographic memory for random data controversies from 9 years ago: the Carlisle method was a statistical method developed by researcher John Carlisle, who was attempting to find a way to identify potentially fraudulent papers more quickly than undertaking laborious investigations. His idea was to look at the baseline data for control and intervention groups and try to detect data anomalies there, on the assumption that authors would have focused much more on their results than on their baseline data, so anomalies would be easier to spot. He named a bunch of studies that appeared to have skewed baseline data, and others took it from there.
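To give a flavor of the idea (this is not Carlisle’s exact procedure, and the baseline numbers below are made up): under genuine randomization, p-values recomputed from the reported baseline summary statistics should look roughly uniform, so a pile-up of implausibly similar (or dissimilar) baselines is worth a second look.

```python
# Hedged sketch of the general idea, not Carlisle's exact procedure: recompute p-values
# for baseline comparisons from the reported summary statistics, then ask whether they
# look roughly uniform, as they should under genuine randomization.
from scipy import stats

def baseline_p_value(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t-test p-value recomputed from reported group summary statistics."""
    _, p = stats.ttest_ind_from_stats(mean1, sd1, n1, mean2, sd2, n2, equal_var=False)
    return p

# Made-up reported baseline rows: (mean, sd, n) for control vs. intervention.
reported_baselines = [
    ((54.1, 8.2, 60), (54.0, 8.1, 60)),      # age
    ((27.3, 3.9, 60), (27.4, 4.0, 60)),      # BMI
    ((121.0, 11.5, 60), (121.1, 11.4, 60)),  # systolic BP
]

p_values = [baseline_p_value(*ctrl, *interv) for ctrl, interv in reported_baselines]
# A real paper would have many more baseline variables; a simple uniformity check
# (here a Kolmogorov-Smirnov test) flags p-values that all cluster suspiciously near 1.
ks_stat, ks_p = stats.kstest(p_values, "uniform")
print(p_values, ks_stat, ks_p)
```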

Interestingly, while some studies did end up having to be corrected, it also became clear this method was not always detecting fraud. In a few cases the statistics had simply been mislabeled. In the most notable case, it turned out the study authors had not been clear on how their samples were selected, and they had to update their results without some of their original data.

So what’s happened since then? Well, in 2021 Carlisle decided to use his prior method and his standing as a journal editor to take things up a notch. While his initial method was a quick screen, he now developed a tool to flag papers that might have a problem. The flags included “previous false data from one or more authors or the research institute; inconsistencies in registered protocols; content copied from published papers, including tables, figures and text; unusually dissimilar or unusually similar mean (SD) values for baseline variables; or incredible results”. If a paper was flagged as having these risk factors, he would ask for a spreadsheet with the patient-level data in it so he could look more closely and make sure it was ok.

Unsurprisingly, he found problems. But what happened next was even worse.

When Carlisle followed up with the universities these papers were coming from, he discovered they were not overly anxious to investigate the concerns he was raising, which concerned him even more. So starting in 2019, Carlisle decided the journal would ask for patient-level data from everyone submitting from the countries that submitted the most papers: Egypt, China, India, Iran, Japan, South Korea, and Turkey. The results were not encouraging:

Basically, when Carlisle screened only high-risk papers, he found about 10 “false” papers in 2 years. When he screened everyone, he found 60+ in the next 2 years. Yikes. Just to clarify what he means by “false” or “zombie”, here it is in his words:

Data I categorised as false included: the duplication of figures, tables and other data from published work; the duplication of data in the rows and columns of spreadsheets; impossible values; and incorrect calculations. I have chosen the word ‘zombie’ to indicate trials where false data were sufficient that I think the trial would have been retracted had the flaws been discovered after publication. The varied reasons for declaring data as false precluded a single threshold for declaring the falsification sufficient to deserve the name ‘zombie’, although I have explicitly stated my reasoning for each trial in the online Supporting Information (Appendix S1).
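Some of those checks lend themselves to partial automation. This isn’t Carlisle’s actual workflow, but here’s a rough sketch of the duplicate-row, duplicate-column, and impossible-value checks you could run on a submitted patient-level spreadsheet (the column names, plausibility ranges, and toy table are all made up for illustration):

```python
# Hedged sketch, not Carlisle's actual workflow: a few automated checks you could run on
# a submitted patient-level spreadsheet for the kinds of problems he describes. The
# column names and plausibility ranges below are hypothetical.
import pandas as pd

def screen_patient_data(df: pd.DataFrame) -> dict:
    """Flag duplicated rows/columns and impossible values in a patient-level table."""
    flags = {}
    # Duplicated rows: identical patient records repeated in the spreadsheet.
    flags["duplicate_rows"] = int(df.duplicated().sum())
    # Duplicated columns: one variable copy-pasted under two different names.
    flags["duplicate_columns"] = int(df.T.duplicated().sum())
    # Impossible values, judged against crude plausibility ranges.
    plausible = {"age": (0, 110), "heart_rate": (20, 250), "sbp": (50, 300)}
    for col, (lo, hi) in plausible.items():
        if col in df.columns:
            flags[f"impossible_{col}"] = int(((df[col] < lo) | (df[col] > hi)).sum())
    return flags

# Toy example, fabricated purely for illustration: one duplicated patient record and one
# impossible age should both get flagged.
toy = pd.DataFrame({"age": [34, 34, 190], "heart_rate": [72, 72, 68], "sbp": [118, 118, 121]})
print(screen_patient_data(toy))
```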

So overall 14% of papers submitted had substantial flaws and 8% were retraction worthy, but that rate went way up after they started requesting data from everyone. Unfortunately Carlisle ended by mentioning a few fairly discouraging things:

  1. He has no reason to believe his journal was attracting particularly bad papers. One might actually assume the opposite, given that he had been very publicly out fighting fraud for several years before this.
  2. It took him a really long time to look through the spreadsheets, and sometimes he only caught the fake data on the 2nd, 3rd or 4th look.
  3. Fraud can actually happen at any level of research, which makes it scary. In one case he mentions, the researchers discovered it was a med student they were working with who had made up the data. We think of scientific fraud as coming from the big name getting the credit, but you can see how it might actually be an overwhelmed lower-level person, trying to deliver results to that big name, who provides the fake data.
  4. Nothing stops people from submitting these papers to other journals that don’t have this level of scrutiny.

In the end Carlisle concludes that these types of data errors or fraud are so common that developing screening tools for them should be a primary goal of journals, lest they risk up to a quarter of their studies being retraction worthy. Not great, but thank God for people like Carlisle.