There’s More to that Story: 4 Psych 101 Case Studies

Well it’s back to school time folks, and for many high schoolers and college students, this means “Intro to Psych” is on the docket. While every teacher teaches it a little differently, there are a few famous studies that pop up in almost every textbook. For years these studies were taken at face value; however, with the onset of the replication crisis many have gotten a second look and turned out to be a bit more complicated than originally thought. I haven’t been in a psych classroom for quite a few years, so I’m hopeful the teaching of these has changed, but just in case it hasn’t, here’s a post with the extra details my textbooks left out.

Kitty Genovese and the bystander effect: Back in my undergrad days, I learned all about Kitty Genovese, murdered in NYC while 37 people watched and did nothing. Her murder helped popularize the term “bystander effect”, the idea that large groups of people do nothing because each person assumes someone else will act. It also helped prompt the creation of “911”, the emergency number we can all call to report anything suspicious.

So what’s the problem? Well, the number 37 was made up by the reporter, and likely not even close to true. The New York Times, which published the original article on the crime, called its own reporting “flawed” in 2016. In 2015 Kitty’s brother made a documentary investigating what happened, and while there are no clear answers, what is clear is that a murder that occurred at 3:20am probably didn’t have 37 witnesses who saw anything, or even understood what they were hearing.

Zimbardo/Stanford Prison Experiment: The Zimbardo (or Stanford) Prison Experiment is a famous experiment in which study participants were asked to act as prisoners or guards in a multi-day recreation of a prison environment. However, things quickly got out of control: the guards got so cruel and the prisoners so rowdy that the whole thing had to be shut down early. This supposedly showed the tendency of good people to immediately conform to expectations when put in bad circumstances.

So what’s the problem? Well, basically the researcher coached a lot of the bad behavior. Seriously, there’s audio of him doing it, which directly contradicts his own later statements that no instructions were given. Reporter Ben Blum went back and interviewed some of the participants, who said they were acting how they thought the researchers wanted them to act. One guy said his freak-out was deliberate: he wanted to get back to studying for his GREs and thought a meltdown would get him released early. Can bad circumstances and power imbalances lead people to act in disturbing ways? Absolutely, but this experiment does not provide the straightforward proof it’s often credited with.

The Robbers Cave Study: A group of boys at a wilderness camp are divided into two teams. They end up fighting each other based on nothing other than team assignment, but then come back together when facing a shared threat. This shows how tribalism works, and how we can overcome it through common enemies.

So what’s the problem? The famous, most-reported-on study was actually take two of the experiment. In the first version the researchers couldn’t get the boys to turn on each other, so they ran a second attempt, eliminating everything they thought had added group cohesion the first time around, and finally got the boys to behave as they wanted. There’s a whole book written about it, and it showcases some rather disturbing behavior on the part of the head researcher, Muzafer Sherif. He was never clear with the parents about what type of experiment the boys were subjected to, and he both destroyed personal belongings himself (to blame it on the other team) and egged the boys on in their destruction. When Gina Perry wrote her book, she found that many of the boys who participated (now in their 70s) were still unsettled by the experiment. Not great.

Milgram’s electric shock experiment: A study participant is brought into a room and asked to administer electric shocks to a person they can’t see in another room. When the hidden person gets a question “wrong”, the participant is supposed to zap them to help them learn, and a recording plays of someone screaming in pain. It was found that 65% of people would administer what they were told was a dangerous, potentially fatal shock as long as the researcher kept encouraging them to do so. This shows that our obedience to authority can override our own ethics.

So what’s the problem? Well, this one’s a little complicated. The original study was actually 1 of 19 variations conducted, all with differing rates of compliance, and the most often reported findings come from the version that produced the highest compliance. A more recent study also reanalyzed participants’ behavior in light of their (self-reported) belief about whether the learner was actually in pain. One of the things the researchers told people to get them to continue was that the shocks were not dangerous, and it also appears many participants didn’t think what they were participating in was real (and it wasn’t). Those who either believed the researchers’ assurances or expressed skepticism about the entire setup were far more likely to administer the higher voltages than those who believed the experiment was legit. To note though, there have been replication attempts that did find compliance rates comparable to Milgram’s, though the shock voltage has always been lower due to ethics concerns.

So overall, what can we learn from this? Well, first and foremost, that once study results hit psych textbooks, the errors can be really hard to correct. Even if kids today aren’t learning these things, many of us who took psych classes before the more recent scrutiny of these studies may keep repeating them.

Second, I think we actually can conclude something rather dark about human nature, even if it’s not what we first thought. The initial conclusion of these studies is always something along the lines of “good people have evil lurking just under the surface”, when in reality the researchers had to try a few times to get the result they wanted. And yet this also shows us something: a person dedicated to producing a particular outcome can eventually get it, given enough tries. One suspects that many evil acts were carried out after the instigators had been trying to inflame tensions for months or years, slowly learning what worked and what didn’t. In other words, random bad circumstances don’t produce human evil, but dedicated people probably can produce it if they try long enough. Depressing.

Alright, any studies you remember from Psych 101 that I missed?

Absolute Numbers, Proportions and License Suspensions

A few weeks ago I mentioned a new-ish Twitter account that provides a rather valuable public service by Tweeting out the absolute vs relative risk behind various news articles. It’s a good account because far too often scientific news is reported with headlines like “Cancer risk doubled” (relative risk) when the absolute risk went from 0.02% to 0.04%. Ever since I saw that account, I’ve wondered about starting an “absolute numbers vs proportions” account along the same lines: one that follows up news stories quoting absolute numbers and checks whether the proportional rates tell a different story.
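
To make the distinction concrete, here’s a minimal sketch in Python using those same hypothetical numbers:

```python
# Hypothetical numbers matching the example above: risk goes from 0.02% to 0.04%.
baseline_risk = 0.0002
new_risk = 0.0004

relative_risk = new_risk / baseline_risk              # 2.0 -> "risk doubled!"
absolute_increase = (new_risk - baseline_risk) * 100  # in percentage points

print(f"Relative risk: {relative_risk:.1f}x")                           # 2.0x
print(f"Absolute increase: {absolute_increase:.2f} percentage points")  # 0.02
```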

I was thinking about this again today because some of my New Hampshire based readers asked me this week to comment on a recent press conference held by the Governor of New Hampshire about the state’s investigation into its license suspension practices.

Some background: A few months ago there was a massive crash in Randolph, New Hampshire that killed 7 motorcyclists, many of them former Marines. The man responsible was a truck driver from Massachusetts who crossed into their lane. In the wake of the tragedy, a detail emerged that made the whole thing even more senseless: he never should have been in possession of a valid driver’s license. In addition to infractions spread over several states, a recent DUI in Connecticut should have resulted in him losing his commercial driver’s license in Massachusetts. However, it appears the Massachusetts RMV never processed the suspension notice, so he was still driving legally. Would suspending his license have stopped him from driving that day? It’s not clear, but it certainly seems like things could have played out differently.

In the wake of this, the head of the Massachusetts RMV resigned, and both Massachusetts and New Hampshire ordered reviews of their processes for handling suspension notices sent to them by other states.

So back to the press conference. In it, Governor Sununu revealed the findings of their review, but took great care to emphasize that New Hampshire had done a much better job than Massachusetts in reviewing their out of state suspensions. He called the difference between the two states “night and day” and said “There was massive systematic failure in the state of Massachusetts. [The issue in MA was] so big; so widespread; that was not the issue here.”

He then provided numbers to back up his claim. The two comparisons in the article above say that NH found a backlog of 13,015 notices while MA’s was 100,000, and that NH sent suspension notices to 904 drivers based on the findings while MA had to send 2,476. Definitely a big difference, but I’m sure you can see where I’m going with this. The population of MA is just under 7 million people, and NH is just about 1.3 million. Looking at just the number of licensed drivers, it’s 4.7 million vs 1 million, so basically a 5:1 ratio of MA to NH. Thus NH’s backlog of 13,000 would proportionally be about 65,000 in MA (agreeing with Sununu’s point), but the 904 suspensions are proportionally much higher than MA’s 2,476 (disagreeing with Sununu’s point). Put in the standard “per 100,000” terms, MA sent suspension notices to about 52 people per 100,000 drivers; NH sent 90 per 100,000.
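
For anyone who wants to check the arithmetic, here’s the back-of-the-envelope version using the rounded driver counts above:

```python
# Back-of-the-envelope rates using the rounded figures above.
ma_drivers, nh_drivers = 4_700_000, 1_000_000
ma_backlog, nh_backlog = 100_000, 13_015
ma_suspensions, nh_suspensions = 2_476, 904

def per_100k(count, drivers):
    return count / drivers * 100_000

print(f"Backlog per 100k drivers:     MA {per_100k(ma_backlog, ma_drivers):,.0f}, "
      f"NH {per_100k(nh_backlog, nh_drivers):,.0f}")
print(f"Suspensions per 100k drivers: MA {per_100k(ma_suspensions, ma_drivers):.0f}, "
      f"NH {per_100k(nh_suspensions, nh_drivers):.0f}")
# Backlog: MA ~2,128 vs NH ~1,302 per 100k (supports Sununu's point)
# Suspensions: MA ~53 vs NH ~90 per 100k (cuts the other way)
```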

I couldn’t find the full press conference video or the white paper they said they wrote, so I’m not sure if this proportionality issue was mentioned, but it wasn’t in anything I read. There were absolutely some absurd failures in Massachusetts, but I’m a bit leery of comparing absolute numbers when the base populations are so different. Base rates are an important concept, and one we should keep in mind, with or without a cleverly named Twitter feed.

Math aside, I do hope that all of these reforms help prevent similar issues in the future. This was a terrible tragedy, and unfortunately one that uncovered real gaps in the system that was supposed to deal with this sort of thing. Here’s hoping for peace for the victims’ families, and that everyone has a safe and peaceful Labor Day weekend!


Are You Rich?

A few weeks ago the New York Times put up a really interesting interactive “Are You Rich?” calculator. While I always appreciate “richness” calculators that take metro region into account (a surprising number don’t), I think the most fascinating part is that they ask you to define “rich” before they give you the results.

This is interesting because of course many people use the word “rich” to simply mean “has more than I do”, so asking for a definition before giving results could surprise some people. In fact, they include this graph showing that about a third of people in the 90th percentile for income still say they are “average”:

Now they include some interesting caveats here, and point out that not all of these people are delusional. Debt is not taken into account in these calculations, so a doctor graduating med school with $175,000 in debt might quite rightfully feel their income was not the whole story. Everyone I know (myself included) who finishes up with daycare and moves their kid into public school jokes about the massive “raise” you get when you do that. On the flip side, many retirees have very low active income but might have a lot in assets that would give them a higher ranking if those were included.

That last part is relevant for this graph here, showing perceived vs actual income ranking. The data’s from Sweden, but it’s likely we’d see a similar trend in the US:

The majority of those who thought they were better off than they actually were are below the 25th percentile, but we don’t know what they had in assets.

For the rest of it, someone pointed out on Twitter that while “rich people who don’t think they’re rich” get a lot of flak, believing you’re less secure than you are is probably a good thing. It likely pushes you to prepare for a rainy day a bit more. A country where everyone thought they were better off than they really were would likely be one where many people made unwise financial decisions.

It’s interesting to note that the Times published this in part because finding out where you fall on the income distribution curve is known to change your feelings about various taxation plans. In the Swedish study that generated the graph above, those who discovered they were in the upper half tended to be less supportive of social welfare taxation programs after they got the data. One wonders if some enterprising political candidate will eventually figure out how to put kiosks at rallies, or calculators in emails, to help people figure out whether they benefit or not.

Life Expectancy and Record Keeping

Those of you who follow any sort of science/data/skepticism news on Twitter will almost certainly have heard of the new pre-print taking the internet by storm this week: “Supercentenarians and the oldest-old are concentrated into regions with no birth certificates and short lifespans”.

This paper is making a splash for two reasons:

  1. It is taking on a hypothesis that has turned into a cottage industry over the years.
  2. The statistical reasoning makes so much sense that you feel a little silly for not questioning point #1 earlier.

Of course #2 may be projection on my part, because I have definitely read the whole “Blue Zone” hypothesis (and one of the associated books) and never questioned the underlying data. So let’s go over what happened here, shall we?

For those of you not familiar with the whole “Blue Zone” concept, let’s start there. The Blue Zones were popularized by Dan Buettner, who wrote a long article about them for National Geographic magazine back in 2005. The article highlighted several regions in the world that seemed to have extraordinary longevity: Sardinia (Italy), Okinawa (Japan) and Loma Linda (California, USA). All of these areas seemed to have a well above average number of people living to 100, so researchers studied their habits to see if they could find anything the rest of us could learn. In the original article, the resulting advice looked like this:

This concept proved so incredibly popular that Dan Buettner was able to write a book, then follow-up books, then build a whole company around the concept. Eventually Ikaria (Greece) and the Nicoya Peninsula (Costa Rica) were added to the list.

As you can see, the ultimate advice list obtained from these regions looks pretty good on its face. The idea that not smoking, good family and social connections, daily activity, and fruits and vegetables are good for you certainly isn’t turning conventional wisdom on its head. So what’s being questioned?

Basically, the authors of the paper didn’t feel that alternative explanations for the longevity had been adequately tested, specifically the hypothesis that not all of these people were as old as they said they were, or that otherwise bad record keeping was inflating the numbers. While many of the countries didn’t have clean data sets, the authors were able to pull some data sets from the US, and discovered that the chances of having people in your district live until they were 110 fell dramatically once statewide birth registration was introduced:

Now this graph is pretty interesting, and I’m not entirely sure what to make of it. There seems to be a peak around 15 years before implementation, with some notable fall off before birth registration is even introduced. One suspects birth registration might be a proxy for expanding records or increased awareness of birth year generally. Actually, now that I think about it, I bet we’re catching some WWI and WWII related things in here. I’m guessing the fall off before complete birth registration had something to do with the drafts around those wars, when proving your age would have been very important. The paper notes that the most supercentenarians were born between 1880 and 1900, and there was a draft in 1917 for men 21-30. It would be interesting to see if there’s a cluster of men with birth years just prior to 1887. Additionally, the WWII draft starting in 1941 went up to age 45, so I wonder if there’s a cluster at 1897 or just before. Conversely, family lore says my grandfather exaggerated his age to join the service early in WWII, so it’s possible there are clusters at the young end too.

The other interesting thing about this graph is that it focuses on supercentenarians, aka those who live to 110 or beyond. I’d be curious to see the same data for centenarians (those who live to 100) to see if it’s as dramatic. A quick Google suggests that being a supercentenarian is really rare (300ish in the US out of 320 million people), but there are 72,000 or so centenarians, and those living to 90 or over number well over a million. It’s much easier to overwhelm very rare event data with noise than more frequent data. I have the Blue Zones book on Kindle, so I did a quick search and noticed that he mentions supercentenarians 5 times, all on the same page. Centenarians are mentioned 167 times.
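
To put those ballpark counts in per-capita terms (the 90+ figure is my own rough stand-in for “well over a million”):

```python
# Rough US prevalence per million, using the ballpark counts above.
us_population = 320_000_000
counts = {
    "supercentenarians (110+)": 300,   # "300ish"
    "centenarians (100+)": 72_000,
    "age 90+": 1_500_000,              # rough guess for "well over a million"
}

for group, n in counts.items():
    rate = n / us_population * 1_000_000
    print(f"{group}: ~{rate:,.1f} per million people")
# supercentenarians: ~0.9 per million; centenarians: ~225 per million
```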

This is relevant because if we saw a drop off in all advanced ages when birth registrations were introduced, we’d know the records were potentially fraudulent across the board. However, if only the rarest ages were impacted, then we start to get into issues like typos or other very rare errors as opposed to systematic misrepresentation. Given the splash this paper has made already, I suspect someone will do that study soon. Additionally, the only US based “Blue Zone”, Loma Linda, California, does not appear to have been studied specifically at all. That also may be worth looking at to see if the pattern still holds.

The next item the paper took a shot at was the non-US locations, specifically Okinawa and Sardinia. From my reading I had always thought those areas were known for being healthy and long lived, but the paper claims they are actually some of the poorest areas with the shortest life expectancies in their countries. This was a surprise to me as I had never seen this mentioned before. But here’s their data from Sardinia:

The Sardinian provinces are in blue, and you’ll note that there is eventually a negative correlation between “chance of living to 55” and “chance of living to 110”. Strange. In the last graph there seem to be 3 provinces in particular that are pulling the correlation negative, and one wonders what’s going on there. Considering Sardinia as a whole has a population of 1.6 million, it would only take a few errors to produce that rate of longevity.
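
As a toy illustration of how little it takes (my own sketch, not the paper’s analysis): at a true rate of about one per million, a province of a few hundred thousand people should usually have zero supercentenarians, so a mere handful of clerical errors is enough to turn it into an apparent longevity hotspot.

```python
import random

random.seed(0)

# Toy model: suppose the true supercentenarian rate is ~1 per million,
# consistent with the rough US numbers above.
true_rate = 1 / 1_000_000
province_pop = 300_000   # hypothetical smallish province

# Simulate the true count -- the expected value is 0.3, so usually zero.
true_count = sum(random.random() < true_rate for _ in range(province_pop))

# Now add just three clerical errors (wrong birth year, unregistered death).
observed = true_count + 3

print(f"True 110+ count: {true_count}, observed with errors: {observed}")
print(f"Observed rate: {observed / province_pop * 1_000_000:.0f} per million "
      f"vs ~1 per million true")
```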

On the other hand, I was a little surprised to see the author cite Sardinia as having one of the lowest life expectancies. Exact quote: “Italians over the age of 100 are concentrated into the poorest, most remote and shortest-lived provinces.” In looking for a citation for this, I found this report on Wiki (in Italian). It had this table:

If I’m using Google Translate correctly, Sardegna is Sardinia, and this is a life expectancy table from 2014. While it doesn’t show Sardinia having the highest life expectancy, it doesn’t show it having the lowest either. I tried pulling the Japanese reports, but unfortunately the one that looks most useful is in Japanese. As noted though, the paper hasn’t yet gone through peer review, so it’s possible some of this will be clarified.

Finally, I was a little surprised to see the author say “[these] patterns are difficult to explain through biology, but are readily explained as economic drivers of pension fraud and reporting error.” While I completely agree about errors, I do actually think there’s a plausible mechanism by which a poor population that less often reaches 55 could still produce longer lifespans at the high end. Deaths under 55 tend to be from things like accidents, suicide, homicide and congenital anomalies: external forces. The CDC lists the leading causes of death by age group here:

Over 55, we mostly switch to heart disease and cancer. A white collar office worker with a high stress job and bad eating habits may be more likely to live to 55 than a shepherd who could get trampled, but once they’re both 75 the shepherd may get the upper hand.

I’m not doubting the overall hypothesis, by the way: I do think fraud or errors in record keeping can introduce issues into the data. Checking outliers to make sure they aren’t errors is key, and some skepticism about source data is always warranted. After writing most of this post though, I decided to check back in on the Blue Zones book to see if they addressed this. To my surprise, the book claims that at least in Sardinia, this was actually done. On pages 25 and 26, they mention specifically how much doubt they faced and how one doctor personally examined about 200 people to help establish the truthfulness of their ages. Dr Michel Poulain (a Belgian demographer) apparently was nominated by a professional society specifically to go to Sardinia to check for signs of fraud, and according to the book he visited the region ten times to review records and interview people. I have no idea how thorough he was or how his methods hold up, but his work seems at odds with the idea that someone just blindly pulled ages out of a database, or with the paper’s claim that “These results may reflect a neglect of error processes as a potential generative factor in remarkable age records”. Interestingly, I’d imagine WWI and WWII actually help with much of this work: most people have very vivid memories of where they were and what they were doing during the war years, and those stories might go far toward establishing age.

Basically, it seems like sporadic exaggeration, error or fraud might give mistaken impressions about how many supercentenarians there are overall, but I do wonder if having an unusual cluster brings enough scrutiny that we don’t have to worry as much that something was missed. In the Blue Zones book, they mention that the group that brought attention to the Sardinians had helped debunk 3 other similar claims. Also, as mentioned, the paper doesn’t say whether the one US blue zone was among those to get late birth registration, but I do know the Seventh Day Adventists are one of the most intensely studied groups in the country.

Anyway, given the attention and research that have been paid to these areas, I’d imagine we’re going to hear some responses soon. Dr Poulain appears to still be active, and one suspects he will be responding to this questioning of his work. This post is getting my “things to check back in on” tag. Stay tuned!


Beard Science

As long as I’ve been alive, my Dad has had a full beard [1].

When I was a kid, this wasn’t terribly common. Over the years beards have become surprisingly popular, and now the fact that his is closely trimmed is the uncommon part.

With the sudden increase in the popularity of beards, studying how people perceive bearded vs clean shaven men has gotten more popular too. Some of this research is about how women perceive men with beards, and there’s actually a “peak beard” theory suggesting that women’s preference for beards goes up as the number of men with beards goes down, and vice versa.

This week though, someone decided to study a phenomenon that has always fascinated me: small children’s reactions to men with beards. Watching my Dad (a father of 4 who is pretty good with kids) over the years, we have noted that kids do seem a little unnerved by the beard. Babies who have never met him seem to cry more often when handed to him, and toddlers seem more frightened of him. The immediacy of these reactions has always suggested that there’s something about his appearance that does it, and the beard is the obvious suspect.

Well, some researchers must have had the same thought, because a few weeks ago a paper “Children’s judgements [sic] of facial hair are influenced by biological development and experience” was published that looks at children’s reactions to bearded men. The NPR write-up that caught my eye is here, and it led with this line: “Science has some bad news for the bearded: young children think you’re really, really unattractive.” Ouch.

I went looking for the paper to see how true that was, and found that the results were not quite as promised (shocking!). The study had an interesting setup. They had 37 male volunteers get pictures taken of themselves clean shaven, then had them all grow a beard over the next 4-8 weeks and took another picture. This of course controls for any sort of selection bias, though of note, the subjects were all of European descent. Children were then shown the two pictures of the same man and asked things like “which face looks best?” and “which face looks older?”. The results are here:


So basically the NPR lead-in contained two slight distortions of the findings: kids never ranked anyone as “unattractive”, they just picked which face they thought looked best, and young kids actually weren’t the most down on beards; tweens were.

Interestingly, I did see a few people on Twitter note that their kids love their father with a beard, and it’s good to note the study actually looked at this too. The rankings used to make the graph above were based purely on preferences about strangers, but the researchers did ask kids if they had a father with a beard. For at least some measures in some age groups, having exposure to beards made kids feel more positively about them, and for adults, having a father or acquaintances with beards in childhood resulted in finding beards more attractive in adulthood. It’s also good to note that the authors used the Bonferroni correction to account for multiple comparisons, so they were rigorous in looking for these associations.
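
For anyone who hasn’t run into it, the Bonferroni correction just divides your significance threshold by the number of comparisons you’re making. A minimal sketch with made-up p-values (not the paper’s actual numbers):

```python
# Minimal Bonferroni sketch with made-up p-values for illustration.
alpha = 0.05
p_values = [0.001, 0.012, 0.030, 0.045]   # hypothetical results from 4 tests

threshold = alpha / len(p_values)          # 0.05 / 4 = 0.0125

for p in p_values:
    verdict = "significant" if p < threshold else "not significant"
    print(f"p = {p:.3f}: {verdict} at corrected threshold {threshold:.4f}")
```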

Overall, some interesting findings. Based on the discussion, the working theory is that early on kids are mostly exposed to people with smooth faces (their peers, women), so they find smooth faces preferable. Apparently early adolescence is associated with an increased sensitivity to sex-specific traits, which may be why the dip occurs at ages 10-13. They didn’t report the gender breakdown, so I don’t know if it’s girls or boys changing their preference, or both.

No word if anyone’s working on validating this scale:

[1] Well, this isn’t entirely true; there were two exceptions. Both times he looked radically different in a way that unnerved his family, but I was fascinated to note that some of his acquaintances/coworkers couldn’t figure out what was different. The beard covers about a third of his face. This is why eyewitness testimony is so unreliable.

What I’m Reading: July 2019

As always, Our World in Data provides some interesting numbers to think about, this time with food supply and caloric intake by country.

This article on chronic Lyme disease and the whole “medical issue as personal identity” phenomenon was REALLY good and very thought-provoking.

Ever want to know where the Vibranium from Black Panther would land on the periodic table of elements? Well, now there’s a paper out to help guide your thinking. More than just a fun paper to write up, the professors involved actually asked their students this on an exam to see how they would reason through it. I’m in favor of questions like this (provided kids know in advance to have watched the movie), as I think it can engage some different types of critical thinking in a way that’s more fun than traditional testing.

I mentioned to someone recently that I have a white noise app on my phone, but that after testing it out I found brown noise more effective than white noise in helping me sleep. They asked what the difference was, and I found this article that explains the different colors of noise. In my experience, the sounds most likely to interfere with sleep hang out at the low end of the spectrum, YMMV.
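
If you’re curious what “color” means here, it refers to the shape of the noise’s power spectrum. A quick numpy sketch (my own toy example, not from the article) showing how brown noise piles its energy at the low end:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# White noise: equal power at every frequency.
white = rng.standard_normal(n)

# Brown (red) noise: a running sum of white noise. Its power falls off
# as 1/f^2, concentrating the energy at the low end of the spectrum.
brown = np.cumsum(white)
brown /= np.abs(brown).max()   # normalize to [-1, 1] for playback

def low_freq_share(signal):
    """Fraction of total power in the lower half of the frequency range."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    return power[: len(power) // 2].sum() / power.sum()

print(f"White noise low-frequency share: {low_freq_share(white):.2f}")  # ~0.5
print(f"Brown noise low-frequency share: {low_freq_share(brown):.2f}")  # ~1.0
```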

A new study “Surrogate endpoints in randomised controlled trials: a reality check” gives an interesting word of warning to the cancer world. It’s common in clinical trials to use surrogate endpoints like “progression free survival” or “response rate” to figure out if drugs are working. This is done because overall survival data can take a long time to collect, researchers/patients/drug companies want results faster, and it seems like if the surrogate markers are good, the drugs can’t possibly hurt.

Unfortunately, it appears this isn’t the case. Patients on a new drug, venetoclax, were found to have better progression free survival, but eventually twice as many deaths as those given the regular treatment. Ugh. The lead author has a great Twitter thread on his paper here, where he suggests this means that either the drug is a “double-edged sword” with both better efficacy and higher toxicity than the alternatives, or a “wolf in sheep’s clothing” that makes things look good for a while but causes changes that make relapse swift and deadly. Lots to think about here.

Finally, SSC has a good post up on bias arguments here. I especially like his points about when they are relevant.

Fentanyl Poisoning and Passive Exposure

The AVI sent along this article this week, which highlights the rising concern about passive fentanyl exposure among law enforcement.  They have a quote from a rehab counselor who claims that just getting fentanyl on your skin would be enough to addict you, and that merely entering a room where it’s in the air could cause instant addiction. Given that it’s Reason Magazine, they then promptly dispute the idea that this is actually happening.

I was interested in this article in part because my brother’s book contained the widely reported anecdote about the police officer who overdosed just by brushing fentanyl off a fellow police officer. This anecdote has since been seriously questioned, and Tim expressed concern that had he realized this he would have left it out. I’ll admit that since my focus was mostly on his referenced scientific studies, I didn’t end up looking up the various anecdotes he included.

This whole story illustrates an interesting problem in health reporting. STAT News has more here, but there are a couple of things I noted. First, the viral anecdote really was widely reported, so I’m not surprised my brother heard about it. It has never technically been disproven: outside experts have said “it almost certainly couldn’t have happened this way”, but neither the police officer nor the department have commented further. This makes it hard for the “probably not” articles to gain much traction.

Second, the “instant addiction” claim was being pushed by a rehab counselor, not by toxicologists who actually study how drugs interact with our bodies. Those experts point out that it took years to create a fentanyl patch that could get the drug absorbed through the skin, so the idea that casual skin contact is as effective as ingesting or breathing it in seems suspect.

Third, looking at the anecdotes, we realize these stories are NOT being reported by the highest risk groups. Pharmacists would be far more likely than police officers to accidentally brush away fentanyl, yet we do not hear these stories arising in hospital pharmacies. Plenty of patients have been legally prescribed fentanyl and do not suffer instant addiction. The fact that the passive exposure risk seems to arise only in those who are around fentanyl in high stress circumstances suggests other things may be complicating the picture.

While this issue itself may be small in the grand scheme of things, it’s a good anecdote to support the theory that health fake news may actually cause the most damage. While political fake news seems to have most of our attention, fake or overblown stories about health issues can directly influence public policy and behavior. As the Reason article points out, if first responders delay care to someone who has overdosed because they are taking precautions against a risk that turns out to be overblown, the concern will hurt more people than it helps. Sometimes an abundance of caution really can have negative outcomes.