Praiseworthy Wrongness: Genes in Space

Given my ongoing dedication to critiquing bad headlines/stories, I’ve decided to start making a regular-ish feature of people who get things wrong and then work to make them right. Since none of us can ever be 100% perfect, I think a big part of cutting down on errors and fake news is going to be lauding those who are willing to walk back what they say if they discover they made an error. I started this last month with an example of someone who realized she had asserted she was seeing gender bias in her emails when she wasn’t. Even though no one had access to the data but her, she came clean that her knee-jerk reaction had been wrong, and posted a full analysis of what happened. I think that’s awesome.

Two days ago, I saw a similar issue arise with Live Science, which had published a story stating that after one year in space astronaut Scott Kelly had experienced significant changes (around 7%) to his genetic code. The finding was notable since Kelly is one of a pair of identical twins, so it seemed there was a solid control group.

The problem? The story got two really key words wrong, and it changed the meaning of the findings. The original article reported that 7% of Kelly’s genetic code had changed, but the 7% number actually referred to gene expression. The 7% was also a subset of changes….basically, out of all the genes that changed their expression in response to spaceflight, 7% of those changes persisted after he came back to Earth. This is still an extremely interesting finding, but nowhere near as dramatic as finding out that twins were no longer twins after spaceflight, or that Kelly wasn’t really human anymore.

While the error was regrettable, I really appreciated what Live Science did next. Not only did they update the original story (with a notice that they had done so), they also published a follow-up under the headline “We Were Totally Wrong About that Scott Kelly Space Genes Story” explaining further how they erred. They also Tweeted out the retraction with this request:

This was a nice way of addressing a chronic problem in internet writing: controversial headlines tend to travel faster than their retractions. By specifically noting this problem, Live Science reminds us all that they can only do so much in the correction process. Fundamentally, people have to share the correction at the same rate they shared the original story for it to make a difference. While ultimately the original error was their fault, it will take more than just Live Science to spread the correct information.

In the new age of social media, I think it’s good for us all to take a look at how we can fix things. Praising and sharing retractions is a tiny step, but I think it’s an important one. Good on Live Science for doing what they could, then encouraging social media users to take the next step.

YouTube Radicals and Recommendation Bias

The Assistant Village Idiot passed along an interesting article about concerns being raised over YouTube’s tendency to “radicalize” suggestions in order to keep people on the site. I’ve talked before about the hidden dangers of the biases algorithms can introduce into our lives, and this was an interesting example.

Essentially, it appears that YouTube has a tendency to suggest more inflammatory or radical content in response both to regular searches and to watching more “mainstream” videos. So for example, if you search for the phrase “the Pope,” as I just did in incognito mode on Chrome, it gives me these as the top two hits:

Neither of those videos is even among the most-watched Pope videos….scrolling down a bit shows a funny-moments-with-the-Pope video (a little boy steals the show) with 2.1 million views and a Jimmy Kimmel bit on him with 4 million views.

According to the article, watching more mainstream news stories will quickly get you to more biased or inflammatory content. It appears that in its quest to make an algorithm that will keep users on the site, YouTube has created the digital equivalent of junk food….content that is tempting but without a lot of substance.

It makes a certain amount of sense if you think about it. Users may not have time to play around much on YouTube unless the next thing they see is slightly more tempting than what they were originally looking for. Very few people would watch three videos in a row of Obama State of the Union coverage, but you might watch Obama’s State of the Union address, followed by Obama’s last White House Correspondents’ Dinner talk, followed by “Obama’s best comebacks” (the videos that got suggested to me when I searched for “Obama State of the Union”).

I’ve noticed this tendency even with benign things. For example, my favorite go-to YouTube channel after a long day is the Epic Rap Battles of History channel. After I’ve watched two or three videos, I start noticing that it points me towards videos from the creators’ lesser-watched personal channels. I had actually thought this was some sort of setting the creators had chosen, but now I’m wondering if it’s the same algorithm. Maybe people doing random clicking gravitate towards lesser-watched content as they keep watching. Who knows.

What makes this trend a little concerning is that so many young people use YouTube to learn about different things. My science-teacher brother had mentioned seeing an uptick in kids spouting conspiracy theories in his classes, and I’m wondering if this is part of the reason. Back in my day, kids had to actually go looking for their offbeat conspiracy theories; now YouTube brings them right to them. In fact, a science teacher who asks kids to look for information on a benign topic may find that they’ve inadvertently put them in the path of conspiracy theories that come up as video recommendations after the real science. It seems like this algorithm may have stumbled on how to prime people for conversion to radical thought, just through collecting data.

According to the Wall Street Journal, YouTube is looking to tackle this problem, but it’s not clear how they’re going to do that without running into the same problems Facebook did when it started to crack down on fake news. It will be interesting to watch this develop, and it’s a good bias to keep in mind.

In the meantime, here’s my current favorite Epic Rap Battle:


What I’m Reading: March 2018

I’ve talked about salami slicing before, but Neuroskeptic has found perhaps the most egregious example of “split your data up and publish each piece individually” ever. An Iranian mental health study surveyed the whole population, then split up the results into 31 papers….one for each Iranian province. They also wrote two summary papers, one of which got cited in each of the other 32. Now there’s a way to boost your publication count.

Also from Neuroskeptic: the fickleness of the media, and why we can’t have nice replications. Back in 2008, a study found that antidepressants worked mildly better than a placebo, with a standardized mean difference of .32 (.2 is considered small, .5 moderate). In 2018, another meta-analysis found that they worked with a standardized mean difference of .3 (a quick sketch of what that effect size actually measures is below). Replication! Consistency! We have a real finding here! Right? Well, here are the Guardian headlines:

Never trust the headlines.
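
For anyone unfamiliar with the effect size being thrown around there, the standardized mean difference is just the difference between the group means divided by the pooled standard deviation. Here’s a minimal sketch; all the numbers are made up for illustration, and only the roughly-.3 benchmark comes from the studies above:

```python
# Standardized mean difference (Cohen's d style): difference between group
# means divided by the pooled standard deviation. Numbers are hypothetical.
def standardized_mean_difference(mean_treatment, mean_control, pooled_sd):
    return (mean_treatment - mean_control) / pooled_sd

# e.g. a 0.6-point symptom improvement over placebo with a pooled SD of 2.0
# lands in the "small-to-moderate" range discussed above.
print(round(standardized_mean_difference(7.6, 7.0, 2.0), 2))  # 0.3
```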

In another interesting side-by-side, Scott Alexander Tweeted out links to two different blog post write-ups of the new “growth mindset” study. One calls it the “nail in the coffin” for the theory, the other calls it a successful replication. Interesting to see the two different takes. The pre-print looks like it was taken down, but apparently they found that watching two 25-minute videos about the growth mindset resulted in an average GPA boost of .03. However, it looks like that effect was higher for the most at-risk students. The question appears to be whether that effect is particular to the “growth mindset” instruction, or whether it’s really just a new way of emphasizing the value of hard work.

Also close to my heart: are critical thinking and media literacy efforts backfiring? This one touches on a lot of things I covered in my Two Ways to Be Wrong post. Sometimes teaching people to be critical results in people who don’t believe anything. No clear solution to this one.

I also just finished The Vanishing American Adult by Ben Sasse. Lots of interesting stuff in this book, particularly if you’re a parent in the child-rearing years. One of the more interesting chapters covered building a reading list/bookshelf of the all-time great books throughout history and encouraging your kids to tackle them. His list was good, but it always irks me a little that lists like these are so heavy on philosophy and literature and rarely include foundational mathematical or scientific books. I may have to work on this.



Delusions of Mediocrity

I mentioned recently that I planned on adding monthly(ish) to my GPD Lexicon page, and my IQ post from Sunday reminded me of a term I wanted to add. While many of us are keenly aware of the problem of “delusions of grandeur” (a false sense of one’s own importance), I think fewer people realize that thinking oneself too normal might also be a problem.

In some circles this happens a lot when topics like IQ or salary come up, and a bunch of college-educated people sit around and talk about how it’s not that much of an advantage to have a higher IQ or an above-average salary. While some people saying this are making good points, some are suffering from a delusion of mediocrity. They are imagining in these discussions that their salary or IQ is “average” and that everyone is working in the same range as them and their social circle. They are debating IQ while only thinking about those with IQs above 110 or so, or salaries above the US median of $59,000. In other words:

Delusions of Mediocrity: A false sense of one’s own averageness. Typically seen in those with above-average abilities or resources who believe that most people live like they do.

Now I think most of us have seen this on a personal level, but I think it’s also important to remember it on a research level. When researchers find things like “IQ is correlated with better life outcomes”, they’re not just comparing IQs of 120 to IQs of 130 and finding a difference….they’re comparing IQs of 80 to IQs of 120 and finding a difference.

On an even broader note, psychological research has been known to have a WEIRD problem. Most of the studies we see describing “human” behavior are actually done on people in Western, educated, industrialized, rich, and democratic countries (aka WEIRD countries) that do NOT represent the majority of the world’s population. Even things like optical illusions have been found to vary by culture, so how can we draw conclusions about humanity while drawing from a group that represents only 12% of the world’s population? The fact that we don’t often question this is a mass delusion of mediocrity.

I think this all gets tempting because our own social circles tend to move in a narrow range. By virtue of living in a country, most of us end up seeing other people from that country the vast majority of the time. We also self segregate by neighborhood and occupation. Just another thing to keep in mind when you’re reading about differences.

5 Things About IQ Errors in Intro Psych Textbooks

A few months ago I did a post on common errors that arise when people try to self-estimate their IQ. One concern I sort of covered at the time was that many people may not truly understand what IQ is. For example, there seems to be a tendency to confuse educational attainment with IQ, which is likely why many of us think our grandparents were not nearly as smart as we are.

I was thinking about this issue this past week when I saw a newly published study called “What Do Undergraduates Learn About Human Intelligence? An Analysis of Introductory Psychology Textbooks”. As the title suggests, the authors took a look at intro psych textbooks to see what they say about IQ, and how well it aligns with the actual published research on IQ. So what did they find? Let’s take a look!

  1. Most of what undergrads learn about intelligence will be learned in intro psych. To back up the premise of the study, the authors looked at the topics covered in psych programs around the country. They determined that classes on intelligence were actually pretty rare, and that the primary coverage the topic got was in intro psych. Once they’d established this, they pulled the 30 most popular intro psych textbooks and analyzed those. Given the lack of subsequent classwork and the popularity of the textbooks used, they estimate that their study covers a huge proportion of the formal instruction on intelligence that goes on in the US.
  2. The percent of space dedicated to discussing intelligence has dropped. The first research question the authors wanted to look at was how much space was dedicated to explaining IQ/intelligence research to students. In the 80s this was 6% of textbook space, but now it’s about 3-4%. It’s possible that this is because textbooks got longer (and thus the percent dropped), or it could be that the topic got de-emphasized. Regardless, an interesting note.
  3. IQ fallacies were pretty common. The list of possible IQ “fallacies” was drawn from two sources. The first was this article by Gottfredson et al, which was published after “The Bell Curve” came out and had 52 signatories who wanted to clear up what current research on IQ said. The second was a statement from the American Psychological Association, also in response to the publicity around The Bell Curve. They used these two papers to generate the following list: The most common fallacies they found were #2, #3, #4, and #6, present in 8 books (#2 and #3) and 6 books (#4 and #6) respectively. Interestingly, for #3 they specifically clarified that they only called it a fallacy if someone asserted that you could raise IQ by adding a positive action, as opposed to eliminating a negative action. Their example was that lead poisoning really does provably lower IQ, but fish oil supplements during pregnancy have not been proven to raise IQ. The initial two papers explain why these are viewed as fallacies.
  4. Brief discussions led to inaccuracies. In addition to fallacies, the authors also took a look at inaccuracies, questionable theories, and the proportionate amount of space textbooks spent on various topics. Many of the textbooks committed the error of telling part of the story, but not the full story. For example, testing bias was well covered, but not the efforts that have been made to correct for it. Some textbooks went so far as to say that all IQ tests require you to speak English, whereas nonverbal tests have been available as far back as 1936. Additionally, two theories of intelligence that have not borne out well (Gardner’s theory of multiple intelligences and Sternberg’s triarchic theory of intelligence) were among the most discussed topics in the textbooks, which did not include a discussion of the literature comparing them to the g theory of intelligence. I imagine the oversimplification issue is one that affects many topics in intro textbooks, but this does seem a bit of an oversight.
  5. Overall context of intelligence scores was minimized. Despite good evidence that intelligence scores are positively correlated with various good outcomes, the most surprising finding was that several textbooks (4) said directly that IQ only impacted education and had little relevance to everyday life. This directly contradicts most current research, and also a certain amount of common sense. Even if IQ only helped you in academia, having a degree helps you in many other areas of life, such as income and all the advantages that brings.

Overall this was a pretty interesting paper, especially when they gave examples of the types of statements they were talking about. Reading the statement from the APA and comparing it to the textbooks was rather interesting, as it shows how far it is possible to drift from consensus if you’re not careful.

Additionally, the authors cited some interesting work showing that some popular public misconceptions around IQ are directly mirrored in the intro psych textbooks’ errors. Overall I think the point is well taken that intro-to-anything textbooks should be given a lot of scrutiny to make sure their claims are factual before being assigned.

On Wansink, the Joy of Cooking, and Multiple Comparisons

I’m down with a terrible cold this week, so I figured I’d just do a short update on the Buzzfeed article everyone’s sending me about the latest on Brian Wansink. The article does a pretty good job of recapping the situation up until now, so feel free to dive on into the drama.

The reason this update is particularly juicy is that somehow Buzzfeed got hold of a whole bunch of emails from within the lab, and it turns out a lot of the chicanery was a feature, not a bug. The whole thing is so bad that even the Joy of Cooking went after Wansink today on Twitter, and DAMN is that a cookbook that can hold a grudge. Posting the whole thread, because it’s not every day you see a cookbook publisher get into it about research methodology:

Now, normally I would think this was a pretty innocuous research methods dispute, but given Wansink’s current situation, it’s hard not to wonder if the cookbook has a point. Given what we now know about Wansink, the idea that he was chasing headlines seems like a pretty reasonable charge.

However, in the (rightful) rush to condemn Wansink, I do want to make sure we don’t get too crazy here. For example, the Joy of Cooking complains that Wansink only picked out 18 recipes to look at out of 275. In and of itself, that is NOT a problem. Sampling from a larger group is how almost all research is done. The problem only arises if those samples aren’t at least somewhat random, or if they’re otherwise cherry-picked. If he really did use recipes with no serving sizes to prove that “serving sizes have increased,” that’s pretty terrible.

Andrew Gelman makes a similar point about one of the claims in the Buzzfeed article. Lee (the author) stated that “Ideally, statisticians say, researchers should set out to prove a specific hypothesis before a study begins.” While Gelman praises Lee for following the story and says Wansink’s work is “….the most extraordinary collection of mishaps, of confusion, that has ever been gathered in the scientific literature – with the possible exception of when Richard Tol wrote alone,” he also gently cautions that we shouldn’t go too far. The problem, he says, is not that Wansink didn’t start out with a specific hypothesis or that he ran 400 comparisons, it’s that he didn’t include that part in the paper.
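
To see why the reporting matters so much, here’s a toy simulation sketch (assuming numpy and scipy are available; only the 400-comparison figure comes from the article, everything else is made up for illustration). Run enough tests on pure noise and a p < .05 threshold will hand you roughly 20 “findings” by chance alone:

```python
# Toy illustration of the multiple-comparisons problem: run 400 t-tests on
# pure noise and count how many clear p < .05 by chance alone. Only the
# "400 comparisons" figure comes from the article; the rest is made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_tests, n_per_group, alpha = 400, 30, 0.05

false_positives = 0
for _ in range(n_tests):
    a = rng.normal(size=n_per_group)  # group 1: pure noise
    b = rng.normal(size=n_per_group)  # group 2: pure noise, no real effect
    _, p_value = stats.ttest_ind(a, b)
    if p_value < alpha:
        false_positives += 1

# With no real effects anywhere, expect roughly alpha * n_tests = ~20 "hits".
print(f"{false_positives} of {n_tests} comparisons came out 'significant'")
```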

I completely agree with this, and it’s a point everyone should remember.

For example, when I wrote my thesis paper, I did a lot of exploratory data analysis. I had 24 variables, and I compared all of them to obesity rates and/or food insecurity status. I didn’t have a specific hypothesis about which ones would be significant, I just ran all the comparisons. When I put the paper together though, I included every comparison in the appendix, clarified how many I ran, and then focused on discussing the ones whose p-values were particularly low. My cutoff was .05, but I used the Bonferroni correction method to figure out which ones to talk about. That method is pretty simple….if you do 20 comparisons and want an overall alpha of .05, you divide .05 by 20 to get .0025 and use that as your per-comparison cutoff. I still got significant results, and I had the bonus of giving everyone all the information. If anyone ever wanted to replicate any part of what I did, or compare a different study to mine, they could do so.
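
Here’s a minimal sketch of that correction in code; the p-values are made up, and the only thing taken from above is the divide-alpha-by-the-number-of-comparisons step:

```python
# Bonferroni correction, as described above: divide the overall alpha by the
# number of comparisons and only discuss p-values below that stricter cutoff.
alpha = 0.05
n_comparisons = 20
corrected_cutoff = alpha / n_comparisons  # .05 / 20 = .0025

p_values = [0.001, 0.012, 0.0004, 0.03, 0.20]  # hypothetical subset of results
survivors = [p for p in p_values if p < corrected_cutoff]

print(f"Corrected cutoff: {corrected_cutoff:.4f}")    # 0.0025
print(f"Comparisons worth discussing: {survivors}")   # [0.001, 0.0004]
```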

Gelman goes on to point out that in many cases there really isn’t one “right” way of doing stats, so the best we can do is be transparent. He sums up his stance like this: “Ideally, statisticians say, researchers should report all their comparisons of interest, as well as as much of their raw data as possible, rather than set out to prove a specific hypothesis before a study begins.”

This strikes me as a very good lesson. Working with uncertainty is hard and slow going, but we have to make do with what we have. Otherwise we’ll be throwing out every study that doesn’t live up to some sort of hyper-perfect ideal, which will make it very hard to do any science at all. Questioning is great, but believing nothing is not the answer. That’s a lesson we all could benefit from. Well, that and “don’t piss off a cookbook with a long memory.” That burn’s gonna leave a mark.


Biology/Definitions Throwback: 1946

It’s maple syrup season up here in New England, which means I spent most of yesterday in my brother’s sugar shack watching sap boil and teaching my son things like how to carry logs, stack wood, and sample the syrup. Maple syrup making is a fun process, mostly because there’s enough to do to make it interesting, but not so much to do that you can’t be social while doing it.

During the course of the day, my brother mentioned that he had found an old stack of my grandfather’s textbooks, published in the 1940s:

Since he’s a biology teacher, he was particularly interested in that last one. When he realized there was a chapter on “heredity and eugenics” he of course had to start there. There were a few interesting highlights of this section. For example, like most people in 1946, the authors were pretty convinced that proteins were responsible for heredity. This wasn’t overly odd, since even the guy who discovered DNA thought proteins were the real workhorses of inheritance. Still, it was interesting to read such a wrong explanation for something we’ve been taught our whole lives.

Another interesting part was where they reminded readers that, despite the focus on the father’s role in heredity, there was scientific consensus that children also inherited traits from their mother. Thanks for the reassurance.

Then there were their descriptions of mental illness. It was interesting that some disorders (manic depression, schizophrenia) were clearly being recognized and diagnosed in a way that is at least recognizable today, while others (autism) were not mentioned at all. Then there were entire categories we’ve done away with, such as feeblemindedness, along with the “technical” definitions for terms like idiot and moron:

I have no idea how commonly those were used in real life, but it was an odd paragraph to read.

Of course this is the sort of thing that tends to make me reflective. What are we convinced of now that will look sloppy and crude in the year 2088?