5 Things About Precognition Studies

Several months ago now, I was having dinner with a friend who told me he was working on some science fiction based on some interesting precognition studies he had heard about. As he started explaining them to me and how they were real scientific proof of ESP, he realized who he was talking to, quickly got sheepish, and told me to “be gentle” when I ended up doing a post about it. Not wanting to kill his creative momentum, I figured I’d delay this post for a bit. I stumbled on the draft this morning and realized it’s probably been long enough now, so let’s talk about the paranormal!

First, I should set the stage and say that my friend was not actually wrong to claim that precognition has some real studies behind it. Some decent research time and effort has been put into experiments where researchers attempt to show that people react to things that haven’t happened yet. In fact, the history of this work is a really interesting study in scientific controversy, and it tracks quite nicely with much of the replication crisis I’ve talked about. That makes it a great topic for anyone wanting to know a bit more about the pluses and minuses of current research methods.

As we dig into this, it helps to know a bit of background: almost all of the discussion of this topic references a 2011 paper by Daryl Bem, which reported 9 different studies on the phenomenon. Bem is a respected psychological researcher, so the paper made quite a splash at the time. So what did these studies say, what should we get out of them, and why did they have such a huge impact on psychological research? Let’s find out!

  1. The effect sizes were pretty small, but they were statistically significant. Okay, so first things first….let’s establish what kind of effect size we’re talking about here. For all 9 experiments, the Cohen’s d was about .22. In general, a d of .2 is considered a “small” effect size, .5 would be moderate, and .8 would be large. In the real world, this translated into participants picking the “right” option about 53% of the time instead of the 50% you’d expect by chance (a small numerical sketch of what that looks like follows this list).
  2. The research was set up to be replicated. One of the more interesting parts of Bem’s research was that he made his protocols publicly available for people trying to replicate his work, and he did this before he actually published the initial 2011 paper. Bem particularly pointed people to experiments #8 and #9, which showed the largest effect sizes and which he thought would be the easiest to replicate. In these studies, he had people try to recall words from a word list, writing down those they could remember. He then gave them a subset of those words to study more in depth, again writing down what they could remember. When the researchers looked back, they found that subjects had recalled more of their subset words than control words on the first test. Since the subjects hadn’t studied their subset words at the time they took the first test, this was taken as evidence of precognition.
  3. Replication efforts have been….interesting. Of course, with findings this interesting, plenty of people rushed to try to replicate Bem’s work. Many of these attempts failed, but Bem published a meta-analysis stating that on the whole they worked. Interestingly, however, the meta-analysis actually included replications that pre-dated the publication of Bem’s work. Since Bem had released his software early, he was able to find papers going all the way back to 2001. It has been noted that if you remove all the citations that pre-dated the publication of his paper, you don’t see an effect. So basically the precognition paper was pre-replicated. Very meta.
  4. They are an excellent illustration of the garden of forking paths. Most of the criticism of the paper comes down to something Andrew Gelman calls “The Garden of Forking Paths”. This is a phenomenon in which researchers make a series of tiny decisions as their experiments and analyses progress, decisions that can add up to a serious deviation from what the data would otherwise show. In the Bem study, for example, it has been noted that some of his experiments actually used two different protocols, then combined the results. It was also noted that the effect sizes got smaller as more subjects were added, suggesting that the number of subjects tested may have fluctuated based on results. There are also decisions so small you mostly wouldn’t notice. For example, in the word recall study mentioned above, word recall was measured by comparing word lists for exact matches. This meant that if you spelled “retrieve” as “retreive”, it didn’t automatically give you credit. They had someone go through and correct for this manually, but that person knew which words were part of the second experiment and which were the control words. Did the reviewer inadvertently focus on or give more credit to words that were part of the “key word” list? Who knows, but small decisions like this can add up. There were also different statistical analyses performed on different experiments, and Bem himself admits that if he started a study and got no results, he’d tweak it a little and try again. When you’re talking about an effect size of .22, even tiny changes can matter.
  5. The ramifications for all of psychological science were big. It’s tempting to write this whole study off, or to accept it wholesale, but the truth is a little more complicated. In a thorough write-up over at Slate, Daniel Engber points out that this research used typical methods and invited replication attempts, and still got a result many people don’t believe is possible. If you don’t believe the results are possible, then you really should question the other research that relies on the same methods. As one of the reviewers put it: “Clearly by the normal rules that we [used] in evaluating research, we would accept this paper. The level of proof here was ordinary. I mean that positively as well as negatively. I mean it was exactly the kind of conventional psychology analysis that [one often sees], with the same failings and concerns that most research has”. Even within the initial paper, the word “replication” was used 23 times. Gelman counters that all the problems with the paper are known statistical issues and that good science can still be done, but it’s clear this paper pushed many people to take good research methods a bit more seriously.
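To make the “small but significant” point from item 1 concrete, here’s a minimal sketch in Python. The trial counts are made up (the post doesn’t give Bem’s exact Ns), and Bem’s actual analyses were more involved than a plain binomial test, but it shows how a 53% hit rate on a 50/50 task can be both a tiny effect and statistically significant once enough trials are pooled:

```python
from scipy.stats import binomtest

# Hypothetical numbers for illustration only: a forced-choice task where
# chance performance is 50% and observed performance is 53%.
hit_rate = 0.53
for n_trials in (100, 1000, 10000):
    hits = round(hit_rate * n_trials)
    p = binomtest(hits, n_trials, p=0.5, alternative="greater").pvalue
    print(f"{n_trials:>6} trials, {hits} hits: one-sided p = {p:.4f}")

# With 100 trials a 53% hit rate is nowhere near significant; with a
# thousand or so pooled trials the same small edge clears p < .05.
# That's the sense in which a d of ~0.2 can be both "small" and "significant".
```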

So there you have it. Interestingly, Bem actually works out of Cornell and has been cited in the whole Brian Wansink kerfuffle, a comparison he rejects. I think that’s fair. Bem has been more transparent about what he’s doing, and he did invite replication attempts. In fact, his calls for people to look at his work were so aggressive that there’s a running theory he published the whole thing to make a point about the shoddiness of most research methods. He’s denied this, but that certainly was the effect. An interesting study on multiple levels.

6 Year Blogiversary: Things I’ve Learned

Six years ago today I began blogging (well, at the old site) with a rather ambitious mission statement. While I don’t have quite as much hubris now as I did then, I was happy to see that I actually stand by most of what I said when I kicked this whole thing off. Six years, 647 posts,  a few hiatuses and one applied stats degree later, I think 2012 BS King would be pretty happy with how things turned out.

I actually went looking for my blogiversary date because of a recent discussion I had about the 10,000 hour rule myth. The person I was talking to had mentioned that after all these years of blogging my writing must have improved dramatically, and I mentioned that the difference was probably not as big as you might think. While I do occasionally get feedback on grammar or confusing sentences, no one sits down with bloggers and tells them “hey you really should have combined those two sentences” or “paragraph three was totally unnecessary”. In the context of the 10,000 hour rule, this means I’m lacking the “focused practice” that would truly make me a better writer. To truly improve you need both quality AND quantity in your practice.

The discussion got me wondering a bit…what skills does blogging help you hone? If the ROI for writing is minimal, what does it help me with?  I mean, there’s a lot of stuff I love about it: the exchange of ideas, meeting interesting people, getting to talk about the geeky topics I want to talk about, thinking more about how I explain statistics and having people send me interesting stuff. But does any of that result in the kind of focused practice and feedback that improves a skill?

As I mulled it over, I realized there are two main areas I’ve improved in, one smaller, one bigger. The first is simply finding more colorful examples for statistical concepts. Talking to high school students helps with this, as those kids are unapologetic about falling asleep on you if you bore them. Blogging and thinking about this stuff all the time means I end up permanently on the lookout for new examples, and since I tend to blog about the best ones, I can always find them again.

The second thing I’ve improved on is a little more subtle. Right after I put this blog up, I established some ground rules for myself. While I’ve failed miserably at some of these (apostrophes are still my nemesis), I have really tried to stick to discussing data over politics. This is tricky because most of the data people are interested in is political in nature, so I can’t avoid blogging about it. Attempting to figure out how to explain a data issue rooted in a political controversy to a reader base that contains highly opinionated conservatives, liberals and a smattering of libertarians has taught me a LOT about which words are charged and which aren’t. This has actually transferred over to my day job, where I occasionally get looped in to situations just so I can “do that thing where you recap what everyone’s saying without getting anyone mad”.

I even notice this when I’m reading other things now: how often people attempt to subtly bias their words in one direction or another while claiming to be “neutral”. While I would never say I am perfect at this, I believe the feedback I’ve gotten over the years has definitely improved my ability to present an issue neutrally, which I hope leads to a better discussion about where data goes wrong. Nothing has made me happier over the years than hearing people who I know feel strongly about an issue agree to stop using certain numbers and to use better ones instead.

So six years in, I suppose I just want to say thank you to everyone who’s read here over the years, given me feedback, kept me honest, and put up with my terrible use of punctuation and run-on sentences. You’ve all made me laugh, and made me think, and I appreciate you taking the time to stop on by. Here’s to another year!

Praiseworthy Wrongness: Genes in Space

Given my ongoing dedication to critiquing bad headlines/stories, I’ve decided to start making a regular-ish feature of people who get things wrong and then work to make them right. Since none of us can ever be 100% perfect, I think a big part of cutting down on errors and fake news is going to be lauding those who are willing to walk back what they said if they discover they made an error. I started this last month with an example of someone who realized she had asserted she was seeing gender bias in her emails when she wasn’t. Even though no one had access to the data but her, she came clean that her kneejerk reaction had been wrong, and posted a full analysis of what happened. I think that’s awesome.

Two days ago, I saw a similar issue arise with Live Science, which had published a story stating that after one year in space, astronaut Scott Kelly had experienced significant changes (around 7%) to his genetic code. The finding was notable since Kelly is one half of a pair of identical twins, so it seemed there was a solid control group.

The problem? The story got two really key words wrong, and it changed the meaning of the findings. The original article reported that 7% of Kelly’s genetic code had changed, but the 7% number actually referred to gene expression. The 7% was also a subset of changes….basically out of all the genes that changed their expression in response to space flight, 7% of those changes persisted after he came back to earth. This is still an extremely interesting finding, but nowhere near as dramatic as finding out that twins were no longer twins after space flight, or that Kelly wasn’t really human any more.

While the error was regrettable, I really appreciated what Live Science did next. Not only did they update the original story (with notice that they had done so), they also published a follow up under the headline “We Were Totally Wrong About that Scott Kelly Space Genes Story” explaining further how they erred. They also Tweeted out the retraction with this request:

This was a nice way of addressing a chronic problem in internet writing: controversial headlines tend to travel faster than their retractions. By specifically noting this problem, Live Science reminds us all that they can only do so much in the correction process. Fundamentally, people have to share the correction at the same rate they shared the original story for it to make a difference. While ultimately the original error was their fault, it will take more than just Live Science to spread the correct information.

In the new age of social media, I think it’s good for us all to take a look at how we can fix things. Praising and sharing retractions is a tiny step, but I think it’s an important one. Good on Live Science for doing what they could, then encouraging social media users to take the next step.

YouTube Radicals and Recommendation Bias

The Assistant Village Idiot passed along an interesting article about concerns being raised over YouTube’s tendency to “radicalize” suggestions in order to keep people on the site. I’ve talked before about the hidden dangers and biases algorithms can have over our lives, and this was an interesting example.

Essentially, it appears that YouTube has a tendency to suggest more inflammatory or radical content in response both to regular searches and to watching more “mainstream” fare. So for example, if you search for the phrase “the Pope”, as I just did in incognito mode on Chrome, you get these as the top 2 hits:

Neither of those videos is even among the most watched Pope videos….scrolling down a bit shows some funny moments with the Pope (a little boy steals the show) with 2.1 million views and a Jimmy Kimmel bit on him with 4 million views.

According to the article, watching more mainstream news stories will quickly get you to more biased or inflammatory content. It appears that in its quest to make an algorithm that will keep users on the site, YouTube has created the digital equivalent of junk food…..content that is tempting but without a lot of substance.

It makes a certain amount of sense if you think about it. Users won’t keep clicking around on YouTube unless the next thing they see is slightly more tempting than what they were originally looking for. Very few people would watch three videos in a row of Obama State of the Union address coverage, but you might watch Obama’s State of the Union address followed by Obama’s last White House Correspondents’ Dinner talk followed by “Obama’s best comebacks” (the videos that were suggested to me when I searched for “Obama State of the Union”).
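Purely as a toy illustration of that dynamic, and emphatically not YouTube’s actual algorithm, here’s a sketch of a greedy recommender that always serves whichever candidate has the highest predicted engagement, under the made-up assumption that viewers prefer the next video to be slightly more sensational than the last:

```python
import random

# Toy catalog: each video has a "sensationalism" score from 0 (dry) to 10 (wild).
# Assumption (mine, not YouTube's): predicted engagement peaks when the next
# video is a bit "hotter" than whatever the viewer just watched.
random.seed(0)
catalog = [{"title": f"video_{i}", "sensationalism": random.uniform(0, 10)}
           for i in range(500)]

def predicted_engagement(video, current_level):
    gap = video["sensationalism"] - current_level
    return 1.0 - abs(gap - 0.5)  # highest when the next video is slightly hotter

level = 1.0  # start with something mild, e.g. a mainstream news clip
for step in range(8):
    nxt = max(catalog, key=lambda v: predicted_engagement(v, level))
    level = nxt["sensationalism"]
    print(f"step {step}: now watching {nxt['title']} (sensationalism {level:.2f})")

# Each greedy step drifts about half a point "hotter", so even a mild starting
# point ratchets steadily toward the more extreme end of the catalog.
```

Real recommendation systems are vastly more complicated, but the ratcheting behavior of any “always serve the most engaging next thing” rule is the basic concern the article is describing.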

I’ve noticed this tendency even with benign things. For example, my favorite go-to YouTube channel after a long day is the Epic Rap Battles of History channel. After I’ve watched two or three videos, I start noticing that it points me toward videos from the creators’ lesser-watched personal channels. I had actually thought this was some sort of setting the creators had chosen, but now I’m wondering if it’s the same algorithm. Maybe people doing random clicking gravitate toward lesser-watched content as they keep watching. Who knows.

What makes this trend a little concerning is that so many young people use YouTube to learn about different things. My science-teacher brother had mentioned seeing an uptick in kids spouting conspiracy theories in his classes, and I’m wondering if this is part of the reason. Back in my day, kids had to actually go looking for their offbeat conspiracy theories; now YouTube brings them right to them. In fact, a science teacher who asks their kids to look for information on a benign topic may find they’ve inadvertently put those kids in the path of conspiracy theories that come up as video recommendations after the real science. It seems like this algorithm may have inadvertently stumbled on how to prime people for conversion to radical thought, just through collecting data.

According to the Wall Street Journal, YouTube is looking to tackle this problem, but it’s not clear how they’re going to do that without running into the same problems Facebook did when it started to crack down on fake news. It will be interesting to watch this develop, and it’s a good bias to keep in mind.

In the meantime, here’s my current favorite Epic Rap Battle:


What I’m Reading: March 2018

I’ve talked about salami slicing before, but Neuroskeptic has found perhaps the most egregious example of “split your data up and publish each piece individually” ever. An Iranian mental health study surveyed the whole population, then split up its results into 31 papers….one for each Iranian province. They also wrote two summary papers, one of which got cited in each of the other 32. Now there’s a way to boost your publication count.

Also from Neuroskeptic: the fickleness of the media, and why we can’t have nice replications. Back in 2008, a study found that antidepressants worked mildly better than a placebo, with a standardized mean difference of .32 (.2 is small, .5 is moderate). In 2018, another meta-analysis found that they worked with a standardized mean difference of .3 (a quick sketch of how that kind of number is computed follows below). Replication! Consistency! We have a real finding here! Right? Well, here are the Guardian headlines: 

Never trust the headlines.
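For reference, a standardized mean difference is just the difference between group means divided by a pooled standard deviation. This is the general Cohen’s-d-style formula with invented numbers, not the exact computation either meta-analysis used:

```python
import math

def smd(mean_treat, mean_control, sd_treat, sd_control, n_treat, n_control):
    """Standardized mean difference (Cohen's d style) using a pooled SD."""
    pooled_sd = math.sqrt(((n_treat - 1) * sd_treat**2 +
                           (n_control - 1) * sd_control**2) /
                          (n_treat + n_control - 2))
    return (mean_treat - mean_control) / pooled_sd

# Made-up example: drug group improves 9 points on a depression scale,
# placebo improves 6.6, both with SD 8 -> SMD = 0.3, i.e. a "small" effect.
print(round(smd(9.0, 6.6, 8.0, 8.0, 200, 200), 2))  # 0.3
```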

In another interesting side-by-side, Scott Alexander tweeted out links to two different blog post write-ups of the new “growth mindset” study. One calls it the “nail in the coffin” for the theory, the other calls it a successful replication. Interesting to see the two different takes. The pre-print looks like it was taken down, but apparently they found that watching two 25-minute videos about the growth mindset resulted in an average GPA boost of .03. However, it looks like that effect was higher for the most at-risk students. The question appears to be whether that effect is particular to the “growth mindset” instruction, or whether it’s really just a new way of emphasizing the value of hard work.

Also, and close to my heart: are critical thinking and media literacy efforts backfiring? This one touches on a lot of things I covered in my Two Ways to Be Wrong post. Sometimes teaching people to be critical results in people who don’t believe anything. No clear solution to this one.

I also just finished The Vanishing American Adult by Ben Sasse. Lots of interesting stuff in this book, particularly if you’re a parent in the child-rearing years. One of the more interesting chapters covered building a reading list/bookshelf of the all-time great books throughout history and encouraging your kids to tackle them. His list was good, but it always irks me a little that lists like these are so heavy on philosophy and literature and rarely include foundational mathematical or scientific books. I may have to work on this.


Delusions of Mediocrity

I mentioned recently that I planned on adding monthly(ish) to my GPD Lexicon page, and my IQ post from Sunday reminded me of a term I wanted to add. While many of us are keenly aware of the problem of “delusions of grandeur” (a false sense of one’s own importance), I think fewer people realize that thinking oneself too normal might also be a problem.

In some circles this happens a lot when topics like IQ or salary come up, and a bunch of college-educated people sit around and talk about how it’s not that much of an advantage to have a higher IQ or an above-average salary. While some people saying this are making good points, others are suffering from a delusion of mediocrity. They imagine in these discussions that their salary or IQ is “average” and that everyone is working in the same range as them and their social circle. In other words, they are debating IQ while only thinking about those with IQs above 110 or so, or salaries above the US median of $59,000. To put a name on it:

Delusions of Mediocrity: A false sense of one’s own averageness. Typically seen in those with above-average abilities or resources who believe that most people live like they do.

Now I think most of us have seen this on a personal level, but I think it’s also important to remember it on a research level. When researchers find things like “IQ is correlated with better life outcomes”, they’re not just comparing IQs of 120 to IQs of 130 and finding a difference….they’re comparing IQs of 80 to IQs of 120 and finding a difference.
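A quick simulation makes the point. The numbers here are invented and the 0.5 weight is arbitrary; this is an illustration of range restriction, not an estimate of any real IQ effect:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Simulated IQ ~ N(100, 15) and an outcome that depends partly on IQ.
iq = rng.normal(100, 15, n)
outcome = 0.5 * (iq - 100) / 15 + rng.normal(0, 1, n)

full_r = np.corrcoef(iq, outcome)[0, 1]
high_only = iq > 110
restricted_r = np.corrcoef(iq[high_only], outcome[high_only])[0, 1]

print(f"correlation across the full IQ range: {full_r:.2f}")
print(f"correlation among IQ > 110 only:      {restricted_r:.2f}")
# The second number comes out noticeably smaller: restricting your sample
# (or your mental picture) to the top of the range hides much of the relationship.
```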

On an even broader note, psychological research has been known to have a WEIRD problem. Most of the studies we see describing “human” behavior are actually done on people in Western, educated, industrialized, rich and democratic countries (aka WEIRD countries) that do NOT represent the majority of the world population. Even things like optical illusions have been found to vary by culture, so how can we draw conclusions about humanity while drawing from a group that represents only 12% of the world’s population? The fact that we don’t often question this is a mass delusion of mediocrity.

I think this all gets tempting because our own social circles tend to move in a narrow range. By virtue of living in a country, most of us end up seeing other people from that country the vast majority of the time. We also self segregate by neighborhood and occupation. Just another thing to keep in mind when you’re reading about differences.

5 Things About IQ Errors in Intro Psych Textbooks

A few months ago I did a post on common errors that arise when people try to self-estimate their IQ. One concern I sort of covered at the time was that many people may not truly understand what IQ is. For example, there seems to be a tendency to confuse educational attainment with IQ, which is likely why many of us think our grandparents were not nearly as smart as we are.

I was thinking about this issue this past week when I saw a newly published study called “What Do Undergraduates Learn About Human Intelligence? An Analysis of Introductory Psychology Textbooks”. As the title suggests, the authors took a look at intro psych textbooks to see what they say about IQ, and how well it aligns with the actual published research on IQ. So what did they find? Let’s take a look!

  1. Most of what undergrads learn about intelligence will be learned in intro psych. To back up the premise of the study, the authors looked at the topics covered in psych programs around the country. They determined that classes on intelligence were actually pretty rare, and that the primary coverage the topic got was in intro psych. Once they’d established this, they were able to pull the 30 most popular intro psych textbooks, and they chose to analyze those. Given the lack of subsequent classwork and the popularity of the textbooks used, they estimate that their study covers a huge proportion of the formal instruction/guidance/learning on intelligence that goes on in the US.
  2. The percent of space dedicated to discussing intelligence has dropped. The first research question the authors wanted to look at was how much space was dedicated to explaining IQ/intelligence research to students. In the 1980s, this was 6% of textbook space; now it’s about 3-4%. It’s possible this is because textbooks got longer (and thus the percent dropped), or it could be that the topic got de-emphasized. Regardless, an interesting note.
  3. IQ fallacies were pretty common. The list of possible IQ “fallacies” was drawn from two sources. The first was this article by Gottfredson et al, which was published after “The Bell Curve” came out and had 52 signatories who wanted to clear up what current research on IQ said. The second was a statement from the American Psychological Association, also in response to the publicity around the Bell Curve. They used these two papers to generate the following list:  The most common fallacies they found were #2, #3, #4 and #6; the first two appeared in 8 books each, the latter two in 6 books each. Interestingly, for #3 they specifically clarified that they only called it a fallacy if someone asserted that you could raise IQ by adding a positive action, as opposed to eliminating a negative action. Their example was that lead poisoning really does provably lower IQ, but fish oil supplements during pregnancy have not been proven to raise IQ. The initial two papers explain why these are viewed as fallacies.
  4. Brief discussions led to inaccuracies. In addition to fallacies, the authors also took a look at inaccuracies, questionable theories, and the proportionate amount of space given to various topics. Many of the textbooks committed the error of telling part of the story, but not the full story. For example, it was noted that testing bias was well covered, but the efforts that have been made to correct for testing bias were not. Some textbooks went so far as to say that all IQ tests require you to speak English, whereas nonverbal tests have been available as far back as 1936. Additionally, two theories of intelligence that have not held up well (Gardner’s theory of multiple intelligences and Sternberg’s triarchic theory of intelligence) were among the most discussed topics in the textbooks, yet the textbooks did not discuss the literature comparing them to the g theory of intelligence. I imagine the oversimplification issue is one that affects many topics in intro textbooks, but this does seem a bit of an oversight.
  5. The overall context of intelligence scores was minimized. Despite good evidence that intelligence scores are positively correlated with various good outcomes, the most surprising finding was that several textbooks (4 of them) said directly that IQ only impacts education and has little relevance to everyday life. This directly contradicts most current research, and also a certain amount of common sense. Even if IQ only helped you in academia, having a degree helps you in many other areas of life, such as income and all the advantages that brings.

Overall this was a pretty interesting paper, especially when the authors gave examples of the types of statements they were talking about. Reading the statement from the APA and comparing it to the textbooks was rather interesting, as it shows how far it is possible to drift from consensus if you’re not careful.

Additionally, the authors cited some interesting work showing that some popular public misconceptions about IQ are directly mirrored in intro psych textbooks’ errors. Overall, I think the point is well taken that intro-to-anything textbooks should get a lot of scrutiny over whether their claims are factual before being assigned.

On Wansink, the Joy of Cooking, and Multiple Comparisons

I’m down with a terrible cold this week, so I figured I’d just do a short update on the Buzzfeed article everyone’s been sending me about the latest on Brian Wansink. The article does a pretty good job of recapping the situation up until now, so feel free to dive on in to the drama.

The reason this update is particularly juicy is that somehow Buzzfeed got hold of a whole bunch of emails from within the lab, and it turns out a lot of the chicanery was a feature, not a bug. The whole thing is so bad that even the Joy of Cooking went after Wansink today on Twitter, and DAMN is that a cookbook that can hold a grudge. I’m posting the whole thread because it’s not every day you see a cookbook publisher get into it about research methodology:

Now normally I would think this was a pretty innocuous research methods dispute, but given Wansink’s current situation, it’s hard not to wonder if the cookbook has a point. Given what we now know about Wansink, the idea that he was chasing headlines seems a pretty reasonable charge.

However, in the (rightful) rush to condemn Wansink, I do want to make sure we don’t get too crazy here. For example, the Joy of Cooking complains that Wansink only picked 18 recipes to look at out of 275. In and of itself, that is NOT a problem. Sampling from a larger group is how almost all research is done. The problem only arises if those samples aren’t at least somewhat random, or if they’re otherwise cherry-picked. If he really did use recipes with no serving sizes to prove that “serving sizes have increased”, that’s pretty terrible.

Andrew Gelman makes a similar point about one of the claims in the Buzzfeed article. Lee (the author) stated that “Ideally, statisticians say, researchers should set out to prove a specific hypothesis before a study begins.” Gelman praises Lee for following the story and says Wansink’s work is “….the most extraordinary collection of mishaps, of confusion, that has ever been gathered in the scientific literature – with the possible exception of when Richard Tol wrote alone.” However, he also gently cautions that we shouldn’t go too far. The problem, he says, is not that Wansink didn’t start out with a specific hypothesis or that he ran 400 comparisons, it’s that he didn’t include that part in the paper.

I completely agree with this, and it’s a point everyone should remember.

For example, when I wrote my thesis paper, I did a lot of exploratory data analysis. I had 24 variables, and I compared all of them to obesity rates and/or food insecurity status. I didn’t have a specific hypothesis about which ones would be significant, I just ran all the comparisons. When I put the paper together, though, I included every comparison in the appendix, stated how many I had run, and then focused on discussing the ones whose p-values were particularly low. My cutoff was .05, but I used the Bonferroni correction method to figure out which ones to talk about. That method is pretty simple….if you do 20 comparisons and want an overall alpha of .05, you divide .05 by 20 and use .0025 as your cutoff. I still got significant results, and I had the bonus of giving everyone all the information. If anyone ever wanted to replicate any part of what I did, or compare a different study to mine, they could do so.
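Here’s roughly what that correction looks like in code. This is a minimal sketch with made-up p-values, not my actual thesis data:

```python
# Bonferroni correction: with m comparisons, only call a result significant
# if its p-value is below alpha / m. Made-up p-values for illustration.
alpha = 0.05
p_values = {"var_a": 0.001, "var_b": 0.012, "var_c": 0.0004,
            "var_d": 0.048, "var_e": 0.20}

m = len(p_values)
threshold = alpha / m  # 0.05 / 5 = 0.01 here; 0.05 / 20 = 0.0025 in the example above
for name, p in sorted(p_values.items(), key=lambda kv: kv[1]):
    verdict = "significant" if p < threshold else "not significant"
    print(f"{name}: p = {p:.4f} -> {verdict} at corrected threshold {threshold:.4f}")
```

Note that var_d would have cleared an uncorrected .05 cutoff but does not survive the correction, which is exactly the kind of result that running many comparisons tends to produce by chance.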

Gelman goes on to point out that in many cases there really isn’t one “right” way of doing stats, so the best we can do is be transparent. He sums up his stance like this: “Ideally, statisticians say, researchers should report all their comparisons of interest, as well as as much of their raw data as possible, rather than set out to prove a specific hypothesis before a study begins.”

This strikes me as a very good lesson. Working with uncertainty is hard and slow going, but we have to make do with what we have. Otherwise we’ll be throwing out every study that doesn’t live up to some sort of hyper-perfect ideal, which will make it very hard to do any science at all. Questioning is great, but believing nothing is not the answer. That’s a lesson we all could benefit from. Well, that and “don’t piss off a cookbook with a long memory.” That burn’s gonna leave a mark.


Biology/Definitions Throwback: 1946

It’s maple syrup season up here in New England, which means I spent most of yesterday in my brother’s sugar shack watching sap boil and teaching my son things like how to carry logs, stack wood, and sample the syrup. Maple syrup making is a fun process, mostly because there’s enough to do to make it interesting, but not so much to do that you can’t be social while doing it.

During the course of the day, my brother mentioned that he had found an old stack of my grandfather’s textbooks, published in the 1940s:

Since he’s a biology teacher, he was particularly interested in that last one. When he realized there was a chapter on “heredity and eugenics” he of course had to start there. There were a few interesting highlights of this section. For example, like most people in 1946, the authors were pretty convinced that proteins were responsible for heredity. This wasn’t overly odd, since even the guy who discovered DNA thought proteins were the real workhorses of inheritance. Still, it was interesting to read such a wrong explanation for something we’ve been taught our whole lives.

Another interesting part was where they reminded readers that, despite the focus on the father’s role in heredity, there was scientific consensus that children also inherited traits from their mother. Thanks for the reassurance.

Then there were their descriptions of mental illness. It was interesting that some disorders (manic depression, schizophrenia) were clearly being recognized and diagnosed in a way that is at least recognizable today, while others (autism) were not mentioned at all. Then there were entire categories we’ve since done away with, such as feeblemindedness, along with the “technical” definitions for terms like idiot and moron:

I have no idea how commonly those were used in real life, but it was an odd paragraph to read.

Of course this is the sort of thing that tends to make me reflective. What are we convinced of now that will look sloppy and crude in the year 2088?

Personalized Nutrition: Blood Sugar Testing Round 1

Quite some time ago I did a blog post about some research out of Israel on personalized glucose responses.  The study helped people test their own unique glucose response to various types of foods, then created personalized diet plans based on those foods. They found that some foods that were normally considered “healthy” caused extreme responses in some people, while some foods often deemed “unhealthy” were fine. I had mentioned that I wanted to try a similar experiment on myself, and yet I never did…..until now.

Recently the authors of the study came out with a book called “The Personalized Diet”, which goes over their research and how someone could apply it to their own life. While they recommend testing various meals, I decided to start by testing specific carbohydrate sources from fruit, grains or starches to see what things looked like. I adhered to a few rules to keep things fair (a small record-keeping sketch follows the list):

  1. I tested all of these at midday, at either noon or 2pm. Glucose response can vary over the course of the day, so I decided to test midday on weekends. The tests below happened over a series of weeks, but all at the same time of day.
  2. Each serving was 50g of carbohydrates. I’ve seen people do this with “effective carbs” (carbs-fiber) or portion size, but I decided to do this with just plain old carbs.
  3. I ate all the food in 15 minutes. The times below represent “time from first bite” but I tried to make sure I had finished off the portion at around the 15 minute mark.
  4. I tried not to do much for the first 60 minutes, just to make sure the readings wouldn’t be changed by exercise/movement.
  5. I’ve only tried each food item once. Technically I should do everything twice to confirm, but I haven’t had time yet.
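For anyone wanting to keep similar records, here’s a minimal sketch of the bookkeeping, using invented readings rather than my actual numbers. Peak rise and a trapezoid-rule “incremental area under the curve” are two common ways to summarize a response:

```python
# Hypothetical readings in mg/dL at minutes from first bite (not my real data).
readings = {0: 95, 30: 140, 60: 125, 90: 110, 120: 100}

def summarize(readings):
    times = sorted(readings)
    baseline = readings[times[0]]
    peak_rise = max(readings[t] - baseline for t in times)
    # Incremental area under the curve above baseline, trapezoid rule.
    iauc = 0.0
    for t0, t1 in zip(times, times[1:]):
        r0 = max(readings[t0] - baseline, 0)
        r1 = max(readings[t1] - baseline, 0)
        iauc += (r0 + r1) / 2 * (t1 - t0)
    return peak_rise, iauc

peak, iauc = summarize(readings)
print(f"peak rise: {peak} mg/dL, incremental AUC: {iauc:.0f} mg/dL-min")
```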

Now some results! I don’t have a ton of data yet, but I thought my haul so far was pretty interesting. Note: my fasting glucose is running a bit high, which is why I’m interested in this experiment to begin with. First, my fruit experiments:

Wow, cherries, what did I ever do to you? I was interested to see that so many fruits were identical, with one major outlier. I would never have thought something like pineapple would be so different from cherries. While I should probably repeat the measurement, it kind of supports the idea that some food responses are a bit unusual.

Next, my starches/grains. Note the shift of the peak from 30 minutes for fruit, to 60 minutes for most of these foods:

A couple of interesting things here:

  1. I was surprised how much higher the variability was than with the fruits.
  2. Sweet potatoes apparently are not my friend.
  3. It’s kind of surprising that white rice and brown rice were almost exactly the same.
  4. Beans and lentils produced the lowest reaction of any food I tested.

Overall I thought these results were very interesting, and I’ll have to consider how to use them going forward. My next step is going to be to expand this list, and then move on to some junk food/fast food/takeout meals to see if my response differs significantly between things like burgers and fries vs pizza. That should be interesting. I’ll probably base that on a serving (or at least the serving I eat) as opposed to carb count. If I get really crazy I may try some desserts or maybe some alcohol. You know, for science.

Let it never be said I’m not willing to suffer for my art.