Weird Weather on Patriots’ Day

Well folks, tomorrow is Patriots’ Day/Marathon Monday here in Massachusetts, which means the kind of lousy weather we’re having is going to affect the Boston Marathon runners. That’s a pity, but I’m pleased that the weather was at least okay yesterday, as my son went to his first major league game with his dad and grandfather. Since he’s being raised in a mixed household (I’m a Red Sox fan, as is his grandfather; his father is an Orioles fan), he went with an Orioles shirt/Red Sox hat outfit that apparently was quite a hit with the crowd. My husband was good-natured about it, until he got stopped by the MASN camera crew who were wandering around trying to find a few Orioles fans in Fenway. He refused to risk being on an Orioles broadcast with a child in a Red Sox hat, so he pulled his spare Orioles hat out of his coat pocket and our kiddo got his TV debut. We haven’t been able to find the clip, but we’re still looking.

Anyway, with the weather going downhill today, my Dad and I started musing about the worst marathon weather we could remember. I mentioned 2012 when it got so hot that they proactively offered to defer entries, and my Dad mentioned that in 1976 it hit 96 degrees. This led me to a page on the Boston Athletic Association’s website about all the weird weather they’ve gone through over the years.

A few highlights:

  • 5 different years saw snow fall on the marathon
  • 1939 saw a partial eclipse
  • 3 years have seen driving rain
  • 1927 saw heat (84 degrees) and a newly paved road that melted under their feet
  • At least 4 marathons have been run in 90+ degree weather

I’m hoping that our weird weather gives hometown girl Shalane Flanagan an edge, as I’m cheering hard for her. The last time someone from Massachusetts won the Boston Marathon was Alberto Salazar in 1982, so I think we’re due.

One interesting tidbit I never knew about Patriots’ Day: in Maine, it’s legally “Patriot’s Day”, which makes me incredibly happy. That is going to be my go-to excuse if I screw it up at any point going forward.


Short Takes: Anti-Depressants, Neurogenesis, and #Marchforourlives

Three good articles, three different topics.

First up, the New York Times profiles people who are on anti-depressants long term and find they have trouble quitting. It’s an interesting article both because it impacts a lot of people (7% of US adults have been on anti-depressants for 5+ years) and because it’s an interesting insight into the limitations of our clinical trial/drug approval system. Basically, drugs get approved based on a timeframe that can reasonably be covered in a clinical trial: 6 to 9 months or so. In this case later studies went out as far as 2 years, but no further. This has caused issues when trying to get long-term users back off. Some studies have reported 50-70% of long-term users experiencing serious withdrawal symptoms, with many continuing on the medications just to avoid the withdrawal. I don’t really see a clear way around this…trials can’t go on forever…but it is an unfortunate limitation of our current system.

Next up, Slate Star Codex does a somewhat unsettling review of adult neurogenesis. He goes through dozens of highly cited papers talking about how useful/involved neurogenesis is in so many things in our lives, just to follow it up with the new study that shows it probably doesn’t exist. Uuuuuuugh. Apparently a lot of the confusion started because it definitely exists in rats, and things kinda snowballed from there. It sounds like just another scientific squabble, but in the words of SSC: “We know many scientific studies are false. But we usually find this out one-at-a-time. This – again, assuming the new study is true, which it might not be – is a massacre. It offers an unusually good chance for reflection.” Yikes.

Finally, some interesting stats about the March For Our Lives that took place recently, and who actually participated. Contrary to what I’d heard, this march actually had a higher average age (49) than many we’ve seen, and fewer than 10% of participants were under 18. Most interesting (to me) is that the first-time protesters there were more likely to say they were motivated to march by Trump (42%) than by gun control (12%).

Flashback: The Rise and Decline of the Datasexual

I mentioned a few days ago I was going to be taking a bit of a break and reposting things from my archives. Sifting through my old posts, I was intrigued to come across this one I did 6 years ago about the rise of the datasexual. My comments were based on this article called “Meet the Urban Datasexual,” which introduced the term as someone who is “preoccupied with their personal data” and “relentlessly digital, they obsessively record everything about their personal lives, and they think that data is sexy. In fact, the bigger the data, the sexier it becomes. Their lives – from a data perspective, at least – are perfectly groomed.”

With all the recent Facebook/data/etc concerns, I was curious if this term was still a thing. A quick Google suggests it is not. It made it as far as a mention in a TED talk in 2013, but the trail mostly goes cold after that. Google trends confirms that the Big Think article was the height of this term.

It’s interesting to note that this term was introduced at a time when interest in the quantified self movement was gaining steam, with interest in that term peaking about a year later. Since then, things appear to have died down a bit, which is odd considering there are more ways than ever to track your data.

Part of me wonders if that’s why the interest waned. When you’re logging your own heart rate on a regular basis, you need a community to give you tips about where/when/how to log. Now that my Fitbit logs my heart rate and sleep and all my data can be accessed any time I want, is there really a reason to join a group to get tips on this?  With data and charts more easily available for everything, it seems like we all got a little more data in our lives.

Additionally, it appears the biggest concern now for most of us is not how to get our data, but how to keep it private. With some of the recent data privacy scandals, not as many people are as excited to broadcast their data obsession to everyone else.

Finally, it seems we’ve mostly stopped appending -sexual to words to describe anything other than sexuality? I’m not really up on slang, but it seems like after metrosexual, that suffix kinda faded. Someone with teenagers let me know if that’s true.

Anyway, I wasn’t too sad to see that term go, but it is interesting to see what a difference a few years can make in how we view a topic. RIP datasexual.

Short Little Viral Vectors

Posting will probably be light in April. I’m dealing with some (hopefully easily resolvable) health issues, including some very low white blood cell counts that seem to be making me susceptible to every little thing that goes around. I felt like I was spending half my time sick, so it was fortuitous that I ran across this study that confirmed my fears: Community Surveillance of Respiratory Viruses Among Families in the Utah Better Identification of Germs-Longitudinal Viral Epidemiology (BIG-LoVE) Study.

In this study, they actually got 26 households (105 people total) to volunteer to get nasal swabs done once a week for a full year. They tested these swabs to see how often a viral infection was present, regardless of symptoms. The results were something every parent would intuitively guess…households with kids had far more weeks with viruses present than those without:

I was interested to see that it’s the second kid that really ups things, and then the third and fourth don’t really add much viral load. 6 kids appears to just be madness.

Anyway, it’s a small sample size, but I am guessing this result would hold up pretty well.

Back to me…I may take a page out of the AVI’s handbook and find a few old posts to bump, but other than that things may be light for a bit. Stay well everyone!

Easter and April Fools Day

Happy Easter to all of you out there who celebrate it and base your observance date on the Gregorian calendar! Happy April Fools Day to any of you out there who happen to enjoy that kind of thing!

I went on a Googling spree this morning because I couldn’t remember if these two dates had ever coincided before (or at least in my lifetime) and now I’ve learned all sorts of interesting facts about how often this happens. Even after 13 years of Baptist school I never quite got the hang of figuring out when Easter actually was going to be each year, so I realized I had no concept of how often it coincided with April Fools Day. Turns out it’s about 3-4 times/century, and the last time this happened was 1956.

I was curious if the 62-year gap was the longest gap that had taken place, but it turns out it’s not. That prize goes to the 68-year gap from 1736 to 1804. The shortest gap is 11 years, and it happens pretty frequently. For example, that’s the gap between this year and the next time the two days will coincide, in 2029. The next one after that will be in 2040, and then not again until 2108.
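
Since the Gregorian Easter date is fully computable, you can check these coincidences yourself. Here’s a quick Python sketch using the well-known Meeus/Jones/Butcher algorithm; the year range I scan is my own choice for illustration, so treat this as a sketch rather than an authoritative church calendar:

```python
def easter(year):
    """Gregorian Easter via the Meeus/Jones/Butcher algorithm; returns (month, day)."""
    a = year % 19                        # position in the 19-year Metonic cycle
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30   # days from March 21 to the paschal full moon
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month = (h + l - 7 * m + 114) // 31  # 3 = March, 4 = April
    day = (h + l - 7 * m + 114) % 31 + 1
    return month, day

# Years in roughly my window of interest where Easter lands on April 1
april_fools_easters = [y for y in range(1900, 2110) if easter(y) == (4, 1)]
```

Running that confirms the 1956, 2018, 2029, 2040, and 2108 coincidences mentioned above, including the 11-year hops.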

Interestingly, for churches that adhere to the Julian calendar for scheduling Easter, the earliest Easter can be at this point is April 5th, so this won’t come up for them at all. If you’d like an overview of when Easter falls and why different churches put it on different days, try this link.

If you’d like to see one of the more amusing April Fools Day pranks done by a math teacher, watch this video:

Enjoy the day!

3 Control Groups I’m Pondering

One of the more important (but often overlooked) parts of research is the choice of control group, i.e. what we are comparing the group of interest to. While this seems like a small thing, it can actually have some big implications for interpreting research.  I’ve seen a few interesting examples recently, so I figured I’d do a quick list here:

First up, a new-to-me article about personality assessments in a traditional hunter-gatherer tribe. I’ve mentioned the problem of psychological research focusing too much on WEIRD (Western, Educated, Industrialized, Rich and Democratic) countries before, and this study sought to correct that error. Basically, they used the “Big 5” personality testing model and then tried to assess members of a traditional South American tribe according to this “universal” personality measurement. It failed. While it seemed like extraversion and conscientiousness could actually translate somewhat, agreeableness and openness were mixed, and neuroticism didn’t translate at all well. They ended up with a “Big Two”, which were basically an agreeableness/extraversion mix (pro-sociality) and something like conscientiousness (industriousness). They talk a lot about the challenges (translation issues, non-literate populations, etc), but the point is that what we call “universal” relies on a very narrow set of circumstances. Western college kids don’t make a good baseline.

Second, a new dietary study shows that nutritional education can be an effective treatment for depression.  It’s a good study, and I was interested to see the control group was given increased social support/time with a trained listener/companion type person. At 12 weeks, almost a third of the diet group were no longer depressed, whereas only 8% of the control group were feeling better. Interesting to note though: this was advertised as a dietary study, so those who didn’t get the diet intervention knew they were the control group. There was a higher dropout rate in the control group (25% vs 6%), and interestingly it was the most educated people who dropped out. Gotta admit, part of me wonders if it was the introverts driving this result. Just wondering how many people really enjoyed the whole “hang out with a stranger who’s not a therapist” thing. I would be interested to see how this works when paired with some sort of “hour of general relaxation” type thing.

Finally, after putting up my pre-cognition post on Sunday, I realized there was a Slate Star Codex post a few years back about the Bem paper that I wanted to reread. It was called “The Control Group is out of Control” and took the stance that parapsychology was actually a great control group for all of science. Given that you have a whole group of people attempting to follow the scientific method to prove something that most people believe doesn’t exist, they end up serving as a sort of “placebo science”, or an indicator of what science looks like when it’s chasing after nothing.

He has some really interesting anecdotes here about the amount of evidence we have that researchers are influencing their own results in ways that seem nearly impossible to control for. For example, he talks about a case in which rival researchers who supported different hypotheses and had gotten different results teamed up to use the same protocol and watched each other execute the experiments to see if they could figure out where the other one was going wrong. They still both ended up proving their preferred hypothesis, and in the discussion section brought up the (mutual) possibility that one or the other of them had hacked the computer records. That’s an odd thing to ponder, but it’s even odder when you wonder what this means for every other study ever done.


5 Things About Precognition Studies

Several months ago now, I was having dinner with a friend who told me he was working on some science fiction based on some interesting precognition studies he had heard about. As he started explaining them to me and how there was real scientific proof of ESP, he realized who he was talking to and quickly got sheepish and told me to “be gentle” when I ended up doing a post about it. Not wanting to kill his creative momentum, I figured I’d delay this post for a bit. I stumbled on the draft this morning and realized it’s probably been long enough now, so let’s talk about the paranormal!

First, I should set the stage and say that my friend was not actually wrong to claim that precognition has some real studies behind it. Some decent research time and effort has been put into experiments where researchers attempt to show that people react to things that haven’t happened yet. In fact the history of this work is a really interesting study in scientific controversy, and it tracks quite nicely with much of the replication crisis I’ve talked about. This makes it a really interesting topic for anyone wanting to know a bit more about the pluses/minuses of current research methods.

As we dig into this, it helps to know a bit of background: almost all of the discussions about this are referencing a paper by Daryl Bem from 2011, where 9 different studies were run on the phenomenon. Bem is a respected psychological researcher, so the paper made quite a splash at the time. So what did these studies say, what should we get out of them, and why did they have such a huge impact on psychological research? Let’s find out!

  1. The effect sizes were pretty small, but they were statistically significant. Okay, so first things first: let’s establish what kind of effect size we’re talking about here. For all 9 experiments the Cohen’s d was about .22. In general, a d of .2 is considered a “small” effect size, .5 would be moderate, and .8 would be large. In the real world, this translated into participants picking the “right” option 53% of the time instead of the 50% you’d expect by chance.
  2. The research was set up to be replicated. One of the more interesting parts of Bem’s research was that he made his protocols publicly available for people trying to replicate his work, and he did this before he actually published the initial 2011 paper. Bem particularly pointed people to experiments #8 and #9, which showed the largest effect sizes and which he thought would be the easiest to replicate. In these studies, he had people try to recall words off of a word list, writing down those they could remember. He then gave them a subset of those words to study more in depth, again writing down what they could remember. Looking back at the first test, it turned out that subjects had recalled more of their subset words than control words. Since the subjects hadn’t seen their subset words at the time they took the first test, this was taken as evidence of precognition.
  3. Replication efforts have been…interesting. Of course with interesting findings like these, plenty of people rushed to try to replicate Bem’s work. Many of these attempts failed, but Bem published a meta-analysis stating that on the whole they worked. Interestingly however, the meta-analysis actually included replications that pre-dated the publication of Bem’s work. Since Bem had released his software early, he was able to find papers all the way back to 2001. It has been noted that if you remove all the citations that pre-dated the publication of his paper, you don’t see an effect. So basically the pre-cognition paper was pre-replicated. Very meta.
  4. They are an excellent illustration of the garden of forking paths. Most of the criticism of the paper comes down to something Andrew Gelman calls “The Garden of Forking Paths”. This is a phenomenon in which researchers make a series of tiny decisions as their experiments and analyses progress, which may add up to serious distortions in the results. In the Bem study for example, it has been noted that some of his experiments actually used two different protocols, then combined the results. It was also noted that the effect sizes got smaller as more subjects were added, suggesting that the number of subjects tested may have fluctuated based on results. There are also decisions so small you mostly wouldn’t notice. For example, in the word recall study mentioned above, word recall was measured by comparing word lists for exact matches. This meant that if you spelled “retrieve” as “retreive”, it didn’t automatically give you credit. They had someone go through and correct for this manually, but that person actually knew which words were part of the second experiment and which were the control words. Did the reviewer inadvertently focus on or give more credit to words that were part of the “key word” list? Who knows, but small decisions like this can add up. There were also different statistical analyses performed on different experiments, and Bem himself admits that if he started a study and got no results, he’d tweak it a little and try again. When you’re talking about an effect size of .22, even tiny changes can add up.
  5. The ramifications for all of psychological science were big. It’s tempting to write this whole study off, or to accept it wholesale, but the truth is a little more complicated. In a thorough write-up over at Slate, Daniel Engber points out that this research used typical methods and invited replication attempts, and still got a result many people don’t believe is possible. If you don’t believe the results are possible, then you really should question how often these methods are used in other research. As one of the reviewers put it: “Clearly by the normal rules that we [used] in evaluating research, we would accept this paper. The level of proof here was ordinary. I mean that positively as well as negatively. I mean it was exactly the kind of conventional psychology analysis that [one often sees], with the same failings and concerns that most research has”. Even within the initial paper, the word “replication” was used 23 times. Gelman rebuts that all the problems with the paper are known statistical issues and that good science can still be done, but it’s clear this paper pushed many people to take good research methods a bit more seriously.
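
To get a feel for how fragile a 53%-vs-50% effect is, here’s a rough Python sketch of a two-sided test against chance, using the normal approximation to the binomial. The trial counts below are numbers I made up for illustration; they are not from Bem’s paper:

```python
from math import erf, sqrt

def two_sided_p(hits, n, p0=0.5):
    """Two-sided p-value for `hits` successes in n coin-flip-style trials,
    using the normal approximation to the binomial distribution."""
    z = abs(hits - n * p0) / sqrt(n * p0 * (1 - p0))
    # standard normal tail probability, via the error function
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# A 53% hit rate is indistinguishable from chance in a small sample...
print(two_sided_p(53, 100))      # ~0.55, nowhere near significant
# ...but highly significant once thousands of trials are pooled
print(two_sided_p(5300, 10000))  # ~2e-9
```

Which is basically the replication problem in miniature: with effects this small, whether any one attempt “works” is largely a function of sample size plus all those tiny forking-path decisions.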

So there you have it. Interestingly, Bem actually works out of Cornell and has been cited in the whole Brian Wansink kerfuffle, a comparison he rejects. I think that’s fair. Bem has been more transparent about what he’s doing, and did invite replication attempts. In fact his calls for people to look at his work were so aggressive, there’s a running theory that he published the whole thing to make a point about the shoddiness of most research methods. He’s denied this, but that certainly was the effect. An interesting study on multiple levels.

6 Year Blogiversary: Things I’ve Learned

Six years ago today I began blogging (well, at the old site) with a rather ambitious mission statement. While I don’t have quite as much hubris now as I did then, I was happy to see that I actually stand by most of what I said when I kicked this whole thing off. Six years, 647 posts, a few hiatuses and one applied stats degree later, I think 2012 BS King would be pretty happy with how things turned out.

I actually went looking for my blogiversary date because of a recent discussion I had about the 10,000 hour rule myth. The person I was talking to had mentioned that after all these years of blogging my writing must have improved dramatically, and I mentioned that the difference was probably not as big as you might think. While I do occasionally get feedback on grammar or confusing sentences, no one sits down with bloggers and tells them “hey you really should have combined those two sentences” or “paragraph three was totally unnecessary”. In the context of the 10,000 hour rule, this means I’m lacking the “focused practice” that would truly make me a better writer. To truly improve you need both quality AND quantity in your practice.

The discussion got me wondering a bit…what skills does blogging help you hone? If the ROI for writing is minimal, what does it help me with?  I mean, there’s a lot of stuff I love about it: the exchange of ideas, meeting interesting people, getting to talk about the geeky topics I want to talk about, thinking more about how I explain statistics and having people send me interesting stuff. But does any of that result in the kind of focused practice and feedback that improves a skill?

As I mulled it over, I realized there are two main areas I’ve improved in, one smaller, one bigger. The first is simply finding more colorful examples for statistical concepts. Talking to high school students helps with this, as those kids are unapologetic about falling asleep on you if you bore them. Blogging and thinking about this stuff all the time means I end up permanently on the lookout for new examples, and since I tend to blog about the best ones, I can always find them again.

The second thing I’ve improved on is a little more subtle. Right after I put this blog up, I established some ground rules for myself. While I’ve failed miserably at some of these (apostrophes are still my nemesis), I have really tried to stick to discussing data over politics. This is tricky because most of the data people are interested in is political in nature, so I can’t avoid blogging about it. Attempting to figure out how to explain a data issue rooted in a political controversy with a reader base that contains highly opinionated conservatives, liberals and a smattering of libertarians has taught me a LOT about which words are charged and which aren’t. This has actually transferred over to my day job, where I occasionally get looped in to situations just so I can “do that thing where you recap what everyone’s saying without getting anyone mad”.

I even notice this when I’m reading other things now, how often people attempt to subtly bias their words in one direction or another while claiming to be “neutral”. While I would never say I am perfect at this, I believe the feedback I’ve gotten over the years has definitely improved my ability to present an issue neutrally, which I hope leads to a better discussion about where data goes wrong. Nothing has made me happier over the years than hearing people who I know feel strongly about an issue agree to stop using certain numbers and to use better ones instead.

So six years in, I suppose I just want to say thank you to everyone who’s read here over the years, given me feedback, kept me honest, and put up with my terrible use of punctuation and run-on sentences. You’ve all made me laugh, and made me think, and I appreciate you taking the time to stop on by. Here’s to another year!

Praiseworthy Wrongness: Genes in Space

Given my ongoing dedication to critiquing bad headlines/stories, I’ve decided to start making a regular-ish feature of people who get things wrong then work to make them right. Since none of us can ever be 100% perfect, I think a big part of cutting down on errors and fake news is going to be lauding those who are willing to walk back on what they say if they discover they made an error. I started this last month with an example of someone who realized she had asserted she was seeing gender bias in her emails when she wasn’t. Even though no one had access to the data but her, she came clean that her kneejerk reaction had been wrong, and posted a full analysis of what happened. I think that’s awesome.

Two days ago, I saw a similar issue arise with Live Science, who had published a story stating that after one year in space astronaut Scott Kelly had experienced significant changes (around 7%) to his genetic code. The finding was notable since Kelly is an identical twin, so it seemed there was a solid control group.

The problem? The story got two really key words wrong, and it changed the meaning of the findings. The original article reported that 7% of Kelly’s genetic code had changed, but the 7% number actually referred to gene expression. The 7% was also a subset of changes: basically, out of all the genes that changed their expression in response to space flight, 7% of those changes persisted after he came back to earth. This is still an extremely interesting finding, but nowhere near as dramatic as finding out that twins were no longer twins after space flight, or that Kelly wasn’t really human anymore.

While the error was regrettable, I really appreciated what Live Science did next. Not only did they update the original story (with notice that they had done so), they also published a follow up under the headline “We Were Totally Wrong About that Scott Kelly Space Genes Story” explaining further how they erred. They also Tweeted out the retraction, with a request that readers share the correction as widely as the original story.

This was a nice way of addressing a chronic problem in internet writing: controversial headlines tend to travel faster than their retractions. By specifically noting this problem, Live Science reminds us all that they can only do so much in the correction process. Fundamentally, people have to share the correction at the same rate they shared the original story for it to make a difference. While ultimately the original error was their fault, it will take more than just Live Science to spread the correct information.

In the new age of social media, I think it’s good for us all to take a look at how we can fix things. Praising and sharing retractions is a tiny step, but I think it’s an important one. Good on Live Science for doing what they could, then encouraging social media users to take the next step.

YouTube Radicals and Recommendation Bias

The Assistant Village Idiot passed along an interesting article about concerns being raised over YouTube’s tendency to “radicalize” suggestions in order to keep people on the site. I’ve talked before about the hidden dangers and biases algorithms can have over our lives, and this was an interesting example.

Essentially, it appears that YouTube has a tendency to suggest more inflammatory or radical content in response to both regular searches and in response to watching more “mainstream” viewing. So for example, if you search for the phrase “the Pope” as I just did in incognito mode on Chrome, it gives me these as the top 2 hits:

Neither of those videos is even among the most watched Pope videos: scrolling down a bit shows some funny moments with the Pope (little boy steals the show) with 2.1 million views and a Jimmy Kimmel bit on him with 4 million views.

According to the article, watching more mainstream news stories will quickly get you to more biased or inflammatory content. It appears that in its quest to make an algorithm that will keep users on the site, YouTube has created the digital equivalent of junk food: content that is tempting but without a lot of substance.

It makes a certain amount of sense if you think about it. Users may not stick around on YouTube unless the next thing they see is slightly more tempting than what they were originally looking for. Very few people would watch three videos in a row of Obama State of the Union Address coverage, but you might watch Obama’s State of the Union address followed by Obama’s last White House Correspondents Dinner talk followed by “Obama’s best comebacks” (the videos I got suggested to me when I looked for “Obama state of the Union”).

Even with benign things I’ve noticed this tendency. For example, my favorite go-to YouTube channel after a long day is the Epic Rap Battles of History channel. After I’ve watched two or three videos, I started noticing it would point me towards videos from the creators’ lesser-watched personal channels. I actually had thought this was some sort of setting the creators chose, but now I’m wondering if it’s the same algorithm. Maybe people doing random clicking gravitate towards lesser watched content as they keep watching. Who knows.

What makes this trend a little concerning is that so many young people use YouTube to learn about different things. My science teacher brother had mentioned seeing an uptick in kids spouting conspiracy theories in his classes, and I’m wondering if this is part of the reason. Back in my day, kids had to actually go looking for their offbeat conspiracy theories, now YouTube brings this right to them. In fact a science teacher who asks their kids to look for information on a benign topic may find that they’ve now inadvertently put them in the path of conspiracy theories that came up as video recommendations after the real science. It seems like this algorithm may have inadvertently stumbled on how to prime people for conversion to radical thought, just through collecting data.

According to the Wall Street Journal, YouTube is looking to tackle this problem, but it’s not clear how they’re going to do that without running into the same problems Facebook did when it started to crack down on fake news. It will be interesting to watch this develop, and it’s a good bias to keep in mind.

In the meantime, here’s my current favorite Epic Rap Battle: