Another day, another weird practice to add to my GPD Lexicon.
About two weeks ago, a friend sent me that “People over 65 share more fake news on Facebook” study to ask me what I thought. As I was reviewing some of the articles about it, I noticed that they kept saying the sample size was 3,500 participants. As the reporting went on, however, the articles clarified that not all of those 3,500 people were Facebook users, and that about half the sample opted out. Given that the whole premise of the study was that the researchers had looked at Facebook sharing behavior by asking people for access to their accounts, it seemed like that initial sample size wasn’t reflective of the sample used to obtain the main finding. I got curious how much this changed the relevant number, so I decided to go looking.
After following up with the actual paper, it appears that 2,771 of those people had Facebook to begin with, 1,331 actually enrolled in the study, and 1,191 were able to link their Facebook account to the software the researchers needed. So the sample the study was actually done on is about a third of the initially reported figure.
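If you want to sanity-check that “about a third” claim, the arithmetic is simple. Here’s a minimal sketch using the figures above as I read them from the paper:

```python
# Figures as reported in the paper (as I read them)
invited = 3500       # respondents in the initial survey sample
had_facebook = 2771  # reported having a Facebook account
enrolled = 1331      # agreed to share profile data
linked = 1191        # successfully linked their account to the researchers' software

# Share of the headline sample the key finding actually rests on
print(f"Linked / invited: {linked / invited:.0%}")               # ~34%
print(f"No sharing data collected: {1 - linked / invited:.0%}")  # ~66%
```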
While this wasn’t necessarily deceptive, it did strike me as a bit odd. The 3,500 number is one of the least relevant numbers in that whole list. It’s useful to know that there might have been some selection bias among the folks who opted out, but that’s hard to see if you don’t also report the final number. Other than serving as a selection bias check (which the authors did do), the initial sample doesn’t tell you much: 66% of those people had no link sharing data collected on them, and thus are irrelevant to the conclusions reported. I assumed at first that reporters were getting this number from the authors, but it doesn’t seem like that’s the case. The number 3,500 isn’t in the abstract. The press release uses the 1,300 number. From what I can tell, the 3,500 number is only mentioned on its own in the first data and methods section, before the results and the “Facebook profile data” section clarify how the interesting part of the study was done. That’s where they clarify that 65% of the potential sample wasn’t eligible or opted out.
This way of reporting things wasn’t limited to a few outlets either, as even the New York Times went with the 3,500 number. Weirdly enough, the Guardian used the figure 1,775, which I can’t find anywhere. Anyway, here’s my new definition:
Reporting the high water mark: A news report about a study that uses the sample size of potential subjects the researchers started with, as opposed to the sample size for the findings they subsequently report on.
I originally went looking for this sample size because I was curious how many people 65 and older were included in the study. Interestingly, I couldn’t actually find that raw number in the paper. This strikes me as important because if older people are online in smaller numbers than younger ones, the overall number of fake stories shared could still be larger among younger people, even if the per-person rate is higher among older ones.
I should note that I don’t actually think the study is wrong. When I went looking through the supplementary tables, I noticed the authors mention that the most commonly shared type of fake news article was actually fake crime stories. At least in my social circle, I have almost always seen those shared by older people rather than younger ones.
Still, I would feel better if the relevant sample size were reported first, rather than the biggest number the researchers looked at throughout the study.