Reporting the High Water Mark

Another day, another weird practice to add to my GPD Lexicon.

About two weeks ago, a friend sent me that “People over 65 share more fake news on Facebook” study to ask what I thought. As I was reviewing some of the articles about it, I noticed that they kept saying the sample size was 3,500 participants. As the reporting went on, however, the articles clarified that not all of those 3,500 people were Facebook users, and that about half the sample opted out. Given that the whole premise of the study was that the researchers had looked at Facebook sharing behavior by asking people for access to their accounts, it seemed like that initial sample size wasn’t the one used to obtain the main finding. I got curious how much this changed the overall number, so I decided to go looking.

After following up with the actual paper, it appears that 2,771 of those people had a Facebook account to begin with, 1,331 actually enrolled in the study, and 1,191 were able to link their Facebook account to the software the researchers needed. So the sample the study was actually done on is about a third of the initially reported figure.
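For the arithmetic, here’s a quick sketch of the attrition (in Python; the stage labels are my own paraphrase of the figures quoted above):

```python
# Sample attrition at each stage, using the numbers reported in the paper.
stages = {
    "initial survey sample": 3500,
    "had a Facebook account": 2771,
    "enrolled in the study": 1331,
    "successfully linked account": 1191,
}

initial = stages["initial survey sample"]
for label, n in stages.items():
    # Each stage as a share of the headline 3,500 figure.
    print(f"{label}: {n} ({n / initial:.0%} of initial sample)")

# The final linked sample is roughly a third of the headline number.
retained = stages["successfully linked account"] / initial
print(f"retained: {retained:.0%}")  # prints "retained: 34%"
```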

While this wasn’t necessarily deceptive, it did strike me as a bit odd. The 3,500 number is one of the least relevant numbers in that whole list. It’s useful to know that there might have been some selection bias among the folks who opted out, but that’s hard to see if you don’t also report the final number. Other than serving as a selection-bias check (which the authors did perform), the 63% of participants who had no link-sharing data collected on them are irrelevant to the conclusions reported. I assumed at first that reporters were getting this number from the authors, but that doesn’t seem to be the case. The number 3,500 isn’t in the abstract, and the press release uses the 1,300 number. From what I can tell, the 3,500 figure only appears by itself in the initial data and methods section, before the results and “Facebook profile data” sections clarify how the interesting part of the study was done. That’s where the authors note that 65% of the potential sample wasn’t eligible or opted out.

This way of reporting wasn’t limited to a few outlets, though; even the New York Times went with the 3,500 number. Weirdly enough, the Guardian used the number 1,775, which I can’t find anywhere. Anyway, here’s my new definition:

Reporting the high water mark: A news report about a study that uses the sample size of potential subjects the researchers started with, as opposed to the sample size of the study they subsequently report on.

I originally went looking for this sample size because I was curious how many people 65 and over were included in this study. Interestingly, I couldn’t actually find the raw number in the paper. This strikes me as important because if older people are online in smaller numbers than younger ones, the overall number of fake stories shared might still be larger among younger people.

I should note that I don’t actually think the study is wrong. When I went looking in the supplementary table, I noticed the authors mention that the most commonly shared type of fake news article was actually fake crime articles. At least in my social circle, I have almost always seen those shared by older people rather than younger ones.

Still, I would feel better if the relevant sample size were reported first, rather than the biggest number the researchers looked at throughout the study.

6 thoughts on “Reporting the High Water Mark”

  1. Dang. Now I have to go and figure out why a sample that is a minuscule fraction of the half BILLION Facebook users represents ALL “older people”. Of course, the sample is NOT in any way representative of even older Facebook users because the “connection” had to work. What are the requirements for the “connection”? How did that bias the results? What about older users in other countries? Are ALL Facebook users more likely to share fake news? I’m an “older user” and am unlikely to share my kids’ REAL college grades.


    • Yeah, the findings almost certainly can’t be applied outside the sample, which I believe was US only. If the participants were randomly selected from there, though, you can to some extent treat them as representative. Formulas here:

      Survey-wise, the most likely problem is all those opt outs. It seems very likely that people who wouldn’t give a researcher access to their personal Facebook page might be more skeptical of everything than others.
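The “formulas” referenced above aren’t reproduced in the comment, but the usual back-of-the-envelope version for a simple random sample is the margin of error for a proportion. A minimal sketch, assuming a 95% confidence level and the study’s 1,191 linked accounts:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p estimated from a
    simple random sample of size n (p=0.5 is the worst case)."""
    return z * math.sqrt(p * (1 - p) / n)

# With 1,191 linked accounts, a proportion estimate carries roughly
# a +/- 2.8 percentage point margin of error -- assuming a simple
# random sample, which the heavy opt-out rate undercuts.
print(f"{margin_of_error(1191):.1%}")  # prints "2.8%"
```

The formula only holds if the sample really is random, which is exactly the caveat about opt-outs raised above.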


  2. And what is “fake news”? I’m only a couple of paragraphs into the original article and can already see an anti-Trump bias. It sounds like their definition of fake news is the same thing as “hate facts” – anything you disagree with. (Dang. I woke up from my Sunday afternoon nap cranky.)


    • The definition they used was “false or misleading content intentionally dressed up to look like news articles, often for the purpose of generating ad revenue.” They do look like they kept it pretty narrow, not going after debatable stories. They also clarified that many of these shares weren’t political in nature…crimes that didn’t happen, for example. Fake politics stories get all the headlines because of the election, but the other kinds are still out there.


  3. Fake crimes. I’m not coming up with any examples of that in my memory. I do have a second cousin around 60 who shares suspiciously perfect stories about racist jerks saying terrible things to black doctors after their long shifts of working among the poor, or of people being threatened by Trump supporters, but not crimes, exactly. Give me an example I might recognise. I had an uncle, now deceased, who regularly shared stories from The Onion as if they were real.

    As I am about to turn 66, I have some concern about all this. 😉

    At first I thought that the findings were so robust – older people were 7 times more likely to share fake news – that this would swamp any selection bias. Then I saw that their list of fake news sites came from BuzzFeed, and rethought that. Who does a study about fake news, for which they have received grant money in order to provide knowledge, and says “Yeah, we’ll just use the list from BuzzFeed, that’ll be fine.” It rather puts the lie to their condescension about digital natives versus digital immigrants.


    • Fake crimes aren’t normally very notable…things like “Florida Man Arrested for Tranquilizing and Raping Alligators” or “Altar Boys Arrested for Putting Weed in incense-burner”, which apparently were big ones this year. They noted specifically that these are not political, so they probably barely register for most people. Speaks to being on Facebook a little too much I think.

      I’m more okay with the Buzzfeed list than you and BlueCat seem to be, but you did raise an interesting point. This study looked exclusively at fake news as it originates from websites dedicated to looking “newsy”. They did not include things like memes with false facts or viral stories/shares that originate from within Facebook. It’s not clear if those would follow the same pattern.


Comments are closed.