Death Comes for the Appliance

Our dryer died this week. Or rather, it died last weekend and we got a new one this week. When we realized it was dead (with a full load of wet clothes in it, naturally), the decision making process was pretty simple.

We’re only the third owners of our (early 1950s) house, and the previous owners spent most of the 5 years they had it trying to flip it for a quick buck. We’ve owned it for 6 years now, so any appliance that wasn’t new when we moved in was probably put in by them when they moved in. That made the dryer about 11 years old, and it was a cheap model. I was pretty sure a cheap dryer over a decade old (that had been slowly increasing in drying time for a year or so, unhelped by a thorough cleaning) would be more trouble to repair than it was worth, so we got a new one.

After making the assertion above, I got a little curious whether there was any research backing up the life span of various appliances. For as long as I can remember I’ve been fairly fascinated by dead or malfunctioning appliances, which I blame on my Yankee heritage. I’ve lived with a lot of half-functioning appliances in my lifetime, so I’ve always been interested in which appliance sounds/malfunctions mean “this is an appliance that will last three more years if you just never use that setting and jerry-rig (yes, that’s a phrase) a way to turn it off/on” and which sounds mean “this thing is about to burst into flames, get a new one”.

It turns out there actually is research on the topic, summarized here, with a full publication here:

So basically it looks like we were on schedule for a cheap dryer to go. Our washing machine was still working, but it was cheaper to replace them both at the same time.

This list suggests our dishwasher was weak, as it went out at about 7 years (the repair folks wouldn’t fix it for less than the cost of a replacement), but our microwave is remarkably strong (10 years and counting). We had to replace our refrigerator earlier than should have been necessary (that was probably the fault of a power surge), but our oven should have a few more years left.

Good to know.

Interestingly, when I mentioned this issue to my brother this weekend, he asked me if I realized what the longest lasting appliance in our family history was. He stumped me until he told me the location….a cabin owned by our extended family. The refrigerator in it has been operational since my mother was a child, and I’m fairly sure it’s this model of Westinghouse that was built in the 1950s, making it rather close to 70 years old:

Wanna see the ad? Here you go!

It’s amusing that it’s advertised as “frost free”, as my strongest childhood memories of this refrigerator involve unplugging it at the end of the summer season and then putting towels all around it until all the ice that had built up inside melted. We’d take chunks out to try to hurry the process along.

Interestingly, the woman in the ad up there was Betty Furness, who ended up with a rather fascinating career that included working for Lyndon Johnson. She was known for her consumer advocacy work, which may be why the products she advertised lasted so darn long, or at least longer than my dryer.

Judging Attractiveness

From time to time, I see this graph pop up on Twitter: 

It’s from this blog post here, and it is almost always used as an example of how picky women are. The original numbers came from a (since deleted) OK Cupid blog post here. From what I can tell they deleted it because the whole “women find 80% of men below average” thing was really upsetting people.

Serious question though….has this finding been replicated in studies where men and women don’t get to pick their own photos?

As anyone who’s looked at Facebook for any length of time knows, photo quality can vary dramatically. For people we know, this is a small thing…”oh, so and so looks great in that picture”, “oh, poor girl looks horrible in that one”, etc. One only needs to walk into a drug store to note that women in particular have a myriad of ways to alter their appearance….makeup, hair products, hairstyles, and I’m sure there are other things I am forgetting. Your average young male might use some hair product, but rarely alters anything beyond that.

So basically, women have a variety of ways to improve their own appearance, whereas men have very few. Women are also more rewarded for having a good looking photo on a dating site. From the (deleted) OK Cupid article:

So the most attractive male gets 10x as many messages as the least attractive male, but the most attractive woman gets 25x as many. A woman of moderate attractiveness has a huge incentive to get the best possible photo of herself up on the site, whereas a similarly placed man doesn’t have the same push. Back when I made a brief foray into dating sites, I noted that certain photos could cause the number of messages in my inbox to triple overnight. With that kind of feedback loop, I think almost every woman would trend toward an optimized photo pretty quickly. Feedback would be rather key here too, as research suggests we are actually pretty terrible at figuring out what a good photo of ourselves looks like.

Side note: as we went over in a previous post, measuring first messages puts guys at a disadvantage from the get-go. Men as a group receive far fewer messages from women on these sites. This means their feedback loop is going to be much more subtle than women’s, making it harder for them to figure out what to change.

My point is, I’m not sure we should take this data seriously until we compare it to what happens when all the pictures used are taken under the same conditions. The fact that the genders select (and can improve) their photos under different incentives is a possible confounder.

I did some quick Googling to see if I could find a similar distribution of attractiveness ratings from a general research study, and found this one from a Less Wrong post about a study on speed dating:

They note that men rated the average woman slightly higher (6.5) than women rated the average man (5.9), but that we see a bell curve in both cases. The standard deviation was the same for both (0.5). At a minimum, I feel this suggests that online perceptions do not translate cleanly into real life. I suspect that’s a statement that can be applied to many fields.
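
Just to put some numbers on how different that picture is from the OK Cupid chart, here’s a quick back-of-the-envelope sketch in Python. It assumes both in-person rating distributions are roughly normal with the reported means (6.5 and 5.9) and standard deviation (0.5); the actual study data may of course differ.

```python
from statistics import NormalDist

# Reported speed-dating ratings, assumed roughly normal
women_rated_by_men = NormalDist(mu=6.5, sigma=0.5)
men_rated_by_women = NormalDist(mu=5.9, sigma=0.5)

# How much the two rating distributions overlap (0 = disjoint, 1 = identical)
print(f"Overlap between the distributions: {men_rated_by_women.overlap(women_rated_by_men):.2f}")

# Share of men whose rating falls above 5 on the scale
print(f"Men rated above 5: {1 - men_rated_by_women.cdf(5):.1%}")
```

Under those assumptions, well over 90% of men land above a rating of 5, which is a very different result from the “80% of men are below average” finding online.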

I’d be interested to see any other non-dating site data sets people know about, to see what distribution they follow.

Measuring Compromise

There’s a report that’s been floating around this week called Hidden Tribes: A Study of America’s Polarized Landscape. Based on a survey of about 8,000 people, the aim was to cluster people into different political groups, then figure out what the differences between them were.

There are many interesting things in this report and others have taken those on, but the one thing that piqued my interest was the way they categorized the groups as either “wings” of the party or the “exhausted majority”. Take a look:

It’s rather striking that traditional liberals are considered part of the “exhausted majority” whereas traditional conservatives are considered part of the “wings”.

Reading the report, it seemed they made this categorization because the traditional liberals were more likely to want to compromise and to say that they wanted the country to heal.

I had two thoughts about this:

  1. The poll was conducted in December 2017 and January 2018, so well into the Trump presidency. Is the opinion of the “traditionalist” group on either side swayed by who’s in charge? Were traditional liberals as likely to say they wanted to compromise when Obama was president?
  2. How do you measure desire to compromise anyway?

It was that second question that fascinated me. Compromise seems like one of those things that are easy to aspire to but harder to actually do. After all, compromise inherently means giving up something you actually want, which is not something we do naturally. Almost everyone who has ever lived in a household/shared a workplace with others has had to compromise at some point, and a few things quickly become evident:

  1. The more strongly you feel about something, the harder it is to compromise
  2. Many compromises end with at least some unhappiness
  3. Many people put stipulations on their compromising up front…like “I’ll compromise with him once he stops being so unfair”

That last quote is a real thing a coworker said to me last week about another coworker.

Anyway, given our fraught relationship with compromise, I was curious how you’d design a study that would actually test people’s willingness to compromise politically, rather than just asking them whether compromise is generically important. I’m thinking you could design a survey that gives people a list of solutions/resolutions to political issues, then has them rank how acceptable they find each solution (a rough sketch of how that might be scored follows the list below). A few things you’d have to pay attention to:

  1. People from both sides of the aisle would have to give input on the possible options/compromises, obviously.
  2. You’d have to pick issues with a clear gradient of solutions. For example, the recent Brett Kavanaugh nomination would not work to ask people about because there were only two outcomes. Topics like “climate change” or “immigration” would probably work well.
  3. The range of possibilities would have to be thought through. As it stands today, most of how we address issues is already a compromise. For example, I know plenty of people who think we have entirely too much regulation on emissions/energy already, and I know people who think we have too little. We’d have to decide whether we were measuring compromise against the far ends of the spectrum or against the current state of affairs. At a minimum, I’d think you’d have to include a “here’s where we are today” disclaimer on every question.
  4. You’d have to pick issues with no known legal barrier to implementation. Gun control is a polarizing topic, but the Second Amendment does give a natural barrier to many solutions. I feel like once you get into solutions like “repeal the Second Amendment” the data could get messy.
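
To make the scoring idea concrete, here’s a minimal sketch of what I have in mind. Everything in it (the issues, the option gradients, and the respondent’s answers) is hypothetical; the point is just that “which options would you accept?” can be turned into a rough willingness-to-compromise score.

```python
# Hypothetical survey: each issue lists policy options ordered from one end
# of the spectrum to the other; respondents mark every option they'd accept.
issues = {
    "immigration": ["option A", "option B", "option C", "option D", "option E"],
    "climate":     ["option A", "option B", "option C", "option D", "option E"],
}

# One respondent's answers: the set of options they'd find acceptable per issue.
respondent = {
    "immigration": {"option A", "option B"},
    "climate":     {"option B", "option C", "option D"},
}

def flexibility_score(issues, answers):
    """Average share of the option spectrum a respondent finds acceptable."""
    shares = [len(answers[issue]) / len(options) for issue, options in issues.items()]
    return sum(shares) / len(shares)

print(f"Willingness-to-compromise score: {flexibility_score(issues, respondent):.2f}")
# 0.50 here: this respondent accepts 2 of 5 options on immigration and 3 of 5 on climate.
```

A score like this would also let you see whether someone’s flexibility is general or issue-specific, which gets at the “I’ll compromise once he stops being unfair” problem from earlier.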

As I pondered this further, it occurred to me that the wings of the parties may actually be the most useful people in writing a survey like this. Since most “wing” type folks actually pride themselves on being unwilling to compromise, they’d probably be pretty clear-sighted about what the possible compromises were and how to rank them.

Anyway, I think it would be an interesting survey, and not because I’m trying to disprove the original survey’s data. In the current political climate we’re so often encouraged to pick a binary stance (for this, against that) that considering what range of options we’d be willing to accept might be an interesting framing for political discussions. We might even wind up with new political orientations called “flexible liberals/conservatives”. Or maybe I just want a good excuse to write a fun survey.

Media Coverage vs Actual Incidence

The month of October is a tough one for me schedule-wise, so I’m probably going to be posting a lot of short takes on random things I see. This study popped up on my Twitter feed this week and seemed pretty relevant to many of the themes of this blog: “Mediatization and the Disproportionate Attention to Negative News”.

This study took a look at airplane crashes, and tracked the amount of media attention they got over the years. I’ll note right up front that they were tracking Dutch media attention, so we should be careful generalizing to the US or other countries. The authors of the study decided to track the actual rate of airplane crashes over about 25 years, along with the number of newspaper articles dedicated to covering those crashes as a percentage of all newspaper articles published.
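
To make the method concrete, the basic measure is just “crash-related articles as a share of all articles published that year,” set next to the actual crash count. Here’s a tiny sketch with entirely made-up numbers (not the paper’s data):

```python
# Hypothetical yearly counts: (crashes, crash-related articles, total articles published)
yearly = {
    1992: (35, 120, 400_000),
    2002: (30, 180, 380_000),
    2014: (25, 950, 350_000),
}

for year, (crashes, crash_articles, all_articles) in sorted(yearly.items()):
    attention_share = crash_articles / all_articles * 100
    print(f"{year}: {crashes} crashes, {attention_share:.3f}% of articles about crashes")
```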

The whole paper is interesting, but the key graph is this one:

Now the authors fully admit that the MH17 airplane crash in 2014 (a plane brought down by a missile, with mostly Dutch passengers) does account for that big spike at the end, but it appears the trend still holds even if you leave that out.

It’s an interesting data set, because it puts some numbers behind the idea that things are not always covered in the media in proportion to their actual occurrence. I think we all sort of know this intuitively in general, but it seems hard to remember when it comes to specific issues.

Even more interesting is that the authors did some analysis on exactly what these articles covered, to see if they could get some hints as to why the coverage has increased. They took 3 “eras” of reporting, and categorized the framing of the articles about the plane crashes. Here were their results:

Now again, the MH17 incident (with all its international relations implications) is heavily skewing that last group, but it’s interesting to see the changes anyway. The authors note that the framing almost certainly trends from more neutral to more negative. This supports their initial thesis that there is some “mediatization” going on. They define mediatization as “a long-term process through which the importance of the media and their spillover effects on society has increased” and theorize that “Under the conditions of mediatization, certain facets have become more prominent in media coverage, such as a focus on negativity, conflicts, and human-interest exemplars”. They blame this tendency on the fact that “the decreasing press–party parallelism and media’s growing commercial orientation has strengthened the motives and effort to gain the largest possible audience media can get”.

As a result of this, the authors show that in the month after a plane crash is reported by the media, fewer people board planes. They don’t say if this effect has lessened or increased over time, but regardless, the media coverage does appear to make a difference. Interestingly, they found that airline safety was not related (time-series wise) to press coverage. Airlines were not more or less safe the month after a major crash than they were the month before, suggesting that crashes really aren’t taking place due to routine human error any more.

Overall, this was a pretty interesting study, and I’d be interested to see it repeated with newer media such as blogs or Twitter. It’s harder to get hard numbers on those types of things, but as their effect is felt more and more it would be interesting to quantify how they feed into this cycle.

Wansink Link Roundup

I was away for most of this week so I’m just getting to this now, but Brian Wansink has announced he’s retiring at the end of this academic year after a series of investigations into his work. I’ve blogged about the Wansink saga previously (also here and here and here), and have even had to update old posts to remove research of his that I referenced.

Christopher B passed along a good summary article from the AP, which I was pleased to see included a note that they had frequently used him as a source for stories. The New York Times followed suit, and this article mentions that he was cited as a source in at least 60 articles since 1993.

While the initial coverage was mostly shock, I’ve been pleased to see how many subsequent articles point to the many bigger problems in science (and science reporting) that led to Wansink’s rise.

The New York Times article I just cited delves into the statistical games that the field of nutrition often plays to get significant results, and how the press generally reports them uncritically. For those of you who have been following this story, you’ll remember that this whole drama was kicked off by a blog post Wansink wrote in which he praised a grad student for finding publishable results in a data set he admitted looked like it had yielded nothing. That wouldn’t have been such a problem if the papers had admitted that’s what they were doing, but they never corrected for multiple comparisons or clarified that hundreds of different comparisons had been run in search of something significant.
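
To see why skipping that correction matters, here’s a small simulation (random noise, not Wansink’s data): run enough comparisons on a data set with no real effects and “significant” findings show up anyway.

```python
import random

random.seed(0)

N_COMPARISONS = 200  # e.g., many foods crossed with many outcomes
ALPHA = 0.05

# Under the null hypothesis (no real effect anywhere), p-values are uniform on
# [0, 1], so each comparison still has a 5% chance of looking "significant".
p_values = [random.random() for _ in range(N_COMPARISONS)]
spurious_hits = sum(p < ALPHA for p in p_values)

print(f"{spurious_hits} of {N_COMPARISONS} null comparisons came out 'significant' at p < {ALPHA}")
# Expect roughly ALPHA * N_COMPARISONS, i.e. about 10 hits, from noise alone.
```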

The Atlantic put up a good piece about the “scientist as celebrity” angle, discussing how we should think about scientists who get a lot of attention for their work. It describes the “buzz cycle”, where we push findings we like and scientists respond by trying to generate findings that will be likable. This is a good point, as many people who don’t know Wansink’s name know of his findings (health halos, use small plates, we eat more from bottomless soup bowls, etc).

This Washington Post op-ed has an interesting discussion of science education, and wonders whether, if we did more to educate kids about scientific mistakes and fraud, we’d be more skeptical about scientific findings in general. It’s an interesting thought…we mostly hear science presented as an unbroken march towards truth, without always hearing how many side roads there are along the way.

Overall it’s a fascinating and sad story, made slightly worse by the fact that it appears to have come to a head at the same time that Wansink’s mother died and his father broke his hip. While this is a good reminder to limit any gratuitous potshots against him as a person, it still raises many interesting discussion points about how we got here. If you see any other articles, feel free to pass them along!

Take Your Best Guess

The AVI passed on an interesting post about a new study that replicates the finding that many psychological studies don’t replicate. Using 21 fairly randomly selected studies (chosen specifically to avoid being too sensational…these were supposed to be run of the mill), replication efforts showed that about 60% of studies held up while almost 40% could not be replicated.

This is a good and interesting finding, but what’s even more interesting is that they allowed people to place bets ahead of time on exactly which studies they thought would fail and which ones would bear out. Some of the people were other psych researchers, and some were placing bets for money. It turns out that everyone was pretty darn good at guessing which findings would replicate:

Consistently, studies that failed to replicate had fewer people guessing they would replicate. In fact, most people were able to guess correctly on at least 17 or 18 out of the 21.

Want to try your hand at it? The 80,000 Hours blog put together a quiz so you can do just that! It gives you an overview of each study’s finding, with an option to read more about exactly what they found. Since I’m packing up for a work trip this week, I decided not to read any details and just go with my knee-jerk guess from the description. I got 18 out of 21:

I encourage you to try it out!

Anyway, this is an interesting finding because quite often when studies fail to replicate, there are outcries of “methodological terrorism” or that the replication efforts “weren’t fair”. As the Put A Number on It blog post points out though, if people can pretty accurately predict which studies are going to fail to replicate, then those complaints are much less valid.

Going forward, I think this would be an interesting addendum to all replication efforts. It would be an interesting follow-up to focus particularly on the borderline studies….those that just over 50% of people thought would replicate, but that didn’t end up replicating. It seems like those studies would have the best claim to a tweaked methodology and a repeat attempt.

Now go take the quiz, and share your score if you do! The only complaint I had was that the results don’t specifically tell you (I should have written it down) whether you were more likely to say a study would replicate when it didn’t, or vice versa. It would be an interesting personal data point to know if you’re more prone to Type 1 or Type 2 errors.
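
If you do write your answers down, the tally is easy to do yourself. Here’s a minimal sketch (with made-up answers) of the breakdown I wish the quiz reported:

```python
# Hypothetical quiz record: for each study, your guess and the actual replication result.
answers = [
    {"guessed_replicates": True,  "actually_replicated": True},
    {"guessed_replicates": True,  "actually_replicated": False},  # too credulous
    {"guessed_replicates": False, "actually_replicated": True},   # too skeptical
    {"guessed_replicates": False, "actually_replicated": False},
]

too_credulous = sum(a["guessed_replicates"] and not a["actually_replicated"] for a in answers)
too_skeptical = sum(not a["guessed_replicates"] and a["actually_replicated"] for a in answers)

print(f"Believed findings that failed to replicate (Type 1-ish): {too_credulous}")
print(f"Doubted findings that did replicate (Type 2-ish):        {too_skeptical}")
```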


Tornadoes in the Middle of America

I was talking to my son (age 6) a few days ago, and was surprised to hear him suddenly state “Mama, I NEVER want to go to the middle of America”. Worried that I had somehow already managed to inadvertently turn him into one of the coastal elite, I had to immediately ask “um, what makes you say that?”. “The middle of America is where tornadoes are, and I don’t want to be near a tornado”, he replied. Oh. Okay then.

Apparently one of his friends at school had started telling him all about tornadoes, and he wanted to know more. Where were most of the tornadoes? Where was the middle of America anyway? And (since I’m headed to Nebraska in a week), what state had the most tornadoes?

We decided to look it up, and the first thing I found on Google image search was this map from US Tornadoes:

Source here. I was surprised to see the highest concentration was in the Alabama/Mississippi area, but then I realized this was tornado warnings, not tornadoes themselves. The post that accompanies the map suggests that the high number of tornado warnings in the Mississippi area is because the tornado season there is much longer than in the Kansas/Oklahoma region that we (or at least I) normally think of as the hotbed for tornadoes.

Areas impacted by tornadoes vary a bit depending on what you’re counting, but this insurance company had a pretty good map of impacted areas here:

Measurements can vary a bit for two reasons: what you count as a tornado, and how you calculate frequency. The National Oceanic and Atmospheric Administration puts out a few different types of numbers: average number of tornadoes, average number of strong to violent tornadoes, tornadoes by state, and tornado average per 10,000 square miles. Those last two are basically to help account for states like Texas, which gets hit with more tornadoes than any other state (an average of 155 per year between 1991 and 2010), but mostly because it’s so big. If you correct for that by looking at the rate per 10,000 square miles, it dips to 5.9….well below Florida (12.2) and Kansas (11.7).
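
The normalization itself is simple arithmetic. Here’s a quick sketch using the Texas figure above and approximate land areas (NOAA may use slightly different area numbers, so treat the output as ballpark):

```python
# Texas: roughly 155 tornadoes per year on average (1991-2010), ~261,000 sq mi of land
texas_avg_tornadoes = 155
texas_area_sq_mi = 261_000  # approximate

rate_per_10k = texas_avg_tornadoes / (texas_area_sq_mi / 10_000)
print(f"Texas: {rate_per_10k:.1f} tornadoes per 10,000 sq mi per year")  # ~5.9

# Going the other way, the quoted per-area rates imply roughly this many tornadoes per year:
for state, rate_per_10k_sq_mi, area_sq_mi in [("Florida", 12.2, 54_000), ("Kansas", 11.7, 82_000)]:
    implied_per_year = rate_per_10k_sq_mi * area_sq_mi / 10_000
    print(f"{state}: ~{implied_per_year:.0f} tornadoes per year")
```

So Texas’s huge raw count shrinks to about 5.9 once you divide by its area, while Florida’s and Kansas’s smaller yearly counts (roughly 66 and 96, working backwards from the rates) land around 12 per 10,000 square miles.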

Florida coming in ahead of Kansas surprised me, but this is where the strength of tornadoes comes in. Apparently Florida has lots of weak tornadoes. Looking only at strong to violent tornadoes, we get this:

The NOAA also breaks down risk by month, so I decided to take a look and see what the risk in Nebraska was for September:

I think I can reassure the kiddo that mommy is going to be just fine. Apparently if you want to go to the middle of America but avoid tornadoes, fall is a pretty good bet.

Of course after we got the numbers down, we went to YouTube and started watching storm chaser videos. While he thought those were fascinating, he did have a reassuring number of questions along the lines of “mama, why did the people in the video see the tornado but not run away?”. Good impulse, kid. Also, continuing his mother’s habit of rampant anthropomorphizing, he informed me that this video made him “very sad for the trees” (see the 35-40 second mark):

https://www.youtube.com/watch?v=jO2VVjlHe8o&t=122s


5 Things About the Challenges of Nutritional Epidemiology

Anyone who’s been reading this blog for any amount of time knows that I’m a pretty big fan of the work of John Ioannidis, and that I like writing about the challenges of nutrition research. Thus, you can imagine my excitement when I saw that JAMA had published this opinion piece from him called “The Challenge of Reforming Nutritional Epidemiologic Research“. The whole article is quite good, but for those who don’t feel like wading through it, I thought I’d pull together some of the highlights. Ready? Let’s go!

  1. Everything’s a problem (or maybe just our methods). Ioannidis starts out with an interesting reference to a paper from last year called “Food groups and risk of all-cause mortality: a systematic review and meta-analysis of prospective studies”. This meta-analysis looked at the impact of various food groups on mortality, and reported the significant associations. Ioannidis points out that almost every food they looked at had a statistically significant association with mortality, even at relatively small intakes. Rather than getting concerned about any one finding, Ioannidis raises concerns about the ubiquity of significant findings. Is every food we eat really raising or lowering our all-cause mortality all the time? Or are we using methods that predispose studies to finding things significant?
  2. Reported effect sizes are large and aren’t necessarily cumulative. The second thing Ioannidis points out is exactly how large the effect sizes are. The study mentioned in point #1 suggests you get 1.7 extra years of life for eating a few extra hazelnuts every day? And that eating bacon every day is worse than smoking? That seems unlikely. The fundamental problem here is that most food consumption is heavily correlated with other types of food consumption, making it really difficult to tease out which foods are helping or hurting. If (hypothetically) vegetables were actually bad for us, but people ate them a lot with fruit (which was good for us), we might come to the conclusion that vegetables were good merely because their consumption was tied to other things (there’s a toy simulation of this after the list). As Ioannidis puts it, “Almost all nutritional variables are correlated with one another; thus, if one variable is causally related to health outcomes, many other variables will also yield significant associations in large enough data sets.”
  3. We focus too much on the food itself. Speaking of confounders, Ioannidis goes on to make another interesting point: food consumption is almost always assumed to be beneficial or risky based on properties of the food itself, with potential confounders being ignored. For example, he cites the concern that grilling meat can create carcinogens, and the attempts to disentangle the cooking method from the meat itself. Drinking scalding hot beverages is known to increase the risk of esophageal cancer, separate from what the beverage itself actually is. It’s entirely plausible there are more links like that out there, and entirely plausible that various genetic factors could make associations stronger for some groups than others. Teasing those factors out is going to be extremely challenging.
  4. Publication methods encourage isolation of variables. One of the other interesting things Ioannidis points out is that even very large long-term studies (such as the Nurses’ Health Study) tend to spread their results out over hundreds if not thousands of papers. This is a problem we talked about in the Calling Bullshit class I reviewed: researchers are rewarded more for publishing in volume than for the quality of each paper. Thus, it makes sense that each nutrient or food is looked at individually, and headline writers magnify the issue. Unfortunately this makes the claims look artificially strong, and is probably why randomized trials frequently fail to back up the observed claims.
  5. Nutritional epidemiology uniquely impacts the public. So what’s so bad about an observational study failing to live up to the hype? Well, nothing, unless clinical recommendations are based on it. Unfortunately, this study found that in 56% of observational studies, the authors recommended a change to clinical practice. Only 14% of those recommendations came with a caveat that further studies might be needed to corroborate the findings. This is particularly concerning when you realize that some studies have found that very few observational findings replicate. For example, this one looked at 52 findings from 12 papers, and found that none of them replicated in randomized trials, and 5 actually showed a reversal in the correlation. Additionally, headlines do little to emphasize the type of study that was done, leading to a perception that science in general is unreliable. This has long-term implications both for our health and for our perception of the scientific method.
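
Ioannidis’s point about correlated foods (point #2 above) is easy to demonstrate with a toy simulation. This is entirely made-up data, just a sketch using the standard library (Python 3.10+ for statistics.correlation): give one food a real effect on an outcome, make a second food correlated with the first but causally inert, and both show an association.

```python
import random
from statistics import correlation

random.seed(1)
n = 5_000

# "Fruit" intake has a real (simulated) effect on the health outcome.
fruit = [random.gauss(0, 1) for _ in range(n)]

# "Vegetable" intake has NO causal effect here, but people who eat more fruit
# also tend to eat more vegetables (correlated behaviors).
veg = [f * 0.7 + random.gauss(0, 0.7) for f in fruit]

# The outcome depends only on fruit intake, plus noise.
outcome = [f * 0.5 + random.gauss(0, 1) for f in fruit]

print(f"fruit vs outcome:     r = {correlation(fruit, outcome):.2f}")
print(f"vegetable vs outcome: r = {correlation(veg, outcome):.2f}  (association without causation)")
```

In a data set this size both correlations come out comfortably “significant,” even though only one of the two foods actually does anything.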

Overall I enjoyed the piece, and particularly its link to promising new recommendations to help address these issues. While criticizing nutritional epidemiology has become rather popular, better ways of doing things have been more elusive. Given the level of public interest, however, we definitely need more resources going into this. Given that the NUSI model appears to have failed, new suggestions should be encouraged.

What I’m Reading: September 2018

The news about the fire at the National Museum of Brazil was rather shocking, and I feel even worse about it now that I’ve read this roundup of some of the pieces that were lost in the flames.

Hat tip to Jonathan for sending me this great NPR piece on the school shootings that weren’t. Their reporting found that out of the 238 school shootings that got reported last year, 227 were due to errors filling out the form and 11 were actual shootings. A cautionary tale about what happens when you rely on people filling out online forms to report things like school shootings, and a good example of base rate issues in action.

As a proud member of the Oregon Trail Generation, I really liked this history of the game and why it became so ubiquitous for a certain age group.

In an interesting point/counter-point this week, we have a Vox article that explains how Alexandria Ocasio-Cortez is getting unfair amounts of criticism for her errors because she’s a young female, and this Washington Examiner piece, which points out that much of that criticism of the criticism is just wrong. For example, the Vox piece points to several incorrect statements Paul Ryan has made, then says “No one saw these statements and said Ryan is unfit to serve in Congress. No one told him to go put training wheels back on. No one told him he wasn’t ready for primetime.” The Washington Examiner piece points out that there’s an anti-Ryan super PAC named “Unfit to Serve”, that two years ago Nancy Pelosi actually released a long fact check of Ryan that started with the phrase “time to take the training wheels off!”, and that in 2012 Obama’s re-election campaign released a statement saying Ryan was “not ready for prime time”. Oops. Now regardless of your opinion of Ocasio-Cortez or Paul Ryan, this is a good moment to remember the Tim Tebow Effect. Paul Ryan’s approval rating has never been above 48%, and the last numbers I can find suggest it’s closer to 34% now, with 46% of the population viewing him unfavorably. He was also popular enough to be named Speaker of the House. Neither liking him nor disliking him is an underrepresented viewpoint. Ocasio-Cortez has been called “the future of the Democratic Party” by the DNC chair, and roundly criticized by many others, as the original Vox article points out. She has no approval rating polls that I can find (likely since she currently holds no office). In other words, if you’re going to claim “no one is criticizing” either of these people, you may want to Google a bit first. Otherwise you’ll be wandering into premature expostulation territory pretty quickly.

Somewhat related: a new paper on tipping points in social conventions. Apparently once around 25% of people feel a certain way about a particular issue, the majority viewpoint begins to sway. Interesting to consider in light of political parties, which tend to be about a third of the country at baseline. How much of a party base needs to be on the same page before the party starts to sway?

Also related: a new study highlights the paradox of viral outrage. People view one person scolding a bad online post positively, but they view 10 people scolding that person negatively. Interesting research, with NeuroSkeptic raising some good counter questions.

John Ioannidis is back with a good piece on the challenge of reforming nutritional epidemiology. I’ll probably do a summary post on this sometime soon.

Another one I want to review soon: Many Analysts, One Data Set. A paper exploring how different choices during the analysis phase can lead to very different results.

Not a thing I’m reading, but I got into a conversation this weekend about the most worthwhile eco-friendly trade-offs people had made. Mine was buying these microfiber cleaning cloths and using them instead of paper towels. They clean better (both for scrubbing and dusting), can be thrown in with any load of laundry to get them clean, and last for a long time. At around $12 for a pack of 24, I am guessing we made our money back pretty quickly in what we saved on paper towels. I got so weirdly passionate about these that I apparently inspired others to buy them, so I figured I’d pass the link along.

5 Things About Peak Desirability

A couple weeks ago, after my College Educated White Women post, the AVI sent along an Atlantic article about how everyone on dating apps is trying to date almost exactly 25% out of their league.

The bigger, more attention-grabbing headline from this study, though, was the finding that women’s desirability peaked at age 18, whereas men’s peaked at age 50. They included this chart:

Since I always get hung up on how these things are calculated and what they’re really telling us, I decided to take a look at the paper and the supplementary materials. Here’s what I found:

  1. Desire = PageRank. When looking at a study like this, one of the first things I always want to know is how they defined their terms. Here, the authors decided that a model where desirability = the number of messages received would be too simplistic, so they used the PageRank equation. Yes, from Google. This equation is useful because it doesn’t just measure the overall number of messages received, but how desirable the people who got in touch with you were. So ten messages from desirable people were worth more than 100 from less desirable people…sort of like one link from a famous blogger is worth more than ten links from lesser known bloggers (there’s a small sketch of this kind of ranking after the list). This choice made a lot of sense, as “desire” is not just about how many people want something, but also how hard it is to get. However, choosing this definition does have some interesting consequences, which I’ll get to in a minute.
  2. The pool was not randomly selected, and the most desirable people were the outliers. When the AVI initially sent me this article, one of his first comments was that generalizing from a sample of dating website users was probably not a great idea. After looking at the sample, he was completely right. Not only were these dating website users, they were exclusively dating website users in large cities. There were other interesting differences….like check out the demographics table: As a reminder, only about a third of US adults have a college degree. Those numbers for NYC are really unusual. You’ll also note that the average age of a user tended to be just over 30. So where did our highly desirable 18 year old women and 50 year old men fall? On the long tails: Yes, I drew pink and blue arrows to show where the most desirable men and women fell. Sorry about that. Anyway, as you can see, those who showed up as the most desirable were not the best represented. This makes a certain amount of sense….18 year olds don’t join dating sites as often because they are frequently still in high school and have lots of access to people their own age. 50 year olds tend to be married, partnered, or otherwise not looking. This is important because it introduces the idea that those not in the peak age range for use (23-33 from what I can tell) may have some survivor bias going on. In other words, if they log on and are successful, they stay on the site. If they aren’t, they leave. From what I can tell in my friend group, a 30 year old will stick it out on dating sites until they find someone, because that’s simply what everyone does. Other age groups may have different strategies. Since all the data came from one month (January 2014), it would not capture people who came and went quickly.
  3. Desirable men and women probably don’t have the same experience. One of the more interesting discussions in the “network analysis” section of the paper was when the authors mentioned that they had to include two different measures of interest in order to cover both genders. Because men send 80% of the first messages, they realized that assessing “interest” only by first messages would basically mean they only knew who men were interested in. Given this, they decided to also include replies as markers of interest. Thus, while the same equation was applied to both genders, one suspects this plays out differently. Desirable women are likely those who get many messages from men, and desirable men are likely those who get a lot of replies from women. For example, the study authors note that the most popular person they found in their data was a 30 year old woman in NYC who received over 1500 messages (!) in the one month they studied. They don’t list how the most popular male did, but one has to imagine it’s an order of magnitude less than that woman. It’s simply much harder to compose messages than it is to receive them, and with reply rates hovering at 15-20%, one imagines that even extremely popular men may only be hearing back from around 100 women a month. In other words, the experiences of the genders are hard to compare, even when you use the same grading criteria.
  4. Decreasing your outgoing messages would increase your PageRank. Okay, back to the PageRank system. Ever since Google first released their PageRank algorithm, people have been trying to optimize their sites for it. While Google has definitely tweaked their algorithm since releasing it, this study used the original version, which uses the number of links your site makes as a divisor. In other words, the less you link to other sites, the higher your own rank. An example: suppose an 18 year old woman and a 30 year old woman get 100 messages from the exact same group of men. The 18 year old kinda freaks out and only replies to 1 or 2. The 30 year old woman seriously wants to find someone and replies to 20. Per PageRank, the 18 year old is rated more highly than the 30 year old. Now take a 30 year old man and a 50 year old man. The 30 year old man is all in on his dating app game, and messages 100 women, receiving 20 replies. The 50 year old man isn’t quite as sure and carefully selects 10 messages to women he thinks he has a chance with, getting 3 replies. If those replies came from “higher ranking” women than the 20 the other guy got, the 50 year old is now more “highly desirable”. In other words, users who are highly engaged with the dating site and taking chances will not do as well ranking-wise. Being choosy about who you reply to/message helps.
  5. Some of this may be up-front decision making rather than personal. One of the weirder downsides to online dating is the ability to set hard stops on certain characteristics of others. While in pre-computer days you would generally find out someone’s attractiveness first, now you can ask the site only to show you matches that are taller than 6’/older than 25/younger than 40, and the algorithm will do exactly what you say. This almost certainly impacts messaging behavior, and it turns out men and women approach age limits really differently. OKCupid pulled their data on this, and here’s what they found: So our median male keeps 18 year old women in his age range for 5 years of his life (18-23), while our median female will only date 18 year old men for 2 years (18-20). It appears once women get out of college and hop on a dating site they pretty much immediately want to drop college aged men. On the other end, 48 year old men have a preferred age range nearly double the size of the age range 48 year old women set. Men raise their floor as they age, just not nearly as quickly as women do. Both genders appear to raise their ceiling at similar rates, though women always keep theirs a little higher. Thus, younger women will always be receiving messages from a much larger pool of men than older women, particularly since participation in dating sites drops off precipitously with age. A 30 year old woman (the average age) has men 26-46 letting her through their filter, whereas a 30 year old man has women 26-35 letting him through theirs.
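
For the curious, here’s a minimal sketch of PageRank-style scoring applied to a toy message graph. The users and messages are made up, and the paper’s actual formulation (which also folds in replies) differs in its details; this just shows the core idea from points 1 and 4, including the divisor that splits a sender’s influence across everyone they contact.

```python
# Toy directed graph: an edge "A -> B" means user A messaged user B.
messages = {
    "alex":  ["dana", "casey"],
    "blake": ["dana"],
    "casey": ["dana", "alex"],
    "dana":  ["alex"],
}

def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank: attention from a highly ranked user
    counts for more than attention from a low-ranked one."""
    nodes = list(graph)
    rank = {node: 1 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / len(nodes) for node in nodes}
        for sender, recipients in graph.items():
            if recipients:
                # A sender's influence is split across everyone they message.
                share = damping * rank[sender] / len(recipients)
                for recipient in recipients:
                    new_rank[recipient] += share
            else:
                # Users who message no one spread their rank evenly.
                for node in nodes:
                    new_rank[node] += damping * rank[sender] / len(nodes)
        rank = new_rank
    return rank

for user, score in sorted(pagerank(messages).items(), key=lambda kv: -kv[1]):
    print(f"{user}: {score:.3f}")
# "dana" collects messages from the best-connected users and comes out on top.
```

The `len(recipients)` divisor is the piece of the equation point 4 is talking about: every additional outgoing message dilutes how much rank that sender passes along to each individual recipient.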

Well, there you have it: my deep dive into desirability and PageRank as applied to dating! For any of you single folks out there, it’s a good time to remind you that, just like Google results, online dating can actually be hacked to optimize your results, and that the whole thing is not a terribly rational market. Good luck out there!