After writing my Fun With Funnel Plots post last week, someone pointed me to this Neuroskeptic article from a little over a year ago. It covers a paper called “Romance, Risk and Replication” that sought to replicate “romantic priming studies”, with interesting results….results best shown in funnel plot form! Let’s take a look, shall we?
A little background: I’ve talked about priming studies on this blog before, but for those unfamiliar, here’s how it works: a study participant is shown something that should subconsciously/subtly stimulate certain thoughts. They are then tested on a behavior that appears unrelated, but could potentially be influenced by the thoughts brought on in the first part of the study. In this study, researchers took a look at what’s called “romantic priming” which basically involves getting someone to think about meeting someone attractive, then seeing if they do things like (say they would) spend more money or take more risks.
Some ominous foreshadowing: Now for those of you who have been paying attention to the replication crisis, you may remember that priming studies were one of the first things to be called in to question. There were a lot of concerns about p-value hacking, and concerns that they were falling prey to basically all the hallmarks of bad research. You see where this is going.
What the researchers found: Shanks et al attempted to replicate 43 different studies on romantic priming, all of which had found significant effects. When they attempted to replicate these studies, they found nothing. Well, not entirely nothing. They found no significant effects of romantic priming, but they did find something else:
The black dots are the results from original studies, and the white triangles are the results from the replication attempts. To highlight the differences, they drew two funnel plots. One encompasses the original studies, and shows the concerning “missing piece” pattern in the lower left hand corner. Since they had replication studies, they funnel plotted those as well. Since the sample sizes were larger, they all cluster at the top, but as you can see they spread above and below the zero line. In other words, the replications showed no effect in exactly the way you would expect if there were no effect, and the originals showed an effect in exactly the way you would expect if there were bias.
To thicken the plot further, the researchers also point out that the original studies effect sizes actually all fall just about on the line of the funnel plot for the replication results. The red line in the graph shows a trend very close to the side of the funnel, which was drawn at the p=.05 line. Basically, this is pretty good evidence of p-hacking…aka researchers (or journals) selecting results that fell right under the p=.05 cut off. Ouch.
I liked this example because it shows quite clearly how bias can get in and effect scientific work, and how statistical tools can be used to detect and display what happened. While large numbers of studies should protect against bias, sadly it doesn’t always work that way. 43 studies is a lot, and in this case, it wasn’t enough.