Funnel Plots 201: Romance, Risk, and Replication

After I wrote my Fun With Funnel Plots post last week, someone pointed me to this Neuroskeptic article from a little over a year ago. It covers a paper called “Romance, Risk and Replication” that sought to replicate “romantic priming” studies, with interesting results… results best shown in funnel plot form! Let’s take a look, shall we?

A little background: I’ve talked about priming studies on this blog before, but for those unfamiliar, here’s how it works: a study participant is shown something that should subconsciously/subtly stimulate certain thoughts. They are then tested on a behavior that appears unrelated, but could potentially be influenced by the thoughts brought on in the first part of the study. In this case, the researchers took a look at what’s called “romantic priming,” which basically involves getting someone to think about meeting someone attractive, then seeing if they do things like (say they would) spend more money or take more risks.

Some ominous foreshadowing: Now for those of you who have been paying attention to the replication crisis, you may remember that priming studies were one of the first things to be called into question. There were a lot of concerns about p-value hacking, and that these studies were falling prey to basically all the hallmarks of bad research. You see where this is going.

What the researchers found: Shanks et al attempted to replicate 43 different studies on romantic priming, all of which had found significant effects. Their replications found nothing. Well, not entirely nothing. They found no significant effects of romantic priming, but they did find something else:

The black dots are the results from the original studies, and the white triangles are the results from the replication attempts. To highlight the differences, they drew two funnel plots. One encompasses the original studies, and shows the concerning “missing piece” pattern in the lower left-hand corner. The other covers the replication studies: since their sample sizes were larger, those results all cluster at the top, but as you can see they spread above and below the zero line. In other words, the replications showed no effect in exactly the way you would expect if there were no effect, and the originals showed an effect in exactly the way you would expect if there were bias.
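If you want to see how that pattern can arise, here’s a quick simulated sketch in Python. To be clear, this is not the Shanks et al data, just made-up numbers under some blunt assumptions: there is no true effect, only the small studies that hit significance in the expected direction get “published”, and every large replication gets reported no matter what it finds.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def simulate_effects(sample_sizes):
    """Observed standardized effects when the true effect is zero."""
    se = 1 / np.sqrt(sample_sizes)     # rough standard error of a standardized effect
    return rng.normal(0.0, se), se     # each study's estimate, plus its SE

# "Original" literature: lots of small studies, but only results that
# cross p < .05 in the expected direction ever see print.
n_orig = rng.integers(20, 120, size=200)
d_orig, se_orig = simulate_effects(n_orig)
published = d_orig > 1.96 * se_orig

# Replications: larger samples, and every result gets reported.
n_rep = np.full(40, 400)
d_rep, _ = simulate_effects(n_rep)

plt.scatter(d_orig[published], n_orig[published], c="k", label="published originals")
plt.scatter(d_rep, n_rep, facecolors="none", edgecolors="k", marker="^",
            label="replications")
plt.axvline(0, linestyle="--")
plt.xlabel("observed effect size")
plt.ylabel("sample size")
plt.legend()
plt.show()
```

The published originals pile up on one side with the lower left of the funnel missing, while the replications cluster at the top and straddle zero, which is the same general picture as the figure.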

To thicken the plot further, the researchers also point out that the original studies’ effect sizes all fall just about on the edge of the funnel plot for the replication results. The red line in the graph shows a trend very close to the side of the funnel, which was drawn at the p=.05 line. Basically, this is pretty good evidence of p-hacking… aka researchers (or journals) selecting results that fell just under the p=.05 cutoff. Ouch.
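For the curious, that p=.05 edge is easy to sketch yourself. Under the simplifying assumption that the standard error of a standardized effect is roughly 1/sqrt(n), a result is just barely significant when it sits at about 1.96 standard errors, so you can check how tightly a set of results hugs that boundary. The study numbers below are hypothetical, purely to show the calculation:

```python
import numpy as np

def p05_boundary(n):
    """Smallest standardized effect that reaches two-tailed p < .05
    for a sample of size n, assuming SE is roughly 1 / sqrt(n)."""
    return 1.96 / np.sqrt(n)

# Hypothetical (made up) study results: (sample size, observed effect)
studies = [(25, 0.41), (30, 0.37), (40, 0.33), (60, 0.26)]

for n, d in studies:
    edge = p05_boundary(n)
    print(f"n={n:3d}  effect={d:.2f}  p=.05 edge={edge:.2f}  margin={d - edge:+.2f}")

# A whole literature whose margins are all a hair above zero is exactly
# the "hugging the side of the funnel" pattern described above.
```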

I liked this example because it shows quite clearly how bias can get in and affect scientific work, and how statistical tools can be used to detect and display what happened. While large numbers of studies should protect against bias, sadly it doesn’t always work that way. 43 studies is a lot, and in this case, it wasn’t enough.

Fun With Funnel Plots

During my recent series on “Why Most Published Research Findings Are False”, we talked a lot about bias and how it affects research. There are two classic ways of overcoming bias in research: 1) do a very large, well-publicized study that definitively addresses the question, or 2) pull together all of the smaller studies that have been done and analyze their collective results. Option #2 is what is referred to as a meta-analysis, because we are basically analyzing a whole bunch of analyses.

Now those of you who are paying attention may wonder how effective that whole meta-analysis thing is. If there’s some sort of bias in either what gets published or all of the studies being done, wouldn’t a study of the studies show the same bias?

Well, yeah, it most certainly would. That’s why there’s a kind of cool visual tool available to people conducting these studies to take a quick look at the potential for bias. It’s called a funnel plot, and it looks exactly as you would expect it to:

Basically you take every study you can find about a topic, and you map the effect size each one found on the x-axis and the size of the study on the y-axis. With random variation, the studies should look like a funnel: studies with small numbers of people/data points will vary a lot more than larger studies, and both will converge on the true effect size. This technique has been used since the 80s, but was popularized by the excitingly titled paper “Bias in Meta-Analysis Can Be Detected by a Simple Graphical Test”. This paper pointed out that if you gather all the studies together and don’t get a funnel shape, you may be looking at some bias. This bias doesn’t have to be on the part of the researchers, by the way… publication bias would cause part of the funnel to go missing as well.
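If you’d like to see the shape for yourself, here’s a minimal simulated sketch in Python (numpy and matplotlib). One caveat: many published funnel plots put standard error or precision on the y-axis rather than raw study size; I’m using study size here just to match the description above.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
true_effect = 0.3                    # the effect every study is trying to estimate

sample_sizes = rng.integers(20, 1000, size=150)
std_errors = 1 / np.sqrt(sample_sizes)          # bigger study, smaller standard error
observed = rng.normal(true_effect, std_errors)  # each study's estimate

plt.scatter(observed, sample_sizes, s=12)
plt.axvline(true_effect, linestyle="--", label="true effect")
plt.xlabel("observed effect size")
plt.ylabel("sample size (study size)")
plt.legend()
plt.show()
# Small studies fan out across the bottom, big studies pinch in at the
# top, and everything is centered on the true effect: a funnel.
```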

The principle behind all this is pretty simple: if what we’re looking at is a true effect size, our experiments will swing a bit around the middle. To use the coin toss analogy, a fair coin tossed 10 times will sometimes come up 3 heads, 7 tails or vice versa, but if you toss it 100 times it will probably be much closer to 50-50. The increased sample size increases the accuracy, but everything should be centered around the same number… the “true” effect size.
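Here’s the coin toss version as a quick simulation, if you want to watch the spread shrink:

```python
import numpy as np

rng = np.random.default_rng(7)

for flips in (10, 100, 1000):
    # Proportion of heads across 10,000 repeated experiments of `flips` tosses each
    heads = rng.binomial(flips, 0.5, size=10_000) / flips
    print(f"{flips:4d} tosses: mean = {heads.mean():.3f}, spread (std) = {heads.std():.3f}")

# The mean stays at about 0.5 every time; only the spread shrinks as the
# sample size grows, which is exactly the narrowing of the funnel.
```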

To give an interesting real-life example, take the gender wage gap. Now most people know (or should know) that the commonly quoted “women earn 77 cents on the dollar” stat is misleading. The best discussion of this I’ve seen is Megan McArdle’s article here, and in it an interesting fact emerges: even controlling for everything possible, no study has found that women outearn men. Even the American Enterprise Institute and the Manhattan Institute put the gap at 94 to 97 cents on the dollar for women. At one point in the AEI article, they opine that such a small gap “may not be significant at all”, but that’s not entirely true. The fact that no one seems to find a small gap going the other direction actually suggests the gap may be real. In other words, if the true gap were zero, roughly half of the studies should show women outearning men. Put in funnel plot terms: if the mid-line is zero, we’re only seeing half the funnel. Now this doesn’t tell us what the right number is or why it’s there, but it is a pretty good indication that the gap is something other than zero. Please note: The McArdle article is from 2014, so if there’s new data that shows women outearning men in a study that controls for hours worked and education level, send it my way.
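To put a rough number on that intuition: if the true gap really were zero and the studies were independent, each one would be about equally likely to land on either side of it, so a run of studies that all land on the same side gets improbable very fast. The study count below is made up just to show the arithmetic:

```python
# If the true gap were zero, each independent study would have roughly a
# 50/50 chance of landing on either side of it. The chance that, say,
# 15 out of 15 studies all land on the same side would then be about:
k = 15                        # hypothetical number of independent studies
p_same_side = 2 * 0.5 ** k    # doubled, since either side would be a surprise
print(f"{p_same_side:.5f}")   # roughly 0.00006
```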

Anyway, the funnel plot is not without its problems. Unfortunately there aren’t a lot of standards around how to use it, and changing the scale of the axes can make it look more or less convincing than it really should be. Additionally, if the number of studies is small, it is not as accurate. Finally, it should be noted that missing part of the funnel is not definitive proof that publication or other bias exists. It could be that those compiling the meta-analysis had a hard time finding all the studies done, or even that the effect size varies based on methodology.

Even with those problems, it’s an interesting tool to at least be aware of, as it is fairly frequently used and is not terribly hard to understand once you know what it is. You’re welcome.