Proving Causality: Who Was Bradford Hill and What Were His Criteria?

Last week I had a lot of fun talking about correlation/causation confusion, and this week I wanted to talk about the flip side: correctly proving causality. While there’s definitely a cost to incorrectly believing that Thing A causes Thing B when it does not, it can also be quite dangerous to NOT believe Thing A causes Thing B when it actually does.

This was the challenge that faced many public health researchers when attempting to establish a link between smoking and lung cancer. With all the doubt around correlation and causation, how do you actually prove your hypothesis? British statistician Austin Bradford Hill was quite concerned with this problem, and he established a set of nine criteria to help prove causal association. While these criteria are primarily used for proving causes of medical conditions, they make a pretty useful framework for assessing correlation/causation claims in general.

Typically these criteria are explained using smoking (here for example), as that’s what they were developed to assess. I’m actually going to use examples from the book The Ghost Map, which documents the cholera outbreak in London in 1854 and the birth of modern epidemiology. A quick recap: A physician named John Snow witnessed the start of the cholera outbreak in the Soho neighborhood of London, and was desperate to figure out how the disease was spreading. The prevailing wisdom at the time was that cholera and other diseases were transmitted by foul-smelling air (miasma theory), but based on his investigation Snow began to believe the problem was actually a contaminated water source. In the era prior to germ theory, the idea of a water-borne illness was a radical one, and Snow had to vigorously document his evidence and defend his case….all while hundreds of people were dying. His investigation and documentation are typically acknowledged as the beginning of the field of formal epidemiology, and it is likely he saved hundreds if not thousands of lives by convincing authorities to remove the handle of the Broad Street pump (the contaminated water source).

With that background, here are the criteria:

  1. Strength of Association: The first criterion is basic. People who do Thing A must have a higher rate of Thing B than those who don’t. This is basically a request for an initial correlation. In the case of cholera, this was where John Snow’s “Ghost Map” came in. He created a visual diagram showing that the outbreak of cholera tracked not just general location, but proximity to one particular water pump. Houses that were right next to each other had dramatically different death rates IF the inhabitants typically used different water pumps. Of those living near that pump, 127 died. Of those living nearer to other pumps, 10 died. That’s one hell of an association.
  2. Temporality: The suspected cause must come before the effect. This one seems obvious, but must be remembered. It’s clear that both water and air are consumed frequently, so either method of transmission passed this criterion. However, if you looked closely, it was clear that bad smells often came after disease and death, not before. On the other hand, there were a lot of open sewer systems in London at the time, so everything probably smelled kinda bad. We’ll call this one a draw.
  3. Consistency: Different locations must show the same effects. This criterion is a big reason why miasma theory (the theory that bad smells caused disease) had taken hold. When disease outbreaks happened, the smells were often unbearable. This appeared to be very consistent across locations and different outbreaks. Given John Snow’s predictions however, it would have been beneficial to see if cholera outbreaks had unusual patterns around water sources, or if changing water sources changed the outbreak trajectory.
  4. Theoretical Plausibility: This one can be tricky to establish, but basically it requires that you can propose a mechanism for the cause. It’s designed to help keep out really out-there ideas about crystals and star alignment and such. Ingesting a substance such as water quite plausibly could cause illness, so this passed. Inhaling air also passed this test, since we now know that many diseases are actually transmitted through airborne germs. Cholera didn’t happen to have this method of transmission, but it wasn’t implausible that it could have. Without germ theory, plausibility was much harder to establish. Plausibility is only as good as current scientific understanding.
  5. Coherence: The coherence requirement looks at whether the proposed cause agrees with other knowledge, especially laboratory findings. John Snow didn’t have those, but he did gain coherence when the pump handle was removed and the outbreak stopped. That showed that the theory was coherent, or that things proceeded the way you would predict they would if he were correct. Conversely, the end of the outbreak caused a lack of coherence for miasma theory…if bad air was the cause, you would not expect changing a water source to have an effect.
  6. Specificity in the causes: The more specific or direct the relationship between Thing A and Thing B, the clearer the causal relationship and the easier it is to prove. Here again, by showing that those drinking the water were getting cholera at very high rates and those not drinking the water were not getting cholera as often, Snow offered a very straightforward cause and effect. If there had been other factors involved….say water drawn at a certain time of day….this link would have been more difficult to establish.
  7. Dose Response Relationship: The more exposure you have to the cause, the more likely you are to have the effect. This one can be tricky. In the event of an infectious disease for example, one exposure may be all it takes to get sick. In the case of John Snow, he actually doubted miasma theory because of this criterion. He had studied men who worked in the sewers, and noted that they must have more exposure to foul air than anyone else. However, they did not seem to get cholera more often than other people. The idea that bad air made you sick, but that lots of bad air didn’t make you more likely to be ill, troubled him. With the water on the other hand, he noted that those using the pump daily became sick immediately.
  8. Experimental Evidence: While direct human experiments are almost never possible or ethical to run, some experimental evidence may be used as support for the theory. Snow didn’t have much to experiment on, and it would have been unethical if he had. However, he did track people who had avoided the pump and noted whether they got sick or not. If he had known of animals that were susceptible to cholera, he could have tested the water by giving one animal “good” water and another animal “bad” water.
  9. Analogy: If you know that something occurs one place, you can reasonably assume it occurs in other places. If Snow had known of other water-borne diseases, one suspects it would have been easier for him to make his case to city officials. This one can obviously bias people at times, but is actually pretty useful. We would never dream of requiring a modern epidemiologist to prove that a new disease could be water-borne….we would all assume it was at least a possibility.
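
For the numerically inclined, the comparison in criterion 1 is usually summarized as a relative risk: the death rate among the exposed divided by the death rate among the unexposed. Here's a minimal Python sketch of that calculation. The function name `relative_risk` is my own, and the group sizes are a made-up assumption (the two death counts are the only real numbers here), so treat the result as an illustration and not as Snow's actual figure.

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Rate of the outcome among the exposed divided by the rate among the unexposed."""
    if n_exposed <= 0 or n_unexposed <= 0:
        raise ValueError("group sizes must be positive")
    if cases_unexposed == 0:
        raise ValueError("relative risk is undefined with zero unexposed cases")
    rate_exposed = cases_exposed / n_exposed
    rate_unexposed = cases_unexposed / n_unexposed
    return rate_exposed / rate_unexposed


# Death counts from the post: 127 near the suspect pump, 10 near other pumps.
# HYPOTHETICAL: both groups are assumed to hold 1,000 people, purely for illustration.
rr = relative_risk(127, 1000, 10, 1000)
print(f"Relative risk under the equal-group-size assumption: {rr:.1f}")
```

When the two groups are the same size, the ratio is just 127/10. With real population counts for each pump's neighborhood the number would shift, which is exactly why death counts alone don't settle the question.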

Even though Snow didn’t have this checklist available to him, he ended up checking most of the boxes anyway. In particular, he proved his theory using strength of association, coherence, consistency and specificity. He also raised questions about the rival theory by pointing to the lack of a dose-response relationship. Ultimately, the experiment of removing the pump handle succeeded in halting the outbreak.

Not bad for a little data visualization.

While some of these criteria have been modified or improved, this is a great fundamental framework for thinking about causal associations. Also, if you’re looking for a good summer read, I would recommend the book I referenced here: The Ghost Map. At the very least it will help you stop making “You Know Nothing John Snow” jokes.

Correlation and Causation: Real World Problems

In yesterday’s post, I got a bit worked up over sloppy reporting on a study on dietary interventions in pregnancy. 

This led to an interesting comment from the Assistant Village Idiot regarding weight gain recommendations for pregnant women. The current weight gain recommendation is 25 to 35 pounds for a woman with a normal BMI, but AVI commented that it used to be much lower, and that women were hospitalized to stop them from eating too much.
I didn’t actually know that, so I immediately decided to look it up.  
I stumbled onto a fascinating presentation put together by an OB at UCSF on the history of maternal weight gain recommendations (link goes to the PowerPoint slides). It not only confirmed what AVI had mentioned, but also gave some of the reasoning….which turned out to be a very interesting example of people erroneously conflating correlation with causation.
Apparently part of the reason why they (they being doctors circa 1930) were so nervous about weight gain in pregnancy was that they were trying to prevent preeclampsia. Now preeclampsia is a life-threatening condition if left untreated, and one of the warning signs is rapid weight gain. Apparently some doctors actually thought that the symptom was the cause, and believed that all excessive weight gain was a sign the patient was about to become preeclamptic. Thus, the theory went, limiting weight gain would prevent preeclampsia and aid in “figure preservation” to boot*.
Sadly, this also led to higher infant mortality, disability, and intellectual disability….which seems a pretty steep price to pay for what was really a data analysis error. As I’ve said before, this is why statistics are so relevant in medicine….the cost of getting things wrong is too steep to not be careful.
*To note, it is actually true that preeclampsia is linked to higher weight/glucose/insulin production….but the way they went about addressing it did as much harm to the fetus as good.  Current weight gain recommendations are set to optimize outcomes for the babies, not the mothers.