Book Recommendation: Bad Blood

Well, my audit went well last week. The inspector called us “the most boring audit he’d ever had”, which quite frankly is what you want to hear from a regulator. Interest = violations = citations = sad BS King.

As someone who has now dealt with quite a few inspectors over the years, I am always interested to see how exactly they choose to go about surveying everything given the time constraints. This particular inspector had an interesting tactic: he ran down the list of regulations we should be following and asked us verbally whether we followed each one. Every tenth one or so, he would suddenly pivot and ask us to provide proof. He mentioned afterwards that he put a lot of weight on how quickly we were able to produce what he asked for. From what I can tell, his theory was that if you can produce proof for random questions easily and without hesitation, you probably prepared for everything fairly well. Not a bad theory. Luckily for me, our preparation strategy had been to read through every standard, then prepare a response for it. Thus, we were boring, and my sanity is restored.

I was thinking about all this as I sat down to relax this weekend and picked up the book “Bad Blood: Secrets and Lies in a Silicon Valley Startup” by John Carreyrou. This book covers the rise and fall of Theranos and its founder Elizabeth Holmes, a topic I’ve mentioned on this blog before. To say I couldn’t put it down is a near literal statement: I started it at 5pm last night and finished it by noon today. The book converges on many of my interests: health, medicine, technology, data, and how very smart people can be deceived into believing something that isn’t true. It also doesn’t hurt that the company’s founder is a woman about my age who was once touted as being the first self-made female billionaire in a field I have actually worked in.

For those unfamiliar with Theranos, I’ll give the short version. Theranos was a company started in 2003 by then 19-year-old Stanford dropout Elizabeth Holmes. Her vision was to create a blood analyzer that could run regular lab tests on just a few drops of blood, so patients could use a finger stick (like with home glucose monitoring) rather than get their blood drawn the conventional way. Ten years in, the company was worth almost $10 billion, but there was an issue: their product didn’t really work the way they claimed, and the company was using extreme tactics to cover this up. Eventually, in a bid to get somebody to pay attention to this, the story was brought to a Wall Street Journal reporter (John Carreyrou, who wrote the book) and he managed to untangle the web. Despite the highlights all being pretty well publicized at the time, I found the details and timeline reconstruction to be a fascinating read.

What interested me most about the book was that my characterization in my blog post 2 years ago was a little bit wrong. I had snarked that Carreyrou was one of the first to question them, but as I read the book I discovered that actually a lot of people had questioned Theranos, even during its prime. It restored my faith in humanity to see how many people had attempted to raise concerns about what they saw. Many of these people were young, with student debt, or marketing people unfamiliar with science, or simply people with ethics who just got uncomfortable, and many of them only stopped pushing when they were on the receiving end of some downright frightening legal (and sometimes not so legal) intimidation tactics. Additionally, many people who were deceived really couldn’t be blamed. In one particularly bizarre anecdote, Carreyrou mentions that a fellow Wall Street Journal reporter had gone to a meeting with Theranos where they had promised to show him how the machine worked. It turns out the machine didn’t work, but they’d written a program to hide any error messages behind a progress screen, and when he left the room they swapped out his sample and ran it on a regular analyzer they had in another room. Not really his fault for not picking up on that. She got her deal with Walgreens by performing a similar sleight of hand. Since the initial WSJ articles, Theranos has paid out millions to settle lawsuits claiming that it intentionally deceived investors, and Holmes and Ramesh Balwani (her #2 guy and former boyfriend) are under indictment.

Throughout the book, Carreyrou returns to two related but slightly different central points:

  1. Holmes and her investors wanted to believe she was the next Steve Jobs or Bill Gates.
  2. Healthcare doesn’t work like other tech sector products. Claiming your technology works before it’s ready could kill someone.

It was interesting for me to reflect that if Holmes hadn’t entered the healthcare realm, she might have actually succeeded. While the biographies of people like Steve Jobs are actually littered with the stories of broken promises, many of the people who flipped on Holmes stated that they were compelled to resign their jobs or talk to reporters because they feared the shoddy work was going to kill someone.

So if this was so obvious, how did Theranos get to $10 billion? And how did they end up with people like Henry Kissinger, George Shultz and James Mattis on the board? A few lessons I gleaned:

  1. Watch out for the narrative, ask for data. One of the few things everyone agrees upon in this story was that Holmes was a compelling CEO. She could spin a strong narrative to anyone who asked, and was kind and easy to work with as long as you let her stick to the story. Throughout the story though, anyone who asked for proof of anything she said was met with responses ranging from frosty to belligerent. This is what initially reminded me of my inspection. We were able to provide proof just as readily as we were able to provide verbal confirmation, which is why our inspector ended up believing us.
  2. Look for real experts. After Carreyrou published his first article about the concerns with the company, Theranos issued quite a few strongly worded denials and legal threats to the Wall Street Journal. Luckily for him, several other media outlets jumped in post-publication and started asking questions. He notes that one of the reasons they were so quick to pounce is that a quick look at Theranos’s board and investors revealed that no one involved really knew anything about biotech. While names like Henry Kissinger are impressive, people quickly started noting that the board was mostly military men and diplomats. The lack of any medical leadership seemed out of place. Additionally, some investing groups that specialize in biotech (like Google Ventures) had passed on Theranos. This was enough to cause other news outlets to turn up the heat on Holmes, as the lack of real experts struck everyone as suspicious.
  3. Look at the history. In an interview he gave, Carreyrou pointed out that it wasn’t the initial investors in Theranos who screwed up, as early investors are often gambling on half-baked ideas. The people who failed their due diligence were those who invested a decade in. He notes that those people should have been pushing harder for financial statements and peer reviewed studies, and that didn’t happen. For Theranos not to have peer reviewed studies in their first year was understandable. To still be lacking them in their tenth year was a very bad sign.
  4. Apply the right standards to the right industry. Healthcare isn’t the same as a cell phone. There are laws and regulatory bodies that can and will shut you down. A 1% product failure rate can kill people. Don’t get so excited by the idea of “disruption” that you ignore reality.

Come to think of it, with a few tweaks these are all pretty good life lessons about how to avoid bad actors in your personal life as well. I really do recommend this book, if only as a counter-narrative to the whole “everyone said we couldn’t do it, but we proved the naysayers wrong!” thing. Sometimes naysayers are right.

Although maybe not forever. As an interesting end note: according to this article, Holmes is currently fundraising in Silicon Valley for another startup.

Death Comes for the Appliance

Our dryer died this week. Or rather, it died last weekend and we got a new one this week. When we realized it was dead (with a full load of wet clothes in it, naturally), the decision making process was pretty simple.

We’re only the third owners of our (early 1950s) house, and the previous owners spent most of the 5 years they had it trying to flip it for a quick buck. We’ve owned it for 6 years now, so any appliance that wasn’t new when we moved in was probably put in by them when they moved in. That made the dryer about 11 years old, and it was a cheap model. I was pretty sure a cheap dryer over a decade old (that had been slowly increasing in drying time for a year or so, unhelped by a thorough cleaning) would be more trouble to repair than it was worth, so we got a new one.

After making the assertion above, I got a little curious if there was any research backing up the life span of various appliances. As long as I can remember I’ve been fairly fascinated by dead or malfunctioning appliances, which I blame on my Yankee heritage. I’ve lived with a lot of half-functioning appliances in my lifetime, so I’ve always been interested in which appliance sounds/malfunctions mean “this is an appliance that will last three more years if you just never use that setting and jerry-rig (yes that’s a phrase) a way to turn it off/on” and which mean “this thing is about to burst into flames, get a new one”.

It turns out there actually is research on the topic, summarized here, and there’s a full publication on the topic here:

So basically it looks like we were on schedule for a cheap dryer to go. Our washing machine was still working, but it was cheaper if we replaced them both at the same time.

This list suggests our dishwasher was weak as it went at about 7 years (they refused to repair it for under the cost of replacement), but our microwave is remarkably strong (10 years and counting). We had to replace our refrigerator earlier than should have been necessary (that was probably the fault of a power surge), but our oven should have a few more years left.

Good to know.

Interestingly, when I mentioned this issue to my brother this weekend, he asked me if I realized what the longest lasting appliance in our family history was. He stumped me until he told me the location….a cabin owned by our extended family. The refrigerator in it has been operational since my mother was a child, and I’m fairly sure it’s this model of Westinghouse that was built in the 1950s, making it rather close to 70 years old:

Wanna see the ad? Here you go!

It’s amusing that it’s advertised as “frost free”, as my strongest childhood memories of this refrigerator involve having to unplug it at the end of the summer season and then put towels all around it until all the ice that had built up in it melted. We’d take chunks out to try to hurry the process along.

Interestingly, the woman in the ad up there was Betty Furness, who ended up with a rather fascinating career that included working for Lyndon Johnson. She was known for her consumer advocacy work, which may be why the products she advertised lasted so darn long, or at least longer than my dryer.

Judging Attractiveness

From time to time, I see this graph pop up on Twitter: 

It’s from this blog post here, and it is almost always used as an example of how picky women are. The original numbers came from a (since deleted) OK Cupid blog post here. From what I can tell they deleted it because the whole “women find 80% of men below average” thing was really upsetting people.

Serious question though….has this finding been replicated in studies where men and women don’t get to pick their own photos?

As anyone who’s looked at Facebook for any length of time knows, photo quality can vary dramatically. For people we know, this is a small thing…”oh so and so looks great in that picture”, “oh poor girl looks horrible in that one”, etc etc. One only needs to walk into a drug store to note that women in particular have a myriad of ways to alter their appearance….makeup, hair products, hair styles, and I’m sure there are other things I am forgetting. Your average young male might use some hair product, but rarely alters anything beyond that.

So basically, women have a variety of ways to improve their own appearance, whereas men have very few. Women are also more rewarded for having a good looking photo on a dating site. From the (deleted) OK Cupid article:

So the most attractive male gets 10x the number of messages as the least attractive male, but the most attractive woman gets 25x the number of messages. A woman of moderate attractiveness has a huge incentive to get the best possible photo of herself up on the site, whereas a similarly placed man doesn’t have the same push. Back when I made a brief foray into dating sites, I noted that certain photos could cause the number of messages in my inbox to triple overnight. With that kind of feedback loop, I think almost every woman would trend toward optimizing her photo pretty quickly. Feedback would be rather key here too, as research suggests we are actually pretty terrible at figuring out what a good photo of ourselves actually looks like.

Side note: as we went over in a previous post, measuring first messages puts guys at a disadvantage from the get go. Men as a group receive far fewer messages from women on these sites. This means their feedback loop is going to be much more subtle than women’s, making it harder for them to figure out what to change.

My point is, I’m not sure we should take this data seriously until we compare it to what happens when all the pictures used are taken under the same conditions. The idea that the genders select their photos differently is a possible confounder.

I did some quick Googling to see if I could find a similar distribution of attractiveness rankings for a general research study, and I did find this one from a Less Wrong post about a study on speed dating: 

They note that men did rate the average woman slightly higher (6.5) than women rated the average man (5.9), but that we see a bell curve in both cases, with the same standard deviation (0.5). At a minimum, I feel this suggests that online perceptions do not translate cleanly into real life. I suspect that’s a statement that can be applied to many fields.

I’d be interested to see any other non-dating site data sets people know about, to see what distribution they follow.

Measuring Compromise

There’s a report that’s been floating around this week called Hidden Tribes: A Study of America’s Polarized Landscape. Based on a survey of about 8,000 people, the aim was to cluster people into different political groups, then figure out what the differences between them were.

There are many interesting things in this report and others have taken those on, but the one thing that piqued my interest was the way they categorized the groups as either “wings” of the party or the “exhausted majority”. Take a look:

It’s rather striking that traditional liberals are considered part of the “exhausted majority” whereas traditional conservatives are considered part of the “wings”.

Reading the report, it seemed they made this categorization because the traditional liberals were more likely to want to compromise and to say that they wanted the country to heal.

I had two thoughts about this:

  1. The poll was conducted in December 2017 and January 2018, so well into the Trump presidency. Is the opinion of the “traditionalist” group on either side swayed by who’s in charge? Were traditional liberals as likely to say they wanted to compromise when Obama was president?
  2. How do you measure desire to compromise anyway?

It was that second question that fascinated me. Compromise seems like one of those things that’s easy to aspire to, but harder to actually do. After all, compromise inherently means giving up something you actually want, which is not something we do naturally. Almost everyone who has ever lived in a household or shared a workplace with others has had to compromise at some point, and a few things quickly become evident:

  1. The more strongly you feel about something, the harder it is to compromise
  2. Many compromises end with at least some unhappiness
  3. Many people put stipulations on their compromising up front…like “I’ll compromise with him once he stops being so unfair”

That last quote is a real thing a coworker said to me last week about another coworker.

Anyway, given our fraught relationship with compromise, I was curious how you’d design a study that would actually test people’s willingness to compromise politically rather than just asking them if it’s generically important. I’m thinking you could design a survey that would give people a list of solutions/resolutions to political issues, then have them rank how acceptable they found each solution (I sketch one possible scoring scheme after the list below). A few things you’d have to pay attention to:

  1. People from both sides of the aisle would have to give input in to possible options/compromises, obviously.
  2. You’d have to pick issues with a clear gradient of solutions. For example, the recent Brett Kavanaugh nomination would not work because there were only two possible outcomes. Topics like “climate change” or “immigration” would probably work well.
  3. The range of possibilities would have to be thought through. As it stands today, most of the ways we currently address issues are already compromises. For example, I know plenty of people who think we have entirely too much regulation on emissions/energy already, and I know people who think we have too little. We’d have to decide if we were compromising based on the far ends of the spectrum or the current state of affairs. At a minimum, I’d think you’d have to include a “here’s where we are today” disclaimer on every question.
  4. You’d have to pick issues with no known legal barrier to implementation. Gun control is a polarizing topic, but the Second Amendment does give a natural barrier to many solutions. I feel like once you get into solutions like “repeal the second amendment” the data could get messy.

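To make that ranking idea a bit more concrete, here’s a toy sketch of one way the scoring could work. To be clear, this is entirely my own invention (the issue, the options, and the scoring rule are all placeholders, nothing from the Hidden Tribes report): give respondents an ordered spectrum of possible solutions for each issue, have them mark every option they could live with, and treat how far their acceptable range stretches beyond their favorite option as a rough “willingness to compromise” score.

```python
# A toy sketch of one way to score "willingness to compromise" (my invention,
# not from the Hidden Tribes report). Each issue gets an ordered spectrum of
# hypothetical solutions; a respondent marks every option they could accept.
# The score is how much of the spectrum beyond their favorite they'd tolerate.

from dataclasses import dataclass

@dataclass
class Response:
    issue: str
    n_options: int          # options are ordered 0..n_options-1 along the spectrum
    favorite: int           # index of the respondent's preferred option
    acceptable: set[int]    # indices the respondent says they could live with

def compromise_score(r: Response) -> float:
    """Fraction of the non-preferred options the respondent would accept."""
    others = r.acceptable - {r.favorite}
    return len(others) / (r.n_options - 1)

# Hypothetical respondent on a made-up 5-point immigration spectrum
example = Response(issue="immigration", n_options=5, favorite=1, acceptable={0, 1, 2})
print(compromise_score(example))  # 0.5 -> accepts half of the non-preferred options
```

You’d obviously want to weight this by how strongly people feel about each issue (per point #1 above, compromise is harder on the things you care about most), but even a crude score like this would let you compare stated willingness to compromise against something closer to revealed willingness.
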
As I pondered this further, it occurred to me that the wings of the parties may actually be the most useful people in writing a survey like this. Since most “wing” type folks actually pride themselves on being unwilling to compromise, they’d probably be pretty clear sighted about what the possible compromises were and how to rank them.

Anyway, I think it would be an interesting survey, and not because I’m trying to disprove the original survey’s data. In the current political climate we’re so often encouraged to pick a binary stance (for this, against that) that considering what range of options we’d be willing to accept might be an interesting framing for political discussions. We may even wind up with new political orientations called “flexible liberals/conservatives”. Or maybe I just want a good excuse to write a fun survey.

Media Coverage vs Actual Incidence

The month of October is a tough one for me schedule-wise, so I’m probably going to be posting a lot of short takes on random things I see. This study popped up on my Twitter feed this week and seemed pretty relevant to many of the themes of this blog: “Mediatization and the Disproportionate Attention to Negative News”.

This study took a look at airplane crashes, and tracked the amount of media attention they got over the years. I’ll note right up front that they were tracking Dutch media attention, so we should be careful generalizing to the US or other countries. The authors of the study decided to track the actual rate of airplane crashes over about 25 years, along with the number of newspaper articles dedicated to covering those crashes as a percentage of all newspaper articles published.

The whole paper is interesting, but the key graph is this one:

Now the authors fully admit that the MH17 airplane crash in 2014 (a plane brought down by a missile with mostly Dutch passengers aboard) does account for that big spike at the end, but it appears the trend still holds even if you leave that out.

It’s an interesting data set, because it puts some numbers behind the idea that things are not always covered in the media in proportion to their actual occurrence. I think we all sort of know this intuitively in general, but it seems hard to remember when it comes to specific issues.

Even more interesting is that the authors did some analysis on exactly what these articles covered, to see if they could get some hints as to why the coverage has increased. They took 3 “eras” of reporting, and categorized the framing of the articles about the plane crashes. Here were their results:

Now again, the MH17 incident (with all its international relations implications) is heavily skewing that last group, but it’s interesting to see the changes anyway. The authors note that the framing almost definitely trends from more neutral to more negative. This supports their initial thesis that there is some “mediatization” going on. They define mediatization as “a long-term process through which the importance of the media and their spillover effects on society has increased” and theorize that “Under the conditions of mediatization, certain facets have become more prominent in media coverage, such as a focus on negativity, conflicts, and human-interest exemplars”. They blame this tendency on the fact that “the decreasing press–party parallelism and media’s growing commercial orientation has strengthened the motives and effort to gain the largest possible audience media can get”.

As a result of this, the authors show that within the month after a plane crash is reported by the media, fewer people board planes. They don’t say if this effect has lessened or increased over time, but regardless, the media coverage does appear to make a difference. Interestingly, they found that airline safety was not related (time-series wise) to press coverage. Airlines were not more or less safe the month after a major crash than they were the month before, suggesting that crashes really aren’t taking place due to routine human error any more.

Overall, this was a pretty interesting study, and I’d be interested to see it repeated with newer media such as blogs or Twitter. It’s harder to get hard numbers on those types of things, but as their effect is felt more and more it would be interesting to quantify how they feed into this cycle.

Take Your Best Guess

The AVI passed on an interesting post about a new study that replicates the finding that many psychological studies don’t replicate. Using 21 fairly randomly selected studies (chosen specifically to avoid being too sensational…these were supposed to be run of the mill), replication efforts showed that about 60% of studies held up while almost 40% could not be replicated.

This is a good and interesting finding, but what’s even more interesting is that they allowed people to place bets ahead of time on exactly which studies they thought would fail and which ones would bear out. Some of the people were other psych researchers, and some were placing bets for money. Turns out that everyone was pretty darn good at guessing which findings would replicate:

Consistently, studies that failed to replicate had fewer people guessing they would replicate. In fact, most people were able to guess correctly on at least 17 or 18 out of the 21.

Want to try your hand at it? The 80,000 hours blog put together a quiz so you can do just that! It gives you the overview of the study finding with an option to read more about exactly what they found. Since I’m packing up for a work trip this week, I decided not to read any details and just go with my knee-jerk guess from the description. I got 18 out of 21:

I encourage you to try it out!
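As a quick sanity check on how far a score like that is from blind guessing (my own back-of-envelope arithmetic, nothing from the study): if you just flipped a coin on each of the 21 studies, the chance of getting 18 or more right is well under a tenth of a percent. A fairer baseline would be always guessing “replicates”, but since only about 60% actually held up, that only gets you to roughly 13 of the 21.

```python
# Back-of-envelope check (mine, not the study's): how likely is 18+ correct
# out of 21 if you were just guessing at 50/50 on each study?
from math import comb

n, p = 21, 0.5
p_at_least_18 = sum(comb(n, k) for k in range(18, n + 1)) * p**n
print(f"P(18 or more correct by pure chance) = {p_at_least_18:.5f}")  # ~0.00074
```
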

Anyway, this is an interesting finding because quite often when studies fail to replicate, there are outcries of “methodological terrorism” or that the replication efforts “weren’t fair”. As the Put A Number on It blog post points out though, if people can pretty accurately predict which studies are going to fail to replicate, then those complaints are much less valid.

Going forward, I think this would be an interesting addendum to all replication effort studies. It would be an interesting follow-up to particularly focus on the studies that were borderline….those that just over 50% of people thought might replicate, but that didn’t end up replicating. It seems like those studies might have the best case for tweaking the methodology and trying again.

Now go take the quiz, and share your score if you do! The only complaint I had was that the results don’t specifically tell you (I should have written it down) if you were more likely to say a study would replicate when it didn’t or vice versa. It would be an interesting personal data point to know if you’re more prone to Type 1 or Type 2 errors.
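If you do write your answers down, here’s one way you could tally it yourself. This is just a sketch with made-up guesses (it has nothing to do with how the 80,000 Hours quiz actually records anything): count how often you said “replicates” when it didn’t (roughly a Type 1 error, believing an effect that isn’t there) versus how often you said “won’t replicate” when it did (roughly Type 2).

```python
# Tally error types from a written-down quiz run. The guesses and outcomes
# below are made up for illustration; True means "replicates".
my_guesses          = [True, True, False, True, False, True, True]    # hypothetical
actually_replicated = [True, False, False, True, True, True, False]   # hypothetical

false_positives = sum(g and not a for g, a in zip(my_guesses, actually_replicated))
false_negatives = sum(a and not g for g, a in zip(my_guesses, actually_replicated))

print(f"Said it would replicate but it didn't (Type 1-ish): {false_positives}")
print(f"Said it wouldn't replicate but it did (Type 2-ish): {false_negatives}")
```
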


Tornadoes in the Middle of America

I was talking to my son (age 6) a few days ago, and was surprised to hear him suddenly state “Mama, I NEVER want to go to the middle of America”. Worried that I had somehow already managed to inadvertently make him into one of the coastal elite, I had to immediately ask “um, what makes you say that?”. “The middle of America is where tornadoes are, and I don’t want to be near a tornado”, he replied. Oh. Okay then.

Apparently one of his friends at school had started telling him all about tornadoes, and he wanted to know more. Where were most of the tornadoes? Where was the middle of America anyway? And (since I’m headed to Nebraska in a week), what state had the most tornadoes?

We decided to look it up, and the first thing I found on Google image search was this map from US Tornadoes:

Source here. I was surprised to see the highest concentration was in the Alabama/Mississippi area, but then I realized this was tornado warnings, not tornadoes themselves. The post that accompanies the map suggests that the high number of tornado warnings in the Mississippi area is because they have a much longer tornado season there than the Kansas/Oklahoma region that we (or at least I) normally think of as the hotbed for tornadoes.

Areas impacted by tornadoes vary a bit depending on what you’re counting, but this insurance company had a pretty good map of impacted areas here:

Measuring can vary a bit for two reasons: what you count as a tornado, and how you calculate frequency. The National Oceanic and Atmospheric Administration puts out a few different types of numbers: average number of tornadoes, average number of strong to violent tornadoes, tornadoes by state, and tornado average per 10,000 square miles. Those last two are basically to help account for states like Texas, which gets hit with more tornadoes than any other state (an average of 155 per year between 1991 and 2010), but mostly because it’s so big. If you correct that to look at a rate per 10,000 square miles, it dips to 5.9….well below Florida (12.2) and Kansas (11.7).
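If you’re curious how that “per 10,000 square miles” correction works, it’s just the annual average divided by the state’s area in units of 10,000 square miles. Here’s a minimal back-of-envelope sketch of my own (the Texas average and the rates come from the NOAA figures above; the land areas are my approximations, and the Florida and Kansas annual counts are back-calculated from the published rates):

```python
# A quick check of the "tornadoes per 10,000 square miles" normalization.
# Texas's annual average and the 5.9/12.2/11.7 rates come from the NOAA
# figures cited above; land areas are my own approximations, and the Florida
# and Kansas annual counts are back-calculated from the published rates.

LAND_AREA_SQ_MI = {
    "Texas": 261_232,
    "Florida": 53_625,
    "Kansas": 81_759,
}

ANNUAL_TORNADOES = {
    "Texas": 155,   # NOAA average per year, 1991-2010
    "Florida": 66,  # assumption: roughly 12.2 per 10,000 sq mi * area
    "Kansas": 96,   # assumption: roughly 11.7 per 10,000 sq mi * area
}

for state, count in ANNUAL_TORNADOES.items():
    rate = count / (LAND_AREA_SQ_MI[state] / 10_000)
    print(f"{state}: {count}/yr over {LAND_AREA_SQ_MI[state]:,} sq mi "
          f"= {rate:.1f} per 10,000 sq mi")
# Texas ~5.9, Florida ~12.3, Kansas ~11.7 -- which lines up with the NOAA rates
```
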

Florida coming in ahead of Kansas surprised me, but this is where strength of tornadoes comes in. Apparently Florida has lots of weak tornadoes. Looking at only strong to violent tornadoes, we get this:

The NOAA also breaks down risk by month, so I decided to take a look and see what the risk in Nebraska was for September:

I think I can reassure the kiddo that mommy is going to be just fine. Apparently if you want to go to the middle of America but avoid tornadoes, fall is a pretty good bet.

Of course after we got the numbers down, we went to YouTube and started watching storm chaser videos. While he thought those were fascinating, he did have a reassuring number of questions along the lines of “mama, why did the people in the video see the tornado but not run away?”. Good impulse, kid. Also, continuing his mother’s habit of rampant anthropomorphizing, he informed me that this video made him “very sad for the trees” (see the 35-40 second mark):