5 Things About Extroverts

Last week I gave a rundown of all the interesting stuff I found out about introverts, so naturally this week is going to be about extroverts. Since extroverts are the opposite of introverts, much of what I said last week still applies (or applies in reverse): extroverts tend to need more stimulation from their environment. While this is often phrased as “they get their energy from people”, that’s not entirely true. Being extroverted does not mean social interaction trumps sleep, food, or water, or that you can’t get sick of people (all things I’ve heard people claim). So what is true of extroverts? Let’s take a look:

  1. I’ve been spelling “extrovert” wrong, and apparently Jung would be annoyed Before I wrote my post last week, I tried to look up “extrovert vs extravert” to see which was the correct spelling. It turns out that the debate about this runs a little deeper than I thought. While my spellchecker insists that “extrovert” is correct, Carl Jung (the guy who invented the whole concept) felt strongly it should be “extravert”. This was based on the Latin root and the actual definition he was going for. I’m going to stick with the one that makes my spell checker calmer, but it’s worth noting that we probably should be using “extravert”.
  2. There may be two types of extroverts Just like introversion, it turns out extroversion may not be a monolith. The two types (agentic and affiliative) are described here, but basically they boil down to “social leadership” and “social warmth”. The first one has a lot to do with going after rewards, and the second one just wants to hang out with everyone. They are correlated, but some people have more of one than the other. Think the person who wants to be in charge of every group vs the person who just wants to be in every group.
  3. The success of extroverts is kinda bimodal Despite all the rumors that being an extrovert is some sort of cultural ideal, it turns out it’s actually kind of a mixed bag. For example, if you go to Urban Dictionary and type in “introvert”, you get a thoughtful description of what an introvert is. Try the same with the word “extrovert” and you get “asshole who doesn’t know how to shut their goddamn mouth”. I’m serious. In fact 5 out of the top 7 definitions of extroverts slam extroverts. Interestingly, 5 of the top 7 definitions of introverts ALSO slam extroverts. If the chronic complaint from introverts is that their strengths go unnoticed, then the equivalent extrovert complaint would probably be that their faults get a little too noticed. This makes a lot of sense….having attention on you is great if you’re good at something, but probably worse for you if you’re bad at something. Interestingly, this plays out with things like leadership. Leaders are more likely to be extroverts, but if you control for social skills there actually isn’t an extrovert advantage.
  4. Some extrovert “benefits” are just circular reasoning Okay, so here’s the extrovert bias introverts so often complain about. Many of the supposed benefits of being an extrovert come not from actual benefits, but by using some of the definition for extroversion as a definition for other things. For example, for years it was noted that extroverts were happier than introverts. Then it was finally noted that many of the tests that measured happiness did so by asking things like “do you have a lot of friends?”, which is also a question used to determine if you’re an extrovert or not.  This works in the negative direction too. When you hear criticisms of extroverts, it’s often things like “they hog the spotlight” (random example here). However “do you like to be the center of attention” is a pretty frequently used question on personality tests, and it makes complete sense that people who say “yes” to that would end up spending more time as the center of attention than those that say “no”. I think this is important because sometimes I hear this get referenced as though personality tests were objective neurological tests, but they are really all rating and self assessment. The same answers that landed you in one category or another tend to persist even when you’re not taking the test.
  5. Test taking is often biased against them So if personality and psychological tests favor extroverts, then extroverts must really love test taking, right? Well, not all tests. It turns out that our most common testing environments (i.e. quiet rooms with no ambient noise) actually are biased against extroverts. Because of their need for stimulation, some research has found that extroverts actually perform better on tests when there is noise present. Unsurprisingly, introverts are the opposite, and ambiverts are in the middle. In school settings this is an obvious disadvantage, but in real life it may explain why some professions end up extrovert dominated. In many settings, you actually will have to make your toughest calls while there is a lot of noise and chaos around you. By the way, there’s a rather persistent rumor (normally stated in the form of “introverts think more deeply”) that extroverts are less intelligent than introverts. Actually the most recent research says extroverts have a tiny advantage here, but the correlation on that is pretty shaky, and depends heavily on exactly how intelligence is measured. There’s some suggestion that people with very high IQs (>160) may lean introvert, but that’s a really small slice of the population and wouldn’t be enough to move the dial.

So there you have it! Next week I may try to take on ambiverts, who can’t make up their mind about anything.

The Power of Denominators: Planned Parenthood Edition

Content note: Big contentious political issues ahead. Proceed with care. As with most of my posts, the intent here is not to take a stance on a political issue, but rather to discuss the ways numbers are used to talk about them.

Last week I got tagged in a rather interesting Facebook discussion about abortion and Planned Parenthood. It centered around this video from the group LiveAction, which focused on debunking the “abortion is only 3% of what Planned Parenthood does” claim.

What stuck out to me about this video (and the associated Slate and Washington Post articles it referenced) is that despite the contentious issue being addressed, this is fundamentally a debate about denominators. No one seems to question the numerator here….Planned Parenthood readily states that they performed 323,999 abortions in fiscal year 2014-2015. What’s up for debate is what you divide that by to get an accurate picture of their business, and what questions those denominator choices answer.  There are a couple of options here:

  1. Number of billed procedures or “discrete clinical interactions” Every year, Planned Parenthood provides 10.6 million different types of services in its clinics. This is the denominator used to get the 3% figure. As the video above (and the Slate and Washington Post articles) point out, a pregnancy test, abortion, STI screening and follow up contraception prescription would count as 4 separate line items, despite not being even remotely equal in time, cost, or overall impact. What this number does answer is “what does Planned Parenthood do other than abortion?”.
  2. Pregnancy services provided The Washington Post article that investigated the 3% claim also investigated the claim by the Susan B Anthony foundation that 94% of “pregnancy services provided by Planned Parenthood” were abortions. To get this number, they took the number of services offered exclusively to pregnant women: abortions, prenatal services and adoption referrals. Those last two categories total a little over 20,000/year, so you end up with a denominator of 344,000 or so. This gets you to 94%. This number answers the question “what does Planned Parenthood do exclusively for women who present at the clinic already pregnant?”. I keep repeating exclusively because there’s no way of separating out pregnancy tests or STI screenings for pregnant vs non-pregnant women.
  3. Amount of revenue Another way of calculating the percent of a business is calculating the percent of revenue derived from that one service. The Washington Post attempts to crunch these numbers based on published rates, and comes up with something in the 15-37% range. Since Planned Parenthood does not actually publish this data, there are a lot of assumptions built in. Essentially though, this is the number of procedures times the approximate cost per procedure divided by total PP revenues. The approximations are difficult to make mostly because costs vary and Planned Parenthood tends to have a sliding scale for those who can’t afford the full cost. This number is probably closer to what most people think of as “percent of business”.
  4. Number of abortions in the country I’ll come back to this one later, but The Blaze article notes that if you use the denominator of “total abortions performed in the USA” you find that Planned Parenthood performs a little over 30% of abortions. This answers the question “what percentage of abortions are actually performed at Planned Parenthood”.
  5. Number of patients In the LiveAction video, it is noted that Planned Parenthood saw about 2.7 million patients. This means about 1 out of every 8 patients seen by Planned Parenthood in a year got an abortion in that year. This is a stat to be careful with because people can have multiple visits, so this does not answer the question “what are the chances a person walking in to a Planned Parenthood clinic is there to have an abortion”, but rather “what percent of all patients had an abortion in a given year”. It should be noted that the assumption here is that no one got more than one abortion in a year. That is probably mostly, but not entirely, true.
  6. Number of total clinic visits Finally we get to the number of overall visits. This number is given at 4.6 million, and for my money is probably the most accurate representation of “what percent of their business is abortion”. This comes out to about 7% of visits per year, but if you count follow up visits (which may or may not occur), it could be up to 14%. This answers the question “what are the chances that a person walking in to Planned Parenthood is there to have an abortion”.
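
To make the denominator point concrete, here is a minimal Python sketch that divides the same numerator by several of the denominators listed above. The figures are the rounded ones quoted in this post (which mixes 2013 and 2014 data), the labels are just illustrative names, and the “total abortions in the USA” option is left out because only the resulting percentage is quoted, not the underlying count.

```python
# Same numerator, different denominators. All figures are the rounded
# numbers quoted in the list above; the labels are illustrative only.
ABORTIONS = 323_999

denominators = {
    "services (discrete clinical interactions)": 10_600_000,
    "pregnancy services": 344_000,
    "patients": 2_700_000,
    "clinic visits": 4_600_000,
}

# Percentage of each denominator that the numerator represents.
shares = {
    label: 100 * ABORTIONS / count for label, count in denominators.items()
}

for label, pct in shares.items():
    print(f"{label:45s} {pct:5.1f}%")
```

Nothing about the numerator changes between rows; only the question being answered does.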

Some quick notes on this data: all of this was from other sources, I didn’t crunch any numbers myself. Since the original Blaze article didn’t quibble with any of Planned Parenthood’s published data, I took it as is. I also switched back and forth a few times between the 2013 data and 2014 data, so some numbers may be slightly off.

So overall, what do I think? Well, as you can see, denominators matter. For a less contentious issue, parsing this data would be purely a matter of intellectual debate, and no one would really care that much. When it comes to something like abortion however, the stakes are raised. Changing the denominator you use is inherently a political statement, as you change the ability of your data to answer a particular question.

Interestingly, I don’t think any of this data answers the real question. To me, the crux of the issue is something along the lines of “why is Planned Parenthood so important”? This is not answered by any of the above data. While they certainly perform a lot of abortions, they don’t perform the majority of them. So why all the focus on their business model?

Basically I think it comes down to political organization. I couldn’t find good data on where the other 2/3rds of abortions are performed, but my guess is they are probably independent doctors or clinics that have nowhere near the organizational or advocacy power of Planned Parenthood. Even if Planned Parenthood doesn’t perform those abortions, I think both sides probably agree they make it easier for the groups that do the procedures to continue their practices. By drawing the political fire and filing the lawsuit challenges themselves, Planned Parenthood ends up with an impact that is felt by everyone but would be nearly impossible to quantify in numbers. Additionally, many Planned Parenthood clinics are intentionally built in areas without easy access to other similar services. How much of this business would be picked up by other doctors/clinics/hospitals if Planned Parenthood closed is debatable. Whether or not that’s a good thing depends almost entirely on your pre-existing political beliefs.

As much as I love numbers, it’s important to remember the limits of data. Any time someone rattles off a statistic, a helpful first question is “does that answer the question we’re really asking?”. Not all important issues can be quantified, and not all statistics hit the heart of the issue. Most important, very few people have ever (or should ever) change a profound moral conviction because of a denominator choice. In the immortal words of Andrew Lang: “try not to use statistics as a drunken man uses lamp-posts, for support rather than for illumination”.

The Signal and the Noise: Chapter 12

This chapter had some great stuff about climate change and models of climate change. One of the more interesting parts (to me) was the review of the motivations of various countries when it comes to climate change treaties. Not every country arrives at the table on the same page, no matter what their leaders believe.


5 Things About Introverts

I am fascinated by personality testing. Myers-Briggs, Big 5, Enneagram, Buzzfeed quiz, yes please. I’ll take them. There’s something about assigning humanity to little boxes that just, I don’t know, appeals to me. Maybe that’s the ENTJ in me, or my moderate conscientiousness, or the fact that according to this quiz I’m a sea monster. Given this, I realized it was high time I did a bit of a research roundup on some of the better known facets of personality testing. This week I’m taking on introverts, and if all goes well next week will be extroverts.

A few things up front: first, on introvert/extrovert scales, I score right in the middle. This makes me one of the dreaded “ambiverts” who apparently can’t make up their minds. Second, while the definition of introvert is sometimes a little lacking (see point #1 below) it’s generally defined as someone who gets their energy from being alone. With the rise of the internet, introverts started kind of having a moment, and there’s been a rash of books/memes/Buzzfeed lists about how unappreciated they all are. So what’s going on here, and what does the research say?

  1. Introversion doesn’t always have a definition One of the first rather odd things I learned about introverts is that the most commonly used academic  definition is….”not an extrovert”. For example, in the Big Five Personality scale “introversion” is not technically a trait but “low extraversion” is. This may not seem like a big deal, but it can mean that we are lumping different things under “introvert” that may not necessarily be similar to one another. As introversion has become more trendy, I have seen more and more people lump normal social or physical limitations under “introversion”. For example, a rather extroverted friend of mine recently announced she was pretty sure she was actually an introvert. When asked why she thought this, she mentioned that she had been out 3 different nights the week before and that by the weekend she had been too exhausted to go to another party. When I inquired if maybe this was simply lack of sleep, she responded “but extroverts get their energy from people, so I should have been fine!”.  No. People get their energy from rest. Almost no one can substitute human interaction for sleep too often and feel good about it. Wanting to sleep isn’t “introverted” merely because you’re not socializing while you do it.
  2. There may be 4 types of introversion  When psychologists started actually looking in to introversion, they developed a theory that there may actually be 4 types of behavior we’ve been lumping under “introvert”: social introversion, thinking introversion, anxious introversion and restrained introversion. This was a helpful list for me, as I am moderately socially introverted (I prefer small groups), highly introverted in my thinking, but I have very little social anxiety and I’m not very restrained. Thus it makes sense that I strongly resonate with some descriptions of introversion and not others. The social anxiety piece can also be important to recognize as a separate category. I have a few friends who thought they were introverted when they were in high school only to discover that they really just didn’t like their classmates. While most introverts fight the perception that introversion = shyness, it’s probably good to note that most shy or socially anxious people will end up self selecting as introverts.
  3. Stimulation matters The four categories mentioned in #2 are still in the research phase, but there are other ways of looking at introversion as well. Some of the very first literature on introversion (and extroversion) actually defined it as an aversion to (or need for) extra environmental stimulation. I like this framing a bit better than the social framing, because it includes things like loud noises or fast music or why coffee only helps extroverts (basically it increases your sensitivity to stimulation, which is the last thing most introverts need when they’re trying to get things done). This explains why I’ve occasionally had introverted coworkers complain that I talked too much, even when I was studiously avoiding talking to them, or why an introvert I mentioned this to always has to tell her (extrovert) husband to shut the TV off. Social situations may not be taxing because of social issues, but rather just the stimulation of hearing so many people talk.
  4. This can lead to some judginess With all the recent attention on introverts in the workplace, it’s interesting to note that there’s some evidence that  introverts actually judge extroverts more harshly than the other way around. In some studies done by Florida State University, they found introverted MBA students were more likely to give low marks to extroverted students, recommended they get lower bonuses, and declined to recommend them for promotions. This was true even when they manufactured the scenarios and controlled for performance. The extroverts in the study awarded bonuses/promotions/high marks much more in line with actual performance on the tasks and did not penalize introverts. The researchers hypothesize that due to the stimulation issue (#3) introverts may just have a harder time working with extroverts regardless of their competence. I also have to wonder if there’s a bit of the Tim Tebow Fallacy going on here….with all the press about how extroverts do better in business, many introverts (especially in MBA programs, as these research subjects were) could feel that by marking extroverts down they are balancing the scales a bit. We don’t know how this works in the general population, but it is worth keeping in mind.
  5. Introverts may (wrongly) think they’re the minority  There’s a bit of confusion over what percentage of the population is introverted….which is not particularly surprising when you consider the weird definitions we considered in #1-#3. At this point though, most estimates put it at about 50% (unless you consider “ambivert” a category). So why do introverts tend to feel outnumbered?  Well, it’s a statistical quirk called the majority illusion. Basically, because extroverts are more likely to have lots of friends, people are more likely to be friends with lots of extroverts. This artificially skews the perception of the numbers, and leaves people with the impression that they know more extroverts because there are more extroverts. So introverts, take heart. There are more of you out there than you think.
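
The majority illusion in #5 is easier to see with a toy example. The sketch below is purely hypothetical: six made-up people, half extroverts and half introverts, where every extrovert is friends with everyone and the introverts share only one friendship among themselves. It is not data from any study, just an illustration of how an even split can look lopsided from inside someone’s friend list.

```python
from itertools import combinations

# Hypothetical population: a 50/50 split, matching the rough estimate above.
extroverts = {"E1", "E2", "E3"}
introverts = {"I1", "I2", "I3"}
people = extroverts | introverts

# Assumed friendships: every extrovert is friends with everyone,
# and the only introvert-introvert friendship is I1-I2.
friendships = set()
for a, b in combinations(sorted(people), 2):
    if a in extroverts or b in extroverts:
        friendships.add((a, b))
friendships.add(("I1", "I2"))

# Build each person's friend list from the friendship pairs.
friends = {p: set() for p in people}
for a, b in friendships:
    friends[a].add(b)
    friends[b].add(a)


def extrovert_share(person):
    """Fraction of this person's friends who are extroverts."""
    circle = friends[person]
    return len(circle & extroverts) / len(circle)


true_share = len(extroverts) / len(people)
introvert_view = sum(extrovert_share(p) for p in introverts) / len(introverts)

print(f"Actual share of extroverts:          {true_share:.0%}")
print(f"Share in a typical introvert's list: {introvert_view:.0%}")
```

In this made-up network the population is split evenly, yet the average introvert’s friend list is mostly extroverts, simply because the extroverts show up in more friend lists.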

Come back next week and we’ll take a look at extroverts!

On Outliers, Black Swans, and Statistical Anomalies

Happy Sunday! Let’s talk about outliers!

Outliers have been coming up a lot for me recently, so I wanted to put together a few of my thoughts on how we treat them in research. In the most technical sense, outliers are normally defined as any data point that is far outside the expected range for a value. Many computer programs (including Minitab and R) automatically flag a point as an outlier if it lies more than 1.5 times the interquartile range below the first quartile or above the third quartile. Basically any time you look at a data set and say “one of these things is not like the others” you’re probably talking about an outlier.
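
For the curious, the 1.5 × IQR rule can be sketched in a few lines of Python. This is a minimal illustration, not what Minitab or R literally run: the function name and the sample numbers are made up, and different programs compute quartiles slightly differently, so the exact cutoffs can vary a bit from tool to tool.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values more than k * IQR below Q1 or above Q3."""
    # "inclusive" treats the data as the whole population when
    # interpolating quartiles; other conventions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical lengths of stay in days, invented for illustration.
stays = [12, 14, 15, 15, 16, 18, 19, 21, 22, 95]
print(iqr_outliers(stays))
```

With these invented numbers the only flagged value is the 95-day stay, which is exactly the kind of point that would get a second look.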

So how do we handle these? And how should we handle these? Here’s a couple things to consider:

  1. Extreme values are the first thing to go When you’re reviewing a data set and can’t review every value, almost everyone I know starts by looking at the most extreme values. For example, I have a data set I pull occasionally that tells me how long people stayed in the hospital after their transplants. I don’t scrutinize every number, but I do scrutinize every number higher than 60. While occasionally patients stay in the hospital that long, it’s actually equally likely that some sort of data error is occurring. Same thing for any value under 10 days….that’s not really even enough time to get a transplant done. So basically if a typo or import error led to a reasonable value, I probably wouldn’t catch it. Overly high or low values pretty much always lead to more scrutiny.
  2. Is the data plausible? So how do we determine whether an outlier can be discarded? Well, the first step is to assess if the data point could potentially happen. Sometimes there are typos, data errors, someone flat out misread the question, or someone’s just being obnoxious. An interesting example of implausible data points possibly influencing study results was in Mark Regnerus’ controversial gay parenting study. A few years after the study was released, his initial data set was re-analyzed and it was discovered that he had included at least 9 clear outliers….including one guy who reported he was 8 feet tall, weighed 88 lbs, had been married 8 times and had 8 children. When one of your outcome measures is “number of divorces” and your sample size is 236, including a few points like that could actually change the results. Now, 8 marriages is possible, but given the other data points that accompanied it, that response is probably not plausible.
  3. Is the number a black swan? Okay, so let’s move out of run of the mill data and in to rare events. How do you decide whether or not to include a rare event in your data set? Well….that’s hard to say. There’s been quite a bit of controversy recently over black swan type events….rare extremes like war, massive terrorist attacks or other existential threats to humanity. Basically, when looking at your outliers, you have to consider if this is an area where something sudden, unexpected and massive could happen to change the numbers. It is very unlikely that someone in a family stability study could suddenly get married and divorced 1,000 times, but in public health a relatively rare disease can suddenly start spreading more than usual. Nassim Nicholas Taleb is a huge proponent of keeping an eye on data sets that could end up with a black swan type event, and thinking through the ramifications of this.
  4. Purposefully excluding or purposefully including can both be deceptive In the recent Slate Star Codex post “Terrorists vs Chairs”, Scott Alexander has two interesting outlier cases that show exactly how easy it is to go wrong with outliers. The first is to purposefully exclude them. For example, since September 12th, 2001, more people in the US have been killed by falling furniture than by terrorist attacks. However, if you move the start line two days earlier to September 10th, 2001, that ratio completely flips by an order of magnitude. Similarly, if you ask how many people die of the flu each year, the average for the last 100 years is 1,000,000. The average for the last 97 years? 20,000. Clearly this is where the black swan thing can come back to haunt you.
  5. It depends on how you want to use your information Not all outlier exclusions are deceptive. For example, if you work for the New York City Police Department and want to review your murder rate for the last few decades, it would make sense to exclude the September 11th attacks. Most charts you will see do note that they are making this exclusion. In those cases police forces are trying to look at trends and outcomes they can affect….and the 9/11 attacks really weren’t either. However, if the NYPD were trying to run numbers that showed future risk to the city, it would be foolish to leave those numbers out of their calculations. While tailoring your approach based on your purpose can open you up to bias, it also can reduce confusion.
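
The flu example in #4 is worth a quick bit of arithmetic. The sketch below takes the two round averages quoted there at face value (they are not independently checked here, and the variable names are just for illustration) and works out what they imply about the three years that the shorter window leaves out.

```python
# Round figures quoted in point 4 above, taken at face value.
avg_100_years = 1_000_000   # average annual flu deaths, last 100 years
avg_97_years = 20_000       # average annual flu deaths, last 97 years

total_100 = avg_100_years * 100
total_97 = avg_97_years * 97

# Deaths implied for the 3 years that the 97-year window excludes.
implied_excluded_total = total_100 - total_97
share_from_excluded = implied_excluded_total / total_100

print(f"Implied deaths in the 3 excluded years: {implied_excluded_total:,}")
print(f"Share of the 100-year total:            {share_from_excluded:.1%}")
```

If those averages hold, roughly 98% of the century’s total comes from the three years that the shorter window drops, which is why moving the start line changes the answer so dramatically.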

Take it away Grover!

What I’m Reading: September 2016

My book for this month was supposed to be “In Pursuit of the Unknown: 17 Equations That Changed the World”, but I actually read that a few months ago. It was really good though….an interesting history of mathematical thought, and how some of the famous equations you hear about actually influenced human development.

This article is a few years old, but it covers the role of “metascience” in helping with the replication crisis. Metascience is essentially the science of studying science, and it would push researchers to study both their topic AND how experiments about their topic could go wrong. The first suggestion is to have another lab attempt replication before you publish, so you can include possible replication issues in your actual paper right up front.

….and here’s a person actually doing metascience! Yoni Freedhoff, MD is one of my favorite writers on the topic of diet and obesity. He has  a great piece up for the Lancet on everything that’s wrong with research in those areas.

On another note entirely….for reasons I’m not even going to try to explain, I am now the proud owner of a Tumblr dedicated to rewriting famous literary quotes to include references to Pumpkin Spice Lattes. It’s called Pumpkin Spiced Literature (obviously), and it’s, um, an acquired taste.

Okay, a doctoral thesis focused on how politicians delete their Tweets is kind of awesome. And yes, Anthony Weiner gets a mention by name. Related: a model of alcoholism that takes Twitter in to account.

On the one hand, I am intrigued by this app. On the other hand, I get sad when people want to shortcut their way out of building better problem solving skills.

This Vanity Fair article about the destruction of Theranos and the downfall of Elizabeth Holmes is incredible, fascinating and a little sad. Particularly intriguing was the quote that undid her: “a chemistry is performed so that a chemical reaction occurs and generates a signal from the chemical interaction with the sample, which is translated into a result, which is then reviewed by certified laboratory personnel.” That made WSJ reporter John Carreyrou sit up and say “Huh, I don’t think she knows what she’s talking about”. Seems obvious, but he apparently was the only reporter to figure that out.

Underrated political moment of last week: the New York Times wrote a story about Gary Johnson’s “What is Aleppo?” moment, then had to issue two corrections when it turned out they were actually not entirely sure what Aleppo is either.


5 Possible Issues With Genomics Research

Ah, fall. A new school year, and a new class on my way to finish this darn degree of mine. This semester I’m taking “Statistical Analysis of Genomics Data”, and the whole first class was dedicated to discussing reproducible research. As you can imagine, I was psyched. I’ve talked quite a bit about reproducible research, but genomics data has some twists I hadn’t previously considered. In addition to all the usual replication issues, here are a few issues that come up when you try to replicate genomics studies:

  1. Different definitions of “raw data” In the paper “Repeatability of published microarray gene expression analyses” John Ioannidis et al attempted to reproduce one figure from 18 different papers that used microarray data. They succeeded on 2 of them. The number one reason for failure to replicate? Not being able to access the raw data that was used. In most cases the data had been deposited (as required by the journal) but it had not really been reviewed to see if it was just summary data or even particularly identifiable. Six out of 18 research groups had deposited data that you couldn’t even attempt to use, and other groups had data so raw it was basically useless. Makes me shudder just to think about it.
  2. Large and unwieldy data files Even in papers where the data was available, it was not always useable. Ioannidis et al had trouble reproducing about 8 of the papers due to unclear data decisions. Essentially the files and data were there, but they couldn’t figure out how someone actually had waded through them to produce the results they got. To give you a picture of how big these data files are, my first homework for this class required a “practice” file that was 20689×37….or almost 800,000 data points. Unless that data is very well labeled, you will have trouble recreating what someone else did.
  3. Non-reproducible workflow Anyone who’s ever attempted to tame an unwieldy data set knows it’s a trek and a half. I swear to god I have actually emerged from my office sweating after one of those bouts. That’s not so terrible, but what can kick it to the seventh circle of hell is finding out there was an error in the data set and now you have to redo the whole thing. In 8 of the papers Ioannidis et al looked at, they couldn’t figure out what the authors actually did to generate their figures. Turns out, sometimes authors can’t figure out what they did to generate their figures….which is why we end up with videos like this:

    All that copy/pasting and messing around is just ASKING for an error.

  4. Software version changes Another non-glamorous way things can get screwed up: you update your software part way through and the original stuff you wrote gets glitchy. This is an enormous headache if you notice it, and a huge issue if you don’t. 2 of the papers Ioannidis et al looked at didn’t include the software version and couldn’t be reproduced. R is the most commonly used software for things like this and it’s open source, so updates aren’t always compatible with each other.
  5. Excel issues Okay, so you loaded your data, made a reproducible workflow, figured out your version of R, and now you are awesome right? Not necessarily. It turns out that Excel, one of the most standard computer programs on the planet, can seriously screw you up. In a recent paper, it was discovered that 20% of all genomics papers with Excel data files had inadvertently converted gene names to either dates or floating point numbers. This almost certainly means those renamed genes didn’t end up being included in the final research, but what effect that had is unknown. Sadly, the rate of this error is actually increasing by about 15% a year. Oof.
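
One cheap safeguard against the Excel problem in #5 is to scan a gene-name column for values that look like dates or scientific notation before analyzing it. The Python sketch below is a rough illustration under stated assumptions: the function name and sample column are invented, and the patterns assume the conversions look like “2-Sep” (from SEPT2), “1-Mar” (from MARCH1) or “2.31E+13” (from an identifier like 2310009E13). A real file may be mangled in other ways that this would miss.

```python
import re

# Assumed shapes of Excel-mangled gene names: "2-Sep" style dates and
# "2.31E+13" style floating point numbers. Other formats are not covered.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(rf"^\d{{1,2}}-({MONTHS})$", re.IGNORECASE)
SCI_NOTATION = re.compile(r"^\d+(\.\d+)?E\+?\d+$", re.IGNORECASE)


def suspicious_gene_names(names):
    """Return entries that look like dates or floats rather than gene names."""
    return [n for n in names if DATE_LIKE.match(n) or SCI_NOTATION.match(n)]


# Hypothetical column exported from a spreadsheet.
column = ["TP53", "2-Sep", "BRCA1", "1-Mar", "2.31E+13", "MARCH1"]
print(suspicious_gene_names(column))
```

A check like this only tells you that something was converted; it cannot always tell you which gene it used to be, so the safer fix is still to keep gene names out of auto-formatting spreadsheets in the first place.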

I am tempted to summarize all this by saying “Mo’ Data Mo’ Problems”, but…..no, actually, that sounds about right. Any time you can’t actually personally review all the data, you are putting your faith in computer systems and the organization of the files. Good organization is key, and it’s hard to focus on that when you’re wading through data files. Semper vigilans.