Analyzing Happiness via Social Media

Happy Wednesday everyone….or is it?

I stumbled across a new-to-me study recently: “Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter”, and I’ve been fascinated by it ever since. It’s a cumbersome name for an interesting study that analyzed Twitter posts to determine if there’s any pattern to when we express certain types of feelings on social media. For example, use of the word “starving” rises as you approach typical meal times, and falls once those times pass:

This data is fascinating to me because it gives some indication of where social media reflects reality, and some ideas of where it might not. For example, it appears the word starving is not often used at breakfast, but is used quite a bit for lunch. I don’t know that people are really the hungriest right before lunch, but it appears they may be most likely to want to express feelings of hunger at that time. I am guessing this is because people may have less control over when they get to eat (being at work, running around with errands, etc) and thus may get more agitated about it.

The researchers decided specifically to look at happiness as expressed through social media posts. They tracked this on a day-to-day basis, and then decided to figure out which days of the week were the happiest ones. Turns out Wednesday’s not looking so good:

I know the running joke is about Monday, but it’s interesting to note that Tuesday fared the worst on this ranking. I suspect that’s related to the fact that Mondays may instill the most dread in people, but aggravation you want to express may need a day or two to build up. Of course if you look at the overall scale, it’s not clear how much of a difference a score of 6.025 vs 6.08 really makes, but I’ll roll with it for now.

That havg on the y-axis there is kind of an interesting number. They pulled out a lot of commonly used Twitter words, then asked people on Mechanical Turk to rate how happy each word made them on a scale of 1 to 9. Here’s how some common words fared:

I love that they ranked “the” and “of”, and was interested to see that vanity was more highly rated than greed.
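To make the havg idea a bit more concrete, here is a minimal sketch of how a score for a chunk of text could be built from per-word ratings. It assumes the text-level number is a frequency-weighted average of the word ratings, which is my reading rather than something spelled out above. The only real rating in the table is “lost” at 2.76 (quoted from the study later in this post); every other value, and the names `WORD_SCORES` and `average_happiness`, are made up for illustration.

```python
from collections import Counter

# Word ratings on the 1-9 scale. Only "lost" (2.76) is quoted in the post;
# all other values are invented placeholders, not the study's numbers.
WORD_SCORES = {
    "lost": 2.76,
    "party": 7.5,    # placeholder
    "wedding": 7.5,  # placeholder
    "the": 5.0,      # placeholder
    "of": 5.0,       # placeholder
}


def average_happiness(text, scores=WORD_SCORES):
    """Frequency-weighted mean rating of the rated words in `text`.

    Words without a rating are ignored. Returns None if no word was rated.
    """
    counts = Counter(word for word in text.lower().split() if word in scores)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[word] * n for word, n in counts.items()) / total


if __name__ == "__main__":
    print(average_happiness("the party"))
    print(average_happiness("lost lost party"))
```

With a setup like this, a day when one low-rated word suddenly floods the feed drags the average down even if nobody is actually sadder, which is exactly the kind of thing to watch for in word-count studies.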

Interestingly, in order to keep their data clean, the researchers also excluded a few days that produced noticeable changes in happiness measures. Some of these were for obvious reasons (like holidays, days of natural disasters), but some were kind of funny. For example, they noted May 24, 2010 as an unusual date because:

“the finale of the last season of the highly rated television show ‘Lost’, marked by a drop in our time series on May 24, 2010, and in part due to the word ‘lost’ having a low happiness score of havg = 2.76, but also to an overall increase in negative words on that date.”

This of course shows an interesting weakness in social media studies….you always risk counting things that shouldn’t be counted. Additionally, you may give more credit to certain days than they deserve. For example, Saturday got a boost because of the high rankings of the words “party” and “wedding”, both of which are mostly held on Saturdays.

As social media continues to dominate our lives, I’m sure we’ll see progressively more research like this. Always interesting to consider the possible insights vs ways it can be misleading. Good luck with Wednesday folks, Saturday’s right around the corner.

Penguin Awareness Day and Extinction Threat Levels

Sometimes Twitter teaches me the most interesting things. Apparently yesterday was Penguin Awareness Day, which I found out when someone I knew retweeted this:

(Link if embed doesn’t work)

I was intrigued by the color coding under each, and was rather curious what the difference between “endangered”, “vulnerable” and “near threatened” was. Since I’m always on the lookout for faux classifications, I was wondering if those were random categories, or if they had some sort of basis.

Turns out, it’s actually the latter! This is probably well known to anyone into conservation, but the classification system is actually put out by the International Union for Conservation of Nature. It looks like this:

They publish a rather extensive document detailing each category, and apparently they update this document every couple years.  The entire goal of this classification system was to bring some rigor to the process of assessing different species populations, and they have some interesting guidelines.

For example, if a species population drops due to known and/or reversible causes, the size of the drop dictates their status. A drop of >90% in 10 years (or 3 generations) gets you labeled critically endangered, >70% gets you labeled endangered, and a drop of >50% gets you a “vulnerable” label. “Near threatened” doesn’t have a number, but would apply if there was growing concern/problems that didn’t meet any of the other criteria. They play out some other scenarios here. All of the criteria include numbers plus ongoing threats, so there are a few different cases for each.

For example, a critically endangered species could have <250 mature individuals + a threatened habitat, or <50 mature individuals with no threat. For endangered animals, those numbers are 2500 and 500 respectively, and for vulnerable animals it’s 10,000 and 1000. I was interested to note that they include quantitative models as a valid form of forecasting extinction.
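The population-drop rule described above is simple enough to write down as a lookup. This is only a sketch of the thresholds as I quoted them (>90%, >70%, >50% over 10 years or 3 generations, for known and/or reversible causes). The function name is my own, and the real criteria have more cases than this one.

```python
def status_from_decline(decline_pct):
    """Map a percentage population drop to a label.

    Uses only the decline thresholds quoted in the post (>90, >70, >50).
    The other criteria (population size, ongoing threats) are not modeled.
    """
    if not 0 <= decline_pct <= 100:
        raise ValueError("decline_pct must be between 0 and 100")
    if decline_pct > 90:
        return "critically endangered"
    if decline_pct > 70:
        return "endangered"
    if decline_pct > 50:
        return "vulnerable"
    # Could still be "near threatened" or qualify under another criterion.
    return "below the decline thresholds"


if __name__ == "__main__":
    for drop in (95, 75, 60, 30):
        print(drop, "->", status_from_decline(drop))
```

Note that a species landing in the last bucket is not automatically fine: “near threatened” has no number attached, and the population-size criteria can still apply.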

Anyway, whether you agree with the criteria or not, it was nice to know that someone’s actually tried to define these terms in a transparent way that anyone can read up on.  Hopefully that means these guys will be okay:


5 Things About the Perfect Age

When people ask me to explain why I got degrees in both family therapy and statistics, my go-to answer is generally that “I like to think about how numbers make people feel.” Given this, I was extremely interested to see this article in the Wall Street Journal this weekend, about researchers who are trying to figure out what people consider the “perfect” age.

I love this article because it’s the intersection of so many things I could talk about for hours: perception, biases, numbers, self-reporting, human development, and a heavy dose of self-reflection to boot.

While the researchers haven’t found any one perfect age, they do have a lot of thought-provoking commentary:

  1. The perfect age depends on your definition of perfect. Some people pick the year they had the most opportunities, some the year they had the most friends, some the years they had the most time, others the year they were the happiest, and others the years they had a lot to reflect on. Unsurprisingly, different definitions lead to different results.
  2. Time makes a difference. Unsurprisingly, young people (college students) tend to say that if they could freeze themselves at one age, it would be sometime in their 20s. Older people, on the other hand, name older ages….50 seems pretty popular. This makes sense, as I suspect most people who have kids would pick to freeze themselves at a point where those kids were around.
  3. Anxiety is concentrated in a few decades. One of the more interesting findings was that worry and anxiety were actually most present between 20 and 50. After 50, well-being actually climbed until age 70 or so. The thought is that generally that’s when the kids leave home and people start to have more time on their hands, but before the brunt of major health problems hits.
  4. Fun is also concentrated at the beginning and end of the curve. Apparently people in the 65 to 74 age range report having the most fun of any age range, with 35 to 54 year olds having the least. It’s interesting that we often think of young people as having the “fun” advantage due to youth and beauty, but apparently the “confusion about life” piece plays a big part in limiting how fun those ages feel. Sounds about right.
  5. How stressed you are in one decade might dictate how happy you are in the next one. This is just me editorializing, but all of this research really makes me wonder how our stress in one decade impacts the other decades. For example, many parents find the years of raising small children rather stressful and draining, but that investment may pay off later when their kids are grown. Similar things are true of work and other “life building” activities. Conversely, current studies show that men in their 20s who aren’t working report more happiness than those in their cohort who are working….but one suspects by age 40 that trend may have reversed. You never know what life will throw at you, but even the best planned lives don’t get their highs without some work.

Of course after thinking about all this, I had to wonder what my perfect age would be. I honestly couldn’t come up with a good answer to this at the moment, especially based on what I was reading. 50 seems pretty promising, but of course there’s a lot of variation possible between now and then. Regardless, a good example of quickly shifting opinions, and how a little perspective tweak can make a difference.

Magnitude Problems, Now With Names

In my last blog post, I put out a call for name ideas for a particular “potentially motivated failure to recognize that the magnitude of numbers matters” problem I was seeing, and man did you all come through! There were actually 3 suggestions that got me excited enough that I wanted to immediately come up with definitions for them, so I now have 3 (actually 4) new ways to describe my problem. A big thanks to J.D.P Robinson, Korora, and the Assistant Village Idiot for their suggestions.

Here are the new phrases:

Hrair Line: a somewhat arbitrary line past which all numbers seem equally large

Based on the book “Watership Down” where characters use the word “hrair” to mean “any number greater than 4”. We all have a line like this when numbers get big enough….I doubt any of us truly registers the difference between a quadrillion and a sextillion unless we encounter those numbers in our work. Small children do this with time (anything other than “right now” is “a long time”), and I’d guess all but the richest of us do this with money (a yearly salary of $10 million and $11 million are both just “more than I make” to me). On its own, this is not necessarily a bad thing, but rather a human tendency to only wrap our heads around the number values that matter most to us. This tendency can be misused however, which is where we get….

The Receding Hrair Line: The tendency to move one’s hrair line based on the subject under discussion, or for one group and not another, normally to benefit your argument

Also known (in my head) as the Soros/Koch brothers problem. Occasionally you’ll see references to charitable gifts by those controversial figures, and it’s always a little funny to see how people perceive those numbers based on their pre-conceived feelings about Soros/Koch. I’ve seen grants of $5000 called “a small grant” or credited with helping fund the whole organization. You could certainly defend either stance in many cases, but my concern is that people frequently seem to start from their Soros/Koch feelings and then bring the numbers along for the ride. They are not working from any sort of standard for what a $5000 grant means to a charity, but rather a standard for what a George Soros or Koch brothers gift means and working backwards. This can also lead to….

Mountain-Molehill Myopia: The tendency to get so fixated on an issue that major changes in magnitude of the numbers involved do not change your stance. Alternatively, being so fixated on an issue that you believe that any change to the number completely proves your point.

A close relative of number blindness, but particularly focused on the size of the numbers. Taking my previous Soros/Koch example, let’s say someone had defended the “a $5000 grant is not a big deal” stance. Now let’s say that there was a typo here, and it turned out that was a $50,000 or a $500 grant. For most people, this would cause you to stop and say “ok, given this new information, let me rethink my stance”. For those suffering from Mountain-Molehill Myopia however, this doesn’t happen. They keep going and act like all their previous logic still stands. This is particularly bizarre, given that most people would have no problem with you pausing to reassess given new information. Only the most dishonest arguers are going to hold you accountable for previous logic if new information comes up. The refusal to reassess actually makes you more suspect.

The alternative case here is when someone decides that a small change to the numbers now means EVERYTHING has changed. For example, let’s say the $5000 turns out to be $4900 or $5100. That shouldn’t change anything (unless there are tax implications that kick in at some level of course), but sometimes people seriously overreact to this. You said $5000 and it turns out it was $4900, this means your whole argument is flawed and I automatically win.

There is clearly a sliding scale here, as some changes are more borderline. A $5000 grant vs a $2000 grant may be harder to sort through. For rule of thumb purposes, I’d say an order of magnitude change requires a reaction, and less than that is a nuanced change. YMMV.
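For anyone who wants the rule of thumb in a form they can poke at, here is a small sketch. It treats “an order of magnitude” as a factor of 10 in either direction. The function names and the default threshold parameter are mine, and the whole thing is a rough heuristic rather than any formal standard.

```python
def change_ratio(original, corrected):
    """How many times bigger the larger number is than the smaller one."""
    if original <= 0 or corrected <= 0:
        raise ValueError("both numbers must be positive")
    return max(original, corrected) / min(original, corrected)


def requires_rethink(original, corrected, threshold=10.0):
    """Rule of thumb: a 10x change (up or down) demands a reaction."""
    return change_ratio(original, corrected) >= threshold


if __name__ == "__main__":
    # The grant examples from the post, all compared against $5000.
    for corrected in (50_000, 500, 4_900, 5_100, 2_000):
        print(corrected, requires_rethink(5_000, corrected))
```

The $5000 vs $2000 case comes out as a factor of 2.5, which is why I’d file it under “nuanced change” rather than “stop everything”.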

Now, all of these errors can be annoying in a vacuum, but they get worse when onlookers start jumping in. This is where you get…..

Pyrgopolynices’ numbers: Numbers that are wrong or over-inflated, but that you believe because they are supported by those around you due to tribal affiliations rather than independent verification

Based on the opening scene of  Plautus’  Braggart Soldier, Korora provided me with the context for this one (slightly edited from the original comment):

…the title character’s parasītus, or flatterer-slave, is repeating to his master said master’s supposed achievements on the battlefield:

Artotrogus: I remember: One hundred fifty in Cilicia. A hundred in Scytholatronia*, thirty Sardians, sixty Macedonians. Those are the men thou slewest in one day.
Pyrgopolynices: How many men is that?
Artotrogus: Seven thousand.
Pyrgopolynices: It must be as much. [Thou] correctly hast the calculation.

*there is no such place

After reading this I got the distinct feeling that we did away with flatterer-slaves, and replaced them with social media.

As someone who likes to correct others’ numbers, you’d think I’d be all about chiming in on Facebook/Twitter/whatever conversations about numbers or stats, but I’m not. Starting about 3 years ago, I stopped correcting anyone publicly and started messaging people privately when I had concerns about things they posted. While private messages seemed to get an amiable response and a good discussion almost 90% of the time, correcting someone publicly seemed to drive people out of the woodwork to claim that those numbers were actually right. Rather than acknowledge the error as they would privately, my friends would then turn their stats claims into Pyrgopolynices’ numbers….numbers that people believed because other people were telling them they were true. Of course those people were only telling them they were true because someone on “their side” had said them to begin with, so the sense of checks and balances was entirely fictitious.

Over the long term, this can be a very dangerous issue as it means people can go years believing certain things are true without ever rechecking their math.

That wraps it up! Again, thank you to J.D.P Robinson for mountain-molehill myopia, AVI for throwing the word “hrair” out there, and Korora for the backstory on Pyrgopolynices’ numbers. In related news, I think I may have to start a “lexicon” page to keep track of all of these.

The (Magnitude) Problem With No Name

As most of you know, I am a big fan of amusing myself by coining new names for various biases/numerical tomfoolery I see floating around on the internet. I have one that’s been bugging me for a little while now, but I can’t seem to find a good name for it. I tried it out on a bunch of people around Christmas (I am SUPER fun at parties guys), but while everyone got the phenomenon, no one could think of a pithy name. Thus, I turn to the internet.

The problem I’m thinking of is a specific case of what I’ve previously called Number Blindness or “The phenomenon of becoming so consumed by an issue that you cease to see numbers as independent entities and view them only as props whose rightness or wrongness is determined solely by how well they fit your argument”. In this case though, it’s not just that people don’t care if their number is right or wrong, it’s that they seem oddly unmoved by the fact that the number they’re using isn’t even the right order of magnitude. It’s as though they think that any “big” number is essentially equal to any other big number, and therefore accuracy doesn’t matter any more.

For example, a few weeks ago Jenna Fischer (aka Pam from The Office) got herself in trouble by Tweeting out (inaccurately) that under the new tax bill teachers could no longer deduct their classroom expenses. She deleted it, but while I was scrolling through the replies I came across an exchange that went something like this:

Person 1: Well teachers wouldn’t have to buy their own supplies if schools stopped paying their football coaches $5 million a year

Person 2: What high school pays their coach $5 million a year?

Person 3: 28 coaches in Texas make over $120,000 a year.

Person 2: $120,000 is not $5 million.

Person 3: Well that’s part of an overall state budget of $20-25 million just for football coaches. (bs king’s note: I couldn’t find a source for this number, none was given in the Tweet)

Person 2: ….

Poor person 2.

Now clearly there was some number blindness here….person 1 and 3 only seemed to care about the idea that numbers could support their cause, not the accuracy of said numbers. But it was the stunning failure to recognize order of magnitude that took my breath away. How could you seriously reply to a comment about $5 million salaries with an article about $120,000 salaries and feel you’d proved a point? Or respond to a second query with an overall state budget, which is an order of magnitude higher than that? It’s like some sort of big number line got crossed, and now it’s all fair game.

I suspect this happens more often the bigger the numbers get….people probably drive astronomers nuts by equating things like a billion light years and a trillion light years away. Given that I’ve probably done this I won’t get too cocky here, but I would like a name for the phenomenon. Any thoughts are appreciated.

Recreational Quantification

On my recent post about hot drinks and esophageal cancer, Gringo made a comment about how quickly his Yerba Mate cooled down in the summer (30 minutes) vs winter (10-15 minutes). I was struck by this, because I find random numerical trivia about people’s daily life quite fascinating. I think this is mostly because many people don’t actually keep track of stuff like this, or if they notice it they don’t remember it.

While this phenomenon is probably related to numerical aptitude, I also think it’s probably related to something John Allen Paulos talks about. In an article about Stories vs Statistics, he posits that about 61% of people (update: he may have been joking with this number, there’s no source for it) see numbers as “rhetorical decoration” to stories, whereas the other 39% see numbers as “clarifying information”.

This reminded me of an exchange I had with my father last week when we were discussing how cold it was:

Dad: How are you surviving the cold down there?
Me: It’s been pretty chilly. I could tell it was cold because my walk from the train normally takes me 30 minutes, and this week I noticed it was taking 26 minutes without me consciously increasing my speed.
Dad: wow, that’s cold.
<5 more minutes of back and forth on walking speeds during various weather patterns, and how traffic lights/street crossings make the 4 minute time saving even more impressive>

I have come to understand that most people do not reach for anecdotes like this when they are trying to explain how cold it is, but it’s one of the best ways of communicating information like that to my Dad.
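For the fellow quantifiers, the walking anecdote converts neatly into a speed change. Since the route is the same both times, the distance cancels out and only the two times matter. This is just a sketch with a function name I made up, and it treats the walk as one steady pace, which the traffic lights obviously make a simplification.

```python
def speed_increase_pct(usual_minutes, actual_minutes):
    """Percent increase in average speed over the same fixed route.

    Speed is distance / time, so the ratio of speeds is
    usual_minutes / actual_minutes and the distance drops out.
    """
    if usual_minutes <= 0 or actual_minutes <= 0:
        raise ValueError("times must be positive")
    return (usual_minutes / actual_minutes - 1) * 100


if __name__ == "__main__":
    # 30 minutes normally, 26 minutes in the cold.
    print(round(speed_increase_pct(30, 26), 1))  # 15.4
```

So shaving 4 minutes off a 30 minute walk means moving roughly 15% faster on average, without consciously trying.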

Interestingly, Paulos attributes this communication preference to our feelings towards Type 1 vs Type 2 errors. He posits that those who want to hear numbers are doing so because they are focused on avoiding Type 1 errors (seeing something that’s not there), and those who prefer stories are more interested in avoiding Type 2 errors (failing to see something that is there). I have no idea if he’s right about this, but personality typing based on statistical approaches is a thing I am totally on board with.

Anyway, I find myself counting and/or finding ways of quantifying all sorts of things as I go through life. Some of these are straightforward (I tracked my gas mileage for quite some time, I track my steps and resting heart rate, I have a particular obsession with hours of daylight), but some are a little more complex.

For example, every time I go to a concert, I always take note of the relative frequency of mixed gender groups vs male only groups vs female only groups. I started this because I attend a lot of concerts with my husband, and we got into a running discussion about “guy bands” vs “girl bands”. As I tried to quantify which was which, I realized that a strict gender breakdown sometimes hid information about the band’s core audience. AC/DC for example: the crowd there is 30-40% women, but almost all of the women are there with men. The number of male only groups was 3 to 4 times the number of female only groups. Interestingly, in many of the mixed gender groups there were more women than men, which is why the proportion was so high despite women not attending alone. Thus I put AC/DC in the category of a “guy band” that appeals to women, as opposed to a gender neutral band. In other words, it appears women are happy to attend, but only if someone else suggests it.

Since I started tracking this, I have seen two bands who appear to have truly equal gender appeal: Tom Petty and the Heartbreakers and Aerosmith.

The most male dominated concert I have ever been to was Judas Priest. The most female dominated concert was Ani DiFranco. At neither of these concerts could I find a member of the minority gender unaccompanied by a member of the majority gender.

Another interesting breakdown is “couples concerts” or “date concerts” where you see very few people attending in mono-gender groups. TV on the Radio and a few other hipster bands I’ve seen appear to be like that. On the other side, when I went to see a Drag Queen Christmas, it was entirely the opposite. The audience was half male and half female, but since most of the men were (presumably) gay, the groups that attended were mostly mono-gender.

All that being said, I’d be interested in hearing about random things that readers count/track/note when out and about, or your band examples. I understand I have rather idiosyncratic tastes in music, so I’d be interested in other examples.

5 Things to Know About Hot Drinks and Esophageal Cancer

Fun fact: according to CNN, on New Year’s Day 90% of the US never got above freezing.

Second fun fact: on my way in to work this morning I passed an enormous fire burning a couple hundred yards from where the train runs. I Googled it to see what had happened and discovered it was a gas main that caught on fire, and they realized that shutting the gas off (normal procedure I assume) would have made thousands of people in the area lose heat. With temps hitting -6F, they couldn’t justify the damage so they let the fire burn for two days while they figured out another way of putting it out.

In other words, it’s cooooooooooold out there.

With a record cold snap on our hands and the worst yet to come this weekend, I’ve been spending a lot of time warming up. This means a lot of hot tea and hot coffee have been consumed, which reminded me of a factoid I’d heard a few months ago but never looked into. Someone had told me that drinking hot beverages was a risk factor for esophageal cancer, but when pressed they couldn’t tell me what was meant by “hot” or how big the risk was. I figured this was as good a time as any to look it up, though I was pretty sure nothing I read was going to change my behavior. Here’s what I found:

  1. Hot means HOT. When I first heard the hot beverage/cancer link, my first thought was about my morning coffee. However, I probably don’t have to worry much. The official World Health Organization recommendation is to avoid drinking beverages that are over 149 degrees F. In case you’re curious, Starbucks typically serves coffee at 145-165 degrees, and most of us would wait for it to cool for a minute before we drank it.
  2. Temperature has a better correlation with cancer than beverage type. So why was anyone looking at beverage temperature as a possible carcinogen to begin with? Seems a little odd, right? Well it turns out most of these studies were done in part to rule out that it was the beverage itself that was causing cancer. For example, quite a few of the initial studies noted that people who drank traditional Yerba Mate had higher esophageal cancer rates than those who didn’t. The obvious hypothesis was that it was the Yerba Mate itself that was causing cancer, but then they noted that repeated thermal injury due to scalding tea was also a possibility. By separating correlation and causation, it was determined that those who drink Yerba Mate (or coffee or other tea) at lower temperatures did not appear to have higher rates of esophageal cancer. Nice work guys.
  3. The risk has been noted in both directions. So how big a risk are we looking at? A pretty sizable one actually. This article reports that hot tea drinkers are 8 times as likely to get esophageal cancer as those who drink tea at lower temperatures, and those who have esophageal cancer are twice as likely to say they drank their tea hot before they got cancer. When assessing risk, knowing both those numbers is important to establish a strong link.
  4. The incidence rate seems to be higher in countries that like their beverages hot. It’s interesting to note that the US does not even come close to having the highest esophageal cancer rates in the world. Whereas our rate is about 4.2 per 100,000 people, countries like Malawi have rates of 24.2 per 100,000 people. Many of the countries that have high rates have traditions of drinking scalding hot beverages, and it’s thought that combining that with other risk factors (smoking, alcohol consumption, poverty and poorly developed health care systems) could have a compounding effect. It’s not clear if scalding your throat is a risk in and of itself or if it just makes you more susceptible to other risks, but either way it doesn’t seem to help.
  5. There is an optimum drinking temperature. According to this paper, to minimize your risk while maximizing your enjoyment, you should serve your hot beverages at exactly 136 degrees F. Of course a lot of that has to do with how quickly you’ll drink it and what the ambient temperature is. I was pretty impressed with my Contigo thermos for keeping my coffee pretty hot during my 1.5 mile walk from the train station in -3 degrees F this morning, but lesser travel mugs might have had a problem with that. Interestingly I couldn’t find a good calculator to track how fast your beverage will cool under various conditions, but if you find one send it my way!
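In the absence of a ready-made calculator, here is a rough sketch of what one might look like, using Newton’s law of cooling (the drink’s temperature decays exponentially toward the surrounding temperature). That model is my choice for illustration, not something from the papers above. The cooling constant `k_per_min` is an assumption you would have to measure for your own mug, and the 0.05 used in the demo is a made-up value, not a property of any real cup or thermos.

```python
import math


def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


def temp_after(minutes, start_f, ambient_f, k_per_min):
    """Newton's law of cooling: T(t) = T_env + (T0 - T_env) * exp(-k * t)."""
    return ambient_f + (start_f - ambient_f) * math.exp(-k_per_min * minutes)


def minutes_until(target_f, start_f, ambient_f, k_per_min):
    """Minutes for a drink to cool from start_f down to target_f.

    Returns 0.0 if the drink is already at or below the target, and None
    if the target is at or below ambient (the drink never quite gets there).
    """
    if k_per_min <= 0:
        raise ValueError("k_per_min must be positive")
    if start_f <= target_f:
        return 0.0
    if target_f <= ambient_f:
        return None
    return math.log((start_f - ambient_f) / (target_f - ambient_f)) / k_per_min


if __name__ == "__main__":
    K = 0.05  # hypothetical cooling constant, per minute
    print(f_to_c(149))  # the 149 F guideline in Celsius
    # Coffee served at 165 F in a 70 F room, cooling to 149 F and then 136 F.
    print(minutes_until(149, 165, 70, K))
    print(minutes_until(136, 165, 70, K))
```

To calibrate `k_per_min` for a particular mug, you could take two temperature readings a known number of minutes apart and solve the same equation for k. A well insulated thermos would have a much smaller k than an open paper cup.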

Of course if you really want to cool a drink down quickly, just move to Fairbanks, Alaska and throw it in the air:

Stay warm everyone!