It took me a few years of working with data before I realized what my primary job was. You see, back when I was a young and naive little numbers girl, I thought my primary job was to use numbers to expand what we knew about topics. I would put together information, hopefully gain some new insights, and pass the data on, thinking my job was done.
It didn’t take me long before I realized the job was barely half finished.
You see, getting new insights from data is good and important, but it’s no more important than what comes next. As soon as you have data that says “x”, the natural inclination of almost everybody is to immediately extrapolate that out to say “Oh great! So we know x, which means we know y and z too!” It’s then that my real job kicks in. Defending, defining and reiterating the limitations of data is a constant struggle, but if you are going to be honest about what you’re doing, it’s essential.
I bring this up because I ran across a disturbing story that illustrates how damaging it can be when we don’t read the fine print about our data. The whole story is here (along with the great subtitle “The Hills Have IPs”), and it’s about one family’s tech-induced ten-year nightmare.
The short version: 10 years ago, a company called MaxMind starts a business helping people identify locations for IP addresses associated with particular computers. When they can’t find a location, they default to the geographic center of the USA. Unbeknownst to the company, this default gets associated with the street address of a small farmhouse in Kansas. Over the next decade, anyone who tries to track down one of the IP addresses that isn’t otherwise located (about 600 million of them) is given this address, which causes a constant stream of irate people, law enforcement and others to show up at the door of this farmhouse believing that’s where their hacker/iPhone thief/caller/harasser etc. lives. The family has no idea why this is happening, and the local police department literally says the bulk of their job is now keeping angry and confused people away from this family.
The reporter who wrote the article (seriously, go read it) is the first person to put two and two together and actually figure out where the mix-up happened.
What’s interesting about this story is that when it was brought to their attention, the company pointed out they actually have ALWAYS told customers not to trust the addresses given. They have always told people that results were only accurate to within a zip code or town. It’s not surprising that many individuals failed to recognize this, but it IS concerning that so many law enforcement agencies failed to take this into account. This isn’t just local departments either: the FBI and IRS have investigated the address several times.
Want to know the scariest part? The reporter only figured this out by going through the company’s records and then having someone build a computer program to find physical addresses associated with high numbers of IP addresses. While the Kansas farm was the worst, there were hundreds of other addresses with similar problems, including the one, a supposed hub for lost iPhones, that started her crusade. Without people grasping the limitations of this data, all of these homes are subject to people showing up angry, believing that someone else lives there.
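For the curious, the kind of program described above doesn’t have to be complicated. Here is a minimal sketch of the idea in Python, not the reporter’s actual code: I’m assuming the records come as simple (IP address, location) pairs, and the function name, the threshold and the sample data are all made up for illustration. It counts how many distinct IP addresses point at each location and flags the ones that look suspiciously crowded.

```python
from collections import defaultdict


def find_crowded_locations(records, min_ips=1000):
    """Return (location, distinct_ip_count) pairs with at least min_ips
    distinct IP addresses, most crowded first.

    `records` is an iterable of (ip, location) pairs. This input shape is an
    assumption for illustration, not the format of any real geolocation database.
    """
    ips_by_location = defaultdict(set)
    for ip, location in records:
        # A set ignores repeats, so each IP counts once per location.
        ips_by_location[location].add(ip)

    flagged = [
        (location, len(ips))
        for location, ips in ips_by_location.items()
        if len(ips) >= min_ips
    ]
    # Sort by count, largest first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sample data (reserved documentation IP ranges, invented locations).
sample_records = [
    ("192.0.2.1", "location A"),
    ("192.0.2.2", "location A"),
    ("192.0.2.3", "location A"),
    ("198.51.100.7", "location B"),
]

print(find_crowded_locations(sample_records, min_ips=3))
```

A location that turns up here isn’t automatically a mistake (an office building or a data center could legitimately host thousands of addresses), which is exactly the kind of fine print that has to travel with the output.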
As technology and the “big data” era expand, knowing what you don’t know is going to become increasingly critical. Small errors made at any one point in the system can and will be magnified over time until there can be real trouble. The fine print may never be as interesting as the big reveal, but it could save you a lot of trouble in the long run.