This is the question Twitter puts forth to every user on their profile page. But what are people actually putting down for their location? After delving into hundreds of thousands of public profile location fields, I found the answer: not always their actual location, and when it is, rarely in a format that computers can easily read. Along the way, I picked up many insights, both interesting and silly, on how people express themselves through their location field.
When my research on local day-to-day patterns on Twitter began, one of the first steps we had to accomplish was to figure out where tweets were coming from. There were two ways to do this.
- If a tweet is sent from a phone or a browser that has geotagging enabled, or if a tweet is imported from a tagging service such as FourSquare, then it contains geo coordinates. How I took these coordinates and turned them into proper city names (reverse geocoding) is a subject for another post.
- Otherwise, we can attempt to make sense of the location field in the user’s profile and translate it into an actual city name.
However, a cursory glance at a sample of the locations users enter shows how difficult it can be to pick out a city name from the many creative ways Twitter users express themselves through their location field.
Here’s a random peek:
Sevilla Andalucía España
252 — dmv
Above The Clouds
last exit to sumerland
At Northies you will find me
Boston! Green all day!
boston & new york .
SJC – SP – Brasil
Monumen Nasional, Jakarta
Makassar, Sulawesi Selatan
boston & new york .
in the shop doing hair
A place Quiet and Cozy
Luckily, Twitter provides a Search API that lets you collect tweets located within a given radius of a point (exactly how Twitter determines the location of these tweets, I am not entirely sure). The Social Media lab at Rutgers has been collecting these tweets for 57 selected cities, mostly in the U.S., for over a year, amassing a data set of over 700 million tweets. The logical next step for me was to go through the tagged tweets, look at the location field from their user profiles, and find the most popular terms for each city. As proof of how difficult these location fields are to parse, many of the terms were flat-out wrong or very questionable or vague, and I had to do a lot of pruning.
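To make the "most popular terms for each city" step concrete, here is a minimal sketch in Python. It is not the code used in this project: the input shape (pairs of city and profile location), the `normalize` helper, and the `top_locations` function are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def normalize(location):
    """Lowercase and collapse whitespace so trivial variants count together.

    This normalization rule is an assumption, not the one used in the study.
    """
    return " ".join(location.lower().split())


def top_locations(tagged_tweets, n=3):
    """Count profile location strings per city.

    tagged_tweets: iterable of (city, profile_location) pairs (hypothetical shape).
    Returns {city: [(location, count), ...]} with the n most common entries.
    """
    counts = defaultdict(Counter)
    for city, location in tagged_tweets:
        cleaned = normalize(location)
        if cleaned:  # skip empty location fields
            counts[city][cleaned] += 1
    return {city: counter.most_common(n) for city, counter in counts.items()}


# Tiny made-up sample; the real data set is hundreds of millions of tweets.
sample = [
    ("New York, NY", "New Yawkkk"),
    ("New York, NY", "new  yawkkk"),
    ("New York, NY", "NYC"),
    ("New York, NY", ""),
    ("Boston, MA", "Boston! Green all day!"),
]
result = top_locations(sample)
```

Counting only gets you a ranked list of candidates. Deciding which of those candidates really refer to the city still takes the manual pruning described next.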
Here are a small number of user locations that had been tagged to New York, NY, but that I took out for one reason or another:
Off the Wall
n YUR HEART
hogwarts on waverly place.
your sin! my city : D
Probably In The Mall
around the corner!
Knowing what was and wasn’t an error often required me to draw on my knowledge of cities and popular trivia. For instance, I knew to take out Waverly Place, because it probably meant that the user was a fan of the Disney show Wizards of Waverly Place as opposed to living on Waverly Place in NYC. Other terms were complete question marks to me and required searching the web and urbandictionary.com.
Some interesting/random things I have found/mused about while doing this filtering:
- Denver, CO is also commonly known as the “Mile High City.”
- From Atlanta, GA? How about Hotlanta? Also sometimes referred to as “Black Hollywood.”
- For the longest time, I kept seeing varying versions of “DMV” popping up, and I would be so confused. To me, DMV stands for Department of Motor Vehicles. Google agreed with me as well, but a quick search on urbandictionary revealed that, sure enough, it stands for “D.C. Maryland Virginia.”
- The many, many ways to incorporate “Bieber” into one’s location. One ex: Bieberlulu (Honolulu, HI). Also, an alarming number of users located in “biebers pants.”
- “Springfield” – enough information to infer Springfield, IL? Probably not, considering there are 34 cities in the U.S. with this name and 36 townships. Oh, and the Simpsons live in “Springfield.”
- “AdventureLand” – mistake? Or…a sprawling family resort in Des Moines, IA.
- Why is “New Yawkkk” way more popular than, say, “New Yawkk,” “New Yawwk,” or “New Yawkkkk”? Perhaps it has something to do with the fact that Snooki (@Sn00ki) from Jersey Shore lists “New Yawkkk” as her location?
- More people list their location as “Quahog,” fictitious home of Family Guy, than “Cape Cod, MA.” Also Twitter seems to think Quahog is in MA.
- A cool nickname for Long Island, NY – “Strong Island.”
- A not so cool nickname for Salt Lake City, UT – “SL, UT.”
- “Bunny Ranch” – not a mistake by Twitter but actually a “famous” brothel in Carson City, NV.
- Nashville, TN is also known as “Music City.” Apparently, it’s also known for money. “Cashville” and “Na$hville” were very popular.
- “Noho” – confusingly standing for both NOrth of HOuston Street (NYC) and NOrth HOllywood (LA). Other ambiguities: “Chinatown,” “Downtown,” “Uptown,” etc.
- I found lots and lots of examples of users listing a whole string of places for a location. For instance, something like “NY x LA” or, even crazier – “NY,LA,ATL,MIA,RIO,LON,PAR,LAG.” I wonder how many of these are large companies or actual globetrotters as opposed to the occasional vacationer who likes to inflate their travel time.
- Wishful thinking as well as excited announcements of moving abounded. I found many variations of “Wishing I were in X”, “It should be X,” “X, one day…”, and “Y, but very soon to be X,” the last one often punctuated with exclamation points and smileys.
- The location field can also be a place for clever little jokes (“behind U,” “near your ear”, “where ur not”) or possibly, flirtations?? (“ur place,” “in ya mouth”)
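Some of these ambiguities can be caught mechanically. As a rough sketch (the `ambiguous_terms` function and the sample data are hypothetical, not part of the original pipeline), one could flag any term that shows up in the candidate list of more than one city, as "Noho" and "Chinatown" do:

```python
from collections import defaultdict


def ambiguous_terms(city_terms):
    """Return terms that appear in the candidate lists of two or more cities.

    city_terms: {city: iterable of normalized location strings} (assumed shape).
    Returns {term: sorted list of cities that claim it}.
    """
    cities_by_term = defaultdict(set)
    for city, terms in city_terms.items():
        for term in terms:
            cities_by_term[term].add(city)
    return {
        term: sorted(cities)
        for term, cities in cities_by_term.items()
        if len(cities) > 1
    }


# Made-up candidate lists for illustration only.
candidates = {
    "New York, NY": ["noho", "chinatown", "new yawkkk"],
    "Los Angeles, CA": ["noho", "chinatown"],
    "Nashville, TN": ["cashville"],
}
flagged = ambiguous_terms(candidates)
```

A check like this only finds collisions between the crawled cities. Entries such as “hogwarts on waverly place” or “Springfield” still need a human who knows the trivia.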
Finished product: a clean and unambiguous list of top location field entries for each city we were crawling. From here, I use these lists to geo-code millions of tweets to my selected cities if the user location field matches up with an entry on the list.
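The matching step can be sketched as a simple dictionary lookup. This is an illustration under stated assumptions, not the actual implementation: the `build_lookup` and `geocode` names are made up, the sample lists are tiny, and "matches up" is interpreted here as an exact match after lowercasing and collapsing whitespace.

```python
def normalize(location):
    """Lowercase and collapse whitespace (an assumed normalization rule)."""
    return " ".join(location.lower().split())


def build_lookup(city_terms):
    """Invert {city: [terms]} into {normalized term: city}.

    Assumes the lists are already pruned, so each term belongs to one city.
    """
    lookup = {}
    for city, terms in city_terms.items():
        for term in terms:
            lookup[normalize(term)] = city
    return lookup


def geocode(profile_location, lookup):
    """Return the matching city, or None if the field is not on any list."""
    return lookup.get(normalize(profile_location))


# Hypothetical cleaned lists, using nicknames mentioned in this post.
clean_lists = {
    "Denver, CO": ["Denver", "Mile High City"],
    "Nashville, TN": ["Nashville", "Music City", "Cashville", "Na$hville"],
    "Atlanta, GA": ["Atlanta", "Hotlanta"],
}
lookup = build_lookup(clean_lists)
denver = geocode("  Mile High  City ", lookup)
unknown = geocode("Above The Clouds", lookup)
```

Exact matching trades recall for precision: a tweet from “Above The Clouds” is left untagged rather than guessed at.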
Find this topic (location fields in Twitter) interesting? Many more Bieber-isms and much more in-depth research on user location fields in the following research paper by Hecht, Hong, Suh, and Chi: Tweets from Justin Bieber’s Heart [pdf]