Having Trouble Keeping Up with Group Chat?

This post is republished from the ACM CSCW Medium.

Read our paper or try out our new Slack tool, Tilda! Paper: Making Sense of Group Chat using Collaborative Tagging and Summarization. Slack Tool: tildachat.com

Is Slack eating email? Lately, group chat tools like Slack, HipChat, and Microsoft Teams have gotten more popular for team and workplace communication. People say they prefer group chat over email because it’s more immediate, more personal, and more casual. People also point out that it’s just more fun, with plenty of chit-chat, jokes, gifs, and emojis.

So, so many gifs.

But there have been complaints about group chat as well. One issue is that chat is suited to real-time communication, where people exchange lots of quick back-and-forth. This means that people who miss out can return to a large backlog of messages. Some of those might be important, but given the casual nature of chat, many of them likely will not be. Distinguishing important from unimportant chat is hard because all messages look the same while scrolling. The problem gets even worse if someone is, say, coming back from vacation or is a complete newcomer.

Remember opening your chat after a vacation?

In our research, we examined this problem — of catching up in group chat — through a series of studies. We talked to people who use group chat regularly to understand their problems with catching up. We created a series of mock-up interfaces for summaries of chat to learn about what people find useful for catching up.

From these findings, we developed a Slack app called Tilda for people to easily mark up their chat with a number of signals that then generate summaries. The tool uses features like emoji reactions and slash commands in Slack to make it easy to mark up chat while chatting. So far, we’ve tested out the tool in a number of experiments and deployments to real teams, with encouraging results. If you’re curious, try out the tool for yourself: https://tildachat.com.

Catching Up On Group Chat

In interviews with people who use group chat actively, we found many who experienced difficulties with catching up in group chat:

1) Despite an “always on” mentality, people still fell behind.

I think there’s a lot of content that I don’t need to consume. I’ve read [that] content switching is distracting and bad for productivity…But I hate having unread notifications.

Almost everyone we talked to had their group chat app open the entire day and checked it continuously. This echoes prior reports that Slack users have the app open an average of 10 hours per weekday. Some also admitted to checking chat while on vacation. Despite all this effort to stay up-to-date, almost everyone we talked to also described falling behind due to things like too many conversations going on or too many channels to keep up with.

2) Catching up on group chat is hard.

Scrolling is basically the big issue, which is that you’ve got this giant timeline of stuff…You can only scroll and skim back through so many views on the viewport before you start getting tired of looking.

The way most interviewees caught up with group chat was to just scroll up in their chat window. This was hard because all the conversation looked the same at a glance, and important things were interspersed with chit-chat and humor. Other interviewees sometimes just gave up and ignored missed messages, expecting that important things would find them eventually. This also led to people missing important messages. Knowing this, senders would post the same thing in multiple places, such as both email and chat, and the conversations that played out across those places would then be hard to trace.

3) Attempts to organize or synthesize chat haven’t worked well.

Acknowledging difficulties with catching up, some people described trying to start processes for organizing or synthesizing chat, like setting up a wiki, collaborative doc, or contributing to a Q&A forum. However, people would fail to update these separate applications, finding the work a documentation chore. People would also forget about them because the summaries were so separate from the chat application.

Designing a Chat Summary Interface

We built some mockups of different ways that chat conversations could be summarized, to get a sense of what people would find useful. As you can see below, we varied the mockups to highlight different information and presentation formats, from A) short written summaries, to B) excerpts from the chat, to C) the major types of discussion happening, such as Q&A or an announcement, to D) high-level signals like participants and topics.

We found overall that people preferred formats that were highly structured, as opposed to free-form text, so that they would be easier to skim. At the same time, short excerpts weren’t that helpful on their own because they often missed useful context to understand what happened. Markers like the major types of discussion were helpful for gaining some of that context.

We also heard from people that they still wanted to read the original discussion for certain things that were interesting to them. So any summary should make it easy to dive in to read more as opposed to trying to be a one-stop static shop.

Tilda: A Tool for Marking Up Chat to Generate Summaries

From the above explorations, we built Tilda, a tool for marking up chat in Slack. While having a conversation, you can mark up the ongoing chat with information, like whether it's a question or what the topic of the conversation is. The ways you can do this are lightweight and integrated into the chatting experience, including adding an emoji reaction to a particular message to tag it or adding a short note using a slash command in the text box. The work is also collaborative: everyone can pitch in to mark up the conversation.
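To make the tagging model concrete, here is a minimal sketch of how a slash-command payload could be parsed into a structured tag that a summary generator could consume. The tag names and command syntax below are illustrative assumptions, not Tilda's actual implementation:

```python
# Hypothetical sketch of parsing a chat-markup slash command.
# Tag vocabulary and syntax are illustrative, not Tilda's real design.

TAG_TYPES = {"question", "answer", "idea", "decision", "note"}

def parse_markup_command(text):
    """Parse a payload like 'decision Ship the beta on Friday'
    into a structured tag for later summary generation."""
    parts = text.strip().split(maxsplit=1)
    if not parts or parts[0].lower() not in TAG_TYPES:
        raise ValueError(f"unknown tag type in: {text!r}")
    return {"type": parts[0].lower(),
            "text": parts[1] if len(parts) > 1 else ""}

# Example: marking a decision mid-conversation
tag = parse_markup_command("decision Ship the beta on Friday")
```

The point of the structure is that each tag is cheap to add in the moment but machine-readable afterward, so summaries can be assembled without anyone writing them from scratch.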

Ways to add notes and tags in Tilda

Tilda then takes the information that you add and generates summaries of conversations. The summaries live inside Slack as well, with each note or tag in the summary linking back to where it happened in the chat. Additionally, any edits to the markup in the original chat get automatically reflected in the summary, wherever it’s posted.

Example summary by Tilda that lives in Slack.

You can choose to get a summary delivered to your direct messages with Tilda (where you can customize the channels, summaries, and people you want summaries from), or you can designate public “summary channels” where people can get summary notifications all in one place.

Check out our paper, which received a Best Paper Award at CSCW 2018. In it, we report promising results from a number of lab studies comparing Tilda against using Google Docs for keeping notes while chatting in Slack, as well as deployments with real Slack teams, including two software startups, one journalism team, and one research group. Here are some quotes from our deployments:

“We really didn’t have a good system…Tilda made it muuuuch easier for us to fill someone in on something that happened…Overall I think Tilda greatly improved team communication over the week we used it. Conversations had better structure, team members were better kept up to date, and we actually had a way to save…results of our conversations for future use.”

“Before Tilda I would try to scroll…This was very tedious…With Tilda this process was much smoother. I would usually check our Tilda responses channel and skim through the summaries to see what I missed. If a topic seemed interesting, I would expand it all and read through everything. If I was uninterested in the topic I would just move on.”

To try out Tilda, head to tildachat.com. You can also run your own version of Tilda or contribute to the open source code at https://github.com/Microsoft/tilda.

Looking Ahead

We’re excited about a number of things regarding this work.

First off, Tilda is currently still in the prototype phase, so some exciting features don't exist quite yet. For instance, you could imagine Tilda summaries being used for additional purposes. They could interface with calendar apps, project management apps, or any number of other apps, so that discussions automatically lead to actions without anyone needing to remember all the different integrations. They could also be exported to a separate repository outside Slack, to aid with writing reports, deeper searching, newcomer integration, or longer-term organization.

Another aspect is the idea of using these chat logs with rich markup towards training machine learning models to help with summarization. Right now there’s a lack of annotated data to help with the task as well as underspecification of what summarizing chat really means. Our work takes important steps in this direction by: 1) describing what makes a good chat summary according to our mockups, 2) breaking down the summarization task into a set of concrete smaller tasks that can be chipped away at by more automated techniques, and 3) providing a simple mechanism and also motivation for people to mark up chat and create data.


4 Things We Learned from Talking to People who Face Harassment: Research behind Squadbox

Reposted from Squadbox’s Medium page.

Next week, at the ACM International Conference on Human Factors in Computing Systems (CHI 2018), the premier venue for human-computer interaction research, we will be presenting our research conducted at MIT on online harassment. This research led to the design and development of Squadbox. You can read the full research paper here.

We’re a team of researchers based out of MIT CSAIL working on tools to help people combat online harassment. But before we got around to building these tools, we first sought to better understand the struggles that online harassment recipients face, as none of us had faced online harassment before (luckily). We wanted to answer questions like: how do people get harassed online and how does it affect their lives? What strategies do people already use to combat harassment, and how effective are they?

To learn this, we conducted a series of interviews with 18 people who have faced online harassment. We focused on harassment that is posted to an individual as opposed to harassment about an individual that is posted elsewhere (such as revenge porn). Interviewees came from a wide array of roles, from activist to journalist to scientist, and have faced harassment on a variety of platforms, as you can see in the table taken from our research paper below. Some people were harassed by hordes of strangers, while others were targeted by an individual or small number of people, often people they knew.

All the people we interviewed for our research study. We talked to people who were harassed for their public facing work, including journalists, scientists, activists, and Youtube personalities. We also talked to people who were harassed by people with whom they had developed personal or professional relationships, including from ex-partners, former collaborators, and fans.

Here are some of the things we learned:

1) People have very different definitions for and experiences with online harassment.

While there were some similarities that cropped up in how our interviewees described online harassment, there were also many different and unexpected cases that we discovered while talking to interviewees.

In terms of message content, many subjects described harassment as a personal attack, sometimes about aspects of their identity, that was designed to be emotionally upsetting. Some of these would be clear to almost anyone, such as well-known slurs or swear words. However, some people felt that other forms of content that were not explicit attacks were still harassing. For instance, one interviewee spoke of receiving deeply personal or graphic confessions or disturbing solicitations sent to their work email, which they considered a boundary violation. Other interviewees spoke of attacks that were coded and may not be clear to people unfamiliar with their identity or their community.

Also, not all messages deemed harassing were harassing because of the message content. For instance, interviewees described receiving messages that seemed innocuous but were designed to suck up time (sometimes called "sealioning") or to keep in contact with the interviewee due to obsessive interest (exhibiting stalking behavior). These were often exacerbated by the strategies mentioned below.

One interviewee described how their ex-partner would specifically send more messages designed to disturb them when they had an important work meeting scheduled, an example of how time can be weaponized.

Other attributes that define harassment beyond message content include a high volume of messages from a large number of people (sometimes called a "dogpile") directed by some central source, such as a blog post. Not every message sent would have harassing content, but the intended effect on the receiver is to overwhelm them, much like a distributed denial of service (DDoS) attack.

Interviewees also described harassment as when individuals would make repeated, persistent attempts at contact despite being ignored or asked to stop. One interviewee highlighted the persistent nature of several of their harassers:

“If I ignore their message, they’ll send one every week thinking I’m eventually going to reply, or they will reply to every single one of my tweets”

Besides finding new avenues to harass someone, harassers can also persist through obfuscation. One interviewee had a harasser who continually pretended to be a new person with a different email handle and would draw the interviewee into conversation, before revealing they were the same person as before. Similarly, one of our interviewees had a harasser who sent spoofed messages pretending to be their friends, making it so the interviewee became unable to distinguish between legitimate messages from friends and spoofed messages.

By reviewing some of our interviewees’ harassing emails, we noticed some creative techniques used by harassers designed specifically for certain mediums, such as email. For instance, some harassing emails had legitimate-sounding subject lines, and the harassment would be buried in a line partway or towards the end of an email, so the reader would have to open it and read to find out. Other emails went the other direction, adding harassing content even to the sender email address, through the use of throwaway email accounts.

While we found a variety of experiences with and definitions for harassment, there are undoubtedly more. What this initial foray told us was that any one-size-fits-all solution to harassment would likely fail to serve many people. Instead, much of harassment is contextual and requires understanding of that context to recognize it. For example, these cases would not be something that a moderation team working in a different country, or on a short-term contract with a single set of guidelines, would be able to cover.

It also demonstrated how difficult it would be to develop a purely computational approach to detecting harassment that would cover all these cases, especially as many harassment detection models today are trained only on message content, without taking context into account.

2) Encountering harassment during one’s day-to-day is a disturbing experience for many harassment recipients.

Almost all of our interviewees expressed frustration at their lack of agency to decide whether or when to confront harassing messages. One person said:

“Getting a [harassing] email when I’m looking for a message from my boss — it’s such a violation. It’s hard to prevent it from reaching me. Even if I wanted to avoid it I can’t. I can’t cut myself off from the internet — I have to do my job.”

One important point this person brought up is that for many people, stepping away from the internet or ignoring one’s messages is not a viable strategy. One might have to check their messages or be available online for their work — in which case, having harassing messages mixed in with other messages may lead to trepidation and stress whenever opening one’s inbox. One interviewee talked about how they were affected by both the mixture of messages in their inbox coupled with notifications that they received for incoming messages:

“The constant negativity really got to me…having it in your mind every 30 minutes or whenever there’s a new message…It just wears me down”

At the same time, interviewees need to see and get notified about their regular mail, especially if this is an account they use for work.

The problem of getting one’s day-to-day disrupted by harassment gets exacerbated when we consider volume and how it can be used to effectively shut down a person’s communication channels, as mentioned above. When they were inundated, many of our interviewees were left unable to respond to fans, their friends and community, or professional contacts:

“It’s made it harder to find the people who genuinely care, because it’s hard for me to motivate myself to look through comments or…go through my emails. Why should I look through hundreds of harassing comments to find a few good ones?”

The attack on their communication channels meant that some missed out on opportunities as a result of harassment. For instance, one of the journalists we talked to missed an interview request amidst a flood of harassing tweets.

One consequence of a DDoS (Distributed Denial of Service) style of harassment is that it can often be bursty — for example following publication of an article or video that gets a lot of attention — and thus many of our interviewees alternated between spikes of heavy harassment volume and periods with little or no harassment. Several of our interviewees also mentioned that oftentimes they could predict when a wave of harassment was likely to come, such as when they were about to publish a piece of content, without much recourse to do anything about it.

What these experiences demonstrated to us was that how the harassing messages arrive and are experienced is important to consider. What users needed was a way to gain control back over their inboxes so that they could go about their lives on their own terms.

3) Platform tools of block, filter, and report are inadequate.

Nearly every subject we interviewed stated that they had blocked accounts on social media or email, though most felt this was not very effective due to the number of harassers and harassers’ ability to circumvent blocking. One interviewee said:

“Every time he makes a new email, he creates a new name as well…Not only new names, but he also pretended to be different people.”

For others, blocking was not an option because they needed to or wanted to gather information from their harassers’ messages. Some who were harassed by ex-partners needed to keep in contact for coordinating childcare or for avoiding each other due to a restraining order. Others scanned their harassing messages so that they could become aware of potential threats, such as doxing of their private information, so they could then alert friends or authorities.

Another reason subjects wanted to see messages from harassers was to get an understanding of dissenting opinions for work purposes. For instance, some journalists we talked to felt that it was important for their job to keep a pulse on reader reactions. Other subjects wanted the ability to track their harassment over time in response to their public activity, such as learning what kinds of content generated the most harassment, in order to tailor their own behavior or respond to the harassment publicly. Still others wanted to track and document the harassment so that they could report it. Finally, some interviewees wanted to do damage control among peers after defamation.

Word- or phrase-based filters were also inadequate. Some subjects expressed frustration at the difficulty of coming up with the right words to block, or of keeping up with changes in language over time. One described filtering out messages despite false positives, saying:

“I have suicide as a filtered word because I get more comments from people telling me to commit suicide than I get from people talking about suicide…If I have the energy to, I’ll go through my ‘held for review’ folder to look through those.”
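This quote illustrates the core limitation: a keyword filter matches topic, not intent. A minimal sketch (with an illustrative one-word filter list, not any platform's actual filter) shows how both a harassing message and a supportive one on the same topic end up held for review:

```python
# Minimal word-based filter, illustrating why keyword filtering
# produces false positives: it cannot distinguish intent from topic.

BLOCKED_WORDS = {"suicide"}  # illustrative filter list

def held_for_review(message):
    """Return True if any blocked word appears in the message."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    return bool(words & BLOCKED_WORDS)

# A harassing message is caught, but so is a supportive message
# discussing the same topic; both land in the review folder.
held_for_review("You should commit suicide")                      # held
held_for_review("Thank you for writing about suicide prevention")  # held
```

Any fix, such as matching longer phrases, just shifts the burden back onto the recipient to anticipate their harassers' wording, which is exactly the labor our interviewees found exhausting.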

Finally, nearly every subject had reported harassers to platforms and strongly expressed dissatisfaction with both the process and the platforms’ opaque responses. A common frustration was that the burden of filing a report was too heavy, especially when there were many harassers. In the case of email, there is actually no process for reporting harassment at all on the major email platforms. Beyond platform tools, subjects also tried seeking help from law enforcement; the prevailing sentiment was that this was a time-consuming and fruitless experience.

In the absence of proper platform or legal involvement, users need tools to better manage their communications. However, tools simply targeted at the user to deal with their harassment on their own are insufficient because of how labor-intensive this task can be. Even in cases where platforms are responsive, harassment can be so contextual that there may be many cases that are not covered by generic platform policies.

4) People ask friends for help so they don’t have to face harassment alone or self-censor.

When we asked interviewees what they did in response to harassment that actually worked, some responded that they self-censored in order to give harassers less ammunition with which to harass them. Others made themselves harder to contact by closing Twitter direct messages from people they do not follow, not giving out their email, turning off notifications, or disabling comments. While this helped to mitigate harassment, it also made it more difficult to engage with people they did want to talk to — people they already know as well as non-harassing strangers, like collaborators, fans, clients, or sources:

“It’s impossible to contact me if you don’t have my contact info…I can’t be available to journalists as a source…I used to get all these awesome opportunities and I just can’t get them anymore.”

At the end of the day, these kinds of strategies, while providing relief to many people, are unsatisfactory because they mean that the harassers succeeded in silencing and isolating recipients of their harassment. When taken as a strategy for the internet as a whole, it means the loss of perspectives from vulnerable and targeted groups that often get harassed and consequently a failure to uphold principles of open dialogue and free speech online.

Instead, another mitigation strategy that helped and didn’t require silencing themselves was reaching out to friends or family for support and assistance. We had several interviewees independently describe ways that their friends would help. For instance, one person said that their best friend had their Twitter and Facebook passwords, and would log into their accounts and clear out harassing messages and notifications and block users. Another interviewee similarly said their spouse would log in to their email account and delete harassing messages, and a different interviewee who was an academic had others in their department going through their emails when they were undergoing an attack. One person described how their significant other would go through the comments on their posts and only read aloud the positive and encouraging ones. Multiple subjects said that they would forward potentially harassing emails unopened to friends for them to check and forward back.

Based on this research, we built Squadbox, a tool for people facing harassment to recruit their friends and other trusted individuals to moderate their inbox for them.

Many of the features of Squadbox are based on findings from our interviews. For instance, we make it easy to turn moderation with Squadbox on and off, because of people's comments about the bursty nature of harassment. Since many interviewees talked about wanting to glean some information from their harassment, we allow them to specify what happens to messages deemed harassing: receiving them in their inbox with a special tag, getting them summarized or partially redacted, or having them filed away.

Screenshot of a page in Squadbox. Photo was taken by me, then edited and published by Refinery29.

You can learn more about Squadbox by trying out the tool itself, reading our blog post or MIT’s press release introducing it, or even contributing to the project by looking at the code on Github. You can also read the original research paper here.

Considering End Users in the Design of News Credibility Annotations

Last week, I attended a working group meeting at the Brown Institute at Columbia to discuss a credibility schema for annotating the credibility of news content and other information online. The working group, hosted by Meedan and Hacks/Hackers, grew out of discussions started at MisinfoCon and incorporates perspectives from industry, academia, journalism, nonprofits, and design, among others.

As part of the day’s schedule, I gave a roughly five-minute talk on end user applications for credibility annotations. This was slotted into a segment on use cases, or how credibility annotations could potentially be used by different stakeholders. I’ve now cleaned up my notes from the talk and present them below:


I am an HCI researcher designing and building end user tools for collaboration, and in my group, the systems we build tend to have a focus on giving end users direct control of what they see online, instead of ceding that control to opaque machine learning systems. Thus, today I am speaking on direct end user applications of the annotations as opposed to using them as inputs towards machine learning models to be used by news or social media organizations. In this case, I am using the phrase “end user” to describe basically a non-expert member of the general population for whom a tool would be designed.

First I want to make the point that, before we jump to thinking about training data and building machine learning models, credibility annotations made by people can be immediately useful to other people just as they are. In fact, there are cases where it may be beneficial not to have a machine learning intermediary or a top-down design enforced by a system.

Who Gets to Choose What You See?

So what might these cases be? One case we need to consider is the importance of visibility in an interface when it comes to attracting attention, and how attention can distort incentives and lead to problems such as fake and misleading news being spread widely on social media. Here it is helpful to consider who gets to determine what is being shown and whether their incentives are aligned with those of end users. For instance, on social media, system designers want to show end users engaging content to keep them active on the site, and thus site affordances and algorithms are shaped by engagement. In addition, news organizations also want to show end users engaging content to get them to click and visit their site to collect ad revenue. So what happens? In the end, we get things like clickbait and fake headlines.

Instead, let’s consider what it would take to center the news sharing experience around end user needs. To explore this idea, we built a tool called Baitless as a proof-of-concept. The idea is really simple. It’s an RSS reader where anyone can load an existing RSS feed and then rewrite the headline for any article in the feed and also vote on the best headlines that others have contributed.


We then provide a new RSS feed where the titles are replaced by the best headline written by users. And if a user clicks on a link in their RSS feed reader, they are directed to a page where they can read the article and afterwards suggest new headlines directly on the page. In this way, end users can circumvent existing feedback loops to take control of their news reading experience.
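As a sketch of the mechanics, assuming a standard RSS 2.0 feed (this is not Baitless's actual code), replacing each item's title with the top user-voted headline might look like:

```python
import xml.etree.ElementTree as ET

def rewrite_headlines(rss_xml, best_headlines):
    """Replace each item's <title> with the top-voted headline,
    keyed by the item's <link>. Hypothetical sketch of the idea."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        title = item.find("title")
        if link in best_headlines and title is not None:
            title.text = best_headlines[link]
    return ET.tostring(root, encoding="unicode")

# Example: one clickbait title swapped for a user-written headline
feed = """<rss version="2.0"><channel>
  <item><title>You Won't Believe This!</title>
        <link>http://example.com/a</link></item>
</channel></rss>"""
print(rewrite_headlines(feed, {"http://example.com/a": "Study finds modest effect"}))
```

Because the output is itself a valid feed, readers can subscribe to the rewritten version in any RSS reader, which is what lets end users route around the original headline incentives.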


At a higher level, right now end users cede control over everything they see in their social media feeds to systems that for the most part prioritize engagement, as opposed to other qualities such as veracity. Given that, how could end user-provided annotations help give end users control over their news feeds beyond simply headlines? Imagine if other people could perform actions on my feed such as removing false articles from my feed entirely or annotating news articles with links to refutations or verifications.

Who Annotates?

One aspect that is crucial when giving other people or entities power over one’s experience is the concept of trust. That is, who produces credibility annotations could also be an important signal for end users. After all, who I trust could be very different from who you trust. And this notion of trust in the ability to verify or refute information can be very different from the friend and follower networks that social media systems currently have. So if we have this network of the people and organizations that a person trusts, we can then do things like build news feeds that surface verified content as opposed to engaging content, and build reputation systems where actions have consequences for annotators. If you’re interested in this topic, please let me know, as we’ve just begun a project that delves into collecting trust network information and building applications on top of it.

An open question, which we don’t yet know the answer to: is it good to put people in control of their experiences in this way, or do we actually need something like machine learning to direct us to what is credible? Will this make filter bubbles worse, in that people will see less opposing content, or better? More importantly, given the recent research on the backfire effect, how might it affect how people react when they encounter opposing information? Might it make people more receptive if it’s from a trusted source?

Process over Means to an End

I also want to make the point that annotation, rather than just being some necessary but tedious work that goes into training models, is also a process that could actually be beneficial to end users in certain cases. For instance, news annotation can be a way to educate end users about media literacy. It’s also a way for readers to have more structured engagement with news content and a deeper relationship with news organizations beyond just firing off a comment into the abyss. After all, reading the news is a form of education, and journalists often play the role of educators when they write on a topic.

One project that we’ve done in this area is a tool that aims to teach readers to recognize moral framing while reading news. Using Bloom’s Taxonomy of Educational Objectives as a guide, we can imagine certain activities that readers could perform that would allow them to learn and also apply skills related to moral framing. To explore the various ways that users could annotate while reading, we built a browser extension called Pano (built on top of another system of ours exploring social browsing called Eyebrowse). It allows users to highlight and annotate passages on an article with the moral framing in that passage, leave comments and votes on a particular annotation, leave comments and chat on the page, as well as contribute towards a wiki-like summary describing the article’s point-of-view.


We conducted a field study comparing the use of our tool to simply participating in a written tutorial on moral framing and found that users who used our tool over a period of 10 days actually got better at writing arguments framed in the other side’s moral values. We also saw heavy use of the highlighting and annotation feature compared to low usage of the other features, such as wiki-editing a summary or commenting.

I wanted to leave you with some parting questions that I hope you’ll consider during this process:

  • When and why might end users want the ability to make annotations?
  • How do we design interfaces and interactions for consuming annotations that benefit end users?

Thanks to my collaborators at MIT who helped me create this talk: my advisor David Karger, along with Sandro Hawke, as well as Jessica Wang, a masters student who built Pano. And thank you to An Xiao Mina and Jenny 8 Lee for inviting me to the working group.

Year in Reviews

One of my New Year’s Resolutions for 2017 is to blog more, so here I am, reviving this blog from its two year silence!

So, why am I doing this, besides the obvious punny-ness? I thought it might be illuminating to share some short excerpts that I’ve received in reviews this year – sentences that have made me feel proud and also things that were hurtful and embarrassing – all to say that, hey, we all get them, and they’re a part of any academic’s life. Sometimes reviews can lift us up, but there are also times when they can feel demoralizing and painful because they are putting down a project that is close to one’s heart.

We all receive encouragement and criticism, and as academics, we should learn to treasure the positive stuff and keep it in reserve for when we’re feeling down, and to listen to the criticism (when it’s constructive) without taking it too personally, since we all get it! Mixed in with my review excerpts, I share some of the highs and lows of 2016, followed by my goals for 2017 (gotta keep looking forward).

The Great 😍

“…[It] is a novel, interesting idea; frankly, it’s an idea that I think the community could brag about in the future (i.e., “that was invented here”).”

“Simple and powerful ideas like [X] are my favorite types of contributions…That’s magic.”

“In fact, the first half of the paper (before the evaluation) could serve as an exemplar…systems paper going forward.”

These quotes make me feel happy when I read them! If you get any reviews with gems like these, save them and take a look at them now and then when you’re feeling unsure about things.

One thing that I’m proud of from 2016 is that I gave a lot of talks this year, many of them to groups outside my research area. I gave 2 conference talks, which I had done before, but also 7 longer talks (30 min to 1 hr), which I had never done before. These included 1 research qualifying exam talk, 3 talks to other academic groups (including 2 computer graphics/vision groups), and 3 talks to industry groups (Wikimedia, Google, and Adobe). I found out long talks are hard to do well! I also gave a 5-minute talk at the Google PhD Fellowship summit that I think I was the most nervous for out of everything (even though it really didn’t matter for anything…)

I’m proud of this because I think I got better over the course of the year (although to be honest it never feels easy and I’m guessing it never will), and speaking is not something that comes naturally to me.

The Good 😀

“I was struck by how self-aware the authors were of the limitations of their current approach.”

“Overall, the revised version improves substantially over the initial submission. Kudos on a great improvement!”

I was happy when I received these comments. First of all, I’m pretty uncomfortable with the idea of being a consummate salesperson, even in my own papers, so I try to be balanced, which I always worry might be hurting my chances. It was nice to see someone notice and give kudos. Also, it’s a great feeling when reviewers read rebuttals and revisions and change their scores based on your work, or at least show appreciation for the work you’ve just put in.

In good things that happened this year, I passed my quals, got 2 papers accepted at conferences, one of which may end up forming the basis of a thesis (who knows…)!

Something else I’m proud of is my level of service to the academic community, which has increased this year. Of course it’s not yet at the level of many people more senior than me, but I think I did pretty well for a grad student. This year I served on a virtual PC for the first time (CHI WIP AC), reviewed ~16 conference papers (with 3 special recognitions!) and ~10 poster/WIP papers. I also spent some very stressful days as SV co-chair on the organizing committee of RecSys ’16. This was incredibly hectic, but I learned a lot about how a conference is organized. I realized that I would probably shrivel up and die if I were a professional event organizer because I find it so stressful. *thinks about wedding planning and dies.* Thank you to our many conference organizers – I have so much appreciation for you.

The Bad 😩

“…the recommendation proposition is extremely naive in how it characterises [X]…”

“[X] is never a strong finding, and when it is one of the major pillars of a research paper it always raises the question that the paper may have been rushed and a full analysis and reflection on the work has not yet been completed.”

Now on to the bleurghs. Not much to say here from me except that I brush off my shoulders, find the places where I can take some lessons and improve, and get to recomputing, revising, reframing, and resubmitting! If they point out mistakes or suggest things to do to improve – that’s great advice. Maybe they misinterpreted something in the paper? That means that I should work on writing it more clearly. Or they aren’t convinced by an argument? Then at the least I should work on making the argument better, and maybe gather more data if necessary.

This year, I had 3 papers rejected – 1 that’s been on the backburner until I can pick it up again, 1 which was a 2nd-time-around submission and which is now in submission again (I’m sad about this paper because I genuinely think it’s good), 1 submitted for the first time and now currently being reworked.

Note: I think it’s important to talk about this kind of thing (and people are starting to, thanks to various transparency initiatives regarding rejection). Everyone faces rejection. Knowing this makes rejection just part of the process. An article I read once profiled a woman who mentioned that having been a competitive athlete made it easier for her to face failure. As a former competitive athlete myself, I think there is some truth in this.

The Ugly 😰

“It’s not clear there are customers for this tool in this…community. The computational results are also mediocre…”

“Another drawback…is the lack of motivations in explaining what are the intellectual challenges that we need to address… In particular, the key questions from a reader’s perspective: What is the real novelty introduced by this…?”

“The paper doesn’t actually demonstrate the usefulness of this…in actual research.”

Ouch. Quotes like these hurt because they don’t just criticize a specific aspect of the paper but also cast aspersions on the entire project. With comments like these, it’s always important to remember 1) it’s one person’s opinion, 2) opinions can change, 3) framing and first impressions can make a big difference. It can also be helpful to get a second opinion from someone you trust who is knowledgeable about the community, to see whether research directions should actually be shifted before making any drastic decisions. Earlier in my PhD, I found that my instinct was to acquiesce too quickly to reviewer demands and defer to their opinions, while people I respected knew better when to push back.

I think the worst parts of this year did not happen to me personally (thankfully), but in some cases felt very personal. I’m talking of course about the events of the election, including the whole lead-up and aftermath, which took a hefty emotional toll and also sucked up a great deal of my time. There have been 1000’s of hot takes and I’ve probably read 3/4ths of them so I won’t repeat what’s already been said.


Looking forward

Receiving reviews can sometimes really suck and sometimes feel really validating. The funny thing is it’s not always the papers you think that will be received well or poorly. At the end of the day though, reviews are a rare and valuable resource, and there is always something to be learned. There is also (almost) always a place and a future for your work.

Also, guess what? As can be seen, my highest highs and lowest lows of this year ultimately had little to do with reviews. It’s easier to bounce back from bad reviews if you realize that they only reflect one of many facets of your life.

A lot of people would characterize 2016 as a terrible year overall. But 2017 is upon us, and it’s time to make plans! This coming year, besides my research projects (which I’m always excited about), two things in particular have me excited:

  • I’m taking on a small army of new undergrad researchers starting in the new year, and I’m excited to work on my research mentoring skills. My first few years, I was pretty unstructured and perhaps overly nice to my undergrads, and I think they could actually use a bit more structure and more of a push. It can be frustrating to work with people who are clearly busy elsewhere (it’s MIT after all), but I think I need to cultivate these expectations more instead of expecting people to be present and engaged outright. We’ll see how it goes! My concrete goal is to work closely enough with one or more of my students to eventually write a paper together.
  • I’m excited to TA for the very first time! I imagine that it will be a lot of work and a lot of the work will take me outside of my comfort zone but I’m curious to see how I manage and if I enjoy it.

So happy new year and may you have amazing reviews this year 🙂

Mailing Lists: Why Are They Still Here, What’s Wrong With Them, and How Can We Fix Them?

Online group discussion has been around almost as long as the Internet, but it seems we still can’t create tools for it that satisfy everyone. Starting with the first mailing list in 1972 and evolving to modern social media tools, we see tensions between people’s desire to participate in productive discussions and the struggle to manage a deluge of incoming communication. Our group wants to build better tools to address this problem. As a first step, we decided to learn from an existing tool. Mailing lists have survived almost unchanged from the earliest days of the Internet and are still heavily used today. The fact that they’re still here suggests that they’re doing something right; the fact that so many new social media tools have tried to replace them suggests they’re doing something wrong. We wanted to understand both sides. We interviewed a variety of mailing list users to understand what they loved and hated about their mailing lists. We discovered an interesting opportunity to combine some of the best features of email and social media to address the weaknesses of both.

To understand how different types of groups use mailing lists and why they continue to use them, we interviewed members of two active mailing list communities and surveyed 28 additional mailing lists of many different types. When asked whether they would be interested in switching to a different tool, such as Facebook Groups, a discussion forum, or a subreddit, most people across the board were not interested in switching to a newer social media tool. In fact, only 12% indicated they were interested in switching to Facebook Groups, the most analogous tool for many people. When we asked why, people’s responses grouped into the following four themes:

  • Email is for work while social media is for play or procrastination. One interviewee was concerned about more cat pictures and other irrelevant or silly posts if his group moved to a Facebook Group and felt this was the wrong tone for the list. Other people felt that mailing list communication was actually somewhat in between work and play.
  • Email feels more private while social media feels more public. People mentioned images of people’s faces and hyperlinks to their profile as making the Facebook Groups interface feel more public. However, we were concerned to find out that most people surveyed and interviewed did not realize their mailing list archives were public. Nor could they properly estimate how many people read their emails. In most cases, people guessed an entire order of magnitude lower than the true subscription count.
  • There is a greater confidence that email will be seen. Not only do more people use email instead of Facebook, people also had a sense that email would be seen, while Facebook algorithms might make it uncertain who receives what.
  • Email management is more customizable. People enjoyed being able to set up their own filters and customize their notifications and experience of their mailing list.

Given all of these reasons for preferring mailing lists, have all of the social moderation features and controls in newer social media been created for naught? It seems the answer to this is also no. In our research, we found many tensions within mailing list communities, specifically issues arising from people within the same mailing list expressing very different opinions and perceptions about the list. The following three tensions stood out the most:

  • Tensions over type and quantity of content on the list. While some users enjoyed intellectual discussions on the list, others hated them. Same for just about any other category of content, such as humor, job listings, rental and item sales, etc. People even disagreed about the correct etiquette for reply-to-the-list versus reply-to-the-sender.
  • Tensions over desire for interaction versus hesitation to post. Most users expressed a desire for more discussion on their mailing list, yet the majority of these folks have never participated in a discussion themselves. When asked about the reasons people were deterred from posting, they mentioned concerns such as the fear of spamming others, fear of looking stupid, fear of offending, and fear of starting a heated debate.
  • Tensions over push versus pull email access method. Most users either received all their mailing list email in their main inbox (push) or filtered all their mailing list emails to a separate folder (pull). We found very different attitudes between people with these two strategies. Push-users were much more worried about missing email and were more hesitant to post out of fear of spamming others. Pull-users, on the other hand, read email when they felt like it rather than when it arrived, were more likely to miss emails, and had a more relaxed attitude towards sending them.

Some of these tensions have been mitigated in newer social media systems thanks to social moderation and other newer features. So what can we do given what we’ve learned? One thing we can do is improve new social media systems by incorporating more of what people like about mailing lists. Another is to improve mailing lists by incorporating features from social media. Some features we are considering include slow propagation through the list driven by likes, letting friends moderate posts before they are sent further, topic tags and the ability to follow particular topics and threads, and more. We emphasize improving mailing lists because it’s something anyone can work on (you don’t have to work at Facebook!), it’s relatively easy to build and test since users keep their existing mail clients, and it’s really about time that mailing lists had some innovation.

In that vein, we’re actively working on a new mailing list system. It’s still in a very preliminary stage but you can check it out and join or even start your own mailing list group at http://murmur.csail.mit.edu. You can also read our research paper published at CHI 2015 or look at slides and notes from the talk given at the CHI 2015 conference.

*This blog post was first posted at the Haystack Blog: http://haystack.csail.mit.edu/blog/2015/05/05/mailing-lists-why-are-they-still-here-whats-wrong-with-them-and-how-can-we-fix-them/

This work was conducted with Mark Ackerman of University of Michigan and David Karger at MIT CSAIL.

Thoughts and Reflections from a 1st Time Reviewer

A few months ago, I received an email asking me to serve on the ICWSM 2014 Program Committee. The role of a PC member for this conference entails reading several papers submitted to the conference and then writing up reviews that the senior members use to come to a decision about accepting or rejecting the papers. I was really excited to be a part of this process because it was my first time seeing what goes on behind the scenes of how papers get vetted, reviewed, and deliberated on by the research community. Unfortunately, I could find few resources on how to write a proper review and did not come across any tailored to this specific research community (or even the general research communities of HCI, CSCW, or social computing).

Nevertheless, with the help of my research advisor who provided several useful tips, I came away from the process with a better understanding of how to write a good review. I also found reading papers from the point-of-view of a reviewer eye-opening and informative for myself as a researcher and writer. As a result, I am writing down some of my thoughts and findings in the hopes that they may be useful for other beginning reviewers going through the same experience. Please note that this is not meant to be a comprehensive how-to for writing reviews – only an account of the thought-process, revelations, and observations of a 1st time reviewer!

At First…

My initial reaction to being invited to review was excitement, followed closely by anxiousness. I was worried that as a beginning researcher I would not find anything substantial to say about any of the submissions, written as they were by more experienced researchers than myself. Surely any of the meager suggestions or recommendations that I could potentially offer would have been thought of and taken care of already?

Once I actually received my assigned papers, however, I was surprised to find that there was a good amount of variation in the papers in terms of quality of writing and methodology, and incidentally, I could in fact come up with comments for each of the papers (though whether these comments were of any value I couldn’t yet be certain – more on that later).


Some things that I noticed while reading papers from the lens of a reviewer for the first time:

Grammatical errors, imprecise or undefined wording, and poor sentence structure were glaringly noticeable, and I could sense my perception of the paper quickly souring upon encountering more than a few of them. Noticing this, I tried my best to suppress my misgivings and overlook such errors because it was apparent that some of the papers were written by non-native English speakers, and I thought it unfair for them to be penalized.

On a wider note, I realized how incredibly important it is to be a good writer and to think deeply about organization and presentation. As a reviewer, it was almost a joy to read the papers that conveyed ideas clearly, presented well-thought-out graphs and tables, and told a well-structured story that flowed coherently from section to section. On the flip side, it was frustrating to read papers that had sections that seemed misplaced or irrelevant, or graphs and tables where it wasn’t clear what they were trying to express.

Two paper sections in particular stood out to me as surprisingly important:

Related Work

I could sense alarm bells going off when I read a Related Work section and could think of or could quickly Google related research that was not mentioned. It wasn’t necessarily a specific paper, but when a particular research direction or theme that was very relevant to the work of the paper was not given mention, it made me wonder what else the writers missed. As someone who is not the pre-eminent expert on every topic of the papers I read nor in the position of replicating each paper’s work, I couldn’t be certain that everything the writers were asserting was true. So reading a comprehensive Related Work section somehow made me feel more confident in the writers and more inclined to believe their assertions.


Discussion

I found the Discussion section to also be extremely important. One might think that to write a strong research paper, it’s best to emphasize the strengths of the method, highlight only the positive findings, and obfuscate, ignore, or minimize bad or inconclusive ones. In fact, I found the opposite to be true. I was much more willing to take a paper’s findings and conclusions at face value if the authors openly talked about drawbacks, caveats, and failures they encountered while doing their work. It was also interesting to see how authors tried to understand or conceptualize their findings, even if this wasn’t rigorous or a main part of the paper. As a reader, I found it much more interesting to be presented with findings supported by a plausible explanation or put in the context of a larger model, rather than being given a slew of numbers at the end with no discussion at all.


After writing my initial reviews, I sent them off to my advisor for him to give a quick look-over to make sure I wasn’t putting my foot in my mouth. He gave me some great tips which I summarize below:

  • Start out reviews with a short summary of the paper. It’s useful for reviewers to remember what paper they’re reading about and helps authors know if they’re getting their main ideas across properly. Don’t forget to state within the summary what aspect of the paper is novel. This forces the reviewer to focus on the novelty of the work, which may affect the rating given to the paper. I recall cases where I actually struggled to put the novelty of a paper into words and, as a result, realized I should bump my ratings down.
  • Be careful of subjective statements that start with phrases such as “I would have liked…” because it’s unclear to the meta-reviewer and authors whether this is an objective problem with the paper or a subjective preference. The former is a strike against the paper while the latter may be interesting to the authors but shouldn’t count against the paper in terms of acceptance into the conference. When going through my reviews, I noticed that I had included several of these phrases – I guess because I was a first-time reviewer and afraid of coming off too strong or assured in my review, especially if I turned out to be wrong.


After turning in my reviews, I got a chance to see how other reviewers reviewed the same papers. Some reviews brought up points I had overlooked or expressed opinions diverging from mine, which was useful for seeing points of view I hadn’t considered as well as the range of perspectives. The best reviews, in my opinion, were the ones that demonstrated expertise in that particular niche by backing up their points with citations and including recommendations and papers to look at alongside their critiques. I could see how these reviews would be really useful for the authors in refining their work. Finally, it was validating to see other reviews bring up points similar to my own, and to see the meta-reviewers cite my review or mention points from it as helpful for forming their final opinion. Here was concrete proof that many of my comments were actually valuable.

So in the end, my worst fears of being wildly off the mark in my reviews never materialized, and I finished the experience more confident in my research instincts and with a better understanding of the mentality of paper reviewers, and thus of how to frame, organize, and style my research writing. I think being part of the review process will help me become a better writer and, ultimately, a better researcher and active member of the research community. I encourage program chairs and organizers of conferences to invite PhD students and newer members of the community to participate in the review process.


*  This post is cross-posted at the Haystack Group blog.

While I’ve learned a great deal from this process, there are many, many veteran reviewers out there who have many more tips that they’ve cultivated or learned over time. Therefore, I’m keeping a list of comments to this piece and further advice I receive from others here:

– “…always talk about the weaknesses and limitations of your approach in a paper, but don’t have that be the last thing you talk about. I remember once ending a paper with the “Limitations” section right before a short conclusion, and my MSR boss telling me how that leaves a negative last impression.”

– My advisor also mentioned that a great (and hilarious) resource for reviewers is “How NOT to review a paper: The tools and techniques of the adversarial reviewer” by Graham Cormode, which also has citations to other useful guides on reviewing.

– This guide written for CHI 2005 is several years old but is a really thorough look into how to write a proper review, including useful examples of both suitable and unsuitable reviews.

What is a Neighborhood?

For my master’s thesis, I am looking at neighborhoods and how to define, classify, and describe them.

First off, what is a neighborhood? Though there are conflicting definitions of what exactly constitutes a neighborhood, most would agree that it is a geographically localized, somewhat homogeneous community within a larger city. If one had to name neighborhoods in New York City, it would be easy to rattle off names such as Upper East Side, Soho, East Village, Chinatown, Midtown, etc. And when we picture these neighborhoods in our minds, each often has a distinct feel or vibe that makes it easily identifiable.


Can you guess these neighborhoods? *Images taken from wikipedia.org

How do we have such a clear picture of what these neighborhoods are like? When thinking about what sort of characteristics differentiate one neighborhood from another, we think of the kind of places (restaurants, shops, stores, etc.), the kind of people (tourists, bankers, affluent people, young people, a certain ethnicity), and the kinds of activities that take place (working, shopping, partying, sightseeing) within this neighborhood. For instance, we would expect to see a lot more offices, working, and tourists in Midtown, but maybe more boutiques, shopping, and artists in Soho. Of course, there are many neighborhoods and characteristics of neighborhoods that are harder to guess off the top of our heads (NoLIta versus TriBeCa or NoHo?)

Another issue with neighborhoods is their fuzzy boundaries and ever-changing characteristics. People and places are not stationary, and over time, the characteristics and perhaps the boundaries of a neighborhood may change (gentrification, anyone?), or new neighborhoods emerge and old ones are absorbed. For a recent example, see the shrinking of Little Italy. Who defines the boundaries of a neighborhood, anyway? These are often arbitrarily drawn by city officials relying on outdated information or on natural boundaries, such as a river, that may no longer exist or may not represent cultural divides. Even worse, real estate agents have used these shifty boundaries to falsely stretch the borders of more coveted neighborhoods or to invent new neighborhoods out of the blue to repackage and market a place. Hence the aforementioned NoLIta, TriBeCa, NoHo, and now DUMBO, BoCoCa, BoHo, FiDi, and whatever else they have managed to come up with. Clearly, some names have stuck while others faded away, leading to the conclusion that in some cases, these names actually fulfilled a need and put a name to a newly formed, distinct neighborhood.

Surely there’s a better way? How can we systematically find neighborhoods, quantify their characteristics, and observe their changing or shifting states? With an eye to the characteristics I’ve mentioned that define neighborhoods, there are many popular social media websites that give an indication of some of them. For my study, I focus on Foursquare check-in data, specifically a data set collected from the Twitter API (check-ins from Foursquare forwarded to public Twitter accounts) from May 27th, 2010 to November 2nd, 2010 by the Cambridge NetOS group. This data set contains a list of places, a list of users, and a record of each time a user checked into a place.

The characteristics that I chose to focus on were places, time, and tourist/local. Let me describe each in more detail, and how I collected these values.

Places – Foursquare has a given list of place categories that define all the places that people check in to. These include bars, Mexican restaurants, shoe stores, and many more. Categories are placed into a hierarchical tree, so that Mexican restaurants are under Restaurants and shoe stores are under Shops. For a full list of categories, visit here. Thus, every place has an associated place category tag.

Time – Here, I try to answer the question of: what time of the day are places busiest? By counting the volume of check-ins for every hour in a day for a place, I can pick out when they are most active. Then, by assigning chunks of hours in a day to categories such as Morning, Afternoon, Evening, and more, I can classify every place by the time category they are busiest.
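
As a rough sketch of this bucketing step, the classification can be as simple as counting a place's check-ins per time bucket and taking the largest. Note that the hour ranges below are my own illustrative choices, not necessarily the ones used in the study:

```python
from collections import Counter

# Hypothetical time-of-day buckets (illustrative hour ranges only).
BUCKETS = {
    "Morning": list(range(6, 11)),
    "Lunchtime": list(range(11, 14)),
    "Afternoon": list(range(14, 18)),
    "Evening": list(range(18, 22)),
    "Late Evening": list(range(22, 24)) + list(range(0, 2)),
}

def busiest_bucket(checkin_hours):
    """Classify a place by the time bucket with the most check-ins.

    checkin_hours: iterable of hours (0-23), one per check-in at the place.
    """
    counts = Counter(checkin_hours)
    totals = {
        name: sum(counts[h] for h in hours)
        for name, hours in BUCKETS.items()
    }
    return max(totals, key=totals.get)
```

For example, a place whose check-ins concentrate between 11AM and 1PM would come out tagged “Lunchtime”.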

Tourist/Local – To determine whether a place is touristy or local, I first must determine whether a user is a local or tourist. By counting the percentage of check-ins a user has in or around a city, I can make an educated guess of whether a user is a local of that city. From here, a place can be considered local or touristy based on the proportion of locals or tourists that visit this place.
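
A minimal sketch of this two-step heuristic follows; the 50% threshold and the function names are my own illustrative assumptions, not figures from the study:

```python
def is_local(user_checkin_cities, city, threshold=0.5):
    """Guess that a user is a local if more than `threshold` of their
    check-ins fall in or around the given city."""
    if not user_checkin_cities:
        return False
    in_city = sum(1 for c in user_checkin_cities if c == city)
    return in_city / len(user_checkin_cities) > threshold

def tourist_fraction(place_visitor_ids, local_ids):
    """Score a place by the fraction of its visitors who are not locals."""
    if not place_visitor_ids:
        return 0.0
    tourists = sum(1 for u in place_visitor_ids if u not in local_ids)
    return tourists / len(place_visitor_ids)
```

A place with a high `tourist_fraction` would then be tagged touristy, and one with a low fraction, local.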

From these tags associated with every place, I can cluster places based on their geographic location and the various characteristics just mentioned. The clustering method I use is called OPTICS, a density-based hierarchical clustering algorithm that does not require the number of clusters as input and also doesn’t require every point to be placed into a cluster. This makes sense for our look at neighborhoods, since we do not know the number of neighborhoods in advance, and neighborhoods are small. It would make little sense to define a “shopping” cluster the size of all of Manhattan, even though there are shops throughout the city. Instead, we are interested in highly dense pockets for each characteristic. Because we have such a large number of characteristics, it would be tedious to run OPTICS for each category while manually tuning the algorithm’s inputs to achieve a reasonable-looking set of clusters. By using an automatic clustering procedure that is fine-tuned to each city, we greatly reduce the time it takes to cluster. Here I will share some preliminary results:
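
To give a flavor of what such a clustering run looks like, here is a toy sketch using scikit-learn's implementation of OPTICS. The coordinates are synthetic stand-ins for real check-in data, and the `min_samples` value is just an illustrative choice:

```python
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.default_rng(0)

# A tight pocket of venues (think: Chinese restaurants in Chinatown)...
dense_pocket = rng.normal(loc=[40.716, -73.997], scale=0.002, size=(40, 2))
# ...plus venues scattered across the rest of the city.
scattered = rng.uniform(low=[40.70, -74.02], high=[40.80, -73.93], size=(15, 2))
coords = np.vstack([dense_pocket, scattered])

# OPTICS needs no preset number of clusters, and points that don't belong
# to any dense region are labeled -1 (noise) rather than forced into one.
labels = OPTICS(min_samples=10).fit(coords).labels_
```

On data like this, the dense pocket should come out as a cluster while most of the scattered venues are left as noise, which is exactly the behavior we want for finding small, dense neighborhoods.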

Here is a geographic plot of the places in New York City that are characterized as “Chinese Restaurant”. Highlighted is the cluster found by OPTICS of a dense area of Chinese restaurants – Chinatown.

Here is another plot that now shows places characterized as “Lunchtime”, or places that are their busiest from 11AM-1PM.

In this way, we can characterize the areas of a city by the clusters that are present. And, by overlapping clusters, we can find the areas of intersection that have homogeneous qualities across many characteristics, leading us to neighborhoods!

Here is a look at only the clusters corresponding to “Late Evening”, or 10PM to 2AM (in blue) and “Nightlife Spots” (in red). The purple is their overlap.

We see a lot of overlap in the clusters, leading us to the possibility that we could define neighborhood boundaries using this method. We also see that the nightlife spots are indeed busy in the late evening, as we would expect, but not all late evening clusters have dense nightlife spots. Thus, some characteristics of different neighborhoods emerge – this is the feel or vibe that we want to quantify. However, this is only an example from just looking at two characteristics. With more characteristics overlapping, we can do an even better job of finding neighborhoods and characterizing them based on place, time, and local/tourist.
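
The overlap step above can be sketched by approximating each cluster's footprint with the convex hull of its member points and intersecting the hulls. This is a sketch using the shapely library with made-up coordinates standing in for two clusters' places:

```python
from shapely.geometry import MultiPoint

# Made-up point sets standing in for two clusters' member places.
late_evening_pts = [(0, 0), (0, 2), (2, 2), (2, 0)]
nightlife_pts = [(1, 1), (1, 3), (3, 3), (3, 1)]

# Approximate each cluster's footprint by the convex hull of its points.
hull_a = MultiPoint(late_evening_pts).convex_hull
hull_b = MultiPoint(nightlife_pts).convex_hull

# The intersection is the candidate neighborhood region; a Jaccard-style
# ratio quantifies how strongly the two characteristics co-occur there.
overlap = hull_a.intersection(hull_b)
jaccard = overlap.area / hull_a.union(hull_b).area
```

Regions where many characteristic clusters produce a high overlap ratio are the dense, homogeneous pockets we would propose as neighborhoods.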

This work is still in progress, and a web application for anyone to explore the various clusters is forthcoming. I touched earlier upon many more characteristics that can define a neighborhood that I have not delved into, including demographics of occupation, ethnicity, age, and more. Activities, such as working or shopping, were also not explicitly studied, though they may be inferred from the characteristics of place and time. In the future, these characteristics and more could be added to improve this study. Newer data sets could also be used to observe changes over time and find whether neighborhoods have moved, grown, shrunk, appeared, or disappeared. Adding more cities (I am looking at New York City and London at the moment) is always good, though we are limited to large cities with enough check-in volume to analyze. An interesting related task would be to find neighborhoods that are similar across cities, as this post attempts. Comparisons of cities are also possible, as I happened upon striking differences between the check-in activity of New York City and London. Last, as more social media websites incorporate location-tagging, we could replicate this analysis on their data and possibly broaden our user demographic.

Where in the World Are You?

This is the question Twitter puts forth to every user on their profile page. But what are people actually putting down for their location? Having delved into hundreds of thousands of public profile location fields, I can say the answer is: not always their actual location, and when it is, often not in a format that is easy for computers to understand. Along the way, I learned many insights, both interesting and silly, about how people express themselves through their location field.

When my research on local day-to-day patterns on Twitter began, one of the first steps we had to accomplish was to figure out where tweets were coming from. There were two ways to do this.

  1. If a tweet is sent from a phone or browser with geotagging enabled, or is imported from a check-in service such as Foursquare, then it contains geo coordinates. How I took these coordinates and turned them into proper city names (reverse geocoding) is a subject for another post.
  2. Otherwise, we can attempt to make sense of the location field in each user’s profile and translate it to an actual city name.

However, a cursory glance at a sample of the locations users input shows how difficult it can be to identify the name of a city amid the many creative ways Twitter users express themselves through their location field.

Here’s a random peek:

Sevilla Andalucía España
252 — dmv
Above The Clouds
Lion City!
last exit to sumerland
At Northies you will find me
SomeWhere ….
Boston! Green all day!
canada, eh?
boston & new york .
SJC – SP – Brasil
Monumen Nasional, Jakarta
a bed.
Caracas Venezuela
Makassar, Sulawesi Selatan
Swimming Pool
Ur timeline
boston & new york .
in the shop doing hair
A place Quiet and Cozy
Sweden!!ღ ♫
Mi paraiso

Luckily, Twitter provides a Search API that allows you to collect tweets located within a given radius of a point (exactly how Twitter comes up with the geo for these tweets, I am not entirely sure). The Social Media Lab at Rutgers has been collecting these tweets for 57 selected cities, mostly in the U.S., for over a year, amassing a data set of over 700 million tweets. The logical next step for me was to go through the tagged tweets, look at the location field from their users’ profiles, and find the most popular terms for each city. Demonstrating just how difficult these location fields are to parse, many terms were simply flat-out wrong or highly questionable/vague, and I had to do a lot of pruning.
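The term-tallying step can be sketched as follows, with toy (city, profile_location) pairs standing in for the real geotagged tweet collection:

```python
# Sketch: tallying the most popular profile-location terms per city.
# The pairs below are toy stand-ins for hundreds of millions of tweets.
from collections import Counter

tagged = [
    ("New York, NY", "NYC"),
    ("New York, NY", "nyc"),
    ("New York, NY", "New Yawkkk"),
    ("New York, NY", "Waverly Place"),  # the kind of term later pruned by hand
    ("Denver, CO",   "Mile High City"),
]

def normalize(loc):
    # Minimal cleanup; the real fields needed far heavier pruning.
    return loc.strip().lower()

by_city = {}
for city, loc in tagged:
    by_city.setdefault(city, Counter())[normalize(loc)] += 1

print(by_city["New York, NY"].most_common(3))
```

The per-city counts are only a starting point; as described above, the manual filtering of wrong or ambiguous terms is where most of the work went.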

Here are a small number of user locations that had been tagged to New York, NY, but that I took out for one reason or another:

Waverly Place
Thames Street
Off the Wall
Third World
you’r world
Financial Freedom
Sapphire World
hogwarts on waverly place.
Tha Wurld
1st Place
your sin! my city : D
Probably In The Mall
around the corner!
Nick World
eight prince
Donghae’s room

Knowing what was an error and what wasn’t often required me to delve into my knowledge of cities and popular trivia. For instance, I knew to take out Waverly Place because it probably meant the user was a fan of the Disney show Wizards of Waverly Place rather than a resident of Waverly Place in NYC. Other terms were complete question marks to me and required searching the web and urbandictionary.com.

Some interesting/random things I have found/mused about while doing this filtering:

  • Denver, CO is also commonly known as the “Mile High City.”
  • From Atlanta, GA? How about Hotlanta? Also sometimes referred to as  “Black Hollywood.”
  • For the longest time, I kept seeing varying versions of “DMV” popping up, and I would be so confused. To me, DMV stands for Department of Motor Vehicles. Google agreed with me as well, but a quick search on urbandictionary revealed that, sure enough, it stands for “D.C. Maryland Virginia.”
  • The many, many ways to incorporate “Bieber” into one’s location. One example: Bieberlulu (Honolulu, HI). Also, an alarming number of users located in “biebers pants.”
  • “Springfield” – enough information to infer Springfield, IL? Probably not, considering there are 34 cities in the U.S. with this name and 36 townships. Oh, and the Simpsons live in “Springfield.”
  • “AdventureLand” – mistake? Or…a sprawling family resort in Des Moines, IA.
  • Why is “New Yawkkk” way more popular than, say, “New Yawkk,” “New Yawwk,” or “New Yawkkkk”? Perhaps it has something to do with the fact that Snooki (@Sn00ki) from Jersey Shore lists “New Yawkkk” as her location?
  • More people list their location as “Quahog,” fictitious home of Family Guy, than “Cape Cod, MA.” Also Twitter seems to think Quahog is in MA.
  • A cool nickname for Long Island, NY – “Strong Island.”
  • A not so cool nickname for Salt Lake City, UT – “SL, UT.”
  • “Bunny Ranch” – not a mistake by Twitter but actually a “famous” brothel in Carson City, NV.
  • Nashville, TN is also known as “Music City.” Apparently, it’s also known for money. “Cashville” and “Na$hville” were very popular.
  • “Noho” – confusingly standing for both NOrth of HOuston Street (NYC) and NOrth HOllywood (LA). Other ambiguities: “Chinatown,” “Downtown,” “Uptown,” etc.
  • I found lots and lots of examples of users listing a whole string of places for a location. For instance, something like “NY x LA” or, even crazier – “NY,LA,ATL,MIA,RIO,LON,PAR,LAG.” I wonder how many of these are large companies or actual globetrotters, as opposed to the occasional vacationer who likes to inflate their travel time.
  • Wishful thinking as well as excited announcements of moving abounded. I found many variations of “Wishing I were in X”, “It should be X,” “X, one day…”, and “Y, but very soon to be X,” the last one often punctuated with exclamation points and smileys.
  • The location field can also be a place for clever little jokes (“behind U,” “near your ear”, “where ur not”) or possibly, flirtations?? (“ur place,” “in ya mouth”)

Finished product: a clean and unambiguous list of top location-field entries for each city we were crawling. From here, I use these lists to geo-code millions of tweets to my selected cities whenever a user’s location field matches an entry on the list.
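As a sketch of that final matching step (the term lists below are toy examples, not the real hand-cleaned lists):

```python
# Sketch: geo-coding a tweet from its author's profile location field by
# exact match against per-city term lists. The lists here are illustrative.
CITY_TERMS = {
    "New York, NY": {"nyc", "new york", "new yawkkk", "the big apple"},
    "Denver, CO":   {"denver", "mile high city"},
}

def geocode_profile(location_field):
    """Return the matched city, or None if the field is unknown/ambiguous."""
    term = location_field.strip().lower()
    for city, terms in CITY_TERMS.items():
        if term in terms:
            return city
    return None

print(geocode_profile("New Yawkkk"))        # matches "New York, NY"
print(geocode_profile("Above The Clouds"))  # no match -> None
```

Returning None for unmatched fields is the conservative choice: it is better to leave a tweet un-geocoded than to guess wrong, which is why ambiguous terms like “Noho” were pruned from the lists in the first place.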

Find this topic (location fields in Twitter) interesting? You can find many more Bieber-isms and much more in-depth research on user location fields in this research paper by Hecht, Hong, Suh, and Chi: Tweets from Justin Bieber’s Heart [pdf]