Thoughts and Reflections from a 1st Time Reviewer

A few months ago, I received an email that asked me to serve on the ICWSM 2014 Program Committee. The role of PC member for this conference entails reading several submitted papers to the conference and then writing up reviews that would then be used by the senior members to come to a decision about accepting or rejecting the papers. I was really excited to be asked to be a part of this process because it was my first time seeing what goes on behind the scenes of how papers get vetted, reviewed, and deliberated on by the research community. Unfortunately I could find few resources for how to write a proper review and did not come across any tailored to this specific research community (or even the general research communities of HCI, CSCW, or social computing).

Nevertheless, with the help of my research advisor who provided several useful tips, I came away from the process with a better understanding of how to write a good review. I also found reading papers from the point-of-view of a reviewer eye-opening and informative for myself as a researcher and writer. As a result, I am writing down some of my thoughts and findings in the hopes that they may be useful for other beginning reviewers going through the same experience. Please note that this is not meant to be a comprehensive how-to for writing reviews – only an account of the thought-process, revelations, and observations of a 1st time reviewer!

At First…

My initial reaction to being invited to review was excitement, followed closely by anxiousness. I was worried that as a beginning researcher I would not find anything substantial to say about any of the submissions, written as they were by more experienced researchers than myself. Surely any of the meager suggestions or recommendations that I could potentially prescribe would have been thought of and taken care of already?

Once I actually received my assigned papers, however, I was surprised to find that there was a good amount of variation in the papers in terms of quality of writing and methodology, and incidentally, I could in fact come up with comments for each of the papers (though whether these comments were of any value I couldn’t yet be certain – more on that later).


Some things that I noticed while reading papers through the lens of a reviewer for the first time:

Grammatical errors, imprecise or undefined wording, and poor sentence structure were glaringly noticeable, and I could sense my perception of the paper quickly souring upon encountering more than a few of them. Noticing this, I tried my best to suppress my misgivings and overlook such errors because it was apparent that some of the papers were written by non-native English speakers, and I thought it unfair for them to be penalized.

On a wider note, I realized how incredibly important it is to be a good writer and to think deeply about organization and presentation. As a reviewer, it was almost a joy to read the papers that conveyed ideas clearly, presented well-thought-out graphs and tables, and told a well-structured story that flowed coherently from section to section. On the flip side, it was frustrating to read papers that had sections that seemed misplaced or irrelevant, or graphs and tables where it wasn’t clear what they were trying to express.

There were two paper sections in particular that stood out to me as surprisingly important:

Related Work

I could sense alarm bells going off when I read a Related Work section and could think of or could quickly Google related research that was not mentioned. It wasn’t necessarily a specific paper, but when a particular research direction or theme that was very relevant to the work of the paper was not given mention, it made me wonder what else the writers missed. As someone who is not the pre-eminent expert on every topic of the papers I read nor in the position of replicating each paper’s work, I couldn’t be certain that everything the writers were asserting was true. So reading a comprehensive Related Work section somehow made me feel more confident in the writers and more inclined to believe their assertions.


The Discussion section I found to also be extremely important. One might think that in order to write a strong research paper, it’s best to emphasize the strengths of the method or highlight only the positive findings and obfuscate, ignore, or minimize bad or inconclusive findings. In fact, I found that the opposite is true. I found myself much more willing to take a paper’s findings and conclusions at face value if the authors openly talked about drawbacks, caveats, and failures they had while doing their work. It was also interesting to see how authors tried to understand or conceptualize their findings, even if it wasn’t rigorous or a main part of their paper. I found it much more interesting as a reader to be presented with findings supported by a plausible explanation or put in context of a larger model rather than being given a slew of numbers at the end with no discussion at all.


After writing my initial reviews, I sent them off to my advisor for him to give a quick look-over to make sure I wasn’t putting my foot in my mouth. He gave me some great tips which I summarize below:

  • Start out reviews with a short summary of the paper. It’s useful for reviewers to remember what paper they’re reading about and helps authors know if they’re getting their main ideas across properly. Don’t forget to write within the summary what aspect of the paper is novel. This forces the reviewer to focus on the novelty of the work, which may affect the rating given to the paper. I recall cases where I actually struggled to put into words the novelty of a paper and, as a result, realized I should bump my ratings down.
  • Be careful of subjective statements that start with phrases such as “I would have liked…” because it’s unclear to the meta-reviewer and authors whether this is an objective problem with the paper or a subjective preference. The former is a strike against the paper while the latter may be interesting to the authors but shouldn’t count against the paper in terms of acceptance into the conference. When going through my reviews, I noticed that I had included several of these phrases – I guess because I was a first-time reviewer and afraid of coming off too strong or assured in my review, especially if I turned out to be wrong.


After turning in my reviews, I got a chance to see how other reviewers reviewed the same papers. Some reviews brought up points I had overlooked or diverged from my own opinions, which helped me see points of view I hadn’t considered as well as the range of perspectives. The best reviews in my opinion were the ones that demonstrated their expertise in that particular niche by backing up their points with citations and including recommendations and papers to look at along with their critiques. I could see how these reviews would be really useful for the authors to refine their work. Finally, it was a validating experience to see when other reviews brought up similar points to my own and when the meta-reviewers cited my review or mentioned points from my review as helpful for forming their final opinion. Here was concrete proof that many of my comments were actually valuable.

So in the end, my worst fears of being wildly off the mark in my reviews never materialized, and I finished the experience more confident in my research instincts and more understanding of the mentality of paper reviewers and thus how to frame, organize, and style my research writing. I think being part of the review process will help me become a better writer and ultimately, a better researcher and active member of the research community, and I encourage program chairs and organizers of conferences to invite PhD students and newer members of the community to participate in the review process.


*  This post is cross-posted at the Haystack Group blog.

While I’ve learned a great deal from this process, there are many, many veteran reviewers out there who have many more tips that they’ve cultivated or learned over time. Therefore, I’m keeping a list of comments to this piece and further advice I receive from others here:

– “…always talk about the weaknesses and limitations of your approach in a paper, but don’t have that be the last thing you talk about. I remember once ending a paper with the “Limitations” section right before a short conclusion, and my MSR boss telling me how that leaves a negative last impression.”

– My advisor also mentioned that a great (and hilarious) resource for reviewers is “How NOT to review a paper: The tools and techniques of the adversarial reviewer” by Graham Cormode, which also has citations to other useful guides on reviewing.

– This guide written for CHI 2005 is several years old but is a really thorough look into how to write a proper review, including useful examples of both suitable and unsuitable reviews.

What is a Neighborhood?

For my master’s thesis, I am looking at neighborhoods and how to define, classify, and describe them.

First off, what is a neighborhood? Though there are conflicting definitions of what exactly constitutes a neighborhood, most would agree that it is a geographically localized, somewhat homogeneous community within a larger city. If one had to name neighborhoods in New York City, it would be easy to rattle off names such as Upper East Side, Soho, East Village, Chinatown, Midtown, etc. And when we picture these neighborhoods in our minds, each neighborhood often has a distinct feel or vibe to it that makes it easily identifiable.


Can you guess these neighborhoods? *Images taken from

How do we have such a clear picture of what these neighborhoods are like? When thinking about what sort of characteristics differentiate one neighborhood from another, we think of the kind of places (restaurants, shops, stores, etc.), the kind of people (tourists, bankers, affluent people, young people, a certain ethnicity), and the kinds of activities that take place (working, shopping, partying, sightseeing) within this neighborhood. For instance, we would expect to see a lot more offices, working, and tourists in Midtown, but maybe more boutiques, shopping, and artists in Soho. Of course, there are many neighborhoods and characteristics of neighborhoods that are harder to guess off the top of our heads (NoLIta versus TriBeCa or NoHo?)

Another issue with neighborhoods is their fuzzy boundaries and ever-changing characteristics. People and places are not stationary, and over time, the characteristics and perhaps the boundaries of a neighborhood may change (gentrification, anyone?), or new neighborhoods emerge and old ones become consumed. For a recent example, see the shrinking of Little Italy. Who defines what the boundaries of a neighborhood are, anyway? These are often arbitrarily drawn by city officials going off of outdated information or natural boundaries, such as a river, that may no longer be there or do not represent cultural boundaries. Even worse, real estate agents have used these shifty boundaries to falsely stretch boundaries of more coveted neighborhoods or come up with new neighborhoods out of the blue to repackage and market a place. Hence, the aforementioned NoLIta, TriBeCa, NoHo, and now DUMBO, BoCoCa, BoHo, FiDi, and whatever else they have managed to come up with. Clearly, some names have stuck while others faded away, leading to the conclusion that in some cases, these names actually fulfilled a need and put a name to a newly formed, distinct neighborhood.

Surely there’s a better way? How can we systematically find neighborhoods, quantify their characteristics, and observe their changing or shifting states? With an eye to the characteristics I’ve mentioned that define neighborhoods, there are many popular social media websites out there now that give an indication of some of these characteristics. For my study, I focus on Foursquare check-in data, specifically a data set that’s been collected from the Twitter API (check-ins from Foursquare forwarded to public Twitter accounts) from May 27th, 2010 to November 2nd, 2010 by the Cambridge NetOS group. This data set provides a list of places, a list of users, and a list of each time a user has checked into a place.

The characteristics that I chose to focus on were places, time, and tourist/local. Let me describe each in more detail, and how I collected these values.

Places – Foursquare has a given list of place categories that define all the places that people check in to. These include bars, Mexican restaurants, shoe stores, and many more. Categories are placed into a hierarchical tree, so that Mexican restaurants are under Restaurants and shoe stores are under Shops. For a full list of categories, visit here. Thus, every place has an associated place category tag.

Time – Here, I try to answer the question of: what time of the day are places busiest? By counting the volume of check-ins for every hour in a day for a place, I can pick out when they are most active. Then, by assigning chunks of hours in a day to categories such as Morning, Afternoon, Evening, and more, I can classify every place by the time category they are busiest.

Tourist/Local – To determine whether a place is touristy or local, I first must determine whether a user is a local or tourist. By counting the percentage of check-ins a user has in or around a city, I can make an educated guess of whether a user is a local of that city. From here, a place can be considered local or touristy based on the proportion of locals or tourists that visit this place.
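To make the Time and Tourist/Local tags a little more concrete, here is a minimal Python sketch of how they could be computed. It is an illustration under stated assumptions, not the code used in the thesis: the bucket boundaries (other than “Lunchtime” and “Late Evening”, which appear later in this post), the 0.5 thresholds, the choice to tag a place by its single busiest hour, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical time buckets. Only "Lunchtime" (11AM-1PM) and "Late Evening"
# (10PM-2AM) are named in this post; the other boundaries are assumptions.
TIME_BUCKETS = {
    "Morning": range(6, 11),
    "Lunchtime": range(11, 13),
    "Afternoon": range(13, 18),
    "Evening": range(18, 22),
    "Late Evening": [22, 23, 0, 1],
    "Overnight": range(2, 6),
}
HOUR_TO_BUCKET = {
    hour: label for label, hours in TIME_BUCKETS.items() for hour in hours
}


def busiest_time_category(checkin_hours):
    """Tag a place with the bucket containing its single busiest hour (0-23)."""
    hourly = Counter(checkin_hours)
    if not hourly:
        return None
    peak_hour, _ = hourly.most_common(1)[0]
    return HOUR_TO_BUCKET[peak_hour]


def is_local(user_checkin_cities, city, threshold=0.5):
    """Guess that a user is a local if enough of their check-ins are in `city`.

    The 0.5 threshold is an assumption chosen for illustration.
    """
    if not user_checkin_cities:
        return False
    in_city = sum(1 for c in user_checkin_cities if c == city)
    return in_city / len(user_checkin_cities) >= threshold


def tag_place(visitor_ids, local_users, threshold=0.5):
    """Tag a place "local" or "touristy" by the share of distinct local visitors."""
    visitors = set(visitor_ids)
    if not visitors:
        return None
    local_share = len(visitors & local_users) / len(visitors)
    return "local" if local_share >= threshold else "touristy"
```

Both thresholds would need tuning against real check-in data; a user who checks in only once or twice, for example, gives very little evidence either way.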

From these tags associated with every place, I can cluster places based on their geographic location and the various characteristics just mentioned. The clustering method I use is called OPTICS and is a density-based hierarchical clustering algorithm that does not require the number of clusters as an input and also doesn’t require every point to be assigned to a cluster. This makes sense for our look at neighborhoods, since we do not know the number of neighborhoods in advance, and neighborhoods are small. It would make little sense to define a “shopping” cluster that is the entire size of Manhattan, even though there are shops throughout the city. In this case, we are interested in highly dense pockets for each characteristic. Because we have such a large number of characteristics, it would be difficult to perform OPTICS for each category, manually setting inputs into the algorithm to achieve a reasonable-looking set of clusters. By using an automatic clustering algorithm that is fine-tuned to each city, we greatly reduce the time it takes to cluster. Here I will share some preliminary results:

Here is a geographic plot of the places in New York City that are characterized as “Chinese Restaurant”. Highlighted is the cluster found by OPTICS of a dense area of Chinese restaurants – Chinatown.

Here is another plot that now shows places characterized as “Lunchtime”, or places that are at their busiest from 11AM-1PM.

In this way, we can characterize the areas of a city by the clusters that are present. And, by overlapping clusters, we can find the areas of intersection that have homogeneous qualities across many characteristics, leading us to neighborhoods!

Here is a look at only the clusters corresponding to “Late Evening”, or 10PM to 2AM (in blue) and “Nightlife Spots” (in red). The purple is their overlap.

We see a lot of overlap in the clusters, leading us to the possibility that we could define neighborhood boundaries using this method. We also see that the nightlife spots are indeed busy in the late evening, as we would expect, but not all late evening clusters have dense nightlife spots. Thus, some characteristics of different neighborhoods emerge – this is the feel or vibe that we want to quantify. However, this is only an example from just looking at two characteristics. With more characteristics overlapping, we can do an even better job of finding neighborhoods and characterizing them based on place, time, and local/tourist.
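One simple way to quantify the overlap between two clusters is sketched below in Python. It assumes each cluster is represented as a set of place IDs, which is a simplification for illustration; the function name and the IDs are hypothetical, and the plots above compare geographic regions rather than ID sets.

```python
def cluster_overlap(cluster_a, cluster_b):
    """Return the shared places and the share of each cluster they make up.

    Each cluster is assumed to be a set of place IDs (an illustrative
    simplification, not the representation used in the study).
    """
    shared = cluster_a & cluster_b
    share_a = len(shared) / len(cluster_a) if cluster_a else 0.0
    share_b = len(shared) / len(cluster_b) if cluster_b else 0.0
    return shared, share_a, share_b


# Made-up place IDs for a "Late Evening" cluster and a "Nightlife Spots" cluster.
late_evening = {"p1", "p2", "p3", "p4"}
nightlife = {"p3", "p4", "p5"}
shared, share_late, share_night = cluster_overlap(late_evening, nightlife)
```

Reporting the share for each cluster separately captures the asymmetry noted above: most of a nightlife cluster may fall inside a late evening cluster without the reverse being true.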

This work is still in progress, and a web application for anyone to explore the various clusters is forthcoming. I touched earlier upon many more characteristics that can define a neighborhood that I have not delved into, including things like demographics of occupation, ethnicity, age, and more. Activities, such as working or shopping, were also not explicitly studied, though they may be inferred from the characteristics of place and time. In the future, these characteristics and more could be added to improve this study. Newer data sets could also be used to observe changes as a result of time and find if neighborhoods have moved, grown, shrunk, appeared or disappeared. The addition of cities (I am looking at New York City and London at the moment) is always good, though we are limited to only large cities that have enough volume in check-ins to do analysis on. An interesting related task would be to find neighborhoods that are similar across cities, as this post attempts. Comparisons of cities are also possible, as I happened upon striking differences between the check-in activity of New York City and London. Last, as more social media websites incorporate location-tagging, we could replicate this analysis on their data and possibly increase the size of our user demographic.

Where in the World Are You?

This is the question Twitter puts forth to every user on their profile page. But what are people actually putting down for their location? After delving into hundreds of thousands of public profile location fields, I found that the answer is: not always their actual location, and when it is, not often in a format that is easy for computers to understand. Along the way, I picked up many insights, both interesting and silly, on how people express themselves through their location field.

When my research on local day-to-day patterns on Twitter began, one of the first steps we had to accomplish was to figure out where tweets were coming from. There were two ways to do this.

  1. If a tweet is sent from a phone or a browser that has geotagging enabled, or if a tweet is imported from a tagging service such as Foursquare, then it contains geo coordinates. How I took these coordinates and turned them into proper city names (reverse geocoding) is a subject for another post.
  2. Attempting to make sense of that location field in every user’s profile and translate it to an actual city name.

However, a cursory glance at a sample of locations that users input will show how difficult it can be to identify the name of a city from the many and various creative ways Twitter users express themselves through their location field.

Here’s a random peek:

Sevilla Andalucía España
252 — dmv
Above The Clouds
Lion City!
last exit to sumerland
At Northies you will find me
SomeWhere ….
Boston! Green all day!
canada, eh?
boston & new york .
SJC – SP – Brasil
Monumen Nasional, Jakarta
a bed.
Caracas Venezuela
Makassar, Sulawesi Selatan
Swimming Pool
Ur timeline
boston & new york .
in the shop doing hair
A place Quiet and Cozy
Sweden!!ღ ♫
Mi paraiso

Luckily, Twitter provides a Search API that allows you to collect tweets that are located within a given radius of a point (exactly how Twitter comes up with the geo for these tweets I am not entirely sure). The Social Media lab at Rutgers has been collecting these tweets for 57 selected cities mostly in the U.S. for over a year, amassing a data set of over 700 million tweets. The logical next step for me was to go through the tagged tweets, look at the location field from their user profiles, and find the most popular terms for each city. Demonstrating how difficult parsing these location fields is, there were many terms that were simply flat out wrong or very questionable/vague, and I had to go through a lot of pruning.
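The counting step can be sketched in a few lines of Python. This is only an illustration: it assumes the tagged tweets have already been reduced to (city, profile location) pairs, and the function name and the light normalization (lowercasing and collapsing whitespace) are assumptions rather than a description of the actual pipeline.

```python
from collections import Counter, defaultdict


def top_location_entries(tagged_tweets, n=3):
    """Count the most common profile-location strings for each tagged city.

    `tagged_tweets` is assumed to be an iterable of (city, profile_location)
    pairs; this input shape is hypothetical.
    """
    per_city = defaultdict(Counter)
    for city, location in tagged_tweets:
        # Lowercase and collapse whitespace so trivial variants count together.
        cleaned = " ".join(location.lower().split())
        if cleaned:
            per_city[city][cleaned] += 1
    return {city: counts.most_common(n) for city, counts in per_city.items()}
```

The output of a step like this still needs pruning by hand, since frequency alone cannot tell a real nickname from a popular joke.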

Here are a small number of user locations that had been tagged to New York, NY, but that I took out for one reason or another:

Waverly Place
Thames Street
Off the Wall
Third World
you’r world
Financial Freedom
Sapphire World
hogwarts on waverly place.
Tha Wurld
1st Place
your sin! my city : D
Probably In The Mall
around the corner!
Nick World
eight prince
Donghae’s room

A lot of knowing what was an error and what wasn’t an error required me to delve into my knowledge of cities and popular trivia. For instance, I knew to take out Waverly Place, because it probably meant that the user was a fan of the show Wizards of Waverly Place on Disney as opposed to living in Waverly Place in NYC. Other terms were complete question marks to me, and required searching on the web and

Some interesting/random things I have found/mused about while doing this filtering:

  • Denver, CO is also commonly known as the “Mile High City.”
  • From Atlanta, GA? How about Hotlanta? Also sometimes referred to as  “Black Hollywood.”
  • For the longest time, I kept seeing varying versions of “DMV” popping up, and I would be so confused. To me, DMV stands for Department of Motor Vehicles. Google agreed with me as well, but a quick search on urbandictionary revealed that, sure enough, it stands for “D.C. Maryland Virginia.”
  • The many, many ways to incorporate “Bieber” into one’s location. One ex: Bieberlulu (Honolulu, HI). Also, an alarming number of users located in “biebers pants.”
  • “Springfield” – enough information to infer Springfield, IL? Probably not, considering there are 34 cities in the U.S. with this name and 36 townships. Oh, and the Simpsons live in “Springfield.”
  • “AdventureLand” – mistake? Or…a sprawling family resort in Des Moines, IA.
  • Why is “New Yawkkk” way more popular than, say, “New Yawkk,” “New Yawwk,” or “New Yawkkkk”? Perhaps it has something to do with the fact that Snooki (@Sn00ki) from Jersey Shore lists “New Yawkkk” as her location?
  • More people list their location as “Quahog,” fictitious home of Family Guy, than “Cape Cod, MA.” Also Twitter seems to think Quahog is in MA.
  • A cool nickname for Long Island, NY – “Strong Island.”
  • A not so cool nickname for Salt Lake City, UT – “SL, UT.”
  • “Bunny Ranch” – not a mistake by Twitter but actually a “famous” brothel in Carson City, NV.
  • Nashville, TN is also known as “Music City.” Apparently, it’s also known for money. “Cashville” and “Na$hville” were very popular.
  • “Noho” – confusingly standing for both NOrth of HOuston Street (NYC) and NOrth HOllywood (LA). Other ambiguities: “Chinatown,” “Downtown,” “Uptown,” etc.
  • I found lots and lots of examples of users listing a whole string of places for a location. For instance, something like “NY x LA” or, even crazier – “NY,LA,ATL,MIA,RIO,LON,PAR,LAG.” I wonder how many of these are large companies or actual globetrotters as opposed to the occasional vacationer that likes to inflate their travel time.
  • Wishful thinking as well as excited announcements of moving abounded. I found many variations of “Wishing I were in X”, “It should be X,” “X, one day…”, and “Y, but very soon to be X,” the last one often punctuated with exclamation points and smileys.
  • The location field can also be a place for clever little jokes (“behind U,” “near your ear”, “where ur not”) or possibly, flirtations?? (“ur place,” “in ya mouth”)

Finished product: a clean and unambiguous list of top location field entries for each city we were crawling. From here, I use these lists to geo-code millions of tweets to my selected cities if the user location field matches up with an entry on the list.
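The matching step might look something like the Python sketch below. It is a minimal illustration, assuming the curated lists are stored as a dictionary from city to accepted entries and that matching is exact after lowercasing and whitespace cleanup; the function names and sample entries are hypothetical. Entries claimed by more than one city are dropped so that the lookup stays unambiguous.

```python
def normalize(location):
    """Lowercase and collapse whitespace in a raw location field."""
    return " ".join(location.lower().split())


def build_lookup(city_entries):
    """Invert {city: [entries]} into {normalized entry: city}.

    Entries that appear under more than one city are removed, since they
    cannot be resolved to a single city.
    """
    lookup = {}
    ambiguous = set()
    for city, entries in city_entries.items():
        for entry in entries:
            key = normalize(entry)
            if key in lookup and lookup[key] != city:
                ambiguous.add(key)
            lookup[key] = city
    for key in ambiguous:
        del lookup[key]
    return lookup


def geocode(profile_location, lookup):
    """Return the matching city, or None if the field is not on any list."""
    return lookup.get(normalize(profile_location))
```

Exact matching is deliberately conservative: a field such as “a bed.” simply returns no city rather than a wrong guess.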

Find this topic (location fields in Twitter) interesting? Many more Bieber-isms and much more in-depth research on user location fields in the following research paper by Hecht, Hong, Suh, and Chi: Tweets from Justin Bieber’s Heart [pdf]