Microsoft Research Blog

Researchers Ride the Twitter Wave

August 6, 2009 | By Microsoft blog editor

By Rob Knies, Managing Editor, Microsoft Research

He rocks in the treetops all the day long,

Hoppin’ and a-boppin’ and a-singin’ his song.

All the little birds on Jaybird Street

Love to hear the robin go tweet tweet tweet …

* * *

When L.A. R&B singer Bobby Day took Jimmie Thomas’ lyrics to the top of the charts in the summer of 1958—a tune memorably revived in 1972 by a 13-year-old Michael Jackson—there was no way to foresee how those words would resonate a half-century later.

But they certainly do. Twitter, the wildly popular micro-blogging service, has become an Internet sensation, with millions flocking to the site each month to post a jittery stream of brief status updates. Whether it’s Ashton Kutcher or your cousin Sue, these days, it seems, everybody wants to emulate Rockin’ Robin.

That’s fine. We’re all fascinated by a genuine pop phenomenon. But what does it all mean?

This summer, a handful of young researchers at Microsoft Research’s New England and Redmond labs are aiming to find out.

danah boyd (second from left) with her Microsoft Research New England interns Alice Marwick (left), Sarita Yardi, and Scott Golder.

No fewer than a half-dozen Microsoft Research interns are pursuing various Twitter-related projects this summer, five of them based at Microsoft Research New England. That makes sense, because that facility includes one of the world’s foremost social-media experts, danah boyd. And, as she notes, Twitter provides an enticing opportunity for researchers, not for its technological advances, but because of how people use it.

“Twitter is interesting because of what actually takes place on top of it,” boyd says, “how a very simple system can be used for so many diverse kinds of practices.

“One of the reasons for investigating these things from different angles is to say: ‘What are some of these complex practices? Let’s actually map out what’s going on in order to step back and think about it.’ Getting at both the practices that unfold and the way these phenomena look is really critical for our long-term understanding of what’s going on in the media landscape.”

Hence the multifaceted approach taken by the Twitter work being conducted within Microsoft Research. Some of the projects take a big-picture view of what is occurring, with the intent of generalizing a mathematical model of the Twittersphere. Others zoom in to take a close look at specific segments of the phenomenon. Each is fascinating in its own right, and the research is particularly compelling because data simply continues to pour in from those never-ending tweets.

“Doing research while it’s moving is the best way to get a handle on things,” boyd says. “When you enter a phenomenon, you don’t know what you’re going to measure at first. You have to be deeply embedded in it to see things changing.”

Whirlwind Analysis

Sarita Yardi, a fourth-year Ph.D. student at Georgia Tech’s School of Interactive Computing, knows about that. Not long after she began her Microsoft Research New England internship, she found herself in the midst of a Twitter maelstrom.

“I’m looking at how news stories spread on Twitter, in a couple of different dimensions,” she says. “You hear of a big story, and you know that a lot of people are using Twitter within local communities to share stories like ‘Hey, there’s a car crash here; take this different route.’”

On Twitter, though, communities don’t necessarily have to be local.

“We started tracking the tweets about the death of [abortion doctor] George Tiller,” Yardi says. “At first, you saw people tweeting about the event, just that it happened. Then people started tweeting about their opinions related to this. Obviously, it’s a very controversial issue; you have people with one position tweeting about it, and people on the other side are also stating their views. They’re trying to define their opinions, whether it be pro-life or pro-choice.

“We’re looking at a large data set of maybe about 30,000 tweets and saying: ‘What were people saying about this? Were they talking to each other? What kind of news sources did they use to spread this information?’”

The sheer volume of the data requires special effort.

“The challenge,” she says, “is figuring out what to do with that. I have 30,000 tweets about George Tiller over a couple of days. How do I make sense of that? I can use a text parser to determine what people are saying. I can look at date/time stamps and plot when they’re tweeting.

“I’d been here about two weeks and decided: ‘This is pretty interesting. I’m interested in how people form opinions on this.’ It’s taking me into new research directions, which I hadn’t anticipated, but they are very interesting and useful.”
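The two steps Yardi mentions, parsing tweet text and plotting date/time stamps, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the sample rows are invented, and the function names `tweets_per_hour` and `word_counts` are hypothetical, not tools the researchers describe using.

```python
from collections import Counter
from datetime import datetime
import re

# Invented sample data: (ISO timestamp, tweet text) pairs.
tweets = [
    ("2009-06-01T10:05:00", "Breaking news: story just reported"),
    ("2009-06-01T10:40:00", "Reading the news story now"),
    ("2009-06-01T11:15:00", "My opinion on the story"),
]

def tweets_per_hour(rows):
    """Count tweets in each hour bucket, keyed as 'YYYY-MM-DD HH:00'."""
    counts = Counter()
    for stamp, _ in rows:
        hour = datetime.fromisoformat(stamp).strftime("%Y-%m-%d %H:00")
        counts[hour] += 1
    return counts

def word_counts(rows, stopwords=frozenset({"the", "on", "my", "now", "just"})):
    """Count lowercase words across all tweets, skipping a small stopword set."""
    counts = Counter()
    for _, text in rows:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts

if __name__ == "__main__":
    for hour, n in sorted(tweets_per_hour(tweets).items()):
        print(hour, n)
    print(word_counts(tweets).most_common(3))
```

The hourly counts could then be handed to any plotting tool to show when people were tweeting; a real analysis of 30,000 tweets would likely need a much richer parser than this word counter.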

Celebrity Tweets

Alice Marwick, a Ph.D. candidate in the Department of Media, Culture, and Communication at New York University, is investigating the use of Twitter by celebrities, which gained popular currency in April when Kutcher became the first person to collect a million Twitter followers. Four months later, he, Ellen DeGeneres, and Britney Spears each have surpassed 2 million, and Oprah Winfrey is not far behind.

“When you have people with audiences of 2 million followers,” Marwick asks, “how do you manage your audience? How do you think of your audience?

“One of my projects looks at the way these celebrity users deal with varied audiences. They have fans, they have other celebrities, they have coworkers, gossip columnists, makeup artists. Usually, those kinds of interactions are very different. A celebrity will be very different when they’re interviewing with a gossip magazine than they will be to a fan club or to one of their friends. We’re interested in how people in general manage multiple audiences in social media.”

Marwick is also examining how celebrities communicate with their fans on Twitter. And that leads to a larger question of the very nature of celebrityhood on the Internet.

“My other project is looking at the ways that fans have access to celebrities on Twitter,” she explains. “There’s this idea that Twitter is a way for celebrities to interact directly with fans, to be very candid with them, to give them insider information about their lives. Until recently, that has been seen as taboo. There have always been intermediaries, like public-relations agents or movie studios, creating this kind of barrier.

“I’m looking at the differences between celebrities and what we call micro-celebrities, people whose fame basically comes out of the Internet and who are very used to having direct access to their fans.”

As examples of “micro-celebrities,” she cites people like Kevin Rose, the founder of Digg; video blogger iJustine; and notorious MySpace/MTV personality Tila Tequila.

“We’re interested in looking at whether there really is greater access,” Marwick says, “or whether that’s something the celebrities perpetuate but it’s not actually going on. iJustine takes questions on Twitter and produces a short video every week where she answers the questions from her followers. There’s a lot of reciprocity between her celebrity and her fans, whereas someone like Britney Spears doesn’t ever have to directly interact with her fans if she doesn’t want to.”

Emerging Patterns

Scott Golder, the third of boyd’s 2009 interns, is investigating the patterns of social connection among Twitter users.

“My graduate work is in the area of social networks, the patterns of connections between members of a social group,” says Golder, who is pursuing a Ph.D. in sociology at Cornell. “I am particularly interested in how one’s connections to others affect how one sees the world and judges things to be good or bad, popular or unpopular, expected or unexpected.”

One project, Golder says, is “a laboratory experiment examining how one’s ‘neighborhood’ within a large social network—in this case, Twitter—can be used to identify which users might be interested in one another.”

The results of this research might improve social media users’ ability to find people they already know—or to identify others they’d find interesting.

“My second project involves understanding how large corporations are using social media, specifically Twitter, to do marketing, public relations, and customer service,” Golder adds. “This involves a combination of quantitative data analysis from communication logs and interviews with Twitter users, both inside and outside of corporations.

“Corporations have traditionally had tight control over who can speak to the public on behalf of the corporation. Marketing and corporate-communications departments consciously control the messages they share. Likewise, though customer-service representatives talk directly to customers, they have scripts and rules that guide these interactions. The casual and personal mode of communication in social media such as Twitter raises the possibility of causing conflict within the corporation as their modes of communication and oversight conflict. The results of this research will be helpful to large companies and organizations that are trying to manage their corporation’s use of social media and improve their use of social media to reach their customers.”

It’s an ambitious agenda.

“As you can see,” Golder concludes, “these two projects require a variety of techniques—analysis of both lab and field data, and both qualitative and quantitative data. I appreciate getting the opportunity to use and improve a variety of skills while here at Microsoft.”

A Model Approach

Daniel Romero and Grant Schoenebeck, two more New England-based interns, are looking at the Twitter phenomenon at a macro level. They are working together to construct a mathematical model of the activity on the site.

Grant Schoenebeck

“We want to use this,” explains Schoenebeck, a fourth-year graduate student at the University of California, Berkeley, “to explain certain phenomena. For example, when is a topic likely to reach a large audience vs. when is it likely to disappear quickly and quietly? What types of users tend to be the most instrumental in starting or continuing a trend? Can we efficiently discern communities in the network by looking at information flow?”

The pair addresses the question from somewhat different backgrounds. Schoenebeck’s research focuses on computational complexity, the study of what makes problems computationally hard to solve. Romero, a Cornell Ph.D. student in applied mathematics, specializes in studying online social networks empirically and analytically, then building a mathematical model that explains the system at hand.

“Twitter is a social network,” Romero says. “People have followers and followees. You could think of Twitter as a directed graph where people give information to each other on this directed graph.

“What we’re interested in is to understand how information flows in this directed graph and why it flows in a different way than it does on regular blogs.”

For example, Twitter’s 140-character maximum means that, by its very nature, it differs in its ability to deliver information from blogs, in which writers can take as much room as they require.

“When people do micro-blogging,” Romero says, “they tend to write information much more often than people who blog. This makes a difference in how information flows on these graphs.

Daniel Romero

“What we’re interested in is to make a model that explains how information flows on Twitter and how that model would change if you were to apply it to blogs, then see the difference between these two. Maybe that would explain why Twitter has been so successful.”

Schoenebeck finds himself increasingly intrigued by the intersection of computational complexity theory and social networking.

“The intersection of these two topics has just begun to be explored but has the potential for many interesting results,” he says. “Unlike most computational problems, where you are allowed to see the entire input, in social networking, each player can be thought of as a computational node that can only interact with its neighbors.

“Say we want an entire network to coordinate on some decision—conventions for Twitter use, or which side of the road to drive on. Given such a problem, will people ever all agree? What kinds of network structures make coordination more or less likely?”

Unlike some of their colleagues, the pair are less interested in the content of the tweets they analyze than in the behavior of the network itself.

“If there is a piece of information that a random Twitter user would like to spread, what is the probability that this message will spread, and what does it depend on?” Romero asks. “Does it depend on who follows this person, or does it depend on the topological specifics of the graph?”
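Romero’s picture of Twitter as a directed graph, and his question about the probability that a message spreads, can be made concrete with a small simulation. This is only a sketch under stated assumptions, not the model the interns are building: it swaps in a simple independent-cascade rule in which each user who has the message passes it to each follower with a fixed probability `p`. The toy graph and the function names are hypothetical.

```python
import random

# Hypothetical follower graph: followers[u] lists the users who follow u,
# so a message posted by u can flow along the directed edge u -> follower.
followers = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}

def simulate_spread(graph, source, p, rng):
    """Run one cascade from source and return the set of users reached.

    Each user who has the message gets one chance to pass it to each
    follower, succeeding independently with probability p.
    """
    reached = {source}
    frontier = [source]
    while frontier:
        user = frontier.pop()
        for follower in graph.get(user, []):
            if follower not in reached and rng.random() < p:
                reached.add(follower)
                frontier.append(follower)
    return reached

def estimate_reach_probability(graph, source, target, p, trials=10_000, seed=0):
    """Monte Carlo estimate of the chance that a message from source reaches target."""
    rng = random.Random(seed)
    hits = sum(target in simulate_spread(graph, source, p, rng) for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    print(estimate_reach_probability(followers, "a", "d", p=0.5))
```

In this toy graph the answer depends on both factors Romero raises: who follows the source, and the shape of the graph. There are two independent two-step routes from "a" to "d", so with p = 0.5 the exact probability is 1 - (1 - 0.25)^2 = 0.4375, and the estimate should land close to that value.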

Comparative Research

And then there’s Jiang Yang, who’s been working at Microsoft Research Redmond with Scott Counts. Counts builds and studies social software, based on psychological principles, that facilitates online social interactions, networks, and distributed collaboration. Yang, too, has an interest in comparing blogging with micro-blogging.

Jiang Yang with her pal Watson

“The comparison will center on the nature of the two blogging services as social media and their information-diffusion structures,” Yang says. “By investigating these, we would be able to answer the difference in their contribution patterns, how individuals influence one another, what different information content they provide, and what role the media play in the design space.”

Yang, from the University of Michigan, is using a semi-structured exploration. Based on the research questions she has formulated, she plans to use statistical tools and to develop different measures to make the comparison, which also will employ innovative visualizations.

“The study,” she says, “has two primary goals that are also interlinked. In the design space of blogging services, we hope to look at the different role that micro-blogging plays and its design implications.

“Blogging has long been studied as a medium of information diffusion, and micro-blogging has started to be used for marketing. Analyzing the differences and similarities in terms of information-diffusion structure and efficiency can yield valuable knowledge to the proper use of each.”

One thing’s for certain about all this Microsoft Research activity: The sheer speed at which huge volumes of information are being exchanged on Twitter is keeping the involved researchers on their toes.

“Twitter moves quickly,” Yardi admits, “but it also means that it gives us a lot faster access to data. That means that we just have to move at the rate Twitter is moving.”
