Microsoft Research Blog


Microsoft Research Unveils Technologies to Improve the Web Experience

May 8, 2007 | By Microsoft blog editor

By Rob Knies, Managing Editor, Microsoft Research

Battling search spam. Streamlining Web-page monitoring. Helping protect online privacy. Enabling the illiterate to use computers.

These are just a few of the ways Microsoft Research is demonstrating its commitment to making the Internet a more secure, easily searchable, user-friendly destination for consumers worldwide.

Each of those goals was featured May 8-12 during the 16th International World Wide Web Conference (WWW 2007), held at the Fairmont Banff Springs Hotel in Alberta’s Banff National Park.

The conference, which attracts innovators, decision-makers, technologists, businesses, and standards bodies from around the globe, is an annual gathering to discuss the future of the Web. And, as is customary, Microsoft Research was fully invested in supporting those efforts.

In its five labs worldwide, Microsoft Research undertakes a wide variety of projects designed to enhance the value of the World Wide Web, in areas as diverse as security, search, user interfaces, data mining, and technology for emerging markets.

Of 111 papers accepted for the conference, 16—14 percent—were submitted by Microsoft Research, the most of any single organization represented at the event. Four of Microsoft Research’s five worldwide labs had papers accepted, and one of the papers—Wherefore Art Thou R3579? Anonymized Social Networks, Hidden Patterns, and Structural Steganography, co-authored by Lars Backstrom and Jon Kleinberg of Cornell University in collaboration with Cynthia Dwork, a principal researcher for Microsoft Research Silicon Valley—received the conference’s Best Paper Award.

Bill Buxton, Microsoft Research principal researcher, served as a plenary speaker on May 11, delivering a commentary on social networking and Web communities entitled Design for the World Narrow Web.

He was hardly alone. Colleague Kentaro Toyama, assistant managing director of Microsoft Research India, participated in a panel discussion on Web Delivery Models for Developing Regions. Susan Dumais, principal researcher for Microsoft Research Redmond, also served as a panelist, on the topic of Searching Personal Content.

A workshop on Adversarial Information Retrieval on the Web included participation by Microsoft Research’s Krysta Svore, Qiang Wu, and Chris J.C. Burges, along with Microsoft’s Aaswath Raman, authors of the paper Improving Web Spam Classification using Rank-Time Features. Another paper delivered as part of that workshop was Transductive Link Spam Detection, written by Burges, colleague Dengyong Zhou, and Microsoft’s Tao Tao.

Marc Najork, principal researcher for Microsoft Research Silicon Valley, served as track chair for the Tutorials and Workshops committee. Toyama was deputy chair for the Technology for Developing Regions committee, and Xing Xie, lead researcher for Microsoft Research Asia, was the deputy chair for the Browsers and User Interfaces committee. No fewer than a dozen other Microsoft Research representatives participated as members of various WWW 2007 committees.

Such conference support will be further in evidence in 2008, when the event will be held in Beijing. Hsiao-Wuen Hon, principal researcher and deputy managing director for Microsoft Research Asia, will be the vice general chair for WWW 2008, and Wei-Ying Ma, principal researcher and research manager for the same lab, will be a program chair.

Collaboration, as always, was a hallmark of Microsoft Research’s participation in WWW 2007. Of the 16 papers accepted from the organization, 10 featured co-authorships with academic colleagues, representing 12 universities from around the world. Microsoft Research also contributed five poster papers to the conference, four of which represented collaboration with academic partners.

Stopping Search Spam

Among those academic collaborations was a paper entitled Spam Double-Funnel: Connecting Web Spammers with Advertisers, part of the conference’s Industrial Practice and Experience track. The paper was co-written by Yi-Min Wang, principal researcher of Microsoft Research Redmond’s Cybersecurity and Systems Management research group; Ming Ma, a research software-design engineer in the same group; and Yuan Niu and Hao Chen of the University of California, Davis.

“Our goal is to provide visibility into the complicated structure of the search-spam industry,” Wang says, “to educate the user community and the search industry on how search spammers operate and to suggest how good guys can work together to win the war against the bad guys.”

Search spammers use questionable search-engine-optimization techniques to promote low-quality Web pages into top search results, Wang explains. These attempts waste the time of users, who are conned into visiting junk pages before finding one with useful content.

“In contrast with the common approach to search spam by merely detecting and blacklisting spam pages,” Wang says, “our study pursues a new, ‘follow the money’ strategy by identifying the actual companies and individuals who are involved in the search-spam industry to make money.

“We show that a large part of the search-spam industry is based on advertising syndication, and it can be modeled as a double funnel with five layers. We expose the major players at each level and suggest a more effective anti-spam approach by attacking the bottleneck.”

Consolidating Web Updates

Another way to assist Web users is to make it easier for them to monitor pages they have identified as personally useful. This is the idea behind Homepage Live: Automatic Block Tracing for Web Personalization, a WWW 2007 paper co-written by Jie Han, Dingyi Han, and Yong Yu of Shanghai Jiao Tong University, along with Chenxi Lin, Hua-Jun Zeng, and Zheng Chen of Microsoft Research Asia, and delivered as part of the Personalization session of the conference’s Browsers and User Interfaces track.

“We want to enable Web users to mark blocks in Web pages and trace this block through the life of the Web page,” Chen says. “Our application allows users the freedom to virtually mark any block within a Web page and automatically trace the blocks when the pages change.”

The Homepage Live project works like this: A user selects a section of a Web page to track, and a technique called block tracing keeps that selection updated as the page is updated. The user can collect a number of sections of his or her favorite Web pages and assemble those sections on a customized page, thereby keeping abreast of pertinent information as it is updated.
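As a rough illustration of the idea, the sketch below re-locates a marked block in a newer version of a page by choosing the block whose text is most similar to the one the user marked. It is not the block-tracing algorithm from the paper, which is not described here. The function name `trace_block`, the 0.5 similarity threshold, and the treatment of a page as a plain list of text blocks are all assumptions made for this example.

```python
from difflib import SequenceMatcher

def trace_block(marked_text, new_blocks, threshold=0.5):
    """Return the index of the block most similar to marked_text, or None.

    Hypothetical helper: similarity is plain character-level matching,
    and the threshold is an arbitrary cut-off chosen for illustration.
    """
    best_index, best_score = None, 0.0
    for index, block in enumerate(new_blocks):
        score = SequenceMatcher(None, marked_text, block).ratio()
        if score > best_score:
            best_index, best_score = index, score
    return best_index if best_score >= threshold else None

# An old page version, with the weather block marked by the user.
old_page = ["Site navigation", "Weather: sunny, 18 C in Banff", "Copyright notice"]
marked = old_page[1]

# The page changes: a banner is added on top and the weather text is updated.
new_page = [
    "Breaking news banner",
    "Site navigation",
    "Weather: cloudy, 12 C in Banff",
    "Copyright notice",
]

position = trace_block(marked, new_page)
if position is not None:
    print("Tracked block is now:", new_page[position])
```

A customized page of the kind described above could then be assembled by running the same lookup for every marked block across the user's chosen pages.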

“Our application can enhance the Web experience for users,” Chen explains, “by making browsing more efficient. Users no longer need to visit their favorite Web pages repeatedly. They can just mark blocks within their favorite Web pages and organize those blocks into a single page. With those simple steps, users will be able to follow all their favorite Web pages from a single page.”

Helping Protect Privacy on Social Networks

Then there is the winner of the WWW 2007 Best Paper Award, Wherefore Art Thou R3579? Anonymized Social Networks, Hidden Patterns, and Structural Steganography, part of the WWW 2007 Data Mining track’s Mining in Social Networks session.

The paper’s amusing title masks a serious concern. Some social-network sites on the Web have suggested anonymization of the communications within those networks. Dwork and her Cornell colleagues argue that such efforts would destroy the privacy of participants.

“We described two attacks, one active, one passive,” Dwork says. “The heart of both attacks is to create a small structure in the communication graph that can be recognized. This structure corresponds to a small subgraph, where each vertex is a user account and an edge between vertices indicates communication between the two user accounts.

“Once an attacker has located the structure, she or he can find the connection pattern between any two accounts that are both connected to the structure. For example, a small group of friends can together find out whether Alice and Bob, each of whom is linked to the small group, are in communication with one another.”
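The toy example below illustrates the general shape of such an attack. A few colluding accounts plant a small, recognizable structure, the graph is released with names replaced by random ids, and the colluders then find their structure again and read off whether two targets are linked. It is a simplified sketch, not the construction from the paper: the four-account pattern, the degree values, the brute-force search, and all function names are assumptions chosen so that the structure is unique in this tiny graph.

```python
import random
from itertools import permutations

# Internal edges among the four planted accounts (by position) and the total
# degree of each. The colluders know both because they created the accounts.
# These values are specific to this toy example.
PATTERN = {(0, 1), (1, 2), (1, 3), (2, 3)}
DEGREES = (1, 4, 3, 2)

def anonymize(edges, seed=0):
    """Replace account names with shuffled integer ids (the released graph)."""
    names = sorted({name for edge in edges for name in edge})
    ids = list(range(len(names)))
    random.Random(seed).shuffle(ids)
    label = dict(zip(names, ids))
    return {frozenset((label[a], label[b])) for a, b in edges}

def neighbors(graph, node):
    """All nodes that share an edge with node."""
    return {other for edge in graph if node in edge for other in edge if other != node}

def find_structure(graph):
    """Brute-force search for node tuples matching the planted pattern and degrees."""
    nodes = {n for edge in graph for n in edge}
    size = len(DEGREES)
    matches = []
    for candidate in permutations(nodes, size):
        if any(len(neighbors(graph, candidate[i])) != DEGREES[i] for i in range(size)):
            continue
        internal = {(i, j) for i in range(size) for j in range(i + 1, size)
                    if frozenset((candidate[i], candidate[j])) in graph}
        if internal == PATTERN:
            matches.append(candidate)
    return matches

def targets_linked(graph):
    """Report whether the two targets attached to the structure share an edge."""
    matches = find_structure(graph)
    if len(matches) != 1:
        return None  # structure missing or ambiguous, so the attack fails
    match = matches[0]
    (first,) = neighbors(graph, match[1]) - set(match)   # target tied to account 1
    (second,) = neighbors(graph, match[2]) - set(match)  # target tied to account 2
    return frozenset((first, second)) in graph

# Planted accounts x0..x3, with x1 linked to Alice and x2 linked to Bob.
planted = [("x0", "x1"), ("x1", "x2"), ("x1", "x3"), ("x2", "x3"),
           ("x1", "alice"), ("x2", "bob")]
others = [("alice", "carol"), ("bob", "dave"), ("carol", "dave")]

released = anonymize(planted + others + [("alice", "bob")])
print(targets_linked(released))
```

The search never uses account names, only the shape of the graph, which is why replacing names with random ids does not stop it in this example.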

Such discoveries could wreak havoc on the implied trust social networks seem to offer.

“Our project,” Dwork says, “is on privacy-preserving analysis of data. The goal is to enable site hosts to reveal interesting information about the social-networking graph hosted on their computers without compromising privacy.”

Enabling Non-Readers to Use Computers

Microsoft Research India has been pursuing intriguing work on enabling illiterate or semi-literate persons to make effective use of PCs. A paper by Indrani Medhi, Archana Prasad, and Toyama of Microsoft Research India—called Optimal Audio-Visual Representations for Illiterate Users of Computers, part of the Communication in Developing Regions session of the conference’s Technology for Developing Regions track—marks the latest step in the lab’s research.

“We wanted to find out what was the most comprehensible way to represent concepts to a non-literate person,” Medhi explains. “The project was a careful study comparing a variety of different representational types.

“We tested how health symptoms could best be represented, with a subject group of 200 illiterate people. For each, we randomly selected one representation from among 10: text, static drawings, static photographs, hand-drawn animations, and video, each with and without voice annotation.”
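The count of 10 follows from crossing five media types with two voice settings. The sketch below enumerates those combinations and draws one at random for each of 200 participants. How the study actually performed its randomization is not described here, so the function name `assign_representations`, the participant ids, and the fixed seed are assumptions for illustration only.

```python
import random
from itertools import product

MEDIA = ["text", "static drawing", "static photograph", "hand-drawn animation", "video"]
VOICE = [False, True]  # without and with voice annotation

# 5 media types x 2 voice settings = 10 representations
REPRESENTATIONS = list(product(MEDIA, VOICE))

def assign_representations(participant_ids, seed=0):
    """Hypothetical helper: pick one representation at random per participant."""
    rng = random.Random(seed)
    return {pid: rng.choice(REPRESENTATIONS) for pid in participant_ids}

assignments = assign_representations(range(200))
print(len(REPRESENTATIONS), "representations,", len(assignments), "participants")
```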

The results of the study were interesting:

  • Voice annotation helped users understand more quickly, but the target population was sometimes confused by the combination of audio and visual information.
  • Richer information was not necessarily better understood.
  • Various factors influence the comparative effectiveness of dynamic versus static images.

“We hope that the results of our research will improve the design of user interfaces for illiterate and first-time computer users,” Medhi concludes. “The results of this study would apply to help make any Web site comprehensible to illiterate users.”
