Keeping E-Mail Safe: Microsoft Co-Sponsors Conference on E-Mail and Anti-Spam


By Rob Knies, Managing Editor, Microsoft Research

Joshua Goodman’s grandfather recently got himself a new computer. He’s a medical writer who had been accustomed to typing his columns and mailing them to his publishers. But the publishers increasingly began to ask that he submit his material via e-mail, so he went out and purchased a new PC.

Joshua Goodman’s grandfather is 90 years old.


Such stories—we’ve all heard them—illustrate the importance society has come to place on e-mail in the Internet age. They also underscore the importance of ensuring that e-mail is easy to use and free of irritants such as spam and phishing attacks, which often target the elderly with their scams.

Goodman, a researcher for the Machine Learning and Applied Statistics (MLAS) Group within Microsoft Research, is among those committed to combating spam and phishing. He is serving as general conference chair for the second Conference on Email and Anti-Spam (CEAS), scheduled for July 21-22 at Stanford University, in cooperation with the International Association for Cryptologic Research and the Institute of Electrical and Electronics Engineers’ Technical Committee on Security and Privacy. Microsoft is a co-sponsor of the conference.

“E-mail has turned into this thing that everyone uses,” Goodman said. “It’s a large driving force in people’s lives. It’s key for e-commerce, but flaws in it also create huge vulnerabilities. It’s an extremely important area, and there’s a lot of interest and a lot of different areas to explore.”

The first CEAS, also co-sponsored by Microsoft, was held in Mountain View, Calif., in July 2004, with David Heckerman, MLAS research area manager for Microsoft Research, as general conference chair and Goodman acting as program co-chair. Heckerman also co-authored one of the seminal papers on statistical spam filtering.

“E-mail is the No. 1 application that people use, and spam continues to be a major problem with e-mail,” Goodman said. “It’s amazing there wasn’t a conference like this sooner.”

Microsoft authors wrote seven of the 26 papers accepted for this year’s conference, and they’ll be presenting their findings to a gathering significantly different from many such events.

“Most computer-science conferences are mainly attended by researchers,” Goodman said. “We’re making this one a little more interdisciplinary. We’ll have a lot of people from industrial research labs and a lot of academic researchers, and we’ll have people from companies that build spam-fighting software.

“That’s what’s great about this conference: You get to present to people who can actually use your ideas, and you get feedback from people who have experience actually deploying these systems.”

The Microsoft papers accepted for the conference include:

  • Automatic Discovery of Personal Topics to Organize Email, by Arun Surendran, John Platt, and Erin Renshaw: This paper describes a procedure to discover a user’s personal topics by clustering the user’s e-mail. Topics are labeled automatically with keywords chosen using domain knowledge about e-mail and about the user’s workplace. An e-mail/document browser uses these keywords as standing queries to create virtual folders that organize, index, and retrieve e-mail efficiently.
  • Searching For John Doe: Finding Spammers and Phishers, by Aaron Kornblum: Researchers from MSR are not the only ones presenting during this conference. Kornblum is an attorney at Microsoft who works on fighting spam. This paper describes some of Microsoft’s efforts to find spammers, many of whom work hard to be anonymous. By subpoenaing information and “following the money,” even clever spammers often can be tracked down, the first step in a successful lawsuit.
  • The Social Network and Relationship Finder: Social Sorting for Email Triage, by Carman Neustaedter (of the University of Calgary), A.J. Bernheim Brush, Marc A. Smith, and Danyel Fisher: E-mail triage is the process of examining unhandled e-mail and deciding what to do with it. Studies have found that people use a variety of approaches to triage their e-mail, many of which have a social component. The Social Network and Relationship Finder aggregates social meta-data about e-mail correspondents to help people sort their mail faster.
  • Computers Beat Humans at Single Character Recognition in Reading Based Human Interaction Proofs (HIPs), by Kumar Chellapilla, Kevin Larson, Patrice Simard, and Mary Czerwinski: Human interaction proofs (HIPs) are challenges designed to be easy for humans to solve but too hard for computers. They have become commonplace on the Internet for protecting online services, such as free e-mail systems, from abuse by automated scripts and bots. To solve a HIP, a computer must handle both the segmentation problem (finding where the letters are) and the recognition problem (reading the individual letters). This paper compares human and computer single-character recognition abilities and demonstrates that computers are as good as or better than humans at single-character recognition in HIPs. Using this knowledge, the researchers hope to build better HIP systems, in part by making the segmentation problem harder.
  • Implicit Queries for Email, by Joshua Goodman and Vitor Carvalho: E-mail is the No. 1 application that people use; search is the other leading one. How can the two be combined? This paper tries to automate the process of finding keywords in e-mail to send to a search engine, making search easier for users. The paper shows how to use machine-learning methods to learn what kinds of words are most likely to be relevant. One key idea in the paper is to look at query logs from MSN Search: the words and phrases people have searched for in the past are the ones they will want to search for in the future.
  • Good Word Attacks on Statistical Spam Filters, by Daniel Lowd (of the University of Washington) and Christopher Meek: It has been known for several years that most spam filters are susceptible to “good word attacks,” in which words typically found in good (non-spam) e-mail messages are added to a spam message to trick a filter. This paper carefully examines how well these attacks work and looks for ways to build more robust filters. Unfortunately, the attacks are powerful, and the best defense found is simply to retrain and deploy new filters quickly when under attack.
  • Forward Thinking, by Marc A. Smith, Jeff Ubois (of UC Berkeley), and Ben Gross (of the University of Illinois at Urbana-Champaign): There has been little research published on e-mail forwarding behavior despite its implications for security, knowledge management, and the design of e-mail interfaces. This paper examines the decisions made in forwarding, reading, and acting on e-mails depending on the credibility of the sender’s reputation.
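
To make the idea behind a good word attack concrete, here is a minimal sketch of a toy statistical filter and an attacker. It is not the method or the filters evaluated by Lowd and Meek. The training messages, the `NaiveBayesFilter` class, and the `good_word_attack` function are hypothetical, and the attacker is assumed to be able to read the filter's per-word scores (a simplifying "white-box" assumption). The filter is a multinomial naive Bayes classifier with add-one smoothing that sums per-word log-likelihood ratios.

```python
import math
from collections import Counter

# Hypothetical toy training data, for illustration only.
SPAM = ["cheap pills buy now", "buy cheap watches now", "win money now"]
HAM = [
    "meeting agenda for tuesday",
    "project report attached for review",
    "lunch meeting tomorrow with team",
]


class NaiveBayesFilter:
    """Multinomial naive Bayes spam filter with add-one (Laplace) smoothing."""

    def __init__(self, spam_docs, ham_docs):
        self.spam_counts = Counter(w for d in spam_docs for w in d.split())
        self.ham_counts = Counter(w for d in ham_docs for w in d.split())
        self.vocab = set(self.spam_counts) | set(self.ham_counts)
        self.spam_total = sum(self.spam_counts.values())
        self.ham_total = sum(self.ham_counts.values())
        # Log prior odds of spam vs. ham, estimated from document counts.
        self.prior = math.log(len(spam_docs) / len(ham_docs))

    def word_score(self, word):
        """Log-likelihood ratio log P(word|spam) - log P(word|ham)."""
        v = len(self.vocab)
        p_spam = (self.spam_counts[word] + 1) / (self.spam_total + v)
        p_ham = (self.ham_counts[word] + 1) / (self.ham_total + v)
        return math.log(p_spam) - math.log(p_ham)

    def score(self, message):
        # Positive means "more likely spam"; unseen words are ignored.
        return self.prior + sum(
            self.word_score(w) for w in message.split() if w in self.vocab
        )

    def is_spam(self, message):
        return self.score(message) > 0


def good_word_attack(filt, message):
    """Append the most ham-indicative vocabulary words until the filter says ham.

    Assumes (hypothetically) that the attacker can read filt.word_score.
    """
    good_words = sorted(filt.vocab, key=filt.word_score)  # most "hammy" first
    added = []
    for word in good_words:
        if not filt.is_spam(message):
            break
        message += " " + word
        added.append(word)
    return message, added


if __name__ == "__main__":
    filt = NaiveBayesFilter(SPAM, HAM)
    msg = "buy cheap pills now"
    print(filt.is_spam(msg))  # True
    attacked, added = good_word_attack(filt, msg)
    print(filt.is_spam(attacked), len(added))  # False 7
```

Because each appended good word subtracts a fixed amount from the spam score, a handful of them can outweigh the spammy words without changing the spam content at all. Retraining on attacked messages labeled as spam, the defense the paper found most effective, raises the spam counts of those padding words and so pushes their scores back toward spam.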

Eric Bosco, vice president of Communications and Community Engineering for AOL, and Peter Neumann, principal scientist for the Principled Systems Group within the SRI International Computer Science Lab, are the invited speakers for CEAS 2005.

Microsoft employees serving on the CEAS program committee are Heckerman; Eric Horvitz, senior researcher and group manager of the Adaptive Systems and Interaction Group within Microsoft Research; Geoff Hulten, a researcher in the Anti-Spam Technology and Strategy Group; and Platt, a senior researcher in the Microsoft Research Knowledge Tools Group.
