Microsoft Research Blog


Find Your Lost Data

November 12, 2004 | By Microsoft blog editor

By Suzanne Ross, Writer, Microsoft Research

The more data you have, the more you know. The more you know, the more you forget. The more you forget, the less you know. So why have data?

Microsoft Researchers have an answer for this old, slightly twisted riddle. They’ve put together a nifty interface that will find all the data on your PC that you need, be it email, documents, tablet notes or spreadsheets. You can find all the data that people have sent to you, all the Web pages you’ve ever seen, and all the attachments you’ve ever forgotten to save.

You don’t have to remember where you put stuff, or even exactly what that stuff is. The program, called Stuff I’ve Seen (SIS), indexes everything you care about on your hard drive and in your email. It sorts results by date or by type, and lets you filter and refine your search.

“Several years ago Susan Dumais and I realized that the technology existed at Microsoft to do high quality search on all of your stuff, but no one’s done it. So we just did it as a proof of concept. We wanted to index all of your life on the computer,” said Ed Cutrell, one of the researchers on the project.

“When we first developed SIS, we used some classic techniques from information retrieval called ‘best match score.’ We ranked all of the results by that score.”

“It became clear that this just wasn’t enough. The reason why is, when it’s your own stuff you have all of these other, better cognitive associations that help you remember things. We found the date is far and away the most popular sort order. If you try to sort by date on the Web it’s going to be meaningless to you,” said Cutrell. In contrast, people often know lots of details about their own stuff and remember associations with other things in their lives.
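To make the difference between the two sort orders concrete, here is a minimal sketch in Python. It is not the SIS implementation: the article does not give the scoring formula, so a simple TF-IDF sum stands in for the “best match score,” and the items, dates and function names are all made up for illustration.

```python
import math
from collections import Counter
from datetime import date

# Hypothetical mini-index: (title, text, date last seen). Not real SIS data.
ITEMS = [
    ("Product line memo", "new product line launch plan product", date(2004, 9, 30)),
    ("Vacation request", "vacation dates approved", date(2004, 10, 4)),
    ("Launch budget", "budget for the product launch", date(2004, 11, 1)),
]

def score(query, text, all_texts):
    """Simple TF-IDF sum, used here as a stand-in for a best-match style score."""
    words = text.split()
    tf = Counter(words)
    total = 0.0
    for term in query.split():
        # Number of documents that contain the term at all.
        df = sum(1 for t in all_texts if term in t.split())
        if df == 0:
            continue
        idf = math.log(1 + len(all_texts) / df)
        total += (tf[term] / len(words)) * idf
    return total

def search(query, items):
    """Return (score, title, date) for every item that matches the query."""
    texts = [text for _, text, _ in items]
    hits = [(score(query, text, texts), title, seen) for title, text, seen in items]
    return [h for h in hits if h[0] > 0]

hits = search("product", ITEMS)

# Classic retrieval order: highest score first.
by_score = [title for _, title, _ in sorted(hits, key=lambda h: h[0], reverse=True)]

# The order people preferred for their own stuff: most recent first.
by_date = [title for _, title, _ in sorted(hits, key=lambda h: h[2], reverse=True)]

print(by_score)
print(by_date)
```

The same two hits come back either way; only the order changes. For your own stuff, the date order lines up with what you remember about when you saw something.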

For instance, if you’re looking for an email from your boss about the new product line, you might remember that he sent it sometime before you went on vacation five weeks ago. Using SIS, you could search on the name of your boss and quickly refine the search by the date or other memorable landmarks.

SIS is different from Web search in that it’s easy to filter or pivot after the initial search. The Web searches so many documents that search engines have to ask for lots of information up front. The problem with that is, you don’t always know exactly what you’re looking for. You may just have a vague idea that you need some information that was somewhere in an email or document.

With SIS, you can type in your best guess, such as ‘set up a blog,’ and then you can refine the search, filtering by the type of document, from an Excel spreadsheet to a PowerPoint file to a music file and more. You can further refine by date, rank, author or other properties that you remember about the document.
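The search-first, refine-later flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Item` fields, the sample index, and the `search` and `refine` helpers are hypothetical and do not reflect how SIS stores or queries its index.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Item:
    """One indexed thing: an email, a document, a Web page, and so on."""
    title: str
    kind: str
    author: str
    seen: date
    text: str

# Hypothetical index contents, invented for this example.
INDEX = [
    Item("How to set up a blog", "web page", "unknown", date(2004, 8, 2),
         "how to set up a blog in ten minutes"),
    Item("Blog plan", "email", "boss", date(2004, 10, 1),
         "we should set up a blog for the team"),
    Item("Blog budget", "spreadsheet", "boss", date(2004, 10, 20),
         "costs to set up a blog and host it"),
    Item("Quarterly numbers", "spreadsheet", "finance", date(2004, 9, 1),
         "quarterly revenue numbers"),
]

def search(index, query):
    """Broad first pass: keep items whose text contains every query word."""
    terms = query.lower().split()
    return [i for i in index if all(t in i.text.lower().split() for t in terms)]

def refine(results, kind=None, author=None, since=None, until=None):
    """Narrow an existing result list by whatever properties you remember."""
    out = results
    if kind is not None:
        out = [i for i in out if i.kind == kind]
    if author is not None:
        out = [i for i in out if i.author == author]
    if since is not None:
        out = [i for i in out if i.seen >= since]
    if until is not None:
        out = [i for i in out if i.seen <= until]
    return out

results = search(INDEX, "set up a blog")
spreadsheets = refine(results, kind="spreadsheet")
from_boss_early = refine(results, author="boss", until=date(2004, 10, 10))

print([i.title for i in spreadsheets])
print([i.title for i in from_boss_early])
```

The point of the design is that the first query can be vague. Each refinement works on the results you already have, so you never have to restate the search from scratch.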

SIS can also reduce the need for bookmarks and folder organization. Studies have found that 70% of the Web sites we go to are Web sites we’ve gone to before. Finding them again can be tricky and time-consuming. You have to either maintain an extensive file system, or hope that you can remember the exact search keyword you used before to find them again. SIS just automatically saves the Web pages you’ve gone to and adds them to your index.

A few people have wondered if SIS exposes their documents to a ‘big brother.’ No, no need to worry about big bro’. SIS only indexes the documents that are already on your local hard drive and in your mail. The only way for someone to get to your data is if they hack your computer, steal your password, or you let them in.

Stuff I Should See

Cutrell and the team at Adaptive Systems and Interaction have added another feature to SIS that helps those of us who don’t know what we know. It’s called Implicit Query, or SIS IQ for short. SIS IQ finds things that we didn’t even realize we needed.

If you’re working on an email or a document, SIS IQ will search your index for information related to that document. You may be responding to a request from someone, and you forgot you had already sent over a thick set of attachments on this same subject to another friend three months ago. SIS IQ will find it for you, and display it unobtrusively on a sidebar next to your work area.
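One way to picture an implicit query is as a search that the program writes for you from whatever you are working on. The Python sketch below assumes a very simple approach, picking the most frequent non-trivial words in a draft and looking for indexed items that share them. The article does not say how SIS IQ chooses its terms, so the stopword list, the sample index and both function names are assumptions made for this example.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a larger one.
STOPWORDS = {"the", "are", "you", "for", "here", "and", "what", "this", "that"}

def key_terms(text, n=2):
    """Pick the n most frequent words that are not stopwords or very short."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(n)]

def implicit_query(draft, index, limit=3):
    """Return titles of indexed items that share key terms with the draft."""
    terms = set(key_terms(draft))
    scored = []
    for title, text in index.items():
        overlap = len(terms & set(re.findall(r"[a-z]+", text.lower())))
        if overlap:
            scored.append((overlap, title))
    # Most shared terms first; ties broken alphabetically by title.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [title for _, title in scored[:limit]]

# Hypothetical index: title -> text of something already seen.
INDEX = {
    "Pricing deck sent to Alex": "attached are the pricing slides and notes",
    "Lunch plans": "pizza on friday",
    "Pricing question": "what is the pricing for the old model",
}

draft = ("Here are the pricing slides you asked for. "
         "The pricing slides cover the new product.")

related = implicit_query(draft, INDEX)
print(related)
```

Nothing here requires you to type a query. The draft itself supplies the terms, and the related items could then be shown in a sidebar while you keep writing.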

Some people are perfectly happy filing and categorizing their stuff. They have nested folders within nested folders. But for the rest of us, SIS offers a way to just throw everything in one big pile and forget about it. SIS will find it when you need it again.
