Data Management, Exploration and Mining (DMX)

Established: March 27, 2000

Overview

The Data Platforms and Analytics pillar currently consists of the Data Management, Exploration and Mining (DMX) group, which focuses on solving key problems in information management. Our current areas of focus are infrastructure for large-scale cloud database systems, reducing the total cost of ownership of information management, enabling flexible ways to query, browse and organize rich data sets containing both structured and unstructured data, and the management of database schemas and mappings.

Our research focuses on projects that produce practical software. Our software has shipped in many Microsoft products and services, including the Database Tuning Advisor (in SQL Server), the Fuzzy Lookup and Fuzzy Grouping operators (in Microsoft SQL Server Integration Services (SSIS), and also used in Bing Maps and Bing Shopping), the mapping compiler for Microsoft’s ADO.NET Entity Framework, the schema-matching algorithm in Microsoft’s BizTalk Mapper, click-through prediction in Bing search, and the advertisement indexing engine in search advertising, among others.

Our research has also had significant impact in the academic community. We publish in the top conferences in the areas of systems, information retrieval, and database management (SIGMOD, VLDB, SIGKDD, SIGIR, WWW, ICDE, CIDR, etc.). Our work has earned two VLDB 10-Year Best Paper Awards, Best Paper awards at SIGMOD, VLDB, ICDE and CIDR, and an ICDE Influential Paper Award.

News

Sudipto Das has won the 2013 SIGMOD Jim Gray Doctoral Dissertation Award, honoring the best PhD thesis in database systems of the past year, for his thesis titled “Scalable and Elastic Transactional Data Stores for Cloud Computing Platforms,” at U.C. Santa Barbara. Herodotos Herodotou received an Honorable Mention for the ACM SIGMOD 2013 Dissertation Award, in recognition of his dissertation titled “Automatic Tuning of Data-Intensive Analytical Workloads”.

Projects

Concept Expansion

Established: November 10, 2014

Given a concept name and seed entities, return entities and tables in this concept.

Query Result Navigation

Established: January 9, 2014

Exploratory queries on a database often return too few or too many results (e.g., a home search query on a database of available homes). In such cases, the user faces the challenges of (i) navigating through too many results and/or…

Synonym Mining

Established: January 7, 2014

The same entity is often referred to in a variety of ways. For example, the camera Canon 600d is also referred to as "canon rebel t3i", the celebrity Jennifer Lopez is also referred to as "jlo" and Seattle Tacoma International…

Rethinking Eventual Consistency

Established: July 31, 2013

The past five years have seen a resurgence of work on replicated, distributed database systems, to meet the demands of intermittently-connected clients and disaster-tolerant database systems that span data centers. Each product or prototype uses a weakened definition of replica-consistency…

SQLVM: Performance Isolation in Multi-Tenant Relational Database-as-a-Service

Established: February 14, 2013

Multi-tenancy and resource sharing are essential to make a Database-as-a-Service (DaaS). However, resource sharing usually results in the performance of one tenant’s workload being affected by other co-located tenants. In the SQLVM project, our approach to performance isolation in…

Hyder, a transactional indexed-record manager for shared flash

Established: February 8, 2013

Hyder is a transactional indexed-record manager for shared flash. That is, it supports operations on indexed records and transaction operations that bracket the record operations. It is designed to run on a cluster of servers that have shared access to…

Entity Search and Query Portals

Established: March 20, 2011

The goal of entity search is to return entities (e.g., people, products, locations) relevant to a keyword query. The goal of Query Portals is to go one step further and return not only the names of relevant entities but a…

Blews – what the blogosphere tells you about news

Established: February 18, 2008

While typical news-aggregation sites do a good job of clustering news stories according to topic, they leave the reader without information about which stories figure prominently in political discourse. BLEWS uses political blogs to categorize news stories according to their…

Data Exploration

Established: June 8, 2004

This project focuses on novel ways to query, browse, extract, explore, mine and manage various kinds of data residing within the enterprise and on the web: structured data in relational databases, tabular data embedded in web pages, enterprise documents and…

Data Cleaning

Established: July 1, 2002

Poor data quality is a well-known problem in data warehouses that arises for a variety of reasons such as data entry errors and differences in data representation among data sources. For example, one source may use abbreviated state names while…

Data Mining

Established: November 2, 2001

The Knowledge Discovery and Data Mining (KDD) process consists of data selection, data cleaning, data transformation and reduction, mining, interpretation and evaluation, and finally incorporation of the mined "knowledge" with the larger decision-making process. The goals of this…

AutoAdmin

Established: November 2, 2001

Database management systems provide functionality that is central to developing business applications. Therefore, database management systems are increasingly being used as an important component in applications. Yet, the problem of tuning database management systems for achieving required performance is significant,…

Visitors and Interns

2016

Interns

Zhao Chang

Hi, everyone. My name is Zhao Chang. I am a second-year PhD student at the University of Utah. My advisor is Prof. Feifei Li. My research interests include large-scale data management and data privacy. I like doing sports and reading...

Amit Chavan

Hi, I am Amit Chavan and I am a PhD student working on large-scale data management at the University of Maryland, College Park. My research is about building sustainable and scalable tools for data analysts when they interact with data. I have worked on problems related to version control of large datasets and processing queries on said versioned data. Besides database research, I enjoy sci-fi (in any form :) ) and photography. More info: http://www.cs.umd.edu/~amitc/.

Yeounoh Chung

Hello, I am Yeounoh Chung, a PhD student from Brown. My general research interests span a variety of topics in data exploration; at Brown, my work has been on quantifying uncertainty in data exploration. At MSR, I will be working closely with Christian Konig and Wentao Wu. For more information, please feel free to visit my page at https://cs.brown.edu/~yeounoh/.

Mohammad Dashti

I’m a 4th-year PhD student at EPFL in Switzerland, advised by Prof. Christoph Koch. I am originally from Iran, where I got my bachelor’s and master’s degrees, both from Sharif University of Technology in Tehran. I’m interested in database systems (in particular, transaction processing) and programming languages (in particular, compilation techniques). Achieving high throughput for transaction processing with low latency while keeping the state strongly consistent is a hard problem. In my PhD, I am pushing the limits on this problem, mainly by applying compilation techniques in this context.

Kolya Malkin

I’m a second-year PhD student at Yale University. My Bachelor’s degree is from the University of Washington. Although most of my waking hours are spent studying algebraic geometry and graph theory, you’re also likely to find me in a coffee shop or at the top of a mountain.

Lukas Maas

My name is Lukas Maas and I’m a second-year PhD student at Harvard University, where I am advised by Stratos Idreos. My research interests lie at the intersection of databases, compilers and software engineering. In particular, I am interested in how computer programs can help us design more robust and flexible data processing systems. To learn more about my research, please see my website: www.lukasmaas.com.

Abolfazl Asudeh Naee

I am a CS Ph.D. candidate at the University of Texas at Arlington and a member of UTA-DBXLAB, supervised by Dr. Gautam Das. My research interests include Query Reformulation, Top-k Indices, and Hidden Web Databases.

Volleyball is what I love to play. More info: http://asudeh.github.io.

Azade Nazi

I am a 5th-year PhD student in the Database Exploration Lab (DBXLAB) at the University of Texas at Arlington. I am interested in different areas like Data Mining, Information Retrieval & Web Mining, Hidden Graph, Database, and Social Network Analysis. This is my second internship with the DMX group and I am really excited about it. In my free time, I like to play volleyball, ping pong, badminton, or attend group exercise classes.

Tim Kiefer

I finished my PhD in the Database Systems Group at TU Dresden, Germany last October… and afterwards immediately went to New Zealand for three months to enjoy the beautiful countryside while travelling by (and living in) a car. When I am not pursuing my research in the areas of load balancing, workload placement, or distributed data management systems in general, I love trampoline gymnastics and rock climbing (actually anything outdoors or sports related).

Yi Lu

I am a first-year PhD student from MIT. I am working with Prof. Sam Madden on adaptive data partitioning. I obtained my master’s degree from the Chinese University of Hong Kong with a focus on distributed graph processing systems. Besides research, I enjoy hiking, playing ping-pong and watching movies.

Bruhathi Sundarmurthy 

Hello, my name is Bruhathi. I am a PhD student at the University of Wisconsin-Madison. I am interested in both database theory and database systems. At school I work on problems related to uncertain and incomplete databases, and at MSR I will be working on offline query scheduling. Besides work, I love watching and playing tennis, and I also enjoy playing the violin.

Yue Wang

Hi, I’m a PhD candidate at the University of Massachusetts Amherst, supervised by Prof. Gerome Miklau and Prof. Alexandra Meliou. I focus on data cleaning and completion time estimation.

I love swimming and hiking. Reading and movie watching also bring me a lot of fun. More about me? https://people.cs.umass.edu/~yuewang/.

Yudian Zheng

Hi, all! My name is Yudian Zheng, a 3rd-year Ph.D. candidate from the University of Hong Kong. My research interests include a variety of topics such as leveraging human intelligence to solve complex tasks (crowdsourcing), cleaning dirty data (data cleaning) and mining patterns from web data (data mining).

Besides research, I love watching sports games (especially football) and films. I also like playing computer games with Chinese martial characters. You may refer to my website (http://i.cs.hku.hk/~ydzheng2/) to know more about me.

Erkang (Eric) Zhu

I am a PhD student in Computer Science at the University of Toronto under the supervision of Professor Renée J. Miller.

I am interested in data management (searching, integration, and analytics) techniques for data on the Web and Open Data. I enjoy programming and maintain a number of open source projects.

In addition to computer related activities, I love traveling, cooking and hanging out with friends and family.

2015

Ahmed El-Kishky

I am a second-year PhD student in the Data Mining Group at the University of Illinois at Urbana-Champaign where I’m advised by Professor Jiawei Han. Before joining UIUC, I obtained my Bachelor’s degree in Computer Science and Mathematics from The University of Tulsa.

Generally I’m interested in unstructured data mining, more particularly deriving insight and uncovering hidden structure from large quantities of unstructured text.

In my spare time I enjoy playing racquetball, going to the gym, hiking, traveling to new places, and of course reading.

Anja Gruenheid

I’m a third year PhD student at ETH Zurich, Switzerland, supervised by Donald Kossmann. My research focus is on data management and data integration, specifically on how data changes affect integration tasks. Here at MSR, I work in the related area of data cleaning.

I like to travel a lot and I’m an enthusiastic photographer. I’m also a novice foodie who’s just learning to appreciate all the great cuisines out there.

Ashish Tapdiya

I am Ashish. I am currently a PhD student at Vanderbilt University, where I work with Dan Fabbri. This summer, I am interning in the DMX group and will be working with Vivek Narasayya to automate the performance management of SQL Server in the cloud. In my free time I like to hike, bike, watch movies, play tennis, etc.

Azade Nazi

I am a 4th-year PhD student in the Database Exploration Lab (DBXLAB) at the University of Texas at Arlington, under the supervision of Dr. Gautam Das. I am interested in different areas like Data Mining, Information Retrieval & Web Mining, Hidden Graph, Database, and Social Network Analysis. In my free time, I like to play volleyball, ping pong, badminton, or attend group exercise classes.

Fotios Psallidas

Hello, everyone. I’m a third-year PhD student at Columbia University under the supervision of Prof. Luis Gravano. My research interests include (near) real-time structured knowledge discovery and exploration from noisy and possibly disparate sources. In my second internship at the DMX group I will be working with Surajit Chaudhuri and Vivek Narasayya on exciting and challenging problems of enterprise-centric knowledge discovery and searching.

More details on my research interests @ http://www.cs.columbia.edu/~fotis/

Keqian Li

I’m Keqian Li, a master’s student at the University of British Columbia interested in both the analytics and systems sides of big data. My advisor is Prof. Laks Lakshmanan. You can get a sample of my research interests from my homepage: http://www.cs.ubc.ca/~keqianli/. My industry internship experience has often involved large-scale data analysis. This summer at MSR I will be working with Kris and Yeye in the area of data cleaning. I like sports, good food, and traveling.

Mayuresh Kunjir

Hi everyone! I’m Mayuresh Kunjir, a third-year PhD student at Duke University, advised by Dr. Shivnath Babu. At Duke, I work in big data analytics, specifically on cluster resource allocation and job scheduling. Here at Microsoft, I will be working on automated physical database tuning. I enjoy playing volleyball and tennis in my spare time. On weekends here, I would love to go hiking nearby.

Shumo Chu

I am a PhD student from the Database Group at the University of Washington, Seattle. I am working with Dan Suciu and Magdalena Balazinska on parallel query processing. I have also worked with the Spanner group at Google as an intern.

I enjoy dining around Seattle and indoor climbing/bouldering with friends in my spare time.

Silu Huang

My name is Silu Huang. I’m a first-year PhD student at UIUC, under the supervision of Prof. Aditya Parameswaran. My research interest lies in data analytics and data management. I obtained my master’s degree from the Chinese University of Hong Kong with a focus on graph algorithms. In my spare time, I like doing sports, reading books and watching movies.

Vasileios Verroios

Hi everyone! I am a PhD student at Stanford University and my advisor is Hector Garcia-Molina. My research lies at the intersection of crowdsourcing and data integration/exploration/mining.

Xu Chu

Hello, everyone. My name is Xu Chu. I am a PhD student in the Database Group at the University of Waterloo. My advisor is Ihab Ilyas. My research focuses on various aspects of data quality management. Example topics are entity resolution, data quality rules discovery and enforcement, human-involved data cleaning, and scalable data cleaning. In my free time, I like to watch some TV, hit the gym, and play badminton. This is my second internship with the DMX group, working with Yeye He!

My homepage is: https://cs.uwaterloo.ca/~x4chu/

2014

Interns

Bailu Ding

I am a PhD student from the Computer Science Department at Cornell. I am working with Johannes Gehrke on database systems.

Xu Chu

I am currently a 3rd year PhD student in Database Group at University of Waterloo, Canada. I am working with Prof. Ihab Ilyas. I am generally interested in structured data management. Recently I have been focusing on data cleaning, schema discovery, and data integration. My homepage is: https://cs.uwaterloo.ca/~x4chu/

In my free time, I like to watch some television, play ping pong, badminton, or hit the gym.

Yuliang Li

My name is Yuliang Li. I am a 2nd-year PhD student at UC San Diego. At UCSD, I am working with Alin Deutsch and Victor Vianu on databases and database theory. I also enjoy solving puzzles, formal logics and home cooking.

Jennifer Ortiz

My name is Jennifer Ortiz. I just finished my second year as a PhD student at the University of Washington, advised by Magdalena Balazinska. The main focus of my work is thinking about new ways to help users choose a Cloud service when they wish to explore their data. On the data science side, I have also been involved in collaborating with astronomers to provide the tools needed to help them analyze and understand the merging history behind the galaxies that exist today. Other things I enjoy doing: drinking coffee, watching movies and spending time with my family.

Fotis Psallidas

My name is Fotis Psallidas and I am a second-year PhD student at Columbia University, working with Prof. Luis Gravano. Research-wise, I am interested in combining disparate sources with the goal of extracting interesting patterns. Sometimes I just give up waiting for exact solutions and I try to approximate them. Besides research, I spend time walking, watching movies and TV series, drinking coffee, and going places.

Xiang Ren

I’m currently a 2nd-year PhD student in the Data Mining Group at the University of Illinois at Urbana-Champaign, working with Prof. Jiawei Han. Before joining UIUC, I got my bachelor’s degree in Computer Science from Zhejiang University, China. My research mainly focuses on mining and constructing text-rich information networks, including applications like search, recommendation and structure enrichment in heterogeneous information networks. My home page is: http://web.engr.illinois.edu/~xren7/

In my spare time, I like to play basketball and foosball, go to the gym, travel around and do some hiking. I’m also a food lover who will check out local restaurants for all kinds of great food :).

Jayanta Mondal

I am a fourth-year PhD student from the University of Maryland. I am a student of Prof. Amol Deshpande and I work in the area of processing real-time queries on large-scale graph-structured data. This is my second internship at DMX/MSR and I will be exploring physical database design with Sudipto Das this summer. Besides computer science, I enjoy trying out different types of sports, a recent addition being boxing. I also like travelling (visited 12 national parks so far), photography, and anything related to food (from exploring new ingredients to doing online courses).

Saravanan Thirumuruganathan

My name is Saravanan Thirumuruganathan (you can also call me Sara). I am a fourth-year PhD student from University of Texas at Arlington. My advisor is Prof. Gautam Das. I am interested in data exploration, analytics over hidden web databases and social content mining. In my spare time, I love reading books, writing code (and poems!) and taking MOOC courses from other fields. I’m excited to be a part of MSR this summer and sitting in 112/3325.

Jingjing Wang

I am a 3rd-year PhD student at the University of Washington, working with Professor Magdalena Balazinska on databases. Before joining UW, I obtained my bachelor’s degree in Computer Science from Fudan University, China. I also interned at Microsoft Research Asia in 2010, working on web data extraction with Haixun Wang. My research interest generally lies in the area of database systems.

In my free time, I enjoy listening to various kinds of music, watching anime and reading novels. I’m also a fan of sports: I play basketball, Ping-Pong, and other ball games, and I also go hiking sometimes.

2013

Visitors

Eli Cortez

I am a Visiting Researcher at the Data Management, Exploration and Mining (DMX) group, which is part of the eXtreme Computing Group (XCG) of Microsoft Research. I received my Ph.D. degree in Computer Science from Federal University of Amazonas/Brazil in December 2012. My dissertation work titled “Unsupervised Information Extraction by Text Segmentation” was awarded by the Brazilian Computer Society as the Best PhD Thesis defended in 2012. During my PhD, I founded a startup with some friends that provides e-commerce technologies such as search, classification and extraction. My broad research interest lies in the area of databases and data mining, more specifically, data exploration and information extraction.

Interns

Renata Borovica

I am a PhD student in the Data-intensive Applications and Systems Laboratory (DIAS) at Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland, supervised by Professor Anastasia Ailamaki. Before joining EPFL, I worked for several years for an IT company as a member of a database team, while obtaining my Master’s degree from the school of Computer Science and Automatics at the University of Novi Sad, Serbia.

My research interests span the area of autonomous database management systems and scientific data management. In particular, I find my passion in the topic of robust query processing. Toward that end, I envision database systems as adaptive systems with the ability to heal themselves in order to provide robust and predictable query execution performance.

I like spending my free time outdoors, either cycling, running, hiking, or simply enjoying a wonderful view. I am also a yoga and karate fan.

More information can be found at http://people.epfl.ch/renata.borovica/bio?lang=en&cvlang=en

Fabian Hüske

Hi, my name is Fabian. I am a PhD student in the Database Systems and Information Management (DIMA) group at Technische Universität Berlin working with Volker Markl. I received a master’s degree in computer science from the University of Ulm, Germany. My research interests include massively parallel data processing and query optimization. Apart from research I like to spend time with my kids, try out new recipes, and play field hockey. Visit my website for more details: http://www.user.tu-berlin.de/fabian.hueske/.

Feng Li

I am currently a 4th-year PhD student in the Department of Computer Science, National University of Singapore. I was honored to join the database group in 2009, supervised by Prof. Ooi Beng Chin. From 2005 to 2009, I studied at Peking University, Beijing, China and obtained my BSc degree from the Department of Electronic Engineering and Computer Science. This is my third internship in the DMX group and I am proud of this. My research interests are mainly in MapReduce, cloud computing and database systems, including indexing and query processing. I am also interested in microblog data processing.

In my free time, I like to play Ping-Pong and swim. I also started to play tennis recently. My home page is http://www.comp.nus.edu.sg/~li-feng/.

Lanyue Lu

I’m a PhD student of Computer Sciences at the University of Wisconsin – Madison. I work under the guidance of Prof. Andrea C. Arpaci-Dusseau and Prof. Remzi H. Arpaci-Dusseau, as a member of ADvanced Systems Lab (ADSL) and Wisconsin Institute on Software-defined Datacenters in Madison (WISDoM).

My research interests include file & storage systems, operating systems, and cloud computing. You may reach my webpage here: http://pages.cs.wisc.edu/~ll/. I play basketball regularly to keep energetic.

Jayanta Mondal

I am Jayanta Mondal, a PhD student at the University of Maryland, College Park, working with Professor Amol Deshpande. My primary research focus has been real-time processing on large volumes of graph-structured data, with the high-level goal of building end-to-end scalable systems in the cloud. Before I started my graduate study, I was involved with a start-up, and hence like to look at research problems from the point of view of both system developers and application users. Personally, I get excited by cool things and find myself hopping from one hobby to another. My recent passion has been bouldering. A couple of my longer-term hobbies have been photography and cooking.

Markus Pilman

I am a 2nd year PhD student at ETH located in the nicest city worldwide (Zurich – for the very unlikely case you were wondering which one this might be). I am supervised by Donald Kossmann and I am working in the field of distributed databases. I am working there on a distributed shared memory database.

Besides work, I am a fairly good skier (I grew up in a town in the Alps and it was a five-minute walk from my home to the nearest skiing resort) and I do bouldering (a kind of climbing).

Sudip Roy

I am a 4th-year PhD candidate in the Big Red Data Group at Cornell University. I am advised by Prof. Johannes Gehrke. My broad research interest lies in the area of databases and data mining. At Cornell, I am currently working on exploiting transaction semantics to improve performance of geo-replicated data stores. Further details about me and my research can be found at www.cs.cornell.edu/~sudip.

I play different racquet sports, including squash, badminton, tennis and ping-pong, at different skill levels. I also enjoy hiking and kayaking, and am generally interested in exploring new places and activities.

Yanyan Shen

Hi, my name is Yanyan Shen, and I am working with Kaushik Chakrabarti & Surajit Chaudhuri on the relational data search project this summer.

I am a PhD student from the National University of Singapore, supervised by Prof. Ooi Beng Chin. I am interested in various aspects of databases, from web data management to cloud computing.

It is so great for me to come to a place that has cool weather! And I really love it here! If anyone loves movies, music, shopping, hiking, or table tennis, I would be very happy to talk and play with you. No basketball please, although I am a tall girl. 🙂

Akash Das Sarma

I am a first year PhD student with Prof. Jennifer Widom at Stanford University. I have interned at MSR in the past (2011 summer with the Theory group) and am excited at the prospect of working in the DMX group this summer. I have most recently been working on crowd algorithms for human computing problems for my PhD, though I have dabbled in a number of other areas during my undergrad years. I enjoy playing chess, basketball and soccer and am always looking for new people to play with!

Thrivikrama Taula

I have finished the first year of my Master’s at the University of Illinois at Urbana-Champaign, where I did my research in Data Mining under Prof. Jiawei Han.

My other interests include following various tech blogs and posts, financial markets, soccer (die-hard Manchester United fan!), and the singularity. In my free time, I like to play badminton, do yoga, or hit the gym.

Tomasz Tylenda

I am a PhD student at the Max Planck Institute for Informatics in Saarbrücken, Germany, where I work with Prof. Gerhard Weikum. My research interests include knowledge base exploration and information extraction. Prior to joining the Max Planck Institute, I studied in Poland at the University of Wrocław.

In my free time I do sports that don’t use a ball, in particular cycling and rock climbing. I watch alternative movies and read mostly non-fiction.

You can find my homepage at: http://www.mpi-inf.mpg.de/~ttylenda/

Chi Wang

I am a PhD candidate in my 5th year at the University of Illinois at Urbana-Champaign, advised by Jiawei Han. I interned at MSR Redmond/XCG in the last two summers and at MSR Asia in 2009.

I am interested in mining latent entity structures to better organize linked information.

Find more at http://web.engr.illinois.edu/~chiwang1/

Sheng Wang

I am a PhD student of Computer Science from the National University of Singapore. I joined the database group in 2011, supervised by Prof. Beng Chin Ooi. My research interests are mainly in cloud computing and database systems, including indexing, query processing and data management, especially for supporting write-intensive workloads. Details can be found on my web page: http://www.comp.nus.edu.sg/~wangsh/

I like to play all kinds of sports in my spare time. I fairly enjoy billiard sports: pool, snooker, carom and any possible variants.

Mohan Yang

I am a second year Ph.D. student supervised by Professor Carlo Zaniolo in the Department of Computer Science at UCLA. I obtained my B.E. degree in computer science from Shanghai Jiao Tong University in 2010. Prior to joining UCLA, I worked in a startup company with my friends. My research interests include database systems and data mining. I enjoy kayaking and cycling in my free time.

2012

Post-Doc

Vadim Savenkov

I just finished my PhD at Vienna University of Technology, under the supervision of Reinhard Pichler. My research focus is on information integration. In particular, I’ve been working on management and optimization of schema mappings and on algorithms for data exchange (visit my homepage to learn more). I also hold a Master’s degree in Computational Logic (Vienna University of Technology and Dresden University of Technology) and an Engineer’s degree from Bauman Moscow State Technical University. Here at Microsoft Research, I will be working with Phil Bernstein on extending the ADO.NET Entity Framework.

Interns

Aaron Elmore

I am a third-year PhD student at UC Santa Barbara, under the supervision of Divy Agrawal and Amr El Abbadi. I have a BS from DePaul University and an MS from the University of Chicago. My primary research focus is on building tools and primitives for elastic multitenant databases. I am also exploring data solutions for ecologists at the National Center for Ecological Analysis and Synthesis (NCEAS), and data consistency in multi-datacenter and mobile environments. More information can be found at http://cs.ucsb.edu/~aelmore.

Jeffrey Jestes

Hello everybody! I am a fourth year PhD student studying in the School of Computing at the University of Utah with Feifei Li. This is my second internship at Microsoft Research, and I am very excited to work with Kris Ganjam in the DMX group. My general research interests are summarizing massive data in distributed and parallel frameworks (such as MapReduce); ranking, monitoring, and tracking big data; and scalable query processing in large databases. I am also interested in text processing and uncertainty in data. To learn more about my research you can visit my homepage at http://www.cs.utah.edu/~jestes/.

Feng Li

I am currently a 3rd-year PhD student in the Department of Computer Science, National University of Singapore. I was honored to join the database group in 2009, supervised by Prof. Ooi Beng Chin. From 2005 to 2009, I studied at Peking University, Beijing, China and obtained my BSc degree from the Department of Electronic Engineering and Computer Science. My research interests are mainly in cloud computing and database systems, including indexing and query processing. I am also interested in microblog data processing. My home page is http://www.comp.nus.edu.sg/~li-feng/.

Semih Salihoglu

I am a third year PhD student at Stanford University, advised by Jennifer Widom. I’m mainly interested in systems and algorithms for distributed and parallel graph computations. Before my PhD, I was a software engineer at Google. Outside of work, I enjoy playing soccer and going running.

Tallat Shafaat

I am a final year Ph.D. student at KTH – Royal Institute of Technology, Sweden. I’m working on large-scale distributed systems, with a focus on P2P/decentralized techniques and distributed key-value stores, under the supervision of Ali Ghodsi and Seif Haridi. I am originally from Pakistan; I did my undergraduate studies at GIK Institute.

My homepage is located at: http://www.sics.se/~tallat/.

Bilyana Taneva

My name is Bilyana Taneva and I am a PhD student at the Max-Planck Institute for Informatics in Germany. I am advised by Gerhard Weikum and I am working on analyzing and extracting data about entities. My general research interests are data mining and information retrieval.

In my free time I enjoy skiing, climbing, and hiking, and I would love to explore the mountains in the region.

Chi Wang

I am a PhD candidate finishing my 3rd year at the University of Illinois at Urbana-Champaign, advised by Jiawei Han. I am interested in mining information networks, especially finding latent roles and relations of linked objects. I interned at MSR Redmond and MSR Asia in 2011 and 2009, respectively.

Visit my home page http://www.cs.illinois.edu/homes/chiwang1 for more information.

I play basketball and baseball in my leisure time, and I watch pro Starcraft games.

Mohan Yang

I am a first year Ph.D. student supervised by Professor Carlo Zaniolo in the Department of Computer Science at UCLA. I obtained my B.E. degree in computer science from Shanghai Jiao Tong University in 2010. Prior to joining UCLA, I worked in a startup company with my friends. My research interests include data stream management systems and data mining. I enjoy kayaking and cycling in my free time. My home page is http://www.mhyang.com.

Meihui Zhang

My name is Meihui, and I am a fourth year PhD student at the National University of Singapore (NUS), supervised by Prof. Beng Chin Ooi. I am glad to have this opportunity to join the DMX group. My research interests mainly focus on database issues. I am currently working on database exploration, which means designing algorithms that analyze database instances to efficiently and accurately discover database schema elements, such as keys, meaningful join paths, and implicit relationships inherent in the data.

Please visit my homepage at http://www.comp.nus.edu.sg/~zmeihui for more about me.

2011

Visiting Researchers

Guillem Rull Fort

I got my PhD in January 2011 in the Department of Software at the Technical University of Catalonia (UPC), in Barcelona, Spain. My thesis was about the application of query containment techniques to the validation of schema mappings, and was sponsored by Microsoft Research under the MSR European PhD Scholarship program. My supervisor was Dr. Ernest Teniente. My research interests so far have been focused on reasoning on database/conceptual schemas and mappings. Besides research, I enjoy reading a good book, watching movies, and playing videogames.

Interns

Bolin Ding

I’m a PhD student in the Department of Computer Science at UIUC. My advisor is Prof. Jiawei Han. I’m interested in efficient algorithms and index structures for data mining and databases in general. Besides efficiency, I also care about people’s privacy. Before joining UIUC, I got my MPhil degree in System Engineering at the Chinese University of Hong Kong under the supervision of Prof. Jeffrey Xu Yu, and my BS degree in Math and Applied Mathematics at Renmin University of China. I play Go, Pingpong (table tennis), and basketball. My homepage: https://netfiles.uiuc.edu/bding3/www/.

Daniel Fabbri

I am a fifth year Ph.D. student at the University of Michigan, working with Kristen LeFevre. My research interests are database systems and database security. My current research is focused on analyzing access patterns in electronic health record systems and explaining why these accesses occur. Outside of research, I enjoy playing a variety of sports ranging from soccer to volleyball.

You can find my homepage at http://www.eecs.umich.edu/~dfabbri.

Yupeng Fu

Hi, everyone, I am Yupeng, a third-year PhD student from UC San Diego working with Prof. Yannis Papakonstantinou. My research interest is at the intersection of database and web technologies.

Outside of research, I play basketball and tennis, swim, and enjoy many outdoor activities.

Manish Gupta

I completed my bachelors (TR) in computer science from Mumbai University in 2005. After finishing my Masters (TR) under the supervision of Dr. Soumen Chakrabarti in computer science at IIT Bombay, I worked at HotJobs, Yahoo! Bangalore from 2005 to 2007. I joined the PhD program in data mining under the guidance of Dr. Jiawei Han at the University of Illinois at Urbana-Champaign in Aug 2009. In summer 2010, I interned at IBM T. J. Watson Research Center under Dr. Charu Aggarwal. My interests are in research and development in the areas of Data Mining, Information Retrieval and Web Mining, Social Computing and Algorithms.

WebPage: http://www.cs.illinois.edu/homes/gupta58/.

Willis Lang

I am a PhD student from the University of Wisconsin-Madison advised by Prof. Jignesh Patel. I am also a graduate student member of the Microsoft Gray Systems Lab in Madison. My area of research within data management focuses on reducing datacenter operating costs through efficient cluster configuration, provisioning, and energy management as well as effective workload scheduling and load balancing.

Additional details can be found at pages.cs.wisc.edu/~wlang.

Feng Li

I am currently a 2nd year PhD student in the Department of Computer Science, National University of Singapore. I was honored to join the database group in 2009, supervised by Prof. Ooi Beng Chin. From 2005 to 2009, I studied at Peking University, Beijing, China and obtained my BSc degree from the Department of Electronic Engineering and Computer Science. My research interests are mainly in cloud computing and database systems, including indexing and query processing.

My home page is http://www.comp.nus.edu.sg/~li-feng/.

Jiexing Li

Hi everyone. My name is Jiexing Li (Jessie). I am a 2nd year PhD student in the Department of Computer Sciences, University of Wisconsin – Madison, under the supervision of Prof. Jeff Naughton. I am interested in database research, with a focus on query progress indicators, query optimization and parallel databases. This summer, I am working on admission control for database queries with Surajit Chaudhuri and Vivek Narasayya.

My homepage can be found at http://pages.cs.wisc.edu/~jxli/.

Ndapandula Nakashole

I am a PhD student at the Max Planck Institute for Informatics, Germany, where I work with Prof. Gerhard Weikum. Before that I completed a BSc and an MSc at the University of Cape Town, South Africa.

I am interested in Information Extraction and more broadly, in Large-Scale Data Analytics.

I enjoy reading a good book. I also like to work up a sweat, outdoors or at the gym.

Hyunjung Park

I am a third year Ph.D. student at Stanford University working with Prof. Jennifer Widom. My research interests include database systems and cloud computing. In particular, I am currently working on a database system that can fetch data from various external sources such as crowdsourcing and the web. Besides work, I enjoy traveling and scuba diving.

For more information about me, please visit my homepage at http://infolab.stanford.edu/~hyunjung/.

Vamsidhar Thummala

I go by Vamsi for short. I’m a PhD student at Duke University working with Prof. Shivnath Babu and Prof. Jeff Chase. My research interests include database systems and energy-efficient computing. In particular, I work on improving the query optimizer to deal with the dynamic nature of resources in virtualized environments.

Besides research, I enjoy cooking, reading, and volleyball. More at: http://www.cs.duke.edu/~vamsi

Chi Wang

I am a PhD candidate in my 2nd year at the University of Illinois at Urbana-Champaign, advised by Jiawei Han. I work in Data Mining and I am especially interested in mining information networks, such as finding latent roles and relations of linked objects. I now work in the DMX group in Redmond, and my mentor is Kaushik Chakrabarti. In 2009 I worked in the Theory group at MSRA with Wei Chen and Yajun Wang on developing a scalable algorithm for influence maximization in social networks.

Visit my home page http://www.cs.illinois.edu/homes/chiwang1 for more information.

Mohamed Yakout

Hi, my name is Mohamed Yakout. I am a PhD candidate in the computer science department at Purdue University. My research focuses on data cleaning and data integration, including situations where data privacy is a concern. More precisely, I focus on user-centric techniques to improve data quality. My advisor is Prof. Ahmed K. Elmagarmid. I have just received the Bilsland Dissertation Fellowship from the Purdue Graduate School.

My home page: http://www.cs.purdue.edu/homes/myakout

In my free time, I enjoy playing with my little princess Jasmine (20 months old) and I go to the gym.

Tao Zou

I am Tao Zou, a second year PhD student at Cornell University working with Johannes Gehrke. I am interested in database systems and cloud computing. In particular, my research focuses on building scalable data-driven systems and validating their properties through extensive experiments.

Besides research, I like playing table tennis, basketball, computer games, and traveling.

2010

Interns

Bahman Bahmani

I am a PhD student at Stanford University, working with Prof. Ashish Goel. My main research interest is in algorithmic aspects of large scale web applications. Recently, I have been focusing on recommendation and personalization over social networks.

I listen to music almost all the time! Other than that, in my spare time, I enjoy a wide range of activities, including but absolutely not limited to going to the gym, practicing martial arts (Kenpo Karate), reading books or papers on brain sciences (especially evolutionary psychology and neuroscience), dining out, etc.

Klaus Berberich

Hi, my name is Klaus Berberich. I am a PhD student at the Max-Planck Institute for Informatics working with Srikanta Bedathur and Gerhard Weikum.

The focus of my PhD thesis has been on improving search in web archives. My general research interests lie on the border between data management and information retrieval.

When not at work, I enjoy travelling, reading a good book, and listening to music. Beyond that, I am interested in photography and guitars.

For more information about me please visit: http://www.mpi-inf.mpg.de/~kberberi.

Bolin Ding

I’m a PhD student in the Department of Computer Science at UIUC. My advisor is Prof. Jiawei Han. I’m interested in efficient algorithms and index structures for data mining and databases in general. Besides efficiency, I also care about people’s privacy. Before joining UIUC, I got my MPhil degree in System Engineering at the Chinese University of Hong Kong under the supervision of Prof. Jeffrey Xu Yu, and my BS degree in Math and Applied Mathematics at Renmin University of China. I play Go, Pingpong (table tennis), and basketball.

My homepage: https://netfiles.uiuc.edu/bding3/www/

Yeye He

My name is Yeye He. I am a PhD student at University of Wisconsin-Madison working with Professor Jeff Naughton on database privacy. Prior to returning to school for PhD I worked at Oracle Corporation on data warehousing performance. This summer I will be working with Dong Xin on data exploration projects. I like traveling and watching movies in my spare time.

For more information please visit my homepage at http://www.cs.wisc.edu/~heyeye.

Hideaki Kimura

My name is Hideaki Kimura. I’m a 3rd year PhD student in the database research group at Brown University, advised by Stan Zdonik. My research topic is query optimization and automatic physical database design.

Besides research and coding, I love biking.

For more information about me please visit: http://www.cs.brown.edu/people/hkimura/

Jian Li

My name is Jian Li. I am a Ph.D. student at University of Maryland, College Park.

My research interests include databases and algorithms. In particular, I am working on ranking over probabilistic databases and some stochastic optimization problems.

I got my BSc degree from Sun Yat-sen (Zhongshan) University, China and my MSc degree in computer science from Fudan University, China.

I like playing soccer, table tennis, and traveling.

My homepage: www.cs.umd.edu/~lijian

Kristi Morton

Kristi Morton is a third year PhD student in the department of Computer Science and Engineering at the University of Washington (UW). At UW, Kristi works with her advisors, Magdalena Balazinska in the Databases Group and Dan Grossman in the Programming Languages Group, on improving the software tools in data-intensive, distributed systems.

Details of her research projects can be found here: http://www.cs.washington.edu/homes/kmorton/

In her spare time, she plays drums, sings, and writes Computer Science-themed parodies in the UW Computer Science and Engineering Band (also known as Parody Bits). Their music can be found here: http://www.youtube.com/user/CSEBand.

Aditya Ganesh Parameswaran

I’m a third year PhD student in the Infolab at Stanford University, working with Prof. Hector Garcia-Molina. I’m broadly interested in algorithmic questions underlying the management and utilization of large data. I graduated from the Indian Institute of Technology (IIT) Bombay in 2007. Apart from work, I enjoy trying out new restaurants and cuisines (averaging 2 or so a week), tennis, geocaching, playing the drums (badly) and watching inspiring YouTube videos.

My homepage is located at http://www.stanford.edu/~adityagp/.

Hyunjung Park

I am Hyunjung Park, a second year Ph.D. student at Stanford University working with Jennifer Widom. My research at the Stanford InfoLab focuses on data provenance. Besides work, I enjoy traveling and scuba diving.

My homepage is at http://infolab.stanford.edu/~hyunjung/.

Hyunjung recently won the ACM SIGMOD 2010 Programming Contest, where he (single-handedly) built a distributed query execution engine that was almost twice as fast as the runner-up team’s system!

Senjuti Basu Roy

I am a PhD candidate at UT Arlington and have just completed my 3rd year there. My PhD research focuses mainly on different exploration techniques for large databases. I have been a part of the Database Exploration Lab at UT Arlington and I happily call that my second home :].

Besides that, I love dancing, debating and traveling (a lot :]).

My homepage: http://dbxlab.uta.edu/dbxlab/senjuti.html

Ian Rae

Hi, my name is Ian Rae, and I’ve just completed my second year of Ph.D. studies at the University of Wisconsin–Madison, advised by Jeff Naughton. In my studies there, I work with the Microsoft Jim Gray Systems Lab to improve parallel database support for handling unstructured text data.

In my somewhat non-existent spare time, I read science fiction and fantasy novels, play computer games, and plan my wedding (October!). I also occasionally take pictures that David DeWitt calls “goofy,” an example of which is available to the right.

In a past life, before becoming a graduate student, I used to go indoor rock climbing. I’ve been told that there are several nice climbing gyms around here, so I hope to go again before the end of the summer.

Gurgen Tumanyan

Hi, my name is Gurgen Tumanyan, and I am a Master’s student at San Francisco State University, advised by Dragutin Petkovic. I work with the Helix group in the Stanford Bioengineering department, where I employ computer science black and white magic to predict protein function from 3D structure.

When I am not pondering bioengineering problems or ruminating on software engineering challenges 🙂, I can usually be found with my camera on the streets of the nearest big city, or playing pool. In my previous life I used to play tennis and would love to play again.

Guozhang Wang

Hi, my name is Guozhang. I’m a second year PhD student at Cornell University working with Johannes Gehrke. I am interested in data management and cloud computing. In particular, I am working on applying query optimization and parallel processing techniques to large scale behavioral simulations. Before that I have worked on privacy-preserving data publishing. You can find out more about me at www.cs.cornell.edu/~guoz.

I am a big fan of the NBA (Celtics rock!) and I also play a lot of basketball myself. I also like playing video games (NBA Live 2005, 2006, 2007, …) and hiking in my free time.

Mohamed Yakout

Hi, my name is Mohamed Yakout. I am a third year PhD student at Purdue University. My advisor is Prof. Ahmed K. Elmagarmid. My research focuses on data cleaning and data integration, including situations where data privacy is a concern.

My home page: http://www.cs.purdue.edu/homes/myakout.

I am originally from Egypt and received my Master’s degree from Alexandria University. Prior to pursuing my PhD, I worked for Bibliotheca Alexandrina (the new Library of Alexandria), where I participated in several digital library projects.

2009

Parag Agrawal

I’m a PhD student in the Department of Computer Science at Stanford University. My advisor is Jennifer Widom. I work in the Stanford InfoLab. I graduated from IIT Bombay in 2005 with a Bachelors in Computer Science.

Bolin Ding

My name is Bolin Ding. I’m finishing the 2nd year of my Ph.D. program at UIUC. My advisor is Prof. Jiawei Han. I’m interested in IR techniques in databases and data warehouses, and pattern-based query processing and mining. In general, I’m interested in efficient algorithms for database and data mining problems. Before joining UIUC, I got my M.S. in System Engineering at the Chinese University of Hong Kong under the supervision of Prof. Jeffrey Xu Yu, and my B.S. in Math and Applied Mathematics at Renmin University of China. I play Go, Pingpong (table tennis), and basketball. My homepage: https://netfiles.uiuc.edu/bding3/www/

Michaela Goetz

Hi, my name is Mila. I’m a second year PhD student at Cornell University working with Johannes Gehrke and Christoph Koch. I am interested in data management. In particular, I am working on uncertain databases and privacy-preserving data publishing. You can find out more about me here.

In my free time, I enjoy playing tennis during the summer and ice hockey during the winter.

Yeye He

My name is Yeye He. I am a PhD student at University of Wisconsin-Madison working with Professor Jeff Naughton on database privacy. Prior to returning to school for PhD I worked at Oracle Corporation on data warehousing performance. This summer I will be working with Dong Xin on data exploration projects. I like traveling and watching movies in my spare time. Please visit my homepage at http://www.cs.wisc.edu/~heyeye for more information about me.

Mei Hui

I’m a PhD candidate in the Computer Science Department, National University of Singapore. My advisor is Prof. Ooi Beng Chin. My major research interests are Community Database Management, Information Retrieval and Web 2.0.

In my free time I like to play badminton, swim and watch movies.

If you want to find more about me, visit my website: http://www.comp.nus.edu.sg/~huimei

Gjergji Kasneci

Gjergji Kasneci is a doctoral student at the Max-Planck Institute for Informatics. He received his M.S. degree from the University of Marburg in Germany, where he was awarded the Fellowship of the German National Academic Foundation.

Most of his research focuses on graph-based Information Retrieval and Semantic Search. His main projects are the NAGA (http://www.mpi-inf.mpg.de/~kasneci/naga/) and YAGO (http://www.mpi-inf.mpg.de/~suchanek/downloads/yago/) systems.

Additional information about Gjergji can be found at: http://www.mpi-inf.mpg.de/~kasneci/

Hongrae Lee

Hi, my name is Hongrae Lee and I’m finishing the 2nd year of my Ph.D. program at the University of British Columbia, Canada. My advisor is Prof. Raymond Ng and I’m interested in approximate query processing in text databases and its optimization. This summer, I’ll be working with Dr. Surajit Chaudhuri on the AutoAdmin project. In my spare time, I like sports such as soccer, tennis, and skiing. I hardly ever say no to coffee or delicious food :) You can find more about me here (http://www.cs.ubc.ca/~xguy).

Abhijeet Mohapatra

Hello, my name is Abhijeet Mohapatra. I am a first year PhD student at Stanford University. My advisor is Jennifer Widom. I am interested in Uncertain Data Modeling and Data Mining techniques (especially for Recommendation Systems). I love playing basketball and swimming. I do quite a bit of sketching too. This summer, I am working under Ravi Ramamurthy on the AutoAdmin project.

For more information, you can visit my homepage.

Vijendra Singh Purohit

Homepage: http://purdue.academia.edu/VijendraSinghPurohit

Zhijun Yin

My name is Zhijun Yin. I am a second-year PhD student at the University of Illinois at Urbana-Champaign under the supervision of Professor Jiawei Han. Before coming to UIUC, I got my B.S. from Fudan University in 2007. My research interests focus on applying data mining techniques to interesting web applications.

You can find more about me at www.cs.uiuc.edu/homes/zyin3.

2008

Ioannis Antonellis

My name is Ioannis (Yannis) Antonellis and I am a 2nd year CS PhD student at Stanford University. My advisor is Hector Garcia-Molina and I am working on collaborative techniques and their applications to query log analysis for web and sponsored search.

This summer I am interning in the Data Management, Exploration and Mining group, working with Christian Konig on analyzing web browsing logs.

For more information please visit my homepage: http://www.stanford.edu/~antonell

Also, I regularly (plan to) write on the Stanford Infoblog: http://infoblog.stanford.edu

Fei Chiang

I am a PhD student in the Department of Computer Science at the University of Toronto. I am a member of the Database Research Group and my advisor is Prof. Renée J. Miller. My current research interests are in the efficient management of uncertain and inconsistent data, data quality, data mining, and meta-data management.

In my spare time, I like to play tennis, hike and run.

This summer I’ll be working with Raghav Kaushik and Vivek Narasayya on problems in data cleaning. You can find out more about me at www.cs.toronto.edu/~fchiang.

Hicham Elmongui

Hi, my name is Hicham Elmongui, and I am a PhD candidate in the Computer Science Department at Purdue. My research is in the area of databases. Specifically, I am interested in query optimization for moving-object databases. My advisor is Prof. Walid Aref.

This summer, I am interning with the DMX group. My manager is Vivek Narasayya, and I will work with Ravi Ramamurthy as well. In 2006 and 2007, I interned with the Database group at MSR, where my manager was Paul Larson. I also worked with Jingren Zhou.

I received the Frederick N. Andrews Fellowship. I am also a recipient of the Purdue University Outstanding Teaching Award and the Outstanding Service to the Purdue CS Department Award. In addition, I was the college valedictorian when I graduated Summa Cum Laude with my B.S. in Computer Science and Automatic Control from Alexandria University, Egypt.

In my free time, I enjoy playing with my four year old son, Yahya. My hobbies include reading and travelling.

Ling Hu

My name is Ling Hu. I am a second year PhD student at Northeastern University. I work with Prof. Donghui Zhang in the database lab, CCIS. My research interests are query optimization, data warehousing, and database security issues. I like reading and watching movies. I do yoga and go to the gym regularly. You can get to know more about me here.

Hongrae Lee

Hi, my name is Hongrae Lee and I’m finishing the 2nd year of my Ph.D. program at the University of British Columbia, Canada. My advisor is Prof. Raymond Ng and I’m interested in approximate query processing in text databases and its optimization. This summer, I’ll be working with Dr. Surajit Chaudhuri on the AutoAdmin project. In my spare time, I like sports such as soccer, tennis, and skiing. I hardly ever say no to coffee or delicious food :) You can find more about me here (http://www.cs.ubc.ca/~xguy).

Rimma Nehme

My name is Rimma Nehme. I am a PhD student at Purdue University. My research area is query optimization; more specifically, I am working on the application of machine learning techniques to query optimization and efficient query processing in streaming databases. In my free time, I like to run, draw, ski and watch movies (and I also like to make movies).

This summer I am working in the DMX group with my mentor Nico Bruno.

If you would like to find out more about me, visit my website: http://www.cs.purdue.edu/homes/rnehme/

Christopher Re

Christopher (Chris) Ré is a graduate student in the department of Computer Science and Engineering at the University of Washington advised by Dan Suciu. Chris’ interests are theoretical and practical problems in data management. Details of his work can be found here. His thesis work in probabilistic data management will be completed at the end of the current academic year (’09).

Karl Schnaitter

I am a PhD candidate at University of California, Santa Cruz… GO SLUGS! My advisor is Alkis Polyzotis, and our main work has dealt with on-line physical tuning and processing top-k join queries. I also have interests in programming languages and algorithm analysis. I like to spend a lot of my free time on trails, either running or hiking. I have also played guitar for 12 years, and I am a member of a rock band called God of Shamisen.

I am very excited to be visiting Microsoft and working with Nico this Summer! We will be working on problems in physical design tuning.

My homepage: http://www.soe.ucsc.edu/~karlsch/

Tianyi Wu

Hi, my name is Tianyi Wu. I am a third-year Ph.D. student in the Department of Computer Science, University of Illinois at Urbana-Champaign, where I work under the supervision of Dr. Jiawei Han. My research interests include data warehousing and OLAP, ranking queries, and association mining. Before joining U of I, I got my B.S. from Fudan University, China. I am excited to be an intern working with Dr. Kaushik Chakrabarti and other researchers in DMX this summer. My hobbies may be too many to list here. Generally I like various sports, but for now I enjoy playing pool with my dad most.

Please visit my homepage at http://ews.uiuc.edu/~twu5 to find out more about me.

2007

Faculty

Dan Suciu

Dan Suciu is a professor in Computer Science at the University of Washington. He received his Ph.D. from the University of Pennsylvania in 1995, spent five years at AT&T Labs, then joined the University of Washington in 2000. Dan is conducting research in data management, with an emphasis on topics that arise from sharing data on the Internet, such as management of semistructured and heterogeneous data, data security, and managing data with uncertainties. He is a co-author of the book Data on the Web: from Relations to Semistructured Data and XML, holds six US patents, received the 2000 ACM SIGMOD Best Paper Award, and is a recipient of the NSF Career Award and of an Alfred P. Sloan Fellowship.

Interns

Arjun Dasgupta

Hi, I am Arjun Dasgupta. I am a Master’s graduate from the University of Texas at Arlington and a part of the Database Exploration Group at UT Arlington headed by Dr. Gautam Das. My research interests include Information Retrieval from databases, Data Mining and Exploration. I am starting my PhD in the fall, when I plan to work on Data Mining from web sources. Wish me luck!

In my free time I like to fish, bike and cook. I am very excited to be working here at Microsoft with Vivek Narasayya in the DMX group over the summer.

For more about me, visit my personal webpage at http://arjundasgupta.com/default.aspx.

Abhay Jha

I am a first-year graduate student from UW, Seattle. At UW, I was working with Dan Suciu on query evaluation over probabilistic databases with constraints.

I am going to work with Arvind Arasu and Raghav Kaushik on the Data Cleaning project. Besides working, wasting time on the internet and watching movies, this summer I am planning to explore this beautiful city and visit some places I have long been meaning to see.

Bhargav Kanagal

My name is Bhargav Kanagal. I am currently a 2nd year PhD student at the University of Maryland, College Park.

I work with Dr. Amol Deshpande on the MauveDB project, which aims at efficiently incorporating Statistical & Probabilistic Models inside relational database systems; this offers accurate and meaningful answers to queries over uncertain / incomplete data.

I will be working in the Auto-Admin Project with Dr. Ravi Ramamurthy this Summer. My hobbies include Classical music, Tennis and watching movies.

Kristen LeFevre

Hi, my name is Kristen LeFevre. I just completed my Ph.D. at the University of Wisconsin – Madison, where my co-advisors were David DeWitt and Raghu Ramakrishnan. My Ph.D. thesis was in the area of database privacy, including techniques for protecting individual anonymity in data publishing. For more information about this work, please visit my Wisconsin website (http://www.cs.wisc.edu/~lefevre).

I am very pleased to be visiting Microsoft this fall as a post-doctoral intern, where I will be working with Arvind Arasu and Surajit Chaudhuri on the data cleaning project. Following my internship, I will join the University of Michigan as an Assistant Professor.

Rimma Nehme

My name is Rimma Nehme.

I am a PhD student in the Computer Science department at Purdue University. My PhD research is in the area of continuous spatio-temporal query processing and optimization. My advisors are Prof. Walid Aref and Mourad Ouzzani. I obtained a Masters degree in Computer Science from Worcester Polytechnic Institute (WPI) in 2005, where I worked with Prof. Elke Rundensteiner. Prior to that I worked at EMC Corporation for 4 years on the Symmetrix (enterprise storage) Solutions Enabler API (SYMAPI) and CLI for managing data storage arrays. My hobbies include skiing, running, delicious foods, travelling and watching movies. Please check my homepage for more information about me.

Svetlana Marinova

I am Svetlana Marinova from Bulgaria, a 3rd year Ph.D. student in Database Systems at the Technical University – Sofia.

I am happy to be back for a second internship at MSR and to be part of the team again. Areas that interest me include cryptography, information systems, programming languages and distributed databases. In my free time I love to travel, read books, go to the movies and watch different kinds of sports (and sometimes take part as well).

If you are interested to know more, please visit my homepage.

Rares Vernica

I am a Graduate Student at the School of Information and Computer Sciences, University of California, Irvine. My research interests are in the area of databases. In particular, my focus is on data integration, data uncertainty, and data lineage.

My advisor is Prof. Chen Li. For more information about my research and the projects I am involved in, please visit my home page at http://www.ics.uci.edu/~rvernica.

In my spare time I practice Kendo, Japanese fencing, where I hold the rank of 3kyu.

2006

Bee-Chung Chen

My name is Bee-Chung Chen. I am a Ph.D. student at University of Wisconsin – Madison working with Prof. Raghu Ramakrishnan on data mining and database-related topics. I received my B.S. and M.S. degrees in Computer Science and Information Engineering from National Taiwan University. For more information about me, please visit my homepage.

Eric Chu

Eric is a 3rd-year PhD student from the University of Wisconsin-Madison, where he works with Prof. Jeff Naughton.

He returns to DMX for a second summer to work with Sanjay Agrawal and Vivek Narasayya in the AutoAdmin project.

Eric enjoys swimming and playing badminton in his free time.

Shantanu Joshi

I am a PhD student in the Database Center at the University of Florida. My PhD thesis, titled ‘Randomization Techniques for Approximate Query Processing’, is advised by Prof Chris Jermaine. I obtained a Masters degree in Computer Science from Florida in 2003 and a Bachelors in Engineering from the University of Bombay in 2000.

My hobbies include music, exploring new places and sports. Please check my homepage for more about me.

Svetlana Marinova

My name is Svetlana Marinova and I am a second year PhD student in the area of Database Systems at the Technical University of Sofia.

The topic of my dissertation is “Organization and program realization of information system for human resources management”.

I have over 7 years of experience with Windows-oriented programming, including ASP/ASP.NET, Visual Basic, C/C++, Visual C++, and MFC. Other areas of interest include DHTML, JavaScript, and HTML.

My scientific interests are relational databases, information retrieval systems, distributed database systems, cryptographic algorithms and protocols, programming languages, and algorithms. I am also interested in social psychology, creativity and methods for self improvement.

In my spare time I enjoy sightseeing, shopping, watching movies and recently cooking. I love traveling, swimming and tae-bo.

Abhijit Pol

I am a fourth year Ph.D. student at the University of Florida. I am advised by Dr. Chris Jermaine and co-advised by Dr. Alin Dobra. My primary research interests are in the area of Approximate Query Processing and Online Aggregation. I am also interested in Physical Database Design and Indexing. In my PhD thesis, I am investigating different challenges in supporting approximation in subset-based SQL queries. Before this, I did my bachelors in Mechanical Engineering and pursued my masters in Industrial and Systems Engineering at the University of Florida. Away from the computer screen, I play tennis, squash, chess, and pool. I love traveling and camping and am a self-discovered poet.

Anish Das Sarma

My name is Anish Das Sarma. I finished my 2nd year as a PhD student in the Computer Science department at Stanford. I will be working on the Data Cleaning project here. It is great to be back here at MSR! Apart from CS, I am interested in Chess, Table Tennis, Foosball, Badminton, Tennis, and every sport! I also love puzzle solving, music, movies, and other activities. So if any of you are interested in any of these, let me know!

For more information, you are welcome to visit my homepage.

Liying Sui

Before joining UCSD, I got my B.S. degree from Shandong University, P.R. China, and my M.E. degree from the Institute of Computing Technology, Chinese Academy of Science.

Six years in the PhD program in the UCSD database group have given me a lot of experience in logic and databases. I have been working with my advisor, Victor Vianu, and Alin Deutsch on specification and verification of interactive, data-driven Web services/applications and workflows. We make extensive use of database optimization and model checking techniques in our research. I am happy to be here for my internship.

Dong Xin

I’m a fourth year PhD student in the data mining group at University of Illinois at Urbana-Champaign.

I work with Prof. Jiawei Han on scalable data mining algorithms. Before joining UIUC, I received MS and BS in Computer Science from Zhejiang University, China, in 2002 and 1999.

2005

Pedro Bizarro

I am a 4th year PhD student at the University of Wisconsin – Madison. I have been working with Prof. David DeWitt on adaptive query processing, both in the context of traditional databases and data stream systems. Currently I am interning with the DMX group at Microsoft Research.

I am also a Fulbright student and I have an MS in CS from UW-Madison and another from New University of Lisbon in Portugal. I am crazy about coffee, soccer, and tennis. And I was born on April Fools’ Day!

Bee-Chung Chen

My name is Bee-Chung Chen. I am a Ph.D. student at University of Wisconsin – Madison working with Prof. Raghu Ramakrishnan on data mining and database-related topics. I received my B.S. and M.S. degrees in Computer Science and Information Engineering from National Taiwan University. For more information about me, please visit my homepage.

Eric Chu

Eric Chu is a second-year PhD student at the University of Wisconsin-Madison. His research interests are in database systems and his advisor is Prof. Jeff Naughton. Prior to attending UW-Madison, Eric got his Bachelor’s degree in Computer Engineering at another UW – University of Washington, where he worked with Prof. Alon Halevy. This summer Eric is working in the DMX group with Sanjay Agrawal and Vivek Narasayya on the AutoAdmin project. For more information, visit his homepage.

Luna Dong

Luna Dong is a fourth-year student at Univ. of Washington. She is advised by Alon Halevy and her research interests are personal information management and data integration. Before going to UW, Luna got her Master’s degree at Peking Univ. and her Bachelor’s degree at Nankai Univ. in China.

Govind Kabra

I am a PhD student at the University of Illinois, Urbana-Champaign, advised by Dr. Kevin Chang. I am working with him on the Metaquerier System for exploration and integration of deep web databases. This summer, I am working on developing a model for fine-grained authorization in databases. In my leisure time, I play squash, badminton, or basketball, or watch movies. Visit my homepage for more info.

Anish Das Sarma

Anish Das Sarma is a first year CS PhD student at Stanford University. His PhD advisor is Jennifer Widom, and he is working on uncertainty and lineage in databases as part of the Trio project at Stanford (http://www-db.stanford.edu/trio/). This summer he is interning in the DMX group at MSR, working with Vivek Narasayya on the AutoAdmin project. Prior to joining Stanford, Anish received his B.Tech. degree in Computer Science and Engineering from IIT-Bombay, where he was also awarded the Dr. Shankar Dayal Sharma Gold medal.

Anish is keenly interested in various extra-curricular activities. He is an active chess player (Rating, FIDE: 2071, USCF: 2121), and also loves table-tennis, swimming, and various other sports. He is also interested in music, and plays the keyboard and tabla (Indian percussion instrument).

Anish’s homepage contains more information.

Utkarsh Srivastava

I am a third year PhD student working with Prof. Jennifer Widom at the InfoLab (formerly database group) at Stanford University. My primary research interests lie in statistics and query optimization for emerging applications such as data streams, web services as well as classical relational databases.

Dong Xin

I’m a 3rd year PhD student in the data mining group at University of Illinois at Urbana-Champaign, studying under Prof. Jiawei Han.

Before joining UIUC, I received MS and BS in Computer Science from Zhejiang University, China, in 2002 and 1999.

Wei Vivian Zhang

I am an MS student at University of Wisconsin – Madison. Currently I am an intern in the Data Management, Exploration and Mining Group at Microsoft Research. I am working on data cleaning. I like music, dancing and badminton.

2004

Faculty

S. Sudarshan

S. Sudarshan, who is currently here as a Visiting Researcher in the DMX group, is on a year’s sabbatical from IIT Bombay, India, where he is a Professor in the Computer Science and Engineering Department. Sudarshan got his PhD from the Univ. of Wisconsin, Madison, and worked in the database research group at AT&T Bell Laboratories for 3 years before moving to IIT Bombay in 1995. His research interests are in the area of query processing and optimization, and his most recent areas have included keyword querying on databases, parametric and nested query optimization, and fine-grained authorization in databases.

Interns

Nilesh Dalvi

I am a third year Ph.D. student at University of Washington where I work with Dan Suciu on probabilistic models for data integration. At Microsoft, I am working with Surajit Chaudhuri on using user preferences for ranking database query results. In my spare time, I can be found playing tennis or bridge, solving cryptic crosswords, or reading books.

Seung-won Hwang

I am a PhD student from University of Illinois at Urbana-Champaign. My primary research interest is ranked query processing. Besides computer science, I love playing the violin and backpacking. I’m excited to spend another summer at MSR and enjoy the northwest outdoors.

Shubha Nabar

Hi! I’m a second year PhD student at Stanford University, where my advisor is Rajeev Motwani. My research interests include algorithms for application areas like networking and databases. This summer I’m interning in the DMX group and working on the AutoAdmin project. My non-academic interests include playing badminton, squash (racquet sports in general), reading and cooking.

Stratos Papadomanolakis

I am a 3rd year Ph.D. student at Carnegie Mellon University. I am interested in self-tuning database systems, specifically in automating the design of database structures based on workload and hardware configuration information. For my internship I am working with Vivek Narasayya on the Database Tuning Advisor.

Alpa Shah

I am a second year PhD student at Columbia University and my advisor is Luis Gravano. I am working on developing efficient strategies for relational query processing over plain text documents by relying on information extraction and information retrieval techniques.

Dilys Thomas

I am a second year PhD student at Stanford, advised by Rajeev Motwani.

I am interested in Algorithm Design, and have recently been applying it to Data Streams, Privacy, and Query Optimization.

My other interests include outdoor sports, especially soccer.

Ying Xu

I am a first year PhD student at Stanford working with Professor Rajeev Motwani. I’m interested in algorithms, especially those with database applications. This summer I’m working with Venky on a data cleaning problem. In my free time, hiking and reading are my favorite outdoor/indoor activities.

Wendy Wang

I am a second-year PhD student at University of British Columbia. My advisor is Laks V.S. Lakshmanan. My research area is XML security. This summer I will work with Nico on a top-k ranking problem. My non-academic interests include movies, classical music and badminton.

2003

Faculty

Prof. Raghu Ramakrishnan

Raghu Ramakrishnan is a Professor of Computer Sciences and Vilas Associate at the University of Wisconsin-Madison. From 1999 to 2002, he served as Chairman and CTO of QUIQ, a company that developed a novel approach to customer support by facilitating collaboration among customers and capturing that interaction in a reusable knowledge base. His research is in the area of database systems and data mining. He is a Fellow of the Association for Computing Machinery (ACM), and has received a Packard Foundation Fellowship, a Presidential Young Investigator award, and an ACM SIGMOD Contributions Award. He has written the widely-used text “Database Management Systems” (with J. Gehrke).

Prof. Gerhard Weikum

Gerhard Weikum is a Full Professor in the Department of Computer Science of the University of the Saarland at Saarbruecken, Germany. Gerhard is co-author of more than 100 refereed publications, and he has recently written a textbook on Transactional Information Systems, published by Morgan Kaufmann. He received the 2002 VLDB ten-year award for his work on automatic tuning. Gerhard serves on the editorial boards of ACM TODS and IEEE CS TKDE, and he will be program committee chair for the 2004 SIGMOD conference in Paris.

Interns

Eugene Agichtein

I am a PhD student at Columbia University. I am working on information extraction from unstructured text. Specifically, I like applying machine learning techniques to improve the quality and scalability of information extraction while requiring little or no manual input.

Brian Babcock

I’m a third-year PhD student at Stanford, where my advisor is Rajeev Motwani. My research is mostly in the area of algorithms for processing streaming data, and I’ve also done some work on approximate query processing. This is my second summer as an intern in the DMX group. My non-academic interests include soccer, backpacking, and rock climbing.

Ashish Gupta

I am a third year graduate student from University of Washington. Back at UW, I work with Prof. Dan Suciu on processing of streaming XML. Over the summer, I will be working with Vivek, Nico and Sanjay on the View Merging problem, a part of the AutoAdmin project. In my free time, I like to watch movies, go trekking and try my hand at virtually any outdoor sport whatsoever.

Vagelis Hristidis

My name is Vagelis Hristidis and I just finished my 4th year of the PhD program at UC San Diego. My advisor there is Yannis Papakonstantinou. The topic of my thesis (which will be completed within the next year) is Keyword Search in Databases. For more information on my research visit my web page at www.db.ucsd.edu/people/vagelis.

In my free time I plan to visit the main sights of the city and explore the Pro Club, about which I have heard good things. The weather may not be as good as in San Diego, but a change is always interesting :).

By the way, I come from Greece.

Zheng Huang

My name is Zheng Huang. I finished my B.E. at Zhejiang University (P.R. China) in 2002, after which I came to University of Wisconsin-Madison to work with Prof. Raghu Ramakrishnan on data mining and databases. Before I came to UW-Madison, I did some work on image/video processing and content-based image retrieval. I once worked as a visiting student at Microsoft Research Asia with Dr. Hongjiang Zhang, Dr. Mingjing Li and Dr. Lei Zhang.

BTW, like many guys from China, I like playing ping-pong. I played tennis and badminton a lot too.

Seung-won Hwang

Hi, I am Seung-won Hwang, a third year Ph.D. student from University of Illinois at Urbana-Champaign. I received my MS and BS from University of Illinois and KAIST (Korea Advanced Institute of Science and Technology) respectively. My primary research interest is ranked query processing. Besides computer science, I love playing the violin and backpacking. I’m looking forward to visiting national parks in the northwest.

Daniel Kifer

I am a 3rd year PhD student in Computer Science at Cornell. My advisor is Johannes Gehrke, and my research interests are in data mining, specifically, interesting problems in data mining. I am also interested in ping pong, backgammon, ping pong, and, of course, making the occasional bad joke.

Ravi Ramamurthy

I am a graduate student from UW-Madison and I am hoping to finish my PhD sometime next year. I am generally interested in adaptive query processing and this summer I will be working with Vivek and Surajit on the AutoAdmin project. My other interests include playing badminton and playing the violin.

Utkarsh Srivastava

I am a first year PhD student at Stanford University working with Jennifer Widom. I am primarily interested in approximation techniques for stream processing and the design of an adaptive query processing architecture for the same. In my free time I pursue music and a game of pool never hurts!

Qi Su

I am a first year PhD student at Stanford working with Professor Jennifer Widom. I did my undergrad at University of Wisconsin-Madison. I enjoy tennis, golf, volleyball and Muay Thai kickboxing.