
Sudipta Sengupta

About


Ph.D., Electrical Engg. & Computer Science, MIT, Cambridge, USA.
M.S., Electrical Engg. & Computer Science, MIT, Cambridge, USA.
B.Tech., Computer Science & Engg., IIT-Kanpur, India.

ACM Fellow, IEEE Fellow

Sudipta Sengupta is leading an end-to-end innovation agenda at Microsoft Research and contributing to Microsoft’s transformation to the cloud infrastructure and services business. He has caught early trends in networking, storage, and data management by starting multiple research projects in these areas whose impact has gone beyond advancing the state of the art in computer science. To carry this research into practice, he has initiated deep, multi-year partnerships with engineering groups and shipped his research in some of Microsoft’s most visible products and services. Through this end-to-end approach to research, he has influenced industry thinking and practice in his areas of work.

Sudipta’s work on oblivious routing of network traffic, which lets a network provide predictable guarantees in the face of highly variable and unpredictable traffic, received two major IEEE awards: the IEEE Leonard G. Abraham Prize and the IEEE William R. Bennett Prize. Variable traffic arises in multiple network settings, including Internet backbones and cloud data centers. Continuing this line of work, Sudipta took on the challenges of networking at scale for the modern cloud data center and designed the network architecture and traffic-oblivious routing algorithms for VL2, a next-generation data center network that introduced foundational ideas and has been deployed in the Microsoft cloud.
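The kind of guarantee described above, bounded link loads no matter how the traffic matrix varies, can be illustrated with two-phase (Valiant-style) routing on a full mesh, one well-known form of oblivious routing and related to the Valiant Load Balancing used in VL2. In this minimal Python sketch, every flow is split across all nodes as intermediates in fixed fractions `alpha[k]` chosen without looking at the traffic. If node i sends at most `ingress[i]` and node j receives at most `egress[j]` (the hose model), the load on link i -> j can never exceed `alpha[j]*ingress[i] + alpha[i]*egress[j]`. The full-mesh topology, the split fractions, and all function and parameter names are assumptions for illustration; this is not the specific algorithm from the award-winning papers or from VL2.

```python
import random


def two_phase_link_loads(traffic, alpha):
    """Per-link load on a full mesh under two-phase (Valiant-style) routing.

    traffic[s][d] -- demand from node s to node d (hypothetical units)
    alpha[k]      -- fixed fraction of every flow relayed via node k;
                     chosen in advance, independent of `traffic`
    Returns {(i, j): load} for every directed link i -> j.
    """
    n = len(traffic)
    load = {(i, j): 0.0 for i in range(n) for j in range(n) if i != j}
    for s in range(n):
        for d in range(n):
            demand = traffic[s][d]
            if s == d or demand == 0:
                continue
            for k in range(n):
                share = alpha[k] * demand
                if k != s:  # phase 1: source -> intermediate
                    load[(s, k)] += share
                if k != d:  # phase 2: intermediate -> destination
                    load[(k, d)] += share
    return load


def random_hose_traffic(n, ingress, egress, rng):
    """Random traffic matrix whose row sums stay within ingress[i]
    and whose column sums stay within egress[j] (the hose model)."""
    t = [[0.0 if s == d else rng.random() for d in range(n)] for s in range(n)]
    rows = [sum(t[s]) for s in range(n)]
    cols = [sum(t[s][d] for s in range(n)) for d in range(n)]
    scale = min(min(ingress[i] / rows[i] for i in range(n)),
                min(egress[j] / cols[j] for j in range(n)))
    return [[x * scale for x in row] for row in t]


def oblivious_bound(i, j, alpha, ingress, egress):
    """Worst-case load on link i -> j over ALL hose-feasible traffic."""
    return alpha[j] * ingress[i] + alpha[i] * egress[j]


if __name__ == "__main__":
    rng = random.Random(7)
    n = 6
    ingress = [10.0] * n
    egress = [10.0] * n
    alpha = [1.0 / n] * n  # equal split: bound is (10 + 10) / 6 per link
    worst = 0.0
    for _ in range(200):
        loads = two_phase_link_loads(random_hose_traffic(n, ingress, egress, rng), alpha)
        for (i, j), value in loads.items():
            worst = max(worst, value / oblivious_bound(i, j, alpha, ingress, egress))
    print(f"highest observed load / bound: {worst:.3f}")
```

The point of the design is that the routing (the `alpha` split) is fixed once, so links can be provisioned for the bound instead of for any particular, possibly unknown, traffic matrix; the price is that traffic may travel two hops instead of one.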

Sudipta’s research on data deduplication advanced the frontier from backup to primary data, persuaded the company to develop deduplication technology in-house instead of acquiring it from outside, and was incorporated into the new primary data deduplication feature in Windows Server 2012 and into I/O deduplication for virtualized storage in Windows Server 2012 R2. Customers, analysts, and digerati rated primary data deduplication among the top new features in Windows Server 2012. The technology provided early thought leadership in the primary storage market, where data deduplication is now table stakes, and major storage offerings have built upon and extended ideas that Windows Server pioneered.

Sudipta advocated rethinking data storage and management for flash memory, championed the development of the first flash-optimized data store in Bing, and shipped multiple flash-based key-value stores and indexing technology in Azure DocumentDB, Bing ObjectStore, and SQL Server Hekaton. Today, engineering groups within Microsoft and across the industry see a clear need for flash in their products and services. Sudipta’s work has helped build a broad understanding that the software stack must be optimized to exploit the benefits of flash and work around its peculiarities.

Previously, Sudipta spent five years at Bell Laboratories, the Research Division of Lucent Technologies, where he worked on Internet routing, optical switching, network security, wireless networks, and network coding. Before that, he spent two years at Tellium, an optical networking pioneer that grew from an early-stage startup to a public company during his tenure. At both Lucent and Tellium, Sudipta conceived and led the development of new product features that were critical to winning customer contracts, and he was responsible for shaping and defining each company’s vision for next-generation Internet backbone architectures.

Sudipta received a Ph.D. and an M.S. in Electrical Engg. & Computer Science from the Massachusetts Institute of Technology (MIT), USA, and a B.Tech. in Computer Science & Engg. from the Indian Institute of Technology (IIT), Kanpur, India. He was awarded the President of India Gold Medal at IIT-Kanpur for graduating at the top of his class across all disciplines. He has published more than 80 research papers in leading conferences, journals, and technical magazines, and has authored more than 50 patents (granted or pending) in computer systems, networking, storage, and data management. He has taught advanced courses at academic, research, and industry conferences. His work has received widespread coverage in the media, the press, and blogs.

Sudipta is an ACM Fellow and an IEEE Fellow. He serves on the Editorial Board of IEEE/ACM Transactions on Networking. He has been recognized by the academic, research, and industry communities with the following awards, prizes, and honors:

  • ACM Fellow for contributions to cloud networking, storage, and data management,
  • IEEE Fellow for contributions to network design, routing, and its applications to Internet backbone, cloud data centers, and peer-to-peer systems,
  • IEEE Communications Society William R. Bennett Prize for work on oblivious routing schemes for handling highly variable network traffic,
  • IEEE Communications Society Leonard G. Abraham Prize for work on oblivious routing schemes for handling highly variable traffic in IP-over-Optical networks,
  • Bell Labs President’s Teamwork Achievement Award for technology transfer of research into Lucent products,
  • IEEE ICME 2009 Best Paper Award for work on peer-to-peer based distribution of real-time layered video,
  • Microsoft Gold Star Award which recognizes “important career milestones of people leaders, thought leaders, and customer leaders as they take on roles to increase their contribution to Microsoft’s long term success”, and
  • Microsoft Research Technology Transfer Award for shipping research into Microsoft’s products and services.

Sudipta’s work on oblivious routing of network traffic was awarded the IEEE Communications Society William R. Bennett Prize for 2011 and the IEEE Communications Society Leonard G. Abraham Prize for 2008. At Microsoft, he has applied traffic-oblivious routing ideas to design VL2, a low-cost, flexible, and agile next-generation data center network built from commodity switches. Check out the VL2 paper in ACM SIGCOMM 2009. The ideas in VL2 have been deployed in the new-generation networking infrastructure across Microsoft’s cloud data centers. ACM has recognized the paper as one of “the most important research results published in CS in recent years”, and it appeared as an invited paper in the Research Highlights section of the Communications of the ACM (CACM).

Sudipta is working on non-volatile memory technologies, which can exploit the sweet spot between cost and performance, for speeding up cloud, data center, and server applications. FlashStore is the first system to come out of this body of work. It is a high-throughput, low-latency key-value store that uses flash as a persistent cache above hard disk and can help speed up cloud backend services that rely on an underlying key-value store for data processing. Check out the paper in VLDB 2010; this work has also been covered on blogs. FlashStore is frugal in RAM usage, at 6 bytes per key-value pair. In follow-on work on a system called SkimpyStash, Sudipta reduced memory usage by a further 6-fold, to about 1 byte per key-value pair. This work appeared in ACM SIGMOD 2011. The ideas in FlashStore have been incorporated in production into Bing ObjectStore, the distributed storage backend powering multiple properties in Bing and Office365. Check out the Engineering @ Microsoft article on this.
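The small RAM figure is what makes a flash-backed store like this frugal: the in-memory index keeps only a short key signature and a pointer into flash per entry, while full keys and values live on flash. The minimal Python sketch below shows that idea under stated assumptions: a 2-byte signature plus a 4-byte log offset (6 bytes per index slot), a `bytearray` standing in for an append-only log on flash, and plain linear probing. These choices and every name in the code are hypothetical, not FlashStore’s actual data layout. A signature match is confirmed with one read of the full key from the log, so a signature collision costs an extra flash read rather than a wrong answer. Because the table keeps empty slots for probing, RAM per stored key is somewhat above 6 bytes.

```python
import hashlib
import struct

SLOT = struct.Struct("<HI")    # 2-byte key signature + 4-byte log offset = 6 bytes
RECORD = struct.Struct("<HH")  # log record header: key length, value length
EMPTY = 0xFFFFFFFF             # offset value reserved to mark an unused slot


class CompactFlashIndex:
    """Toy key-value store (hypothetical design): an append-only bytearray
    stands in for flash, and the RAM index spends 6 bytes per slot."""

    def __init__(self, num_slots=1024):
        self.num_slots = num_slots
        self.index = bytearray(SLOT.pack(0, EMPTY) * num_slots)  # lives in RAM
        self.log = bytearray()                                   # lives on "flash"
        self.count = 0

    def _slot_and_signature(self, key):
        h = int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "little")
        return h % self.num_slots, h >> 48  # top 16 bits form the signature

    def _read_record(self, offset):
        """One simulated flash read: the full key and value at `offset`."""
        klen, vlen = RECORD.unpack_from(self.log, offset)
        start = offset + RECORD.size
        return (bytes(self.log[start:start + klen]),
                bytes(self.log[start + klen:start + klen + vlen]))

    def _probe(self, slot):
        """Linear probing: yield (byte position, signature, offset) per slot."""
        for step in range(self.num_slots):
            pos = ((slot + step) % self.num_slots) * SLOT.size
            sig, off = SLOT.unpack_from(self.index, pos)
            yield pos, sig, off

    def get(self, key):
        slot, sig = self._slot_and_signature(key)
        for _, s, off in self._probe(slot):
            if off == EMPTY:
                return None
            if s == sig:
                stored_key, value = self._read_record(off)
                if stored_key == key:  # signatures can collide; confirm on flash
                    return value
        return None

    def put(self, key, value):
        slot, sig = self._slot_and_signature(key)
        for pos, s, off in self._probe(slot):
            is_new = off == EMPTY
            if is_new or (s == sig and self._read_record(off)[0] == key):
                new_off = len(self.log)
                # Append-only write; an overwritten record simply becomes garbage.
                self.log += RECORD.pack(len(key), len(value)) + key + value
                SLOT.pack_into(self.index, pos, sig, new_off)
                if is_new:
                    self.count += 1
                return
        raise RuntimeError("index full")


if __name__ == "__main__":
    store = CompactFlashIndex(num_slots=256)
    store.put(b"user:42", b"alice")
    store.put(b"user:42", b"bob")              # update appends a new log record
    print(store.get(b"user:42"))               # b'bob'
    print(len(store.index) / store.num_slots)  # 6.0 bytes of RAM per slot
```

Writing only by appending matches flash’s preference for sequential writes; a real store would also need garbage collection of stale records and recovery of the index after a crash, both omitted here.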

The ongoing Bw-Tree/LLAMA project exploits modern hardware (multi-core CPUs and flash-based SSDs) to build a high-performance ordered index. It is completely lock-free (latch-free) and uses storage in a log-structured manner. It can be combined with a transactional component to provide full transactional semantics (as part of the Deuteronomy architecture). LLAMA exposes a generic page store interface that brings these benefits to any page-oriented access method layered on top of it. Check out the Bw-Tree paper in IEEE ICDE 2013 and the LLAMA paper in VLDB 2013. Bw-Tree is shipping in SQL Server Hekaton, Azure DocumentDB, and Bing ObjectStore.
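The latch-free property rests on a simple mechanism: a mapping table translates logical page IDs to the current state of each page, and an update never modifies a page in place. Instead, it prepends an immutable delta record and installs it with a single compare-and-swap (CAS) on that page’s mapping-table slot, retrying if another thread won the race. The Python sketch below shows only this mechanism for one page. Python exposes no user-level CAS, so the lock inside `cas()` merely emulates one atomic hardware instruction. The class and function names, the consolidation threshold, and the omission of node splits, merges, and flushing to a log-structured store are all simplifications of this sketch, not details of the shipped Bw-Tree.

```python
import threading
from dataclasses import dataclass

CONSOLIDATE_AFTER = 8  # hypothetical delta-chain length that triggers consolidation


@dataclass(frozen=True)
class BasePage:
    records: dict  # consolidated key -> value state


@dataclass(frozen=True)
class Delta:
    key: object
    value: object
    next: object  # older Delta, or the BasePage underneath


class MappingTable:
    """Logical page ID -> head of that page's delta chain."""

    def __init__(self):
        self._heads = {}
        # Stand-in for one atomic hardware CAS on a mapping-table slot.
        self._cas_guard = threading.Lock()

    def install(self, pid, node):
        self._heads[pid] = node

    def read(self, pid):
        return self._heads[pid]

    def cas(self, pid, expected, new):
        with self._cas_guard:
            if self._heads[pid] is expected:
                self._heads[pid] = new
                return True
            return False


def lookup(table, pid, key):
    node = table.read(pid)
    while isinstance(node, Delta):  # newest delta first
        if node.key == key:
            return node.value
        node = node.next
    return node.records.get(key)


def upsert(table, pid, key, value):
    while True:
        head = table.read(pid)
        # Nodes are never modified in place; a failed CAS just retries.
        if table.cas(pid, head, Delta(key, value, head)):
            break
    _maybe_consolidate(table, pid)


def _maybe_consolidate(table, pid):
    head = table.read(pid)
    deltas, node = [], head
    while isinstance(node, Delta):
        deltas.append(node)
        node = node.next
    if len(deltas) < CONSOLIDATE_AFTER:
        return
    records = dict(node.records)
    for d in reversed(deltas):  # replay oldest delta first
        records[d.key] = d.value
    # If the chain changed meanwhile, this CAS fails and the work is discarded.
    table.cas(pid, head, BasePage(records))


if __name__ == "__main__":
    table = MappingTable()
    table.install(0, BasePage({}))

    def writer(tid):
        for i in range(100):
            upsert(table, 0, (tid, i), i)

    threads = [threading.Thread(target=writer, args=(t,)) for t in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(all(lookup(table, 0, (t, i)) == i for t in range(4) for i in range(100)))
```

Readers never block: they follow whatever chain was current when they read the slot, and because nodes are immutable, that chain stays valid even if a writer installs a newer head meanwhile.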

Sudipta’s work on flash memory has been covered by Microsoft Research, Network World, Tech World, D’Technology Weblog, Channel Register, Storage Newsletter, PC Advisor, Myce, CDRinfo, and Computer World.

Sudipta initiated the investigation of flash-memory-based indexes for speeding up data deduplication. He built ChunkStash, one of the earliest flash-assisted storage deduplication systems, which uses a specialized chunk hash index on flash to speed up duplicate data detection. Check out the paper in USENIX ATC 2010. He partnered with the Windows Server team at Microsoft to design and build the new primary data deduplication feature in Windows Server 2012. Key contributions include a new data chunking algorithm for better change detection and a more uniform chunk size distribution; specialized data structures with a low RAM footprint, optimized for flash memory, for speeding up duplicate data detection (based on ChunkStash); a data partitioning and reconciliation technique to further scale index resource usage with data size; and making deduplication friendly to primary data serving workloads. Check out the paper in USENIX ATC 2012, which has also been covered on blogs. Data deduplication is among the most talked-about new Windows Server 2012 features among customers, analysts, and digerati. Here is a sampling of the press coverage: Microsoft Research, The Register, Windows IT Pro, Ars Technica, IT World, and Tech Republic.
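Content-defined chunking is what lets deduplication find repeated data even after bytes are inserted or deleted: chunk boundaries are placed where a rolling hash of the most recent bytes matches a pattern, so boundaries move with the content instead of sitting at fixed offsets. The minimal Python sketch below uses a gear-style rolling hash with minimum and maximum chunk sizes, then identifies duplicate chunks by SHA-256 digest, the role a chunk hash index plays. It is a generic illustration, not the chunking algorithm shipped in Windows Server 2012 (described in the USENIX ATC 2012 paper), and the gear table, the size parameters, and the function names are assumptions. The demo uses `random.Random.randbytes`, which requires Python 3.9 or later.

```python
import hashlib
import random

# Hypothetical gear table: one fixed pseudo-random 64-bit value per byte value.
_rng = random.Random(2012)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def chunk(data, min_size=2048, avg_bits=13, max_size=65536):
    """Split `data` into content-defined chunks.

    A cut is made after the first byte (at least `min_size` bytes into the
    chunk) where the low `avg_bits` bits of the rolling hash are all zero,
    or at `max_size` bytes if no such byte appears. Expected chunk size is
    roughly min_size + 2**avg_bits bytes.
    """
    mask = (1 << avg_bits) - 1
    chunks, start, n = [], 0, len(data)
    while start < n:
        end = min(start + max_size, n)
        cut, h = end, 0
        for pos in range(start, end):
            # Gear hash: the low k bits depend only on the last k bytes.
            h = ((h << 1) + GEAR[data[pos]]) & MASK64
            if pos + 1 - start >= min_size and (h & mask) == 0:
                cut = pos + 1
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks


def dedup(chunks):
    """Keep one copy per distinct chunk, keyed by SHA-256 digest."""
    store, recipe = {}, []
    for c in chunks:
        digest = hashlib.sha256(c).digest()
        store.setdefault(digest, c)
        recipe.append(digest)  # the data is rebuilt by concatenating these
    return store, recipe


if __name__ == "__main__":
    base = random.Random(1).randbytes(200_000)
    edited = base[:50_000] + b"INSERTED" + base[50_000:]
    store, recipe = dedup(chunk(base) + chunk(edited))
    unique = sum(len(c) for c in store.values())
    print(f"{len(recipe)} chunks, {len(store)} unique, "
          f"{unique} of {len(base) + len(edited)} bytes stored")
```

With fixed-size blocks, the 8-byte insertion would shift every later block and defeat deduplication; here, only the chunk containing the edit (and occasionally its neighbor) differs, so the edited copy costs little extra storage. The minimum and maximum sizes keep chunks from becoming pathologically small or large, which is one simple way to tame the chunk size distribution.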

Publications

2015

Schema-Agnostic Indexing with Azure DocumentDB
Dharma Shukla, Shireesh Thota, Karthik Raman, Madhan Gajendran, Ankur Shah, Sergii Ziuzin, Krishnan Sundaram, Miguel Gonzalez Guajardo, Anna Wawrzyniak, Samer Boshra, Renato Ferreira, Mohamed Nassar, Michael Koltachev, Ji Huang, Sudipta Sengupta, Justin Levandoski, David Lomet, in Proceedings of the VLDB Endowment, September 4, 2015

Other

Dr. Sengupta has taught tutorials on data deduplication at USENIX FAST 2013; on data center networks at ACM SIGCOMM 2013, IEEE Hot Interconnects 2012, IEEE GLOBECOM 2011, IEEE Hot Interconnects 2011, ACM SIGMETRICS 2011, IEEE ICC 2011, and ICCCN 2011; on peer-to-peer systems at ACM SIGMETRICS 2010; on oblivious routing of Internet traffic at IEEE ICC 2009 and ACM SIGMETRICS 2008; and on wireless network coding at ACM MOBIHOC 2008.

Data Center Networking
Tutorial at ACM SIGCOMM 2013, Hong Kong, August 2013.

Data Deduplication: Technologies, Trends, and Challenges
Tutorial at USENIX FAST 2013, San Jose, CA, February 2013.

Interconnection Networks for Cloud Data Centers
Tutorial at IEEE Hot Interconnects 2012, Santa Clara, CA, August 2012.

Next Generation Data Center Networks for Cloud Computing
Tutorial at IEEE GLOBECOM 2011, Houston, TX, December 2011.

Interconnection Networks for Cloud Data Centers
Tutorial at IEEE Hot Interconnects 2011, Santa Clara, CA, August 2011.

Cloud Data Center Networks: Technologies, Trends, and Challenges
Tutorial at ACM SIGMETRICS 2011, San Jose, CA, June 2011.

Cloud Data Center Networks: Scalability and Commoditization
Tutorial at ICCCN 2011, Maui, Hawaii, August 2011.

Networking the Data Center for Cloud Computing
Tutorial at IEEE ICC 2011, Kyoto, Japan, June 2011.

Beyond File Sharing: Recent Technologies and Trends in Peer-to-peer Systems
Tutorial at ACM SIGMETRICS 2010, New York (USA), June 2010.

Oblivious Routing and Applications
Tutorial at IEEE ICC 2009, Dresden, Germany, June 2009.

Network Coding and its Impact on Wireless System Design
Tutorial at ACM MOBIHOC 2008, Hong Kong SAR, May 2008.

Advances in Oblivious Routing of Internet Traffic
Tutorial at ACM SIGMETRICS 2008, Annapolis, Maryland (USA), June 2008.

Network Security: Technologies, Trends, and Challenges
Invited Short Course at High Performance Switching and Routing (HPSR) Conference, New York (USA), May 2007.

Next-Generation Optical Networks: IP and Optical Layer Convergence
Tutorial at IEEE GLOBECOM 2004, Dallas (USA), December 2004.

Generalized Multi-Protocol Label Switching (GMPLS): Architecture, Protocols, and Standards
Tutorial at IEEE GLOBECOM 2003, San Francisco (USA), December 2003.

Protection and Restoration in Optical Ring and Mesh Networks
Invited Tutorial at Fourth International Workshop on Design of Reliable Communication Networks (DRCN), Banff (Canada), October 2003.

IP-Optical Internetworking: Trends, Technologies, and Standardization
Short Course at OPTICOMM 2003, Dallas (USA), October 2003.

Control and Management of Optical Cross-Connect Mesh Networks
Short Course at National Fiber Optic Engineers Conference (NFOEC) 2003, Orlando (USA), September 2003.

Management Plane Based End-to-end Service Provisioning across Core and Metro Optical Networks
Invited Course Lecture at Indian Institute of Management (IIM), Calcutta (India), November 2002.

Dynamic Provisioning and Restoration of Lightpaths in Mesh Optical Networks: Architectures, Protocols, and Algorithms
Invited Short Course at OPTICOMM 2002, Boston (USA), July 2002.

IP-Centric Control and Management of Optical Networks
Short Course at OPTICOMM 2001, Denver (USA), August 2001.

Control and Management of Modern Optical Networks
Tutorial at IEEE Hot Interconnects IX, Palo Alto (USA), August 2001.

Control and Management for Optical Networks: An IP-Centric Approach
Tutorial at IEEE INFOCOM 2001, Anchorage (USA), April 2001.

Recent Talks

The Bw-Tree Key-Value Store: From Research to Production
Invited Talk at UCSD Computer Science and Engineering, October 2015, and at Northwest Database Society (NWDS), hosted by UW Database Group, January 2016.

The Bw-Tree Key-Value Store and Its Applications to Server/Cloud Data Management in Production
Talk at UC Berkeley AMPLab, Berkeley, CA, USA and at Storage Developer Conference (SDC) 2015, Santa Clara, CA, USA, September 2015.

Evolution of Data Center Networking
Invited Keynote at IEEE LANMAN 2014, Reno, USA, May 2014.

Data Center Networking: What was not Working? What is Working? What needs Work?
Invited Panel Speaker at ICCCN 2013, Nassau, Bahamas, July 2013.

Primary Data Deduplication in Windows Server 2012
Talk at Storage Developer Conference (SDC) 2012, Santa Clara, CA, USA, September 2012.

Primary Data Deduplication: From Research to Windows Server 2012
Talk at Amazon.com, Inc., Seattle, WA, USA, August 2012. (Hosted by James Hamilton)

Smart Pricing: Parallels from the Cloud Computing World
Invited Talk at Smart Data Pricing Forum, Princeton University, Princeton, NJ, USA, July 2012.

App Aware Smart Pricing Enabled Cross-Provider Wireless Network Fabric
Invited Panel Talk at Smart Data Pricing Forum, Princeton University, Princeton, NJ, USA, July 2012.

Speeding Up Cloud/Server Applications Using Flash Memory
Talk at Storage Developer Conference (SDC) 2011, Santa Clara, CA, USA, September 2011.

Service

Dr. Sengupta is serving (or, has served)