
Jin Li

Partner Research Manager

About

Dr. Jin Li is a Partner Research Manager of the Cloud Computing and Storage (CCS) group in Microsoft Research – Technologies. He approaches research end to end, and believes that the ultimate milestone of cool system research is a product of significant impact. In addition to pursuing original research and publishing papers in premier venues, he leads the team to go the extra mile to work with product groups and create significant business impact for Microsoft.

Dr. Li’s latest passion is Prajna, a distributed computing platform. Prajna was developed in response to the call for Microsoft to be the productivity and platform company for a mobile-first, cloud-first world. It fills the void of real-time big data computing on the .NET platform, and it is open source. It is designed as a generic distributed computing platform whose core functionality is executing an arbitrary closure (C#, F#, native code, etc.) on any remote node, in a public cloud or a private cluster. It supports interactive big data computing across a cluster with in-memory computation. Prajna also has a managed web service (Prajna Hub), which helps developers quickly prototype and host cloud services and run services for mobile apps. Prajna also supports distributed machine learning (e.g., a distributed neural network trainer that runs Caffe on each node).

Dr. Li received his Ph.D. (with honors) from Tsinghua University (Beijing, China) in 1994. He joined Microsoft in 1999 as one of the founding members of Microsoft Research Asia (Beijing, China), and won a Microsoft Gold Star Service Award in 1999 for that contribution. Since 2000, Dr. Li has also served as an Affiliated Professor at Tsinghua University. He has received the prestigious Microsoft Gold Star Service Award four times: in 1999, 2001, 2006, and 2010.

Projects

MS-Celeb-1M: Challenge of Recognizing One Million Celebrities in the Real World

Established: June 29, 2016

MSR Image Recognition Challenge (IRC) @ACM Multimedia 2016. Important Dates/Updates: New! We are hosting new challenges at ICCV 2017; visit MsCeleb.org for more details. Participant information is disclosed in the "Team Information" section below. 6/21/2016: Evaluation results announced in the "Evaluation Result" section below. 6/17/2016: Evaluation finished; 14 teams finished the grand challenge! 6/13/2016: Evaluation started. 6/13/2016: Dry run finished; 14 out of 19 teams passed (see "Update Details" below). 6/10/2016: Dry run update 3: 8 teams…

MSR Image Recognition Challenge (IRC) @ IEEE ICME 2016

Established: February 23, 2016

MSR Image Recognition Challenge (IRC) @ IEEE ICME 2016 (past). ICME 2016 Image Recognition Grand Challenge Session: 10:00-11:40, Wednesday, July 13, 2016, Room Grand III. "Deep Multi-Context Network for Fine-Grained Visual Recognition," Xinyu Ou (1,2,3), Zhen Wei (2,4), Hefei Ling (1), Si Liu (2), Xiaochun Cao (2); (1) Huazhong University of Science and Technology, (2) Chinese Academy of Sciences, (3) Yunnan Open University, (4) University of Electronic Science and Technology of China. "Ensemble Deep Neural Networks for Domain-Specific Image Recognition," Wenbo Li,…

Clickture

Established: March 11, 2014

A Large-Scale Real-World Image Dataset. We argue that the massive amount of click data from commercial search engines provides a dataset that is unique in bridging the semantic and intent gaps. Search engines generate millions of clicks (a.k.a. image-query pairs), which provide almost "unlimited" yet strong connections between semantics and images, as well as between users' intents and queries. This site introduces such a dataset, Clickture. The dataset,…

Downloads

Tree+Network-Coding-Based Multiparty Conferencing

March 2010


EAC Player and Parser

May 2002


Other

Cloud Compute

Prajna: Cloud Service and Interactive Big Data Analytics

Dr. Li’s latest passion is Prajna, a distributed computing platform. Prajna was developed in response to the call for Microsoft to be the productivity and platform company for a mobile-first, cloud-first world. It fills the void of real-time big data computing on the .NET platform, and it is open source. It is designed as a generic distributed computing platform whose core functionality is executing an arbitrary closure (C#, F#, native code, etc.) on any remote node, in a public cloud or a private cluster. It supports interactive big data computing across a cluster with in-memory computation. The programming API is similar to Spark's. Prajna also has a managed web service (Prajna Hub), which helps developers quickly prototype and host cloud services and run services for mobile apps. Prajna also supports distributed machine learning (e.g., a distributed neural network trainer that runs Caffe on each node).
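Prajna's own API is not reproduced on this page, so the following is only a hypothetical Python illustration of the model described above: data stays partitioned in memory, and the same closure is shipped to every partition for execution, in the style of a Spark-like map/filter/reduce API. The class `InMemoryDataset` and its methods are invented for this sketch, and "remote nodes" are simulated with a thread pool rather than real machines.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce as fold


class InMemoryDataset:
    """Hypothetical Spark-like dataset (not Prajna's API): partitions live in
    memory and closures run on simulated worker nodes (threads)."""

    def __init__(self, partitions, pool):
        self.partitions = partitions  # one list per simulated node
        self.pool = pool

    @classmethod
    def from_list(cls, items, num_nodes, pool):
        # Round-robin partitioning across the simulated nodes.
        return cls([items[i::num_nodes] for i in range(num_nodes)], pool)

    def _run(self, closure):
        # Ship the same closure to every partition and collect the results.
        return list(self.pool.map(closure, self.partitions))

    def map(self, f):
        return InMemoryDataset(self._run(lambda part: [f(x) for x in part]), self.pool)

    def filter(self, pred):
        return InMemoryDataset(self._run(lambda part: [x for x in part if pred(x)]), self.pool)

    def reduce(self, op):
        # Reduce locally on each node, then combine the non-empty partial results.
        partials = self._run(lambda part: fold(op, part) if part else None)
        return fold(op, [p for p in partials if p is not None])


with ThreadPoolExecutor(max_workers=4) as pool:
    data = InMemoryDataset.from_list(list(range(1, 11)), num_nodes=4, pool=pool)
    total = data.map(lambda x: x * x).filter(lambda x: x % 2 == 0).reduce(lambda a, b: a + b)
    print(total)  # 220
```

Reducing inside each partition before combining keeps the data that crosses node boundaries small, which is the usual reason such APIs split a reduction into local and global phases.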

Storage

Cloud Storage

1.    Erasure coding

Dr. Li has advocated the use of erasure coding in the cloud since 2006. Throughout the years, he has evangelized erasure coding to dozens of Microsoft product groups and, based on feedback from product group engineers, has fine-tuned both the design of erasure-coded storage systems and the erasure codes used. Partnering with Azure, he and a number of other MSR researchers participated in the local reconstruction code (LRC) project in Windows Azure Storage. LRC is a new family of erasure codes that significantly reduces storage overhead and cuts down the minimum number of fragments that must be read to reconstruct a data fragment. The work led to hundreds of millions of dollars in savings for Microsoft, a Best Paper Award at USENIX ATC 2012, and a 2013 Microsoft Technical Community Network Storage Technical Achievement Award. His group also architected the erasure code used in Storage Spaces in Windows 8.1 and Windows Server 2012 R2.
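The key idea behind local reconstruction can be sketched with local parities alone. In this simplified illustration, data fragments are split into small groups, each protected by an XOR parity, so rebuilding one lost fragment reads only the rest of its group instead of all data fragments. The production LRC also includes global parities computed over a Galois field to tolerate more failures; those are omitted here, and the group sizes and function names are hypothetical.

```python
from functools import reduce


def xor_bytes(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def encode_local_parities(data, group_size):
    """Split data fragments into local groups and compute one XOR parity per group.
    (A real LRC also adds Galois-field global parities; they are omitted here.)"""
    groups = [data[i:i + group_size] for i in range(0, len(data), group_size)]
    return groups, [xor_bytes(g) for g in groups]


def reconstruct(groups, parities, lost_group, lost_index):
    """Rebuild one lost data fragment from its own group plus that group's parity.
    Returns the rebuilt fragment and how many fragments had to be read."""
    survivors = [frag for j, frag in enumerate(groups[lost_group]) if j != lost_index]
    reads = survivors + [parities[lost_group]]
    return xor_bytes(reads), len(reads)


data = [bytes([i] * 4) for i in range(1, 7)]  # 6 data fragments, 4 bytes each
groups, parities = encode_local_parities(data, group_size=3)
rebuilt, fragments_read = reconstruct(groups, parities, lost_group=1, lost_index=0)
print(rebuilt == data[3], fragments_read)  # True 3
```

A conventional code over the same six data fragments would typically read six fragments to repair one; here the repair reads three, at the cost of storing one extra parity per group.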

2.    Primary data deduplication and end-to-end deduplication

Responding to the rising interest in deduplication within the Microsoft Technical Community Network, he partnered with the Windows File Server group to architect and implement the Primary Data Deduplication feature in Windows Server 2012 [Paper] and End-to-End Deduplication for Storage Virtualization in Windows Server 2012 R2. Key contributions include a new data chunking algorithm, a low-RAM-footprint indexing data structure to detect duplicate data (based on ChunkStash), and a data partitioning and reconciliation technique; the latter two scale index resource usage with data size. The feature yields major storage savings for customers (20-82%) and is among the top three Windows File Server features introduced in Windows Server 2012. It has received rave reviews (The Register, IT Pro, Ars Technica, IT World, Tech Republic), and there is evidence that some customers upgraded to Windows Server 2012 solely for the primary data deduplication feature.
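The chunking algorithm and ChunkStash index used in Windows Server are described in the cited paper and are not reproduced here. As a generic, hedged illustration of content-defined chunking followed by fingerprint-based duplicate detection, the sketch below swaps in a gear-style rolling hash and a plain Python dict as the index. `MASK`, `MIN_CHUNK`, `MAX_CHUNK`, and the gear table are hypothetical parameters chosen for readability.

```python
import hashlib
import random

# Hypothetical parameters: a boundary is declared when the top 6 bits of the
# rolling hash are zero (about 1 in 64 positions on random data).
MASK = 0xFC000000
MIN_CHUNK, MAX_CHUNK = 16, 256

_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # one random value per byte value


def chunk(data: bytes):
    """Content-defined chunking with a gear rolling hash: boundaries depend on
    local content, so an insertion mostly affects chunks near the edit."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def dedup_store(files):
    """Store each unique chunk once, keyed by its SHA-256 fingerprint.
    Each file becomes a 'recipe': the ordered list of its chunk fingerprints."""
    index, recipes = {}, []
    for data in files:
        recipe = []
        for c in chunk(data):
            fp = hashlib.sha256(c).hexdigest()
            index.setdefault(fp, c)
            recipe.append(fp)
        recipes.append(recipe)
    return index, recipes


rng = random.Random(7)
original = bytes(rng.getrandbits(8) for _ in range(4096))
edited = original[:1000] + b"INSERTED" + original[1000:]
index, recipes = dedup_store([original, edited])
total_refs = sum(len(r) for r in recipes)
print(len(index), "unique chunks for", total_refs, "chunk references")
```

Because boundaries are chosen by content rather than fixed offsets, the edited file shares most of its chunks with the original, and only the unique chunks are stored.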

3.    SSD(Flash) based storage

While evangelizing erasure-coded storage, he noticed that storage engineers care deeply about disk I/O performance, while Solid State Drives (SSDs) are disrupting Hard Disk Drives (HDDs) in terms of I/O performance. He conducted a series of research projects to exploit the benefits of SSDs for storage applications. FlashStore implements an SSD-optimized, low-RAM-footprint key-value store that organizes storage on flash in a log-structured manner; it was tech-transferred to Pegasus SSD in the Microsoft backend. SkimpyStash implements an ultra-low-RAM-footprint key-value store. The storage layer design of SkimpyStash has been incorporated into BW-Tree, a joint project among CCS, the MSR Database group, and the Azure DocumentDB team, which is shipping in SQL Server 2014 (Hekaton) and Azure DocumentDB.
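The following is a minimal sketch of a log-structured key-value store with a very small RAM index, loosely in the spirit of the SkimpyStash description above. It is an assumption-laden illustration, not the published design: RAM holds only one log offset per hash bucket, and every record appended to the simulated flash log carries a pointer to the previous record in its bucket. The class name, record layout, and bucket count are hypothetical, and write buffering, garbage collection of stale records, and crash recovery are omitted.

```python
import struct
import zlib


class SkimpyStyleStore:
    """Hypothetical log-structured key-value store with a tiny RAM index.

    RAM keeps one log offset per hash bucket; each record on the (simulated)
    flash log points to the previous record in its bucket, so lookups walk
    a chain on flash instead of keeping every key in RAM.
    """

    HEADER = struct.Struct("<qII")  # previous offset, key length, value length

    def __init__(self, num_buckets=16):
        self.log = bytearray()           # stands in for an append-only file on SSD
        self.heads = [-1] * num_buckets  # the entire RAM index

    def _bucket(self, key: bytes) -> int:
        return zlib.crc32(key) % len(self.heads)

    def put(self, key: bytes, value: bytes) -> None:
        b = self._bucket(key)
        offset = len(self.log)
        # Sequential appends only: flash-friendly, no in-place updates.
        self.log += self.HEADER.pack(self.heads[b], len(key), len(value)) + key + value
        self.heads[b] = offset

    def get(self, key: bytes):
        # Newest record is at the chain head, so later writes shadow older ones.
        offset = self.heads[self._bucket(key)]
        while offset != -1:
            prev, klen, vlen = self.HEADER.unpack_from(self.log, offset)
            start = offset + self.HEADER.size
            if self.log[start:start + klen] == key:
                return bytes(self.log[start + klen:start + klen + vlen])
            offset = prev
        return None


store = SkimpyStyleStore(num_buckets=4)
store.put(b"a", b"1")
store.put(b"b", b"2")
store.put(b"a", b"3")
print(store.get(b"a"), store.get(b"b"), store.get(b"z"))  # b'3' b'2' None
```

The trade-off is RAM for flash reads: fewer buckets mean a smaller index but longer chains to walk on each lookup.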

Communication and Networking


1.    RemoteFX for WAN (in Windows 8 and Windows Server 2012).

Dr. Li assisted in the technical evaluation of Microsoft's acquisition of Calista Technologies. He recognized that the new Remote Desktop experience required significant transport improvements to deliver a fast and fluid user experience. He and Sanjeev Mehrotra developed a new UDP transport optimized for networks with packet loss, which uses forward error correction (FEC) to recover from losses without retransmission [Paper], and a new rate control protocol, URCP [Paper], which learns network parameters to achieve the best performance across networks. RemoteFX for WAN is the default remoting protocol for Windows 8 and Windows Server 2012.
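The FEC scheme used in RemoteFX for WAN is not described on this page. As a generic illustration of recovering a loss without retransmission, the simplest packet-level FEC sends one XOR parity packet per group of packets, which lets the receiver rebuild any single lost packet in that group without waiting a round trip. The function names and packet framing below are hypothetical; production schemes typically use stronger codes that tolerate multiple losses per group.

```python
from functools import reduce


def make_parity(packets):
    """XOR parity over a group of packets. Shorter packets are zero-padded, and
    the packet lengths are XORed too so a recovered packet can be trimmed."""
    size = max(len(p) for p in packets)
    padded = [p.ljust(size, b"\0") for p in packets]
    payload = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*padded))
    length_xor = reduce(lambda a, b: a ^ b, (len(p) for p in packets))
    return payload, length_xor


def recover(received, parity):
    """Recover the single missing packet (marked None) without retransmission."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("simple XOR FEC repairs exactly one loss per group")
    payload, length_xor = parity
    present = [p.ljust(len(payload), b"\0") for p in received if p is not None]
    data = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(payload, *present))
    length = reduce(lambda a, b: a ^ b, (len(p) for p in received if p is not None), length_xor)
    return missing[0], data[:length]


group = [b"frame-1", b"frame-22", b"f3"]
parity = make_parity(group)
print(recover([group[0], None, group[2]], parity))  # (1, b'frame-22')
```

The overhead is one extra packet per group; a larger group lowers that overhead but also lowers the chance that a group suffers at most one loss.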

2.    BranchCache

He developed and coded the content-aware chunking algorithm used in Windows 8 BranchCache, a serverless P2P sharing protocol.

3.    NAT Traversal

He developed the symmetric NAT traversal algorithm used in Windows Live Mesh and Teredo (after Windows 7). At the time of deployment, it raised the NAT traversal success rate from 60% to 85%.
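The algorithm deployed in Windows Live Mesh and Teredo is not described here. Purely as an illustration of why symmetric NATs are hard and how they are commonly attacked, the sketch shows port prediction, a widely described heuristic: a symmetric NAT assigns a new external port for each destination, but many NATs allocate those ports with a roughly constant increment, so a peer can guess the next few ports and aim its probes at them. The function and its parameters are hypothetical.

```python
def predict_ports(observed_ports, candidates=3):
    """Guess the next external ports a symmetric NAT will assign, assuming it
    allocates ports with a roughly constant increment (a common heuristic).

    observed_ports: mapped ports seen by successive probe servers, in order.
    """
    if len(observed_ports) < 2:
        raise ValueError("need at least two observed mappings")
    deltas = [b - a for a, b in zip(observed_ports, observed_ports[1:])]
    # Use the most frequent increment; on ties, prefer the most recent one.
    step = max(reversed(deltas), key=deltas.count)
    last = observed_ports[-1]
    guesses = (last + step * k for k in range(1, candidates + 1))
    # Keep only valid UDP/TCP port numbers.
    return [p for p in guesses if 1 <= p <= 65535]


print(predict_ports([40001, 40003, 40005]))  # [40007, 40009, 40011]
```

Predictions fail when other traffic through the same NAT consumes ports between probes, which is one reason success rates for symmetric NATs stay below those for cone NATs.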

4.    Erasure coding.

His team (especially Sanjeev Mehrotra and Cheng Huang) architected and coded the erasure coding used in Lync, Xbox, and RemoteFX for WAN.

5.    Bandwidth management

He has contributed to the bandwidth management and QoS monitoring design of Lync 2013.

Compression

Compression Research

Dr. Jin Li started his research career in multimedia compression. His noteworthy contributions to image, video, and audio compression standards include:

  1. Operational rate-distortion (R-D) optimality in embedded coding and sub-bitplane scanning (incorporated into JPEG 2000)
  2. Visual weighting and visual progressive compression (incorporated into JPEG 2000)
  3. Motion compensated lifting (a.k.a. motion compensated temporal filtering) (incorporated into H.264/SVC)
  4. JPEG Interactive Protocol (JPIP) (incorporated into JPEG 2000, Part 9)
  5. Arbitrary shape wavelet transform with phase alignment (incorporated into MPEG-4 shape adaptive wavelet coding)
  6. Multiview image compression (pioneering work that led to H.264 MVC)
  7. Scalable audio coding (incorporated into the MPEG-4 audio lossless format)
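Item 3 builds on the lifting scheme, which also underlies the reversible wavelet transform in JPEG 2000. As a hedged illustration, the sketch implements one level of the reversible LeGall 5/3 integer wavelet (the transform JPEG 2000 uses for lossless coding) as a predict step followed by an update step, with symmetric extension at the borders. In motion compensated lifting, the same predict/update structure is applied along the time axis to motion-aligned frames instead of neighboring samples; that extension is not shown, and the function names are hypothetical.

```python
def forward_53(x):
    """One level of the reversible 5/3 wavelet via integer lifting.
    x: even-length list of integers. Returns (low-pass s, high-pass d)."""
    assert len(x) >= 2 and len(x) % 2 == 0
    even, odd = x[0::2], x[1::2]
    n = len(odd)
    # Predict: each odd sample minus the floor-average of its even neighbours
    # (symmetric extension mirrors the last even sample at the right border).
    d = [odd[i] - (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    # Update: smooth the even samples with neighbouring details
    # (symmetric extension mirrors d[0] at the left border).
    s = [even[i] + (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    return s, d


def inverse_53(s, d):
    """Undo the lifting steps in reverse order; integer arithmetic makes it exact."""
    n = len(d)
    even = [s[i] - (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    odd = [d[i] + (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x


x = [10, 12, 14, 13, 9, 7, 8, 20]
s, d = forward_53(x)
print(inverse_53(s, d) == x)  # True
```

Because each lifting step only adds a function of the other half of the samples, it can be inverted exactly by subtracting the same quantity, which is what makes integer-to-integer (lossless) wavelet coding possible.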

Early in his career, he implemented JPEG, MPEG-1, and JBIG, gaining hands-on knowledge of the inner workings of the compression standards. He won the best Ph.D. thesis award at Tsinghua University in 1994 for his work on variable block size motion compensation and a semi-object-based video compression algorithm that significantly improves the performance of motion compensation for MPEG-1.

From 1994 to 1996, Dr. Li was at the University of Southern California, in the Media Communications Lab of Prof. C.-C. Jay Kuo. From 1996 to 1999, he was with Sharp Labs of America (SLA), where he represented SLA's interests in the JPEG 2000 and MPEG-4 standards activities. He joined Microsoft in 1999, first at Microsoft Research Asia (MSRA) (he won a Microsoft Gold Star Award in 2000 for his contribution to founding MSRA), and moved to Microsoft Research Redmond in 2001. He has worked on fractal compression, video rate control, coding artifact removal, vector wavelets, wavelet packets, and more.

Bio


Dr. Li has advocated the use of erasure coding in the cloud since 2006. Partnering with Azure, he and a number of other MSR researchers participated in the local reconstruction code (LRC) project in Windows Azure Storage, a new family of erasure codes that significantly reduces storage overhead and cuts down the minimum number of fragments that must be read to reconstruct a data fragment. The work led to hundreds of millions of dollars in savings for Microsoft, a Best Paper Award at USENIX ATC 2012, and a 2013 Microsoft Technical Community Network Storage Technical Achievement Award. His group also architected the erasure code used in Storage Spaces in Windows 8.1 and Windows Server 2012 R2, and the erasure codes used in Lync, Xbox, and RemoteFX.


Dr. Li assisted in the technical evaluation of Microsoft's acquisition of Calista Technologies. After the acquisition closed, he partnered with the Remote Desktop Virtualization (RDV) team and helped architect and implement the RemoteFX for WAN feature in Windows 8 and Windows Server 2012, which provides a fast and fluid user experience in remote sessions running over WAN and wireless networks [Paper].


Dr. Li received the Young Investigator Award at Visual Communications and Image Processing 1998 (VCIP'98), the ICME 2009 Best Paper Award, and the USENIX ATC 2012 Best Paper Award. He is or has been an Associate Editor or Guest Editor of IEEE Transactions on Multimedia, IEEE Journal on Selected Areas in Communications, Journal of Visual Communication and Image Representation, Peer-to-Peer Networking and Applications, and Journal of Communications. He is the current ICME steering committee chair. He has served on the TPCs and organizing committees of many conferences, e.g., as General Chair of PV2009, lead Program Chair of ICME 2011, TPC Chair of CCNC 2013, and TPC Chair of ACM Multimedia 2016. He is an IEEE Fellow.

Fun Fact


I gave a demo to Xiaoping Deng, the reformist leader of the People’s Republic of China, in 1984. This event led to his remark “Computer literacy should start with children” (计算机普及要从娃娃抓起), which became iconic in China.