Microsoft Research Blog

The Crossroads of Innovation and Privacy: Private Synthetic Data for Generative AI

May 29, 2024

Synthetic data may help address some of the privacy concerns raised by AI model development and training, but it has limitations. Researchers at Microsoft are exploring techniques for producing more realistic synthetic data with strong privacy protections.

A flow chart with four successive blocks. Starting with a data owner, private data is provisioned to train a language model with differential privacy. The language model is then prompted to generate novel synthetic data resembling the private data. This synthetic data can be used for downstream applications such as machine learning, feedback analysis, or statistical analysis.
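The post does not say how the model is trained with differential privacy. As a minimal sketch, assuming the widely used DP-SGD recipe (clip each example's gradient, then add Gaussian noise to the summed gradients), one private training step on a toy logistic-regression model might look like the following. The function names and parameters (`dp_sgd_step`, `max_norm`, `noise_multiplier`) are hypothetical, and the sketch omits the privacy accounting that turns the noise level and number of steps into an (ε, δ) guarantee.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical illustration of DP-SGD on a toy model; not the post's actual pipeline.
Example = Tuple[Sequence[float], float]  # (features, label in {0, 1})


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_grad(w: Sequence[float], x: Sequence[float], y: float) -> List[float]:
    """Gradient of the logistic loss for a single example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def clip(vec: Sequence[float], max_norm: float) -> List[float]:
    """Scale vec down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm:
        return list(vec)
    return [v * (max_norm / norm) for v in vec]


def dp_sgd_step(
    w: Sequence[float],
    batch: Sequence[Example],
    lr: float,
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """One DP-SGD update (assumed recipe): clip, sum, add noise, average, step."""
    # 1. Per-example gradients, each clipped so no single record dominates.
    clipped = [clip(logistic_grad(w, x, y), max_norm) for x, y in batch]
    # 2. Sum the clipped gradients coordinate-wise.
    summed = [sum(g[i] for g in clipped) for i in range(len(w))]
    # 3. Add Gaussian noise scaled to the clipping bound.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    # 4. Average over the batch and take a gradient-descent step.
    return [wi - lr * n / len(batch) for wi, n in zip(w, noisy)]
```

Clipping bounds how much any one record can change the update, which is what lets noise calibrated to `max_norm` mask its presence. Because differential privacy is preserved under post-processing, text later sampled from a model trained this way inherits the same guarantee.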
