Memory
We are working on models of memory to make factual knowledge in large language models both transparent and controllable. The goal is to enable high-precision knowledge infusion at scale, with full provenance and access control.
Our approach makes LLM memory interpretable, with clear source attribution and the ability to detect and mitigate hallucinations by distinguishing between grounded and sourceless outputs. The knowledge held in the LLM’s memory is also fully manageable, so the information exposed to the model can be edited and controlled dynamically at runtime.
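The toy sketch below illustrates this provenance-and-control idea at a conceptual level: facts carry source and access metadata, can be added or revoked at runtime, and an answer is only reported as grounded if it ties back to a visible entry. All class and function names here (EditableMemory, KnowledgeEntry, answer_with_attribution) are hypothetical illustrations, not the KBLaM interface or any shipped API.

```python
"""Minimal sketch (assumed names, not a real API): an editable knowledge
memory with per-entry provenance and role-based access control."""

from dataclasses import dataclass, field


@dataclass
class KnowledgeEntry:
    """A single fact made available to the model, with provenance metadata."""
    key: str                  # e.g. "Contoso HQ location"
    value: str                # e.g. "Redmond, WA"
    source: str               # document or database the fact came from
    allowed_roles: set[str] = field(default_factory=lambda: {"*"})


class EditableMemory:
    """Facts can be added, updated, or revoked at runtime; only entries
    visible to the caller's role are exposed to the model."""

    def __init__(self) -> None:
        self._entries: dict[str, KnowledgeEntry] = {}

    def upsert(self, entry: KnowledgeEntry) -> None:
        self._entries[entry.key] = entry

    def revoke(self, key: str) -> None:
        self._entries.pop(key, None)

    def visible_to(self, role: str) -> list[KnowledgeEntry]:
        return [
            e for e in self._entries.values()
            if "*" in e.allowed_roles or role in e.allowed_roles
        ]


def answer_with_attribution(question: str, memory: EditableMemory, role: str) -> dict:
    """Toy 'generation' step: an answer counts as grounded only if it can be
    tied back to a visible memory entry; otherwise it is flagged sourceless."""
    for entry in memory.visible_to(role):
        if entry.key.lower() in question.lower():
            return {"answer": entry.value, "grounded": True, "source": entry.source}
    return {"answer": None, "grounded": False, "source": None}


if __name__ == "__main__":
    mem = EditableMemory()
    mem.upsert(KnowledgeEntry(
        key="Contoso HQ location", value="Redmond, WA",
        source="contoso_facts.json", allowed_roles={"employee"}))

    # Grounded answer with source attribution while the fact is present.
    print(answer_with_attribution("Where is the Contoso HQ location?", mem, role="employee"))

    # After the fact is revoked, no grounded answer is available; a real
    # system would abstain or flag the output as sourceless.
    mem.revoke("Contoso HQ location")
    print(answer_with_attribution("Where is the Contoso HQ location?", mem, role="employee"))
```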
We collaborate closely with research science teams across Microsoft Research and applied science teams within Copilot product groups to drive real-world impact. Our work bridges fundamental research and product deployment, contributing to both the scientific community and Microsoft’s Copilot experiences. We publish at leading conferences, such as ICLR and EMNLP, and open-source our research to advance the broader field.
Learn more:
KBLaM: Knowledge Base augmented Language Model
Paper | Repo | Blog
Learning to Extract Structured Entities Using Language Models
Paper | Repo
DiSK: A Diffusion Model for Structured Knowledge
Paper