{"id":1133691,"date":"2025-03-18T09:00:00","date_gmt":"2025-03-18T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1133691"},"modified":"2025-03-18T07:28:45","modified_gmt":"2025-03-18T14:28:45","slug":"introducing-kblam-bringing-plug-and-play-external-knowledge-to-llms","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/introducing-kblam-bringing-plug-and-play-external-knowledge-to-llms\/","title":{"rendered":"Introducing KBLaM: Bringing plug-and-play external knowledge to LLMs"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/KBLaM-BlogHeroFeature-1400x788-1-1.png\" alt=\"KBLaM blog | A flowchart illustrating the process of handling a prompt using a language model. The process begins with documents being used to construct and summarize a knowledge base (KB) offline. The summarized KB is then encoded and fed into the main process. A prompt goes through a tokenizer, followed by rectangular attention, and then into the large language model (LLM). The LLM retrieves information from the encoded KB to generate an answer.\" class=\"wp-image-1134055\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/KBLaM-BlogHeroFeature-1400x788-1-1.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/KBLaM-BlogHeroFeature-1400x788-1-1-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/KBLaM-BlogHeroFeature-1400x788-1-1-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/KBLaM-BlogHeroFeature-1400x788-1-1-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/KBLaM-BlogHeroFeature-1400x788-1-1-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/KBLaM-BlogHeroFeature-1400x788-1-1-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/KBLaM-BlogHeroFeature-1400x788-1-1-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/KBLaM-BlogHeroFeature-1400x788-1-1-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/KBLaM-BlogHeroFeature-1400x788-1-1-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/KBLaM-BlogHeroFeature-1400x788-1-1-1280x720.png 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<p>Large language models (LLMs) have demonstrated remarkable capabilities in reasoning, language understanding, and even creative tasks. Yet, a key challenge persists: how to efficiently integrate external knowledge.<\/p>\n\n\n\n<p>Traditional methods such as fine-tuning and Retrieval-Augmented Generation (RAG) come with trade-offs\u2014fine-tuning demands costly retraining, while RAG introduces separate retrieval modules that increase complexity and prevent seamless, end-to-end training. In-context learning, on the other hand, becomes increasingly inefficient as knowledge bases grow, facing quadratic computational scaling that hinders its ability to handle large repositories. 
A comparison of these approaches can be seen in Figure 1.

## A new way to integrate knowledge

To address these challenges, we introduce the [Knowledge Base-Augmented Language Model (KBLaM)](https://www.microsoft.com/en-us/research/publication/kblam-knowledge-base-augmented-language-model-2/), a novel approach that integrates structured knowledge bases into pre-trained LLMs. Instead of relying on external retrieval modules or costly fine-tuning, KBLaM encodes knowledge into **continuous key-value vector pairs** and embeds them efficiently within the model's attention layers using a specialized **rectangular attention mechanism**, which performs retrieval implicitly, as an integrated part of the forward pass.

We use structured knowledge bases to represent the data, allowing us to **consolidate knowledge** and **leverage its structure**. This design lets KBLaM scale **linearly** with the size of the knowledge base and supports **dynamic updates** without retraining, making it far more efficient than existing methods.
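To make the scaling claim concrete, here is a back-of-envelope attention-cost count in our own notation (the post states the result without a formula). For a prompt of $L$ language tokens and a knowledge base encoded as $M$ knowledge tokens:

$$
\underbrace{O\big((L+M)^2\big)}_{\text{in-context learning}}
\qquad \text{vs.} \qquad
\underbrace{O\big(L(M+L)\big)}_{\text{KBLaM's rectangular attention}}
$$

For $M \gg L$, the left-hand cost is quadratic in the knowledge-base size, while KBLaM's grows only linearly in it.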
## Scalable, efficient, and future-ready

At its core, KBLaM is designed to integrate structured knowledge into LLMs, making them more efficient and scalable. It achieves this by converting external knowledge bases (collections of facts structured as triples consisting of an entity, a property, and a value) into a format that LLMs can process naturally. Such knowledge bases provide consolidated, reliable sources of knowledge.
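The post doesn't show the extracted JSON itself, so the field names below are illustrative; the point is simply that each record is one entity-property-value triple (the facts here come from this post):

```json
[
  {"entity": "KBLaM", "property": "developed at", "value": "Microsoft Research"},
  {"entity": "KBLaM", "property": "attention mechanism", "value": "rectangular attention"},
  {"entity": "KBLaM", "property": "scaling with knowledge-base size", "value": "linear"}
]
```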
To create these knowledge bases, we first extract structured data in JSON format using small language models and then apply [Project Alexandria](https://www.microsoft.com/en-us/research/project/alexandria/)'s probabilistic clustering. Once we have this structured knowledge base, KBLaM follows a three-step pipeline:

1. **Knowledge Encoding:** Each knowledge triple is mapped into a key-value vector pair using a **pre-trained sentence encoder** with **lightweight linear adapters**. The key vector, derived from the entity name and property, encodes "index information," while the value vector captures the corresponding property value. This gives us continuous, learnable key-value representations.

2. **Integration with LLMs:** These key-value pairs, or *knowledge tokens*, are injected into the model's attention layers through a specialized **rectangular attention structure**. Traditional transformer models such as GPT-4, Phi, and Llama process all tokens equally, at quadratic cost; rectangular attention instead lets the model attend over the knowledge at linear cost, as illustrated in Figure 2. In the standard attention of generative language models, each token attends to all preceding tokens. In our setup, language tokens (such as those from a user's question) attend to all knowledge tokens, but knowledge tokens attend neither to one another nor back to the language tokens. This selective attention pattern significantly reduces computational cost while preserving the model's ability to incorporate external knowledge effectively. The linear cost, which is crucial for KBLaM's efficiency, effectively amounts to treating each fact independently, an assumption that holds for most facts; for example, the model's name, KBLaM, and the fact that the research was conducted at Microsoft Research are only very weakly correlated. Rectangular attention is implemented as an extension of standard attention (a minimal sketch follows this list). During training, we keep the base model's weights frozen, ensuring that when no knowledge tokens are provided, the model functions exactly as it did originally.

3. **Efficient Knowledge Retrieval:** Through this rectangular attention, the model learns to dynamically retrieve relevant knowledge tokens during inference, eliminating the need for a separate retrieval step.
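To make the pattern concrete, here is a minimal, self-contained PyTorch sketch of rectangular attention for a single head. All names, sizes, and the random stand-ins for the sentence encoder and the base model are our own illustrative assumptions, not the released implementation (linked at the end of this post):

```python
# Illustrative sketch of KBLaM-style rectangular attention (single head).
# Random tensors stand in for a pre-trained sentence encoder and a frozen
# base model; only the two linear adapters would be trained.
import torch

torch.manual_seed(0)
d, L, M = 64, 6, 4   # head dim, prompt (language) tokens, knowledge triples

# Step 1 -- knowledge encoding: each triple yields a key vector (from
# "<entity> <property>") and a value vector (from the property's value),
# each passed through a lightweight linear adapter.
key_adapter = torch.nn.Linear(d, d)
val_adapter = torch.nn.Linear(d, d)
enc_keys = torch.randn(M, d)                 # stand-in encoder outputs
enc_vals = torch.randn(M, d)
kb_keys = key_adapter(enc_keys).detach()     # (M, d); detached: no training here
kb_vals = val_adapter(enc_vals).detach()     # (M, d)

# Step 2 -- rectangular attention: language tokens attend to all knowledge
# tokens plus their causal prefix; knowledge tokens attend to nothing, so
# the attention matrix is (L, M + L) instead of (M + L, M + L).
q = torch.randn(L, d)                        # queries/keys/values from the
k = torch.randn(L, d)                        # frozen base model (stand-ins)
v = torch.randn(L, d)

keys = torch.cat([kb_keys, k], dim=0)        # (M + L, d)
vals = torch.cat([kb_vals, v], dim=0)
scores = q @ keys.T / d**0.5                 # (L, M + L)

# The causal mask applies only to the language-language block; the KB
# columns are always visible to every prompt token.
causal = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
mask = torch.cat([torch.zeros(L, M, dtype=torch.bool), causal], dim=1)
attn = scores.masked_fill(mask, float("-inf")).softmax(dim=-1)

out = attn @ vals                            # (L, d) knowledge-augmented output

# Step 3 -- the first M columns of `attn` act as soft retrieval scores:
# they show how strongly each prompt token used each triple.
print(attn[:, :M])
```

Note that the score matrix here has shape $L \times (M+L)$ rather than $(M+L) \times (M+L)$, which is exactly where the linear scaling in the knowledge-base size comes from.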
*Figure 1: KBLaM allows for attention over the entire knowledge base instead of relying on an external retriever. RAG retrieves relevant documents with a separate retriever module and appends them to the context (cheap, but with many components); in-context learning puts the entire corpus into the context (simple, but expensive); KBLaM builds a structured knowledge base offline and attends over all of it with rectangular attention, so retrieval needs only a single, trainable component and the cost is linear in the size of the knowledge base.*

*Figure 2: By having the user's question attend to the knowledge base while treating facts in the knowledge base independently, KBLaM scales efficiently and linearly with the size of the knowledge base. The attention matrix is rectangular rather than square because the parts where the knowledge base would attend over itself are removed.*

Unlike RAG, which appends retrieved document chunks to prompts, KBLaM allows for **direct integration** of knowledge into the model. Compared to in-context learning, KBLaM's rectangular attention maintains a **linear memory footprint**, making it vastly more scalable for large knowledge bases.

Its efficiency is a game-changer. While traditional in-context learning methods struggle with quadratic memory growth due to self-attention overhead, KBLaM's linear overhead means we can store much more knowledge in the context. In practice, KBLaM can store and process **over 10,000 knowledge triples**, the equivalent of approximately 200,000 text tokens, on a single GPU, a feat that would be computationally prohibitive with conventional in-context learning. The results across a wide range of knowledge-base sizes can be seen in Figure 3. Remarkably, it achieves this while extending a base model with a context length of only **8K tokens**. Additionally, KBLaM enables **dynamic updates**: modifying a single knowledge triple does not require retraining or re-computation of the entire knowledge base.

*Figure 3: KBLaM is much faster and uses much less memory than adding the equivalent number of triples to the context with conventional RAG-like approaches. Time to first token stays nearly constant across knowledge-base sizes; in particular, it is lower with 4,096 triples in the context under KBLaM than with just 5 triples under RAG, and KBLaM with 512 triples uses about as much memory as RAG with 5.*
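Because the sketch above encodes every triple independently of the others, a knowledge-base edit is a constant-time, per-row operation. Continuing with the same illustrative names (not the released API):

```python
# Editing one fact: re-encode just that triple through the same adapters.
# No retraining, and no other rows of the knowledge base are touched.
def update_triple(kb_keys, kb_vals, idx, new_key_emb, new_val_emb):
    with torch.no_grad():
        kb_keys[idx] = key_adapter(new_key_emb)   # (d,) new entity+property encoding
        kb_vals[idx] = val_adapter(new_val_emb)   # (d,) new value encoding
    return kb_keys, kb_vals

# Suppose the value of triple 2 changed; swap in its re-encoded vectors.
kb_keys, kb_vals = update_triple(kb_keys, kb_vals, idx=2,
                                 new_key_emb=torch.randn(d),
                                 new_val_emb=torch.randn(d))
```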
## Enhancing interpretability and reliability

Another major benefit of KBLaM is its **interpretability**. Unlike in-context learning, where knowledge injection is opaque, KBLaM's **attention weights** provide clear insights into how the model uses knowledge tokens. Experiments show that KBLaM assigns high attention scores to relevant knowledge triples, effectively mimicking a soft retrieval process.

Furthermore, KBLaM enhances **model reliability** by learning, through its training examples, when not to answer a question if the necessary information is missing from the knowledge base. In particular, with knowledge bases larger than approximately 200 triples, we found that the model refuses questions it has no knowledge about more precisely than a model given the same information as text in context. This helps reduce **hallucinations**, a common problem in LLMs that rely on internal knowledge alone, making responses more accurate and trustworthy.

## The future of knowledge-augmented AI

KBLaM represents a major step forward in integrating structured knowledge into LLMs. By offering a scalable, efficient, and interpretable alternative to existing techniques, it paves the way for AI systems that stay up to date and provide reliable, knowledge-driven responses. In fields where accuracy and trust are critical, such as medicine, finance, and scientific research, this approach has the potential to transform how language models interact with real-world information.

As AI systems increasingly rely on dynamic knowledge rather than static model parameters, we hope KBLaM will serve as a bridge between raw computational power and real-world understanding.

However, there is still work to be done before KBLaM can be deployed at scale. Our current model has been trained primarily on factual question-answer pairs, and further research is needed to expand its capabilities across more complex reasoning tasks and diverse knowledge domains.

To accelerate progress, we are releasing [KBLaM's code and datasets](https://github.com/microsoft/KBLaM/) to the research community, and we are planning integrations with the Hugging Face transformers library. By making these resources available, we hope to inspire further research and adoption of scalable, efficient knowledge augmentation for LLMs. The future of AI isn't just about generating text; it's about generating knowledge that is accurate, adaptable, and deeply integrated with the evolving world. KBLaM is a step in that direction.