{"id":1132281,"date":"2025-03-01T19:30:59","date_gmt":"2025-03-02T03:30:59","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&#038;p=1132281"},"modified":"2025-06-11T12:57:17","modified_gmt":"2025-06-11T19:57:17","slug":"secom","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/secom\/","title":{"rendered":"SeCom: On Memory Construction & Retrieval for Personalized Conversational Agents"},"content":{"rendered":"<section class=\"mb-3 moray-highlight\">\n\t<div class=\"card-img-overlay mx-lg-0\">\n\t\t<div class=\"card-background  has-background- card-background--full-bleed\">\n\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"2880\" height=\"864\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/SeCom_0928.png\" class=\"attachment-full size-full\" alt=\"feature_image\" style=\"\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/SeCom_0928.png 2880w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/SeCom_0928-300x90.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/SeCom_0928-1024x307.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/SeCom_0928-768x230.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/SeCom_0928-1536x461.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/SeCom_0928-2048x614.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/SeCom_0928-240x72.png 240w\" sizes=\"auto, (max-width: 2880px) 100vw, 2880px\" \/>\t\t<\/div>\n\t\t<!-- Foreground -->\n\t\t<div class=\"card-foreground d-flex mt-md-n5 my-lg-5 px-g px-lg-0\">\n\t\t\t<!-- Container -->\n\t\t\t<div class=\"container d-flex mt-md-n5 my-lg-5 \">\n\t\t\t\t<!-- Card wrapper -->\n\t\t\t\t<div class=\"w-100 
w-lg-col-5\">\n\t\t\t\t\t<!-- Card -->\n\t\t\t\t\t<div class=\"card material-md-card py-5 px-md-5\">\n\t\t\t\t\t\t<div class=\"card-body \">\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n<h1 class=\"wp-block-heading\" id=\"secom\">SeCom<\/h1>\n\n\n\n<p>On <strong>Memory Construction & Retrieval <\/strong>for Personalized Conversational Agents<\/p>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n<p>How can conversational agents <strong>better retain and retrieve past interactions<\/strong> for more <strong>coherent and personalized<\/strong> experiences? Our latest work &#8211; <strong>SeCom<\/strong> on <strong>Memory Construction & Retrieval<\/strong> &#8211; tackles this challenge head-on!<\/p>\n\n\n\n<p>Existing approaches typically perform retrieval augmented response generation by constructing memory banks from conversation history at either the turn level or session level, or through summarization. In SeCom, we present <strong>two key findings<\/strong>: (1) The granularity of the memory unit matters: turn-level, session-level, and summarization-based methods each exhibit limitations in both memory retrieval accuracy and the semantic quality of the retrieved content. (2) Prompt compression methods, such as&nbsp;<em>LLMLingua-2<\/em>, can effectively serve as a denoising mechanism, enhancing memory retrieval accuracy across different granularities.<\/p>\n\n\n\n<p>Building on these insights, we propose&nbsp;<strong>SeCom<\/strong>, a method that constructs the memory bank at the segment level by introducing a conversation&nbsp;<strong>Se<\/strong>gmentation model that partitions long-term conversations into topically coherent segments, while applying&nbsp;<strong>Com<\/strong>pression-based denoising on memory units to enhance memory retrieval. 
Experimental results show that&nbsp;<strong>SeCom<\/strong>&nbsp;exhibits a significant performance advantage over baselines on long-term conversation benchmarks LOCOMO and Long-MT-Bench+. Additionally, the proposed conversation segmentation method demonstrates superior performance on dialogue segmentation datasets such as DialSeg711, TIAGE, and SuperDialSeg. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"key-takeaways-1\">Key Takeaways<\/h2>\n\n\n\n<p>\ud83d\udca1&nbsp;<strong>Memory granularity matters: <\/strong>Turn-level, session-level & summarization-based memory struggle with retrieval accuracy and the semantic integrity or relevance of the context.<\/p>\n\n\n\n<p>\ud83d\udca1&nbsp;<strong>Prompt compression methods<\/strong> (e.g.,&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/llmlingua.com\/llmlingua2.html\">LLMLingua-2<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>) <strong>can denoise memory retrieval, boosting <\/strong>both <strong>retrieval accuracy <\/strong>and <strong>response quality.<\/strong><\/p>\n\n\n\n<p>\u2705&nbsp;<strong>SeCom<\/strong>&nbsp;\u2013 an approach that&nbsp;<strong>segments conversations topically<\/strong>&nbsp;for <strong>memory construction <\/strong>and performs<strong> memory retrieval<\/strong> based on <strong>compressed memory units<\/strong>.<\/p>\n\n\n\n<p>\ud83d\udcca&nbsp;<strong>Result<\/strong>&nbsp;\u2013 superior performance on long-term conversation benchmarks such as LOCOMO and Long-MT-Bench+!<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading h4\" id=\"what-s-the-impact-of-memory-granularity-1\">What&#8217;s the Impact of Memory Granularity?<\/h3>\n\n\n\n<p>We first systematically investigate the impact of different memory granularities on conversational agents within the paradigm of retrieval augmented 
response generation. Our findings indicate that turn-level, session-level, and summarization-based methods all exhibit limitations in terms of the accuracy of the retrieval module as well as the semantics of the retrieved content, which ultimately lead to sub-optimal responses.<\/p>\n\n\n\n<p>\ud83d\udca1 Long conversations are naturally composed of coherent discourse units. To capture this structure, we introduce a conversation segmentation model that partitions long-term conversations into topically coherent segments, constructing the memory bank at the segment level. During response generation, we directly concatenate the retrieved segment-level memory units as the context, bypassing summarization to avoid the information loss that often occurs when converting dialogues into summaries.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"704\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3bd378ba7d.png\" alt=\"Illustration of retrieval augmented response generation with different memory granularities. Turn-level memory\u00a0is too fine-grained, leading to fragmentary and incomplete context.\u00a0Session-level memory\u00a0is too coarse-grained, containing too much irrelevant information.\u00a0Summary based methods\u00a0suffer from information loss that occurs during summarization.\u00a0Ours (segment-level memory)\u00a0can better capture topically coherent units in long conversations, striking a balance between including more relevant, coherent information while excluding irrelevant content. \ud83c\udfaf indicates the retrieved memory units at\u00a0turn level\u00a0or\u00a0segment level\u00a0under the same context budget. 
[0.xx]: similarity between target query and history content.\" class=\"wp-image-1133103\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3bd378ba7d.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3bd378ba7d-300x206.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3bd378ba7d-768x528.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3bd378ba7d-800x550.png 800w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3bd378ba7d-240x165.png 240w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><strong>Figure 1. Illustration of retrieval augmented response generation with different memory granularities.<\/strong>\u00a0<em>Turn-level memory<\/em>\u00a0is too fine-grained, leading to fragmentary and incomplete context.\u00a0<em>Session-level memory\u00a0<\/em>is too coarse-grained, containing too much irrelevant information.\u00a0<em>Summary based methods<\/em>\u00a0suffer from information loss that occurs during summarization.\u00a0<em>Ours (segment-level memory)<\/em>\u00a0can better capture topically coherent units in long conversations, striking a balance between including more relevant, coherent information while excluding irrelevant content. \ud83c\udfaf indicates the retrieved memory units at\u00a0turn level\u00a0or\u00a0segment level\u00a0under the same context budget. 
[0.xx]: similarity between target query and history content.<\/figcaption><\/figure>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"2512\" height=\"838\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3be74f0ff0.png\" alt=\"Illustration of how memory granularity affects (a) the response quality and (b, c) retrieval accuracy. Number of Turns Per Chunk bar graph (a) Response quality as a function of chunk size, given a total budget of 50 turns to retrieve as context. (b) Retrieval DCG obtained with different memory granularities using BM25-based retriever. (c) Retrieval DCG obtained with different memory granularities using MPNet-based retriever. \" class=\"wp-image-1133112\" style=\"width:1123px;height:auto\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3be74f0ff0.png 2512w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3be74f0ff0-300x100.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3be74f0ff0-1024x342.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3be74f0ff0-768x256.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3be74f0ff0-1536x512.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3be74f0ff0-2048x683.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3be74f0ff0-240x80.png 240w\" sizes=\"auto, (max-width: 2512px) 100vw, 2512px\" \/><figcaption class=\"wp-element-caption\"><strong>Figure 2. 
How memory granularity affects (a) the response quality and (b, c) retrieval accuracy.<\/strong><\/figcaption><\/figure>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading h4\" id=\"does-memory-denoising-help\">Does Memory Denoising Help?<\/h3>\n\n\n\n<p>Inspired by the notion that natural language tends to be inherently redundant, we hypothesize that such redundancy can act as noise for retrieval systems, complicating the extraction of key information. Therefore, we propose removing such redundancy from memory units prior to retrieval by leveraging prompt compression methods such as&nbsp;<em>LLMLingua-2<\/em>. Figure 3 shows the results.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"2142\" height=\"744\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3bf035a64d.png\" alt=\"Illustration of how prompt compression method (LLMLingua-2) can serve as an effective denoising technique to enhance the memory retrieval system by: (a) improving the retrieval recall with varying context budget K; (b) benefiting the retrieval system by increasing the similarity between the query and relevant segments while decreasing the similarity with irrelevant ones. Compression Rate (CR) (a) Retrieval recall v.s. compression rate: #tokens after compression divided by #tokens before compression. K: number of retrieved segments. Retriever: BM25. (b) Retrieval recall v.s. compression rate: #tokens after compression divided by #tokens before compression. K: number of retrieved segments. Retriever: MPNet. (c) Similarity between the query and different dialogue segments. Blue: relevant segments. Orange: irrelevant segments. 
Retriever: MPNet.\" class=\"wp-image-1133118\" style=\"width:1135px;height:auto\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3bf035a64d.png 2142w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3bf035a64d-300x104.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3bf035a64d-1024x356.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3bf035a64d-768x267.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3bf035a64d-1536x534.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3bf035a64d-2048x711.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3bf035a64d-240x83.png 240w\" sizes=\"auto, (max-width: 2142px) 100vw, 2142px\" \/><figcaption class=\"wp-element-caption\"><strong>Figure 3. 
Prompt compression method (LLMLingua-2) can serve as an effective denoising technique to enhance the memory retrieval system<\/strong>\u00a0by: (a) improving the retrieval recall with varying context budget K; (b) benefiting the retrieval system by increasing the similarity between the query and relevant segments while decreasing the similarity with irrelevant ones.<\/figcaption><\/figure>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"secom-1\">SeCom<\/h2>\n\n\n\n<p>To address the above challenges, we present&nbsp;<strong>SeCom<\/strong>, a system that constructs the memory bank at the&nbsp;<strong>segment level<\/strong>&nbsp;by introducing a&nbsp;<em>Conversation Segmentation Model<\/em>, while applying&nbsp;<em>Compression-Based Denoising<\/em>&nbsp;on memory units to enhance memory retrieval.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"conversation-segmentation-model\">Conversation Segmentation Model<\/h3>\n\n\n\n<p>Given a conversation session \\(c\\), the conversation segmentation model \\(f_\\mathcal{I}\\) aims to identify a set of segment indices \\(\\mathcal{I}=\\{(p_k, q_k)\\}_{k=1}^{K}\\), where \\(K\\) denotes the total number of segments within the session \\(c\\), and \\(p_k\\) and \\(q_k\\) represent the indices of the first and last interaction turns of the \\(k\\)-th segment \\(s_k\\), with \\(p_k \\le q_k\\) and \\(p_{k+1} = q_k + 1\\). This can be formulated as:<\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(f_{\\mathcal{I}}(c) = \\{s_k\\}_{k=1}^{K},\\) where \\(s_k = \\{t_{p_k}, t_{p_k+1},&#8230;, t_{q_k}\\}.\\)<\/p>\n\n\n\n<p>We employ <strong>GPT-4<\/strong> as the conversation segmentation model \\(f_\\mathcal{I}\\). 
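To make the segmentation contract concrete, here is a minimal Python sketch (a hypothetical helper, not the paper's code) that applies a set of index pairs \((p_k, q_k)\) to a list of turns, enforcing \(p_k \le q_k\) and \(p_{k+1} = q_k + 1\) so that the segments tile the session without gaps or overlaps:

```python
def apply_segmentation(turns, index_pairs):
    """Split a session (list of turns t_1..t_n) into segments s_k = {t_{p_k}, ..., t_{q_k}}.

    `index_pairs` is the index set I = {(p_k, q_k)} with 1-indexed, inclusive bounds.
    The constraints p_k <= q_k and p_{k+1} = q_k + 1 from the definition of f_I are
    asserted, so the segments form a contiguous, non-overlapping cover of the session.
    """
    segments = []
    expected_start = 1
    for p, q in index_pairs:
        assert p == expected_start and p <= q, "segments must tile the session"
        # Convert 1-indexed inclusive bounds to a Python slice.
        segments.append(turns[p - 1:q])
        expected_start = q + 1
    assert expected_start == len(turns) + 1, "segments must cover every turn"
    return segments

# Example: a 5-turn session split into K = 2 topically coherent segments.
turns = ["t1", "t2", "t3", "t4", "t5"]
print(apply_segmentation(turns, [(1, 3), (4, 5)]))
# [['t1', 't2', 't3'], ['t4', 't5']]
```

In SeCom the index pairs themselves come from an LLM-based segmentation model; this sketch only shows the bookkeeping that turns those indices into segment-level memory units.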
We find that more lightweight models, such as <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/huggingface.co\/mistralai\/Mistral-7B-Instruct-v0.3\" target=\"_blank\" rel=\"noopener noreferrer\">Mistral-7B<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and even <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/huggingface.co\/FacebookAI\/xlm-roberta-large\" target=\"_blank\" rel=\"noopener noreferrer\">RoBERTa<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>-scale models, can also perform segmentation well, making our approach applicable in resource-constrained environments.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"compression-based-memory-denoising\">Compression-Based Memory Denoising<\/h3>\n\n\n\n<p>Given a target user request \\(u^*\\) and context budget \\(N\\), the memory retrieval system \\(f_R\\) retrieves \\(N\\) memory units \\(\\{m_n \\in \\mathcal{M}\\}_{n=1}^{N}\\) from the memory bank \\(\\mathcal{M}\\) as the context for responding to \\(u^*\\). 
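As a rough illustration of retrieving over compressed memory units, here is a runnable Python sketch. The stopword-dropping compressor and token-overlap retriever are toy stand-ins (the paper uses LLMLingua-2 for compression and BM25/MPNet retrievers); all helper names are hypothetical and serve only to show the data flow:

```python
import re

# Toy list of "low-information" tokens; a real denoiser (e.g. LLMLingua-2) learns this.
STOPWORDS = frozenset({"the", "a", "an", "is", "so", "um", "what", "of"})

def tokenize(text):
    """Lowercase word tokens, punctuation stripped."""
    return re.findall(r"[a-z0-9']+", text.lower())

def compress(unit):
    """Stand-in for the compression step: drop low-information tokens from a memory unit."""
    return " ".join(w for w in tokenize(unit) if w not in STOPWORDS)

def retrieve(query, memory_bank, budget_n):
    """Stand-in for the retriever: rank *compressed* memory units by token overlap
    with the query, then return the top-N original (uncompressed) units as context."""
    q = set(tokenize(query))
    def score(unit):
        return len(q & set(tokenize(compress(unit))))
    return sorted(memory_bank, key=score, reverse=True)[:budget_n]

memory_bank = [
    "User: I adopted a beagle named Milo last spring.",
    "User: The weather is so nice today, um, anyway.",
    "User: Milo the beagle loves the park near my office.",
]
print(retrieve("What is the name of my beagle Milo?", memory_bank, budget_n=2))
```

Note that scoring runs against the compressed units while the original units are returned as context, mirroring the idea that compression denoises retrieval without discarding the full memory content.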
Since the inherent redundancy in natural language can act as noise for the retrieval system, we denoise memory units by removing such redundancy via a prompt compression model \\(f_{Comp}\\) before retrieval:<\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(\\{m_n \\in \\mathcal{M}\\}_{n=1}^{N} \\leftarrow f_R(u^*, f_{Comp}(\\mathcal{M}), N).\\)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"experiments\">Experiments<\/h2>\n\n\n\n<p>We evaluate&nbsp;<strong>SeCom<\/strong>&nbsp;against four intuitive approaches and four state-of-the-art models on long-term conversation benchmarks:&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/aclanthology.org\/2024.acl-long.747\/\">LOCOMO<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;and&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2308.0823\">Long-MT-Bench+<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading h4\" id=\"main-results\">Main Results<\/h3>\n\n\n\n<p>As shown in the following table,&nbsp;<strong>SeCom<\/strong>&nbsp;outperforms all baseline approaches, exhibiting a significant performance advantage, particularly on the long-conversation benchmark&nbsp;<em>LOCOMO<\/em>. Interestingly, <em>turn-level<\/em>&nbsp;and&nbsp;<em>session-level<\/em>&nbsp;methods show a significant performance disparity when paired with different retrieval models. 
In contrast,&nbsp;<strong>SeCom<\/strong>&nbsp;is more robust to the choice of retrieval system.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1614\" height=\"1165\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c12b16aaa.png\" alt=\"Table showing\u00a0SeCom\u00a0outperforms all baseline approaches, exhibiting a significant performance advantage, particularly on the long-conversation benchmark\u00a0LOCOMO. Interestingly, there is a significant performance disparity in\u00a0turn-Level\u00a0and\u00a0session-Level\u00a0methods when using different retrieval models. In contrast,\u00a0SeCom\u00a0enjoys greater robustness in terms of the deployed retrieval system.\" class=\"wp-image-1133127\" style=\"width:1025px;height:auto\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c12b16aaa.png 1614w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c12b16aaa-300x217.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c12b16aaa-1024x739.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c12b16aaa-768x554.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c12b16aaa-1536x1109.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c12b16aaa-240x173.png 240w\" sizes=\"auto, (max-width: 1614px) 100vw, 1614px\" \/><\/figure>\n\n\n\n<p>The figure below presents pairwise comparison results, where GPT-4 is instructed to judge which response is superior. SeCom achieves a higher win rate than all baseline methods. 
We attribute this to the fact that the topical segments in SeCom strike a balance between including relevant, coherent information and excluding irrelevant content, thus leading to more robust and superior retrieval performance.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"2013\" height=\"497\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c1b488a71.png\" alt=\"Figure presenting the pairwise comparison result by instructing GPT-4 to determine the superior response. SeCom achieves a higher win rate compared to all baseline methods. We attribute this to the fact that topical segments in SeCom can strike a balance between including more relevant, coherent information while excluding irrelevant content, thus leading to more robust and superior retrieval performance.\" class=\"wp-image-1133130\" style=\"width:989px;height:auto\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c1b488a71.png 2013w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c1b488a71-300x74.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c1b488a71-1024x253.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c1b488a71-768x190.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c1b488a71-1536x379.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c1b488a71-240x59.png 240w\" sizes=\"auto, (max-width: 2013px) 100vw, 2013px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading h4\" id=\"impact-of-the-memory-unit-granularity\">Impact of the Memory Unit Granularity<\/h3>\n\n\n\n<p>The figure below compares QA performance across different memory granularities under varying context budgets, demonstrating the 
superiority of&nbsp;<strong>segment-level memory<\/strong>&nbsp;over turn-level and session-level memory.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1483\" height=\"553\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c1e80a979.png\" alt=\"Figure compares QA performance across different memory granularities under varying context budgets, demonstrating the superiority of\u00a0segment-level memory\u00a0over turn-level and session-level memory.\" class=\"wp-image-1133133\" style=\"width:724px;height:auto\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c1e80a979.png 1483w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c1e80a979-300x112.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c1e80a979-1024x382.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c1e80a979-768x286.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c1e80a979-240x89.png 240w\" sizes=\"auto, (max-width: 1483px) 100vw, 1483px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading h4\" id=\"evaluation-of-conversation-segmentation-model\">Evaluation of Conversation Segmentation Model<\/h3>\n\n\n\n<p>We evaluate the&nbsp;<strong>conversation segmentation module<\/strong>&nbsp;independently on widely used dialogue segmentation datasets:&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/17668\">DialSeg711<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" 
target=\"_blank\" href=\"https:\/\/aclanthology.org\/2021.findings-emnlp.145\/\">TIAGE<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;and&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/aclanthology.org\/2023.emnlp-main.249\/\">SuperDialSeg<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. The following table presents the result, showing that our segmentation model consistently outperforms baselines in the unsupervised segmentation setting.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1884\" height=\"772\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c226a4662.png\" alt=\"Table presents result of evaluating the\u00a0conversation segmentation module independently on widely used dialogue segmentation datasets:\u00a0DialSeg711,\u00a0TIAGE\u00a0and\u00a0SuperDialSeg, showing that our segmentation model consistently outperforms baselines in the unsupervised segmentation setting.\" class=\"wp-image-1133136\" style=\"width:1031px;height:auto\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c226a4662.png 1884w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c226a4662-300x123.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c226a4662-1024x420.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c226a4662-768x315.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c226a4662-1536x629.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c226a4662-240x98.png 240w\" sizes=\"auto, (max-width: 1884px) 100vw, 1884px\" \/><\/figure>\n\n\n\n<h3 
class=\"wp-block-heading h4\" id=\"the-effect-of-compression-based-memory-denoising\">The Effect of Compression-Based Memory Denoising<\/h3>\n\n\n\n<p>As shown in the table below, removing the proposed compression-based memory denoising mechanism results in a performance drop of up to 9.46 GPT4Score points on&nbsp;<em>LOCOMO<\/em>, highlighting the critical role of this denoising mechanism: by improving the retrieval system, it significantly enhances the effectiveness of the overall pipeline.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1822\" height=\"254\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c2524dcaf.png\" alt=\"Table showing that removing the proposed compression-based memory denoising mechanism will result in a performance drop up to 9.46 points of GPT4Score on\u00a0LOCOMO, highlighting the critical role of this denoising mechanism: by effectively improving the retrieval system, it significantly enhances the overall effectiveness of the system.\" class=\"wp-image-1133139\" style=\"width:1033px;height:auto\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c2524dcaf.png 1822w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c2524dcaf-300x42.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c2524dcaf-1024x143.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c2524dcaf-768x107.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c2524dcaf-1536x214.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/image-67c3c2524dcaf-240x33.png 240w\" sizes=\"auto, (max-width: 1822px) 100vw, 1822px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading h3\" 
id=\"bibtex\">BibTex<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>@inproceedings{pan2025secom,\n    title={SeCom: On Memory Construction and Retrieval for Personalized Conversational Agents},\n    author={Zhuoshi Pan and Qianhui Wu and Huiqiang Jiang and Xufang Luo and Hao Cheng and Dongsheng Li and Yuqing Yang and Chin-Yew Lin and H. Vicky Zhao and Lili Qiu and Jianfeng Gao},\n    booktitle={The Thirteenth International Conference on Learning Representations},\n    year={2025},\n    url={https:\/\/openreview.net\/forum?id=xKDZAW0He3}\n}<\/code><\/pre>\n\n\n","protected":false},"excerpt":{"rendered":"<p>On Memory Construction & Retrieval for Personalized Conversational Agents How can conversational agents better retain and retrieve past interactions for more coherent and personalized experiences? Our latest work &#8211; SeCom on Memory Construction & Retrieval tackles this challenge head-on! Existing approaches typically perform retrieval augmented response generation by constructing memory banks from conversation history at [&hellip;]<\/p>\n","protected":false},"featured_media":1133142,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556,13545,13554],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1132281","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-research-area-human-computer-interaction","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[1133157],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[1135033],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"guest","display_name":"Zhuoshi 
Pan","user_id":1133154,"people_section":"People","alias":""},{"type":"user_nicename","display_name":"Qianhui Wu","user_id":40741,"people_section":"People","alias":"qianhuiwu"},{"type":"user_nicename","display_name":"Hao Cheng","user_id":39922,"people_section":"People","alias":"chehao"},{"type":"user_nicename","display_name":"Dongsheng Li","user_id":39402,"people_section":"People","alias":"dongsli"},{"type":"user_nicename","display_name":"Yuqing Yang","user_id":40654,"people_section":"People","alias":"yuqyang"},{"type":"user_nicename","display_name":"Chin-Yew Lin","user_id":31493,"people_section":"People","alias":"cyl"},{"type":"guest","display_name":"H. Vicky Zhao","user_id":1133175,"people_section":"People","alias":""},{"type":"user_nicename","display_name":"Jianfeng Gao","user_id":32246,"people_section":"People","alias":"jfgao"}],"msr_research_lab":[199560,199565],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1132281","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":59,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1132281\/revisions"}],"predecessor-version":[{"id":1141893,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1132281\/revisions\/1141893"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1133142"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1132281"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1132281"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\
/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1132281"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1132281"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1132281"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}