{"id":1017150,"date":"2024-03-27T15:22:24","date_gmt":"2024-03-27T22:22:24","guid":{"rendered":""},"modified":"2024-04-12T14:39:48","modified_gmt":"2024-04-12T21:39:48","slug":"learning-from-interaction-with-microsoft-copilot-web","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/learning-from-interaction-with-microsoft-copilot-web\/","title":{"rendered":"Learning from interaction with Microsoft Copilot (web)"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/arxiv.org\/abs\/2403.12388\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1.jpg\" alt=\"flowchart showing how AI learns from user interactions\" class=\"wp-image-1017312\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1.jpg 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-240x135.jpg 240w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">AI systems like Bing and Microsoft Copilot (web) are as good as they are because they continuously learn and improve from people\u2019s interactions. Since the early 2000s, user clicks on search result pages have fueled the continuous improvements of search engines. Recently, reinforcement learning from human feedback (RLHF) brought step-function improvements to response quality of generative AI models. Bing has a rich history of success in improving its AI offerings by learning from user interactions. 
For example, Bing pioneered the idea of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/1148170.1148177\" target=\"_blank\" rel=\"noopener noreferrer\">improving search ranking<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/personalizing-search-via-automated-analysis-of-interests-and-activities\/\" target=\"_blank\" rel=\"noreferrer noopener\">personalizing search<\/a> using <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/1148170.1148177\" target=\"_blank\" rel=\"noopener noreferrer\">short- and long-term user behavior data<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">With the introduction of Microsoft Copilot (web), the way that people interact with AI systems has fundamentally changed from searching to conversing and from simple actions to complex workflows. Today, we are excited to share three technical reports on how we are <em>starting<\/em> to leverage new types of user interactions to understand and improve Copilot (web) for our consumer customers. 
<a id=\"_ftnref1\" href=\"#_ftn1\">[1]<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-are-people-using-copilot-web\">How are people using Copilot (web)?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">One of the first questions we asked about user interactions with Copilot (web) was, \u201cHow are people using Copilot (web)?\u201d Generative AI can perform many tasks that were not possible in the past, and it\u2019s important to understand people\u2019s expectations and needs so that we can continuously improve Copilot (web) in the ways that will help users the most.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A key challenge of understanding user tasks at scale is to transform unstructured interaction data (e.g., Copilot logs) into a <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/using-large-language-models-to-generate-validate-and-apply-user-intent-taxonomies\/\" target=\"_blank\" rel=\"noreferrer noopener\">meaningful task taxonomy<\/a>. Existing methods heavily rely on manual effort, which is not scalable in novel and under-specified domains like generative AI. To address this challenge, we introduce <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/tnt-llm-text-mining-at-scale-with-large-language-models\/\" target=\"_blank\" rel=\"noreferrer noopener\">TnT-LLM (<strong>T<\/strong>axonomy Generation <strong>and T<\/strong>ext Prediction with <strong>LLM<\/strong>s)<\/a>, a two-phase LLM-powered framework that generates and predicts task labels end-to-end with minimal human involvement (Figure 1).<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"738\" height=\"375\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig1.png\" alt=\"The figure illustrates a comparison of three text data processing frameworks. 
The first, a labor-intensive human-in-the-loop framework, involves the manual derivation of label taxonomy and annotation before developing the classifier. The second, a conventional unsupervised text clustering framework, clusters data initially and generates label taxonomy afterwards. The third, the TnT-LLM framework, integrates an LLM into both the derivation of label taxonomy and annotation. A scatter plot shows that human-in-the-loop is highly interpretable but not very scalable, the text clustering framework is highly scalable but less interpretable, and the TnT-LLM framework excels in both. \" class=\"wp-image-1017252\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig1.png 738w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig1-300x152.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig1-240x122.png 240w\" sizes=\"auto, (max-width: 738px) 100vw, 738px\" \/><figcaption class=\"wp-element-caption\">Figure 1. Comparing our TnT-LLM framework against existing methods in terms of interpretability and scalability.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We conducted extensive human evaluation to understand how TnT-LLM performs. In discovering user intent and domain from Copilot (web) conversations, taxonomies generated by TnT-LLM are significantly more accurate than those from existing baselines (Figure 2).<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"858\" height=\"294\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig2.png\" alt=\"The figure compares the accuracy of different AI methods in generating user intent taxonomies. Two bar plots are presented side by side, labeled with \u201cAccuracy (Intent)\u201d and \u201cAccuracy (Domain)\u201d. 
The methods compared are \u201cGPT-4 (TnT-LLM)\u201d, \u201cGPT-3.5-turbo (TnT-LLM)\u201d, \u201cada2 + GPT-4\u201d, \u201cada2 + GPT-3.5-turbo\u201d, \u201cInstructor-XL + GPT-4\u201d and \u201cInstructor-XL + GPT-3.5-turbo\u201d. The label \u201cGPT-4 evaluation\u201d is noted at the bottom. \u201cGPT-4 (TnT-LLM)\u201d appears to outperform other methods in this figure. \" class=\"wp-image-1017255\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig2.png 858w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig2-300x103.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig2-768x263.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig2-240x82.png 240w\" sizes=\"auto, (max-width: 858px) 100vw, 858px\" \/><figcaption class=\"wp-element-caption\">Figure 2. Evaluating the performance of TnT-LLM on user intent taxonomy generation. Error bars indicate 95% confidence intervals.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We applied TnT-LLM to a large number of fully de-identified Copilot (web) conversations and traditional Bing Search sessions. <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/the-use-of-generative-search-engines-for-knowledge-work-and-complex-tasks\/\">The results<\/a> (Figure 3) suggest that people use Copilot (web) for knowledge work tasks in domains such as writing and editing, data analysis, programming, science, and business. Further, tasks done in Copilot (web) are generally more complex and more knowledge work-oriented than tasks done in traditional search engines. Generative AI&#8217;s emerging capabilities have expanded the tasks that machines can perform to include some that humans have traditionally had to do without assistance. 
Results demonstrate that people are doing more complex tasks, frequently in the context of knowledge work, and show that this type of work is being newly assisted by Copilot (web).<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"The figure compares Bing Copilot conversations with Bing Search sessions for the degree to which they are complex in nature and oriented toward knowledge work. Two scatterplots are presented side by side, one each for Bing Copilot and Bing Search. The x-axes are labeled \u201cPercent of Per-Domain Copilot Chats Classified as Complex\u201d and \u201cPercent of Per-Domain Search Sessions Classified as Complex\u201d. The y-axes are labeled \u201cPercent of Per-Domain Copilot Chats Classified as Knowledge Work\u201d for Bing Copilot and \u201cPercent of Per-Domain Search Sessions Classified as Knowledge Work\u201d for Bing Search. The points in the scatterplot are task domains, such as \u201cProgramming and scripting\u201d and \u201cGaming and entertainment\u201d. The data points in the scatter plot show that for Bing Search, the majority of search sessions are lower in both complexity and knowledge work relevance, whereas for Bing Copilot, many data points are high in both complexity and knowledge work. \" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig3.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"726\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig3.png\" alt=\"The figure compares Bing Copilot conversations with Bing Search sessions for the degree to which they are complex in nature and oriented toward knowledge work. Two scatterplots are presented side by side, one each for Bing Copilot and Bing Search. 
The x-axes are labeled \u201cPercent of Per-Domain Copilot Chats Classified as Complex\u201d and \u201cPercent of Per-Domain Search Sessions Classified as Complex\u201d. The y-axes are labeled \u201cPercent of Per-Domain Copilot Chats Classified as Knowledge Work\u201d for Bing Copilot and \u201cPercent of Per-Domain Search Sessions Classified as Knowledge Work\u201d for Bing Search. The points in the scatterplot are task domains, such as \u201cProgramming and scripting\u201d and \u201cGaming and entertainment\u201d. The data points in the scatter plot show that for Bing Search, the majority of search sessions are lower in both complexity and knowledge work relevance, whereas for Bing Copilot, many data points are high in both complexity and knowledge work. \" class=\"wp-image-1017294\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig3.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig3-300x156.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig3-1024x531.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig3-768x398.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig3-240x124.png 240w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 3. Comparing the distribution of topical domains and task complexity between Bing search (left) and Copilot (web) (right).<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"estimating-and-interpreting-user-satisfaction\">Estimating and interpreting user satisfaction<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To effectively learn from user interactions, it is equally important to classify user satisfaction and to understand why people are satisfied or dissatisfied while trying to complete a given task. 
Most important, this will allow system developers to identify areas of improvement and to amplify and suggest successful use cases for broader groups of users.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">People give explicit and implicit feedback when interacting with AI systems. In the past, user feedback was in the form of clicks, ratings, or survey verbatims. When it comes to conversational systems like Copilot (web), people also give feedback in the messages they send during the conversations (Figure 4).<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"802\" height=\"571\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig4.png\" alt=\"The figure illustrates the continuous improvement process of an AI assistant. The process starts with an example conversation between a user and an AI assistant. The user in the example is unsatisfied with the response of the AI assistant. An improvement process takes the unsatisfied example as input and outputs better responses in the same conversation.  After improvement, the user is satisfied with the response. \" class=\"wp-image-1017258\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig4.png 802w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig4-300x214.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig4-768x547.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig4-240x171.png 240w\" sizes=\"auto, (max-width: 802px) 100vw, 802px\" \/><figcaption class=\"wp-element-caption\">Figure 4. 
Illustrations of how people may give feedback to a chatbot in their messages.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">To capture this new category of feedback signals, we propose our <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/abs\/2403.12388\" target=\"_blank\" rel=\"noopener noreferrer\">Supervised Prompting for User Satisfaction Rubrics (SPUR)<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> framework (Figure 5). It\u2019s a three-phase prompting framework that uses LLMs to estimate user satisfaction:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The <strong>supervised extraction prompt<\/strong> extracts diverse <em>in situ<\/em> textual feedback from users interacting with Copilot (web).<\/li>\n\n\n\n<li>The <strong>summarization rubric prompt<\/strong> identifies prominent textual feedback patterns and summarizes them into rubrics for estimating user satisfaction.<\/li>\n\n\n\n<li>Based on the summarized rubrics, the final <strong>scoring prompt<\/strong> takes a conversation between a user and the AI agent and rates how satisfied the user was.<\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a href=\"https:\/\/arxiv.org\/abs\/2403.12388\"><img loading=\"lazy\" decoding=\"async\" width=\"975\" height=\"480\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig5.png\" alt=\"The figure shows the framework of Supervised Prompting for User Satisfaction Rubrics. The first step shows that an LLM explains user satisfaction or dissatisfaction based on user utterances. Then, the LLM summarizes satisfaction or dissatisfaction reasons into SAT and DSAT rubrics in the second step. Finally, the LLM uses SAT and DSAT rubrics to determine whether a user is satisfied with the responses of an AI agent in the third step. 
\" class=\"wp-image-1017261\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig5.png 975w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig5-300x148.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig5-768x378.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig5-240x118.png 240w\" sizes=\"auto, (max-width: 975px) 100vw, 975px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 5. Framework of Supervised Prompting for User Satisfaction Rubrics.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We evaluated our framework on fully de-identified conversations with explicit user thumbs up\/down in Copilot (web) (Table 1). We find that SPUR outperforms other LLM-based and embedding-based methods, especially when only limited human annotations of user satisfaction are available. Open-source reward models used for RLHF cannot be a proxy for user satisfaction, because reward models are usually trained with auxiliary human feedback that may differ from the feedback from the user who was involved in the conversation with the AI agent.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Method<\/th><th>Weighted F1-score<\/th><\/tr><\/thead><tbody><tr><td>Reward (RLHF)<\/td><td>17.8<\/td><\/tr><tr><td>ASAP (SOTA of embedding)<\/td><td>57.0<\/td><\/tr><tr><td>Zero-Shot (GPT4)<\/td><td>74.1<\/td><\/tr><tr><td>SESRP (GPT4)<\/td><td><strong>77.4<\/strong><\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\"><center>Table 1. Performance comparison between models for user satisfaction estimation.<\/center><\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Another critical feature of SPUR is its interpretability. It shows how people express satisfaction or dissatisfaction (Figure 6). 
For example, we see that users often give explicit positive feedback by clearly praising the response from Copilot (web). Conversely, they express explicit frustration or switch topics when encountering mistakes in the response from Copilot (web). This presents opportunities for providing customized user experience at critical moments of user satisfaction and dissatisfaction, such as context and memory reset after switching topics.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"241\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig6_v2.png\" alt=\"The figure shows two histogram plots. The left histogram plot shows the distribution of the ten-item SAT rubric, and the right histogram plot shows the distribution of the ten-item DSAT rubric in Bing Copilot. The y-axis of the left histogram shows ten summarized patterns that express how a user is satisfied with the responses of Bing Copilot, and the x-axis shows the percentage of each pattern occurring in Bing Copilot. Similarly, the y-axis of the right histogram shows 10 summarized patterns that express how a user is dissatisfied with the responses of Bing Copilot, and the x-axis shows the percentage of each pattern happening in Bing Copilot. \" class=\"wp-image-1017852\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig6_v2.png 624w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig6_v2-300x116.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Fig6_v2-240x93.png 240w\" sizes=\"auto, (max-width: 624px) 100vw, 624px\" \/><figcaption class=\"wp-element-caption\">Figure 6. 
SPUR reveals the distribution of satisfaction and dissatisfaction patterns among conversations with explicit user upvotes or downvotes.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">From the user task classification discussed earlier, we know that people are using Copilot (web) for knowledge work and more complex tasks. As we further apply SPUR for user satisfaction estimation, we find that people are also more satisfied when they complete or partially complete cognitively complex tasks. Specifically, when regressing the SPUR-derived summary user-satisfaction score on task complexity, we find generally increasing coefficients on increasing levels of task complexity when using the lowest level of task complexity (i.e., Remember) as a baseline, provided the task was at least partially completed (see Table 2). For instance, partially completing a Create-level task, which is the highest level of task complexity, is associated with an increase in user satisfaction that is more than double the increase when partially completing an Understand-level task. Fully completing a Create-level task is associated with the largest increase in user satisfaction.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"605\" height=\"436\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Table2.png\" alt=\"The table shows results from a regression analysis, including the predictor variables and their respective coefficients in the regression. In this regression, user satisfaction, the outcome variable, is regressed on three predictor variables. The three predictors are task complexity, task completion, and the number of user messages. Additionally, interaction terms are included for the interactions between task complexity and task completion, and between the number of user messages and task completion. 
The results indicate that when users complete more complex tasks, their user satisfaction increases. \" class=\"wp-image-1017267\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Table2.png 605w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Table2-300x216.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Copilot_Table2-240x173.png 240w\" sizes=\"auto, (max-width: 605px) 100vw, 605px\" \/><figcaption class=\"wp-element-caption\">Table 2. Regression results where the dependent variable is user satisfaction. In general, the more complex the task, the more satisfied the user, whether the task was partially or totally completed.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">These three reports present a comprehensive and multi-faceted approach to dynamically learning from conversation logs in Copilot (web) at scale. As AI\u2019s generative capabilities increase, users are finding new ways to use the system to help them do more, and are shifting from traditional click reactions to more nuanced, continuous dialogue-oriented feedback. 
To navigate this evolving user-AI interaction landscape, it is crucial to shift from established task frameworks and relevance evaluations to a more dynamic, bottom-up approach to task identification and user satisfaction evaluation.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"key-contributors\">Key Contributors <\/h5>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/reidan\/\">Reid Andersen<\/a>, Georg Buscher, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/counts\/\">Scott Counts<\/a>, Deepak Gupta, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/brhecht\/\">Brent Hecht<\/a>, Dhruv Joshi, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/sjauhar\/\">Sujay Kumar Jauhar<\/a>, Ying-Chun Lin, Sathish Manivannan, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jenneville\/\">Jennifer Neville<\/a>, Nagu Rangan, Chirag Shah, Dolly Sobhani, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/suri\/\">Siddharth Suri<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tarasafavi\/\">Tara Safavi<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/teevan\/\">Jaime Teevan<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/satiwary\/\">Saurabh Tiwary<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mengtwan\/\">Mengting Wan<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ryenw\/\">Ryen W. White<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xiaso\/\">Xia Song<\/a>, Jack W. 
Stokes, Xiaofeng Xu, and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/loy\/\">Longqi Yang<\/a>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><a id=\"_ftn1\" href=\"#_ftnref1\">[1]<\/a> <em>The research was performed only on fully de-identified interaction data from Copilot (web) consumers. No enterprise data was used per our commitment to enterprise customers. We have taken careful steps to protect user privacy and adhere to strict ethical and responsible AI standards. All personal, private or sensitive information was scrubbed and masked before conversations were used for the research. The access to the dataset is strictly limited to approved researchers. The study was reviewed and approved by our institutional review board (IRB).<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Microsoft researchers are taking a comprehensive and dynamic approach to help Copilot (web) continuously learn from interaction and feedback, improving the AI system and making it increasingly useful for consumers. Learn more.<\/p>\n","protected":false},"author":37583,"featured_media":1017312,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Scott Counts","user_id":"31471"},{"type":"user_nicename","value":"Jennifer Neville","user_id":"40946"},{"type":"user_nicename","value":"Mengting Wan","user_id":"39510"},{"type":"user_nicename","value":"Ryen W. 
White","user_id":"33481"},{"type":"user_nicename","value":"Longqi Yang","user_id":"38790"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13545,13555,13559],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1017150","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-research-area-search-information-retrieval","msr-research-area-social-sciences","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[144672,643845,702211,722851,901101],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Scott Counts","user_id":31471,"display_name":"Scott Counts","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/counts\/\" aria-label=\"Visit the profile page for Scott Counts\">Scott Counts<\/a>","is_active":false,"last_first":"Counts, Scott","people_section":0,"alias":"counts"},{"type":"user_nicename","value":"Jennifer Neville","user_id":40946,"display_name":"Jennifer Neville","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jenneville\/\" aria-label=\"Visit the profile page for Jennifer Neville\">Jennifer Neville<\/a>","is_active":false,"last_first":"Neville, Jennifer","people_section":0,"alias":"jenneville"},{"type":"user_nicename","value":"Mengting Wan","user_id":39510,"display_name":"Mengting Wan","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mengtwan\/\" 
aria-label=\"Visit the profile page for Mengting Wan\">Mengting Wan<\/a>","is_active":false,"last_first":"Wan, Mengting","people_section":0,"alias":"mengtwan"},{"type":"user_nicename","value":"Ryen W. White","user_id":33481,"display_name":"Ryen W. White","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ryenw\/\" aria-label=\"Visit the profile page for Ryen W. White\">Ryen W. White<\/a>","is_active":false,"last_first":"White, Ryen W.","people_section":0,"alias":"ryenw"},{"type":"user_nicename","value":"Longqi Yang","user_id":38790,"display_name":"Longqi Yang","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/loy\/\" aria-label=\"Visit the profile page for Longqi Yang\">Longqi Yang<\/a>","is_active":false,"last_first":"Yang, Longqi","people_section":0,"alias":"loy"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-960x540.jpg\" class=\"img-object-cover\" alt=\"flowchart showing how AI learns from user interactions\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Learning-from-User-Interactions-BlogHeroFeature-1400x788-1.jpg 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"March 27, 2024","formattedExcerpt":"Microsoft researchers are taking a comprehensive and dynamic approach to help Copilot (web) continuously learn from interaction and feedback, improving the AI system and making it increasingly useful for consumers. 
Learn more.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1017150","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/37583"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1017150"}],"version-history":[{"count":45,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1017150\/revisions"}],"predecessor-version":[{"id":1024983,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1017150\/revisions\/1024983"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1017312"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1017150"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1017150"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1017150"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1017150"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1017150"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1017150"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-js
on\/wp\/v2\/msr-locale?post=1017150"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1017150"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1017150"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1017150"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1017150"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}