{"id":1048506,"date":"2024-07-15T09:00:00","date_gmt":"2024-07-15T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1048506"},"modified":"2024-07-08T07:37:39","modified_gmt":"2024-07-08T14:37:39","slug":"rubicon-evaluating-conversations-between-humans-and-ai-systems","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/rubicon-evaluating-conversations-between-humans-and-ai-systems\/","title":{"rendered":"RUBICON: Evaluating conversations between humans and AI systems"},"content":{"rendered":"\n<p class=\"has-text-align-center wp-block-paragraph\"><strong><em>This paper has been accepted at the <\/em><\/strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/2024.aiwareconf.org\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong><em>1<sup>st<\/sup> ACM International Conference on AI-powered Software<\/em><\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><strong><em> (AIware 2024), co-located with <\/em><\/strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/2024.esec-fse.org\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong><em>FSE 2024<\/em><\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><strong><em>. 
AIware is the premier international forum on AI-powered software.<\/em><\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1401\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1.png\" alt=\"RUBICON paper at AIware 2024\" class=\"wp-image-1048530\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1.png 1401w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-1280x720.png 1280w\" sizes=\"auto, (max-width: 1401px) 100vw, 1401px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Generative AI has redefined the landscape of AI assistants in software development, with innovations 
like GitHub Copilot providing real-time, chat-based programming support. As these tools increase in sophistication and domain specialization, assessing their impact on user interactions becomes more challenging. Developers frequently question whether modifications to their AI assistants genuinely improve the user experience, as indicated in a recent <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/building-your-own-product-copilot-challenges-opportunities-and-needs\/\">paper<\/a>.<\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--left\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/rubicon-rubric-based-evaluation-of-domain-specific-human-ai-conversations\/\" data-bi-cN=\"RUBICON: Rubric-based Evaluation of Domain Specific Human-AI Conversations\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>RUBICON: Rubric-based Evaluation of Domain Specific Human-AI Conversations<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Traditional feedback mechanisms, such as simple thumbs-up or thumbs-down ratings, fall short in capturing the complexities of interactions within specialized settings, where nuanced data is often sparse. 
To address this issue, we introduce \u201c<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/rubicon-rubric-based-evaluation-of-domain-specific-human-ai-conversations\/\">RUBICON: Rubric-based Evaluation of Domain Specific Human-AI Conversations<\/a>,\u201d presented at AIware 2024. RUBICON is an automated assessment technique that transforms a minimal dataset into an extensive array of domain-specific rubrics, helping ensure that updates not only modify but meaningfully improve user interactions.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"foundational-communication-principles\">Foundational communication principles<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Effective conversation, whether human-to-human or human-to-AI, adheres to <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.sas.upenn.edu\/~haroldfs\/dravling\/grice.html\" target=\"_blank\" rel=\"noopener noreferrer\">four maxims<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> outlined by philosopher Paul Grice: quantity, quality, relation, and manner. Together, they ensure that communication is concise, truthful, pertinent, and clear. In AI applications, these maxims help create interactions that feel natural and engaging, fostering trust and empathy. Within domain-specific settings, RUBICON adapts these principles to ensure they are context-aware, improving the utility and clarity of interactions. For example, in Visual Studio, the AI helps the developer debug a program by providing detailed explanations and relevant code examples, as shown in Figure 1. 
In Figure 2, its responses reflect that it\u2019s guided by context.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"542\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_debug-assistant_Figure1_1400px.png\" alt=\"In the image, we see two Human-AI debugging conversations side by side, both working on the same task but with different AI assistants. On the left side, the assistant suggests using an if-else block to catch and throw an exception. The user responds that they do not want to throw any exceptions. The assistant then proposes a try-catch block instead. The user ends the conversation by asking how to prevent the exception from occurring in the first place. The assistant makes assumptions without clarifying details about the scenario, leading to a superficial and unusable fix. On the right side, the assistant starts by asking the user to check a variable's value at a specific state. The user replies that the variable is empty. The assistant then forms a hypothesis and requests the relevant code file from the user. After receiving the code, the assistant provides a simple fix. The user ends the conversation by confirming that the solution worked. 
Here, the assistant actively investigates the error, collaborates with the user to gather information, and delivers a practical solution.\" class=\"wp-image-1048557\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_debug-assistant_Figure1_1400px.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_debug-assistant_Figure1_1400px-300x116.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_debug-assistant_Figure1_1400px-1024x396.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_debug-assistant_Figure1_1400px-768x297.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_debug-assistant_Figure1_1400px-240x93.png 240w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><figcaption class=\"wp-element-caption\">Figure 1. Contrasting interactions with two versions of the Visual Studio Debugging Assistant for the same task. On the left, the assistant makes assumptions without seeking clarification. On the right, the assistant proactively investigates the error, collaborates with the developer to gather essential information, and achieves a practical solution.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"440\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_context-awareness_Figure2_1400px.png\" alt=\"In the image, there are two sample initial responses to the same task by different debugging assistants, shown side by side. On the left, the assistant merely reiterates the meaning of the exception message and gives generic advice, such as asking the user to check why the serialization failed. 
On the right, the assistant identifies the probable source of the error, points out the specific method to the user, and requests the user to provide the code for that method.\" class=\"wp-image-1048551\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_context-awareness_Figure2_1400px.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_context-awareness_Figure2_1400px-300x94.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_context-awareness_Figure2_1400px-1024x322.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_context-awareness_Figure2_1400px-768x241.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_context-awareness_Figure2_1400px-240x75.png 240w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><figcaption class=\"wp-element-caption\">Figure 2. Context awareness significantly improves the AI assistant\u2019s efficacy. The response on the left is generic, superficially referring to the developer\u2019s code and restating the obvious, providing little value. The reply on the right directs the developer toward a specific solution, the toJSON method.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">In task-oriented environments, it\u2019s important to assess how well a conversation aligns with user expectations and assists in achieving their goals. Conversations are only useful if they advance the user&#8217;s interests, and challenges can arise when users have misaligned expectations of the AI\u2019s capabilities or when the AI directs the conversation too forcefully, prioritizing its methods over the user\u2019s preferences. RUBICON balances the interaction dynamics between the AI and developer, promoting constructive exchanges without overwhelming or under-engaging. 
It calibrates the extent to which the AI should hypothesize and resolve issues versus how much it should leave to the developer.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"rubicon-s-rubric-based-method-and-evaluation\">RUBICON\u2019s rubric-based method and evaluation<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">RUBICON builds on the foundational work of <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/learning-from-interaction-with-microsoft-copilot-web\/\">SPUR<\/a>, the recently introduced Supervised Prompting for User Satisfaction Rubrics framework, expanding its scope and crafting a broad spectrum of potential rubrics from each batch of data. 
RUBICON uses a language model to create concise summaries that assess the quality of conversations, emphasizing communication principles, task orientation, and domain specificity. It identifies signals of user satisfaction and outlines the shared responsibilities of the user and the AI in achieving task objectives. These summaries are then refined into rubrics.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">RUBICON\u2019s novel selection algorithm sifts through numerous candidates to identify a select group of high-quality rubrics, enhancing their predictive accuracy in practical applications, as illustrated in Figure 3. The technique doesn\u2019t require human intervention and can be trained directly on anonymized conversational data, helping to ensure customer data privacy while still extracting the important features for analysis.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"749\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_framework_Figure3_1400px.png\" alt=\"The image contains three graphics. On the left is a bad Human-AI debugging conversation, and on the right is a good one. 
The center graphic lists sample rubrics generated by RUBICON from events of goodness\/badness from both the conversations. Arrows connect specific events in the conversations to the corresponding rubric. For example, one arrow starts from the part of the right conversation where the assistant provides a ready-to-use code snippet to solve the bug, ending at the rubric, \u201cThe assistant provides a code snippet to illustrate the solution, aiding the user in implementing the fix.\u201d\" class=\"wp-image-1048548\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_framework_Figure3_1400px.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_framework_Figure3_1400px-300x161.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_framework_Figure3_1400px-1024x548.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_framework_Figure3_1400px-768x411.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_framework_Figure3_1400px-710x380.png 710w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_framework_Figure3_1400px-240x128.png 240w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><figcaption class=\"wp-element-caption\">Figure 3. Overview of RUBICON\u2019s framework and the various steps involved.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The effectiveness of RUBICON\u2019s method is evidenced by its rubrics, which show an 18% increase in accuracy over SPUR in classifying conversations as positive or negative, as shown in Figure 4. 
Additionally, RUBICON achieves near-perfect precision in predicting conversation labels in 84% of cases involving unlabeled data.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"604\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_conversations_Figure4_1400px.png\" alt=\"The image depicts a workflow illustrating the RUBICON technique. It begins with a set of conversations, from which signals indicating conversation quality are extracted. An LLM then analyzes these signals, reasoning about why they occurred, using domain-specific insights and understanding of the user-assistant interaction. Another LLM summarizes these reasonings into a rubric pool, applying Gricean maxims to evaluate conversational situations. Finally, RUBICON\u2019s novel selection policy algorithm selects the top-performing rubric from this pool.\" class=\"wp-image-1048554\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_conversations_Figure4_1400px.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_conversations_Figure4_1400px-300x129.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_conversations_Figure4_1400px-1024x442.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_conversations_Figure4_1400px-768x331.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Rubicon_conversations_Figure4_1400px-240x104.png 240w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><figcaption class=\"wp-element-caption\">Figure 4. Two analogous conversations facilitated by the Debugger AI assistant are evaluated against representative rubrics. Software engineers who evaluated the conversations found the one on the left less effective and the one on the right more so. 
RUBICON&#8217;s rubric also gave a higher score to the conversation on the right, demonstrating that RUBICON&#8217;s method of evaluation is consistent with that of the software engineers.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"rubicon-generated-rubrics\">RUBICON-generated rubrics<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">RUBICON-generated rubrics serve as a framework for understanding user needs, expectations, and conversational norms. These rubrics have been successfully implemented in the Visual Studio IDE, where they have guided analysis of over 12,000 debugging conversations, offering valuable insights into the effectiveness of modifications made to the assistant and facilitating rapid iteration and improvement. For example, the rubrics \u201cThe AI gave a solution too quickly, rather than asking the user for more information and trying to find the root cause of the issue,\u201d and \u201cThe AI gave a mostly surface-level solution to the problem,\u201d have indicated cases where the assistant prematurely offered solutions without gathering sufficient information. These findings led to adjustments in the AI\u2019s behavior, making it more investigative and collaborative.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Beyond conversational dynamics, the rubrics also identify systemic design flaws not directly tied to the conversational assistant. These include user interface issues that impede the integration of new code and gaps in user education regarding the assistant\u2019s capabilities. To use RUBICON, developers need a small set of labeled conversations from their AI assistant and specifically designed prompts that reflect the criteria for task progression and completion. 
The methodology and examples of these rubrics are detailed in the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/rubicon-rubric-based-evaluation-of-domain-specific-human-ai-conversations\/\">paper<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"implications-and-looking-ahead\">Implications and looking ahead<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Developers of AI assistants value clear insights into the performance of their interfaces. RUBICON represents a valuable step toward developing a refined evaluation system that is sensitive to domain-specific tasks, adaptable to changing usage patterns, efficient, easy to implement, and privacy-conscious. A robust evaluation system like RUBICON can help to improve the quality of these tools without compromising user privacy or data security. As we look ahead, our goal is to broaden the applicability of RUBICON beyond just debugging in AI assistants like GitHub Copilot. We aim to support additional tasks like migration and scaffolding within IDEs, and to extend its utility to other chat-based Copilot experiences across various products.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>RUBICON evaluates AI-driven conversations and improves their quality by learning detailed domain-specific rubrics from minimal data. 
It gathers insights on AI assistant performance while maintaining user privacy and data security.<\/p>\n","protected":false},"author":42735,"featured_media":1048530,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"guest","value":"param-biyani","user_id":"1048509"},{"type":"user_nicename","value":"Yasharth Bajpai","user_id":"42228"},{"type":"user_nicename","value":"Arjun Radhakrishna","user_id":"39405"},{"type":"user_nicename","value":"Gustavo Soares","user_id":"39183"},{"type":"user_nicename","value":"Sumit Gulwani","user_id":"33755"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13560],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1048506","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-programming-languages-software-engineering","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[663303],"related-projects":[],"related-events":[],"related-researchers":[{"type":"guest","value":"param-biyani","user_id":1048509,"display_name":"Param Biyani","author_link":"<a href=\"https:\/\/www.linkedin.com\/in\/param-biyani\/\" aria-label=\"Visit the profile page for Param Biyani\">Param Biyani<\/a>","is_active":true,"last_first":"Biyani, 
Param","people_section":0,"alias":"param-biyani"},{"type":"user_nicename","value":"Yasharth Bajpai","user_id":42228,"display_name":"Yasharth Bajpai","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ybajpai\/\" aria-label=\"Visit the profile page for Yasharth Bajpai\">Yasharth Bajpai<\/a>","is_active":false,"last_first":"Bajpai, Yasharth","people_section":0,"alias":"ybajpai"},{"type":"user_nicename","value":"Arjun Radhakrishna","user_id":39405,"display_name":"Arjun Radhakrishna","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/arradha\/\" aria-label=\"Visit the profile page for Arjun Radhakrishna\">Arjun Radhakrishna<\/a>","is_active":false,"last_first":"Radhakrishna, Arjun","people_section":0,"alias":"arradha"},{"type":"user_nicename","value":"Gustavo Soares","user_id":39183,"display_name":"Gustavo Soares","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/gsoares\/\" aria-label=\"Visit the profile page for Gustavo Soares\">Gustavo Soares<\/a>","is_active":false,"last_first":"Soares, Gustavo","people_section":0,"alias":"gsoares"},{"type":"user_nicename","value":"Sumit Gulwani","user_id":33755,"display_name":"Sumit Gulwani","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/sumitg\/\" aria-label=\"Visit the profile page for Sumit Gulwani\">Sumit Gulwani<\/a>","is_active":false,"last_first":"Gulwani, Sumit","people_section":0,"alias":"sumitg"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-960x540.png\" class=\"img-object-cover\" alt=\"Rubicon paper at Alware 2024\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-960x540.png 960w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/06\/Alware_-Rubicon-BlogHeroFeature-1400x788-1.png 1401w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"July 15, 2024","formattedExcerpt":"RUBICON evaluates AI-driven conversations and improves their quality by learning detailed domain-specific rubrics from minimal data. 
It gathers insights on AI assistant performance while maintaining user privacy and data security.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1048506","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/42735"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1048506"}],"version-history":[{"count":27,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1048506\/revisions"}],"predecessor-version":[{"id":1050819,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1048506\/revisions\/1050819"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1048530"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1048506"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1048506"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1048506"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1048506"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1048506"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1048506"},{"taxonomy":"msr
-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1048506"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1048506"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1048506"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1048506"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1048506"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}