{"id":1145758,"date":"2025-07-24T15:07:58","date_gmt":"2025-07-24T22:07:58","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/"},"modified":"2025-07-31T11:06:25","modified_gmt":"2025-07-31T18:06:25","slug":"office-ai-science-team","status":"publish","type":"msr-group","link":"https:\/\/www.microsoft.com\/en-us\/research\/group\/office-ai-science-team\/","title":{"rendered":"Office AI Science Team"},"content":{"rendered":"<section class=\"mb-3 moray-highlight\">\n\t<div class=\"card-img-overlay mx-lg-0\">\n\t\t<div class=\"card-background  has-background- card-background--full-bleed\">\n\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1200\" height=\"627\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/07\/IreSocial-TWLIFB-1200x627-1.jpg\" class=\"attachment-full size-full\" alt=\"Stylized digital illustration of a multi-layered circuit board. A glowing blue microchip sits at the top center, with intricate circuitry radiating outward. Beneath it, four stacked layers transition in color from blue to orange, each featuring circuit-like patterns. 
Smaller rectangular and circular components are connected around the layers, all set against a dark background with scattered geometric shapes.\" style=\"\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/07\/IreSocial-TWLIFB-1200x627-1.jpg 1200w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/07\/IreSocial-TWLIFB-1200x627-1-300x157.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/07\/IreSocial-TWLIFB-1200x627-1-1024x535.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/07\/IreSocial-TWLIFB-1200x627-1-768x401.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/07\/IreSocial-TWLIFB-1200x627-1-240x125.jpg 240w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/>\t\t<\/div>\n\t\t<!-- Foreground -->\n\t\t<div class=\"card-foreground d-flex mt-md-n5 my-lg-5 px-g px-lg-0\">\n\t\t\t<!-- Container -->\n\t\t\t<div class=\"container d-flex mt-md-n5 my-lg-5 \">\n\t\t\t\t<!-- Card wrapper -->\n\t\t\t\t<div class=\"w-100 w-lg-col-5\">\n\t\t\t\t\t<!-- Card -->\n\t\t\t\t\t<div class=\"card material-md-card py-5 px-md-5\">\n\t\t\t\t\t\t<div class=\"card-body \">\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n<h1 class=\"wp-block-heading\" id=\"office-ai-science\">Office AI Science<\/h1>\n\n\n\n<p>Office AI Science builds systems that are leveraged across M365, and especially within Word, Excel, and PowerPoint.<\/p>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n<p>The\u00a0Office AI Science\u00a0team is\u00a0part of OPG.\u00a0We build\u00a0systems\u00a0that\u00a0are\u00a0leveraged\u00a0across\u00a0M365, and especially within Word, Excel, and PowerPoint.\u00a0The team\u2019s\u00a0recent\u00a0projects have included:\u00a0PPT\u00a0Summarization,\u00a0Audio\u00a0Overviews\u00a0(Podcast),\u00a0SPOCK Eval, Data Pipeline,\u00a0Natural\u00a0Language to\u00a0Office 
JS,\u00a0and\u00a0CUA.\u00a0<\/p>\n\n\n\n<p><em>PPT\u00a0Summarization:<\/em>\u00a0The Office AI Science team\u00a0built the first\u00a0fine-tuned SLM within M365.\u00a0The\u00a0fine-tuned\u00a0Phi-3 Vision SLM\u00a0improved p95 latency\u00a0of PPT Visual Summary feature\u00a0from 13 seconds to 2 seconds, while\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/microsoft-my.sharepoint-df.com\/:w:\/p\/mbentley\/EblR6uqMH95Jr_uMYZzetyoBa0-MEtWJTDWMyX0vDZWGLA?e=9AUVry\" target=\"_blank\" rel=\"noopener noreferrer\">maintaining quality<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0on par with GPT-4o-v.\u00a0This\u00a0optimization resulted in\u00a075 times\u00a0fewer\u00a0GPUs being used\u00a0compared to GPT-4o-v\u00a0and almost\u00a09 times the number of\u00a0PowerPoint\u00a0users\u00a0receiving\u00a0a\u00a0visual summary.\u00a0The fine-tuned SLM also powers PPT Visual Q&A, making it both faster and cheaper.\u00a0The\u00a0team also introduced\u00a0PPT Interactive Summary, which allows users to drill into visual summaries in more detail, leading to\u00a0over\u00a050% decline in thumbs down per 100k tries\u00a0over 3 months, 30% interactivity clicking on chevron to go deeper, and a 17.6% increase in weekly return rate.\u00a0The team is currently fine-tuning\u00a04o-mini-vision\u00a0with the goal of replacing remaining non-English traffic to GPT-4o-v with this smaller model\u00a0and evaluating\u00a0Phi-4\u00a0Vision for English.\u00a0<\/p>\n\n\n\n<p><em>Audio&nbsp;Overviews:<\/em>&nbsp;The team&nbsp;is building the&nbsp;Audio&nbsp;Overview&nbsp;Skill that&nbsp;introduces&nbsp;a&nbsp;podcast-like&nbsp;experience for consuming documents and artifacts.&nbsp;The feature is currently in the dogfood phase for MSIT, with production rollout scheduled for May 7 onwards.&nbsp;Users will be able&nbsp;to generate&nbsp;Audio&nbsp;Overviews&nbsp;from App Chat entry points in Word Win32 & Web, Copilot 
Notebooks (including OneNote), and other apps like Outlook Web, OneDrive&nbsp;Web&nbsp;and ODSP Mobile.&nbsp;Latest&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/microsoft-my.sharepoint-df.com\/:w:\/p\/mbentley\/EX5HG7vv_dpLpLa3H3auw08BSFnkwUyCGwroOnV0BpbjjQ?e=Q6FzcT\" target=\"_blank\" rel=\"noopener noreferrer\">human evaluation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;scores overall transcript&nbsp;quality for&nbsp;the&nbsp;single file audio overview&nbsp;at&nbsp;4.08\/5.00&nbsp;compared&nbsp;to 3.76\/5.00&nbsp;for&nbsp;NotebookLM, and with&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/microsoft-my.sharepoint-df.com\/:w:\/p\/mbentley\/Efl0wIqgr7xPuUxAXn16V4QBo9ohfTNuC1BJ8poRLnJduw?e=T49IdT\" target=\"_blank\" rel=\"noopener noreferrer\">automated evaluation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,&nbsp;the team improved the&nbsp;overall&nbsp;score from&nbsp;an initial&nbsp;4.09 to&nbsp;4.65&nbsp;with a two-step&nbsp;design leveraging&nbsp;GPT-4o and o3-mini.&nbsp;More details, including&nbsp;evaluation against multiple files for the Copilot Notebooks&nbsp;scenario&nbsp;and&nbsp;gains from moving to GPT-4.1, can be found&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/microsoft-my.sharepoint-df.com\/:w:\/p\/robsteen\/Ed720S9OOoJOpWe6b2N-w5wBw-vOiNhsPmuPCpf7FsN0fg?e=ZfywoN\" target=\"_blank\" rel=\"noopener noreferrer\">here<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.&nbsp;<\/p>\n\n\n\n<p><em>SPOCK&nbsp;(AugLoop&nbsp;Eval)<\/em>:&nbsp;In collaboration with&nbsp;AugLoop,&nbsp;the Office AI Science team&nbsp;developed several key features that enable agility in evaluating App Copilot scenario quality metrics. 
By the end of&nbsp;FY25Q3, 22 scenarios had been onboarded across Word, PPT, Office AI, and SharePoint, with Excel onboarding in progress. The platform now&nbsp;reliably&nbsp;runs 300&nbsp;eval&nbsp;jobs and 30,000 tests daily. Automated&nbsp;scenario evaluation has cut turnaround time from days for a manual run&nbsp;to 2-4 hours. SPOCK now&nbsp;supports intent detection, Leo Metrics,&nbsp;BizChat&nbsp;1K Query, Python, and TypeScript customer evaluators;&nbsp;model swap and FlexV3&nbsp;eval&nbsp;are&nbsp;coming in Q4.&nbsp;Additionally, the v-team&nbsp;is automating&nbsp;the App Copilot Quality&nbsp;Dashboard&nbsp;(<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/nam06.safelinks.protection.outlook.com\/?url=https%3A%2F%2Fweb.augloop-tools.officeppe.com%2Feval%2Fquality-dashboards&data=05%7C02%7CFanguang.Kong%40microsoft.com%7C6784508affbe4b86e9f808dd8263a2c1%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638810086750654646%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=62JrAfgiFX3AIz2DF8ovebnoXaxgd%2Fu86qk7rhegOdo%3D&reserved=0\" target=\"_blank\" rel=\"noopener noreferrer\">\u00c6VAL &#8211; Copilot Evaluation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>), providing a comprehensive overview of the quality of App Copilot scenarios.&nbsp;<\/p>\n\n\n\n<p><em>Data&nbsp;Pipeline:<\/em>&nbsp;The&nbsp;team also&nbsp;created an online,&nbsp;self-serve,&nbsp;on-demand&nbsp;ADF&nbsp;pipeline for mining&nbsp;Office documents from the internet.&nbsp;The pipeline lets partners kick off&nbsp;large-scale data mining jobs for specific languages and document types, and it includes custom metadata extractors that produce task-dependent document representations.&nbsp;By&nbsp;leveraging&nbsp;Bing\u2019s&nbsp;precrawled&nbsp;40B 
URL&nbsp;RetroIndex,&nbsp;document&nbsp;discovery is fast and efficient.&nbsp;OAI Science&nbsp;and several&nbsp;partner teams&nbsp;(Word+Editor, PPT Science, Word Designer, Designer, MSAI)&nbsp;are already&nbsp;using&nbsp;the data for fine-tuning and test set creation.&nbsp;<\/p>\n\n\n\n<p><em>Natural Language&nbsp;to Office JS:<\/em>&nbsp;The&nbsp;Office AI Science team&nbsp;is working&nbsp;to fine-tune an o*-family model for&nbsp;common&nbsp;Office&nbsp;scenarios&nbsp;like&nbsp;inserting slides from another PowerPoint file, inserting headers and footers in Word, or&nbsp;creating and finding merged ranges in Excel.&nbsp;<\/p>\n\n\n\n<p><em>CUA:<\/em>&nbsp;The team&nbsp;also&nbsp;recently&nbsp;embarked on an exploration of a Computer User Agent&nbsp;(CUA)&nbsp;centered on understanding user intent and adapting in real time.&nbsp;Leveraging&nbsp;plan&nbsp;assistance&nbsp;with&nbsp;the&nbsp;Office knowledge base, the&nbsp;team&nbsp;approximately doubled&nbsp;the task completion rate&nbsp;against&nbsp;OSWorld&nbsp;PPT scenarios.&nbsp;The team is&nbsp;working on&nbsp;fine-tuning the CUA model to improve task&nbsp;completion for Office apps.&nbsp;<\/p>\n\n\n\n<p><em>For more information, contact:&nbsp;<\/em><a href=\"mailto:amandagu@microsoft.com\" target=\"_blank\" rel=\"noreferrer noopener\"><em>Amanda Gunnemo<\/em><\/a><em>&nbsp;or&nbsp;<\/em><a href=\"mailto:vishalc@microsoft.com\" target=\"_blank\" rel=\"noreferrer noopener\"><em>Vishal&nbsp;Chowdhary<\/em><\/a><em><\/em>&nbsp;<\/p>\n\n\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Office AI Science builds systems that are leveraged across M365, and especially within Word, Excel, and PowerPoint. The\u00a0Office AI Science\u00a0team is\u00a0part of OPG.\u00a0We build\u00a0systems\u00a0that\u00a0are\u00a0leveraged\u00a0across\u00a0M365 and especially within Word, Excel, and PowerPoint.\u00a0The team\u2019s\u00a0recent\u00a0projects have 
included:\u00a0PPT\u00a0Summarization,\u00a0Audio\u00a0Overviews\u00a0(Podcast),\u00a0SPOCK Eval, Data Pipeline,\u00a0Natural\u00a0Language to\u00a0Office JS,\u00a0and\u00a0CUA.\u00a0 PPT\u00a0Summarization:\u00a0The Office AI Science team\u00a0built the first\u00a0fine-tuned SLM within M365.\u00a0The\u00a0fine-tuned\u00a0Phi-3 Vision SLM\u00a0improved p95 latency\u00a0of PPT Visual Summary [&hellip;]<\/p>\n","protected":false},"featured_media":1145543,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_group_start":"","footnotes":""},"research-area":[13556],"msr-group-type":[243694],"msr-locale":[268875],"msr-impact-theme":[],"class_list":["post-1145758","msr-group","type-msr-group","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-group-type-group","msr-locale-en_us"],"msr_group_start":"","msr_detailed_description":"","msr_further_details":"","msr_hero_images":[],"msr_research_lab":[],"related-researchers":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-projects":[],"related-events":[],"related-opportunities":[],"related-posts":[],"tab-content":[],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group\/1145758","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-group"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group\/1145758\/revisions"}],"predecessor-version":[{"id":1146267,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group\/1145758\/revisions\/1146267"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us
\/research\/wp-json\/wp\/v2\/media\/1145543"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1145758"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1145758"},{"taxonomy":"msr-group-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group-type?post=1145758"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1145758"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1145758"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}