{"id":893295,"date":"2022-10-31T15:42:00","date_gmt":"2022-10-31T22:42:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&#038;p=893295"},"modified":"2022-10-31T15:42:01","modified_gmt":"2022-10-31T22:42:01","slug":"power-automate-with-copilot-the-back-story","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/power-automate-with-copilot-the-back-story\/","title":{"rendered":"Power Automate with copilot; the back story"},"content":{"rendered":"\n<p><strong>Authors:&nbsp;<\/strong>Will Dubyak, Chhaya Methani<\/p>\n\n\n\n<p>With Satya\u2019s copilot announcements (<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/ignite.microsoft.com\/en-US\/sessions\/8ac931a9-945f-4b45-9a6d-f37ee5c1b2bc?source=sessions\">Microsoft Ignite Opening<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>) at Ignite in the rear-view mirror, it\u2019s a good time to talk more about the kind of work and creative thinking that made it possible. 
If you aren\u2019t already familiar with the new ways to innovate with AI, such as the AI-based copilot that builds your flow in seconds, check out the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/powerautomate.microsoft.com\/en-us\/blog\/new-ways-to-innovate-with-ai-and-microsoft-power-automate\/\">Microsoft Power Automate blog post<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.&nbsp; The idea that a plain-language prompt can be used to generate a sophisticated automated workflow is powerful, and a glimpse into what the future holds with innovative large language models.&nbsp; But the path to this point was anything but easy and automatic.<\/p>\n\n\n\n<p>As anyone with a background in AI\/ML knows, the long pole in the execution tent for a good idea is training data.&nbsp; Training a model to generate a flow from a prompt assumes that we have many flows with associated prompts to show the model.<\/p>\n\n\n\n<p>We didn\u2019t. So we needed to be creative.<\/p>\n\n\n\n<p>Our solution took shape in two main dimensions.&nbsp; First, we devised a way to generate synthetic data for model training.&nbsp; We had many production flow skeletons that had been scrubbed of Personally Identifiable Information (PII), and we found ways to generate descriptions (or labels) for them to simulate the prompts a user might have written. Second, we used a method to generate Natural Language (NL) utterance-flow pairs that we knew to be empirically relevant based on historical patterns in our existing Microsoft Power Automate flow data.<\/p>\n\n\n\n<p>A Power Automate flow is made up of a trigger that \u201cactivates\u201d the flow and steps that perform actions once it fires. 
For example:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>\u201cWhen I get an email from my manager, send me a Teams notification\u201d;<\/li><li>\u201cSend me a message on Teams when a task is completed in Planner\u201d;<\/li><li>\u201cIf I receive an email that contains the subject &#8216;invoice&#8217;, create an item on SharePoint\u201d.<\/li><\/ul>\n\n\n\n<p>We trained the first version of the model using training data built from manually written prompts for existing flows.&nbsp; We use OpenAI Codex, the engine behind the GitHub Copilot tool, which generates executable code from a natural language prompt.&nbsp; Because large language models adapt well to new domains, we began seeing excellent results almost immediately.<\/p>\n\n\n\n<p>The model works by pairing a workflow with a natural language description to use as training data.&nbsp; The model \u2013 which we refer to internally as NL2Flow \u2013 learns the correspondence between the language and the flow and is later able to generate a new flow in response to a natural language prompt.&nbsp; (Interestingly, we have learned that it works in far more than English: there was intense interest among Japanese users immediately after Ignite, and even though the model was not specifically trained on Japanese, it works surprisingly often!)&nbsp; There are many working production flows available, but very few of them have a description we can use in model training and testing.<\/p>\n\n\n\n<p><em><u>Generating synthetic data<\/u><\/em><\/p>\n\n\n\n<p>We augmented the data we had by generating synthetic (that is, created by us) natural language query-flow pairs.<\/p>\n\n\n\n<p>Note that this reverses the NL2Flow direction.&nbsp; As a practical matter, it included fine-tuning a Codex model to generate new descriptions of existing production flows, as well as inducing variation in the flow language by paraphrasing. 
The objective is not just a greater volume of training flows and descriptions, but also a broader selection of triggers and actions with which to generate flows. The team took two approaches:<\/p>\n\n\n\n<ol class=\"wp-block-list\" type=\"1\"><li>Reverse the original NL2Flow process and generate NL utterances for existing flows<\/li><li>Use a context grammar to generate synthetic label\/flow pairs<\/li><\/ol>\n\n\n\n<p><em><u>Flow2NL<\/u><\/em><\/p>\n\n\n\n<p>The first effort was to use NLG (Natural Language Generation) to generate NL descriptions from anonymized production flows.<\/p>\n\n\n\n<p>The figure below illustrates the process.&nbsp; We input flow code to a fine-tuned Codex model and generated multiple natural language descriptions of the flow\u2019s activity.&nbsp; For economy of effort, these descriptions were submitted for human review; judges selected the ones they thought most accurate.&nbsp; On the first pass, for 92% of the flows processed with this approach, two or more judges agreed on at least one NL utterance output by the model.<\/p>\n\n\n\n<p>As an example, consider this flow:<\/p>\n\n\n\n<p>Flow Code:<\/p>\n\n\n\n<p><code>triggeroutputs = await shared_office365.OnNewEmailV3();&nbsp; \/\/ Trigger Function<\/code><\/p>\n\n\n\n<p><code>outputs_forward_email = shared_office365.ForwardEmail_V2('message_id': triggeroutputs?['body']?['MessageId']); \/\/ Forward email function<\/code><\/p>\n\n\n\n<p>The Flow2NL model generates the following paraphrased utterances, any of which would generate the flow above:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Forward emails to a specific address<\/li><li>Forward email to another address<\/li><li>Forward emails from a specific address to a different address<\/li><\/ul>\n\n\n\n<p>Training the model with samples generated this way increases the model\u2019s robustness to variations in language. 
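The human-review step described above can be thought of as a simple vote-and-filter pass over candidate utterances. The sketch below is our own illustrative Python, not the production pipeline; the data shapes, function name, and two-judge threshold are assumptions based on the description above:

```python
# Each candidate utterance for a flow carries the number of judges who
# marked it an accurate description; a candidate is kept as a training
# pair if at least `min_judges` judges agreed on it.
def select_training_pairs(flow_code, candidates, min_judges=2):
    """Return (utterance, flow_code) pairs whose utterance at least
    `min_judges` judges agreed was accurate."""
    return [(nl, flow_code) for nl, votes in candidates if votes >= min_judges]

# Hypothetical judged candidates for the forward-email flow above.
flow = "shared_office365.ForwardEmail_V2(...)"
candidates = [
    ("Forward emails to a specific address", 3),
    ("Forward email to another address", 2),
    ("Delete incoming emails", 0),  # rejected by all judges
]
kept = select_training_pairs(flow, candidates)  # keeps the first two
```

Only the utterances that clear the agreement threshold become NL-flow training pairs; the rest are discarded.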
The flow chart below shows how training data generated by the Flow2NL pipeline is used to train the NL2Flow model.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"662\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Flow2NL-Diagram-1024x662.png\" alt=\"Reverse NL2Flow Data\" class=\"wp-image-893301\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Flow2NL-Diagram-1024x662.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Flow2NL-Diagram-300x194.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Flow2NL-Diagram-768x497.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Flow2NL-Diagram-1536x994.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Flow2NL-Diagram-2048x1325.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Flow2NL-Diagram-240x155.png 240w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><em><u>Context Grammar<\/u><\/em><\/p>\n\n\n\n<p>As shown in Table 1, the extra data from Flow2NL helped us produce good flow descriptions, but not as much as we needed.&nbsp; To achieve more diversity, we used a process called \u201cContext Grammar\u201d to vary flow descriptions. We iterated over all possible functions (with their corresponding prompts) needed to \u201cconstruct\u201d a flow. We created a tool called DataGen, which generates these combinations given a config file containing the following:<\/p>\n\n\n\n<ol class=\"wp-block-list\" type=\"1\"><li>The grammar, which defines the groups of co-occurring functions and their order in the flow. 
It includes the code patterns as well as the corresponding NL prompts needed to replicate a real flow;<\/li><li>The list of all functions allowed in each group (both triggers and actions); and<\/li><li>The NL prompts or \u201cpatterns\u201d that correspond to these functions.<\/li><\/ol>\n\n\n\n<p>For example, consider the following config file describing the grammar structure for saving attachments from an email. Note that we only show iterations over one pattern (@SaveTo@) to keep it simple; the tool can expand multiple patterns recursively.<\/p>\n\n\n\n<p>Code Pattern:<\/p>\n\n\n\n<p><code>triggeroutputs = await shared_office365.OnNewEmailV3(); &nbsp;\/\/ Trigger Function<\/code><\/p>\n\n\n\n<p><code>\/\/ For loop on email attachments<\/code><\/p>\n\n\n\n<p><code>for (items_foreach in triggeroutputs?['body']?['attachments'])<\/code><\/p>\n\n\n\n<p><code>{<\/code><\/p>\n\n\n\n<p><code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \/\/ Grammar pattern for the set of functions allowed under this group<\/code><\/p>\n\n\n\n<p><code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; @SaveTo@<\/code><\/p>\n\n\n\n<p><code>}<\/code><\/p>\n\n\n\n<p>Corresponding NL prompts describing the code above (note that there are many ways to describe the same code):<\/p>\n\n\n\n<p>Save email attachments to @0@<\/p>\n\n\n\n<p>Store every email attachment I receive to @0@<\/p>\n\n\n\n<p>Pull attachments from Outlook to @0@<\/p>\n\n\n\n<p>In the above NL-flow pair, the parameters enclosed in @ signs are sampled from the lists described in items 2 and 3. The same config describes the function values that @SaveTo@ can take. 
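A rough sketch of the expansion such a config drives might look like the following. This is our own simplified illustration, not the actual DataGen tool; the config layout, the connector function names, and the `expand` helper are all assumptions:

```python
# Hypothetical, simplified stand-in for a DataGen-style config: a code
# pattern with a @SaveTo@ group, NL patterns with a @0@ slot, and the
# allowed functions for the group paired with their NL surface forms.
CONFIG = {
    "code_pattern": (
        "triggeroutputs = await shared_office365.OnNewEmailV3();\n"
        "for (items_foreach in triggeroutputs?['body']?['attachments'])\n"
        "{\n    @SaveTo@\n}"
    ),
    "nl_patterns": [
        "Save email attachments to @0@",
        "Store every email attachment I receive to @0@",
    ],
    "groups": {
        "SaveTo": [
            ("shared_onedriveforbusiness.CreateFile(...)", "OneDrive"),
            ("shared_sharepointonline.CreateFile(...)", "SharePoint"),
        ]
    },
}

def expand(config):
    """Return synthetic (NL utterance, flow code) training pairs by
    substituting each allowed function into the group pattern and each
    NL surface form into the @0@ slot."""
    pairs = []
    for group, options in config["groups"].items():
        for code_fn, nl_name in options:
            code = config["code_pattern"].replace(f"@{group}@", code_fn)
            for nl in config["nl_patterns"]:
                pairs.append((nl.replace("@0@", nl_name), code))
    return pairs

pairs = expand(CONFIG)  # 2 NL patterns x 2 SaveTo options = 4 pairs
```

A real implementation would recurse over every pattern in the code and NL templates rather than a single group, but the combinatorial idea is the same.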
The corresponding NL part will be used to replace all occurrences of @0@.<\/p>\n\n\n\n<p>Sampling from known patterns allows us to generate data inexpensively while still preserving triggers and actions relevant to our users.&nbsp; We added extra samples for under-represented connectors.<\/p>\n\n\n\n<p>Context Grammar enriched the training set for the NL2Flow model.&nbsp; See the results section for a detailed description of the impact of including both Flow2NL and Context Grammar.<\/p>\n\n\n\n<p><em><u>Model Training &amp; Results<\/u><\/em><\/p>\n\n\n\n<p>Using the two approaches, we generated about 800 new training samples with Flow2NL and about 3,000 new samples with Context Grammar.&nbsp; We ensured the distribution of generated flows across topics was about the same as in the production samples.<\/p>\n\n\n\n<p>We created a test set for tracking improvements across models trained on different iterations of the data. We computed a custom similarity metric between the predicted and the ground-truth code: a fuzzy match that counts the number of correctly predicted API calls (triggers as well as actions) divided by the total number of predicted functions. For example, if the ground truth for a flow has five function calls and the model predicts six functions, four of which are correct, the similarity measure is 4\/6 \u2248 0.67.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>Source<\/td><td>Relative improvement in Similarity Measure<\/td><\/tr><tr><td>Baseline + Flow2NL<\/td><td>3.2%<\/td><\/tr><tr><td>Baseline + Context Grammar<\/td><td>9.5%<\/td><\/tr><tr><td>Baseline + Flow2NL + Context Grammar<\/td><td>15.9%<\/td><\/tr><\/tbody><\/table><figcaption><em>Table 1: Relative impact of different types of synthetic data on model performance<\/em><\/figcaption><\/figure>\n\n\n\n<p>As the table shows, Flow2NL and Context Grammar each improve on the baseline, and the largest gain comes from adding both to the training set. This shows how powerful strategically generated synthetic samples can be for improving the model where it is needed most.<\/p>\n\n\n\n<p>We invite you to create a Power Automate flow today: follow this link <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/make.powerautomate.com\">Power Automate<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, click \u201cCreate +,\u201d and select \u201cYou describe it, AI builds it.\u201d&nbsp; Please leave feedback or ask a question on the Microsoft Power Automate Community forum on the Power Platform Community! 
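As a footnote, the fuzzy similarity metric behind Table 1 can be sketched in a few lines. This is an illustrative reconstruction from the description above; the function name and the representation of a flow as a flat list of API-call names are our own assumptions:

```python
def flow_similarity(predicted_calls, ground_truth_calls):
    """Fuzzy flow similarity: correctly predicted API calls (triggers
    and actions) divided by the total number of predicted calls."""
    if not predicted_calls:
        return 0.0
    remaining = list(ground_truth_calls)
    correct = 0
    for call in predicted_calls:
        if call in remaining:
            correct += 1
            remaining.remove(call)  # each ground-truth call matches once
    return correct / len(predicted_calls)

# Ground truth has 5 calls; the model predicts 6, 4 of them correct.
truth = ["OnNewEmailV3", "GetAttachment", "CreateFile", "SendEmailV2", "Notify"]
pred = ["OnNewEmailV3", "GetAttachment", "CreateFile", "SendEmailV2",
        "DeleteFile", "PostMessage"]
score = flow_similarity(pred, truth)  # 4/6
```

Dividing by the number of *predicted* calls penalizes models that over-generate actions, which is why the worked example above scores 4/6 rather than 4/5.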
We are continuously improving the model and would love to hear from you!<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Picture4-1024x576.png\" alt=\"graphical user interface, application\" class=\"wp-image-893313\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Picture4-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Picture4-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Picture4-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Picture4-1536x864.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Picture4-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Picture4-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Picture4-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Picture4-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Picture4-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Picture4-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Picture4-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/Picture4.png 1920w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Authors:&nbsp;Will Dubyak, Chhaya Methani With Satya\u2019s copilot announcements &nbsp;(Microsoft Ignite Opening (opens in new tab) )at Ignite in the rear-view mirror, it\u2019s a good time to talk more about the kind of work and creative thinking 
that made it possible. If you aren\u2019t already familiar with the new ways to innovate with AI, such as [&hellip;]<\/p>\n","protected":false},"author":41344,"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-content-parent":804652,"msr_hide_image_in_river":0,"footnotes":""},"research-area":[],"msr-locale":[268875],"msr-post-option":[],"class_list":["post-893295","msr-blog-post","type-msr-blog-post","status-publish","hentry","msr-locale-en_us"],"msr_assoc_parent":{"id":804652,"type":"group"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/893295","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-blog-post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/41344"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/893295\/revisions"}],"predecessor-version":[{"id":894420,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/893295\/revisions\/894420"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=893295"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=893295"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=893295"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=893295"}],"curies":[{"name":"wp","href":"https:\
/\/api.w.org\/{rel}","templated":true}]}}