{"id":1104735,"date":"2024-12-20T15:56:03","date_gmt":"2024-12-20T23:56:03","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1104735"},"modified":"2025-01-06T11:12:29","modified_gmt":"2025-01-06T19:12:29","slug":"aiopslab-building-ai-agents-for-autonomous-clouds","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/aiopslab-building-ai-agents-for-autonomous-clouds\/","title":{"rendered":"AIOpsLab: Building AI agents for autonomous clouds"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1.png\" alt=\"graphical user interface, application, icon\" class=\"wp-image-1112118\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-640x360.png 640w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-1280x720.png 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<p>In our increasingly complex digital landscape, enterprises and cloud providers face significant challenges in the development, deployment, and maintenance of sophisticated IT applications. The broad adoption of microservices and cloud-based serverless architecture has streamlined certain aspects of application development while simultaneously introducing a host of operational difficulties, particularly in fault diagnosis and mitigation. These complexities can result in outages, which have the potential to cause major business disruptions, underscoring the critical need for robust solutions that ensure high availability and reliability in cloud services. As the expectation for five-nines availability grows, organizations must navigate the intricate web of operational demands to maintain customer satisfaction and business continuity.&nbsp;<\/p>\n\n\n\n<p>To tackle these challenges, recent research on using AIOps agents for cloud operations\u2014such as AI agents for incident root cause analysis (RCA) or triaging\u2014has relied on proprietary services and datasets. Other prior works use frameworks specific to the solutions that they are building, or <em>ad hoc<\/em> and static benchmarks and metrics that fail to capture the dynamic nature of real-world cloud services. Users developing agents for cloud operations tasks with Azure AI Agent Service can evaluate and improve them using AIOpsLab. Furthermore, current approaches do not agree on standard metrics or a standard taxonomy for operational tasks. 
<strong>This calls for a standardized and principled research framework for building, testing, comparing, and improving AIOps agents.<\/strong> The framework should allow agents to interact with realistic service operation tasks in a reproducible manner. It must be flexible enough to extend to new applications, workloads, and faults. Importantly, it should go beyond just evaluating the AI agents and enable users to improve the agents themselves, for example, by providing sufficient observability and even serving as a training environment (\u201cgym\u201d) to generate samples to learn on.&nbsp;<\/p>\n\n\n\n<p>We developed AIOpsLab, a holistic evaluation framework for researchers and developers, to enable the design, development, evaluation, and enhancement of AIOps agents. It also provides reproducible, standardized, interoperable, and scalable benchmarks. AIOpsLab is open source on <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/AIOpsLab\/\">GitHub<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> under the MIT license, so that researchers and engineers can use it to evaluate AIOps agents at scale. We recently presented the\u00a0<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/building-ai-agents-for-autonomous-clouds-challenges-and-design-principles\/\">AIOpsLab vision paper<\/a>\u00a0at SoCC &#8217;24. 
Please see the\u00a0<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/aiopslab-a-holistic-framework-for-evaluating-ai-agents-for-enabling-autonomous-cloud\/\">preprint<\/a>\u00a0for more details about the AIOpsLab framework.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"612\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/11\/AIOps-Lab_blog_fig1-overview_1400.jpg\" alt=\"Flowchart of an AIOpsLab system. The chart is divided into four main sections: AIOps Tasks, Orchestrator, Problem Cache, and Service. AIOps Tasks list various applications like SocialNetwork, HotelReservation, E-Commerce, and others, each with associated Data, Actions, Metrics. These tasks connect to the Orchestrator. The Orchestrator is the central element and interacts with various components: it receives a Problem Query Q, detailing Problem, Task T, Workload W, Fault F, and Solution S. It is responsible for deploying or running the workload and injecting faults, as well as taking actions based on the Service State relayed by an Agent. 
The Problem Cache connects to a Workload Generator and a Fault Generator, creating Workload W for the Service. The Service component shows observability through Traces, Metrics, and Logs. It communicates with the Orchestrator to provide service state updates. The components are connected with arrows that indicate the flow of data and control between each part of the system. \" class=\"wp-image-1106274\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/11\/AIOps-Lab_blog_fig1-overview_1400.jpg 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/11\/AIOps-Lab_blog_fig1-overview_1400-300x131.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/11\/AIOps-Lab_blog_fig1-overview_1400-1024x448.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/11\/AIOps-Lab_blog_fig1-overview_1400-768x336.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/11\/AIOps-Lab_blog_fig1-overview_1400-240x105.jpg 240w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><figcaption class=\"wp-element-caption\">Figure 1. System architecture of AIOpsLab.&nbsp;<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"agent-cloud-interface-aci\">Agent-cloud interface (ACI)<\/h2>\n\n\n\n<p>AIOpsLab strictly separates the agent and the application service using an intermediate orchestrator. It provides several interfaces for other system parts to integrate and extend. First, it establishes a session with an agent to share information about benchmark problems: (1) the problem description, (2) instructions (e.g., response format), and (3) available APIs to call as actions.<\/p>\n\n\n\n<p>The APIs are a set of documented tools, e.g., get logs, get metrics, and exec shell, designed to help the agent solve a task. 
There are no restrictions on the agent&#8217;s implementation; the orchestrator poses problems and polls the agent for the next action to perform given the previous result. Each action must be a valid API call, which the orchestrator validates and carries out. The orchestrator has privileged access to the deployment and can take arbitrary actions (e.g., scale-up, redeploy) using appropriate tools (e.g., helm, kubectl) to resolve problems on behalf of the agent. Lastly, the orchestrator calls workload and fault generators to create service disruptions, which serve as live benchmark problems. AIOpsLab provides additional APIs to extend to new services and generators.&nbsp;<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading h5\" id=\"example-shows-how-to-onboard-an-agent-to-aiopslab\">Example: onboarding an agent to AIOpsLab<\/h3>\n\n\n\n<div style=\"height:15px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: false; title: ; notranslate\" title=\"\">\nimport asyncio\n\nfrom aiopslab import Orchestrator\n\n# GPT4 and set_prompt are not defined in this snippet;\n# supply your own LLM client and prompt builder\nclass Agent:\n    def __init__(self, prob, instructs, apis):\n        self.prompt = self.set_prompt(prob, instructs, apis)\n        self.llm = GPT4()\n\n    async def get_action(self, state: str) -> str:\n        # state carries the result of the previous action\n        return self.llm.generate(self.prompt + state)\n\n# initialize the orchestrator\norch = Orchestrator()\npid = \"misconfig_app_hotel_res-mitigation-1\"\nprob_desc, instructs, apis = orch.init_problem(pid)\n\n# register and evaluate the agent\nagent = Agent(prob_desc, instructs, apis)\norch.register_agent(agent, name=\"myAgent\")\nasyncio.run(orch.start_problem(max_steps=10))\n<\/pre><\/div>\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"service\">Service<\/h2>\n\n\n\n<p>AIOpsLab abstracts a diverse set of services to reflect the variance in production 
environments. This includes live, running services that are implemented using various architectural principles, including microservices, serverless, and monolithic.<\/p>\n\n\n\n<p>We also leverage open-sourced application suites such as DeathStarBench as they provide artifacts, like source code and commit history, along with run-time telemetry. Adding tools like BluePrint can help AIOpsLab scale to other academic and production services.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"workload-generator\">Workload generator<\/h2>\n\n\n\n<p>The workload generator in AIOpsLab plays a crucial role by creating simulations of both faulty and normal scenarios. It receives specifications from the orchestrator, such as the task, desired effects, scale, and duration. The generator can use a model trained on real production traces to generate workloads that align with these specifications. Faulty scenarios may simulate conditions like resource exhaustion, exploit edge cases, or trigger cascading failures, inspired by real incidents. Normal scenarios mimic typical production patterns, such as daily activity cycles and multi-user interactions. When various characteristics (e.g., service calls, user distribution, arrival times) can lead to the desired effect, multiple workloads can be stored in the problem cache for use by the orchestrator. In coordination with the fault generator, the workload generator can also create complex fault scenarios with workloads.&nbsp;&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"fault-generator\">Fault generator<\/h2>\n\n\n\n<p>AIOpsLab has a novel push-button fault generator designed for generic applicability across various cloud scenarios. Our approach integrates application and domain knowledge to create adaptable policies and \u201coracles\u201d compatible with AIOps scenarios. This includes fine-grained fault injection capable of simulating complex failures inspired by production incidents. 
Additionally, it can inject faults at various system levels, exposing root causes while maintaining semantic integrity and considering interdependencies between cloud microservices. The fault injector&#8217;s versatility can enhance the reliability and robustness of cloud systems by enabling thorough testing and evaluation of AIOps capabilities.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"observability\">Observability<\/h2>\n\n\n\n<p>AIOpsLab is equipped with an extensible observability layer designed to provide comprehensive monitoring capabilities across various system layers for any AIOps tool. AIOpsLab collects a wide array of telemetry data, including (1) traces from Jaeger detailing the end-to-end paths of requests through distributed systems, (2) application logs formatted and recorded by Filebeat and Logstash, and (3) system metrics monitored by Prometheus. AIOpsLab also captures lower-level system information such as syscall logs and cluster information. We handle potential data overload through flexible APIs that tune the telemetry data to what is relevant to the AIOps tools.&nbsp;<\/p>\n\n\n\n<p>AIOpsLab currently supports four key tasks within the AIOps domain: incident detection, localization, root cause diagnosis, and mitigation. Additionally, we support several popular agent frameworks: ReAct, AutoGen, and TaskWeaver. Two key insights from the study emphasize the importance of observability and a well-designed ACI. Observability is crucial for clear root-cause diagnosis; for instance, pinpointing a misconfigured API gateway can be vital in preventing service downtime.<\/p>\n\n\n\n<p>Flexibility is another critical factor; the ability to execute arbitrary shell commands allowed for effective troubleshooting in real-time scenarios. 
Lastly, robust error handling is essential\u2014providing agents with high-quality feedback on execution barriers, such as a failed database connection, ensures swift resolution and continuous improvement.&nbsp;<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading h5\" id=\"example-of-aci-on-diagnosis-task\">Example of ACI on diagnosis task<\/h3>\n\n\n\n<div style=\"height:15px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-constrained wp-container-core-column-is-layout-70165e62 wp-block-column-is-layout-constrained\" style=\"flex-basis:100%\">\n<pre class=\"wp-block-code\"><code><span style=\"color:blue;\">Agent:<\/span> I need to first look at the logs to understand <span style=\"color:purple;\">if<\/span> there are any errors <span style=\"color:blue;\">or<\/span> warnings that could indicate problems. \n\n<span style=\"color:blue;\">Action:<\/span> \n\n``` \n\nget_logs(<span style=\"color:red;\">\"compose-post-service\"<\/span>, <span style=\"color:red;\">\"test-social-network\"<\/span>) \n\n```\n<span style=\"color:blue;\">Service:<\/span> &#91;<span style=\"color:green;\">2024-Aug-04 23:18:49.365494<\/span>] &lt;info&gt;: \n\n\u2026 \n\n<span style=\"color:blue;\">Thrift:<\/span> <span style=\"color:green;\">Sun Aug 4 23:19:19 2024<\/span> TSocket::open() connect() \n<mark style=\"background-color:#ffb900\" class=\"has-inline-color\">&lt;Host: user-service Port: 9090&gt;: Connection refused<\/mark>\n \n<span style=\"color:blue;\">Agent:<\/span> \u2026 unable to connect to the <span style=\"color:red;\">`user-service`<\/span> at port <span style=\"color:green;\">9090<\/span>, even though the pod is running. 
<mark style=\"background-color:#ffb900\" class=\"has-inline-color\">This suggests a possible network issue or misconfiguration in service discovery<\/mark><\/code><\/pre>\n<\/div>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"next-steps\">Next steps<\/h2>\n\n\n\n<p>This research project adopts Microsoft security standards and Responsible AI principles, and we envision this research evolving into a vital resource for organizations aiming to optimize their IT operations. Additionally, we plan to collaborate with various generative AI teams to incorporate AIOpsLab as a benchmark scenario for evaluating state-of-the-art models. By doing so, we aim to foster innovation and encourage the development of more advanced AIOps solutions. This research is essential not only for IT professionals but also for anyone invested in the future of technology, as it has the potential to redefine how organizations manage operations, respond to incidents, and ultimately serve their customers in an increasingly automated world.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"acknowledgements\">Acknowledgements<\/h2>\n\n\n\n<p>We would like to thank Yinfang Chen, Manish Shetty, Yogesh Simmhan, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xuchaozhang\/\">Xuchao Zhang<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jonathanmace\/\">Jonathan Mace<\/a>, Dax Vandevoorde, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/pedrobr\/\">Pedro Las-Casas<\/a>, Shachee Mishra Gupta, and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/sumann\/\">Suman Nath<\/a>, for contributing to this project.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>AIOpsLab is an open-source framework designed to evaluate and improve AI agents for cloud operations, offering standardized, scalable benchmarks for real-world testing, enhancing cloud system 
reliability.<\/p>\n","protected":false},"author":38004,"featured_media":1112118,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Minghua Ma","user_id":"41218"},{"type":"user_nicename","value":"Gagan Somashekar","user_id":"43416"},{"type":"user_nicename","value":"Rujia Wang","user_id":"42549"},{"type":"user_nicename","value":"Chetan Bansal","user_id":"31394"},{"type":"user_nicename","value":"Saravan Rajmohan","user_id":"41039"}],"msr_hide_image_in_river":null,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13560],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[269148,243984,269142],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1104735","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-programming-languages-software-engineering","msr-locale-en_us","msr-post-option-approved-for-river","msr-post-option-blog-homepage-featured","msr-post-option-include-in-river"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[793670,811276],"related-projects":[855579],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Minghua Ma","user_id":41218,"display_name":"Minghua Ma","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/minghuama\/\" aria-label=\"Visit the profile page for Minghua Ma\">Minghua Ma<\/a>","is_active":false,"last_first":"Ma, 
Minghua","people_section":0,"alias":"minghuama"},{"type":"user_nicename","value":"Gagan Somashekar","user_id":43416,"display_name":"Gagan Somashekar","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/gsomashekar\/\" aria-label=\"Visit the profile page for Gagan Somashekar\">Gagan Somashekar<\/a>","is_active":false,"last_first":"Somashekar, Gagan","people_section":0,"alias":"gsomashekar"},{"type":"user_nicename","value":"Rujia Wang","user_id":42549,"display_name":"Rujia Wang","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/rujiawang\/\" aria-label=\"Visit the profile page for Rujia Wang\">Rujia Wang<\/a>","is_active":false,"last_first":"Wang, Rujia","people_section":0,"alias":"rujiawang"},{"type":"user_nicename","value":"Chetan Bansal","user_id":31394,"display_name":"Chetan Bansal","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/chetanb\/\" aria-label=\"Visit the profile page for Chetan Bansal\">Chetan Bansal<\/a>","is_active":false,"last_first":"Bansal, Chetan","people_section":0,"alias":"chetanb"},{"type":"user_nicename","value":"Saravan Rajmohan","user_id":41039,"display_name":"Saravan Rajmohan","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/saravar\/\" aria-label=\"Visit the profile page for Saravan Rajmohan\">Saravan Rajmohan<\/a>","is_active":false,"last_first":"Rajmohan, Saravan","people_section":0,"alias":"saravar"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-960x540.png\" class=\"img-object-cover\" alt=\"White outline illustrations for AIOps on a blue and green gradient background.\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-960x540.png 960w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/AIOps-Lab-BlogHeroFeature-1400x788-1.png 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"December 20, 2024","formattedExcerpt":"AIOpsLab is an open-source framework designed to evaluate and improve AI agents for cloud operations, offering standardized, scalable benchmarks for real-world testing, enhancing cloud system 
reliability.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1104735","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38004"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1104735"}],"version-history":[{"count":67,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1104735\/revisions"}],"predecessor-version":[{"id":1115778,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1104735\/revisions\/1115778"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1112118"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1104735"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1104735"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1104735"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1104735"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1104735"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1104735"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-j
son\/wp\/v2\/msr-locale?post=1104735"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1104735"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1104735"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1104735"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1104735"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}