Woven by Toyota and Azure OpenAI Service automate 80% of software code safety fixes

Woven by Toyota develops new mobility technologies delivered through autonomous driving and advanced driver assistance systems. Streamlining code fixes to comply with safety standards was a massive undertaking, and generative AI turned out to be the solution.

Using GPT-4o, the organization tried automatically fixing code for safety standard compliance. Following an initial correction rate of about 50 percent, the trial became an official project, which went on to reach an automatic correction rate of around 80 percent with a multi-agent AI system.

By combining the latest generative AI models with a multi-agent system, the solution automatically suggests fixes for around 80 percent of coding errors. The system also states its reasoning and certainty for each suggestion, enabling engineers to make the fixes with confidence.

Gargantuan code fixes for compliance with automotive software standards

Woven by Toyota drives the evolution of mobility to enhance safety, peace of mind, fulfillment, and opportunity for all. It strives to advance the mobility of people, objects, information, and energy for a more connected world with richer human potential. Through technologies and initiatives like autonomous driving (AD), advanced driver assistance systems (ADAS), the Arene OS software platform, the Woven City mobility test course, and Toyota’s Woven Capital growth fund, the organization is weaving the future of mobility.

“On-board autonomous driving and advanced driver assistance technologies must follow Motor Industry Software Reliability Association (MISRA) coding standards,” says Suigen Koide, Tech Lead Manager of the Woven by Toyota AD/ADAS Perception & Prediction team. MISRA refers to both the coding standards on safety and reliability for embedded control systems and the organization setting these standards.

However, “MISRA compliance isn’t easy,” says Yosuke Sawai, who develops embedded control software as a Senior Engineer in the AD/ADAS Recognition Integration Team.

“MISRA is a set of C/C++ programming guidelines for developing embedded control software, but they are hundreds of pages long so it takes forever to learn everything,” Sawai continues. “Additionally, fewer engineers can handle C/C++ these days and not many fully understand MISRA. When proof-of-concept software enters the production phase, engineers have to use static analysis tools to check MISRA compliance and correct any errors.”

According to Sawai, the number of these errors is massive.

“For example, we corrected about 60,000 errors in the software of a recognition module for advanced driver assistance.”

“After graduating, the first book I read was Taiichi Ohno's ‘Workplace Management,’ in which he insisted on eliminating waste. If we reinterpret knowledge and labor waste as that which we can substitute with AI agents, we can see a lot of waste in daily work. This project has shown us waste in our day-to-day work and a course for improvement.”

Suigen Koide, Tech Lead Manager, MLOps, Perception & Prediction, AD/ADAS, Woven by Toyota

A proof-of-concept test with Azure OpenAI Service overdelivers

AI presented an intriguing opportunity for fixing code for MISRA compliance. The idea first surfaced in 2023, recalls Koide. In June 2024, a project began to use generative AI to streamline AD/ADAS development, and the team decided to run a proof-of-concept test for MISRA compliance at the same time.

Yuya Mochimaru, an MLOps engineer in the AD/ADAS Perception & Prediction team, managed this test. What started as a side project requiring about 10 percent of work hours produced results beyond expectations, he says.

“We tried using Azure OpenAI Service GPT-4o to fix MISRA compliance errors in sample code written in C, where the error correction rate reached about 50 percent,” Mochimaru explains. “In a similar trial on code we had developed internally, the automatic correction rate was also about 50 percent.”

When Woven by Toyota provided the details to Microsoft Japan in a regular meeting, Microsoft found the results “extremely significant.” Kosuke Miyasaka of the Azure Operations Division of Microsoft Japan explains the reasoning below.

“This is a very advanced example of Azure OpenAI Service; its potential impact on the automotive industry is massive. Almost all automakers and their business partners put an enormous amount of effort into MISRA compliance every year. With the emergence of technologies like reasoning models and expanded multi-agent use, replicating our results at a production level could dramatically accelerate development in the mobility field.”

Given the potential impact, Woven by Toyota promoted the generative AI-based MISRA compliance side endeavor to a formal project.

“Generative AI refactoring code autonomously is what surprised me the most. Its accuracy is comparable to a veteran engineer with a thorough knowledge of embedded implementations. It also leaves remarks for future maintenance procedures and to-do comments and reasons for eventual improvements.”

Yosuke Sawai, Senior Engineer, Recognition Integration, AD/ADAS, Woven by Toyota

A system ready for development rooms in just two Azure Light-ups

Woven by Toyota’s first step was to participate in Azure Light-ups, Microsoft’s bite-sized hackathons for corporate users. At the first Light-up, which occurred in December 2024, engineers determined a system configuration applicable for development rooms and started the build.

The second Azure Light-up, in January 2025, produced an enhanced user interface and a GitHub-integrated CI/CD mechanism. By identifying the original code and error reports through GitHub’s issue management feature, the system could auto-generate MISRA-compliant code.
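As a rough illustration of how an issue-driven flow like this might hand an error report to a generative AI model, the sketch below parses a static-analysis finding out of an issue body and builds a fix request around the flagged line. Everything in it is an assumption made for illustration: the source does not describe the actual issue format, so the "key: value" layout, the field names, the `Finding` type, and the `parse_issue_body` and `build_fix_prompt` helpers are all hypothetical, and no GitHub or Azure API is called.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One static-analysis error report (hypothetical shape)."""
    file: str
    line: int      # 1-based line number reported by the analyzer
    rule: str
    message: str


REQUIRED_FIELDS = ("file", "line", "rule", "message")


def parse_issue_body(body: str) -> Finding:
    """Read 'key: value' lines from an issue body (assumed layout)."""
    fields = {}
    for raw in body.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = raw.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"issue body is missing fields: {missing}")
    return Finding(fields["file"], int(fields["line"]),
                   fields["rule"], fields["message"])


def build_fix_prompt(finding: Finding, source: str, context: int = 2) -> str:
    """Build a fix request containing the flagged line plus nearby lines."""
    lines = source.splitlines()
    if not 1 <= finding.line <= len(lines):
        raise ValueError("reported line is outside the file")
    start = max(0, finding.line - 1 - context)
    end = min(len(lines), finding.line + context)
    snippet = "\n".join(f"{n + 1}: {lines[n]}" for n in range(start, end))
    return (
        f"Fix the following violation of {finding.rule} in {finding.file}, "
        f"line {finding.line}: {finding.message}\n\n{snippet}"
    )


# Hypothetical issue body and C source, used only to exercise the helpers.
issue_body = """file: src/example.c
line: 3
rule: EXAMPLE-RULE
message: if body must be a compound statement"""

c_source = "int f(int x)\n{\n    if (x > 0) return 1;\n    return 0;\n}\n"

finding = parse_issue_body(issue_body)
prompt = build_fix_prompt(finding, c_source)
print(prompt)
```

Keeping the finding in a small structured record, rather than passing the raw issue text along, makes it straightforward to attach the generated fix back to the same file and line later.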

Azure OpenAI Service’s generative AI models have rapidly evolved alongside initiatives like Woven by Toyota’s. For example, the o1 model, a new reasoning model distinct from the GPT series, became available.

According to Koide, “When we tested the o1 model in January 2025, the success rate for automatic code correction jumped to about 80 percent. We had wanted to improve it to around 70 percent, so the results outstripped our expectations.”

Confident in the solution's readiness, Koide and his team made a presentation to Woven by Toyota and the Toyota Group’s subcontractors at the end of January 2025. The results delighted attendees, some of whom requested Visual Studio Code (VS Code) compatibility. This feature materialized quickly; Mochimaru describes the speed of development below.

“I’m not familiar with developing VS Code extensions, so I used Azure OpenAI Service’s generative AI, which implemented the extension in a day for feedback the following week. In fact, with generative AI, we also created a demo for GitHub integration in just a day. We realized we could rely on generative AI for system implementation.”

Yuya Mochimaru, MLOps Engineer, Perception & Prediction, AD/ADAS, Woven by Toyota

Greater accuracy and explainability with multiple agents

Microsoft soon made a new proposal: a multi-agent system in which multiple generative AI agents together provide all necessary functions, rather than relying on a single generative AI model. The figure below shows the system configuration, which uses the new o3-mini model for its superior coding capabilities. The system is built on widely used products and services such as Azure OpenAI Service, Azure App Service, Azure Cosmos DB, GitHub Enterprise, and AutoGen.

The first AI agent (“Coder”) fixes the code. Another AI agent (“Reviewer”) checks the results and gives feedback to Coder with suggestions for making better corrections. Coder then makes further modifications according to the suggestions, which Reviewer checks in a repeating cycle. Once Reviewer decides the fixes are sufficient, the results progress to a third AI agent (“Evaluator”), which checks and evaluates the content. Evaluator assesses the code and generates reasons and certainty factors for changes to the code. Humans confirm the result and make the final decision on acceptance or rejection.
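The Coder, Reviewer, and Evaluator cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the real system orchestrates model-backed agents with AutoGen, whereas the agents here are plain functions, and the `Review`, `Evaluation`, and `Proposal` types, the `run_fix_cycle` function, the `max_rounds` limit, and the 0.0 to 1.0 certainty scale are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    approved: bool
    feedback: str


@dataclass
class Evaluation:
    reasoning: str
    certainty: float  # assumed scale: 0.0 (unsure) to 1.0 (certain)


@dataclass
class Proposal:
    code: str
    rounds: int
    approved: bool
    evaluation: Optional[Evaluation]  # None if Reviewer never approved


Coder = Callable[[str, str, Optional[str]], str]
Reviewer = Callable[[str, str], Review]
Evaluator = Callable[[str, str, str], Evaluation]


def run_fix_cycle(source: str, violation: str, coder: Coder,
                  reviewer: Reviewer, evaluator: Evaluator,
                  max_rounds: int = 3) -> Proposal:
    """Repeat Coder -> Reviewer until approval, then run Evaluator."""
    code, feedback, approved, rounds = source, None, False, 0
    for rounds in range(1, max_rounds + 1):
        code = coder(code, violation, feedback)   # Coder fixes the code
        review = reviewer(code, violation)        # Reviewer checks the result
        if review.approved:
            approved = True
            break
        feedback = review.feedback                # fed into the next round
    # Evaluator only sees fixes that Reviewer judged sufficient.
    evaluation = evaluator(source, code, violation) if approved else None
    return Proposal(code, rounds, approved, evaluation)


# Stub agents standing in for model calls (hypothetical behavior).
def stub_coder(code: str, violation: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return code  # simulate a first attempt that misses the fix
    return code.replace("y = 1;", "{ y = 1; }")


def stub_reviewer(code: str, violation: str) -> Review:
    if "{" in code:
        return Review(True, "")
    return Review(False, "Wrap the if body in braces.")


def stub_evaluator(original: str, fixed: str, violation: str) -> Evaluation:
    return Evaluation("Wrapped the if body in a compound statement.", 0.9)


proposal = run_fix_cycle("if (x > 0) y = 1;",
                         "if body must be a compound statement",
                         stub_coder, stub_reviewer, stub_evaluator)
print(proposal.rounds, proposal.approved, proposal.code)
```

The function returns a proposal rather than applying the change, which mirrors the final step in the description: an engineer reads the reasoning and certainty and decides whether to accept or reject the fix.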

Keisuke Hatasaki, a Microsoft Global Black Belt, explains the effectiveness of multi-agent systems below.

“Multi-agent systems have two main purposes. The first is to minimize the fuzzy results of generative AI by having other agents evaluate and provide feedback. The second is to make it easier for third parties to understand what the generative AI is doing. Essentially, it boosts the accuracy and explainability of the generated results.”

“In a proof-of-concept test using in-house code, we achieved a code generation success rate of 97.1 percent and automatically corrected 81.5 percent of MISRA compliance errors,” says Koide. “Multi-agent systems provide important information for engineers, such as code modification reasoning and certainty factors, overcoming the black boxing inherent to singular generative AI.”

In addition, Microsoft’s AutoGen provides orchestration between agents. Mochimaru says he was able to implement this feature in just a day with the help of generative AI.

“Multi-agent systems have two main purposes. The first is to minimize the fuzzy results of generative AI by having other agents evaluate and provide feedback. The second is to make it easier for third parties to understand what the generative AI is doing. Essentially, it boosts the accuracy and explainability of the generated results.”

Keisuke Hatasaki, App Innovation Solution Specialist, Global Black Belt - Asia, Microsoft Corporation

Autonomous refactoring at the same level as experienced engineers

Having multiple agents working together also improves autonomy. Sawai explains more below.

“Generative AI refactoring code autonomously is what surprised me the most. Its accuracy is comparable to a veteran engineer with a thorough knowledge of embedded implementations. It also leaves remarks for future maintenance procedures and to-do comments and reasons for eventual improvements. ‘Super agents’ is a better name for agents in multi-agent setups.”

The term ‘refactoring’ means improving the internal organization of code without altering its behavior. It includes cleaning up messy code and consolidating duplicated logic, which makes it easier to add features, debug, and maintain the code.

“Leveraging multi-agent, generative AI will push software development to a new paradigm. These changes could drastically change the role that humans take in development rooms,” says Koide.

“After graduating, the first book I read was Taiichi Ohno's Workplace Management, in which he insisted on eliminating waste. If we reinterpret knowledge and labor waste as that which we can substitute with AI agents, we can see a lot of waste in daily work. This project has shown us waste in our day-to-day work and a course for improvement.”

“This was possible because of Microsoft’s user-friendly, state-of-the-art generative AI apps. I believe Microsoft will continue to be a platform for AI’s permeation into society.”

“Almost all automakers and their business partners put an enormous amount of effort into MISRA compliance every year. With the emergence of technologies like reasoning models and expanded multi-agent use, replicating our results at a production level could dramatically accelerate development in the mobility field.”

Kosuke Miyasaka, Azure App Innovation Specialist, Intelligent Cloud Office, Azure Operations Division, Microsoft Japan