{"id":1154385,"date":"2025-11-11T09:00:00","date_gmt":"2025-11-11T17:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1154385"},"modified":"2025-11-19T08:04:59","modified_gmt":"2025-11-19T16:04:59","slug":"bluecodeagent-a-blue-teaming-agent-enabled-by-automated-red-teaming-for-codegen-ai","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/bluecodeagent-a-blue-teaming-agent-enabled-by-automated-red-teaming-for-codegen-ai\/","title":{"rendered":"BlueCodeAgent: A blue teaming agent enabled by automated red teaming for CodeGen AI"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1.jpg\" alt=\"Three white icons on a blue-to-green gradient background: the first icon shows a circle with connected nodes, the second shows a circuit, and the third shows a flowchart\" class=\"wp-image-1154392\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1.jpg 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-655x368.jpg 655w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"introduction\">Introduction<\/h2>\n\n\n\n<p>Large\u00a0language\u00a0models\u00a0(LLMs)\u00a0are now widely used for automated code generation across software engineering tasks. However, this capability also introduces security concerns. Code generation systems could be misused for harmful purposes, such as generating malicious code.\u00a0They\u00a0could also\u00a0produce\u00a0bias-filled\u00a0code reflecting\u00a0underlying logic that is\u00a0discriminatory\u00a0or unethical. Additionally, even when completing benign tasks, LLMs may inadvertently produce vulnerable code that\u00a0contains\u00a0security flaws (e.g., injection risks, unsafe input handling). These unsafe outcomes undermine the trustworthiness of code generation models and pose threats to the broader software ecosystem, where safety and reliability are critical.<\/p>\n\n\n\n<p>Many&nbsp;studies have explored red teaming code LLMs, testing whether the models can reject unsafe requests and whether their generated code&nbsp;exhibits&nbsp;insecure patterns. For more details, see our earlier MSR blog post on&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/redcodeagent-automatic-red-teaming-agent-against-diverse-code-agents\/\">RedCodeAgent<\/a>. 
While red teaming has significantly improved our understanding of model failure modes, progress on blue teaming (i.e., developing effective defensive mechanisms to detect and prevent such failures) remains&nbsp;relatively limited. Current blue teaming approaches face several challenges: (1)&nbsp;Poor alignment with security concepts:&nbsp;additional&nbsp;safety&nbsp;prompts&nbsp;struggle to help models&nbsp;understand high-level notions,&nbsp;such as what constitutes a malicious or biased instruction, and typically lack actionable principles to guide safe decision-making. A case study is shown in Figure 1.&nbsp;(2)&nbsp;Over-conservatism:&nbsp;especially in the domain of vulnerable code detection, models tend to misclassify safe code as unsafe, leading to more false positives and reduced developer trust.&nbsp;(3)&nbsp;Incomplete risk coverage: without a strong knowledge foundation, models perform poorly when dealing with subtle or previously unseen risks.<\/p>\n\n\n\n<p>To address these challenges, researchers from the University of Chicago, University of California, Santa Barbara, University of Illinois Urbana\u2013Champaign, VirtueAI, and Microsoft Research recently released a paper: <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/bluecodeagent-a-blue-teaming-agent-enabled-by-automated-red-teaming-for-codegen-ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">BlueCodeAgent: A Blue Teaming Agent Enabled by Automated Red Teaming for CodeGen AI<\/a>. 
This work makes the following key contributions:&nbsp;<\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Diverse red-teaming pipeline:<\/strong> The authors design a comprehensive red-teaming process that integrates multiple strategies to synthesize diverse red-teaming data for effective knowledge accumulation.<\/li>\n\n\n\n<li><strong>Knowledge-enhanced blue teaming:<\/strong> Building on the foundation of red-teaming knowledge, BlueCodeAgent significantly improves blue-teaming performance by leveraging dynamic testing and constitutions derived from that knowledge.&nbsp;<\/li>\n\n\n\n<li><strong>Principled-Level Defense and Nuanced-Level Analysis:<\/strong> The authors propose two complementary strategies, Principled-Level Defense (via constitutions) and Nuanced-Level Analysis (via dynamic testing), and demonstrate their synergistic effects in vulnerable code detection tasks.&nbsp;<\/li>\n\n\n\n<li><strong>Generalization to seen and unseen risks:<\/strong> Empowered by comprehensive red-teaming knowledge, BlueCodeAgent generalizes effectively to unseen risks. Overall, BlueCodeAgent achieves an average 12.7% improvement in F1 score across four datasets and three tasks, attributed to its ability to distill actionable constitutions that enhance context-aware risk detection.&nbsp;<\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1202\" height=\"371\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure1.png\" alt=\"Figure 1. A case study of BlueCodeAgent on the bias instruction detection task. Even when concepts such as \u201cbiased\u201d are explicitly included in additional safety prompts, models often fail to recognize biased requests (left). BlueCodeAgent (right) addresses this gap by summarizing constitutions from knowledge and applying concrete, actionable constraints benefited from red teaming to improve the defense. 
\" class=\"wp-image-1154397\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure1.png 1202w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure1-300x93.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure1-1024x316.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure1-768x237.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure1-240x74.png 240w\" sizes=\"auto, (max-width: 1202px) 100vw, 1202px\" \/><figcaption class=\"wp-element-caption\">Figure 1. A case study of BlueCodeAgent on the bias instruction detection task. Even when concepts such as \u201cbiased\u201d are explicitly included in additional safety prompts, models often fail to recognize biased requests (left). BlueCodeAgent (right) addresses this gap by summarizing constitutions from knowledge and applying concrete, actionable constraints benefited from red teaming to improve the defense. <\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"a-blue-teaming-agent-enabled-by-red-teaming\">A blue teaming agent enabled by red teaming<\/h2>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1222\" height=\"581\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure2.png\" alt=\"Figure 2: Overview of BlueCodeAgent, an end-to-end blue teaming framework powered by automated red teaming for code security. By integrating knowledge derived from diverse red teaming and conducting dynamic sandbox-based testing, BlueCodeAgent substantially strengthens the defensive capabilities beyond static LLM analysis. 
\" class=\"wp-image-1154396\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure2.png 1222w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure2-300x143.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure2-1024x487.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure2-768x365.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure2-240x114.png 240w\" sizes=\"auto, (max-width: 1222px) 100vw, 1222px\" \/><figcaption class=\"wp-element-caption\">Figure 2: Overview of BlueCodeAgent, an end-to-end blue teaming framework powered by automated red teaming for code security. By integrating knowledge derived from diverse red teaming and conducting dynamic sandbox-based testing, BlueCodeAgent substantially strengthens the defensive capabilities beyond static LLM analysis. <\/figcaption><\/figure>\n\n\n\n<p>Figure 2 presents an overview of the pipeline. The framework unifies both sides of the process: red teaming generates diverse risky cases and behaviors, which are then distilled into actionable constitutions that encode safety rules on the blue-teaming side. These constitutions guide BlueCodeAgent to more effectively detect unsafe textual inputs and code outputs, mitigating limitations such as poor alignment with abstract security concepts.&nbsp;<\/p>\n\n\n\n<p>This work targets three major risk categories, covering both input\/textual-level risks\u2014including biased and malicious instructions\u2014and output\/code-level risks, where models may generate vulnerable code. 
These categories represent risks that have been widely studied in prior research.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"diverse-red-teaming-process-for-knowledge-accumulation\">Diverse red-teaming process for knowledge accumulation&nbsp;<\/h2>\n\n\n\n<p>Because different tasks require distinct attack strategies, the red-teaming process employs multiple attack methods to generate realistic and diverse data. Specifically, the red-teaming process is divided into three categories:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Policy-based instance generation<\/strong>: To synthesize policy-grounded red-teaming data, diverse security and ethical policies are first collected. These high-level principles are then used to prompt an uncensored model to generate instances that intentionally violate the specified policies.<\/li>\n\n\n\n<li><strong>Seed-based adversarial prompt optimization<\/strong>: Existing adversarial instructions are often overly simplistic and easily rejected by models. 
To overcome this limitation, an adaptive red-teaming agent invokes various jailbreak tools to iteratively refine initial seed prompts until the prompts achieve high attack success rates.<\/li>\n\n\n\n<li><strong>Knowledge-driven vulnerability generation<\/strong>: To synthesize both vulnerable and safe code samples under realistic programming scenarios, domain knowledge of common software weaknesses (CWE) is leveraged to generate diverse code examples.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"knowledge-enhanced-blue-teaming-agent\">Knowledge-enhanced blue teaming agent&nbsp;<\/h2>\n\n\n\n<p>After accumulating red-teaming knowledge, BlueCodeAgent sets up <strong>Principled-Level Defense via Constitution Construction<\/strong> and <strong>Nuanced-Level Analysis via Dynamic Testing<\/strong>.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Principled-Level Defense via Constitution Construction<\/strong>&nbsp;<br>Based on the most relevant knowledge data, BlueCodeAgent summarizes red-teamed knowledge into actionable constitutions: explicit rules and principles distilled from prior attack data. These constitutions serve as normative guidelines, enabling the model to stay aligned with ethical and security principles even when confronted with novel or unseen adversarial inputs.&nbsp;<\/li>\n\n\n\n<li><strong>Nuanced-Level Analysis via Dynamic Testing<\/strong>&nbsp;<br>In vulnerable code detection, BlueCodeAgent augments static reasoning with dynamic sandbox-based analysis, executing generated code within isolated Docker environments to verify whether the model-reported vulnerabilities manifest as actual unsafe behaviors. 
This dynamic validation effectively mitigates the model\u2019s tendency toward over-conservatism, where benign code is mistakenly flagged as vulnerable.&nbsp;<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"insights-from-bluecodeagent\">Insights from BlueCodeAgent&nbsp;<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"bluecodeagent-outperforms-prompting-baselines\">BlueCodeAgent outperforms prompting baselines&nbsp;<\/h3>\n\n\n\n<p>As shown in Figure 3, BlueCodeAgent significantly outperforms the prompting baselines. Several findings are highlighted.&nbsp;<\/p>\n\n\n\n<p>(1) Even when test categories differ from knowledge categories to simulate unseen scenarios, BlueCodeAgent effectively leverages previously seen risks to handle unseen ones, benefiting from its knowledge-enhanced safety reasoning.&nbsp;<\/p>\n\n\n\n<p>(2) BlueCodeAgent is model-agnostic, working consistently across diverse base LLMs, including both open-source and commercial models. Its F1 scores for bias and malicious instruction detection approach 1.0, highlighting strong effectiveness.&nbsp;<\/p>\n\n\n\n<p>(3) BlueCodeAgent achieves a strong balance between safety and usability. 
It accurately identifies unsafe inputs while maintaining a reasonable false-positive rate on benign ones, resulting in a consistently high F1 score.&nbsp;<\/p>\n\n\n\n<p>(4) By contrast, prompting with general or fine-grained safety reminders remains insufficient for effective blue teaming, as models struggle to internalize abstract safety concepts and apply them to unseen risky scenarios. BlueCodeAgent bridges this gap by distilling actionable constitutions from knowledge, using concrete and interpretable safety constraints to enhance model alignment.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"993\" height=\"602\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure3.png\" alt=\"Figure 3. F1 scores on bias instruction detection task (BlueCodeEval-Bias) in the first row and on malicious instruction detection task (BlueCodeEval-Mal, RedCode-based) in the second row. 
\" class=\"wp-image-1154395\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure3.png 993w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure3-300x182.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure3-768x466.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent_figure3-240x145.png 240w\" sizes=\"auto, (max-width: 993px) 100vw, 993px\" \/><figcaption class=\"wp-element-caption\"><strong>Figure 3:<\/strong>&nbsp;F1 scores on bias instruction detection task (BlueCodeEval-Bias) in the first row and on malicious instruction detection task (BlueCodeEval-Mal) in the second row.&nbsp;<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"complementary-effects-of-constitutions-and-dynamic-testing\">Complementary effects of constitutions and dynamic testing&nbsp;<\/h2>\n\n\n\n<p>In vulnerability detection tasks, models tend to behave conservatively\u2014an effect also noted in prior research. They are often more likely to flag code as <em>unsafe<\/em> rather than <em>safe<\/em>. This bias is understandable: confirming that code is completely free from vulnerabilities is generally harder than spotting a potential issue.&nbsp;<\/p>\n\n\n\n<p>To mitigate this over-conservatism, BlueCodeAgent integrates dynamic testing into its analysis pipeline. When BlueCodeAgent identifies a potential vulnerability, it triggers a reliable model (Claude-3.7-Sonnet-20250219) to generate test cases and corresponding executable code that embeds the suspicious snippet. These test cases are then run in a controlled environment to verify whether the vulnerability actually manifests. 
The final judgment combines the LLM\u2019s analysis of the static code, the generated test code, run-time execution results, and constitutions derived from knowledge.&nbsp;<\/p>\n\n\n\n<p>The researchers find that the two components, constitutions and dynamic testing, play complementary roles. Constitutions expand the model\u2019s understanding of risk, increasing true positives (TP) and reducing false negatives (FN). Dynamic testing, on the other hand, focuses on reducing false positives (FP) by validating whether predicted vulnerabilities can truly be triggered at run-time. Together, they make BlueCodeAgent both more accurate and more reliable in blue-teaming scenarios.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"summary\">Summary&nbsp;<\/h2>\n\n\n\n<p>BlueCodeAgent is an end-to-end blue-teaming framework designed to address risks in code generation. The key insight behind BlueCodeAgent is that comprehensive red-teaming can greatly strengthen blue-teaming defenses. Based on this idea, the framework first builds a red-teaming process with diverse strategies for generating red-teaming data. It then constructs a blue-teaming agent that retrieves relevant examples from the red-teaming knowledge base and summarizes safety constitutions to guide LLMs in making accurate defensive decisions. A dynamic testing component is further added to reduce false positives in vulnerability detection.&nbsp;<\/p>\n\n\n\n<p>Looking ahead, several directions hold promise.&nbsp;<\/p>\n\n\n\n<p>First, it would be valuable to explore the generalization of BlueCodeAgent to other categories of code-generation risks beyond bias, malicious code, and vulnerable code. 
This may require designing and integrating novel red-teaming strategies into BlueCodeAgent and creating corresponding benchmarks for new risks.&nbsp;&nbsp;<\/p>\n\n\n\n<p>Second, scaling BlueCodeAgent to the file and repository levels could further enhance its real-world utility, which requires equipping agents with more advanced context retrieval tools and memory components.&nbsp;&nbsp;<\/p>\n\n\n\n<p>Finally, beyond code generation, it is also important to extend BlueCodeAgent to mitigate risks in other modalities, including text, image, video, and audio, as well as in multimodal applications.&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>BlueCodeAgent is an end-to-end blue-teaming framework built to boost code security using automated red-teaming processes, data, and safety rules to guide LLMs\u2019 defensive decisions. Dynamic testing reduces false positives in vulnerability detection.<\/p>\n","protected":false},"author":43868,"featured_media":1154392,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_hide_image_in_river":null,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13558],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[269148,243984,269142,269145],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1154385","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-security-privacy-cryptography","msr-locale-en_us","msr-post-option-approved-for-river","msr-post-option-blog-homepage-featured","msr-post-option-include-in-river","msr-post-option-pinned-for-river"],"msr_event_details":{"start":"","end":"","location":""},"po
dcast_url":"","podcast_episode":"","msr_research_lab":[199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[],"related-researchers":[{"type":"guest","value":"chengquan-guo-2","user_id":"1152980","display_name":"Chengquan Guo ","author_link":"<a href=\"https:\/\/www.chengquanguo.com\/\" aria-label=\"Visit the profile page for Chengquan Guo \">Chengquan Guo <\/a>","is_active":true,"last_first":"Guo , Chengquan","people_section":0,"alias":"chengquan-guo-2"},{"type":"guest","value":"yuzhou-nie","user_id":"1154388","display_name":"Yuzhou  Nie","author_link":"<a href=\"https:\/\/rucnyz.github.io\/\" aria-label=\"Visit the profile page for Yuzhou  Nie\">Yuzhou  Nie<\/a>","is_active":true,"last_first":"Nie, Yuzhou ","people_section":0,"alias":"yuzhou-nie"},{"type":"guest","value":"chulin-xie","user_id":"1152981","display_name":"Chulin Xie","author_link":"<a href=\"https:\/\/alphapav.github.io\/\" aria-label=\"Visit the profile page for Chulin Xie\">Chulin Xie<\/a>","is_active":true,"last_first":"Xie, Chulin","people_section":0,"alias":"chulin-xie"},{"type":"user_nicename","value":"Zinan Lin","user_id":42327,"display_name":"Zinan Lin","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/zinanlin\/\" aria-label=\"Visit the profile page for Zinan Lin\">Zinan Lin<\/a>","is_active":false,"last_first":"Lin, Zinan","people_section":0,"alias":"zinanlin"},{"type":"guest","value":"wenbo-guo","user_id":"1154389","display_name":"Wenbo Guo","author_link":"<a href=\"https:\/\/henrygwb.github.io\/\" aria-label=\"Visit the profile page for Wenbo Guo\">Wenbo Guo<\/a>","is_active":true,"last_first":"Guo, Wenbo","people_section":0,"alias":"wenbo-guo"},{"type":"guest","value":"bo-li-3","user_id":"1154390","display_name":"Bo Li","author_link":"<a href=\"https:\/\/aisecure.github.io\/\" aria-label=\"Visit the profile page for Bo 
Li\">Bo Li<\/a>","is_active":true,"last_first":"Li, Bo","people_section":0,"alias":"bo-li-3"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-960x540.jpg\" class=\"img-object-cover\" alt=\"Three white icons on a blue-to-green gradient background: the first icon shows a circle with connected nodes, the second shows a circuit, and the third shows a flowchart\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/10\/BlueCodeAgent-BlogHeroFeature-1400x788-1.jpg 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" 
\/>","byline":"","formattedDate":"November 11, 2025","formattedExcerpt":"BlueCodeAgent is an end-to-end blue-teaming framework built to boost code security using automated red-teaming processes, data, and safety rules to guide LLMs\u2019 defensive decisions. Dynamic testing reduces false positives in vulnerability detection.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1154385","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/43868"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1154385"}],"version-history":[{"count":15,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1154385\/revisions"}],"predecessor-version":[{"id":1155980,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1154385\/revisions\/1155980"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1154392"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1154385"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1154385"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1154385"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1154385"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.
com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1154385"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1154385"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1154385"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1154385"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1154385"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1154385"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1154385"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}