{"id":1136154,"date":"2025-04-10T09:00:00","date_gmt":"2025-04-10T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1136154"},"modified":"2025-08-01T08:58:26","modified_gmt":"2025-08-01T15:58:26","slug":"debug-gym-an-environment-for-ai-coding-tools-to-learn-how-to-debug-code-like-programmers","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/debug-gym-an-environment-for-ai-coding-tools-to-learn-how-to-debug-code-like-programmers\/","title":{"rendered":"Debug-gym: an environment for AI coding tools to learn how to debug code like programmers"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2100\" height=\"1182\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1.jpg\" alt=\"A graphic with a gradient background transitioning from blue on the left to pink on the right. The graphic features a white outline of a computer monitor with code brackets on the screen, an arrow pointing downwards into the monitor, and another arrow curving around to point upwards towards a magnifying glass with a bug icon inside it.\" class=\"wp-image-1136365\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1.jpg 2100w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-1536x865.jpg 1536w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-2048x1153.jpg 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-1920x1080.jpg 1920w\" sizes=\"auto, (max-width: 2100px) 100vw, 2100px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The ongoing proliferation of AI coding tools is not only boosting developers\u2019 efficiency, it also signals a future where AI will generate a growing share of all new code. <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.freethink.com\/robots-ai\/github-copilot\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub CEO Thomas Dohmke<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> predicted as much in 2023, when he said that &#8220;sooner than later, 80% of the code is going to be written by Copilot.&#8221;&nbsp;&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Both large and small software companies are already heavily using AI to generate code. 
<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.youtube.com\/watch?v=coojA-odaTk&t=861s\" target=\"_blank\" rel=\"noopener noreferrer\">Y Combinator\u2019s Garry Tan<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> noted that 95% of code for a quarter of Y Combinator\u2019s latest batch of startups was written by large language models.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Yet <strong>most developers spend the majority of their time debugging code,<\/strong> not writing it. As maintainers of popular open-source repositories, we find that this resonates. But what if an AI tool could propose fixes for hundreds of open issues, and all we had to do was approve them before merging? This is what motivated us to maximize the potential time savings from AI coding tools by teaching them to debug code.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">By debugging we mean the interactive, iterative process of fixing code. Developers typically hypothesize why their code crashed, then gather evidence by stepping through the program and examining variable values. They often use debugging tools like pdb (the Python debugger) to help gather this information. This process is repeated until the code is fixed.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Today&#8217;s AI coding tools boost productivity and excel at suggesting solutions for bugs based on available code and error messages. However, unlike human developers, these tools don&#8217;t seek additional information when solutions fail, leaving some bugs unaddressed, as you can see in this simple <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/microsoft.github.io\/debug-gym\/\" target=\"_blank\" rel=\"noopener noreferrer\">demo of how a mislabeled column stumps today\u2019s coding tools<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. 
This may leave users feeling like AI coding tools don\u2019t understand the full context of the issues they are trying to solve.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"introducing-debug-gym\">Introducing debug-gym<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">A natural research question emerges: <strong>to what degree can LLMs use interactive debugging tools such as pdb?<\/strong> To explore this question, we released <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/microsoft.github.io\/debug-gym\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>debug-gym<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a> \u2013 an environment that allows code-repairing agents to access tools for active information-seeking behavior. Debug-gym expands an agent\u2019s action and observation space with feedback from tool usage, enabling setting breakpoints, navigating code, printing variable values, and creating test functions. Agents can interact with tools to investigate code or rewrite it, if confident. We believe interactive debugging with proper tools can empower coding agents to tackle real-world software engineering tasks and is central to LLM-based agent research. The fixes proposed by a coding agent with debugging capabilities, and then approved by a human programmer, will be grounded in the context of the relevant codebase, program execution and documentation, rather than relying solely on guesses based on previously seen training data.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1503\" height=\"488\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/DeBug_intro_diagram_v2.png\" alt=\"Figure 1: Diagram demonstrating the code-repairing process in outline. 
Left: conventional code-repairing system; right: additional tools enabled by debug-gym.\" class=\"wp-image-1136156\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/DeBug_intro_diagram_v2.png 1503w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/DeBug_intro_diagram_v2-300x97.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/DeBug_intro_diagram_v2-1024x332.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/DeBug_intro_diagram_v2-768x249.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/DeBug_intro_diagram_v2-240x78.png 240w\" sizes=\"auto, (max-width: 1503px) 100vw, 1503px\" \/><figcaption class=\"wp-element-caption\">Figure 1: Diagram demonstrating the code-repairing process in outline. In most existing approaches (shown in <strong>black<\/strong>), an agent rewrites its code conditioned on error messages obtained from executing the code. debug-gym equips the agent with additional tools such as pdb (shown in <strong>red<\/strong>), so it can interactively seek necessary information from the semantic space hidden behind the code and therefore have better code-repairing performance.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Debug-gym is designed and developed to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Handle repository-level information<\/strong>: the full repository is available to agents in debug-gym, allowing them to navigate and edit files.<\/li>\n\n\n\n<li><strong>Be robust and safe<\/strong>: to safeguard both the system and the development process, debug-gym runs code within sandbox Docker containers. 
This isolates the runtime environment, preventing harmful actions while still allowing thorough testing and debugging.&nbsp;&nbsp;<\/li>\n\n\n\n<li><strong>Be easily extensible<\/strong>: debug-gym was conceived with extensibility in mind and provides practitioners with the possibility of easily adding new tools.&nbsp;&nbsp;<\/li>\n\n\n\n<li><strong>Be text-based<\/strong>: debug-gym represents observation information in structured text (e.g., JSON format) and defines a simple syntax for text actions, making the environment fully compatible with modern LLM-based agents.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">With debug-gym, researchers and developers can specify a folder path to work with any custom repository to evaluate their debugging agent&#8217;s performance. Additionally, debug-gym includes three coding benchmarks to measure LLM-based agents\u2019 performance in interactive debugging: Aider for simple function-level code generation, Mini-nightmare for short, hand-crafted buggy code examples, and SWE-bench for real-world coding problems requiring a comprehensive understanding of a large codebase and a solution in the format of a GitHub pull request.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To learn more about debug-gym and start using it to train your own debugging agents, please refer to the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/debug-gym-a-text-based-environment-for-interactive-debugging\/\" target=\"_blank\" rel=\"noreferrer noopener\">technical report<\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/debug-gym\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"early-experimentation-promising-signal\">Early experimentation: promising signal<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">For our initial 
attempt to validate that LLMs perform better on coding tests when they have access to debugging tools, we built a simple prompt-based agent and provided it with access to the following debug tools: eval, view, pdb, rewrite, and listdir. We used nine different LLMs as the backbone for our agent. Detailed results can be found in the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/debug-gym-a-text-based-environment-for-interactive-debugging\/\">technical report<\/a><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/pdf\/2503.21557\" target=\"_blank\" rel=\"noopener noreferrer\">.<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Even with debugging tools, our simple prompt-based agent rarely solves more than half of the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.swebench.com\/lite.html\" target=\"_blank\" rel=\"noopener noreferrer\">SWE-bench <span class=\"sr-only\"> (opens in new tab)<\/span><\/a>Lite issues. We believe this is due to the scarcity of data representing sequential decision-making behavior (e.g., debugging traces) in the current LLM training corpus. 
However, the significant performance improvement (the graph below shows the strongest results) validates that this is a promising research direction.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1934\" height=\"974\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/DeBug_froggy_bar_chart.png\" alt=\"Figure 2: The success rate represents the percentage of the 300 SWE-bench Lite issues resolved, comparing agents with and without debugging tools.\" class=\"wp-image-1136159\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/DeBug_froggy_bar_chart.png 1934w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/DeBug_froggy_bar_chart-300x151.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/DeBug_froggy_bar_chart-1024x516.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/DeBug_froggy_bar_chart-768x387.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/DeBug_froggy_bar_chart-1536x774.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/DeBug_froggy_bar_chart-240x121.png 240w\" sizes=\"auto, (max-width: 1934px) 100vw, 1934px\" \/><figcaption class=\"wp-element-caption\">Figure 2: The success rate represents the percentage of the 300 SWE-bench Lite issues resolved. The green bars indicate the performance of the agent with debugging tools, while the gray bars show the performance of the agent without debugging tools. 
Note that both agents use the same backbone LLM to make decisions and propose code edits.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"future-work\">Future work<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">We believe that training or fine-tuning LLMs can enhance their interactive debugging abilities. This requires specialized data, such as trajectory data that records agents interacting with a debugger to gather information before suggesting a fix. Unlike conventional reasoning problems, interactive debugging involves generating actions at each step that trigger feedback from the environment. This feedback helps the agent make new decisions, requiring dense data like the problem description and the sequence of actions leading to the solution.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Our plan is to fine-tune an info-seeking model specialized in gathering the necessary information to resolve bugs. The goal is to use this model to actively build relevant context for a code generation model. If the code generation model is large, there is an opportunity to build a smaller info-seeking model that can provide relevant information to the larger one, e.g., a generalization of retrieval augmented generation (RAG), thus saving AI inference costs. 
The data collected during the reinforcement learning loop to train the info-seeking model can also be used to fine-tune larger models for interactive debugging.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We are open-sourcing debug-gym to facilitate this line of research. We encourage the community to help us advance this research towards building interactive debugging agents and, more generally, agents that can seek information by interacting with the world on demand.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"acknowledgements\">Acknowledgements<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">We thank Ruoyao Wang for their insightful discussion on building interactive debugging agents, Chris Templeman and Elaina Maffeo for their team coaching, Jessica Mastronardi and Rich Ciapala for their kind support in project management and resource allocation, and Peter Jansen for providing valuable feedback for the technical report.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Developers spend a lot of time debugging code. 
Learn how debug-gym can equip AI agents to help, enabling them to set breakpoints, navigate the codebase, and print runtime variable values on demand, so they better understand the code and its execution flow.<\/p>\n","protected":false},"author":38004,"featured_media":1136365,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Eric Yuan","user_id":"37167"},{"type":"guest","value":"morgane-moss","user_id":"1136050"},{"type":"guest","value":"charbel-feghali","user_id":"1136221"},{"type":"user_nicename","value":"Chinmay Singh","user_id":"36750"},{"type":"user_nicename","value":"Darya Moldavskaya","user_id":"43569"},{"type":"guest","value":"drew-macphee","user_id":"1136277"},{"type":"user_nicename","value":"Lucas Caccia","user_id":"43647"},{"type":"user_nicename","value":"Matheus Pereira","user_id":"42417"},{"type":"user_nicename","value":"Minseon Kim","user_id":"43620"},{"type":"user_nicename","value":"Alessandro Sordoni","user_id":"37230"},{"type":"user_nicename","value":"Marc-Alexandre 
C\u00f4t\u00e9","user_id":"37197"}],"msr_hide_image_in_river":null,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13560],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[269148,243984,269142],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1136154","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-programming-languages-software-engineering","msr-locale-en_us","msr-post-option-approved-for-river","msr-post-option-blog-homepage-featured","msr-post-option-include-in-river"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199571,437514],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[863034,896463,1148823],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Eric Yuan","user_id":37167,"display_name":"Eric Yuan","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/eryua\/\" aria-label=\"Visit the profile page for Eric Yuan\">Eric Yuan<\/a>","is_active":false,"last_first":"Yuan, Eric","people_section":0,"alias":"eryua"},{"type":"guest","value":"morgane-moss","user_id":1136050,"display_name":"Morgane Moss","author_link":"<a href=\"https:\/\/mormio.github.io\/\" aria-label=\"Visit the profile page for Morgane Moss\">Morgane Moss<\/a>","is_active":true,"last_first":"Moss, Morgane","people_section":0,"alias":"morgane-moss"},{"type":"guest","value":"charbel-feghali","user_id":1136221,"display_name":"Charbel Feghali","author_link":"<a href=\"https:\/\/www.linkedin.com\/in\/charbel-feghali-link\" aria-label=\"Visit the profile page for Charbel Feghali\">Charbel Feghali<\/a>","is_active":true,"last_first":"Feghali, 
Charbel","people_section":0,"alias":"charbel-feghali"},{"type":"user_nicename","value":"Chinmay Singh","user_id":36750,"display_name":"Chinmay Singh","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/chsingh\/\" aria-label=\"Visit the profile page for Chinmay Singh\">Chinmay Singh<\/a>","is_active":false,"last_first":"Singh, Chinmay","people_section":0,"alias":"chsingh"},{"type":"user_nicename","value":"Darya Moldavskaya","user_id":43569,"display_name":"Darya Moldavskaya","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/dmoldavskaya\/\" aria-label=\"Visit the profile page for Darya Moldavskaya\">Darya Moldavskaya<\/a>","is_active":false,"last_first":"Moldavskaya, Darya","people_section":0,"alias":"dmoldavskaya"},{"type":"guest","value":"drew-macphee","user_id":1136277,"display_name":"Drew MacPhee","author_link":"<a href=\"https:\/\/www.linkedin.com\/in\/drewmacphee\" aria-label=\"Visit the profile page for Drew MacPhee\">Drew MacPhee<\/a>","is_active":true,"last_first":"MacPhee, Drew","people_section":0,"alias":"drew-macphee"},{"type":"user_nicename","value":"Matheus Pereira","user_id":42417,"display_name":"Matheus Pereira","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/matpereira\/\" aria-label=\"Visit the profile page for Matheus Pereira\">Matheus Pereira<\/a>","is_active":false,"last_first":"Pereira, Matheus","people_section":0,"alias":"matpereira"},{"type":"user_nicename","value":"Minseon Kim","user_id":43620,"display_name":"Minseon Kim","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/minseonkim\/\" aria-label=\"Visit the profile page for Minseon Kim\">Minseon Kim<\/a>","is_active":false,"last_first":"Kim, Minseon","people_section":0,"alias":"minseonkim"},{"type":"user_nicename","value":"Alessandro Sordoni","user_id":37230,"display_name":"Alessandro Sordoni","author_link":"<a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/alsordon\/\" aria-label=\"Visit the profile page for Alessandro Sordoni\">Alessandro Sordoni<\/a>","is_active":false,"last_first":"Sordoni, Alessandro","people_section":0,"alias":"alsordon"},{"type":"user_nicename","value":"Marc-Alexandre C\u00f4t\u00e9","user_id":37197,"display_name":"Marc-Alexandre C\u00f4t\u00e9","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/macote\/\" aria-label=\"Visit the profile page for Marc-Alexandre C\u00f4t\u00e9\">Marc-Alexandre C\u00f4t\u00e9<\/a>","is_active":false,"last_first":"C\u00f4t\u00e9, Marc-Alexandre","people_section":0,"alias":"macote"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-960x540.jpg\" class=\"img-object-cover\" alt=\"DeBug blog graphic with a line drawing of a computer screen on the left and a magnifying glass with a bug on the right.\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-1536x865.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-2048x1153.jpg 2048w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/NEWDeBug-BlogHeroFeature-1400x788-1-1920x1080.jpg 1920w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"April 10, 2025","formattedExcerpt":"Developers spend a lot of time debugging code. Learn how debug-gym can equip AI agents to help, enabling them to set breakpoints, navigate the codebase, and print runtime variable values on demand, so they better understand the code and its execution 
flow.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1136154","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38004"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1136154"}],"version-history":[{"count":17,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1136154\/revisions"}],"predecessor-version":[{"id":1136374,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1136154\/revisions\/1136374"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1136365"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1136154"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1136154"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1136154"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1136154"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1136154"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1136154"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp
\/v2\/msr-locale?post=1136154"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1136154"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1136154"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1136154"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1136154"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}