{"id":1139134,"date":"2025-05-13T02:37:04","date_gmt":"2025-05-13T09:37:04","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&#038;p=1139134"},"modified":"2025-05-13T03:23:19","modified_gmt":"2025-05-13T10:23:19","slug":"teaching-llms-to-think-xian-zhang-on-advancing-mathematical-reasoning-in-ai","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/teaching-llms-to-think-xian-zhang-on-advancing-mathematical-reasoning-in-ai\/","title":{"rendered":"Teaching LLMs to think: Xian Zhang on advancing mathematical reasoning in AI"},"content":{"rendered":"\n<p>Math is more than a school subject\u2014it&#8217;s the engine behind scientific discovery, driving advances in everything from climate modeling to AI.<\/p>\n\n\n\n<p>At Microsoft Research Asia, senior researcher <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/zhxian\/\">Xian Zhang<\/a>&nbsp;is leading efforts to help AI move beyond surface-level pattern recognition toward deeper, rules-based reasoning. 
In a recent interview, he explained how this shift could significantly expand what large language models (LLMs) are capable of.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"1024\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/xian-zhang-llms-for-mathematical-reasoning-1-768x1024.jpg\" alt=\"Xian Zhang\" class=\"wp-image-1134209\" style=\"width:481px;height:auto\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/xian-zhang-llms-for-mathematical-reasoning-1-768x1023.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/xian-zhang-llms-for-mathematical-reasoning-1-225x300.jpg 225w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/xian-zhang-llms-for-mathematical-reasoning-1-1153x1536.jpg 1153w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/xian-zhang-llms-for-mathematical-reasoning-1-1537x2048.jpg 1537w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/xian-zhang-llms-for-mathematical-reasoning-1-135x180.jpg 135w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/xian-zhang-llms-for-mathematical-reasoning-1.jpg 1921w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><figcaption class=\"wp-element-caption\">Xian Zhang<\/figcaption><\/figure>\n\n\n\n<p><strong>Q: Why is mathematical reasoning important for the development of LLMs and AI in general?<\/strong><\/p>\n\n\n\n<p><strong>Zhang:<\/strong>&nbsp;Mathematical reasoning plays a central role in AI development. 
As models acquire this skill, they improve in broader reasoning tasks by learning structured approaches and logical patterns.<\/p>\n\n\n\n<p>Math helps AI manage complexity, improving performance in code optimization, common-sense reasoning, and semantic understanding\u2014with gains in both accuracy and efficiency.<\/p>\n\n\n\n<p>Improving LLMs&#8217; understanding of mathematical structure is a step toward building AI that can handle the rigor and precision required in scientific and technical fields, ultimately accelerating the pace of discovery.<\/p>\n\n\n\n<p><strong>Q: Where does AI currently stand in terms of mathematical reasoning, and what are the main challenges?<\/strong><\/p>\n\n\n\n<p><strong>Zhang:<\/strong>&nbsp;AI\u2019s ability to reason mathematically still depends heavily on the breadth and quality of its training data. With rich and diverse data, LLMs can solve complex problems\u2014even some at the level of math Olympiads\u2014by generalizing from similar patterns. But when data is sparse or uneven, models can falter even on basic arithmetic problems.<\/p>\n\n\n\n<p>Because LLMs work by recognizing and replicating patterns, they may hallucinate solutions or miss the underlying logic entirely without enough data in a given area. In theory, if their training data were sufficiently comprehensive, their performance could rival top human problem-solvers.<\/p>\n\n\n\n<p>We often compare this to &#8220;brute-force&#8221; versus &#8220;genius&#8221; learning. With enough practice, people can solve difficult problems. Geniuses, by contrast, grasp deep patterns quickly. Most high performers combine both\u2014extensive exposure and rapid internalization. 
LLMs, by contrast, still lean heavily on the brute-force side: they need far more training data than humans to achieve comparable results on a single task.<\/p>\n\n\n\n<p><strong>Q: What research has your team conducted in this field?<\/strong><\/p>\n\n\n\n<p><strong>Zhang:<\/strong>&nbsp;We approach mathematical reasoning from a rules-based rather than a data-driven perspective. Our goal is to help LLMs learn the fundamental principles of math and independently apply them to new problems.<\/p>\n\n\n\n<p>To achieve this, we emphasize formalization and symbolization\u2014<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/openreview.net\/pdf?id=8ihVBYpMV4\">translating natural language math problems into formal mathematical expressions<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. This process works like a &#8220;language translation&#8221; for math\u2014once the model comprehends the symbolic representation, it can understand the underlying logic. In this way, LLMs can carry out operations with the reliability of a calculator while maintaining strong generalization capabilities.<\/p>\n\n\n\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/openreview.net\/pdf?id=FiyS0ecSm0\">We\u2019ve successfully applied this process to complex inequality proofs at the Olympiad level<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, demonstrating that models can learn and apply mathematical rules. This success establishes a foundation for extending these capabilities to broader areas, including algebra, geometry, and number theory.<\/p>\n\n\n\n<p>We also developed techniques for generating synthetic math data. 
Through formalization, we created diverse <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/openreview.net\/pdf?id=CIcMZGLyZW\">problem-answer pairs<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/pdf\/2410.15748\">composite theorems<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, much as a teacher designs new variations of a problem to teach students. This approach increases both the volume and diversity of training data, enhancing the model\u2019s exposure and adaptability.<\/p>\n\n\n\n<p>However, relying solely on large-scale data and computation\u2014the so-called &#8220;scaling law&#8221; approach\u2014is unsustainable. Instead, we favor a structured, rules-based methodology encompassing problem generation, problem understanding, and proof development. This enables LLMs to reason deeply rather than simply mimic patterns.<\/p>\n\n\n\n<p><strong>Q: What\u2019s the difference between mathematical and common-sense reasoning? And why do models struggle with problems like &#8220;The city gate and the pole&#8221;?<\/strong><\/p>\n\n\n\n<p><strong>Zhang:<\/strong>&nbsp;Mathematical reasoning relies on structured knowledge, clear rules, and precise procedures, making it highly logical. In contrast, common-sense reasoning draws on everyday human experience and intuition, requiring an understanding of physical contexts, language nuances, and practical scenarios.<\/p>\n\n\n\n<p>To solve the &#8220;city gate and the pole&#8221; problem, one needs to understand how objects behave in space\u2014not just how to perform computations. LLMs trained primarily on text data lack an internal model of the physical world. 
They don&#8217;t truly &#8220;know&#8221; what a pole or a city gate looks like, nor do they have spatial awareness. The problem appears to exist in 2D, along the x- and y-axes, but the solution lies along the z-axis. Because the LLM doesn\u2019t consider this, it gives a 2D solution. With spatial awareness and an understanding of 3D, it would have recommended moving the pole along the z-axis, carrying it lengthwise through the gate.<\/p>\n\n\n\n<p>Instead, LLMs interpret only the surface meaning of words, missing the conceptual and spatial reasoning the problem requires. Addressing this limitation is a major challenge.<\/p>\n\n\n\n<div class=\"wp-block-group wp-block-quote is-style-spectrum is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<p style=\"font-size:1rem;font-style:italic;\">Did you know? The &#8220;city gate and the pole&#8221; fable originated in China. In the state of Lu, a man attempted to carry a long pole through a city gate. First, he held it upright, but it was too tall to pass through. Then, he turned it horizontally, but it was still too long to fit. While standing there puzzled, an old man approached and said, \u201cI am not a sage, but I\u2019ve seen many things. Why not saw the pole in half and bring it in that way?\u201d The man followed the advice\u2014and it worked, though at the cost of a pole sawn in half.<\/p>\n<\/div>\n<\/div>\n\n\n\n<p><strong>Q: With so many kinds of math problems, can AI develop a \u201cuniversal brain\u201d for math? What is the core of mathematical reasoning in LLMs?<\/strong><\/p>\n\n\n\n<p><strong>Zhang:<\/strong>&nbsp;LLMs can already handle many math problems at the high school and even college level. At these stages, knowledge is relatively structured and the types of problems are predictable. With enough relevant data, models can identify patterns and apply appropriate rules.<\/p>\n\n\n\n<p>However, cutting-edge mathematical research presents a much greater challenge. 
Consider G\u00f6del\u2019s incompleteness theorem, which shows that any consistent axiomatic system expressive enough to describe arithmetic contains true statements that can\u2019t be proven within that system. LLMs operate within fixed rule sets and are limited when they encounter such propositions.<\/p>\n\n\n\n<p>What distinguishes human intelligence is its ability to transcend existing systems and invent new ones. Einstein&#8217;s theory of relativity emerged by breaking free from the boundaries of classical mechanics. Similarly, for AI to contribute to the frontiers of mathematics, it must evolve beyond rigid systems and construct new axiomatic frameworks\u2014essentially inventing the math needed to solve previously unsolvable problems.<\/p>\n\n\n\n<p><strong>Q: How do you see the future of AI in mathematical reasoning, and what practical value could it create?<\/strong><\/p>\n\n\n\n<p><strong>Zhang:<\/strong>&nbsp;Just as people use calculators and textbooks when solving math problems, future LLMs will also need the ability to use external tools. This capability will be vital not just in math but also in programming and decision-making.<\/p>\n\n\n\n<p>The most immediate use for improved mathematical reasoning lies in education. AI models with strong reasoning skills could support personalized learning, explain concepts clearly, and help students build a deeper understanding of math.<\/p>\n\n\n\n<p>In industry, formalized mathematical reasoning could significantly strengthen software development, particularly when it comes to code reliability and stability. This aligns with a growing research trend toward code formalization and verification.<\/p>\n\n\n\n<p>In mathematical research, we don\u2019t expect AI to replace mathematicians. Instead, it could serve as a creative partner\u2014offering new ideas or unconventional approaches that inspire breakthroughs. 
Several mathematicians are already exploring this kind of collaboration, using AI\u2019s divergent &#8220;thinking&#8221; to tackle unsolved problems. This human-machine synergy could reshape the future of scientific discovery.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Math is more than a school subject\u2014it&#8217;s the engine behind scientific discovery, driving advances in everything from climate modeling to AI. At Microsoft Research Asia, senior researcher Xian Zhang&nbsp;is leading efforts to help AI move beyond surface-level pattern recognition toward deeper, rules-based reasoning. In a recent interview, he explained how this shift could significantly expand [&hellip;]<\/p>\n","protected":false},"author":34512,"featured_media":1134210,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-content-parent":199560,"msr_hide_image_in_river":null,"footnotes":""},"research-area":[13556],"msr-locale":[268875],"msr-post-option":[],"class_list":["post-1139134","msr-blog-post","type-msr-blog-post","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_assoc_parent":{"id":199560,"type":"lab"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/1139134","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-blog-post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/34512"}],"version-history":[{"count":4,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/1139134\/revisions"}],"predecessor-version":[{"id":1139146,"href":"https:\/\/www.microsoft.com\/en-us\/research\
/wp-json\/wp\/v2\/msr-blog-post\/1139134\/revisions\/1139146"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1134210"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1139134"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1139134"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1139134"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1139134"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}