{"id":1149051,"date":"2025-09-10T09:00:00","date_gmt":"2025-09-10T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1149051"},"modified":"2025-09-08T13:16:21","modified_gmt":"2025-09-08T20:16:21","slug":"renderformer-how-neural-networks-are-reshaping-3d-rendering","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/renderformer-how-neural-networks-are-reshaping-3d-rendering\/","title":{"rendered":"RenderFormer: How neural networks are reshaping 3D rendering"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1.jpg\" alt=\"Three white icons on a gradient background transitioning from blue to green. From left to right: network node icon, lightbulb-shaped icon with a path tool icon in the center; a monitor icon showing a web browser icon\" class=\"wp-image-1149127\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1.jpg 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<p>3D rendering\u2014the process of converting three-dimensional models into two-dimensional images\u2014is a foundational technology in computer graphics, widely used across gaming, film, virtual reality, and architectural visualization. Traditionally, this process has depended on physics-based techniques like ray tracing and rasterization, which simulate light behavior through mathematical formulas and expert-designed models.<\/p>\n\n\n\n<p>Now, thanks to advances in AI, especially neural networks, researchers are beginning to replace these conventional approaches with machine learning (ML). This shift is giving rise to a new field known as neural rendering.<\/p>\n\n\n\n<p>Neural rendering combines deep learning with traditional graphics techniques, allowing models to simulate complex light transport without explicitly modeling physical optics. This approach offers significant advantages: it eliminates the need for handcrafted rules, supports end-to-end training, and can be optimized for specific tasks. 
Yet, most current neural rendering methods rely on 2D image inputs, lack support for raw 3D geometry and material data, and often require retraining for each new scene\u2014limiting their generalizability.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"renderformer-toward-a-general-purpose-neural-rendering-model\">RenderFormer: Toward a general-purpose neural rendering model<\/h2>\n\n\n\n<p>To overcome these limitations, researchers at Microsoft Research have developed RenderFormer, a new neural architecture designed to support full-featured 3D rendering using only ML\u2014no traditional graphics computation required. RenderFormer is the first model to demonstrate that a neural network can learn a complete graphics rendering pipeline, including support for arbitrary 3D scenes and global illumination, without relying on ray tracing or rasterization. <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/renderformer-transformer-based-neural-rendering-of-triangle-meshes-with-global-illumination\/\">This work<\/a> has been accepted at SIGGRAPH 2025 and is <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/microsoft.github.io\/renderformer\" target=\"_blank\" rel=\"noopener noreferrer\">open-sourced on GitHub<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"architecture-overview\">Architecture overview<\/h2>\n\n\n\n<p>As shown in Figure 1, RenderFormer represents the entire 3D scene using triangle tokens\u2014each one encoding spatial position, surface normal, and physical material properties such as diffuse color, specular color, and roughness. Lighting is also modeled as triangle tokens, with emission values indicating intensity.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2419\" height=\"1008\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig1.png\" alt=\"Figure 1: The figure illustrates the architecture of RenderFormer. It includes a Triangle Mesh Scene with a 3D rabbit model inside a colored cube, a Camera Ray Map grid, a View Independent Transformer (12 layers of Self-Attention and Feed Forward Network), a View Dependent Transformer (6 layers with Cross-Attention and Self-Attention), and a DPT Decoder. Scene attributes\u2014Vertex Normal, Reflectance (Diffuse, Specular, Roughness), Emission, and Position\u2014are embedded into Triangle Tokens via Linear + Norm operations. 
These tokens and Ray Bundle Tokens (from the Camera Ray Map) are processed by the respective transformers and decoded to produce a rendered image of a glossy rabbit in a colored room.\" class=\"wp-image-1149133\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig1.png 2419w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig1-300x125.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig1-1024x427.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig1-768x320.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig1-1536x640.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig1-2048x853.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig1-240x100.png 240w\" sizes=\"auto, (max-width: 2419px) 100vw, 2419px\" \/><figcaption class=\"wp-element-caption\">Figure 1. Architecture of RenderFormer<\/figcaption><\/figure>\n\n\n\n<p>To describe the viewing direction, the model uses ray bundle tokens derived from a ray map\u2014each pixel in the output image corresponds to one of these rays. To improve computational efficiency, pixels are grouped into rectangular blocks, with all rays in a block processed together.<\/p>\n\n\n\n<p>The model outputs a set of tokens that are decoded into image pixels, completing the rendering process entirely within the neural network.<\/p>
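\n\n\n\n<p>To make this token layout concrete, the sketch below shows one way the two embeddings could look in PyTorch. It is a minimal illustration only: the feature ordering, model width, and pixel-block size are assumptions made for this post, not the released implementation (see the GitHub repository for the actual code).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import torch\nimport torch.nn as nn\n\nD_MODEL = 768   # illustrative width, not the released model\u2019s size\nBLOCK = 8       # assumed pixel-block edge for ray bundles\n\nclass TriangleTokenizer(nn.Module):\n    # One token per triangle: vertex positions, vertex normals, and\n    # reflectance (diffuse, specular, roughness) plus emission,\n    # embedded via Linear + Norm as in Figure 1.\n    def __init__(self, d_model=D_MODEL):\n        super().__init__()\n        # 9 (positions) + 9 (normals) + 3 + 3 + 1 + 3 = 28 features\n        self.embed = nn.Sequential(nn.Linear(28, d_model),\n                                   nn.LayerNorm(d_model))\n\n    def forward(self, verts, normals, diffuse, specular, roughness, emission):\n        # verts, normals: (T, 3, 3); diffuse, specular, emission: (T, 3);\n        # roughness: (T, 1)\n        feats = torch.cat([verts.flatten(1), normals.flatten(1),\n                           diffuse, specular, roughness, emission], dim=-1)\n        return self.embed(feats)   # (T, d_model) triangle tokens\n\nclass RayBundleTokenizer(nn.Module):\n    # Rays come from a per-pixel ray map; each rectangular block of\n    # pixels is folded into a single ray bundle token for efficiency.\n    def __init__(self, d_model=D_MODEL, block=BLOCK):\n        super().__init__()\n        self.block = block\n        self.embed = nn.Sequential(nn.Linear(block * block * 6, d_model),\n                                   nn.LayerNorm(d_model))\n\n    def forward(self, ray_map):   # (H, W, 6): ray origin + direction\n        # Assumes H and W are divisible by the block size.\n        H, W, C = ray_map.shape\n        b = self.block\n        blocks = (ray_map.reshape(H \/\/ b, b, W \/\/ b, b, C)\n                         .permute(0, 2, 1, 3, 4)\n                         .reshape(-1, b * b * C))\n        return self.embed(blocks)   # one token per b-by-b pixel block<\/code><\/pre>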
\n\n\n\n<h2 class=\"wp-block-heading\" id=\"dual-branch-design-for-view-independent-and-view-dependent-effects\">Dual-branch design for view-independent and view-dependent effects<\/h2>\n\n\n\n<p>The RenderFormer architecture is built around two transformers: one for view-independent features and another for view-dependent ones.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The <strong>view-independent transformer<\/strong> captures scene information unrelated to viewpoint, such as shadowing and diffuse light transport, using self-attention between triangle tokens.<\/li>\n\n\n\n<li>The <strong>view-dependent transformer<\/strong> models effects like visibility, reflections, and specular highlights through cross-attention between triangle and ray bundle tokens.<\/li>\n<\/ul>\n\n\n\n<p>Additional image-space effects, such as anti-aliasing and screen-space reflections, are handled via self-attention among ray bundle tokens.<\/p>
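\n\n\n\n<p>Under the same illustrative assumptions as the tokenizer sketch, the two branches can be pictured as standard transformer blocks: self-attention over triangle tokens in the view-independent stage, then cross-attention from ray bundle tokens to triangle tokens, plus self-attention among ray bundle tokens, in the view-dependent stage. The layer counts follow Figure 1; the head counts, widths, and the omitted feed-forward sublayer in the second block are simplifications, not the released architecture.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import torch\nimport torch.nn as nn\n\nclass ViewIndependentBlock(nn.Module):\n    # Self-attention between triangle tokens: shadowing and\n    # diffuse light transport, independent of the viewpoint.\n    def __init__(self, d_model=768, n_heads=12):\n        super().__init__()\n        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)\n        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),\n                                 nn.Linear(4 * d_model, d_model))\n        self.norm1 = nn.LayerNorm(d_model)\n        self.norm2 = nn.LayerNorm(d_model)\n\n    def forward(self, tri):   # tri: (B, T, d_model) triangle tokens\n        h = self.norm1(tri)\n        tri = tri + self.attn(h, h, h, need_weights=False)[0]\n        return tri + self.ffn(self.norm2(tri))\n\nclass ViewDependentBlock(nn.Module):\n    # Cross-attention from ray bundle tokens to triangle tokens\n    # (visibility, reflections, specular highlights), then\n    # self-attention among ray bundle tokens (image-space effects\n    # such as anti-aliasing). Feed-forward sublayer omitted for brevity.\n    def __init__(self, d_model=768, n_heads=12):\n        super().__init__()\n        self.cross = nn.MultiheadAttention(d_model, n_heads, batch_first=True)\n        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)\n        self.norm1 = nn.LayerNorm(d_model)\n        self.norm2 = nn.LayerNorm(d_model)\n\n    def forward(self, rays, tri):   # rays: (B, R, d_model) ray bundle tokens\n        rays = rays + self.cross(self.norm1(rays), tri, tri,\n                                 need_weights=False)[0]\n        h = self.norm2(rays)\n        return rays + self.self_attn(h, h, h, need_weights=False)[0]\n\n# 12 view-independent and 6 view-dependent layers, as in Figure 1.\nvi_stack = nn.ModuleList(ViewIndependentBlock() for _ in range(12))\nvd_stack = nn.ModuleList(ViewDependentBlock() for _ in range(6))<\/code><\/pre>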
\n\n\n\n<p>To validate the architecture, the team conducted ablation studies and visual analyses, confirming the importance of each component in the rendering pipeline.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"963\" height=\"509\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_table-1.png\" alt=\"Table 1: A table comparing the performance of different network variants in an ablation study. The columns are labeled Variant, PSNR (\u2191), SSIM (\u2191), LPIPS (\u2193), and FLIP (\u2193). Variants include configurations such as \"full view-dependent stage,\" \"w\/o DPT,\" \"w\/o self-attention,\" and \"w\/o DPT & w\/o self-attention.\" Each variant is associated with numerical values for the four metrics, showing how removing or altering components affects performance. The full view-dependent stage achieves the highest PSNR and SSIM and lowest LPIPS and FLIP, indicating optimal performance. Additional rows explore configurations involving camera space and world space view-dependent stages with various token and layer setups.\" class=\"wp-image-1149129\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_table-1.png 963w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_table-1-300x159.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_table-1-768x406.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_table-1-240x127.png 240w\" sizes=\"auto, (max-width: 963px) 100vw, 963px\" \/><figcaption class=\"wp-element-caption\">Table 1. Ablation study analyzing the impact of different components and attention mechanisms on the final performance of the trained network.<\/figcaption><\/figure>\n\n\n\n<p>To test the capabilities of the view-independent transformer, researchers trained a decoder to produce diffuse-only renderings. The results, shown in Figure 2, demonstrate that the model can accurately simulate shadows and other indirect lighting effects.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"943\" height=\"240\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig2.png\" alt=\"Figure 2: The figure displays four 3D-rendered objects showcasing view-independent rendering effects. From left to right: a purple teapot on a green surface, a blue rectangular object on a red surface, an upside-down table casting shadows on a green surface, and a green apple-like object on a blue surface. Each object features diffuse lighting and coarse shadow effects, with distinct highlights and shadows produced by directional light sources.\" class=\"wp-image-1149132\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig2.png 943w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig2-300x76.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig2-768x195.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig2-240x61.png 240w\" sizes=\"auto, (max-width: 943px) 100vw, 943px\" \/><figcaption class=\"wp-element-caption\">Figure 2. View-independent rendering effects decoded directly from the view-independent transformer, including diffuse lighting and coarse shadow effects.<\/figcaption><\/figure>\n\n\n\n<p>The view-dependent transformer was evaluated through attention visualizations. For example, in Figure 3, the attention map reveals a pixel on a teapot attending to its surface triangle and to a nearby wall\u2014capturing the effect of specular reflection. These visualizations also show how material changes influence the sharpness and intensity of reflections.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"947\" height=\"620\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig3.png\" alt=\"Figure 3: The figure contains six panels arranged in two rows and three columns. The top row displays a teapot in a room with red and green walls under three different roughness values: 0.3, 0.7, and 0.99 (left to right). The bottom row shows the corresponding attention outputs for each roughness setting, featuring the teapot silhouette against a dark background with distinct light patterns that vary with roughness.\" class=\"wp-image-1149131\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig3.png 947w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig3-300x196.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig3-768x503.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig3-240x157.png 240w\" sizes=\"auto, (max-width: 947px) 100vw, 947px\" \/><figcaption class=\"wp-element-caption\">Figure 3. Visualization of attention outputs<\/figcaption><\/figure>
\n\n\n\n<h2 class=\"wp-block-heading\" id=\"training-methodology-and-dataset-design\">Training methodology and dataset design<\/h2>\n\n\n\n<p>RenderFormer was trained using the Objaverse dataset, a collection of more than 800,000 annotated 3D objects that is designed to advance research in 3D modeling, computer vision, and related fields. The researchers designed four scene templates, populating each with 1\u20133 randomly selected objects and materials. Scenes were rendered in high dynamic range (HDR) using Blender\u2019s Cycles renderer, under varied lighting conditions and camera angles.<\/p>\n\n\n\n<p>The base model, consisting of 205 million parameters, was trained in two phases using the AdamW optimizer, as sketched below:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>500,000 steps at 256\u00d7256 resolution with up to 1,536 triangles<\/li>\n\n\n\n<li>100,000 steps at 512\u00d7512 resolution with up to 4,096 triangles<\/li>\n<\/ul>
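\n\n\n\n<p>The following skeleton illustrates that two-phase schedule. Only the optimizer choice, step counts, resolutions, and triangle caps come from this post; the learning rate, the L1 loss against HDR ground-truth renders, and the data-loading helper are assumptions for illustration.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import torch\nimport torch.nn.functional as F\n\n# (steps, resolution, max triangles) per phase, from the post.\nPHASES = [(500_000, 256, 1_536),\n          (100_000, 512, 4_096)]\n\ndef train(model, make_loader, lr=1e-4):\n    # AdamW is from the post; lr and the loss are assumptions.\n    opt = torch.optim.AdamW(model.parameters(), lr=lr)\n    for steps, resolution, max_tris in PHASES:\n        loader = make_loader(resolution, max_tris)   # hypothetical helper\n        for _, (scene, target) in zip(range(steps), loader):\n            pred = model(scene)                      # predicted HDR image\n            loss = F.l1_loss(pred, target)\n            opt.zero_grad()\n            loss.backward()\n            opt.step()<\/code><\/pre>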
\n\n\n\n<p>The model supports arbitrary triangle-based input and generalizes well to complex real-world scenes. As shown in Figure 4, it accurately reproduces shadows, diffuse shading, and specular highlights.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"805\" height=\"805\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig4.jpg\" alt=\"Figure 4: The figure presents a 3\u00d73 grid of diverse 3D scenes rendered by RenderFormer. In the top row, the first scene shows a room with red, green, and white walls containing two rectangular prisms; the second features a metallic tree-like structure in a blue-walled room with a reflective floor; and the third depicts a red animal figure, a black abstract shape, and a multi-faceted sphere in a purple container on a yellow surface. The middle row includes three constant width bodies (black, red, and blue) floating above a colorful checkered floor; a green shader ball with a square cavity inside a gray-walled room; and crystal-like structures in green, purple, and red on a reflective surface. The bottom row showcases a low-poly fox near a pink tree emitting particles on grassy terrain; a golden horse statue beside a heart-shaped object split into red and grey halves on a reflective surface; and a wicker basket, a banana and a bottle placed on a white platform.\" class=\"wp-image-1149130\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig4.jpg 805w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig4-300x300.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig4-150x150.jpg 150w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig4-768x768.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig4-180x180.jpg 180w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_fig4-360x360.jpg 360w\" sizes=\"auto, (max-width: 805px) 100vw, 805px\" \/><figcaption class=\"wp-element-caption\">Figure 4. Rendered results of different 3D scenes generated by RenderFormer<\/figcaption><\/figure>\n\n\n\n<p>RenderFormer can also generate continuous video by rendering individual frames, thanks to its ability to model viewpoint changes and scene dynamics.<\/p>\n\n\n\n<figure class=\"wp-block-video aligncenter\"><video height=\"2160\" style=\"aspect-ratio: 3840 \/ 2160;\" width=\"3840\" controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer_animate.mp4\"><\/video><figcaption class=\"wp-element-caption\">3D animation sequence rendered by RenderFormer<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"looking-ahead-opportunities-and-challenges\">Looking ahead: Opportunities and challenges<\/h2>\n\n\n\n<p>RenderFormer represents a significant step forward for neural rendering. It demonstrates that deep learning can replicate and potentially replace the traditional rendering pipeline, supporting arbitrary 3D inputs and realistic global illumination\u2014all without any hand-coded graphics computations.<\/p>\n\n\n\n<p>However, key challenges remain. Scaling to larger and more complex scenes with intricate geometry, advanced materials, and diverse lighting conditions will require further research. Still, the transformer-based architecture provides a solid foundation for future integration with broader AI systems, including video generation, image synthesis, robotics, and embodied AI.<\/p>\n\n\n\n<p>Researchers hope that RenderFormer will serve as a building block for future breakthroughs in both graphics and AI, opening new possibilities for visual computing and intelligent environments.<\/p>\n","protected":false},"excerpt":{"rendered":"<p> RenderFormer, from Microsoft Research, is the first model to show that a neural network can learn a complete graphics rendering pipeline. It\u2019s designed to support full-featured 3D rendering using only machine learning\u2014no traditional graphics computation required. 
<\/p>\n","protected":false},"author":43868,"featured_media":1149127,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Yue Dong","user_id":"35060"}],"msr_hide_image_in_river":null,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[269148,243984,269142],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1149051","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-post-option-approved-for-river","msr-post-option-blog-homepage-featured","msr-post-option-include-in-river"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199560],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[144710],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Yue Dong","user_id":35060,"display_name":"Yue Dong","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yuedong\/\" aria-label=\"Visit the profile page for Yue Dong\">Yue Dong<\/a>","is_active":false,"last_first":"Dong, Yue","people_section":0,"alias":"yuedong"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-960x540.jpg\" class=\"img-object-cover\" alt=\"Three white icons on a gradient background transitioning from blue to green. 
From left to right: network node icon, lightbulb-shaped icon with a path tool icon in the center; a monitor icon showing a web browser icon\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/09\/RenderFormer-BlogHeroFeature-1400x788-1.jpg 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yuedong\/\" title=\"Go to researcher profile for Yue Dong\" aria-label=\"Go to researcher profile for Yue Dong\" data-bi-type=\"byline author\" data-bi-cN=\"Yue Dong\">Yue Dong<\/a>","formattedDate":"September 10, 2025","formattedExcerpt":"RenderFormer, from Microsoft Research, is the first model to show that a neural network can learn a complete graphics rendering pipeline. 
It\u2019s designed to support full-featured 3D rendering using only machine learning\u2014no traditional graphics computation required.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1149051","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/43868"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1149051"}],"version-history":[{"count":6,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1149051\/revisions"}],"predecessor-version":[{"id":1149336,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1149051\/revisions\/1149336"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1149127"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1149051"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1149051"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1149051"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1149051"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1149051"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1149051"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1149051"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1149051"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1149051"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1149051"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1149051"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}