{"id":1082997,"date":"2024-09-05T13:21:41","date_gmt":"2024-09-05T20:21:41","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=1082997"},"modified":"2024-10-31T09:24:02","modified_gmt":"2024-10-31T16:24:02","slug":"silvanforge-a-schedule-guided-retargetable-compiler-for-decision-tree-inference","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/silvanforge-a-schedule-guided-retargetable-compiler-for-decision-tree-inference\/","title":{"rendered":"SilvanForge: A Schedule-Guided Retargetable Compiler for Decision Tree Inference"},"content":{"rendered":"<p>The proliferation of machine learning together with the rapid evolution of the hardware ecosystem has led to a surge in the demand for model inference on a variety of hardware. Decision tree based models are the most popular models on tabular data. This paper is motivated by the problems encountered when targeting inference of these models to run at peak performance on CPU and GPU targets. Existing solutions are neither portable nor achieve the best possible performance for the specific hardware they target.<\/p>\n<p>This paper describes SilvanForge, a <em>schedule-guided<\/em>,\u00a0<em>retargetable<\/em>\u00a0compiler for decision tree based models that searches over several optimization choices and automatically generates high-performance inference routines for CPUs and GPUs. SilvanForge has two core components. The first is a scheduling language that encapsulates the optimization space, and techniques to efficiently explore this space. The second is an optimizing retargetable compiler that can generate code for any specified schedule. 
SilvanForge&#8217;s ability to use different data layouts, loop structures and caching strategies enables it to achieve portable performance across a range of targets.<\/p>\n<p>SilvanForge generated code is an order of magnitude faster than XGBoost and about 2-5x faster on average than RAPIDS FIL and Tahoe over several batch sizes. While these systems only target NVIDIA GPUs, SilvanForge achieves competent performance on AMD GPUs as well.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The proliferation of machine learning together with the rapid evolution of the hardware ecosystem has led to a surge in the demand for model inference on a variety of hardware. Decision tree based models are the most popular models on tabular data. This paper is motivated by the problems encountered when targeting inference of these [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2024-11-15","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_
url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":false,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":null,"footnotes":""},"msr-research-highlight":[],"research-area":[13563,13560,13547],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[269142],"msr-field-of-study":[],"msr-conference":[266136],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1082997","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-data-platform-analytics","msr-research-area-programming-languages-software-engineering","msr-research-area-systems-and-networking","msr-locale-en_us","msr-post-option-include-in-river"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2024-11-15","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":0,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/sosp24-final168.pdf","id":"1098567","title":"sosp24-final168","label_id":"243109","label":0}],"msr_related_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/sigops.org\/s\/conferences\/sosp\/2024\/accepte
d.html","label_id":"243118","label":0}],"msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":1098567,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/10\/sosp24-final168.pdf"}],"msr-author-ordering":[{"type":"text","value":"Ashwin Prasad","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Sampath Rajendra","user_id":43107,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Sampath Rajendra"},{"type":"user_nicename","value":"Kaushik Rajan","user_id":32574,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Kaushik Rajan"},{"type":"text","value":"R Govindarajan","user_id":0,"rest_url":false},{"type":"text","value":"Uday Bondhugula","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[199562,199565],"msr_event":[1073397],"msr_group":[144939,957177],"msr_project":[967329],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":967329,"post_title":"Domain Specialization","post_name":"domain-specialization","post_type":"msr-project","post_date":"2023-10-16 02:14:29","post_modified":"2024-01-12 08:47:20","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/domain-specialization\/","post_excerpt":"Scaling performance beyond Moore's law Domain&nbsp;specialization&nbsp;is&nbsp;expected&nbsp;to&nbsp;play&nbsp;a&nbsp;big&nbsp;role&nbsp;in&nbsp;how&nbsp;computer&nbsp;systems&nbsp;evolve&nbsp;in&nbsp;future. With&nbsp;the&nbsp;end&nbsp;of&nbsp;Moore's&nbsp;law,&nbsp;we&nbsp;are&nbsp;already&nbsp;seeing&nbsp;CPU,&nbsp;GPU&nbsp;and domain specific&nbsp;hardware&nbsp;evolving&nbsp;rapidly. 
The next decade&nbsp;is&nbsp;therefore&nbsp;expected&nbsp;to&nbsp;see&nbsp;big&nbsp;changes&nbsp;in&nbsp;how&nbsp;we&nbsp;develop,&nbsp;compile&nbsp;and&nbsp;run&nbsp;software. This project focuses on data systems, a class of systems where, as the data sizes grow, performance scaling is going to be of importance. First,&nbsp;we&nbsp;believe&nbsp;that domain-specific&nbsp;compilers&nbsp;will&nbsp;play&nbsp;a&nbsp;crucial&nbsp;strategic&nbsp;role&nbsp;in&nbsp;helping&nbsp;software&nbsp;leverage&nbsp;the&nbsp;changing&nbsp;hardware&nbsp;landscape. Such compilers will be multi-layered and will progressively lower computation through multiple intermediate abstractions, performing domain specific optimizations at the higher layers and specializing code to the hardware in lower layers. We&nbsp;have&nbsp;been&nbsp;working&nbsp;on&nbsp;two&nbsp;such&nbsp;domain&nbsp;specific&nbsp;compilers&nbsp;in&nbsp;the&nbsp;data&nbsp;domain. Second, new hardware specific algorithms need&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/967329"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1082997","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1082997\/revisions"}],"predecessor-version":[{"id":1083006,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1082997\/revisions\/1083006"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1082997"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/rese
arch\/wp-json\/wp\/v2\/msr-research-highlight?post=1082997"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1082997"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=1082997"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=1082997"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=1082997"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1082997"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1082997"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=1082997"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=1082997"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=1082997"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1082997"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1082997"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}