{"id":739456,"date":"2021-06-07T11:38:11","date_gmt":"2021-06-07T18:38:11","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=739456"},"modified":"2021-06-07T15:16:59","modified_gmt":"2021-06-07T22:16:59","slug":"p3-distributed-deep-graph-learning-at-scale","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/p3-distributed-deep-graph-learning-at-scale\/","title":{"rendered":"P3: Distributed Deep Graph Learning at Scale"},"content":{"rendered":"<p>Graph Neural Networks (GNNs) have gained significant attention in the recent past and have become one of the fastest-growing subareas in deep learning. While several new GNN architectures have been proposed, the scale of real-world graphs, in many cases billions of nodes and edges, poses challenges during model training. In this paper, we present P3, a system that focuses on scaling GNN model training to large real-world graphs in a distributed setting. We observe that scalability challenges in training GNNs are fundamentally different from those in training classical deep neural networks and in distributed graph processing, and that commonly used techniques, such as intelligent partitioning of the graph, do not yield the desired results. Based on this observation, P3 proposes a new approach for distributed GNN training. 
Our approach effectively eliminates high communication and partitioning overheads, and couples this with a new pipelined push-pull parallelism-based execution strategy for fast model training. P3 exposes a simple API that captures many different classes of GNN architectures for generality. When further combined with a simple caching strategy, our evaluation shows that P3 is able to outperform existing state-of-the-art distributed GNN frameworks by up to 7\u00d7.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Graph Neural Networks (GNNs) have gained significant attention in the recent past and have become one of the fastest-growing subareas in deep learning. While several new GNN architectures have been proposed, the scale of real-world graphs, in many cases billions of nodes and edges, poses challenges during model training. 
In this paper, we present P3, a system that [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"USENIX","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"Symposium on Operating Systems Design and Implementation (OSDI)","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2021-7-14","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"https:\/\/www.usenix.org\/conference\/osdi21","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13547],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-739456","msr-research-item","type-msr-research-i
tem","status-publish","hentry","msr-research-area-systems-and-networking","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2021-7-14","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"USENIX","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/www.usenix.org\/conference\/osdi21\/presentation\/gandhi","label_id":"243109","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"guest","value":"swapnil-gandhi","user_id":684123,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=swapnil-gandhi"},{"type":"user_nicename","value":"Anand Iyer","user_id":38907,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Anand 
Iyer"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/739456","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/739456\/revisions"}],"predecessor-version":[{"id":751798,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/739456\/revisions\/751798"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=739456"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=739456"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=739456"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=739456"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=739456"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=739456"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=739456"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/
wp\/v2\/msr-post-option?post=739456"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=739456"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=739456"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=739456"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=739456"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=739456"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}