{"id":501995,"date":"2018-08-21T09:57:40","date_gmt":"2018-08-21T16:57:40","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=501995"},"modified":"2019-07-07T23:19:02","modified_gmt":"2019-07-08T06:19:02","slug":"dowhy-a-library-for-causal-inference","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/dowhy-a-library-for-causal-inference\/","title":{"rendered":"DoWhy \u2013 A library for causal inference"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-502040\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2-1024x576.png\" alt=\"\" width=\"1024\" height=\"576\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<p>For decades, causal inference methods have found wide applicability in the social and 
biomedical sciences. As computing systems start intervening in our work and daily lives, questions of cause and effect are gaining importance in computer science as well. To enable widespread use of causal inference, we are pleased to announce a new software library, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/Microsoft\/dowhy\">DoWhy<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. Its name is inspired by Judea Pearl\u2019s do-calculus for causal inference. In addition to providing a programmatic interface for popular causal inference methods, DoWhy is designed to highlight the critical but often neglected assumptions underlying causal inference analyses. DoWhy does this in two ways: first, by making the underlying assumptions explicit, for example by explicitly representing identified estimands; and second, by making sensitivity analysis and other robustness checks a first-class element of the causal inference process. Our goal is to enable people to focus their efforts on identifying assumptions for causal inference, rather than on details of estimation.<\/p>\n<p>Our motivation for creating DoWhy comes from our experiences in causal inference studies over the past few years, ranging from <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1611.09414\">estimating the impact of a recommender system<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> to <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/distilling-outcomes-personal-experiences-propensity-scored-analysis-social-media\/\">predicting likely outcomes given a life event<\/a>. 
In each of these studies, we found ourselves repeating the common steps of finding the right identification strategy, devising the most suitable estimator, and conducting robustness checks, all from scratch. While we were impressed\u2014sometimes intimidated\u2014by the amount of knowledge in causal inference literature, we found that doing any empirical causal inference remained a challenging task. Ensuring we understood our assumptions and validated them appropriately was particularly daunting. More generally, we see that a \u201croll your own\u201d approach to causal inference has resulted in studies with varying (sometimes minimal) approaches to testing of key assumptions.<\/p>\n<p>We therefore asked ourselves, what if there existed a software library that provided a simple interface to common causal inference methods and codified best practices for reasoning about and validating key assumptions? Unfortunately, the challenge is that causal inference depends on estimation of unobserved quantities\u2014also known as the \u201cfundamental problem\u201d of causal inference. Unlike in supervised learning, such <em>counterfactual<\/em> quantities imply that we cannot have a purely objective evaluation through a held-out test set, thus precluding a plug-in approach to causal inference. For instance, for any intervention\u2014such as a new algorithm or a medical procedure\u2014one can observe what happens when people are given the intervention or when they are not, but never both. Therefore, causal analysis hinges critically on assumptions about the data-generating process.<\/p>\n<p>To succeed, it became clear to us that the assumptions need to be first-class citizens in a causal inference library. We designed DoWhy using two guiding principles\u2014making causal assumptions explicit and testing robustness of the estimates to violations of those assumptions. First, DoWhy makes a distinction between identification and estimation. 
Identification of a causal effect involves making assumptions about the data-generating process and going from the counterfactual expressions to specifying a target estimand, while estimation is a purely statistical problem of estimating the target estimand from data. Thus, identification is where the library spends most of its time, just like we commonly do in our projects. To represent assumptions formally, DoWhy uses the Bayesian graphical model framework where users can specify what they know, and more importantly, what they don\u2019t know, about the data-generating process. For estimation, we provide methods based on the potential-outcomes framework such as matching, stratification and instrumental variables. A happy side-effect of using DoWhy is that you will realize the equivalence and interoperability of the seemingly disjoint graphical model and potential outcome frameworks.<\/p>\n<div id=\"attachment_502001\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-502001\" class=\"wp-image-502001 size-large\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/Figure1_DoWhy.-Separating-indentification-and-estimation-of-casual-effect-1024x417.png\" alt=\"\" width=\"1024\" height=\"417\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/Figure1_DoWhy.-Separating-indentification-and-estimation-of-casual-effect-1024x417.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/Figure1_DoWhy.-Separating-indentification-and-estimation-of-casual-effect-300x122.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/Figure1_DoWhy.-Separating-indentification-and-estimation-of-casual-effect-768x313.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/Figure1_DoWhy.-Separating-indentification-and-estimation-of-casual-effect.png 
1073w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><p id=\"caption-attachment-502001\" class=\"wp-caption-text\">Figure 1 \u2013 DoWhy. Separating identification and estimation of causal effect.<\/p><\/div>\n<p>Second, once assumptions are made, DoWhy provides robustness tests and sensitivity checks to test the reliability of the obtained estimate. You can test how the estimate changes as underlying assumptions are varied, for example, by introducing a new confounder or by replacing the intervention with a placebo. Wherever possible, the library also automatically checks the validity of the obtained estimate based on assumptions in the graphical model. Still, we also understand that automated testing cannot be perfect. DoWhy therefore stresses interpretability of its output; at any point in the analysis, you can inspect the untested assumptions, identified estimands (if any) and the estimate (if any).<\/p>\n<div id=\"attachment_502004\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-502004\" class=\"wp-image-502004 size-large\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/Figure2_causal-inference-in-four-lines.-a-sample-run-of-dowhy.-1024x481.png\" alt=\"\" width=\"1024\" height=\"481\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/Figure2_causal-inference-in-four-lines.-a-sample-run-of-dowhy.-1024x481.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/Figure2_causal-inference-in-four-lines.-a-sample-run-of-dowhy.-300x141.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/Figure2_causal-inference-in-four-lines.-a-sample-run-of-dowhy.-768x360.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/Figure2_causal-inference-in-four-lines.-a-sample-run-of-dowhy..png 1057w\" sizes=\"auto, (max-width: 
1024px) 100vw, 1024px\" \/><p id=\"caption-attachment-502004\" class=\"wp-caption-text\">Figure 2 \u2013 Causal inference in four lines. A sample run of DoWhy.<\/p><\/div>\n<p>In the future, we look forward to adding more features to the library, including support for more estimation and sensitivity methods and interoperability with available estimation software. We welcome your feedback and contributions as we develop the library. You can check out the DoWhy Python library on <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/Microsoft\/dowhy\">GitHub<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. We include a couple of examples to get you started through Jupyter notebooks <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/causalinference.gitlab.io\/dowhy\/\">here<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. If you are interested in learning more about causal inference, check out our tutorial on <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/causalinference.gitlab.io\/kdd-tutorial\/\">causal inference and counterfactual reasoning<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, presented at <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/www.kdd.org\/kdd2018\/\">KDD 2018<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> on Sunday, August 19th.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For decades, causal inference methods have found wide applicability in the social and biomedical sciences. 
As computing systems start intervening in our work and daily lives, questions of cause-and-effect are gaining importance in computer science as well. To enable widespread use of causal inference, we are pleased to announce a new software library, DoWhy. Its [&hellip;]<\/p>\n","protected":false},"author":37074,"featured_media":502040,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Amit Sharma","user_id":"30997"},{"type":"user_nicename","value":"Emre Kiciman","user_id":"31739"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[194453,194474,194464],"tags":[],"research-area":[13563,13559,13547,13568],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-501995","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","category-data-visulalization","category-technology-for-emerging-markets","msr-research-area-data-platform-analytics","msr-research-area-social-sciences","msr-research-area-systems-and-networking","msr-research-area-technology-for-emerging-markets","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199562],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[144903,470706,685431],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Amit Sharma","user_id":30997,"display_name":"Amit Sharma","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/amshar\/\" aria-label=\"Visit the profile page for 
Amit Sharma\">Amit Sharma<\/a>","is_active":false,"last_first":"Sharma, Amit","people_section":0,"alias":"amshar"},{"type":"user_nicename","value":"Emre Kiciman","user_id":31739,"display_name":"Emre Kiciman","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/emrek\/\" aria-label=\"Visit the profile page for Emre Kiciman\">Emre Kiciman<\/a>","is_active":false,"last_first":"Kiciman, Emre","people_section":0,"alias":"emrek"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2.png\" class=\"img-object-cover\" alt=\"\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/08\/SCS-MS-Research_20180816_1400x788_DoWhy_T2-5b7b4771617f2-343x193.png 343w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/amshar\/\" title=\"Go to researcher profile for 
Amit Sharma\" aria-label=\"Go to researcher profile for Amit Sharma\" data-bi-type=\"byline author\" data-bi-cN=\"Amit Sharma\">Amit Sharma<\/a> and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/emrek\/\" title=\"Go to researcher profile for Emre Kiciman\" aria-label=\"Go to researcher profile for Emre Kiciman\" data-bi-type=\"byline author\" data-bi-cN=\"Emre Kiciman\">Emre Kiciman<\/a>","formattedDate":"August 21, 2018","formattedExcerpt":"For decades, causal inference methods have found wide applicability in the social and biomedical sciences. As computing systems start intervening in our work and daily lives, questions of cause-and-effect are gaining importance in computer science as well. To enable widespread use of causal inference, we&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/501995","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/37074"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=501995"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/501995\/revisions"}],"predecessor-version":[{"id":502079,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/501995\/revisions\/502079"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/502040"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=501995"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http
s:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=501995"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=501995"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=501995"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=501995"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=501995"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=501995"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=501995"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=501995"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=501995"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=501995"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}