{"id":720649,"date":"2020-05-27T09:00:40","date_gmt":"2020-05-27T16:00:40","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&#038;p=720649"},"modified":"2021-01-25T13:19:01","modified_gmt":"2021-01-25T21:19:01","slug":"announcing-hummingbird-a-library-for-accelerating-inference-with-traditional-machine-learning-models","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/announcing-hummingbird-a-library-for-accelerating-inference-with-traditional-machine-learning-models\/","title":{"rendered":"Announcing: Hummingbird A library for accelerating inference with traditional machine learning models"},"content":{"rendered":"<p>Traditional machine learning (ML), such as linear regressions and decision trees, is extremely popular. As shown in the chart below of the Kaggle Survey from 2019, the most popular ML algorithms are still traditional (shown in green).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-685056 size-large\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_kaggle_2102x10-1024x495.png\" alt=\"Azure Data - Hummingbird - Kaggle Survey from 2019\" width=\"1024\" height=\"495\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_kaggle_2102x10-1024x495.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_kaggle_2102x10-300x145.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_kaggle_2102x10-768x371.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_kaggle_2102x10-1536x742.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_kaggle_2102x10-2048x990.png 2048w\" sizes=\"auto, 
(max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<p>Recently, ever-increasing interest in deep learning and neural networks has led to a proliferation of frameworks that are highly specialized and optimized for running these types of computations. Frameworks like TensorFlow, PyTorch, and ONNX Runtime are built around the idea of a computational graph that models the dataflow of individual operations and have tensors as their basic computational unit. These frameworks can run efficiently on hardware accelerators (e.g., GPUs), and their prediction performance can be further optimized with compiler frameworks such as TVM.<\/p>\n<p>Unfortunately, traditional ML libraries and toolkits (such as Scikit-Learn, ML.NET, and H2O) are usually developed to run on CPU environments. While they may exploit multi-core parallelism to improve performance, they do not use a common abstraction (such as tensors) to represent their computation. The lack of this common abstraction means that for these frameworks to make use of hardware acceleration, one would need many implementations, one for each combination of operator and hardware backend, which does not scale well. This means that traditional ML is often missing out on the potential accelerations that deep learning and neural networks enjoy.<\/p>\n<h2>Announcing: Hummingbird<\/h2>\n<p>We are announcing Hummingbird, a library for accelerating inference (scoring\/prediction) in traditional machine learning models. Internally, Hummingbird compiles traditional ML pipelines into tensor computations to take advantage of the optimizations that are being implemented for neural network systems. 
This allows users to seamlessly leverage hardware acceleration without having to re-engineer their models.<\/p>\n<p>This first open-source release of Hummingbird currently supports converting the following tree-based models to PyTorch:<\/p>\n<ul>\n<li>scikit-learn: DecisionTreeClassifier, RandomForestClassifier, RandomForestRegressor, GradientBoostingClassifier, and ExtraTreesClassifier<\/li>\n<li>XGBoost: XGBClassifier and XGBRegressor<\/li>\n<li>LightGBM: LGBMClassifier and LGBMRegressor<\/li>\n<\/ul>\n<p>You can see a complete list of our supported operators here. We are experimenting with many frameworks and backends, and we will continue to release additional operators and features in the upcoming weeks.<\/p>\n<h3>The code<\/h3>\n<p>Here\u2019s an example of a RandomForestClassifier in scikit-learn:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">from sklearn.ensemble import RandomForestClassifier\r\nfrom sklearn.datasets import load_breast_cancer\r\n\r\n# Create and train a RandomForestClassifier model\r\nX, y = load_breast_cancer(return_X_y=True)\r\nX = X.astype('float32')  # convert features to 32-bit floats\r\nskl_model = RandomForestClassifier(n_estimators=500, max_depth=7)\r\nskl_model.fit(X, y)\r\n# Execute prediction using scikit-learn model\r\npred = skl_model.predict(X)<\/pre>\n<p>To enable Hummingbird and execute the scikit-learn model on PyTorch, users only need to add:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">from hummingbird.ml import convert<\/pre>\n<p>And change the prediction code as follows:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\"># Use Hummingbird to convert the model to PyTorch\r\nmodel = convert(skl_model, 'pytorch')\r\n\r\n# Execute prediction on CPU using PyTorch\r\npred_cpu_hb = model.predict(X)<\/pre>\n<p>The translated model can then be seamlessly executed on GPU as well:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">model.to('cuda')\r\npred_gpu_hb = model.predict(X)<\/pre>\n<p>From here, you can 
experiment with different parameters, see speedups between CPU and GPU, and compare against your initial model. Also, check out some of our sample notebooks that provide additional examples and benchmarking functionality. You can see the documentation here.<\/p>\n<h3>The details<\/h3>\n<p>Hummingbird works by reconfiguring algorithmic operators such that we can perform more regular computations that are amenable to vectorized and GPU execution. Each operator is slightly different, and we incorporate multiple strategies. This example explains one of Hummingbird\u2019s strategies for translating a decision tree into tensors involving GEMM (GEneral Matrix Multiplication), where we implement the traversal of the tree using matrix multiplications. (GEMM is one of the three tree conversion strategies we currently support.)<\/p>\n<p>Below, we have a simple decision tree:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-685059 alignnone\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_leftgraph_458x-300x250.png\" alt=\"Azure Data - Hummingbird - simple decision tree\" width=\"300\" height=\"250\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_leftgraph_458x-300x250.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_leftgraph_458x.png 458w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p>In this example, the tree takes as input a feature vector with six elements (x \u2208 R<sup>6<\/sup>) and has four decision nodes (orange) and five leaf nodes (blue). 
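To illustrate the GEMM idea described above, here is a hedged sketch for a hypothetical tiny tree (three features, two decision nodes, three leaves, not the six-feature tree in the figure), with NumPy standing in for the tensor runtime; the matrix names A-E are our own labels, not Hummingbird's API:

```python
import numpy as np

# Hypothetical toy tree (for illustration only):
#   node0: x[0] < 0.5 ? go to node1 : leaf2
#   node1: x[1] < 0.3 ? leaf0 : leaf1
A = np.array([[1., 0.],        # (features x nodes): which feature each node reads
              [0., 1.],
              [0., 0.]])
B = np.array([0.5, 0.3])       # threshold for each decision node
C = np.array([[ 1.,  1., -1.], # (nodes x leaves): +1 leaf is in node's left
              [ 1., -1.,  0.]])#   subtree, -1 right subtree, 0 not on path
D = np.array([2., 1., 0.])     # number of "left turns" on the path to each leaf
E = np.array([[1., 0.],        # (leaves x classes): class stored at each leaf
              [0., 1.],
              [1., 0.]])

def gemm_tree_predict(X):
    # Step 1: evaluate ALL node conditions at once with one matrix multiply
    P = (X @ A < B).astype(X.dtype)
    # Step 2: evaluate ALL leaves at once; a leaf is selected exactly when
    # the observed left-turn pattern along its path matches D
    T = (P @ C == D).astype(X.dtype)
    return (T @ E).argmax(axis=1)

X = np.array([[0.2, 0.6, 0.0],
              [0.8, 0.1, 0.0],
              [0.2, 0.1, 0.0]], dtype=np.float32)
print(gemm_tree_predict(X))  # -> [1 0 0]
```

Every input row exercises all four multiplications regardless of which path it would take, which is exactly the redundant-but-regular computation the two bullets below describe.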
We translate the decision tree into a neural network that evaluates the tree in two steps.<\/p>\n<p>And now, the transformed tree:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-685062 alignnone\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_rightgraph_464-280x300.png\" alt=\"Azure Data - Hummingbird - transformed decision tree\" width=\"280\" height=\"300\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_rightgraph_464-280x300.png 280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_rightgraph_464.png 464w\" sizes=\"auto, (max-width: 280px) 100vw, 280px\" \/><\/p>\n<ul>\n<li>The first step takes all the features (x1 \u2013 x6) and evaluates all the conditions (nodes) of the tree together in a single matrix multiplication.<\/li>\n<li>In the second step, we put all the leaf nodes (\u21131-\u21135) together and evaluate them with another matrix multiplication.<\/li>\n<\/ul>\n<p>Although this leads to redundant computation from checking all conditions (not just the ones we know to be true), this is the key that makes vectorized computation possible. To offset this additional computation, we batch tensor operations and minimize the number of kernel invocations, on top of the tensor runtime\u2019s built-in optimizations.<\/p>\n<h3>Performance<\/h3>\n<p>We ran the RandomForestClassifier example above on an NVIDIA P100 GPU-enabled VM. 
You can see the notebook here for the full example, which includes imports and test data setup.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-685065\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummmingbird_notebook2-300x300.png\" alt=\"Azure Data - Hummingbird - notebook code\" width=\"600\" height=\"601\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummmingbird_notebook2-300x300.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummmingbird_notebook2-1022x1024.png 1022w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummmingbird_notebook2-150x150.png 150w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummmingbird_notebook2-768x769.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummmingbird_notebook2-180x180.png 180w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummmingbird_notebook2-360x360.png 360w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummmingbird_notebook2.png 1124w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n<p>For RandomForestClassifier with these parameters, Hummingbird provides a ~5x speedup on CPU and a ~50x speedup on GPU.<\/p>\n<p>The chart below shows some additional performance data for RandomForestClassifier, LGBMClassifier, and XGBClassifier. We tested Hummingbird on several of the datasets in NVIDIA\u2019s gbm-bench, with an average speed-up of 65x from scikit-learn to PyTorch. 
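For context on how averages like these are typically gathered, here is a minimal, hypothetical timing harness; the stand-in predict function and the helper names below are our own, not Hummingbird's API (in practice you would pass skl_model.predict and the converted model's predict from the example above):

```python
import time

def avg_predict_seconds(predict, X, runs=5):
    """Average wall-clock time of predict(X) over several runs."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        predict(X)
        total += time.perf_counter() - start
    return total / runs

def speedup(baseline_seconds, accelerated_seconds):
    """E.g. scikit-learn seconds divided by Hummingbird seconds."""
    return baseline_seconds / accelerated_seconds

# Hypothetical stand-in model and a 10K-row batch, mirroring the batch size used here.
batch = [[0.0] * 28 for _ in range(10_000)]
stand_in_predict = lambda rows: [0 for _ in rows]

baseline = avg_predict_seconds(stand_in_predict, batch)
```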
The chart reports the average of 5 runs for a batch size of 10K predictions, run on an NVIDIA P100 VM with 6 CPU cores.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-685068 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_batchexp_765x3.png\" alt=\"Azure Data - Hummingbird - batch experiment chart\" width=\"765\" height=\"325\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_batchexp_765x3.png 765w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/20200504_blog_hummingbird_batchexp_765x3-300x127.png 300w\" sizes=\"auto, (max-width: 765px) 100vw, 765px\" \/><\/p>\n<p>Our tech report provides additional details, including a full performance breakdown with per-operator results across varied batch sizes and on a variety of devices. Hummingbird is competitive with, and even outperforms (by up to 3x), hand-crafted kernels on micro-benchmarks, while enabling seamless end-to-end acceleration (with a speedup of up to 1200\u00d7) of ML pipelines.<\/p>\n<h3>Next steps<\/h3>\n<p>In the upcoming months, we look forward to adding many additional operators, input formats, and backend support, as we outline in our roadmap. We will soon release our linear and logistic regressors. We are investigating how to best integrate Hummingbird with existing platforms and are currently integrating Hummingbird with ONNX and its converters. We welcome contributions and collaborators.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We are announcing Hummingbird, a library for accelerating inference (scoring\/prediction) in traditional machine learning models. 
Internally, Hummingbird compiles traditional ML pipelines into tensor computations to take advantage of the optimizations that are being implemented for neural network systems.<\/p>\n","protected":false},"author":38004,"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-content-parent":684024,"msr_hide_image_in_river":0,"footnotes":""},"research-area":[],"msr-locale":[268875],"msr-post-option":[],"class_list":["post-720649","msr-blog-post","type-msr-blog-post","status-publish","hentry","msr-locale-en_us"],"msr_assoc_parent":{"id":684024,"type":"group"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/720649","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-blog-post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38004"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/720649\/revisions"}],"predecessor-version":[{"id":720661,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/720649\/revisions\/720661"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=720649"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=720649"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=720649"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-
option?post=720649"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}