{"id":25954,"date":"2019-01-09T09:00:03","date_gmt":"2019-01-09T17:00:03","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/?p=25954"},"modified":"2021-09-17T14:12:45","modified_gmt":"2021-09-17T21:12:45","slug":"how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/","title":{"rendered":"How to automate machine learning on SQL Server 2019 big data clusters"},"content":{"rendered":"<p>In this post, we will explore how to use automated machine learning (AutoML) to create new machine learning models over your data in SQL Server 2019 big data clusters.<\/p>\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/sql-server\/sql-server-2019\">SQL Server 2019<\/a> <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/big-data-cluster\">big data clusters<\/a> make it possible to use the software of your choice to fit machine learning models on big data and use those models to perform scoring. In fact, <a href=\"https:\/\/spark.apache.org\/\">Apache Spark<\/a><sup>TM<\/sup>, the popular open source big data framework, is now built in! Apache Spark<sup>TM<\/sup> includes the MLlib Machine Learning Library, and the open source community has developed a wealth of additional packages that integrate with and extend Apache Spark<sup>TM<\/sup> and MLlib.<\/p>\n<h2>Automated machine learning<\/h2>\n<p>Manually selecting and tuning machine learning models requires familiarity with a variety of model types and can be laborious and time-consuming. 
Software for automating this process has recently become available, relieving both novice and expert data scientists and ML engineers of much of the burden that comes with manual model selection and tuning.<\/p>\n<h3>H2O\u2019s open source AutoML APIs<\/h3>\n<p>H2O provides popular open source software for data science and machine learning on big data, including Apache Spark<sup>TM<\/sup> integration. It provides two open source Python AutoML classes: h2o.automl.H2OAutoML and pysparkling.ml.H2OAutoML. Both APIs use the same underlying algorithm implementations; however, the latter follows the conventions of Apache Spark\u2019s <a href=\"http:\/\/spark.apache.org\/docs\/latest\/ml-guide.html\">MLlib library<\/a> and allows you to build machine learning pipelines that include MLlib transformers. We will focus on the latter API in this post.<\/p>\n<p>H2OAutoML supports classification and regression. The ML models built and tuned by H2OAutoML include Random Forests, Gradient Boosting Machines, Deep Neural Nets, Generalized Linear Models, and Stacked Ensembles.<\/p>\n<p>H2OAutoML can automatically split training data into training, validation, and leaderboard frames. The h2o.automl.H2OAutoML API also allows these frames to be specified manually, which is useful when the task is to predict the future using a model trained on historical data.<\/p>\n<p>Models produced by H2OAutoML can be persisted to disk, used for prediction\/scoring in an Apache Spark<sup>TM<\/sup> cluster, used in local mode in\u00a0Apache Spark<sup>TM<\/sup> running on a single node, or used in a Java Virtual Machine (JVM) with the necessary libraries on the CLASSPATH. 
These options allow batch and real-time scoring in a SQL Server 2019 big data cluster, whether within Apache Spark<sup>TM<\/sup>, within a Transact-SQL stored procedure, or in a deployed application.<\/p>\n<h3>Running PySpark3 notebooks in Azure Data Studio<\/h3>\n<p>The code discussed in this post is available as a Jupyter <a href=\"https:\/\/aka.ms\/powerplant-automl-nb\">notebook<\/a>\u00a0written for the PySpark3 kernel. You can now run Apache Spark<sup>TM<\/sup> notebooks in Azure Data Studio connected to a SQL Server 2019 big data cluster as described in this\u00a0<a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/big-data-cluster\/notebooks-guidance?view=sqlallproducts-allversions\">notebook<\/a> how-to.<\/p>\n<h2>Power plant output prediction<\/h2>\n<p>Let\u2019s take a tour through our example Jupyter <a href=\"https:\/\/aka.ms\/powerplant-automl-nb\">notebook<\/a>\u00a0that shows how a customer running a power plant would take advantage of H2O and AutoML in Apache Spark<sup>TM<\/sup> to predict power plant output. 
This example is based on an H2O <a href=\"https:\/\/www.h2o.ai\/blog\/h2os-automl-in-spark\/\">blog post<\/a>.<\/p>\n<p>The first cells of the notebook set\u00a0Apache Spark<sup class=\"\">TM<\/sup> parameters and install the H2O PySparkling package if it\u2019s not already installed; this package provides the pysparkling.ml.H2OAutoML class.<\/p>\n<p>Next, the notebook code downloads the CSV file containing the data and copies it to HDFS if it\u2019s not already present.<\/p>\n<p>Running H2OContext.getOrCreate starts the H2O engine.<\/p>\n<p>Next, the notebook uses Apache Spark<sup>TM<\/sup> to read the data from HDFS and randomly split it into training and prediction\/test sets.<\/p>\n<p>The following screenshot shows how easy it is to invoke automated machine learning:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-25960\" src=\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/01\/2248-image-1024x333.png\" alt=\"\" width=\"1024\" height=\"333\" \/><\/p>\n<p>Here, you are defining a modeling pipeline, fitting it on the training data, and using it to generate predictions on the test data. In our example, we set maxModels=2, which results in two tree-based models and two (identical) stacked ensemble models. This is sufficient for demonstration purposes, but in practice, you should allow H2OAutoML to explore more models to achieve the best possible prediction metrics. 
If you simply omit the maxModels argument, then H2OAutoML will explore models for a maximum of maxRuntimeSecs, which defaults to 3600 seconds (1 hour).<\/p>\n<p>Our code follows the standard pattern for using the Apache Spark<sup>TM<\/sup> MLlib library because the pysparkling.ml.H2OAutoML class inherits from pyspark.ml.wrapper.JavaEstimator.<\/p>\n<p>Notice that we included an Apache Spark<sup>TM<\/sup> SQLTransformer in our pipeline, showing that a standard Spark MLlib <a href=\"http:\/\/spark.apache.org\/docs\/latest\/ml-pipeline.html#transformers\">transformer<\/a> can be used with a pysparkling.ml.H2OAutoML <a href=\"http:\/\/spark.apache.org\/docs\/latest\/ml-pipeline.html#estimators\">estimator<\/a> in an Apache Spark<sup>TM<\/sup>\u00a0MLlib <a href=\"http:\/\/spark.apache.org\/docs\/latest\/ml-pipeline.html#pipeline\">pipeline<\/a>. During both training and scoring, this transformer will skip any rows that have a Celsius temperature value of less than or equal to 10.<\/p>\n<p>You can see the generalization performance of our model by looking at the leaderboard. The generalization performance we get for predictions on held-out data should be similar to the leaderboard performance. You can use Apache Spark&#8217;s\u00a0RegressionEvaluator class to compute metrics such as the mean absolute error (MAE). As expected, the MAE for predictions on held-out data is similar to the leaderboard MAE, with both typically between 2.3 and 2.5.<\/p>\n<h2>Scale and monitor big data in SQL Server 2019 big data clusters<\/h2>\n<p>With SQL Server 2019, not only can you automatically select and tune machine learning models, you can also easily scale and monitor your big data cluster.<\/p>\n<h3>Scaling to big data<\/h3>\n<p>Using SQL Server 2019 big data clusters, large amounts of computing and memory resources can be leveraged to process data at scale quickly and efficiently. 
To scale to big data, you can configure the following parameters:<\/p>\n<ul>\n<li>The number and size of nodes in the cluster<\/li>\n<li>The number of Apache Spark<sup>TM<\/sup> pods<\/li>\n<li>YARN scheduler memory and cores<\/li>\n<li>Apache Spark<sup>TM<\/sup> driver and executor memory, cores, and the number of executors per pod<\/li>\n<li>Livy timeout<\/li>\n<\/ul>\n<p>Details on setting these parameters are included in the sample <a href=\"https:\/\/aka.ms\/powerplant-automl-nb\">notebook<\/a>.<\/p>\n<h3>Monitoring and diagnostics<\/h3>\n<p>SQL Server 2019 big data clusters include powerful tools for monitoring and diagnostics. The sample <a href=\"https:\/\/aka.ms\/powerplant-automl-nb\">notebook<\/a> includes instructions for accessing the following graphical user interfaces for monitoring, controlling, and troubleshooting runs in Apache Spark<sup>TM<\/sup>:<\/p>\n<p>YARN UI<\/p>\n<ul>\n<li>Shows the available and used memory and virtual cores in the Apache Spark<sup>TM<\/sup> cluster<\/li>\n<li>Lists running and completed Apache Spark<sup>TM<\/sup> applications<\/li>\n<li>Provides links to the Apache Spark<sup>TM<\/sup> UI for running applications and Spark History for completed applications<\/li>\n<li>Allows running applications to be terminated<\/li>\n<\/ul>\n<p>Apache Spark<sup>TM<\/sup> UI<\/p>\n<ul>\n<li>Provides detailed information on running Apache Spark<sup>TM<\/sup> applications<\/li>\n<\/ul>\n<p>Apache Spark<sup>TM<\/sup> History<\/p>\n<ul>\n<li>Provides details on completed Apache Spark<sup>TM<\/sup> applications<\/li>\n<li>Includes newly available Microsoft diagnostics for Apache Spark<sup>TM<\/sup> applications<\/li>\n<\/ul>\n<p>H2O Flow UI<\/p>\n<ul>\n<li>Monitors H2O job progress and engine status<\/li>\n<\/ul>\n<h2>Conclusion<\/h2>\n<p>In this blog post, you\u2019ve learned that SQL Server has gained a powerful new capability in the 2019 preview, namely the ability to run machine learning workloads on big data using 
built-in Apache Spark<sup>TM<\/sup>, with the ability to leverage additional packages of your choosing such as H2O\u2019s automated machine learning software. We have taken a tour through a sample Apache Spark<sup>TM<\/sup> notebook for automated machine learning that can be run in Azure Data Studio against a SQL Server 2019 big data cluster. And you\u2019ve seen how you can scale resources such as nodes, cores, and memory, and monitor Apache Spark<sup>TM<\/sup> applications using built-in graphical user interfaces.<\/p>\n<h2>Getting started<\/h2>\n<ul>\n<li><a href=\"https:\/\/aka.ms\/eapsignup\">Sign up for the Early Adoption Program<\/a> if you would like to try out the new SQL Server 2019 big data clusters!<\/li>\n<li>Read the <a href=\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2018\/09\/25\/introducing-microsoft-sql-server-2019-big-data-clusters\/\">SQL Server big data clusters announcement blog post<\/a><\/li>\n<li>Read more about the details of big data clusters in the <a href=\"https:\/\/info.microsoft.com\/ww-landing-SQLDB-Microsoft-SQL-Server-WhitePaper.html\">SQL Server 2019 big data clusters white paper<\/a>.<\/li>\n<li>Download <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/azure-data-studio\/what-is?view=sqlallproducts-allversions\">Azure Data Studio<\/a> and the SQL Server 2019 Extension to manage your big data clusters<\/li>\n<li>Check out the SQL Server 2019 big data cluster <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/big-data-cluster\/big-data-cluster-overview?view=sqlallproducts-allversions\">docs<\/a><\/li>\n<\/ul>\n<h2>Resources<\/h2>\n<p>The pysparkling.ml.H2OAutoML class is part of H2O\u2019s Spark integration, Sparkling Water, which is documented <a href=\"http:\/\/docs.h2o.ai\/sparkling-water\/2.3\/latest-stable\/doc\/pysparkling.html\">here<\/a>. Unfortunately, this site currently lacks detailed documentation of pysparkling.ml.H2OAutoML. 
Instead, you can find help on pysparkling.ml.H2OAutoML\u2018s attributes and methods by running the following python commands:<\/p>\n<p>from pysparkling.ml import H2OAutoML<\/p>\n<p>help(H2OAutoML)<\/p>\n<p>Since pysparkling.ml.H2OAutoML and h2o.automl.H2OAutoML share underlying code, it is also helpful to refer to the latter\u2019s <a href=\"http:\/\/docs.h2o.ai\/h2o\/latest-stable\/h2o-docs\/automl.html\">documentation<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post, we will explore how to use automated machine learning (AutoML) to create new machine learning models over your data in SQL Server 2019 big data clusters.<\/p>\n","protected":false},"author":5562,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","footnotes":""},"post_tag":[],"product":[2536],"content-type":[2424],"topic":[2451],"coauthors":[2581],"class_list":["post-25954","post","type-post","status-publish","format-standard","hentry","product-sql-server-2019","content-type-best-practices","topic-big-data","review-flag-1593580427-503","review-flag-1-1593580431-15","review-flag-2-1593580436-981","review-flag-3-1593580441-293","review-flag-5-1593580452-31","review-flag-integ-1593580287-179","review-flag-lever-1593580264-545","review-flag-new-1593580247-437"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>How to automate machine learning on SQL Server 2019 big data clusters - Microsoft SQL Server Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/\" \/>\n<meta property=\"og:locale\" 
content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to automate machine learning on SQL Server 2019 big data clusters - Microsoft SQL Server Blog\" \/>\n<meta property=\"og:description\" content=\"In this post, we will explore how to use automated machine learning (AutoML) to create new machine learning models over your data in SQL Server 2019 big data clusters.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft SQL Server Blog\" \/>\n<meta property=\"article:publisher\" content=\"http:\/\/www.facebook.com\/sqlserver\" \/>\n<meta property=\"article:published_time\" content=\"2019-01-09T17:00:03+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-09-17T21:12:45+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/01\/2248-image-1024x333.png\" \/>\n<meta name=\"author\" content=\"Mario Inchiosa\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@SQLServer\" \/>\n<meta name=\"twitter:site\" content=\"@SQLServer\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Mario Inchiosa\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 min read\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/\"},\"author\":[{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/author\/mario-inchiosa\/\",\"@type\":\"Person\",\"@name\":\"Mario Inchiosa\"}],\"headline\":\"How to automate machine learning on SQL Server 2019 big data clusters\",\"datePublished\":\"2019-01-09T17:00:03+00:00\",\"dateModified\":\"2021-09-17T21:12:45+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/\"},\"wordCount\":1361,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/01\/2248-image-1024x333.png\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/\",\"url\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/bl
og\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/\",\"name\":\"How to automate machine learning on SQL Server 2019 big data clusters - Microsoft SQL Server Blog\",\"isPartOf\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/01\/2248-image-1024x333.png\",\"datePublished\":\"2019-01-09T17:00:03+00:00\",\"dateModified\":\"2021-09-17T21:12:45+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#primaryimage\",\"url\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/01\/2248-image.png\",\"contentUrl\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/01\/2248-image.png\",\"width\":1450,\"height\":472,\"caption\":\"a screenshot of a cell 
phone\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to automate machine learning on SQL Server 2019 big data clusters\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/#website\",\"url\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/\",\"name\":\"Microsoft SQL Server Blog\",\"description\":\"Official News from Microsoft\u2019s Information Platform\",\"publisher\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/#organization\",\"name\":\"Microsoft SQL Server Blog\",\"url\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png\",\"contentUrl\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png\",\"width\":259,\"height\":194,\"caption\":\"Microsoft SQL Server 
Blog\"},\"image\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"http:\/\/www.facebook.com\/sqlserver\",\"https:\/\/x.com\/SQLServer\",\"https:\/\/www.youtube.com\/user\/MSCloudOS\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to automate machine learning on SQL Server 2019 big data clusters - Microsoft SQL Server Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/","og_locale":"en_US","og_type":"article","og_title":"How to automate machine learning on SQL Server 2019 big data clusters - Microsoft SQL Server Blog","og_description":"In this post, we will explore how to use automated machine learning (AutoML) to create new machine learning models over your data in SQL Server 2019 big data clusters.","og_url":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/","og_site_name":"Microsoft SQL Server Blog","article_publisher":"http:\/\/www.facebook.com\/sqlserver","article_published_time":"2019-01-09T17:00:03+00:00","article_modified_time":"2021-09-17T21:12:45+00:00","og_image":[{"url":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/01\/2248-image-1024x333.png","type":"","width":"","height":""}],"author":"Mario Inchiosa","twitter_card":"summary_large_image","twitter_creator":"@SQLServer","twitter_site":"@SQLServer","twitter_misc":{"Written by":"Mario Inchiosa","Est. 
reading time":"5 min read"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#article","isPartOf":{"@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/"},"author":[{"@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/author\/mario-inchiosa\/","@type":"Person","@name":"Mario Inchiosa"}],"headline":"How to automate machine learning on SQL Server 2019 big data clusters","datePublished":"2019-01-09T17:00:03+00:00","dateModified":"2021-09-17T21:12:45+00:00","mainEntityOfPage":{"@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/"},"wordCount":1361,"commentCount":0,"publisher":{"@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/#organization"},"image":{"@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#primaryimage"},"thumbnailUrl":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/01\/2248-image-1024x333.png","inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/","url":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/","name":"How to automate machine learning on SQL Server 2019 big data clusters - Microsoft SQL Server 
Blog","isPartOf":{"@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#primaryimage"},"image":{"@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#primaryimage"},"thumbnailUrl":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/01\/2248-image-1024x333.png","datePublished":"2019-01-09T17:00:03+00:00","dateModified":"2021-09-17T21:12:45+00:00","breadcrumb":{"@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#primaryimage","url":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/01\/2248-image.png","contentUrl":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/01\/2248-image.png","width":1450,"height":472,"caption":"a screenshot of a cell phone"},{"@type":"BreadcrumbList","@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2019\/01\/09\/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/"},{"@type":"ListItem","position":2,"name":"How to automate machine learning on SQL Server 2019 big data 
clusters"}]},{"@type":"WebSite","@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/#website","url":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/","name":"Microsoft SQL Server Blog","description":"Official News from Microsoft\u2019s Information Platform","publisher":{"@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/#organization","name":"Microsoft SQL Server Blog","url":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png","contentUrl":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png","width":259,"height":194,"caption":"Microsoft SQL Server 
Blog"},"image":{"@id":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/#\/schema\/logo\/image\/"},"sameAs":["http:\/\/www.facebook.com\/sqlserver","https:\/\/x.com\/SQLServer","https:\/\/www.youtube.com\/user\/MSCloudOS"]}]}},"msxcm_display_generated_audio":false,"msxcm_animated_featured_image":null,"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-json\/wp\/v2\/posts\/25954","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-json\/wp\/v2\/users\/5562"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-json\/wp\/v2\/comments?post=25954"}],"version-history":[{"count":0,"href":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-json\/wp\/v2\/posts\/25954\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-json\/wp\/v2\/media?parent=25954"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-json\/wp\/v2\/post_tag?post=25954"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-json\/wp\/v2\/product?post=25954"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-json\/wp\/v2\/content-type?post=25954"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-json\/wp\/v2\/topic?post=25954"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/wp-json\/wp\/v2\/coauthors?post=25954"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}