{"id":51966,"date":"2021-08-27T16:35:06","date_gmt":"2021-08-27T15:35:06","guid":{"rendered":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/?p=51966"},"modified":"2021-10-06T16:53:57","modified_gmt":"2021-10-06T15:53:57","slug":"building-scalable-data-science-applications-using-containers-part-5","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/","title":{"rendered":"Building Scalable Data Science Applications using Containers \u2013 Part 5"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format\" src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionheader.jpg\" alt=\"An illustration representing a data warehouse, next to an illustration of Bit the Raccoon.\" width=\"1920\" height=\"700\" data-orig-srcset=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionheader.jpg 1920w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionheader-300x109.jpg 300w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionheader-1024x373.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionheader-768x280.jpg 768w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionheader-1536x560.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionheader-330x120.jpg 330w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionheader-800x292.jpg 800w, 
https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionheader-400x146.jpg 400w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionheader.jpg\" \/><\/p>\n<p>Welcome to the fifth part of this blog series around using containers for Data Science. In parts <a href=\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/cross-industry\/2019\/05\/22\/how-to-use-containers-part-1?ocid=AID3038246\">one<\/a>,\u00a0<a href=\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/cross-industry\/2019\/05\/31\/how-to-use-containers-part-2?ocid=AID3038246\">two<\/a>,\u00a0<a href=\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/cross-industry\/2019\/06\/07\/how-to-use-containers-in-data-science-with-docker-and-azure-part-3?ocid=AID3038246\">three<\/a>, and\u00a0<a href=\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2019\/10\/03\/using-containers-to-run-r-shiny-workloads-in-azure-part-4?ocid=AID3038246\">four<\/a>, I covered a number of building blocks that we\u2019ll use here. If this is the first blog you\u2019ve seen, it\u2019s worth skimming the first four parts, or even going back and progressing through them. 
I assume a degree of familiarity with Docker, storage, and multi-container applications, all of which are covered in those previous posts.<\/p>\n<p>The objectives of this blog are to:<\/p>\n<ul>\n<li data-leveltext=\"-\" data-font=\"Arial\" data-listid=\"1\" data-aria-posinset=\"0\" data-aria-level=\"1\">Build a common data science pattern from multiple components, each held in its own container.<\/li>\n<li data-leveltext=\"-\" data-font=\"Arial\" data-listid=\"1\" data-aria-posinset=\"0\" data-aria-level=\"1\">Provide some considerations for scalability and resilience.<\/li>\n<li data-leveltext=\"-\" data-font=\"Arial\" data-listid=\"1\" data-aria-posinset=\"0\" data-aria-level=\"1\">Use this as the foundation for an Azure Kubernetes Service deployment in a subsequent post.<\/li>\n<\/ul>\n<p>This blog will not always demonstrate good data science practice; I\u2019d rather expose patterns that are worth knowing and that act as a catalyst for learning. There are many other sources covering performance optimisation and architectural robustness, but those topics require a broader level of understanding than I assume in this article. I will, however, usually point out when poor practice is being demonstrated.<\/p>\n<p>For example, some of the patterns here are not ideal for performance: the database in the diagram below is a bottleneck and will constrain throughput. The remit of this blog is to build slowly on core principles, show how to work with them, and use them as a basis for further understanding.<\/p>\n<p>This will be a two-part post. In the first part, we will build the environment locally using docker-compose and make some observations about its limitations. In the second, we will migrate the functionality across to Azure Kubernetes Service.<\/p>\n<p>We\u2019ll use a simple image classification scenario requiring a number of technical capabilities. 
These form a simple process that classifies a pipeline of images into one of 10 categories.<\/p>\n<p>The scenario we\u2019ll be building assumes many typical project constraints:<\/p>\n<ol>\n<li data-leveltext=\"%1)\" data-font=\"Arial\" data-listid=\"2\" data-aria-posinset=\"1\" data-aria-level=\"1\">We have no control over how many images arrive, or how fast they arrive.<\/li>\n<li data-leveltext=\"%1)\" data-font=\"Arial\" data-listid=\"2\" data-aria-posinset=\"1\" data-aria-level=\"1\">The classification model has been pretrained.<\/li>\n<li data-leveltext=\"%1)\" data-font=\"Arial\" data-listid=\"2\" data-aria-posinset=\"1\" data-aria-level=\"1\">Every image must be classified. In other words, we cannot just ignore errors or crashes.<\/li>\n<li data-leveltext=\"%1)\" data-font=\"Arial\" data-listid=\"2\" data-aria-posinset=\"1\" data-aria-level=\"1\">We need to record classification results in a resilient data store.<\/li>\n<li data-leveltext=\"%1)\" data-font=\"Arial\" data-listid=\"2\" data-aria-posinset=\"1\" data-aria-level=\"1\">As we can\u2019t control our incoming workload, we need to scale our classification as required &#8211; up to accommodate throughput, or down to control costs.<\/li>\n<li data-leveltext=\"%1)\" data-font=\"Arial\" data-listid=\"2\" data-aria-posinset=\"1\" data-aria-level=\"1\">We will monitor throughput, performance, and accuracy, both to allow us to scale our resources and, potentially, to detect statistical drift.<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS1.jpg\" alt=\"A diagram showing how the project will work\" width=\"723\" height=\"442\" 
data-orig-srcset=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS1.jpg 723w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS1-300x183.jpg 300w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS1-330x202.jpg 330w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS1-400x245.jpg 400w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS1.jpg\" \/><\/p>\n<p>The overall application is outlined in the diagram above. We want to provide some resilience; For example, a failed categorisation due to a crashed process will be retried. At this stage we won\u2019t provide a highly available database, or an event queue with automatic failover, but we may consider this when we move it to Kubernetes.<\/p>\n<p>&nbsp;<\/p>\n<h3>Let\u2019s\u00a0begin<\/h3>\n<p>We\u2019ll hold our application under a single\u00a0directory tree. Create the\u00a0<strong>containers\u00a0<\/strong>directory, and then\u00a0beneath that, create\u00a0four\u00a0sub directories named\u00a0<strong>postgres<\/strong>,\u00a0<strong>python<\/strong>,\u00a0<strong>rabbitmq\u00a0<\/strong>and\u00a0<strong>worker<\/strong>.<\/p>\n<pre>$ mkdir -p containers\/postgres\/ containers\/python\/ containers\/rabbitmq\/ containers\/worker\r\n$ tree containers\r\n\r\ncontainers\/\r\n\u251c\u2500\u2500 postgres\r\n\u251c\u2500\u2500 python\r\n\u251c\u2500\u2500 rabbitmq\r\n\u2514\u2500\u2500 worker<\/pre>\n<p>&nbsp;<\/p>\n<h3>Create\u00a0Your\u00a0Persistent\u00a0storage<\/h3>\n<p>Containers are designed to be disposable processes and stateless. We\u2019ll need to ensure that whenever a container terminates, its state can be remembered. We\u2019ll do that using Docker volumes for persistent storage. 
As our overall architecture diagram shows, we\u2019ll need this for Postgres, RabbitMQ and to hold our images.<\/p>\n<p>Create\u00a0the\u00a0docker volumes\u00a0and then\u00a0confirm\u00a0they\u2019re there.<\/p>\n<pre>$ <strong>docker volume create scalable-app_db_data<\/strong>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <strong># for Postgres<\/strong>\r\nscalable-app_db_data\r\n$ <strong>docker volume create scalable-app_image_data<\/strong>\u00a0\u00a0\u00a0\u00a0\u00a0 <strong># to hold our images<\/strong>\r\nscalable-app_image_data\r\n$ <strong>docker volume create scalable-app_mq_data<\/strong>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <strong># for Rabbit data<\/strong>\r\nscalable-app_mq_data\r\n$ <strong>docker volume create scalable-app_mq_log<\/strong>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <strong># for Rabbit logs<\/strong>\r\nscalable-app_mq_log\r\n$ <strong>docker volume ls<\/strong>\r\nDRIVER\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 VOLUME NAME\r\nlocal\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0scalable-app_db_data\r\nlocal\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0scalable-app_image_data\r\nlocal\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0scalable-app_mq_data\r\nlocal\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0scalable-app_mq_log\r\n$<\/pre>\n<p>&nbsp;<\/p>\n<h3>Load the\u00a0Source\u00a0Images<\/h3>\n<p>I\u2019ve used a\u00a0publicly available set of images\u00a0for classification &#8211;\u00a0the classic\u00a0<a href=\"https:\/\/www.cs.toronto.edu\/~kriz\/cifar.html\">CIFAR<\/a>\u00a0data set.\u00a0Data sets are\u00a0often\u00a0already post-processed to allow for easy inclusion into machine learning code.\u00a0I found a source\u00a0that has them\u00a0in jpg form, which\u00a0can be downloaded\u00a0<a 
href=\"https:\/\/github.com\/YoongiKim\/CIFAR-10-images?ocid=AID3038246\">here<\/a>.<\/p>\n<p>We\u2019ll first clone the CIFAR image repository, then load those images into a volume\u00a0using\u00a0a tiny alpine container\u00a0and\u00a0show that they have been copied to the persistent volume.\u00a0 We\u2019ll\u00a0also\u00a0use\u00a0this\u00a0volume as part of the process to\u00a0queue and categorise each image.\u00a0 Note that in the text below, you can refer to a running container by the\u00a0prefix of its identity\u00a0if\u00a0it\u00a0is unique.\u00a0 Hence \u2018343\u2019 below refers to the\u00a0container with an ID uniquely beginning with\u00a0\u2018343\u2019.<\/p>\n<pre>$ mkdir images\r\n$ cd images\r\n$ git clone https:\/\/github.com\/YoongiKim\/CIFAR-10-images.git\r\nCloning into 'CIFAR-10-images'...\r\nremote: Enumerating objects: 60027, done.\r\nremote: Counting objects: 100% (60027\/60027), done.\r\nremote: Compressing objects: 100% (37\/37), done.\r\nremote: Total 60027 (delta 59990), reused 60024 (delta 59990), pack-reused 0\r\nReceiving objects: 100% (60027\/60027), 19.94 MiB | 2.75 MiB\/s, done.\r\nResolving deltas: 100% (59990\/59990), done.\r\nChecking out files: 100% (60001\/60001), done.\r\n$\r\n$ docker run --rm -itd -v scalable-app_image_data:\/images alpine\r\n343b5e3ad95a272810e51ada368c1c6e070f83df1c974e88a583c17462941337\r\n$\r\n$ docker cp CIFAR-10-images 343:\/images\r\n$ docker exec -it 343 ls -lr \/images\/CIFAR-10-images\/test\/cat | head\r\ntotal 4000\r\n-rw-r--r--\u00a0\u00a0\u00a0 1 501\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0dialout\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 954 Dec 22 12:50 0999.jpg\r\n-rw-r--r--\u00a0\u00a0\u00a0 1 501\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0dialout\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 956 Dec 22 12:50 0998.jpg\r\n-rw-r--r--\u00a0\u00a0\u00a0 1 501\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0dialout\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 915 Dec 22 12:50 0997.jpg\r\n-rw-r--r--\u00a0\u00a0\u00a0 1 
501\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0dialout\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 902 Dec 22 12:50 0996.jpg\r\n-rw-r--r--\u00a0\u00a0\u00a0 1 501\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0dialout\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 938 Dec 22 12:50 0995.jpg\r\n-rw-r--r--\u00a0\u00a0\u00a0 1 501\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0dialout\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 957 Dec 22 12:50 0994.jpg\r\n-rw-r--r--\u00a0\u00a0\u00a0 1 501\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0dialout\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 981 Dec 22 12:50 0993.jpg\r\n-rw-r--r--\u00a0\u00a0\u00a0 1 501\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0dialout\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 889 Dec 22 12:50 0992.jpg\r\n-rw-r--r--\u00a0\u00a0\u00a0 1 501\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0dialout\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 906 Dec 22 12:50 0991.jpg\r\n$ docker stop 343\r\n343<\/pre>\n<p>&nbsp;<\/p>\n<h3>The Queueing Service<\/h3>\n<p>We\u2019ll process images by adding them to a queue and letting worker processes simply take them from the queue. This allows us to scale our workers and ensure some resilience around the requests. 
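The queue-plus-workers pattern just described can be sketched without any broker at all. Below is a minimal, single-threaded simulation of competing consumers (the names are illustrative and not part of RabbitMQ's API); it shows the property we rely on: a message whose processing fails is requeued rather than lost, so every message is eventually handled:

```python
from collections import deque

def process_queue(messages, workers, fail_once=()):
    """Simulate competing consumers: each message goes to the next worker in
    turn; a message that fails (no ack) is requeued and retried later."""
    queue = deque(messages)
    failed = set(fail_once)       # messages that fail on their first attempt
    processed = {w: [] for w in workers}
    i = 0
    while queue:
        msg = queue.popleft()
        worker = workers[i % len(workers)]
        i += 1
        if msg in failed:         # simulate a crash: no ack, message requeued
            failed.discard(msg)
            queue.append(msg)
        else:                     # ack: message removed from the queue
            processed[worker].append(msg)
    return processed

result = process_queue(['img1', 'img2', 'img3', 'img4'], ['w1', 'w2'], fail_once={'img2'})
# every message is eventually processed exactly once, despite the failure
assert sum(len(v) for v in result.values()) == 4
```

RabbitMQ gives us this behaviour for real: unacknowledged messages are redelivered, and adding workers increases throughput without changing the producer.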
I\u2019ve chosen RabbitMQ as it\u2019s very easy to use and accessible from many programming languages.<\/p>\n<p>To create the\u00a0RabbitMQ service,\u00a0create a\u00a0Dockerfile\u00a0in the\u00a0<strong>containers\/rabbitmq<\/strong>\u00a0directory\u00a0with\u00a0the following:<\/p>\n<pre>FROM rabbitmq:3-management\r\n\r\nEXPOSE 5672\r\nEXPOSE 15672<\/pre>\n<p>&nbsp;<\/p>\n<p>Now go into that directory and build it:<\/p>\n<pre><strong>$ docker build -t rabbitmq<\/strong> .\r\nSending build context to Docker daemon\u00a0 14.85kB\r\nStep 1\/3 : FROM rabbitmq:3-management\r\n3-management: Pulling from library\/rabbitmq\r\n.\r\n.\r\n.\r\nDigest: sha256:e1ddebdb52d770a6d1f9265543965615c86c23f705f67c44f0cef34e5dc2ba70\r\nStatus: Downloaded newer image for rabbitmq:3-management\r\n---&gt; db695e07d0d7\r\nStep 2\/3 : EXPOSE 5672\r\n---&gt; Running in 44098f35535c\r\nRemoving intermediate container 44098f35535c\r\n---&gt; 7406a95c39b3\r\nStep 3\/3 : EXPOSE 15672\r\n---&gt; Running in 388bcbf65e3f\r\nRemoving intermediate container 388bcbf65e3f\r\n---&gt; db76ef2233d1\r\nSuccessfully built db76ef2233d1\r\nSuccessfully tagged rabbitmq:latest\r\n<strong>$<\/strong><\/pre>\n<p>&nbsp;<\/p>\n<p>Now start a container based on that image:<\/p>\n<pre><strong>$ docker run -itd\u00a0 -v \"scalable-app_mq_log:\/var\/log\/rabbitmq\" -v \"scalable-app_mq_data:\/var\/lib\/rabbitmq\" --name \"rabbitmq\" --hostname rabbitmq -p 15672:15672 -p 5672:5672 rabbitmq<\/strong>\r\nf02ae9d41778968ebcd2420fe5cfd281d9b5df84f27bd52bd23e1735db828e18\r\n$<\/pre>\n<p>&nbsp;<\/p>\n<p>If you\u00a0open up\u00a0a browser and go to localhost:15672, you should see the following:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS2.jpg\" alt=\"A screenshot of the RabbitMQ login screen\" width=\"458\" height=\"202\" 
data-orig-srcset=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS2.jpg 458w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS2-300x132.jpg 300w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS2-330x146.jpg 330w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS2-400x176.jpg 400w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS2.jpg\" \/><\/p>\n<p>Log in with\u00a0username\u00a0<strong>guest\u00a0<\/strong>and password\u00a0<strong>guest<\/strong>, and you\u00a0should\u00a0see something like the following:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS3.jpg\" alt=\"The default screen you will see after logging into RabbitMQ\" width=\"847\" height=\"495\" data-orig-srcset=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS3.jpg 847w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS3-300x175.jpg 300w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS3-768x449.jpg 768w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS3-330x193.jpg 330w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS3-800x468.jpg 800w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS3-400x234.jpg 400w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS3.jpg\" 
\/><\/p>\n<p>This will allow us to monitor queues.<\/p>\n<p>Go to the <strong>containers\/python<\/strong> directory and create a new file called <strong>fill_queue.py<\/strong>. The code below builds a list of all the images to be categorised and adds each one to our queue.<\/p>\n<p>I start at the mounted directory of images and do a tree walk, finding every image (ending in png, jpg, or jpeg). I use the image\u2019s location within the full path to determine its expected category (<strong>fnameToCategory<\/strong>), and build up an array of JSON payloads.<\/p>\n<p>I then connect to the RabbitMQ server. Note that <strong>HOSTNAME<\/strong> must be set to your Docker host\u2019s IP address rather than \u2018localhost\u2019, because the Python container\u2019s \u2018localhost\u2019 is not the same as the RabbitMQ container\u2019s.<\/p>\n<p>I declare a new channel and queue, and publish each IMGS entry as a separate message.<\/p>\n<p>There is a debugging print to show the number of images; if all goes well, you won\u2019t see it for long, as it will scroll off the screen. 
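The category-from-path logic just described can be tried in isolation before running the full script. A small standalone sketch of the same idea (the classes tuple matches the CIFAR-10 categories; the position check ensures the class name is found beyond the root prefix, i.e. in the per-category directory part of the path):

```python
# Standalone check of the path-to-category mapping used by fill_queue.py.
ROOT = '/images'
rLen = len(ROOT)
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog',
           'frog', 'horse', 'ship', 'truck')

def fnameToCategory(fname):
    """Map a full image path to its CIFAR-10 class index by locating the
    class name in the directory portion of the path (after the root)."""
    for c in classes:
        if fname.find(c) > rLen:   # class name must appear beyond '/images'
            return classes.index(c)
    return -1  # no class directory found in the path

print(fnameToCategory('/images/CIFAR-10-images/test/cat/0999.jpg'))  # → 3
```

Running this against a few sample paths is a cheap way to confirm the expected category is derived correctly before queueing 60,000 messages.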
Hopefully, you see thousands of messages showing progress.<\/p>\n<pre>#!\/usr\/bin\/env python\r\nimport pika\r\nimport sys\r\nimport os\r\nimport json\r\n\r\nROOT=\"\/images\"\r\nrLen = len(ROOT)\r\nclasses = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')\r\nHOSTNAME=\"&lt;Enter your Docker host's IP address&gt;\"\r\n\r\n# Determine the expected category by parsing the directory (after the root path)\r\ndef fnameToCategory(fname):\r\n    for c in classes:\r\n        if (fname.find(c) &gt; rLen):\r\n            return (classes.index(c))\r\n    return -1 # This should never happen\r\n\r\nIMGS=[]\r\nfor root, dirs, files in os.walk(ROOT):\r\n    for filename in files:\r\n        if filename.endswith(('.png', '.jpg', '.jpeg')):\r\n            fullpath=os.path.join(root, filename)\r\n            cat = fnameToCategory(fullpath)\r\n            data = {\r\n                \"image\" : fullpath,\r\n                \"category\": cat,\r\n                \"catName\": classes[cat]\r\n            }\r\n            message = json.dumps(data)\r\n            IMGS.append(message)\r\n\r\nconnection = pika.BlockingConnection(pika.ConnectionParameters(host=HOSTNAME))\r\nchannel = connection.channel()\r\n\r\nchannel.queue_declare(queue='image_queue', durable=True)\r\n\r\nprint(\"Number of Images = \", len(IMGS))\r\n\r\nfor i in IMGS:\r\n    channel.basic_publish( exchange='', routing_key='image_queue', body=i,\r\n        properties=pika.BasicProperties( delivery_mode=2 )\r\n    )\r\n    print(\"Queued \", i)\r\n\r\nconnection.close()<\/pre>\n<p>&nbsp;<\/p>\n<p>In the same <strong>containers\/python<\/strong> directory, create a Dockerfile for your python engine:<\/p>\n<pre>FROM python:3.7-alpine\r\n\r\n# Add core OS requirements\r\nRUN apk update &amp;&amp; apk add bash vim\r\n\r\n# Add Python Libraries\r\nRUN pip install pika\r\n\r\nADD fill_queue.py \/<\/pre>\n<p>&nbsp;<\/p>\n<p>Now build the Docker 
image:<\/p>\n<pre>$ docker build -t python .\r\nSending build context to Docker daemon\u00a0\u00a0 27.8MB\r\nStep 1\/4 : FROM python:3.7-alpine\r\n---&gt; 459651397c21\r\nStep 2\/4 : RUN apk update &amp;&amp; apk add bash vim\r\n---&gt; Running in dc363417cf12\r\n.\r\n.\r\n.\r\nSuccessfully installed pika-1.1.0\r\nRemoving intermediate container b40f1782f0c1\r\n---&gt; 35891fccb860\r\nStep 4\/4 : ADD\u00a0 fill_queue.py \/\r\n---&gt; 17cd19050b21\r\nSuccessfully built 17cd19050b21\r\nSuccessfully tagged python:latest\r\n$<\/pre>\n<p>&nbsp;<\/p>\n<p>Now, run the container, mounting the volume containing our images and executing our script:<\/p>\n<pre>$ docker run --rm -v scalable-app_image_data:\/images\u00a0 -it python python \/fill_queue.py\r\n\r\nNumber of Images =\u00a0 60000\r\nQueued\u00a0 {\"image\": \"\/images\/CIFAR-10-images\/test\/dog\/0754.jpg\", \"category\": 5, \"catName\": \"dog\"}\r\nQueued\u00a0 {\"image\": \"\/images\/CIFAR-10-images\/test\/dog\/0985.jpg\", \"category\": 5, \"catName\": \"dog\"}\r\n.\r\n.\r\n.\r\n.<\/pre>\n<p>&nbsp;<\/p>\n<p>While this is running, you should see the queued messages increase until it reaches 60,000.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS4.jpg\" alt=\"The queued messages reading 60,000.\" width=\"860\" height=\"351\" data-orig-srcset=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS4.jpg 860w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS4-300x122.jpg 300w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS4-768x313.jpg 768w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS4-330x135.jpg 330w, 
https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS4-800x327.jpg 800w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS4-400x163.jpg 400w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS4.jpg\" \/><\/p>\n<p>Now click on the \u2018Queues\u2019 link in the RabbitMQ management console, and you will see that those messages\u00a0are\u00a0now\u00a0in the\u00a0&#8216;<strong>image_queue<\/strong>&#8216; queue\u00a0waiting to be requested.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS5.jpg\" alt=\"A screenshot indicating the location of image_queue\" width=\"859\" height=\"440\" data-orig-srcset=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS5.jpg 859w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS5-300x154.jpg 300w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS5-768x393.jpg 768w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS5-330x169.jpg 330w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS5-800x410.jpg 800w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS5-400x205.jpg 400w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS5.jpg\" \/><\/p>\n<p>If you now click on the\u00a0image_queue\u00a0link, you\u2019ll get a more detailed view of activity within that queue.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" 
class=\"attachment-full size-full webp-format aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS6.jpg\" alt=\"A screenshot showing more details of image_queue\" width=\"862\" height=\"636\" data-orig-srcset=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS6.jpg 862w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS6-300x220.jpg 300w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS6-768x567.jpg 768w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS6-330x243.jpg 330w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS6-800x590.jpg 800w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS6-400x295.jpg 400w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS6.jpg\" \/><\/p>\n<p>&nbsp;<\/p>\n<h3>Providing a Database Store<\/h3>\n<p>Now provision\u00a0the\u00a0database environment, which will\u00a0simply\u00a0record categorisation results.<\/p>\n<p>In the\u00a0<strong>containers\/postgres<\/strong>\u00a0directory, create a\u00a0Dockerfile\u00a0containing the following:<\/p>\n<pre>FROM postgres:11.5\r\n\r\nCOPY pg-setup.sql \/docker-entrypoint-initdb.d\/\r\n\r\nEXPOSE 5432\r\n\r\nCMD [\"postgres\"]<\/pre>\n<p>&nbsp;<\/p>\n<p>In the same directory, create a file called\u00a0pg-setup.sql\u00a0containing the following:<\/p>\n<pre>CREATE TABLE CATEGORY_RESULTS (\r\n    FNAME\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0VARCHAR(1024) NOT NULL,\r\n    CATEGORY\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0NUMERIC(2) NOT NULL,\r\n    PREDICTION\u00a0\u00a0\u00a0\u00a0NUMERIC(2) NOT NULL,\r\n    CONFIDENCE\u00a0\u00a0\u00a0 
REAL);<\/pre>\n<p>&nbsp;<\/p>\n<p>And build the Postgres container image:<\/p>\n<pre>$ docker build -t postgres .\r\nSending build context to Docker daemon\u00a0 4.096kB\r\nStep 1\/4 : FROM postgres:11.5\r\n---&gt; 5f1485c70c9a\r\nStep 2\/4 : COPY pg-setup.sql \/docker-entrypoint-initdb.d\/\r\n---&gt; e84511216121\r\n.\r\n.\r\n.\r\nRemoving intermediate container d600e2f45564\r\n---&gt; 128ad35a028b\r\nSuccessfully built 128ad35a028b\r\nSuccessfully tagged postgres:latest\r\n$<\/pre>\n<p>&nbsp;<\/p>\n<p>Start the Postgres service. Note that here we\u2019re mounting a docker volume to hold the persistent data when the container terminates.<\/p>\n<pre>$ docker run --name postgres --rm -v scalable-app_db_data:\/var\/lib\/postgresql\/data -p 5432:5432 -e POSTGRES_PASSWORD=password -d postgres\r\ndfc9bbffd83de9bca35c54ed0d3f4afd47c0d03f351c87988f827da15385b4e6\r\n$<\/pre>\n<p>&nbsp;<\/p>\n<p>If you now connect to the database, you should see that a table has been created for you. This will contain our categorisation results. 
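Once the workers begin writing rows, this table supports simple quality checks. As an illustrative sketch (not part of the post's code), overall and per-category accuracy can be computed from (category, prediction) pairs; in practice the rows would come from a psycopg2 query such as SELECT CATEGORY, PREDICTION FROM CATEGORY_RESULTS, but any row source works:

```python
def accuracy(rows):
    """Overall and per-category accuracy from (category, prediction) rows."""
    total = correct = 0
    per_cat = {}
    for cat, pred in rows:
        hits, seen = per_cat.get(cat, (0, 0))
        per_cat[cat] = (hits + (cat == pred), seen + 1)
        correct += (cat == pred)
        total += 1
    overall = correct / total if total else 0.0
    return overall, {c: h / s for c, (h, s) in per_cat.items()}

# Stand-in rows; real rows would come from a psycopg2 cursor's fetchall().
rows = [(3, 3), (3, 5), (5, 5), (5, 5)]
overall, by_cat = accuracy(rows)
assert overall == 0.75 and by_cat[3] == 0.5
```

A check like this is the seed of the drift monitoring mentioned in the project constraints: a falling per-category accuracy over time is a signal worth alerting on.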
Note, the password in this case is \u2018password\u2019 as we specified in the POSTGRES_PASSWORD environment variable when starting the container.<\/p>\n<pre>$ <strong>psql -h localhost -p 5432 -U postgres<\/strong>\r\nPassword for user postgres:\r\npsql (11.5)\r\nType \"help\" for help.\r\n\r\npostgres=# \\d\r\n            List of relations\r\n Schema |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Name\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | Type\u00a0 |\u00a0 Owner\r\n--------+------------------+-------+----------\r\n public | category_results | table | postgres\r\n(1 row)\r\n\u00a0\r\npostgres=# \\d category_results\r\n                Table \"public.category_results\"\r\n   Column\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Type\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | Collation | Nullable | Default\r\n------------+-------------------------+-----------+----------+---------\r\n fname\u00a0\u00a0\u00a0\u00a0\u00a0 | character varying(1024) |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | not null |\r\n category\u00a0\u00a0 | numeric(2,0)\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | not null |\r\n prediction | numeric(2,0)\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | not null |\r\n confidence | real\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<p>&nbsp;<\/p>\n<h3>The Classification Process<\/h3>\n<p>The final function will request something off the queue, classify it, and record a result. 
This is the worker process. It uses a pretrained CIFAR model from <a href=\"https:\/\/gluon-cv.mxnet.io\/index.html\">Gluon<\/a>, together with the pika library we used earlier to publish to the RabbitMQ queue. One design principle for this application is that we should be able to scale the number of classifiers up to meet demand. This is possible because the queue is accessible by many workers simultaneously: messages are dispatched to the workers in a round-robin fashion, so the work can be parallelised to increase throughput.<\/p>\n<p>In your <strong>containers\/worker<\/strong> directory, create the following Dockerfile:<\/p>\n<pre>FROM ubuntu\r\n\r\nRUN apt-get update\r\nRUN apt-get install -y python3 python3-pip\r\n\r\nRUN pip3 install --upgrade mxnet gluoncv pika\r\nRUN pip3 install psycopg2-binary\r\n\r\n# Add worker logic necessary to process queue items\r\nADD worker.py \/\r\n\r\n# Start the worker\r\nCMD [\"python3\", \".\/worker.py\" ]<\/pre>\n<p>&nbsp;<\/p>\n<p>Also create a file called worker.py with the following content:<\/p>\n<pre>#!\/usr\/bin\/env python\r\n\r\nfrom mxnet import gluon, nd, image\r\nfrom mxnet.gluon.data.vision import transforms\r\nfrom gluoncv import utils\r\nfrom gluoncv.model_zoo import get_model\r\nimport psycopg2\r\nimport pika\r\nimport time\r\nimport json\r\n\r\ndef predictCategory(fname):\r\n    img = image.imread(fname)\r\n\r\n    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']\r\n\r\n    transform_fn = transforms.Compose([\r\n        transforms.Resize(32), transforms.CenterCrop(32), transforms.ToTensor(),\r\n        transforms.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010])\r\n    ])\r\n    img = transform_fn(img)\r\n    net = get_model('cifar_resnet110_v1', classes=10, pretrained=True)\r\n\r\n    pred = net(img.expand_dims(axis=0))\r\n    ind = nd.argmax(pred, axis=1).astype('int')\r\n    print('The input picture is classified as 
[%s], with probability %.3f.'%\r\n        (class_names[ind.asscalar()], nd.softmax(pred)[0][ind].asscalar()))\r\n    return ind.asscalar(), nd.softmax(pred)[0][ind].asscalar()\r\n\r\ndef InsertResult(connection, fname, category, prediction, prob):\r\n    count = 0\r\n    try:\r\n        cursor = connection.cursor()\r\n\r\n        qry = \"\"\" INSERT INTO CATEGORY_RESULTS (FNAME, CATEGORY, PREDICTION, CONFIDENCE) VALUES (%s,%s,%s,%s)\"\"\"\r\n        record = (fname, category, prediction, prob)\r\n        cursor.execute(qry, record)\r\n\r\n        connection.commit()\r\n        count = cursor.rowcount\r\n\r\n    except (Exception, psycopg2.Error) as error:\r\n        if connection:\r\n            print(\"Failed to insert record into category_results table\", error)\r\n\r\n    finally:\r\n        cursor.close()\r\n        return count\r\n\r\n#\r\n# Routine to pull a message from the queue, call the classifier, and insert the result into the DB\r\n#\r\ndef callback(ch, method, properties, body):\r\n    data = json.loads(body)\r\n    fname = data['image']\r\n    cat = data['category']\r\n    print(\"Processing\", fname)\r\n    pred, prob = predictCategory(fname)\r\n    if (logToDB == 1):\r\n        count = InsertResult(pgconn, fname, int(cat), int(pred), float(prob))\r\n    else:\r\n        count = 1  # Ensure the message is ack'd and removed from queue\r\n\r\n    if (count &gt; 0):\r\n        ch.basic_ack(delivery_tag=method.delivery_tag)\r\n    else:\r\n        ch.basic_nack(delivery_tag=method.delivery_tag)\r\n\r\nlogToDB = 1    # Set this to 0 to disable storing data in the database\r\n\r\npgconn = psycopg2.connect(user=\"postgres\", password=\"password\",\r\n                host=\"<strong>&lt;Your host IP&gt;<\/strong>\", port=\"5432\", database=\"postgres\")\r\n\r\nconnection = pika.BlockingConnection(pika.ConnectionParameters(host='<strong>&lt;Your host IP&gt;<\/strong>'))\r\nchannel = 
connection.channel()\r\n\r\nchannel.queue_declare(queue='image_queue', durable=True)\r\nprint(' [*] Waiting for messages. To exit press CTRL+C')\r\n\r\nchannel.basic_qos(prefetch_count=1)\r\nchannel.basic_consume(queue='image_queue', on_message_callback=callback)\r\n\r\nchannel.start_consuming()<\/pre>\n<p>&nbsp;<\/p>\n<p>Let\u2019s pick this apart a little. After importing the required libraries, I define a function <strong>predictCategory<\/strong> that takes as its argument a filename identifying an image to classify. It uses a pretrained model from the Gluon library and returns a classification together with a classification confidence.<\/p>\n<p>The next function, <strong>InsertResult<\/strong>, writes a single record into the database containing the path of the image being processed, the category it should have been, the category it was predicted to be, and a prediction confidence.<\/p>\n<p>The final function is a <strong>callback<\/strong> function that pulls these together. It deconstructs the message\u2019s JSON payload, calls the function to categorise the image, and then calls the function recording the result. If there are no functional errors, then we acknowledge (<strong>basic_ack<\/strong>) receipt of the message and it is removed from the queue. If there are functional errors, then we issue a <strong>basic_nack<\/strong> and place the message back on the queue. Another worker can then take it, or we can retry it later. This ensures that if a worker process dies or is interrupted, everything in the queue can eventually be processed.<\/p>\n<p>There is also a variable <strong>logToDB<\/strong>, which you can set to 0 or 1 to disable or enable logging to the database. 
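The ack/nack branch at the heart of the callback can be exercised on its own. Below is a minimal, stdlib-only sketch: FakeChannel, FakeMethod, and the insert_ok flag are my own hypothetical stand-ins for the pika channel, the delivery metadata, and the InsertResult rowcount; they are not part of pika or the original worker.py.

```python
import json

# Hypothetical stand-ins for pika's channel/method objects, used only to
# exercise the ack/nack decision without a live RabbitMQ broker.
class FakeMethod:
    delivery_tag = 1

class FakeChannel:
    def __init__(self):
        self.acked, self.nacked = [], []

    def basic_ack(self, delivery_tag):
        self.acked.append(delivery_tag)

    def basic_nack(self, delivery_tag):
        self.nacked.append(delivery_tag)

def callback(ch, method, body, insert_ok=True):
    data = json.loads(body)            # same payload shape the worker unpacks
    count = 1 if insert_ok else 0      # stands in for InsertResult's rowcount
    if count > 0:
        ch.basic_ack(delivery_tag=method.delivery_tag)    # remove from queue
    else:
        ch.basic_nack(delivery_tag=method.delivery_tag)   # requeue for retry
    return data

msg = json.dumps({"image": "/images/CIFAR-10-images/test/dog/0573.jpg", "category": 5})
ch = FakeChannel()
callback(ch, FakeMethod(), msg, insert_ok=False)
print(ch.nacked)   # → [1]: the failed insert puts the message back on the queue
```

A failed database insert therefore leaves the message available for another worker, which is exactly the at-least-once behaviour the prose describes.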
It might be useful to see whether the database is a significant bottleneck by testing performance with and without logging.<\/p>\n<p>I create a connection to the database, a connection to RabbitMQ using the host\u2019s IP address, and a channel using the <strong>image_queue<\/strong> queue. Once again, be aware that the host\u2019s IP address will reroute any message requests to the underlying container hosting our RabbitMQ service.<\/p>\n<p>I then wait on messages to appear forever, processing queue items one by one.<\/p>\n<pre>$ <strong>docker build -t worker .<\/strong>\r\nSending build context to Docker daemon\u00a0 5.632kB\r\nStep 1\/6 : FROM ubuntu\r\n---&gt; 94e814e2efa8\r\nStep 2\/6 : RUN apt-get update\r\n---&gt; Running in 3cbb2343f94f\r\n.\r\n.\r\n.\r\nStep 6\/6 : ADD worker.py \/\r\n---&gt; bc96312e6352\r\nSuccessfully built bc96312e6352\r\nSuccessfully tagged worker:latest\r\n$<\/pre>\n<p>&nbsp;<\/p>\n<p>We can start a worker to begin the process of categorising our images.<\/p>\n<pre>$ <strong>docker run --rm -itd -v scalable-app_image_data:\/images worker<\/strong>\r\n061acbfcf1fb4bdf43b90dd9b77c2aca67c4e1d012777f308c5f89aecad6aa00\r\n$\r\n$ docker logs 061a\r\n<strong>[*] Waiting for messages. To exit press CTRL+C\r\nProcessing \/images\/CIFAR-10-images\/test\/dog\/0573.jpg\r\nModel file is not found. 
Downloading.\r\nDownloading \/root\/.mxnet\/models\/cifar_resnet110_v1-a0e1f860.zip from https:\/\/apache-mxnet.s3-accelerate.dualstack.amazonaws.com\/gluon\/models\/cifar_resnet110_v1-a0e1f860.zip...\r\n6336KB [00:04, 1374.69KB\/s]<\/strong>\r\nThe input picture is classified as [dog], with probability 0.998.\r\nProcessing \/images\/CIFAR-10-images\/test\/dog\/0057.jpg\r\nThe input picture is classified as [dog], with probability 0.996.\r\nProcessing \/images\/CIFAR-10-images\/test\/dog\/0443.jpg\r\nThe input picture is classified as [deer], with probability 0.953.\r\n.\r\n.<\/pre>\n<p>&nbsp;<\/p>\n<p>Clearly, it\u2019s not great practice to use a training set as part of a testing process. However, we\u2019re not measuring model effectiveness or accuracy here. We\u2019re simply seeking to understand how to categorise many thousands of images with a scalable approach, so any images will do, regardless of where they came from. The first thing the worker does is download a pretrained model; there\u2019s no need to train it. In your own environment, you may consider doing something similar by using the latest stable model to support the data being tested. 
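As an aside on reading these results: the numeric category and prediction values that end up in the database are simply indices into the class_names list defined in worker.py. A small helper makes the round trip explicit (label_of is my own hypothetical name, not part of the application):

```python
# class_names copied verbatim from worker.py; the database stores list indices.
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

def label_of(category):
    """Map a numeric category from category_results back to its label."""
    return class_names[int(category)]

print(label_of(5))   # → dog: the dog rows above are stored as category 5
```

This is why the database output later shows category 5 for the dog images, and why the misclassified example above would be recorded as prediction 4 (deer).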
It then takes an item from the queue, categorises it, removes it from the queue, and progresses to the next item.<\/p>\n<p>&nbsp;<\/p>\n<p>If we now query the database, it\u2019s clear that the worker has been busy:<\/p>\n<pre><strong>$ psql -h localhost -p 5432 -U postgres<\/strong>\r\nPassword for user postgres:\r\npsql (11.5)\r\nType \"help\" for help.\r\n\r\npostgres=# select * from category_results ;\r\n                   fname\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | category | prediction | confidence\r\n-------------------------------------------+----------+------------+------------\r\n \/images\/CIFAR-10-images\/test\/dog\/0826.jpg |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 5 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 5 |\u00a0\u00a0 0.999194\r\n \/images\/CIFAR-10-images\/test\/dog\/0333.jpg |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 5 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 5 |\u00a0\u00a0 0.992484\r\n.\r\n.\r\n.<\/pre>\n<p>&nbsp;<\/p>\n<p>Let\u2019s look at the queue itself:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS7.jpg\" alt=\"A screenshot showing more details of image_queue\" width=\"692\" height=\"564\" data-orig-srcset=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS7.jpg 692w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS7-300x245.jpg 300w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS7-307x250.jpg 307w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS7-330x269.jpg 330w, 
https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS7-400x326.jpg 400w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS7.jpg\" \/><\/p>\n<p>As you can see, there is a processing rate of 12 requests per second. Let\u2019s kick off a couple more workers:<\/p>\n<pre><strong>$ for w in 1 2 ; do docker run --rm -itd -v scalable-app_image_data:\/images worker; done<\/strong>\r\nee1732dd3d4a1abcd8ab356262603d8a24523dca237ea1102c3a953c86a221bf\r\na26c14a28b5605345ed6d09cd4d21d2478d34a8ce22668d0aac37a227af21c3e\r\n$<\/pre>\n<p>&nbsp;<\/p>\n<p>Look at the queue again:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS8.jpg\" alt=\"A screenshot showing more details of image_queue\" width=\"685\" height=\"534\" data-orig-srcset=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS8.jpg 685w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS8-300x234.jpg 300w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS8-321x250.jpg 321w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS8-330x257.jpg 330w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS8-400x312.jpg 400w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS8.jpg\" \/><\/p>\n<p>And now the ack rate has increased to 22 per second. You might at this point be thinking that increasing containers here is the next logical step. However, you shouldn\u2019t expect linear scalability. 
RabbitMQ has its own bottlenecks, as do the database and the Python code. There are many public resources that discuss improving RabbitMQ performance, including the use of prefetch counts, clustering, reduced queue sizes, multiple queues, or CPU affinity. For that matter, changing the code to use threads, parallelising certain functions, or even removing the durable flag are also likely to help. This article isn\u2019t going to focus on any of those, so I\u2019ll leave it to you to do your own research on what works for your code and scenarios. One other thing you might like to try at some point is to use RabbitMQ clusters with an HAProxy load balancer, which may improve your performance. A non-Docker example can be found <a href=\"https:\/\/www.cloudkb.net\/rabbitmq-cluster-setup-haproxy\/\">here<\/a>.<\/p>\n<p>In any case, let\u2019s convert what we have into a multi-container application using docker-compose. We can then use that as the basis for a Kubernetes environment.<\/p>\n<pre>$ tree containers\r\n\r\ncontainers\/\r\n\u251c\u2500\u2500 postgres\r\n\u2502   \u251c\u2500\u2500 Dockerfile\r\n\u2502   \u2514\u2500\u2500 pg-setup.sql\r\n\u251c\u2500\u2500 python\r\n\u2502   \u251c\u2500\u2500 Dockerfile\r\n\u2502   \u2514\u2500\u2500 fill_queue.py\r\n\u251c\u2500\u2500 rabbitmq\r\n\u2502   \u2514\u2500\u2500 Dockerfile\r\n\u2514\u2500\u2500 worker\r\n    \u251c\u2500\u2500 Dockerfile\r\n    \u2514\u2500\u2500 worker.py<\/pre>\n<p>&nbsp;<\/p>\n<p>We can convert all the work done so far into fewer steps with docker-compose and a couple of scripts. 
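Before wiring that up, the competing-consumers pattern that makes scaling workers worthwhile can be imitated with the standard library alone. This is an illustrative sketch only, with threads and an in-process queue.Queue standing in for worker containers and RabbitMQ; none of these names come from the application itself:

```python
import queue
import threading

# Stdlib-only sketch of competing consumers: several workers pull from one
# shared queue, so adding workers raises throughput until a shared resource
# (broker, database, CPU) becomes the bottleneck.
work = queue.Queue()
for i in range(100):
    work.put(f"/images/CIFAR-10-images/test/dog/{i:04d}.jpg")

processed = {}
lock = threading.Lock()

def worker(name):
    while True:
        try:
            item = work.get_nowait()   # each item is handed to exactly one worker
        except queue.Empty:
            return
        with lock:
            processed[name] = processed.get(name, 0) + 1
        work.task_done()

threads = [threading.Thread(target=worker, args=(f"w{n}",)) for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(processed.values()))   # → 100: every item consumed exactly once
```

Each message is delivered to exactly one consumer, which is the property that lets `docker-compose up --scale` add workers without any coordination in the worker code.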
In the directory holding the containers directory, create a new file called docker-compose.yml:<\/p>\n<pre>version: '3'\r\n\r\nservices:\r\n    sa_postgres:\r\n        build: containers\/postgres\r\n        ports:\r\n            - \"5432:5432\"\r\n        volumes:\r\n            - scalable-app_db_data:\/var\/lib\/postgresql\/data\r\n        environment:\r\n            - POSTGRES_PASSWORD=password\r\n\r\n    sa_rabbitmq:\r\n        build: containers\/rabbitmq\r\n        hostname: rabbitmq\r\n        ports:\r\n            - 5672:5672\r\n            - 15672:15672\r\n        volumes:\r\n            - scalable-app_mq_log:\/var\/log\/rabbitmq\r\n            - scalable-app_mq_data:\/var\/lib\/rabbitmq\r\n\r\n    sa_worker:\r\n        build: containers\/worker\r\n        depends_on:\r\n            - sa_postgres\r\n            - sa_rabbitmq\r\n        volumes:\r\n            - scalable-app_image_data:\/images\r\n        restart: always\r\n\r\nvolumes:\r\n    scalable-app_db_data:\r\n    scalable-app_image_data:\r\n    scalable-app_mq_data:\r\n    scalable-app_mq_log:<\/pre>\n<p>&nbsp;<\/p>\n<p>Now build the composite application:<\/p>\n<pre>$ <strong>docker-compose build<\/strong>\r\nBuilding sa_postgres\r\nStep 1\/4 : FROM postgres:11.5\r\n---&gt; 5f1485c70c9a\r\nStep 2\/4 : COPY pg-setup.sql \/docker-entrypoint-initdb.d\/\r\n---&gt; 2e57fe31a9ab\r\nStep 3\/4 : EXPOSE 5432\r\n---&gt; Running in 6f02f7f92a19\r\nRemoving intermediate container 6f02f7f92a19\r\n.\r\n.\r\n.<\/pre>\n<p>&nbsp;<\/p>\n<p>Before you start the composite application, make sure you do a \u2018docker ps -a\u2019 to see the currently running containers and stop\/remove them. 
When that\u2019s done, start the application and specify how many worker containers you want to service the queue.<\/p>\n<pre>$ <strong>docker-compose up -d --scale sa_worker=2<\/strong>\r\nCreating network \"scalable-app_default\" with the default driver\r\nCreating volume \"scalable-app_scalable-app_db_data\" with default driver\r\nCreating volume \"scalable-app_scalable-app_image_data\" with default driver\r\nCreating volume \"scalable-app_scalable-app_mq_data\" with default driver\r\nCreating volume \"scalable-app_scalable-app_mq_log\" with default driver\r\nCreating scalable-app_sa_python_1\u00a0\u00a0 ... done\r\nCreating scalable-app_sa_postgres_1 ... done\r\nCreating scalable-app_sa_rabbitmq_1 ... done\r\nCreating scalable-app_sa_worker_1\u00a0\u00a0 ... done\r\nCreating scalable-app_sa_worker_2\u00a0\u00a0 ... done\r\n.\r\n.<\/pre>\n<p>&nbsp;<\/p>\n<p>There are a couple of things to note here. First, there is now a network shared between all containers, so we won\u2019t have to refer to our host network within the code. We can now change our hostnames to refer to our other containers. Secondly, when we start and stop our application, everything is brought up together or, if needed, in an order that supports dependencies. Lastly, the constituent volumes and images are created with names that are prefixed by the application name, which helps identify how they\u2019re used and avoids conflicts with other resources.<\/p>\n<p>Let\u2019s bring the service down and make those changes.<\/p>\n<pre>$ <strong>docker-compose down<\/strong>\r\nStopping scalable-app_sa_worker_2\u00a0\u00a0 ... done\r\nStopping scalable-app_sa_worker_1\u00a0\u00a0 ... done\r\nStopping scalable-app_sa_rabbitmq_1 ... done\r\nStopping scalable-app_sa_postgres_1 ... done\r\nRemoving scalable-app_sa_worker_2\u00a0\u00a0 ... done\r\nRemoving scalable-app_sa_worker_1\u00a0\u00a0 ... done\r\nRemoving scalable-app_sa_rabbitmq_1 ... done\r\nRemoving scalable-app_sa_postgres_1 ... 
done\r\nRemoving scalable-app_sa_python_1\u00a0\u00a0 ... done\r\nRemoving network scalable-app_default\r\n$<\/pre>\n<p>&nbsp;<\/p>\n<p>In the containers\/worker\/worker.py file, make the following changes to your host identifiers:<\/p>\n<pre>.\r\n.\r\nlogToDB = 1    # Set this to 0 to disable storing data in the database\r\n\r\npgconn = psycopg2.connect(user=\"postgres\", password=\"password\",\r\n                          host=\"<strong>sa_postgres<\/strong>\", port=\"5432\", database=\"postgres\")\r\n\r\nconnection = pika.BlockingConnection(pika.ConnectionParameters(host='<strong>sa_rabbitmq<\/strong>'))\r\nchannel = connection.channel()\r\n.\r\n.<\/pre>\n<p>&nbsp;<\/p>\n<p>In your containers\/python\/fill_queue.py file, change your hostname:<\/p>\n<pre>HOSTNAME=\"<strong>sa_rabbitmq<\/strong>\"<\/pre>\n<p>&nbsp;<\/p>\n<p>And restart again:<\/p>\n<pre><strong>$ docker-compose up -d --scale sa_worker=2<\/strong>\r\nCreating network \"scalable-app_default\" with the default driver\r\nCreating volume \"scalable-app_scalable-app_db_data\" with default driver\r\nCreating volume \"scalable-app_scalable-app_image_data\" with default driver\r\nCreating volume \"scalable-app_scalable-app_mq_data\" with default driver\r\nCreating volume \"scalable-app_scalable-app_mq_log\" with default driver\r\nCreating scalable-app_sa_python_1\u00a0\u00a0 ... done\r\nCreating scalable-app_sa_postgres_1 ... done\r\nCreating scalable-app_sa_rabbitmq_1 ... done\r\nCreating scalable-app_sa_worker_1\u00a0\u00a0 ... done\r\nCreating scalable-app_sa_worker_2\u00a0\u00a0 ... done\r\n.\r\n.<\/pre>\n<p>&nbsp;<\/p>\n<p>You can now populate the message queue with images to process. The following script mounts the image volume on a temporary container, copies the images to the volume, and then starts a process to populate the queue.<\/p>\n<pre># clone the CIFAR images, if they're not already there\r\nif [ ! 
-d \"CIFAR-10-images\" ]; then\r\n    git clone https:\/\/github.com\/YoongiKim\/CIFAR-10-images.git\r\nfi\r\n\r\n# Start a small container to hold the images\r\nCID=$(docker run --rm -itd -v scalable-app_scalable-app_image_data:\/images alpine)\r\necho \"Copying content to container $CID:\/images\"\r\n\r\n# Copy the content\r\ndocker cp CIFAR-10-images $CID:\/images\r\ndocker stop $CID\r\n\r\ndocker run --rm -v scalable-app_scalable-app_image_data:\/images\u00a0 -it python python \/fill_queue.py<\/pre>\n<p>&nbsp;<\/p>\n<p>And as expected, we can see that the queue is both being populated and being processed by the worker nodes that are sitting in the background.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS9.jpg\" alt=\"A screenshot showing more details of image_queue, and that it is now processing\" width=\"695\" height=\"427\" data-orig-srcset=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS9.jpg 695w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS9-300x184.jpg 300w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS9-330x203.jpg 330w, https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS9-400x246.jpg 400w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2021\/08\/SRappAKS9.jpg\" \/><\/p>\n<p>&nbsp;<\/p>\n<h3>Conclusions<\/h3>\n<p>This post outlined how to containerise a multi-component application reflecting a typical data science classification process. It ingests images and provides a scalable mechanism for classifying them and recording the results. 
As mentioned, the focus here is not on good data science or containerisation practice, but on the options available to support learning about containerisation with a data science frame of reference.<\/p>\n<p>This post will be used as a foundation for the next part in this series, which will convert it to use Kubernetes and PaaS services.<\/p>\n<p>&nbsp;<\/p>\n<h3 class=\"x-hidden-focus\">About the author<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-thumbnail size-thumbnail alignright lazyloaded\" src=\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2019\/05\/Jon-Machtynger-150x150.jpg\" alt=\"Jon Machtynger\" width=\"150\" height=\"150\" data-sizes=\"\" data-src=\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2019\/05\/Jon-Machtynger-150x150.jpg\" data-srcset=\"\" \/>Jon is a Microsoft Cloud Solution Architect specialising in Advanced Analytics &amp; Artificial Intelligence with over 30 years of experience in understanding, translating and delivering leading technology to the market. He currently focuses on a small number of global accounts helping align AI and Machine Learning capabilities with strategic initiatives. 
He moved to Microsoft from IBM where he was Cloud &amp; Cognitive Technical Leader and an Executive IT Specialist.<\/p>\n<p class=\"x-hidden-focus\">Jon has been the Royal Academy of Engineering Visiting Professor for Artificial Intelligence and Cloud Innovation at Surrey University since 2016, where he lectures on various topics from machine learning, and design thinking to architectural thinking.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this first of two blogs, Jon Machtynger takes a look at building an environment locally using docker-compose, while making some observations about limitations.<\/p>\n","protected":false},"author":430,"featured_media":36918,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","footnotes":""},"categories":[594],"post_tag":[519],"content-type":[],"coauthors":[531],"class_list":["post-51966","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technetuk","tag-technet-uk"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Building Scalable Data Science Applications using Containers \u2013 Part 5 - Microsoft Industry Blogs - United Kingdom<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building Scalable Data Science Applications using Containers \u2013 Part 5 - Microsoft Industry Blogs - United Kingdom\" \/>\n<meta property=\"og:description\" content=\"In 
this first of two blogs, Jon Machtynger takes a look at building an environment locally using docker-compose, while making some observations about limitations.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Industry Blogs - United Kingdom\" \/>\n<meta property=\"article:published_time\" content=\"2021-08-27T15:35:06+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-10-06T15:53:57+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionthumb.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"800\" \/>\n\t<meta property=\"og:image:height\" content=\"450\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Jon Machtynger\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Jon Machtynger\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"18 min read\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/\"},\"author\":[{\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/author\/jon\/\",\"@type\":\"Person\",\"@name\":\"Jon Machtynger\"}],\"headline\":\"Building Scalable Data Science Applications using Containers \u2013 Part 5\",\"datePublished\":\"2021-08-27T15:35:06+00:00\",\"dateModified\":\"2021-10-06T15:53:57+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/\"},\"wordCount\":2506,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionthumb.jpg\",\"keywords\":[\"TechNet UK\"],\"articleSection\":[\"TechNet 
UK\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/\",\"url\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/\",\"name\":\"Building Scalable Data Science Applications using Containers \u2013 Part 5 - Microsoft Industry Blogs - United Kingdom\",\"isPartOf\":{\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionthumb.jpg\",\"datePublished\":\"2021-08-27T15:35:06+00:00\",\"dateModified\":\"2021-10-06T15:53:57+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk
\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#primaryimage\",\"url\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionthumb.jpg\",\"contentUrl\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionthumb.jpg\",\"width\":800,\"height\":450,\"caption\":\"An illustration representing a data warehouse, next to an illustration of Bit the Raccoon.\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Building Scalable Data Science Applications using Containers \u2013 Part 5\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/#website\",\"url\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/\",\"name\":\"Microsoft Industry Blogs - United Kingdom\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/#organization\",\"name\":\"Microsoft Industry Blogs - United 
Kingdom\",\"url\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2019\/08\/Microsoft-Logo.png\",\"contentUrl\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2019\/08\/Microsoft-Logo.png\",\"width\":259,\"height\":194,\"caption\":\"Microsoft Industry Blogs - United Kingdom\"},\"image\":{\"@id\":\"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Building Scalable Data Science Applications using Containers \u2013 Part 5 - Microsoft Industry Blogs - United Kingdom","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/","og_locale":"en_US","og_type":"article","og_title":"Building Scalable Data Science Applications using Containers \u2013 Part 5 - Microsoft Industry Blogs - United Kingdom","og_description":"In this first of two blogs, Jon Machtynger takes a look at building an environment locally using docker-compose, while making some observations about limitations.","og_url":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/","og_site_name":"Microsoft Industry Blogs - United 
Kingdom","article_published_time":"2021-08-27T15:35:06+00:00","article_modified_time":"2021-10-06T15:53:57+00:00","og_image":[{"width":800,"height":450,"url":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionthumb.jpg","type":"image\/jpeg"}],"author":"Jon Machtynger","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Jon Machtynger","Est. reading time":"18 min read"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#article","isPartOf":{"@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/"},"author":[{"@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/author\/jon\/","@type":"Person","@name":"Jon Machtynger"}],"headline":"Building Scalable Data Science Applications using Containers \u2013 Part 5","datePublished":"2021-08-27T15:35:06+00:00","dateModified":"2021-10-06T15:53:57+00:00","mainEntityOfPage":{"@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/"},"wordCount":2506,"commentCount":0,"publisher":{"@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/#organization"},"image":{"@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#primaryimage"},"thumbnailUrl":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionthumb.jpg","keywords":["TechNet UK"],"articleSection":["TechNet 
UK"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/","url":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/","name":"Building Scalable Data Science Applications using Containers \u2013 Part 5 - Microsoft Industry Blogs - United Kingdom","isPartOf":{"@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#primaryimage"},"image":{"@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#primaryimage"},"thumbnailUrl":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionthumb.jpg","datePublished":"2021-08-27T15:35:06+00:00","dateModified":"2021-10-06T15:53:57+00:00","breadcrumb":{"@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#primaryima
ge","url":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionthumb.jpg","contentUrl":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/01\/datasolutionthumb.jpg","width":800,"height":450,"caption":"An illustration representing a data warehouse, next to an illustration of Bit the Raccoon."},{"@type":"BreadcrumbList","@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/08\/27\/building-scalable-data-science-applications-using-containers-part-5\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/"},{"@type":"ListItem","position":2,"name":"Building Scalable Data Science Applications using Containers \u2013 Part 5"}]},{"@type":"WebSite","@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/#website","url":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/","name":"Microsoft Industry Blogs - United Kingdom","description":"","publisher":{"@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/#organization","name":"Microsoft Industry Blogs - United 
Kingdom","url":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2019\/08\/Microsoft-Logo.png","contentUrl":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2019\/08\/Microsoft-Logo.png","width":259,"height":194,"caption":"Microsoft Industry Blogs - United Kingdom"},"image":{"@id":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/#\/schema\/logo\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/posts\/51966","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/users\/430"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/comments?post=51966"}],"version-history":[{"count":0,"href":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/posts\/51966\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/media\/36918"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/media?parent=51966"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/categories?post=51966"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/post_tag?post=51966"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-gb\/indu
stry\/blog\/wp-json\/wp\/v2\/content-type?post=51966"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/coauthors?post=51966"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}