German car manufacturer BMW has always been at the forefront of technological advancement within the auto industry. And a major part of the company’s innovation focus is on its production system, ensuring quality and flexibility alike.
When it comes to cutting-edge production systems, and more precisely logistics, autonomous vehicles are among the most innovative new technologies. However, in order to use these vehicles in a manufacturing environment, they need to be administered by open transportation services. For the moment, these robots and the systems that run them are still vendor-specific.
While open source technology, IoT, and cloud-based connectivity have dramatically increased in other areas of manufacturing, to date this has not been the case with robotics. BMW needs a way to connect their robots to an open source platform, allowing a heterogeneous fleet of robots to work together in perfect harmony with the human workers on the assembly line.
BMW aims to truly bring innovation to the factory floor by creating a system through which all of the robots in its colossal manufacturing plants – from the warehouses to the production lines – are connected with and communicating via one cloud-based, open source system.
This new system is production critical; hence it has to be highly available, robust, and easily scalable, so that it may be used for a significant number of robots. To accelerate this technical transformation, BMW has partnered with Microsoft to tackle this robotics innovation challenge.
Objectives and Challenges
BMW highlighted one process on the production line where efficiency could be dramatically increased by adding multiple robots working side by side with humans. Until now, parts have mainly been transported from the warehouse to specific assembly docks on the production line using route trains. BMW wanted autonomous robots to take over this process.
Microsoft and BMW worked through the objectives together and identified three challenges to collaborate on. The first was to create an architecture for steering autonomous transport robots that complies with all of the criteria above. The second was to connect the orchestration system to hundreds of physical robots running on the factory floor in an automated and scalable manner. Finally, given the high cost of interruptions to the BMW production line, there was value in simulating the orchestration of the robots and their behavior. This would provide a higher degree of confidence in the decisions regarding robot fleet size, robot behavior, and orchestration algorithms.
Over the course of twelve months Microsoft and BMW partnered three different times to help BMW with its vision for technical transformation. An open-source package called ROS-Industrial was used to help provide the building blocks for the robotics work.
The First Engagement: Orchestration
When a human worker on the assembly line empties a parts bin, the bin needs to be swapped out with a fresh bin that has the right parts for the human worker to continue to assemble the vehicles that are coming down the assembly line. The orchestration engine is a collection of micro services in Azure that receive the order from a backend system, select the most suitable robot of the fleet, and assign the order to it so that parts can be transported to the assembly line.
The first integration edge of the orchestration system is an ingest endpoint that receives the order and safely stores it before placing it in a queue for processing. Once the order has been saved and queued, the orchestration system responds with an HTTP 200 status. The order is then picked up by the Order Manager for assignment to a robot.
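The store-then-queue-then-acknowledge sequence can be sketched in a few lines. This is a minimal illustration only: the class name `OrderIngest`, the order fields `part` and `dock`, and the in-memory dictionary and deque standing in for durable storage and a message queue are all assumptions, not the services used in the actual system.

```python
import json
import uuid
from collections import deque


class OrderIngest:
    """Hypothetical ingest endpoint: persist first, then enqueue, then acknowledge."""

    def __init__(self):
        self.store = {}        # stand-in for durable storage
        self.queue = deque()   # stand-in for a message queue

    def handle(self, body: str):
        """Return an (HTTP status, response payload) pair for one request body."""
        try:
            order = json.loads(body)
        except json.JSONDecodeError:
            return 400, {"error": "body is not valid JSON"}
        # 'part' and 'dock' are assumed field names for illustration.
        if not isinstance(order, dict) or "part" not in order or "dock" not in order:
            return 400, {"error": "order needs 'part' and 'dock'"}

        order_id = str(uuid.uuid4())
        self.store[order_id] = order   # save before queueing so the order is never lost
        self.queue.append(order_id)    # the Order Manager would consume from here
        return 200, {"order_id": order_id}


ingest = OrderIngest()
status, reply = ingest.handle('{"part": "door-seal", "dock": "A7"}')
print(status, reply["order_id"] in ingest.store)
```

Acknowledging only after both the save and the enqueue succeed means the backend system can safely retry any request that did not receive a 200.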
The Order Manager looks at all of the orders that are stored in the queue, asks the Fleet Manager for an inventory of available robots, and creates a plan to deliver them. The process of planning the delivery of each order is a highly customized function that is deeply influenced by the environment where the work is being done. In our shared GitHub repositories we have created a simple planning plug-in (an assignment class) that can easily be swapped out for something more specific to someone else’s needs.
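To make the idea of a swappable assignment class concrete, here is a small sketch of one possible strategy: give each queued order the nearest robot that is still free. The class and field names are hypothetical and the greedy nearest-robot rule is a stand-in chosen for illustration; it is not the planner shipped in the repositories.

```python
import math
from dataclasses import dataclass


@dataclass
class Robot:
    robot_id: str
    x: float
    y: float
    available: bool


@dataclass
class Order:
    order_id: str
    pickup_x: float
    pickup_y: float


class NearestRobotAssignment:
    """Hypothetical plug-in: give each order the closest robot that is still free."""

    def plan(self, orders, robots):
        free = [r for r in robots if r.available]
        plan = {}
        for order in orders:  # orders are handled in queue order
            if not free:
                break  # remaining orders wait for the next planning round
            best = min(
                free,
                key=lambda r: math.hypot(r.x - order.pickup_x, r.y - order.pickup_y),
            )
            plan[order.order_id] = best.robot_id
            free.remove(best)  # a robot carries one order at a time in this sketch
        return plan


robots = [Robot("r1", 0, 0, True), Robot("r2", 10, 0, True), Robot("r3", 5, 5, False)]
orders = [Order("o1", 9, 1), Order("o2", 1, 1), Order("o3", 5, 5)]
print(NearestRobotAssignment().plan(orders, robots))  # {'o1': 'r2', 'o2': 'r1'}
```

Because the Order Manager only depends on a `plan(orders, robots)` method in this sketch, a site-specific planner that accounts for traffic, battery, or priorities could replace the class without touching the rest of the pipeline.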
Orders that are ready to be sent to the robots are sent to a dispatcher. The dispatcher wraps the order in a job that the robot understands and sends the job to the robot over the Azure IoT Hub. The robot will periodically update the status of the job as it proceeds with the delivery of the order to the assembly line. While the robot is online and operating in the factory, it is also broadcasting telemetry back to the orchestration system (vehicle status). The telemetry sent from the robot to the orchestration system contains information about the battery level of the robot, location, and heading. The telemetry is sent through the Azure IoT Hub back to the orchestration system and is collected by the Fleet Manager. The Fleet Manager uses this information to calculate the availability of a robot, and also schedule the robot for maintenance functions like recharging its battery.
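The Fleet Manager's availability calculation can be illustrated with a short sketch. The telemetry fields mirror those named above (battery level, location, heading); the `busy` flag, the 25% battery threshold, and the 10-second staleness window are assumptions made for the example.

```python
from dataclasses import dataclass

LOW_BATTERY = 0.25    # assumed threshold: below this the robot should recharge
STALE_AFTER_S = 10.0  # assumed: telemetry older than this means the robot is offline


@dataclass
class Telemetry:
    robot_id: str
    battery: float      # 0.0 (empty) to 1.0 (full)
    x: float
    y: float
    heading_deg: float
    timestamp: float    # seconds
    busy: bool = False  # assumed flag: True while a job is in progress


class FleetManager:
    """Hypothetical fleet view built from the newest telemetry of each robot."""

    def __init__(self):
        self.latest = {}

    def on_telemetry(self, msg: Telemetry):
        self.latest[msg.robot_id] = msg  # keep only the newest report per robot

    def state(self, robot_id: str, now: float) -> str:
        msg = self.latest.get(robot_id)
        if msg is None or now - msg.timestamp > STALE_AFTER_S:
            return "offline"
        if msg.busy:
            return "busy"
        if msg.battery < LOW_BATTERY:
            return "needs_charge"  # candidate for a recharge job
        return "available"

    def available_robots(self, now: float):
        return [rid for rid in self.latest if self.state(rid, now) == "available"]


fleet = FleetManager()
fleet.on_telemetry(Telemetry("ats-1", 0.90, 1.0, 2.0, 90.0, timestamp=100.0))
fleet.on_telemetry(Telemetry("ats-2", 0.10, 4.0, 2.0, 0.0, timestamp=100.0))
print(fleet.available_robots(now=105.0))  # ['ats-1']
```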
The Second Engagement: ROS Integration
Once BMW had an orchestration system capable of scheduling and dispatching work across a fleet of autonomous robots, our next challenge was to connect the robots to that system in a plug-and-play fashion. BMW’s Autonomous Transport System (ATS) robots run on a software stack built atop a flexible open-source framework called the Robot Operating System (ROS). ROS lacked the ability to natively talk to Azure IoT Hub, and furthermore needed a way to interpret and respond to the commands issued by the orchestration system.
Our next engagement set out to solve exactly those challenges with the goal of having real ATS robots fulfilling real work in the real world. At its core ROS runs a peer-to-peer network of processes called nodes which support both synchronous RPC-style communication over services and asynchronous streaming of data over topics. Crucially, although nodes communicate in a peer-to-peer fashion, a centralized ROS Master node acts as a lookup service for service and topic registrations. ROS code is built and distributed as packages that anyone can develop to extend the native capabilities of ROS using any of several supported languages.
However, although ROS is distributed with a variety of tools and capabilities for commonly used functionality, we needed to create our own packages to handle the integration with Azure IoT Hub and the interactions between the ATS robots and the orchestration system. We designed the Azure IoT Hub Relay as an adapter to bridge between the MQTT interface exposed by Azure IoT Hub and the native messaging capabilities of ROS. The Relay runs on the robot as a ROS node. It relays messages in both directions between ROS topics and the IoT Hub, and it fulfills commands from IoT Hub by invoking ROS services. With the Relay running on our ATS robot we could register our robot as an IoT device in Azure IoT Hub and start sending messages in both directions. We then released the Relay back to the ROS community as an open-source project to give ROS developers the ability to leverage all of the capabilities of the Azure cloud. You can check out the GitHub repository at https://github.com/Microsoft/ros_azure_iothub.
This gave us a ROS topic containing the stream of messages from the orchestration system but nothing listening to that topic. We therefore then had to build another ROS node that could understand those messages and respond appropriately. In the case of the ATS robots our initial instruction set was pretty basic: move to a given location, pick up a bin, and drop a bin. The basic navigation components in ROS support moving to a given coordinate pair – we just needed to forward that instruction to the appropriate ROS topic in the format that the navigation node expected. Similarly, another custom node provided services that would drive the servos needed to raise and lower the ATS robot. Our node was essentially a controller which would delegate the handling of commands by either publishing a message on a ROS topic or calling a particular service – a common design pattern in ROS.
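The controller pattern described here can be sketched without any ROS dependency by injecting the two operations it delegates to. Everything named below is hypothetical: the `publish` and `call_service` callables stand in for a ROS publisher and a ROS service proxy, and the topic name `/navigation/goal` and service names `/lift/raise` and `/lift/lower` are placeholders rather than the names used on the ATS robots.

```python
class CommandController:
    """Routes each command either to a topic publisher or to a service call."""

    def __init__(self, publish, call_service):
        self._publish = publish            # hypothetical: publish(topic, message)
        self._call_service = call_service  # hypothetical: call_service(name, request)
        self._handlers = {
            "move_to": self._move_to,
            "pick_up": lambda cmd: self._call_service("/lift/raise", {}),
            "drop": lambda cmd: self._call_service("/lift/lower", {}),
        }

    def _move_to(self, cmd):
        # Reformat the cloud command into the shape the navigation node expects.
        goal = {"x": float(cmd["x"]), "y": float(cmd["y"])}
        self._publish("/navigation/goal", goal)

    def handle(self, cmd) -> bool:
        handler = self._handlers.get(cmd.get("type"))
        if handler is None:
            return False  # unknown command; a real node would report an error
        handler(cmd)
        return True


sent, invoked = [], []
controller = CommandController(
    publish=lambda topic, msg: sent.append((topic, msg)),
    call_service=lambda name, req: invoked.append(name),
)
for command in [{"type": "move_to", "x": 3, "y": 4}, {"type": "pick_up"}, {"type": "drop"}]:
    controller.handle(command)
print(sent, invoked)
```

Keeping the command table in one dictionary is what makes the instruction set easy to extend: a new command is one more entry that maps to a publish or a service call.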
That took care of the Cloud-to-Device messaging – but we still needed to be able to send telemetry back to the orchestration system to keep it notified of the current state. Things like current location, battery level, and error conditions all needed to be taken into account when the orchestration system made decisions about work assignments. We did this by configuring the Relay node to listen to a particular ROS topic and to publish those messages as Device-to-Cloud messages via Azure IoT Hub. We tagged the messages with metadata that IoT Hub could use to route the incoming telemetry to an Azure EventHub message queue. The messages were then read and handled by an Azure Function which could inform the orchestration system of any changes in robot state.
In order for the navigation commands to work, the robot needs to actually understand where it is and where you would like it to go. ROS solves this problem using Simultaneous Localization and Mapping (SLAM). Given a map, SLAM works out the current location by looking around using its sensors and identifying the region on the map that looks like what it sees – a process called localization. Since it knows the coordinates of that spot on the map, it can plot out a path to navigate to the desired destination.
But how can the robot localize, much less navigate, without a map? We solved that challenge by creating a ROS node to bootstrap the ATS robot with an initial map and a service in the orchestration system to send a map on demand. When the robot starts up, this node sends a message over the Relay to the orchestration system and asks for an initial map. The service replies over Azure IoT Hub and the Relay node then publishes the map data on a particular topic that the Bootstrap node listens on. Once the map is loaded to the ROS mapping subsystem, the robot sends a telemetry message to the orchestration system informing it that the robot is ready to accept work.
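The bootstrap handshake amounts to a small state machine, sketched below. The message shapes (`map_request`, `status`), the state names, and the injected `send_to_cloud` and `load_map` callables are assumptions for illustration; in the real system these roles are played by the Relay node and the ROS mapping subsystem.

```python
class BootstrapNode:
    """Hypothetical bootstrap flow: request a map, load it, then report ready."""

    def __init__(self, send_to_cloud, load_map):
        self._send = send_to_cloud  # hypothetical: forwards a message via the Relay
        self._load_map = load_map   # hypothetical: hands the map to the mapping subsystem
        self.state = "idle"

    def start(self):
        self.state = "waiting_for_map"
        self._send({"type": "map_request"})

    def on_map(self, map_data):
        if self.state != "waiting_for_map":
            return  # ignore maps that were not requested
        self._load_map(map_data)
        self.state = "ready"
        # Only now may the orchestration system assign work to this robot.
        self._send({"type": "status", "ready": True})


outbox, loaded = [], []
node = BootstrapNode(send_to_cloud=outbox.append, load_map=loaded.append)
node.start()
node.on_map({"name": "hall-1", "resolution_m": 0.05})
print(node.state, [m["type"] for m in outbox])  # ready ['map_request', 'status']
```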
With those pieces in place we had a fully-functioning system. ATS robots could be brought online and bootstrap all the information they needed to accept work. The orchestration system could be quickly notified about any changes in robot state that might affect its decision making. And the robots could respond appropriately to an extensible list of commands from the orchestration system. BMW then set about testing a limited number of ATS robots in a designated part of one of their factories to see how the system worked in real life.
But all of this raised yet another set of questions. How can we know that the orchestration system will make good decisions given a large amount of work and a large fleet of robots? How can we verify that the ROS nodes running on the ATS robots behave well under real world conditions? How can we know how many robots are necessary to fulfill a given volume of work? To answer these and other pressing questions, we needed to turn to simulation.
The Third Engagement: Simulation
Path optimization, congestion control, and human safety protocols are just a few of the key challenges to be addressed in a factory environment with robots. Yet, many of the problems that arise with large fleets of robots do not manifest themselves when small numbers of experimental robots are first introduced to the factory floor. Attempting to run real life experiments at scale can cause interruptions on the factory floor, leading to lost revenue – not to mention the initial investment needed for the robot fleet. With simulation at scale, these concerns could be first explored in the virtual world without needing to invest in cost-prohibitive experiments in the real world. Microsoft, BMW, and ROS Industrial came together to create a scalable simulation system that could support such explorations.
There are three basic components needed to run ROS simulation at scale: a simulation engine, a ROS adapter, and an orchestrator to scale the fleet of ROS robots. The simulation engine drives the physics of the world and generates LIDAR and other types of sensor data, which would otherwise come from physical sensors on the robot. The simulation engine also tracks the movement of robots in the virtual space and moves them according to commands given from the robot. The ROS adapter transforms the sensor data from the simulator and publishes them to the robots in a standardized ROS format. Conversely, the adapter transforms ROS commands from the robot into movements executed by the simulator. Finally, an orchestrator is needed to deploy the simulator and the robots, and the orchestrator can ensure that communication can happen across the components.
The first step in building out the simulation system was to evaluate available options for simulation engines with a base requirement that they also be categorized as open source software. The engine needed to scale effectively for the overall simulation to support over 100 robots – this was critical. In addition, the simulation needed to run in real time since it would eventually be used to track the activity of robots executing real delivery jobs, albeit in the virtual world. To reach an informed decision on the simulator, Microsoft and ROS Industrial partnered to perform tests on each of the three most prevalent robotics simulation engines, utilizing a Real Time Factor (RTF) metric to gauge how well a simulator was able to maintain its speed as the number of robots increased.
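The Real Time Factor is conventionally the ratio of simulated time elapsed to wall-clock time elapsed, so a value of 1.0 means the simulation keeps pace with reality and lower values mean it is falling behind. The sketch below shows the calculation; the 5% tolerance and the sample numbers are made up for illustration and are not measurements from these tests.

```python
def real_time_factor(sim_elapsed_s: float, wall_elapsed_s: float) -> float:
    """Ratio of simulated seconds to wall-clock seconds over the same interval."""
    if wall_elapsed_s <= 0:
        raise ValueError("wall-clock interval must be positive")
    return sim_elapsed_s / wall_elapsed_s


def keeps_real_time(rtf: float, tolerance: float = 0.05) -> bool:
    """True if the simulator is within the (assumed) tolerance of real time."""
    return rtf >= 1.0 - tolerance


# Illustrative numbers only: 30 simulated seconds took 60 seconds of wall-clock time.
rtf = real_time_factor(sim_elapsed_s=30.0, wall_elapsed_s=60.0)
print(rtf, keeps_real_time(rtf))  # 0.5 False
```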
Gazebo, Stage, and ARGoS each had a niche target audience and performed differently under high load. Both Gazebo and Stage-Ros had heavy support from the ROS community and had a wide range of robots which could be run in simulation. Gazebo, in particular, was built specifically with ROS in mind, and the simulation environment works with ROS messages by default. Stage had a ROS package available and was installed by default with many releases of ROS. However, both Gazebo and Stage-Ros failed to perform at scale, and the RTF began to degrade quickly.
Ultimately, ARGoS was the most suitable choice. ARGoS – implemented with a multi-threaded architecture – could take advantage of many cores (up to 128 virtual cores in Azure), whereas both Gazebo and Stage-Ros could only leverage a few cores. The major drawback of ARGoS was its relatively small community of developers and its nearly nonexistent usage in the ROS community. There was only one open source project available at the time – a sample ARGoS-ROS bridge that demonstrated how to send a simple custom ROS message (list of objects) from ARGoS and receive movement messages in response. In order to leverage ARGoS for BMW’s use case, it required adding much more functionality to the bridge to take it beyond its initial sample concept and generalize it for ROS robotic simulation.
The ARGoS-ROS bridge allowed the simulator and robots to communicate over a number of essential topics. The robot needs to send movement commands to the simulation over the /cmd_vel topic. The simulation engine needs to send sensor data to the robots, including laser scans and odometry (over the /scan and /odom topics). Finally, the bridge needs to format the sensor data into ROS standardized messages and populate the fields necessary for ROS navigation libraries. Not all of the information required for ROS was available in ARGoS, so new generic ARGoS plugins for LIDAR and ground truth odometry were written to accommodate the necessary changes.
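One piece of the translation a bridge like this has to perform can be sketched as follows, assuming a differential-drive robot whose simulator model is driven by left and right wheel speeds (an assumption; the source does not describe the actuator interface). A ROS velocity command carries a forward speed and a turn rate, and standard differential-drive kinematics split those into per-wheel speeds. The function name and the wheel separation value are hypothetical.

```python
def twist_to_wheel_speeds(linear_mps: float, angular_radps: float, wheel_separation_m: float):
    """Convert forward speed and turn rate into (left, right) wheel speeds in m/s.

    A positive turn rate is counter-clockwise, following the usual ROS convention,
    so the right wheel runs faster than the left when turning left.
    """
    half_track = wheel_separation_m / 2.0
    left = linear_mps - angular_radps * half_track
    right = linear_mps + angular_radps * half_track
    return left, right


print(twist_to_wheel_speeds(0.5, 0.0, 0.4))  # straight ahead: both wheels equal
print(twist_to_wheel_speeds(0.0, 1.0, 0.4))  # turn in place: wheels oppose each other
```

The reverse direction, from simulated wheel motion to the odometry that navigation expects, would invert the same relationship.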
The simulation engine also functioned as a clock server and as the source of truth for the current time in the virtual world. This required a way to share the centralized clock with the robots. Additional functionality was added to the ARGoS-ROS bridge to publish the simulated timestamp with each cycle of the simulation, so that the robots could stay in sync. Without such a mechanism, variations in simulation speeds led to abnormal robot behavior in simulation, given the simulator and robots had differing perceptions of time.
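The idea of a simulator-owned clock can be sketched as a tick counter that is converted into a timestamp on every cycle. In ROS, the conventional way to distribute such a timestamp is the `/clock` topic, which nodes follow when the `use_sim_time` parameter is set; whether the bridge uses exactly that mechanism is an assumption here. The class name and the tick rate are hypothetical.

```python
class SimClock:
    """Derives simulated time purely from the number of simulation steps taken."""

    def __init__(self, ticks_per_second: int):
        self.ticks_per_second = ticks_per_second
        self.tick = 0

    def step(self):
        """Advance one simulation cycle and return the timestamp to publish."""
        self.tick += 1
        return self.now()

    def now(self):
        """Return simulated time as whole seconds plus nanoseconds."""
        secs, remainder = divmod(self.tick, self.ticks_per_second)
        nsecs = remainder * 1_000_000_000 // self.ticks_per_second
        return secs, nsecs


clock = SimClock(ticks_per_second=10)
stamp = None
for _ in range(15):
    stamp = clock.step()  # a bridge would publish this stamp each cycle
print(stamp)  # (1, 500000000)
```

Because the timestamp depends only on the step count, the robots see time advance at whatever pace the simulator actually achieves, which keeps their timers and timeouts consistent with the virtual world even when the simulation runs slower than real time.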
BMW-specific changes were implemented in the simulator as well. ARGoS supported only a few types of robots for swarm robotics applications, but these robots did not have the same dimensions or behave in the same way as the custom BMW robots. In order for the simulation to mimic the real world, physics representations matching the BMW robots needed to be implemented. In addition, an OpenGL plugin was added so that visualization could be used. For the open source version of the project, a model of the Turtlebot Waffle Pi robot was written to appeal to the larger ROS community.
To run simulation at scale, a distributed approach was used for the robots. Kubernetes was chosen as the orchestrator, and it managed the deployment of both the simulator and the robots to a cluster. A dedicated node was given to the simulator, which consumed far more CPU and memory than the robots. The robots, on the other hand, each consumed only a small share of CPU and could be distributed across the cluster, since they had no dependencies on each other. Both ACS-Engine and AKS were suitable choices for creating a Kubernetes cluster on Azure. As the number of robots increased, ACS-Engine became the more practical choice, since heterogeneous clusters could be used to provision one very large simulator virtual machine and several smaller virtual machines for the robots.
ROS networking became a bottleneck as the number of robots grew. Originally, one shared ROS master node helped map topics between the simulator and robots. This pattern of the one master node and distinct namespaces for each of the robots was common in multi-robot configurations for Gazebo and Stage-Ros. But this design created several challenges. For one, many packages in ROS are not written to fully accommodate namespaces, which then caused mismatched topics. Also, the one ROS master had a shared transform tree that decayed in performance with each additional robot.
To resolve the issue, the open source project NIMBRO helped relay specific topics to the other nodes, which allowed for a ROS master node for each robot. This is the most desirable configuration, as robots are run in the real world with their own ROS master, and the simulation should test ROS controllers under similar conditions.
In the end, the simulation system scaled to more than 100 robots, and BMW was able to run scenarios that would otherwise require much higher investment for real-world testing. Additionally, a simulation environment served several functions. It not only provided a much-needed exploration environment for their questions around robot fleet management, but also served as a staging ground for new deployments to their factories, giving BMW more confidence that new releases would perform well under realistic loads.
Our partnership with BMW to develop a parts delivery system on Azure, connected to autonomous robots, was a success. Rolling out a robot fleet in a highly critical assembly line area can only be done with great care and consideration for the safety of the plant and its workers. To this end, BMW is currently testing the robot functions in a test facility representing a portion of the full assembly plant. Additional testing is being done within the simulation environment with the same live data used to run the plant. Together, these two testing methods will build confidence in the true capability of the robot and the accuracy of the overall system, and will help define the volume at which the robots can be safely released into production.
Looking ahead, BMW plans on running the simulation environment and the production environment side-by-side with the production data feeding into both. In the simulation environment, the virtual robots are running the same release of ROS as the real robots, the virtual world is using the same map as the robots use to navigate the real world, and the robots are communicating with an instance of the orchestration software that involves the same bits running in production. Bringing future upgrades through the simulation environment will provide a production-like test environment to validate the changes with very high levels of confidence.
Microsoft has created a ROS package that enables two-way communication between any ROS-powered robot and Microsoft’s Azure cloud via Azure IoT Hub. Telemetry from robots can be sent to Azure for processing while command and control messages can be sent through Azure to individual robots. Microsoft and BMW are actively improving the system, bringing new functionality (Edge) to ensure high availability and robustness for any assembly scenario.
We have shared two ROS frameworks on GitHub that you can download to learn more. One is a ROS simulation package that allows you to simulate up to 300 robots in a factory environment. The other is a ROS orchestration package that allows you to orchestrate the control of 300 robots from messages sent via Azure IoT.
Finally, all of the enhancements to the ARGoS simulation platform have been submitted back to their original projects for inclusion and the scripts necessary to run the environment in Azure are available online.