Microsoft Digital works diligently 24 hours a day, 7 days a week to help protect Microsoft IP, its employees, and its overall business health from security threats. It recently implemented Azure Sentinel to replace a preexisting, on-premises solution for security information and event management (SIEM). With Azure Sentinel, Microsoft Digital can ingest and appropriately respond to more than 20 billion cybersecurity events per day. Azure Sentinel supplies cloud-scale SIEM functionality that allows integration with crucial systems, provides accurate and timely response to security threats, and supports the SIEM requirements of Microsoft Digital.
Understanding SIEM at Microsoft
Microsoft Digital is responsible for maintaining security and compliance standards across Microsoft. Managing the massive volume of incoming security-related data is critical to Microsoft’s business health. Historically, Microsoft Digital performed SIEM using a third-party tool hosted on-premises in Microsoft datacenters. However, we recognized several areas in which we could improve our service by implementing a next-generation SIEM tool. Some of the challenges with the old tool included:
- Limited ability to accommodate increasing incoming traffic. Ingesting data into the previous SIEM tool was time consuming due to limited ingestion processes. As the number of incoming cybersecurity events continued to grow, it became more evident that the solution we were using wouldn’t be able to maintain the necessary throughput for data ingestion.
- On-premises scalability and agility issues. The previous solution’s on-premises nature limited our ability to scale effectively and respond to changing business and security requirements at the speed that we required.
- Increased training requirements. We needed to invest more resources in training and onboarding with the previous solution, because it was on-premises and customized to meet our requirements. Employees recruited from outside Microsoft had to learn the solution, including its complex on-premises architecture, from the ground up.
As part of our ongoing digital transformation, Microsoft Digital is moving to cloud-based solutions with proven track records and active, customer-facing development and involvement. We need our technology stack to evolve at the speed of our business.
Modernizing SIEM with Azure Sentinel
In response to these challenges, we began assessing options for a new SIEM environment that would address them and position Microsoft Digital to manage the continued growth of the cybersecurity landscape.
Feature assessment and planning
In partnership with the Azure Sentinel product team, Microsoft Digital’s security division assessed whether Sentinel would be a suitable replacement for our previous solution. Sentinel is a Microsoft-developed, cloud-native enterprise SIEM solution that uses the cloud’s agility and scalability to ensure rapid threat detection and response through:
- Elastic scaling.
- AI-infused detection capability.
- A broad set of out-of-the-box data connectivity and ingestion solutions.
To move to Azure Sentinel, we needed to verify that equivalent features and capabilities were available in the new environment. We aligned security teams across Microsoft to ensure that we met all requirements. Some of these teams had mature monitoring and detection definitions in place, and we needed to understand those scenarios to accommodate feature-performance requirements. The issues that we experienced with our previous solution focused our assessment of whether Sentinel would work on three areas: throughput, agility, and usability.
Throughout the assessment period and into migration, Microsoft Digital worked closely with the Azure Sentinel product team to ensure that Azure Sentinel could provide the feature set Microsoft Digital required. Our engagement with the Sentinel team addressed two sets of needs simultaneously. We received significant incident-response benefits from Azure Sentinel while the product team worked with Microsoft Digital as if it were a customer. This close collaboration meant that the product team could identify what enterprise-scale customers needed more quickly. Not only were our requirements met, but we were able to provide feedback and testing for the Sentinel product team. This helped them better serve their large customers that have similar challenges, requirements, and needs.
Defining and refining SIEM detections
As we developed standards that met our new requirements, we also evaluated our previous SIEM solution’s functionality to determine how it would transition to Azure Sentinel. We examined three key aspects of incoming security data ingestion and event detection:
- Data-source validity. We pull incoming SIEM data from hundreds of data locations across Microsoft. As time has passed, some of these data sources remained valid but others no longer provided relevant SIEM data. We assessed our entire data-source footprint to determine which data sources Azure Sentinel should ingest and which ones were no longer required. This process helped us to better understand our data-source environment and refine the amount of data ingested. There were several data sources that we weren’t ingesting with the previous solution because of performance limitations. We knew that we wanted to increase ingestion capability when moving to Azure Sentinel.
- Detection importance. Our team examined event-detection definitions used throughout the previous SIEM solution, so we could understand how detections were being performed, which detection definitions generated alerts, and the volume of alerts from each detection. This information helped us identify the most important detection definitions, so we could prioritize these definitions in the migration process.
- Detection validity. Our security teams evaluated the list of detections from our SIEM environment so we could identify invalid detections or detection definitions that required refinement. This helped us create a more streamlined set of detections when moving into Azure Sentinel, including combining multiple detection definitions and removing several detections.
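The prioritization described above can be sketched in a few lines of Python. This is only an illustration, not the process our teams actually ran: it assumes that alert volume is a reasonable stand-in for importance, and the detection names, counts, and the `Detection` type are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    name: str              # hypothetical detection name
    alerts_per_month: int  # alert volume observed in the previous SIEM (made-up values)
    still_valid: bool      # outcome of the detection-validity review


def migration_order(detections):
    """Return only the valid detections, highest alert volume first."""
    valid = [d for d in detections if d.still_valid]
    return sorted(valid, key=lambda d: d.alerts_per_month, reverse=True)


detections = [
    Detection("impossible-travel-sign-in", 120, True),
    Detection("legacy-proxy-anomaly", 900, False),  # dropped: no longer valid
    Detection("privileged-group-change", 450, True),
]

ordered = [d.name for d in migration_order(detections)]
print(ordered)  # ['privileged-group-change', 'impossible-travel-sign-in']
```

In practice, importance would likely weigh more than raw volume (for example, severity or analyst feedback), so the sort key is the part most worth adapting.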
Throughout this process, we worked with the Security Operations team to evaluate detections end-to-end. They got involved in the detection and data-source refinement process and were exposed to how these detections and data sources would work in Azure Sentinel.
After feature parity and throughput capabilities were confirmed, we began the migration process from our previous solution to Azure Sentinel. Based on our initial testing, we added several implementation steps to ensure that our Azure Sentinel environment would readily meet our security environment’s needs.
Onboarding data sources
Properly onboarding data sources was a critical component in our implementation and one of the biggest benefits of the Azure Sentinel environment. With the large number of default connectors available in Sentinel, we were able to connect to most of our data sources without further customization. This included cloud data sources such as Azure Active Directory, Azure Security Center, and Microsoft Defender. It also included on-premises data sources, such as Windows Events and firewall systems.
We also connected to several enrichment sources that supplied more information for threat-hunting queries and detections. These enrichment sources included data from human-resources systems and other nontypical data sources. We used playbooks to create many of these connections.
We keep Azure Sentinel data in hot storage for 90 days, using Kusto Query Language (KQL) queries for detections, hunting, and investigation. We also use Azure Data Explorer for warm storage and Azure Data Lake for cold storage and retrieval for up to two years.
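A minimal Python sketch of this tiering follows, assuming data moves between tiers purely by age. The 90-day hot window comes from the text. The warm-to-cold boundary is not stated anywhere in this article, so `WARM_LIMIT` is a placeholder assumption, and "up to two years" is read here as the total retention limit and approximated as 730 days.

```python
from datetime import timedelta

HOT_LIMIT = timedelta(days=90)    # from the text: 90 days in Azure Sentinel
WARM_LIMIT = timedelta(days=365)  # ASSUMPTION: the warm/cold boundary is not stated
MAX_AGE = timedelta(days=730)     # "up to two years", approximated as 730 days


def storage_tier(age: timedelta) -> str:
    """Map the age of a record to the tier that would hold it."""
    if age < timedelta(0):
        raise ValueError("age must not be negative")
    if age <= HOT_LIMIT:
        return "hot"      # Azure Sentinel, queried with KQL
    if age <= WARM_LIMIT:
        return "warm"     # Azure Data Explorer
    if age <= MAX_AGE:
        return "cold"     # Azure Data Lake
    return "expired"      # past the retention limit


print(storage_tier(timedelta(days=30)))   # hot
print(storage_tier(timedelta(days=200)))  # warm
print(storage_tier(timedelta(days=500)))  # cold
```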
Based on testing, we refined our detection definitions further in Sentinel to support better alert suppression and aggregation. We didn’t want to overwhelm our Security Operations team with incidents. Therefore, we refined our detection definitions to include suppression logic when notification wasn’t required and aggregation logic to ensure that similar and related events were grouped together and not surfaced as multiple, individual alerts.
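To make the idea of suppression and aggregation concrete, here is a small Python sketch. It is not how detections are defined in Azure Sentinel, where this logic lives in the detection definitions themselves. The `Event` type, the suppression list, and the one-hour grouping window are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    detection: str  # name of the detection that fired (hypothetical)
    entity: str     # affected account or host (hypothetical)
    timestamp: int  # seconds since an arbitrary start time


def build_incidents(events, suppressed, window_seconds=3600):
    """Drop suppressed events, then group the remainder into incidents.

    Events that share a detection and an entity, and that arrive within
    `window_seconds` of the previous event in their group, join one incident.
    """
    groups = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.detection, event.entity)
        if key in suppressed:
            continue  # suppression: notification is not required
        incidents = groups[key]
        if incidents and event.timestamp - incidents[-1][-1].timestamp <= window_seconds:
            incidents[-1].append(event)  # aggregation: extend the open incident
        else:
            incidents.append([event])    # start a new incident
    return [incident for incidents in groups.values() for incident in incidents]


events = [
    Event("brute-force", "host-a", 0),
    Event("brute-force", "host-a", 600),
    Event("brute-force", "host-a", 9000),   # more than an hour later: new incident
    Event("port-scan", "scanner-01", 100),  # known scanner: suppressed
    Event("brute-force", "host-b", 50),
]
incidents = build_incidents(events, suppressed={("port-scan", "scanner-01")})
print([len(incident) for incident in incidents])  # [2, 1, 1]
```

Five raw events become three incidents, which is the effect the refinement aims for: fewer, more meaningful items in front of the Security Operations team.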
Increasing scale with the cloud
We used dedicated clusters for Azure Monitor Log Analytics to support the data-ingestion scalability we required. At a large enterprise scale, our previous solution was exceeding its capacity at 10 billion events per day. With dedicated clusters, we were able to accommodate that initial volume and add more data sources to improve alert detection, thereby increasing our event ingestion to almost 20 billion events per day.
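For a sense of scale, the daily figures above convert to average per-second rates as follows. The calculation assumes events arrive evenly over the day, which real traffic does not, so peak rates would be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_events_per_second(events_per_day: int) -> float:
    """Average ingestion rate, assuming an even spread across the day."""
    return events_per_day / SECONDS_PER_DAY


previous = average_events_per_second(10_000_000_000)  # previous solution's limit
current = average_events_per_second(20_000_000_000)   # approximate volume today

print(f"{previous:,.0f} events per second")  # 115,741 events per second
print(f"{current:,.0f} events per second")   # 231,481 events per second
```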
Our environment required several customizations to Sentinel functionality. We implemented them by using standard Azure Sentinel features and extension capabilities, which let us meet our needs while staying within the boundaries of standard functionality. Using common features for customization made our changes to Azure Sentinel easy to document and helped our security operations team understand and use the new features more quickly. We made several important customizations, including:
- Integration with our IT service-management system. We integrated Azure Sentinel with our security incident management solution. This had a two-fold positive effect, as it extended Sentinel information into our case-management environment and provided our support teams with exactly the information they need, regardless of which tool they’re in.
- Implementation of an Azure Security Center playbook to support scale. We used a playbook to automate the addition of more than 800 Azure subscriptions to Azure Security Center. We’ll use this same automation soon to include approximately 20,000 additional subscriptions.
- High-volume ingestion with Azure Event Hubs and Azure Virtual Machine Scale Sets. We built a custom solution to ingest the large volume of events from our firewall systems, which exceeded the capabilities of on-premises collection agents. With the new solution, we can ingest more than 100,000 events per second into Azure Sentinel from on-premises firewalls.
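One way to reason about sizing a scale set for a rate like 100,000 events per second is sketched below. This article does not describe how the firewall ingestion solution was actually sized, so the per-instance capacity of 8,000 events per second and the 25 percent headroom are purely hypothetical inputs.

```python
import math


def collectors_needed(peak_eps: float, per_instance_eps: float, headroom: float = 0.25) -> int:
    """Estimate how many collector instances a target event rate requires.

    `per_instance_eps` and `headroom` are hypothetical planning inputs,
    not measured values from any real deployment.
    """
    if peak_eps < 0 or per_instance_eps <= 0 or headroom < 0:
        raise ValueError("rates must be positive and headroom non-negative")
    return math.ceil(peak_eps * (1 + headroom) / per_instance_eps)


instances = collectors_needed(100_000, per_instance_eps=8_000)
print(instances)  # 16
```

Because a scale set can add or remove instances, an estimate like this is a starting point rather than a fixed commitment.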
We’ve experienced several important benefits from using Azure Sentinel as our SIEM tool, including:
- Faster query performance. Our query speed with Azure Sentinel improved drastically. It’s 12 times faster than it was with the previous solution, on average, and is up to 100 times faster with some queries.
- Simplified training and onboarding. Using a cloud-based, commercially available solution like Azure Sentinel means it’s much simpler to onboard and train employees. Our security engineers don’t need to understand the complexities of an underlying on-premises architecture. They simply start using Sentinel for security management.
- Greater feature agility. Azure Sentinel’s feature set and capabilities iterate at a much faster rate than we could maintain with our on-premises developed solution.
- Improved data ingestion. Azure Sentinel’s out-of-the-box connectors and integration with the Azure platform make it much easier to include data from anywhere and extend Azure Sentinel functionality to integrate with other enterprise tools. On average, it’s 18 times faster to ingest data into Azure Sentinel using a built-in data connector than it was with our previous solution.
Throughout our Sentinel implementation, we reexamined and refined our approach to SIEM. At Microsoft’s scale, very few implementations go exactly as planned from beginning to end. However, we learned several lessons from our Sentinel implementation, including:
- More testing enables more refinement. We tested our detections, data sources, and processes extensively. The more we tested, the better we understood how we could improve test results. This, in turn, meant more opportunities to refine our approach.
- Customization is necessary but achievable. We capitalized on the flexibility of Azure Sentinel and the Azure platform often during our implementation. We found that while out-of-the-box features didn’t meet all our requirements, we were able to create customizations and integrations to meet the needs of our security environment.
- Large enterprise customers might require a dedicated cluster. We used dedicated Log Analytics clusters to allow ingestion of nearly 20 billion events per day. In other large enterprise scenarios, moving from a shared cluster to a dedicated cluster might be necessary for adequate performance.
The first phase of our migration is complete! However, there’s still more to discover with Azure Sentinel. We’re taking advantage of new ways to engage and interact with connected datasets and using machine learning to manage some of our most complex detections. As we continue to grow our SIEM environment in Azure Sentinel, we’re capitalizing on Sentinel’s cloud-based benefits to help meet our security needs at an enterprise level. Sentinel provides our security operations teams with a single SIEM solution that has all the tools they need to manage security events and complete investigations successfully.
© 2022 Microsoft Corporation. This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.