Our internal security team works diligently 24 hours a day, 7 days a week to help protect Microsoft IP, its employees, and its overall business health from security threats.
We recently implemented Microsoft Sentinel to replace a preexisting, on-premises solution for security information and event management (SIEM). With Microsoft Sentinel, we can ingest and appropriately respond to more than 20 billion cybersecurity events per day.
Microsoft Sentinel supplies cloud-scale SIEM functionality that allows integration with crucial systems, provides accurate and timely response to security threats, and supports the SIEM requirements of our team.
Our team is responsible for maintaining security and compliance standards across Microsoft. Managing the massive volume of incoming security-related data is critical to Microsoft’s business health. Historically, we have performed SIEM using a third-party tool hosted on-premises in Microsoft datacenters.
However, we recognized several areas in which they could improve their service by implementing a next-generation SIEM tool. Some of the challenges when using the old tool included:
- Limited ability to accommodate increasing incoming traffic. Ingesting data into the previous SIEM tool was time consuming due to limited ingestion processes. As the number of incoming cybersecurity events continued to grow, it became more evident that the solution we were using wouldn’t be able to maintain the necessary throughput for data ingestion.
- On-premises scalability and agility issues. The previous solution’s on-premises nature limited our ability to scale effectively and respond to changing business and security requirements at the speed that we required.
- Increased training requirements. We needed to invest more resources in training and onboarding with the previous solution, because it was on-premises and customized to meet our requirements. If we recruited employees from outside Microsoft, they needed to learn the new solution—including its complex on-premises architecture—from the ground up.
As part of our ongoing digital transformation, we’re moving to cloud-based solutions with proven track records and active, customer-facing development and involvement. We need our technology stack to evolve at the speed of our business.
[Read more about how we’re securing our enterprise and responding to cybersecurity attacks with Microsoft Sentinel. | Discover how we’re improving our security by protecting elevated-privilege accounts at Microsoft.]
Modernizing SIEM with Microsoft Sentinel
In response to the challenges presented, we began assessing options for a new SIEM environment that would address the challenges positioning our team to manage continued growth of the cybersecurity landscape.
Feature assessment and planning
In partnership with the Microsoft Sentinel product team, our internal security division assessed whether Sentinel would be a suitable replacement for our previous solution. Sentinel is a Microsoft-developed, cloud-native enterprise SIEM solution that uses the cloud’s agility and scalability to ensure rapid threat detection and response through:
- Elastic scaling.
- AI–infused detection capability.
- A broad set of out-of-the-box data connectivity and ingestion solutions.
To move to Microsoft Sentinel, we needed to verify that equivalent features and capabilities were available in the new environment. We aligned security teams across Microsoft to ensure that we met all requirements. Some of these teams had mature monitoring and detection definitions in place, and we needed to understand those scenarios to accommodate feature-performance requirements. The issues that our previous solution presented narrowed our focus with respect to whether Sentinel would work, including throughput, agility, and usability.
Throughout the assessment period and into migration, we worked closely with the Microsoft Sentinel product team to ensure that Microsoft Sentinel could provide the feature set we required. Our engagement with the Microsoft Sentinel team addressed two sets of needs simultaneously. We received significant incident-response benefits from Microsoft Sentinel while the product team worked with us as if we were a customer. This close collaboration meant that the product team could identify what enterprise-scale customers needed more quickly.
Not only were our requirements met, but we were able to provide feedback and testing for the Microsoft Sentinel product team. This helped them better serve their large customers that have similar challenges, requirements, and needs.
Defining and refining SIEM detections
As we developed standards that met our new requirements, we also evaluated our previous SIEM solution’s functionality to determine how it would transition to Microsoft Sentinel. We examined three key aspects of incoming security data ingestion and event detection:
- Data-source validity. We pull incoming SIEM data from hundreds of data locations across Microsoft. As time has passed, some of these data sources remained valid but others no longer provided relevant SIEM data. We assessed our entire data-source footprint to determine which data sources Microsoft Sentinel should ingest and which ones were no longer required. This process helped us to better understand our data-source environment and refine the amount of data ingested. There were several data sources that we weren’t ingesting with the previous solution because of performance limitations. We knew that we wanted to increase ingestion capability when moving to Microsoft Sentinel.
- Detection importance. Our team examined event-detection definitions used throughout the previous SIEM solution, so we could understand how detections were being performed, which detection definitions generated alerts, and the volume of alerts from each detection. This information helped us identify the most important detection definitions, so we could prioritize these definitions in the migration process.
- Detection validity. Our security teams evaluated the list of detections from our SIEM environment so we could identify invalid detections or detection definitions that required refinement. This helped us create a more streamlined set of detections when moving into Microsoft Sentinel, including combining multiple detection definitions and removing several detections.
Throughout this process, we worked with the Microsoft Security Operations team to evaluate detections end-to-end. They got involved in the detection and data-source refinement process and were exposed to how these detections and data sources would work in Microsoft Sentinel.
After feature parity and throughput capabilities were confirmed, we began the migration process from our previous solution to Microsoft Sentinel. Based on our initial testing, we added several implementation steps to ensure that our Sentinel environment would readily meet our security environment’s needs.
Onboarding data sources
Properly onboarding data sources was a critical component in our implementation and one of the biggest benefits of the Microsoft Sentinel environment. With the massive amount of default connectors available in Sentinel, we were able to connect to most of our data sources without further customization. This included cloud data sources such as Microsoft Azure Active Directory, Microsoft Defender for Cloud, and Microsoft Defender. However, it also included on-premises data sources, such as Windows Events and firewall systems.
We also connected to several enrichment sources that supplied more information for threat-hunting queries and detections. These enrichments sources included data from human-resources systems and other nontypical data sources. We used playbooks to create many of these connections.
We keep Microsoft Sentinel data in hot storage for 90 days, using Kusto Query Language (KQL) queries for detections, hunting, and investigation. We also use Microsoft Azure Data Explorer for warm storage and Microsoft Azure Data Lake for cold storage and retrieval for up to two years.
Based on testing, we refined our detection definitions further in Sentinel to support better alert suppression and aggregation. We didn’t want to overwhelm our Security Operations team with incidents. Therefore, we refined our detection definitions to include suppression logic when notification wasn’t required and aggregation logic to ensure that similar and related events were grouped together and not surfaced as multiple, individual alerts.
Increasing scale with the cloud
We used dedicated clusters for Microsoft Azure Monitor Log Analytics to support the data-ingestion scalability we required. At a large enterprise scale, our previous solution was exceeding its capacity at 10 billion events per day. With dedicated clusters, we were able to accommodate that initial volume and add additional data sources to improve alert detection, thereby increasing our event ingestion to > 20 billion events per day.
Our environment required several customizations to Sentinel functionality, which we implemented by using standard Microsoft Sentinel features and extension capabilities to meet our needs while still staying within the boundaries of standard functionality. Using common features for customization made our changes to Sentinel easy to document and helped our security operations team better and more quickly understand and use the new features. We made several important customizations including:
- Integration with our IT service-management system. We integrated Microsoft Sentinel with our security incident management solution. This had a two-fold positive effect, as it extended Sentinel information into our case-management environment and provided our support teams with exactly the information they need, regardless of which tool they’re in.
- Implementation of Microsoft Defender for Cloud playbook to support scale. We used a playbook to automate the addition of more than 20,000 Azure subscriptions to Microsoft Defender for Cloud.
- High volume ingestion with Microsoft Azure Event Hub and Microsoft Azure Virtual Machine scales sets. We built a custom solution that ingested the large volume of events from our firewall systems that exceeded the capabilities of on-premises collection agents. With the new solution, we can ingest more than 100,000 events per second into Microsoft Sentinel from on-premises firewalls.
We’ve experienced several important benefits from using Microsoft Sentinel as our SIEM tool, including:
- Faster query performance. Our query speed with Microsoft Sentinel improved drastically. It’s 12 times faster than it was with the previous solution, on average, and is up to 100 times faster with some queries.
- Simplified training and onboarding. Using a cloud-based, commercially available solution like Microsoft Sentinel means it’s much simpler to onboard and train employees. Our security engineers don’t need to understand the complexities of an underlying on-premises architecture. They simply start using Sentinel for security management.
- Greater feature agility. Microsoft Sentinel’s feature set and capabilities iterate at a much faster rate than we could maintain with our on-premises developed solution.
- Improved data ingestion. Microsoft Sentinel’s out-of-the box connectors and integration with the Microsoft Azure platform make it much easier to include data from anywhere and extend Sentinel functionality to integrate with other enterprise tools. On average, it’s 18 times faster to ingest data into Sentinel using a built-in data connector than it was with our previous solution.
Throughout our Microsoft Sentinel implementation, we reexamined and refined our approach to SIEM. At Microsoft’s scale, very few implementations go exactly as planned from beginning to end. However, we derived several points with our Sentinel implementation, including:
- More testing enables more refinement. We tested our detections, data sources, and processes extensively. The more we tested, the better we understood how we could improve test results. This, in turn, meant more opportunities to refine our approach.
- Customization is necessary but achievable. We capitalized on the flexibility of Microsoft Sentinel and the Microsoft Azure platform often during our implementation. We found that while out-of-the-box features didn’t meet all our requirements, we were able to create customizations and integrations to meet the needs of our security environment.
- Large enterprise customers might require a dedicated cluster. We used dedicated Log Analytics clusters to allow ingestion of nearly 20 billion events per day. In other large enterprise scenarios, moving from a shared cluster to a dedicated cluster might be necessary for adequate performance.
The first phase of our migration is complete! However, there’s still more to discover with Microsoft Sentinel. We’re taking advantage of new ways to engage and interact with connected datasets and using machine learning to manage some of our most complex detections. As we continue to grow our SIEM environment in Sentinel, we’re capitalizing on Sentinel’s cloud-based benefits to help meet our security needs at an enterprise level. Sentinel provides our security operations teams with a single SIEM solution that has all the tools they need to successfully complete and manage security events and investigations.
- Read more about how we’re securing our enterprise and responding to cybersecurity attacks with Microsoft Sentinel.
- Discover how we’re improving our security by protecting elevated-privilege accounts at Microsoft.
Want more information? Email us and include a link to this story and we’ll get back to you.
Please share your feedback with us—take our survey and let us know what kind of content is most useful to you.