March 08, 2023

Kantar Group’s Media business accelerates insights, goes cloud-native with Azure Cosmos DB for PostgreSQL

Technical Story

Kantar Group

Kantar is a world-leading marketing data and analytics company that uses technology and expertise to help clients understand people and shape brands across more than 90 markets. With access to 25 years of brand, advertising, and media intelligence, Kantar’s reach is staggering. Its proven market research platform is used by a long roster of marquee clients to spot trends, target customers, and plan strategies, and that platform is going cloud-native on Azure. The only challenge was to find a data platform that could meet Kantar’s demanding scope and performance requirements. Kantar’s choice for its Media subsidiary is Azure Cosmos DB for PostgreSQL, a resilient, fully managed service with the power to scale out in seconds. The move to Azure was so seamless that Kantar Media’s customers didn’t even notice a change, which is exactly what the company wanted.

“We are building a huge global platform, and Azure is the right fit to help us consolidate and bridge our data assets across all our markets.”

Vivien Chen, Vice President of Software Architecture, Kantar Media

The platform behind 25 years of consumer trends

With offices around the world, Kantar is a global behemoth in brand, advertising, and audience analytics. Its customers include media moguls, agencies, and brand owners. As Kantar’s website underscores, “We know more about how people live, work, shop, vote, eat, drink, post and think than anyone else.”

Part of that knowledge comes from Kantar Media’s industry-leading advertising intelligence systems. The company generates, analyzes, and constantly refreshes terabytes of data. Across a suite of custom apps, the data is used in thousands of daily reports for Kantar customers. The reports are backed by a highly customized data platform that includes years of historical data. This archive makes Kantar unique in the industry and enables its customers to see trends over time. But it’s not easy to maintain.

As Kantar Media Vice President of Software Architecture Vivien Chen explains, “We’re a big company, and our growth is huge. Volume has always been the barrier we’re trying to overcome, in terms of performance and delivering insights for our clients.”

In 2021, another barrier arose when the company received some unwelcome news. Its data warehousing vendor unexpectedly announced the end of support for the on-premises system that Kantar Media had adopted only a few years earlier. The vendor’s replacement was a cloud-based container solution that Kantar judged insufficient for its needs.

“That hurt,” admits Chen. There was a silver lining, though. Kantar turned the situation into an opportunity to go all in with its cloud strategy, setting its sights on an open, cloud-native data platform. The caveat? “Nothing that would force us to a specific path,” Chen recalls. Flexibility was the key.

A disappointing round of comparison shopping

Chen’s team began looking for a cost-effective, scalable, and resilient solution capable of performing as well as, if not better than, its current data warehouse. Ideally, the solution would also work with the custom tooling and middleware layer in which Kantar Media had already invested years of work. As it happens, the list of suitable, high-performance candidates was short.

As Kantar Media Director of Technology Charles Lee points out, the maturity and complexity of Kantar’s platform made its use case unique. The existing on-premises solution served both as an extract-transform-load (ETL) pipeline for high-throughput transactional apps and as a number-crunching engine for analytics and reporting apps.

As Lee explains, it wasn’t easy to find a replacement with the concurrency to support both requirements. “We tested other platforms, running a 24-hour day of actual client report requests against each database candidate. The performance was not really impressive.”
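The replay test Lee describes can be pictured with a small harness. This is a minimal sketch, not Kantar's actual tooling: `replay`, `summarize`, and the `run_query` callback are hypothetical names, and the callback stands in for whatever client library a given database candidate would need. It replays requests one at a time, so it measures latency only; a real evaluation of concurrency would issue requests in parallel.

```python
from __future__ import annotations

import time
from statistics import median
from typing import Callable, Iterable


def replay(requests: Iterable[str], run_query: Callable[[str], object]) -> list[float]:
    """Run each recorded request in order; return per-request latency in seconds."""
    latencies = []
    for sql in requests:
        started = time.perf_counter()
        run_query(sql)  # hypothetical hook: executes the request on one candidate
        latencies.append(time.perf_counter() - started)
    return latencies


def summarize(latencies: list[float]) -> dict[str, float]:
    """Summarize a non-empty replay run; p95 uses the nearest-rank method."""
    ordered = sorted(latencies)
    rank = max(1, -(-95 * len(ordered) // 100))  # ceil(0.95 * n)
    return {
        "count": len(ordered),
        "median": median(ordered),
        "p95": ordered[rank - 1],
        "total": sum(ordered),
    }
```

Running the same recorded workload through `replay` for each candidate and comparing the `summarize` output gives a like-for-like view of how each one handles a full day of requests.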

The only cloud-native solution to check all the boxes

As part of a larger datacenter consolidation effort, Kantar has been moving its assets to Azure, shifting its IT budget from a capital expense to a predictable, pay-as-you-go operational expense. Azure also provided Chen and Lee with a data platform that met their steep requirements for performance, stability, scalability, and—importantly—cost. It was Azure Cosmos DB for PostgreSQL, powered by the Citus database extension to PostgreSQL, formerly known as Azure Database for PostgreSQL - Hyperscale (Citus).

“When I told the product manager that I had found a solution that matches what we do today and it's cheaper, it was a done deal,” Chen relates. Azure Cosmos DB for PostgreSQL proved itself during the proof-of-concept phase. “Azure Cosmos DB for PostgreSQL was the only solution that could run a full day of reporting for us. That's huge.”

Another factor was Kantar’s preference for open source and its experience with the popular PostgreSQL object-relational database system. Azure Cosmos DB for PostgreSQL is built on open-source PostgreSQL. That common basis not only eased the move to Azure but also supported one of Kantar’s top goals: code reuse. With a large codebase accumulated over the years, including legacy queries that use an earlier style of multiple selects and multiple temp tables, Kantar developers hoped to keep refactoring and data model changes to a minimum. It was the best way to reduce the impact on the numerous downstream apps that read from and write to the database. Fortunately, the syntax of the new solution was similar to that of Kantar’s existing codebase.

“The approach that we used in our architecture has stood the test of time. There was no reason to change that,” Chen observes.

Nonetheless, the team took the opportunity to “clean house,” as she puts it, optimizing the data model to take better advantage of the distributed tables that the Citus extension brings to Azure Cosmos DB for PostgreSQL. As Kantar Media’s scalability and performance requirements grow, apps can seamlessly scale to multiple nodes by transparently distributing the database tables. Distributed tables also support the low latency that Kantar needs for its complex queries.
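To make the idea of a distributed table concrete, here is a toy model of hash distribution. In Citus, a table is distributed with `SELECT create_distributed_table('table_name', 'distribution_column')`; the sketch below only imitates the resulting behavior. The hash function, the shard count, the round-robin placement, and the `advertiser_id` column in the usage note are all assumptions made for illustration, not Citus internals or Kantar's schema.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict

SHARD_COUNT = 32  # assumed value for illustration; configurable in a real cluster


def shard_for(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard index (toy hash, not Citus's)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def place_shards(shard_count: int, workers: list[str]) -> dict[int, str]:
    """Assign shards to worker nodes round-robin."""
    return {shard: workers[shard % len(workers)] for shard in range(shard_count)}


def distribute(rows: list[dict], column: str, workers: list[str]) -> dict[str, list[dict]]:
    """Group rows by the worker that owns each row's shard."""
    placement = place_shards(SHARD_COUNT, workers)
    by_worker = defaultdict(list)
    for row in rows:
        shard = shard_for(str(row[column]))
        by_worker[placement[shard]].append(row)
    return dict(by_worker)
```

For example, `distribute(rows, "advertiser_id", ["worker-1", "worker-2"])` sends every row with the same `advertiser_id` to the same worker. That is why adding nodes spreads the load without the application having to know where a row lives.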

Lee notes another benefit for developers. “Partition table names are transparent to applications, so that simplifies application logic.”

To get started, developers can set up a database cluster as a single node, knowing that the power of distributing tables is always available. They get the performance and scalability superpowers of the Citus extension, plus all the benefits of platform as a service (PaaS)—for example, a managed service that Azure helps keep secure and up to date.

“We wanted open source, a cloud-native solution, and no vendor lock-in,” Lee summarizes. “Azure Cosmos DB for PostgreSQL checks all the boxes.”

A scale-out architecture for fast queries

The move to Azure included a migration of Kantar Media’s middleware solution to Azure Virtual Machines. To enable the middleware to support Azure Cosmos DB for PostgreSQL, some code refactoring was inevitable, which led to more rewrites. Ultimately, the engineers decided to overhaul reporting and ETL processes to remove obsolete functionality.

“Could we have done a lift and shift? We could have done that,” Chen says, pointing out that the choices were driven by opportunity, not necessity. “The developers were really eager to do this. Instead of trying to fix technical debt, they said, ‘Let’s move to the cloud and make sure we do it right.’”

By optimizing the existing application and data architecture, the developers could take advantage of the latest Azure capabilities. The new architecture functionally parallels Kantar’s previous one. ETL features—those tools used to aggregate and combine data from multiple sources into a coherent store—are largely unchanged (see Figure 1).

When a customer generates a report, the request enters a centralized queue that is regularly polled by the reporting middleware. This self-contained system processes each request based on client profiles. It determines which data is needed based on criteria such as time, media, and product selections, and it validates the request against the user’s data access rights. Then it retrieves the optimal data set for the type of analysis.
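The request flow described above can be sketched in a few lines. This is an illustrative model only: the `ReportRequest` and `AccessRights` classes, their fields, and the `poll_once` function are hypothetical and do not reflect Kantar's middleware, which the article does not describe at code level.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportRequest:
    user: str
    media: frozenset[str]      # e.g. {"tv", "radio"}
    products: frozenset[str]   # product selections for the report
    start: str                 # ISO date bounds for the time criterion
    end: str


@dataclass(frozen=True)
class AccessRights:
    media: frozenset[str]
    products: frozenset[str]


def is_authorized(req: ReportRequest, rights: AccessRights) -> bool:
    """A request is valid only if every selection falls within the user's rights."""
    return req.media <= rights.media and req.products <= rights.products


def poll_once(
    queue: deque, rights_by_user: dict[str, AccessRights]
) -> list[tuple[ReportRequest, str]]:
    """Drain the queue once, labelling each request 'accepted' or 'rejected'."""
    results = []
    while queue:
        req = queue.popleft()
        rights = rights_by_user.get(req.user)
        ok = rights is not None and is_authorized(req, rights)
        results.append((req, "accepted" if ok else "rejected"))
    return results
```

In a fuller version, each accepted request would go on to query generation, where the time, media, and product selections determine which data set is read.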

Queries can be complex. They are generated at runtime, and the back end must deliver results as fast as possible. Sometimes this goal runs counter to the need for frequent data updates. To balance both needs, Kantar expects the architecture to provide results in near real time. That’s why, as Lee points out, “The parallelism and the scale-out of Azure Cosmos DB for PostgreSQL are definitely important features.”
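The parallelism Lee mentions follows a scatter-gather pattern: each shard computes a partial aggregate, and the partial results are merged into one answer. The sketch below shows only the shape of that pattern under simplifying assumptions. Shards are plain in-memory lists of hypothetical `(brand, spend)` rows, and Python threads stand in for worker nodes, so it demonstrates the structure rather than any real speedup.

```python
from __future__ import annotations

from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_sum(shard_rows: list[tuple[str, float]]) -> Counter:
    """Per-shard aggregate: total spend per brand for one shard's rows."""
    totals = Counter()
    for brand, spend in shard_rows:
        totals[brand] += spend
    return totals


def scatter_gather(shards: list[list[tuple[str, float]]]) -> dict[str, float]:
    """Run the per-shard aggregate concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(partial_sum, shards))
    merged = Counter()
    for part in partials:
        merged.update(part)  # Counter.update adds values for matching keys
    return dict(merged)
```

Because each partial aggregate touches only its own shard, adding shards (and the nodes that host them) shortens the slowest step, which is the property that matters when queries are generated at runtime and must return quickly.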

According to Chen, “The biggest challenges were selecting the proper distribution and partition keys and adapting our code to use the computing power of Azure Cosmos DB for PostgreSQL.”

The developers got help from the Microsoft FastTrack program, and a strong partnership formed between the companies. Kantar was an early adopter of the technology, so its feedback helped Microsoft improve the service, and the Kantar architecture continues to push the boundaries. For example, when Kantar experienced locking issues that affected performance during concurrent processing, Microsoft engineers quickly rolled out a fix.

“Every millisecond counts when you do a lot of concurrent usage,” Chen observes, noting that with these optimizations, Kantar Media is meeting its goal to provide reports to customers in near real time. “In this journey, the Microsoft team was very helpful.”

Kantar uses the Azure portal to manage and monitor the health of Azure Cosmos DB for PostgreSQL and its other Azure resources.

Figure 1. Azure Cosmos DB for PostgreSQL replaces Kantar Media’s on-premises data warehousing solution, providing distributed tables in a managed service that better meets the company’s growing scale and cloud-native strategy.

A data giant looks ahead

The decision to use Azure Cosmos DB for PostgreSQL became very straightforward. “There were not many solutions out there, and we tried as many as possible,” Chen recalls. The company can now store its entire data archive on Azure—that’s 25-plus years of data, and it’s growing daily. “I don't think there are many companies that do that,” she adds.

Furthermore, when Kantar Media deployed its new data platform on Azure, users continued to work without interruption. That’s as it should be when migrating a back end, Chen thinks. “That’s the beauty of this solution. From the user’s perspective, it's like magic. You request something and voilà! Here's your data. Just what you asked for!”

The Kantar Media team continues to optimize its new data platform and to make the most of the scalability provided by Azure Cosmos DB for PostgreSQL. As Lee says, “The cluster can be more elastic, more dynamic.”

Throughout the migration, “Microsoft was a true partner,” Chen concludes. “We are building a huge global platform, and Azure is the right fit to help us consolidate and bridge our data assets across all our markets.”

“We wanted open source, a cloud-native solution, and no vendor lock-in. So that’s why we approached Citus. Azure Cosmos DB for PostgreSQL checks all the boxes.”

Charles Lee, Director of Technology, Kantar Media
