Based on a strategic decision to make more data available at a faster pace, CPFL started the Mega Lake project using Microsoft Azure and Microsoft Power Platform tools. The project enabled the company to leverage its data-driven culture that promotes data democratization by relying on a single source of information. Now they are focused on improving their new platform using machine learning (ML) and AI.
CPFL has been operating in the electric energy segment in Brazil since 1912, having made a solid contribution to the development of the cities and states through generation, transmission, distribution, and service provision.
In 2021, the company made the strategic decision to transform its Big Data and Analytics area with the goal of building a data lake that would support the development of business projects in general. Among them are predictive maintenance of power plant and power grid equipment, forecasting of defaulting customers, and the Image Fraud Recognition project, which uses computer vision to detect fraud and energy theft in satellite images.
The project started in 2023 and moved to its second phase the following year, when it was consolidated and granted funds to accelerate the process.
Mega Lake Project—data democratization within CPFL
CPFL decided to face the challenge of providing faster delivery of a larger volume of data so that its business areas could perform better analysis and positively impact executive decision making company-wide. Another goal was to ensure data security and governance without burdening the IT staff with manual labor.
“The democratization of data generates opportunities for improvement at all levels, from operation to expansion and development strategies. We have analytical centers that consume data from the data lake and develop ML and AI models to support strategic decision making,” says Vivian Marcello, Big Data Coordinator at CPFL.
With the data lake and environment ready, the company kicked off the Mega Lake project (now in its second phase of development) with support from Microsoft. In this project, the technical team works on the raw data and makes it available in domains that the business teams can access in a quick, simple, and automated way.
“The Mega Lake project has achieved excellent results. Our data lake grew 1500% in comparison to the December 2022 scenario, and we have made more than 70 data domains available. We currently have more than 240 billion records that are updated on a daily basis. Throughout the journey, Microsoft has been supporting us to ensure our success,” explains Vivian Marcello.
Analytical culture empowered by data democratization and a single source of information
The platform, built on Microsoft Azure, uses Microsoft Azure Data Factory for data orchestration, while Microsoft Azure Databricks, together with Microsoft Azure Synapse Analytics, consolidates data science work. In addition, Microsoft Azure Storage provides storage in three layers: Raw, with raw data; Silver, with clean data in tables; and Gold, which organizes the data domains and the output of analytics and predictions. Unity Catalog (a Microsoft Azure Databricks solution) governs the data, while Microsoft Power BI is used for visualization.
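To make the three-layer idea concrete, here is a minimal sketch in plain Python of how records might move from Raw to Silver to Gold. It is an illustration only, not CPFL's pipeline: the meter readings, the field names (`meter_id`, `region`, `kwh`), and the helper functions `to_silver` and `to_gold` are all hypothetical, and a real implementation would more likely run as Spark jobs on Azure Databricks.

```python
from collections import defaultdict

# Hypothetical raw meter readings as they might land in the Raw layer.
# Field names and values are illustrative, not CPFL's actual schema.
RAW_READINGS = [
    {"meter_id": "M1", "region": "north", "kwh": "12.5"},
    {"meter_id": "M2", "region": "north", "kwh": "7.5"},
    {"meter_id": "M3", "region": "south", "kwh": ""},      # missing value
    {"meter_id": "M4", "region": "south", "kwh": "20.0"},
    {"meter_id": "M1", "region": "north", "kwh": "12.5"},  # duplicate row
]


def to_silver(raw_rows):
    """Raw -> Silver: drop incomplete rows and exact duplicates, cast types."""
    seen = set()
    silver = []
    for row in raw_rows:
        if not row["kwh"]:
            continue  # skip rows with no reading
        key = (row["meter_id"], row["region"], row["kwh"])
        if key in seen:
            continue  # skip rows already kept
        seen.add(key)
        silver.append(
            {
                "meter_id": row["meter_id"],
                "region": row["region"],
                "kwh": float(row["kwh"]),
            }
        )
    return silver


def to_gold(silver_rows):
    """Silver -> Gold: aggregate clean rows into a per-region domain table."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["kwh"]
    return dict(totals)


silver = to_silver(RAW_READINGS)
gold = to_gold(silver)
print(gold)
```

In this sketch the Silver step is where cleaning happens and the Gold step is where a business-facing domain is shaped, which mirrors the division of labor between the layers described above.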
The new platform has strengthened CPFL's analytics capabilities, reinforcing its data-driven culture and enabling more accurate and richer analytics across the company.
“We are moving towards data democratization and strengthening the use of data within the company. The data lake is fed data that is more intact, on a platform better optimized for creating complex analyses, machine learning, and AI models,” says Vivian Marcello.
CPFL has built new data domains, which are being made available for consumption, and satellite image ingestion systems in the data lake, which have allowed the development of models that identify areas with propensity for fraud (energy theft). A model that maps regions at risk of flooding near dams is also under development. In addition, analytics are used to predict defaulting customers and financial and operational indicators, with the goal of automating processes, raising data awareness among the company’s teams, and generating results for the company.
“Our goal is to bring together data from different sources in one place. By doing so, we can have a protected and highly governed place, enabling our employees to drive results more safely,” adds Vivian Marcello.
Next step: Between AI and governance
CPFL now operates in a new reality, with an advanced analytical capacity that also allows for deeper reflection on its data. The company is already implementing machine learning and AI models with Azure Machine Learning to raise the bar even higher.
Moving forward, the company is working with Microsoft Power Automate, in connection with Microsoft Teams and Microsoft Azure Databricks, taking another important step towards data democratization. The idea is to have a generative AI chatbot that further simplifies access to strategic information. CPFL is currently in the development phase of a project to bring together more than 100 internally created chatbots and consolidate them in a single place.
"With the emergence of generative AI and Microsoft Copilot, we have a new challenge ahead of us in 2024. We are focused on taking another step towards modernization, by integrating new tools into our data lake to enhance our results and provide quick answers to support decision making," concludes Vivian Marcello.