Microsoft recently partnered with the revenue and customs agency of a major European government, responsible for the collection of taxes, the payment of state support, and the administration of other regulatory regimes, including the national minimum wage. This agency plays an integral part in the financial well-being of most adult citizens, residents, and businesses, and is one of the world's leading tax agencies in terms of digital transformation.

Read on to learn how our Microsoft for Public Finance teams worked with this revenue and customs agency to build a modern solution that reduces forensics time and ensures a robust method of data aggregation.

The scale of the challenge

The revenue agency serves a population of over 60 million people and over four million registered businesses, ranging from small companies to multibillion-dollar conglomerates, and herein lies the problem addressed by the co-engineering project between the revenue and customs agency and Microsoft. The risk analysis team within the agency relied on aggregations and joins across datasets distributed among the various functions within, and outside of, the organization. This was a time-consuming process that required deep domain knowledge: the team had to know exactly what to look for in order to set up the right kind of forensic process on a person of significant control, or on a company subject to corporation tax. If a relationship or dataset was missed during the data aggregation stage, the forensic process could be led astray.

The agency wanted to build a modern solution that reduces forensics time and ensures a robust method of data aggregation, allowing the risk analysis team to enrich their process. This engagement also served as a starting point on the path toward a 360-degree view of the taxpayer, which will enable not only better compliance and risk profiling but also better taxpayer base segmentation for personalized, targeted programs and communication.

The secret sauce: Code-with, not for

The Commercial Software Engineering (CSE) team within Microsoft is a multi-industry organization that supports strategic Microsoft customers by:

  • Building solutions in a code-with engineering capacity.
  • Unblocking innovation roadblocks by providing best-in-class cloud development expertise.
  • Providing support with industry insights to accelerate the customer’s engineering process toward their digital strategy and vision.

A typical CSE engagement consists, on the Microsoft side, of four to five software engineers, a technical program manager, and a project management office (PMO) representative, who together form what is called a "dev crew." The dev crew and the customer's engineering team (typically several software engineers and a product owner) form a single co-engineering team working toward a common goal. The aim of this arrangement is for both teams to learn from each other and to share both industry and technical knowledge. A successful engagement concludes not only with an innovative outcome but also with the co-engineering team having gained deeper technical and industry expertise.

Solution and design

One of the main tasks on the way to a successful outcome was for the risk analysis team to obtain all relevant data points on a person of significant control, along with the relationships between those data points, without having to explicitly build the relationships out themselves. Furthermore, the outcome needed to display the connections this person had with other people who hold significant control in companies. Finally, the output had to be a user-friendly graphical interface that allows others in the organization to consume the outcome as a quick, informative illustration.

The design of the co-engineering team’s solution consisted of a graph database with a data model that preserves the relationships between the data points. This graph database would store the data in the form of:

  • Nodes: Represent a subject, such as a person or company, and hold that subject's attributes.
  • Edges: Represent the relationships between nodes and how they interact with each other.
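As a concrete illustration of this data model, the sketch below encodes a person of significant control and a company as two nodes connected by an edge. All labels, IDs, and property names here are hypothetical examples, not the agency's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    label: str                       # e.g. "person" or "company"
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: str                      # node_id of the originating node
    target: str                      # node_id of the destination node
    label: str                       # the relationship, e.g. "significant_control"

# A person of significant control and the company they control.
alice = Node("p-001", "person", {"name": "Alice Example"})
acme = Node("c-100", "company", {"name": "Acme Ltd", "company_id": "12345678"})
control = Edge(alice.node_id, acme.node_id, "significant_control")
```

Because the relationship is stored as first-class data alongside the nodes, a later query can follow the edge directly instead of re-deriving it with a join.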

To have this solution operate in production with a consistent level of quality, the raw data needed to be ingested and transformed into the graph format to produce the desired output. As part of the solution, a repeatable data ingestion layer was created to ingest the various datasets from their existing locations and transform the data via a data transformation layer before it ultimately lands in the graph database, in this case the Azure Cosmos DB Gremlin API.
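A minimal sketch of what such a transformation layer might look like is shown below, mapping flat source rows into vertex and edge records ready for loading into a graph store. The field names and roles are illustrative assumptions, and the real pipeline would write to the Azure Cosmos DB Gremlin API rather than return Python lists:

```python
def transform(rows):
    """Map flat officer/company rows into vertex and edge records."""
    vertices = {}   # keyed by id, so repeated entities are upserted once
    edges = []
    for row in rows:
        vertices[row["person_id"]] = {
            "id": row["person_id"], "label": "person",
            "name": row["person_name"],
        }
        vertices[row["company_id"]] = {
            "id": row["company_id"], "label": "company",
            "name": row["company_name"],
        }
        edges.append({
            "from": row["person_id"], "to": row["company_id"],
            "label": row["role"],
        })
    return list(vertices.values()), edges

rows = [
    {"person_id": "p1", "person_name": "A. Example",
     "company_id": "c1", "company_name": "Acme Ltd", "role": "director"},
    {"person_id": "p1", "person_name": "A. Example",
     "company_id": "c2", "company_name": "Beta Ltd", "role": "significant_control"},
]
vertices, edges = transform(rows)   # 3 unique vertices, 2 edges
```

Keying entities by ID means running the same ingestion repeatedly does not duplicate vertices, which is part of what makes the layer repeatable.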

Co-engineering business impact

The implementation of this solution gives a tax agency the ability to enrich its analysis with data that is already linked. Analysts are no longer limited to discerning information from scattered data, based only on the datasets they happen to be aware of. The risk assessment team within the agency can now simply look up a company by its company ID and view all the relevant information about that company, its connections to other companies, and the people connected to it. The data displays second, third, and fourth connection layers along with their attributes.
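The layered read-out described above can be sketched as a breadth-first traversal. In the production solution this would be a Gremlin traversal against the graph database; the in-memory version below, over a hypothetical toy graph, just shows the idea:

```python
def connection_layers(adjacency, start, max_depth=4):
    """Return {depth: node ids} reachable from `start`, layer by layer."""
    layers, seen = {}, {start}
    frontier = [start]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for node in frontier:
            for neighbour in adjacency.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        if not next_frontier:
            break
        layers[depth] = set(next_frontier)
        frontier = next_frontier
    return layers

# Toy graph: company c1 -> officer p1 -> p1's other company c2 -> officer p2.
adjacency = {"c1": ["p1"], "p1": ["c1", "c2"], "c2": ["p1", "p2"]}
layers = connection_layers(adjacency, "c1")
# layers[1] == {"p1"}, layers[2] == {"c2"}, layers[3] == {"p2"}
```

Each layer corresponds to one degree of separation from the looked-up company, which is exactly what the second, third, and fourth connection layers in the interface surface.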

This widens the business value by:

  • Cutting down the time-to-value by focusing on the analysis.
  • Reducing time-consuming join operations, as relationships are stored with the data.
  • Delivering effective business intelligence (BI) that gives the user a simpler visual representation of complex relationships.
  • Delivering a key capability towards their vision of making tax digital.
  • Enabling effective cross-learning between two engineering teams operating as one.

Co-engineering technical impact

The outcome of the engagement was the result of a timeboxed co-innovation project. The CSE and customer teams worked diligently across several scrum-based sprints to meet the goals outlined at the beginning of the partnership. In a continuous effort to improve with each iteration, the engineering fundamentals were reviewed weekly, alongside continuous backlog refinement and an end-of-sprint retrospective to revisit, assess, and remodel the main priorities for the subsequent sprints.

The resulting outcome was:

  • A DevOps solution that ensures robust continuous integration and continuous deployment pipelines and rollout of infrastructure as code.
  • A repeatable data ingestion and transformation pipeline for longevity and futureproofing.
  • A flexible graph database (DB) that accommodates scale and domain requirements.
  • An application programming interface (API) that enables the integration of the graph DB into internal systems.
  • A visualization component for rapid search enrichment and information gain.
  • Shared engineering best practices that enriched both engineering teams, the customer's and Microsoft's.

Future plans

The outcome of this engagement has led to discussions and ideation around the next phase of capabilities for the graph DB within the revenue and customs agency. With the data now transformed, stored, and queried in graph format, the teams have explored analytics and machine learning ideas to further reduce time to value for the analysis team, such as:

  • Widening the scope of data being pulled into the data model for further enrichment of results.
  • Implementation of Graph Neural Networks to pave the way for targeted recommendations.
  • Deeper data mining capabilities.

Next steps

At Microsoft, we are continuously improving the graph database solution for public finance and tax agencies. With our partners, we seek to enrich its capabilities and to provide the right kind of industry domain knowledge for the solution. If you would like to learn more about how we can help you leverage this solution, visit our Microsoft for Public Finance web page.