Data Mesh

Modernizing data management involves both organizational and technological changes that promote business agility and reduce integration costs.

Industries:

Finance & Insurance - Retail & FMCG - Transportation - Energy & Utility - Life Science - Industrial

Solutions:

Data Strategy - Data Products - Data Governance - Data Platform - Data Infrastructure

Technologies:

AWS - Azure - Blindata - Confluent - Google Cloud Platform - Open Data Mesh - Snowflake - Terraform

Overview

The market for data and analytics continues to grow steadily, and organizations increasingly recognize data as a key asset for competitive advantage. However, data management struggles to keep up with continuous demand: IT teams become bottlenecks, with most of their effort absorbed by managing data and building the integrations needed to use it.

Quantyca Data Mesh: evolution of the role of technology

Historically, IT was seen as a cost center at the mercy of the business, and technology as a tool for creating digital assets. Over time the two have converged: with the fourth industrial revolution, technology is recognized as central not only to digitizing existing processes but also to creating new processes and business models.

IT is less and less a supporting function and is establishing itself as a true business function. Data is the protagonist of this change: it displaces the role historically played by applications and forces a rethinking of data management.

Among the main difficulties of traditional approaches to data management are:

The inadequacy of the IT budget, which does not grow proportionally to demand and leads to a gap between business demands and IT delivery capabilities.
The need for digital transformation of a constantly growing number of applications.
The growth of technological offerings and the push for digital transformation, which increase spending on integrations.

The increasing complexity of integrations, mainly due to:

A historically application-centric approach
Multiplication of data sources and consumers
Convergence between transactional and analytical use cases

To scale data acquisition, consolidation, and utilization, modern approaches to data management address both organizational and technological aspects. At the organizational level, the focus is on the scalability of the operating model, achieved through:

Strategic data management as products
Decentralization of ownership over domains

At the technological level, the focus is on the automation of integration processes, achieved through:

Active metadata
Declarative services
AI-driven integrations

While the Data Fabric approach focuses exclusively on technological aspects, the Data Mesh approach addresses organizational aspects first and technological ones second. It is based on four principles:

Domain ownership: distributing data responsibilities not by integration pipeline phase (e.g. ingestion, transformation, enrichment, etc.) but by business domain (e.g. Product Discovery, Payments, Shipment, etc.). The responsibility for data is closer to the team that produces it and is not entrusted to intermediaries.
Data as a product: treating data as a real product, for example by clearly defining its access interfaces, ensuring its quality, providing for versioning, documenting everything necessary, and respecting agreed SLOs and SLAs with different consumers. A prerequisite is the ability to manage data along with the processes that generate it and the underlying infrastructure as an atomic unit of deployment (architectural quantum).
Self-serve data platform: a shared, domain-agnostic platform that offers, in self-service mode, the functionality required for data product development, so that individual domains do not each have to carry that load. The goal is to reduce usage complexity, allowing product teams to focus more on integration logic and less on technical aspects.
Federated computational governance: ensuring interoperability between the data exposed by different data products through a federated governance process in which a central team, often composed of representatives from different product and platform teams, defines global policies that must then be implemented locally with platform support and automatically verified.

Challenges

Data mesh has galvanized the data management community. It proposes an innovative approach whose impacts are technological and, above all, organizational. However, it is a young approach that still needs to demonstrate its validity on a large scale, requires significant effort, and is not suitable for all organizations.

The implementation of the Data Mesh paradigm must deal with various difficulties, including:

→ Organizational and technological impacts

It is not a product that can be purchased and immediately used. It is an approach to data management that intervenes at both the organizational and technological levels.

→ Immaturity and gaps in the technological market

Existing technologies solve specific problems or are attempts by large players to reposition themselves in this offering space. A well-prepared engineering structure is required to build one’s own platform.

→ Stability of the solution

It is a relatively recent approach, and organizations willing to adopt it must have the ability and desire to experiment with solutions that are still being perfected.

→ Propensity for change and sponsorship

A deep organizational transformation requires strong sponsorship and widespread buy-in and is sustainable only for organizations capable of metabolizing it and ready to question past choices.

In addition, data mesh can be adopted in different variants, depending on the degree of decentralization one intends to implement along two axes:

Team responsibilities
The perimeter of central governance

Team responsibilities range between two extremes:

Total decentralization, in which every team is autonomous in its own domain and is supported by a platform team
Total centralization, in which the only separation is between the platform team and a single team responsible for all data products

The scope of action of the central governance team can be limited to the bare essentials, giving rise to a mesh with a completely distributed topology with maximum freedom for product teams and a lean support platform. If, on the other hand, the perimeter is vast, there will be a strongly governed mesh with fewer degrees of freedom for product teams and a more complex support platform converging towards a data fabric model.

Many combinations of decentralization levels along these two dimensions are therefore possible, and they can translate the original principles into data mesh implementations that differ greatly from one another. Often, the same organization moves incrementally through several mesh models before reaching the level of decentralization that best fits its context.

Solution

Through our advisory services, we guide our clients in defining their data strategy and implementing solutions with a modern approach to data management. It is within this process that we also evaluate the suitability of the data mesh paradigm both organizationally and technologically. To support the implementation of data mesh, we have also produced tools to manage technology and governance-related challenges.

Our solution starts with understanding an organization’s reality and its needs, which is achieved during the analysis phase of defining the data strategy. This phase clarifies all functional and application elements of an organization, including difficulties and expectations, business model and organizational structure, processes and domains, systems and data. This is followed by the solution definition, which is aligned with the same dimensions. If applicable, the adoption of the data mesh paradigm and the details for its implementation are evaluated.

If the context is deemed suitable for the adoption of the data mesh paradigm, two ingredients are essential to enable the development of its basic components, the data products:

→ a contract that formally describes external interfaces, internal components, and who is responsible for the exposed data

→ a platform capable of maintaining these contracts, enforcing them, and using them to automate the product lifecycle in self-service mode

The contract is critical as it ensures a shared and clear method for describing all elements of the data products. The platform’s role is to make data product management as simple, scalable, and automated as possible by using its internal services and workflows, such as registration, validation, creation, and distribution.

The specification, and consequently the descriptor, describes the data product, while the platform uses the descriptors to automate the product’s operations. As the specification we adopt the Data Product Descriptor Specification (DPDS), an open specification that declaratively defines a data product in all its components using a JSON or YAML descriptor document. It is released under the Apache 2.0 license and managed by the Open Data Mesh Initiative. DPDS is technology-independent and collects the description of all the components used to build a data product, from infrastructure to interfaces. DPDS is designed to be easily extensible, either through custom properties or by leveraging external standards such as OpenAPI, AsyncAPI, and OpenSLO to define its components.
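To make the idea of a declarative descriptor more concrete, here is a minimal sketch in Python. The field names (`info`, `owner`, `interfaceComponents`, `internalComponents`, and so on) are illustrative assumptions that mirror the contract elements described above: external interfaces, internal components, and ownership. They should not be read as the actual DPDS schema, which is defined by the specification itself.

```python
import json

# Hypothetical descriptor document: every field name here is an illustrative
# assumption, not the actual DPDS schema.
DESCRIPTOR_JSON = """
{
  "info": {
    "name": "shipments",
    "version": "1.0.0",
    "domain": "Shipment",
    "owner": "shipment-team@example.com"
  },
  "interfaceComponents": {
    "outputPorts": [
      {"name": "shipment-events", "apiStandard": "AsyncAPI"}
    ]
  },
  "internalComponents": {
    "applicationComponents": [{"name": "shipment-ingestion"}],
    "infrastructuralComponents": [{"name": "shipment-topic"}]
  }
}
"""

# Top-level sections this sketch treats as mandatory (an assumption).
REQUIRED_SECTIONS = ("info", "interfaceComponents", "internalComponents")


def missing_sections(descriptor: dict) -> list[str]:
    """Return the required top-level sections that the descriptor lacks."""
    return [s for s in REQUIRED_SECTIONS if s not in descriptor]


descriptor = json.loads(DESCRIPTOR_JSON)
```

Calling `missing_sections(descriptor)` on the document above returns an empty list, while a descriptor that only contains `info` would be reported as missing the other two sections. A real platform would more likely validate against a published schema than against a hand-written list.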

The specification makes it possible to guarantee interoperability among data products: the federated governance team is responsible for defining policies centrally, while product teams are responsible for implementing products that conform to them.
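One way to picture this split of responsibilities is to express global policies as small functions that are defined once and evaluated against every descriptor. The sketch below is only an illustration: the two policies, the `info.owner` and `info.version` fields, and the `check_compliance` helper are all hypothetical and not part of DPDS or the ODM platform.

```python
from typing import Callable, Optional

# A policy inspects a descriptor and returns an error message, or None if it passes.
Policy = Callable[[dict], Optional[str]]


def owner_is_declared(descriptor: dict) -> Optional[str]:
    """Hypothetical policy: every data product names a responsible owner."""
    if not descriptor.get("info", {}).get("owner"):
        return "data product must declare an owner"
    return None


def version_is_semantic(descriptor: dict) -> Optional[str]:
    """Hypothetical policy: versions follow the MAJOR.MINOR.PATCH pattern."""
    version = str(descriptor.get("info", {}).get("version", ""))
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return f"version {version!r} is not MAJOR.MINOR.PATCH"
    return None


# Defined once by the governance team, applied to every product team's descriptor.
GLOBAL_POLICIES: list[Policy] = [owner_is_declared, version_is_semantic]


def check_compliance(descriptor: dict,
                     policies: list[Policy] = GLOBAL_POLICIES) -> list[str]:
    """Return the violations found; an empty list means the descriptor conforms."""
    violations = []
    for policy in policies:
        message = policy(descriptor)
        if message is not None:
            violations.append(message)
    return violations
```

Because the policies only read the descriptor, they can be verified automatically on every change without the governance team inspecting each product by hand.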

For implementation, we rely on a platform shared among the product teams and accessible in self-service mode. The architecture’s goal is to control and facilitate the implementation of shared standards, contain the cognitive load on product teams, reduce total cost of ownership (TCO), and automate data management activities as much as possible. Its services can be organized into three groups:

→ Utility plane: Responsible for providing services for accessing underlying infrastructure resources. The exposed services decouple consumers from the actual resources provided by the underlying infrastructure

→ Data product experience plane: Offers a series of operations for managing the data lifecycle: creation, updating, version control, validation, distribution, search, and deactivation

→ Data mesh experience plane: Operates across multiple data products by leveraging their metadata, and aims to offer a marketplace where data products can be searched, explored, and connected

The Open Data Mesh (ODM) platform is the solution we use internally to implement this architecture. It will be open source in early 2023. Currently, the ODM platform only covers the services of the utility plane and the data product experience plane, while the data mesh experience plane can be developed using governance tools available on the market, such as Collibra or Blindata, or through custom solutions.

The Utility Plane exposes the main functionalities of the underlying infrastructure, which will later be orchestrated by the data product experience plane to provide high-level services to product teams in a self-service and declarative way. Some examples of services commonly exposed by the utility plane include:

→ Policy Service, which controls global governance policies

→ Meta Service, which propagates metadata associated with data products to a dedicated metadata management system such as Confluent Schema Registry, Collibra or Blindata.

→ Provision Service, responsible for managing the infrastructure

→ Build Service, which compiles applications and generates executable artifacts

→ Integration Service, which deploys application artifacts

The Utility Plane decouples the services of the Data Product Experience Plane from the underlying infrastructure, allowing for independent evolution with minimal impact on already implemented products. Each service of the Utility Plane is composed of a standard interface and a pluggable adapter that interacts with a specific underlying infrastructure application, making the platform easily extendable and adaptable.
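The interface-plus-adapter structure can be sketched as follows. This is an illustration under assumptions, not the ODM platform's actual API: the `ProvisionService` protocol, its `provision` method, and the in-memory adapter are hypothetical names chosen for the example.

```python
from typing import Protocol


class ProvisionService(Protocol):
    """Standard interface, as seen by the Data Product Experience Plane."""

    def provision(self, component: dict) -> str:
        """Create the resource described by `component` and return its identifier."""
        ...


class InMemoryProvisionAdapter:
    """Hypothetical adapter that stands in for a real infrastructure tool."""

    def __init__(self) -> None:
        self.resources: dict[str, dict] = {}

    def provision(self, component: dict) -> str:
        # A real adapter would call the underlying infrastructure application here.
        resource_id = f"mem://{component['name']}"
        self.resources[resource_id] = component
        return resource_id


def provision_all(service: ProvisionService, components: list[dict]) -> list[str]:
    """Caller code depends only on the interface, never on a concrete adapter."""
    return [service.provision(c) for c in components]
```

Swapping the infrastructure tool then means writing another class with the same `provision` method; `provision_all` and the products already built on it do not change.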

The Data Product Experience Plane exposes the necessary services to manage the lifecycle of a data product. Among the basic services that a typical product plane must provide are the Registry Service and the Deployment Service. The Registry Service allows you to publish and associate a new version of the descriptor with the data product and make it available on demand. The Deployment Service is responsible for managing, creating, and releasing the Data Product Container based on the descriptor. To do this, it orchestrates the services offered by the Utility Plane in the following way:

Analyzes the application code and generates an executable file, where necessary, using the Build Service.
Creates the infrastructure using the Provision Service.
Registers all metadata in the external metadata repository through the Meta Service.
Distributes the applications using the Integration Service.
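The four steps above can be sketched as a single orchestration function. Everything in this example is an assumption made for illustration: the descriptor layout, the function signatures, and the lambda stand-ins for the Build, Provision, Meta, and Integration services are hypothetical and do not reflect the ODM platform's real interfaces.

```python
from typing import Callable


def deploy(descriptor: dict,
           build: Callable[[dict], str],
           provision: Callable[[dict], str],
           register_metadata: Callable[[dict], None],
           distribute: Callable[[str], None]) -> list[str]:
    """Run the four steps in order and return a log of the calls made."""
    log: list[str] = []
    internals = descriptor.get("internalComponents", {})

    # 1. Build Service: only components that carry application code need a build.
    artifacts = [build(app) for app in internals.get("applicationComponents", [])]
    log.extend(f"build:{artifact}" for artifact in artifacts)

    # 2. Provision Service: create the infrastructure the product runs on.
    for infra in internals.get("infrastructuralComponents", []):
        log.append(f"provision:{provision(infra)}")

    # 3. Meta Service: push the product's metadata to the external repository.
    register_metadata(descriptor)
    log.append("meta:registered")

    # 4. Integration Service: distribute the artifacts built in step 1.
    for artifact in artifacts:
        distribute(artifact)
        log.append(f"integrate:{artifact}")
    return log


# Stand-in services that only record what they were asked to do.
metadata_repo: list[dict] = []
deployed: list[str] = []

example = {
    "internalComponents": {
        "applicationComponents": [{"name": "ingest"}],
        "infrastructuralComponents": [{"name": "topic"}],
    }
}

steps = deploy(
    example,
    build=lambda app: f"{app['name']}.bin",
    provision=lambda infra: infra["name"],
    register_metadata=metadata_repo.append,
    distribute=deployed.append,
)
```

Passing the services in as parameters keeps the orchestration independent of any concrete utility-plane adapter, which matches the decoupling described earlier.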

Additionally, every time there is a significant change in the state of the data product, it is possible to call the Policy Service to check compliance with global criteria.
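A minimal sketch of such a check, assuming a simple state machine: the lifecycle states, the `change_state` function, and the `require_owner` policy are hypothetical, since the source does not enumerate the states or define the Policy Service interface.

```python
from typing import Callable

# Hypothetical lifecycle states and allowed transitions.
ALLOWED_TRANSITIONS = {
    "registered": {"deployed"},
    "deployed": {"deactivated"},
}


class PolicyViolation(Exception):
    """Raised when a state change would break a global policy."""


def change_state(product: dict, new_state: str,
                 policy_check: Callable[[dict, str], list[str]]) -> dict:
    """Return a copy of `product` in `new_state`, consulting the policy check first."""
    current = product["state"]
    if new_state not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot go from {current} to {new_state}")
    # Stand-in for a call to the Policy Service.
    violations = policy_check(product, new_state)
    if violations:
        raise PolicyViolation("; ".join(violations))
    return {**product, "state": new_state}


def require_owner(product: dict, new_state: str) -> list[str]:
    """Hypothetical global criterion: a product needs an owner before deployment."""
    if new_state == "deployed" and not product.get("owner"):
        return ["owner missing"]
    return []
```

Here the transition is rejected before it takes effect, which is one possible design; a platform could equally record the violation and let the change through with a warning.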