Real Time Inventory
Overview

In many business sectors, the speed of decision-making processes is increasingly crucial for an organization's profitability. By continuously monitoring core business performance through real-time data analysis, management can make timely decisions to maximize revenues from various channels and mitigate losses caused by misguided initiatives or external factors.

For digital-native companies, real-time analysis of phenomena such as customer behavior across touchpoints allows for rapid improvement in the effectiveness of automated user suggestions, increasing engagement and enhancing the overall service experience. Conversely, a lack of timeliness in adapting or correcting a customer-facing digital initiative can lead to a loss of attractiveness and trust in the service, resulting in lost revenue or subscription cancellations.

The ability to have real-time visibility into business performance can also be a competitive advantage for companies in more traditional sectors, such as retail, enabling them to launch marketing initiatives or promotional campaigns much more quickly than in the past, especially during annual sales events or special occasions.

Furthermore, the capacity to process domain events in real-time is often a necessary requirement to support the operational aspects of the business itself, as in the case of fraud prevention functionalities for banking institutions in online banking activities or real-time inventory calculations for multichannel retail companies.

Challenges

In the design of real-time data processing architectures, several technical aspects are crucial for the effectiveness of the solution. Here are some of them:

To support a Real Time Analytics use case, it is essential to adopt a data extraction strategy that detects new or updated records in the source system as quickly as possible. There are various common offloading practices to choose from, such as incremental polling on an update timestamp or log-based change data capture (CDC). Factors to consider include the additional workload imposed on the source system, the ability to detect record deletion events, the need for further processing of the extracted records, and whether it is possible to modify the source system or install third-party components or connectors.
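As an illustration, here is a minimal sketch of the first practice, polling on an update timestamp. The orders table, its columns, and the publish_to_stream hand-off are hypothetical stand-ins, with sqlite3 playing the role of the source system:

```python
import sqlite3
import time

def publish_to_stream(record):
    # Stand-in for producing the record to the event streaming platform.
    print("publish:", record)

def poll_new_records(conn, last_watermark):
    # Illustrative source schema: an "orders" table with an indexed
    # "updated_at" column. Watermark polling detects inserts and updates,
    # but not physical deletions; ties on the watermark value would also
    # need handling in a real implementation.
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

conn = sqlite3.connect("source.db")
watermark = "1970-01-01T00:00:00"
while True:
    batch, watermark = poll_new_records(conn, watermark)
    for record in batch:
        publish_to_stream(record)
    time.sleep(1)  # the polling interval bounds the extraction latency
```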

To enable users to query the dataset for real-time analysis, it is necessary to minimize the latency from data extraction to readiness for consumption. This reduces the likelihood of consumers reading an outdated version of the data or finding expected data missing. The selection of the technological components that make up the integration solution is therefore a critical aspect.

Some transformations required in Real Time Analytics use cases, such as calculating sums, moving averages, or counters, require retaining the current processing state within the application. For example, for a sum calculated in real time, the application must save the current total, to which the value of the next incoming record is applied as an addend. Maintaining this state requires guarantees of persistence, high availability, and fault tolerance, so that operations can resume and data remain consistent even after restarts. Additionally, when launching the Real Time Analytics system, it is fundamental to plan an initialization strategy that starts from a specific initial state (e.g., the opening stock balance) and applies real-time event calculations to it.
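Below is a minimal sketch of this stateful pattern, with a local JSON file standing in for the persistent state store that a real framework would replicate for fault tolerance; the file name and event shape are illustrative assumptions:

```python
import json
from pathlib import Path

STATE_FILE = Path("stock_state.json")  # stand-in for a replicated state store

def load_state(initial_stock):
    # Initialization strategy: resume from the last checkpoint if present,
    # otherwise start from a known initial state (e.g., the opening stock).
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return dict(initial_stock)

def process_event(state, event):
    # Event-by-event logic: each stock movement is applied as an addend
    # to the running balance of its SKU.
    state[event["sku"]] = state.get(event["sku"], 0) + event["qty_delta"]
    STATE_FILE.write_text(json.dumps(state))  # checkpoint after each event
    return state

state = load_state({"SKU-1": 100, "SKU-2": 40})
for event in [{"sku": "SKU-1", "qty_delta": -3}, {"sku": "SKU-2", "qty_delta": 5}]:
    state = process_event(state, event)
print(state)  # e.g. {'SKU-1': 97, 'SKU-2': 45} on a first run
```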

Solutions supporting event stream processing are typically characterized by asynchronous data flows, which introduce complexity compared to traditional batch flows:

  • Eventual Consistency

Flows related to logically correlated domain entities occur asynchronously and independently. Therefore, strict transactional consistency, which would guarantee reading a consistent snapshot of the entire domain entity aggregate at any time, is not ensured.

  • Delayed Events and Ordering

Some events may reach the stream processing application after the business time window they belong to, over which aggregate values are often calculated, has already closed. Moreover, to keep the integration solution scalable, distributed architectures are adopted, which means giving up a global ordering of events.

  • Exactly-Once Delivery Semantics

It must be ensured that each event is received by consuming applications exactly once, with no possibility of losing events or creating duplicates; a minimal deduplication sketch is shown after this list.
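For example, assuming at-least-once delivery from the messaging layer, exactly-once processing is often approximated with an idempotent consumer that deduplicates on a unique event identifier; the event shape below is illustrative:

```python
# The processed-id set is in memory here; a real system would keep it in a
# durable store and bound it with a retention window.
processed_ids = set()

def apply_side_effects(event):
    print("applied", event["event_id"])  # stand-in for the real state update

def handle(event):
    # Idempotent consumer: at-least-once delivery plus deduplication on a
    # unique event id approximates exactly-once processing semantics.
    if event["event_id"] in processed_ids:
        return  # duplicate delivery: skip the side effects
    apply_side_effects(event)
    processed_ids.add(event["event_id"])

for e in [{"event_id": "a1"}, {"event_id": "a1"}, {"event_id": "b2"}]:
    handle(e)  # "a1" is applied only once despite being delivered twice
```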

Solution

We can distinguish three macro categories of technical solutions.

Real-time stream processing  

These solutions are characterized by the presence of an event streaming platform in the data architecture, which not only allows the real-time import of data from source applications but also provides a set of tools enabling continuous transformation of event streams (real-time ETL), using SQL-like frameworks or libraries for common programming languages.

These solutions allow the calculation of moderately complex metrics and KPIs by applying event-by-event calculation logic, maintaining an internal state with the current aggregated value where necessary. Results can also be calculated over time windows relative to the event generation timestamp, and the choice of the specific window type (hopping, sliding, tumbling, session-based, etc.) may vary case by case. The analysis KPIs calculated by the stream processing engine can be accessed through API interfaces exposed by the framework itself, emitting results in push or pull mode, or streamed in real time into a low-latency data store, from which they can be read using real-time reporting tools.
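As a concrete illustration of windowing, here is a pure-Python sketch of a tumbling window keyed on the event timestamp; frameworks such as Kafka Streams or Flink provide these operators natively, so this is only a model of the semantics:

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # tumbling windows: fixed-size and non-overlapping

def window_start(event_ts):
    # Assign each event to the window containing its *event* timestamp,
    # not the time at which it happens to be processed.
    return event_ts - (event_ts % WINDOW_SECONDS)

# Illustrative events: (seconds since some epoch, sale amount).
events = [(5, 12.0), (50, 8.5), (65, 3.0)]

sales_per_window = defaultdict(float)
for ts, amount in events:
    sales_per_window[window_start(ts)] += amount

print(dict(sales_per_window))  # {0: 20.5, 60: 3.0}
```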

The following image provides an overview of solutions of this type.

Solutions falling into this category include those that involve analytical event processing through stream processing frameworks such as Spark Streaming, Kafka Streams, or Apache Flink. The following image represents the key components of an event streaming platform.

High-performance data store 

Solutions of this type, on the other hand, involve a lighter event-messaging component that serves as a Fast Layer, supporting real-time ingestion of data extracted from the source, in the form of events, into a high-performance analytical database. In this scheme, data is imported into the analytical database with minimal transformation, after which the database's internal storage and query optimization features are used to ensure good read performance.

Modern databases provide numerous analytical and aggregation functions that support both structured and semi-structured data and enable calculations on complex data types, such as geometric and geographic data.
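As a small illustration of both points, the sketch below lands raw JSON events with no upfront transformation and aggregates them at query time; sqlite3 stands in for the high-performance store, assuming a SQLite build that includes the JSON1 functions (the default in recent versions):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_events (payload TEXT)")

# Fast-layer ingestion: events land as raw JSON, with minimal transformation.
for event in [
    {"store": "milan-01", "sku": "SKU-1", "qty": -2},
    {"store": "milan-01", "sku": "SKU-1", "qty": -1},
    {"store": "web", "sku": "SKU-1", "qty": -5},
]:
    db.execute("INSERT INTO raw_events VALUES (?)", (json.dumps(event),))

# Query-time analytics over the semi-structured payload.
rows = db.execute(
    """
    SELECT json_extract(payload, '$.store') AS store,
           SUM(json_extract(payload, '$.qty')) AS qty_sold
    FROM raw_events
    GROUP BY store
    """
).fetchall()
print(rows)  # [('milan-01', -3), ('web', -5)]
```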

The following image provides an overview of solutions of this kind.

Solutions falling into this category leverage technologies such as Apache Kafka or AWS Kinesis as the fast layer, and Snowflake, Google BigQuery, AWS Redshift, or SingleStore as the high-performance data store.

Data Lakehouse 

Solutions of this kind are similar in design to those that leverage the analytical capabilities of a high-performance data store, as seen earlier. The difference is that data is not imported into a traditional analytical database but is instead written to an object store or a distributed file system as raw text files (e.g., JSON) or binary files (ORC, Parquet, Avro), in append mode and with a schema-on-read approach.

Schema parsing, as well as the application of analytical calculation logic, is performed at query time by a query engine highly optimized for reading from an object store. The query engine accesses the data in the external object store through External Tables, which can be optimized through the definition of materialized views or cached datasets to further enhance performance.
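The following plain-Python sketch models schema-on-read with a local directory standing in for the object store; a real lakehouse engine performs the same parse-and-project step at scale through External Tables. File and field names are illustrative:

```python
import json
from pathlib import Path

# Stand-in for an object store: a directory of append-only JSON files.
data_dir = Path("landing/inventory_events")
data_dir.mkdir(parents=True, exist_ok=True)
(data_dir / "part-0001.json").write_text(
    '{"sku": "SKU-1", "qty_delta": -2}\n{"sku": "SKU-2", "qty_delta": 7}\n'
)

# Schema-on-read: no schema was enforced on write; each record is parsed
# and projected only now, at query time.
totals = {}
for part in sorted(data_dir.glob("*.json")):
    for line in part.read_text().splitlines():
        record = json.loads(line)
        totals[record["sku"]] = totals.get(record["sku"], 0) + record["qty_delta"]

print(totals)  # {'SKU-1': -2, 'SKU-2': 7}
```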

The following image provides an overview of solutions of this type.

Solutions in this category leverage technologies such as AWS S3, Azure Blob Storage, Google Cloud Storage, or HDFS as the object store, and Databricks, Dremio, or AWS Athena as high-performance query engines.

Benefits

Fast Decision Making
The availability of real-time data for analysis allows continuous monitoring of core business performance and enables quick strategic or operational decisions, also leveraging automation, to improve the service provided to the end customer or to seize profit opportunities.
Omnichannel Support
The ability to process data in the form of event streams is a key factor in seamlessly integrating data from various channels and enabling business processes that require a point of contact between them (e.g., managing a single warehouse for online and traditional retail).
Risk Prevention
Some Real-Time Analytics applications, such as real-time system monitoring or fraud detection features, shield the company from the possibility of damaging its reputation or losing a share of profits. They also help avoid having to reimburse users or pay fines imposed by regulators.
