Lecture 1

⏳ Duration: 1.5h

🎯 Lecture Goal

Introducing students to the fundamentals of real-time analytics, the differences between data processing modes (batch, streaming, real-time), as well as key applications and challenges.

What is Real-Time Data Analytics?

Definition and Key Concepts

Real-Time Data Analytics is the process of processing and analyzing data immediately after it is generated, without the need for storage and later processing. The goal is to obtain instant insights and responses to changing conditions in business, technology, and scientific systems.

Key Features of Real-Time Data Analytics:

  1. Low latency – data is analyzed within milliseconds or seconds of being generated.
  2. Streaming vs. Batch Processing – data analysis can occur continuously (streaming) or at predefined intervals (batch).
  3. Integration with IoT, AI, and ML – real-time analytics often works in conjunction with the Internet of Things (IoT) and artificial intelligence algorithms.
  4. Real-time decision-making – e.g., instant fraud detection in banking transactions.

Business Applications of Real-Time Data Analytics

Finance and Banking

  • Fraud Detection – real-time transaction analysis helps identify anomalies indicating fraud.
  • Automated Trading – HFT (High-Frequency Trading) systems analyze millions of data points in fractions of a second.
  • Dynamic Credit Scoring – instant risk assessment of a customer’s creditworthiness.

E-Commerce and Digital Marketing

  • Real-Time Offer Personalization – dynamic product recommendations based on users’ current behavior.
  • Dynamic Pricing – companies like Uber, Amazon, and hotels adjust prices in real time based on demand.
  • Social Media Monitoring – sentiment analysis of customer feedback and immediate response to negative comments.

Telecommunications and IoT

  • Network Infrastructure Monitoring – real-time log analysis helps detect failures before they occur.
  • Smart Cities – real-time traffic analysis optimizes traffic light systems dynamically.
  • IoT Analytics – IoT devices generate data streams that can be analyzed in real time (e.g., smart energy meters).

Healthcare

  • Patient Monitoring – real-time analysis of medical device signals to detect life-threatening conditions instantly.
  • Epidemiological Analytics – tracking disease outbreaks based on real-time data.

Real-time data analytics is a key component of modern IT systems, enabling businesses to make faster and more precise decisions. It is widely used across industriesβ€”from finance and e-commerce to healthcare and IoT.

Differences Between Batch Processing, Near Real-Time Analytics, and Real-Time Analytics

There are three main approaches to data processing:

  1. Batch Processing
  2. Near Real-Time Analytics
  3. Real-Time Analytics

Each differs in processing speed, technological requirements, and business applications.

Batch Processing

πŸ“Œ Definition:

Batch Processing involves collecting large amounts of data and processing them at scheduled intervals (e.g., hourly, daily, or weekly).

πŸ“Œ Characteristics:

  • βœ… High efficiency for large datasets
  • βœ… Processes data after it has been collected
  • βœ… Does not require immediate analysis
  • βœ… Typically cheaper than real-time processing
  • ❌ Delays – results are available only after processing is complete

πŸ“Œ Use Cases:

  • Generating financial reports at the end of a day/month
  • Analyzing sales trends based on historical data
  • Creating offline machine learning models

πŸ“Œ Example Technologies:

  • Hadoop MapReduce
  • Apache Spark (in batch mode)
  • Google BigQuery
import pandas as pd  
df = pd.read_csv("transactions.csv")  

df['transaction_date'] = pd.to_datetime(df['transaction_date'])
df['month'] = df['transaction_date'].dt.to_period('M') 

# Agg
monthly_sales = df.groupby(['month'])['amount'].sum()

monthly_sales.to_csv("monthly_report.csv")  

print("Raport save!")

If you wanted to create data for an example.

import pandas as pd
import numpy as np

np.random.seed(42)
data = {
    'transaction_id': [f'TX{str(i).zfill(4)}' for i in range(1, 1001)],
    'amount': np.random.uniform(10, 10000, 1000), 
    'transaction_date': pd.date_range(start="2025-01-01", periods=1000, freq='h'), 
    'merchant': np.random.choice(['Merchant_A', 'Merchant_B', 'Merchant_C', 'Merchant_D'], 1000),
    'card_type': np.random.choice(['Visa', 'MasterCard', 'AmEx'], 1000)
}

df = pd.DataFrame(data)
csv_file = 'transactions.csv'
df.to_csv(csv_file, index=False)

Near Real-Time Analytics – Analysis Nearly in Real Time

πŸ“Œ Definition:

Near Real-Time Analytics refers to data analysis that occurs with minimal delay (typically from a few seconds to a few minutes). It is used in scenarios where full real-time analysis is not necessary, but excessive delays could impact business operations.

πŸ“Œ Characteristics:

  1. βœ… Processes data in short intervals (a few seconds to minutes)
  2. βœ… Enables quick decision-making but does not require millisecond-level reactions
  3. βœ… Optimal balance between cost and speed
  4. ❌ Not suitable for systems requiring immediate response

πŸ“Œ Use Cases:

  • Monitoring banking transactions and detecting fraud (e.g., analysis within 30 seconds)
  • Dynamically adjusting online ads based on user behavior
  • Analyzing server and network logs to detect anomalies

πŸ“Œ Example Technologies:

  • Apache Kafka + Spark Streaming
  • Elasticsearch + Kibana (e.g., IT log analysis)
  • Amazon Kinesis

Example of a data producer sending transactions to an Apache Kafka system.

from kafka import KafkaProducer
import json
import random
import time
from datetime import datetime

# Ustawienia dla producenta
bootstrap_servers = 'localhost:9092'
topic = 'transactions' 


def generate_transaction():
    transaction = {
        'transaction_id': f'TX{random.randint(1000, 9999)}',
        'amount': round(random.uniform(10, 10000), 2),  
        'transaction_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
        'merchant': random.choice(['Merchant_A', 'Merchant_B', 'Merchant_C', 'Merchant_D']),
        'card_type': random.choice(['Visa', 'MasterCard', 'AmEx']),
    }
    return transaction

producer = KafkaProducer(
    bootstrap_servers=bootstrap_servers,
    value_serializer=lambda v: json.dumps(v).encode('utf-8') 
)


for _ in range(1000):  
    transaction = generate_transaction()
    producer.send(topic, value=transaction) 
    print(f"Sent: {transaction}")
    time.sleep(1) 

producer.flush()
producer.close()

Consument example

from kafka import KafkaConsumer
import json  

consumer = KafkaConsumer(
    'transactions',
    bootstrap_servers='localhost:9092',
    auto_offset_reset='earliest',
    value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)


for message in consumer:
    transaction = message.value
    if transaction["amount"] > 8000:
        print(f"🚨 Anomaly transaction: {transaction}")

Example of the DataSet

{
    "transaction_id": "TX1234",
    "amount": 523.47,
    "transaction_date": "2025-02-11 08:10:45",
    "merchant": "Merchant_A",
    "card_type": "Visa"
}

Real-Time Analytics

πŸ“Œ Definition:

Real-Time Analytics refers to the immediate analysis of data and decision-making within fractions of a second (milliseconds to one second). It is used in systems requiring real-time responses, such as stock market transactions, IoT systems, and cybersecurity.

πŸ“Œ Characteristics:

  1. βœ… Extremely low latency (milliseconds to seconds)
  2. βœ… Enables instant system response
  3. βœ… Requires high computational power and scalable architecture
  4. ❌ More expensive and technologically complex than batch processing

πŸ“Œ Use Cases:

  • High-Frequency Trading (HFT) – making stock market transaction decisions within milliseconds
  • Autonomous Vehicles – real-time analysis of data streams from cameras and sensors
  • Cybersecurity – detecting network attacks in fractions of a second
  • IoT Analytics – instant anomaly detection in industrial sensor data

πŸ“Œ Example Technologies:

  • Apache Flink
  • Apache Storm
  • Google Dataflow

πŸ”Ž Comparison:

Feature Batch Processing Near Real-Time Analytics Real-Time Analytics
Latency Minutes – hours – days Seconds – minutes Milliseconds – seconds
Processing Type Batch (offline) Streaming (but not fully immediate) Streaming (true real-time)
Infrastructure Cost πŸ“‰ Low πŸ“ˆ Medium πŸ“ˆπŸ“ˆ High
Implementation Complexity πŸ“‰ Simple πŸ“ˆ Medium πŸ“ˆπŸ“ˆ Difficult
Use Cases Reports, offline ML, historical analysis Transaction monitoring, dynamic ads HFT, IoT, real-time fraud detection

πŸ“Œ When to Use Batch Processing?

  1. βœ… When immediate analysis is not required
  2. βœ… When handling large volumes of data processed periodically
  3. βœ… When aiming to reduce costs

πŸ“Œ When to Use Near Real-Time Analytics?

  1. βœ… When analysis is needed within a short time (seconds – minutes)
  2. βœ… When fresher data is required but not full real-time processing
  3. βœ… When seeking a balance between performance and cost

πŸ“Œ When to Use Real-Time Analytics?

  1. βœ… When every millisecond matters (e.g., stock trading, autonomous vehicles)
  2. βœ… When detecting fraud, anomalies, or incidents instantly
  3. βœ… When a system must respond to events immediately

Real-time analytics is not always necessaryβ€”often, near real-time is sufficient and more cost-effective. The key is to understand business requirements before choosing the right solution.

Why is Real-Time Analytics Important?

Real-time analytics is becoming increasingly crucial across various industries as it enables organizations to make immediate decisions based on up-to-date data. Here are some key reasons why real-time analytics matters:

Faster Decision-Making

Real-time analytics allows businesses to react to changes and events instantly. This is essential in dynamic environments such as:

  • Marketing: Ads can be adjusted in real time based on user behavior (e.g., personalized content recommendations).
  • Finance: Fraud detection in real-time, where every minute counts in preventing financial losses.

Real-Time Monitoring

Companies can continuously track key operational metrics. Examples:

  • IoT (Internet of Things): Monitoring the condition of machines and equipment in factories to detect failures and prevent downtime.
  • Healthtech: Tracking patients’ vital signs and detecting anomalies, which can save lives.

Improved Operational Efficiency

Real-time analytics helps identify and address operational issues before they escalate. Examples:

  • Logistics: Tracking shipments and monitoring transport status in real time to improve efficiency and reduce delays.
  • Retail: Monitoring inventory levels in real-time and adjusting orders accordingly.

Competitive Advantage

Organizations leveraging real-time analytics gain an edge by responding faster to market changes, customer needs, and crises. With real-time insights:

  • Businesses can make proactive decisions ahead of competitors.
  • Companies can enhance customer relationships by responding instantly to their needs (e.g., adjusting offerings dynamically).

Enhanced User Experience (Customer Experience)

Real-time data analysis enables personalized user interactions as they happen. Examples:

  • E-commerce: Analyzing shopping cart behavior in real time to offer discounts or remind users of abandoned items.
  • Streaming Services: Optimizing video/streaming quality based on available bandwidth.

Anomaly Detection and Threat Prevention

In today’s data-driven world, real-time anomaly detection is critical for security. Examples:

  • Cybersecurity: Detecting suspicious network activities and preventing attacks in real time (e.g., DDoS attacks, unauthorized logins).
  • Fraud Prevention: Instant identification of suspicious transactions in banking and credit card systems.

Cost Optimization

Real-time analytics helps optimize resources and reduce costs. Examples:

  • Energy Management: Monitoring energy consumption in real time to optimize corporate energy expenses.
  • Supply Chain Optimization: Tracking inventory and deliveries to reduce storage and transportation costs.

Predictive Capabilities

Real-time analytics supports predictive processes that anticipate future behaviors or problems and address them before they occur. Examples:

  • Predictive Maintenance: Combining real-time data with predictive models to foresee machine failures.
  • Demand Forecasting: Adjusting production or stock levels based on live market trends.

Real-time analytics is not just about data analysisβ€”it is a crucial element of business strategy in a world that demands agility, flexibility, and rapid adaptation. Companies that implement these technologies can significantly enhance their financial performance, customer service, operational efficiency, and competitive advantage.