Info

Real Time Analytics

Code: 222891-D

Winter semester 2022/2023, SGH Warsaw School of Economics

Basic information about this course can be found in the syllabus.

List of books I recommend.

If you don't know what Python is, go here.

Schedule

Lectures

Lectures are held in hybrid mode. Attendance is OPTIONAL. Lectures take place in Aula I, building G.

  1. 20-02-2023 (Monday) 08:00-09:30 - Lecture 1. Completed topics:
    • structured and unstructured data,
    • introduction to Big Data,
    • data generation processes,
    • OLAP and OLTP data processing models.
  2. 27-02-2023 (Monday) 08:00-09:30 - Lecture 2. Completed topics:
    • batch processing vs data stream processing,
    • ETL,
    • the MapReduce pattern,
    • business requirements for the data stream,
    • definitions of: event, event stream processing, event analysis,
    • batch apps and streaming apps.
  3. 06-03-2023 (Monday) 08:00-09:30 - Lecture 3. Completed topics:
    • time in streaming data processing,
    • operation of the client-server system: REST API.
  4. 13-03-2023 (Monday) 08:00-09:30 - Lecture 4. Completed topics:
    • Lambda and Kappa architectures,
    • pub/sub communication for Apache Kafka.

Lectures end with a TEST: 10 questions - 20 minutes. The test is conducted via MS Teams.

Labs

  1. 21-03-2023 (Tuesday) 08:00-11:30 - C4D, 2 groups
  2. 28-03-2023 (Tuesday) 08:00-11:30 - C4D, 2 groups
  3. 04-04-2023 (Tuesday) 08:00-11:30 - C4D, 2 groups
  4. 18-04-2023 (Tuesday) 08:00-11:30 - C4D, 2 groups
  5. 25-04-2023 (Tuesday) 08:00-11:30 - C4D, 2 groups
  6. 09-05-2023 (Tuesday) 08:00-11:30 - C4D, 2 groups
  7. 16-05-2023 (Tuesday) 08:00-11:30 - C4D, 2 groups
  8. 23-05-2023 (Tuesday) 08:00-11:30 - C4D, 2 groups
  9. 30-05-2023 (Tuesday) 08:00-11:30 - C4D, 2 groups
  10. 06-06-2023 (Tuesday) 08:00-11:30 - C4D, 2 groups

Place

Lectures 1-4: G-Aula I

Labs 1-10: C-4D

Exam

Lectures end with a test, held in the last class. A passing result (above 13 points) entitles you to take part in the exercises.

After the exercises, homework is completed via the MS Teams platform. Passing all exercises and tasks entitles you to complete the project.

The project should be carried out in groups of no more than 5 people.

Project requirements:

  • The project should present a BUSINESS PROBLEM that can be solved using data processed online, i.e. in real time. (This does not mean that you cannot use batch processing, e.g. to generate a model.)
  • Data should be sent to Apache Kafka and further processed and analyzed from there.
  • You may choose any programming language; this applies to each component of the project.
  • BI tools can be used.
  • The data source can be a table, artificially generated data, IoT, etc.
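As an illustration of the requirements above, here is a minimal Python sketch of artificially generated data being sent to a Kafka topic. It is only one possible starting point, not a required design. The event schema (`id`, `sensor`, `value`, `event_time`), the topic name `events`, and the helper names are hypothetical. `InMemoryProducer` is a stand-in that lets the sketch run without a broker.

```python
import json
import random
import time


def generate_event(rng, event_id):
    """Build one artificial event (hypothetical schema, for illustration only)."""
    return {
        "id": event_id,
        "sensor": rng.choice(["s1", "s2", "s3"]),
        "value": round(rng.uniform(0.0, 100.0), 2),
        "event_time": time.time(),  # when the event was generated
    }


def serialize(event):
    """Encode an event as UTF-8 JSON bytes, a common choice for Kafka messages."""
    return json.dumps(event).encode("utf-8")


def send_events(producer, topic, n_events, seed=0):
    """Send n_events artificial events to `topic` and return how many were sent.

    `producer` is any object exposing send(topic, value=...).
    """
    rng = random.Random(seed)
    for event_id in range(n_events):
        producer.send(topic, value=serialize(generate_event(rng, event_id)))
    return n_events


class InMemoryProducer:
    """Stand-in producer that stores messages so the sketch runs without a broker."""

    def __init__(self):
        self.messages = []

    def send(self, topic, value=None):
        self.messages.append((topic, value))


if __name__ == "__main__":
    producer = InMemoryProducer()
    send_events(producer, "events", 3)
    for topic, value in producer.messages:
        print(topic, value.decode("utf-8"))
```

With a running broker, the stand-in could be replaced by `KafkaProducer(bootstrap_servers="localhost:9092")` from the kafka-python package, which offers a compatible `send(topic, value=...)` method. The broker address is an assumption. A consumer would then read the topic and carry out the actual processing and analysis.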

Technology

To participate in the classes, you should know, or at least be able to use, the following technologies:

  1. Git
  2. Python, Jupyter notebook, Jupyter lab, Colab
  3. Docker
  4. Apache Spark, Apache Flink, Apache Kafka, Apache Beam
  5. Databricks Community Edition (web page)