Real-Time Analytics Lectures for SGH students

Syllabus

Real Time Analytics

SGH Warsaw School of Economics

ECTS: 3

Language: EN

level: medium

day of week: Monday

Teacher: Sebastian Zając, sebastian.zajac@sgh.waw.pl

Website: http://sebkaz-teaching.github.io/RealTimeEN

Description

Making the right business decisions based on data and its analysis is an ongoing, daily process. Modern modeling methods such as machine learning (ML), artificial intelligence (AI), and deep learning not only allow a better understanding of the business but also support key decisions. The development of technology and increasingly new business concepts of working directly with the client require decisions that are not only correct but also fast. The classes are designed to give students experience and comprehensive theoretical knowledge of real-time data processing and analysis, and to present the latest technologies (free and commercial) for processing structured data (originating e.g. from data warehouses) and unstructured data (e.g. images, sound, video streaming) in online mode. The course will present the so-called lambda and kappa architectures for processing data into a data lake, along with a discussion of the problems and difficulties encountered when implementing real-time modeling for large amounts of data. Theoretical knowledge will be gained (apart from the lecture part) through the implementation of test cases in tools such as Apache Spark, NiFi, Microsoft Azure and SAS. During laboratory classes, students will gain a full understanding of the latest information technologies related to real-time data processing.

List of Topics

  1. Modelling, learning and prediction in batch (offline learning) and incremental (online learning) modes. Problems of incremental machine learning.
  2. Data processing models in Big Data. From flat files to Data Lake. Real-time data: myths and facts.
  3. NRT systems (near real-time systems), data acquisition, streaming and analytics.
  4. Algorithms for estimating model parameters in incremental mode. Stochastic Gradient Descent.
  5. Lambda and Kappa architecture. Designing IT architecture for real-time data processing.
  6. Preparation of the micro-service with the ML model for prediction use.
  7. Structured and unstructured data. Relational databases and NoSQL databases.
  8. Aggregations and reporting in NoSQL databases (using MongoDB or Cassandra as examples).
  9. Basics of object-oriented programming in Python for linear and logistic regression and neural network analysis using sklearn, TensorFlow and Keras.
  10. IT architecture of Big Data processing. Preparation of a virtual environment for Apache Spark.

Conditions for passing

  • test 30%
  • practical test 30% (IF)
  • group project 40% (70%)

Books

  1. Zając S., “Modelowanie dla biznesu, Analityka w czasie rzeczywistym - narzędzia informatyczne i biznesowe”. SGH, Warszawa 2022.
  2. Frątczak E., red. “Modelowanie dla biznesu, Regresja logistyczna, Regresja Poissona, Survival Data Mining, CRM, Credit Scoring”. SGH, Warszawa 2019.
  3. Frątczak E., red., “Zaawansowane metody analiz statystycznych”, Oficyna Wydawnicza SGH, Warszawa 2012.
  4. Indset A., “Wild Knowledge. Outthink the Revolution”. LID Publishing, 2017.
  5. “Real-Time Analytics: The Key to Unlocking Customer Insights & Driving the Customer Experience”. Harvard Business Review Analytics Series, Harvard Business School Publishing, 2018.
  6. Svolba G., “Applying Data Science. Business Case Studies Using SAS”. SAS Institute Inc., Cary NC, USA, 2017.
  7. Ellis B., “Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data”. Wiley, 2014.
  8. Familiar B., Barnes J., “Business in Real-Time Using Azure IoT and Cortana Intelligence Suite”. Apress, 2017.