Real-Time Analytics Lectures for SGH students

Environment

Python env with JupyterLab

For the first few labs we will use plain Python code. First, check which Python 3 environment you have.

In a terminal, try:

python
# and
python3
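
To double-check, print the version explicitly:

python3 --version
# example output: Python 3.10.6 (any recent 3.x release is fine here)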

I have python3 (do not use the Python 2.7 version), which gives me a fresh, clean Python environment.

The easiest way to run JupyterLab is from a new Python virtual environment. You can choose any name for the environment; replace <name of your env> below with the name you picked.

python3 -m venv <name of your env>

source <name of your env>/bin/activate
# . env/bin/activate
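# optional sanity check: python should now resolve inside the new env
which python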
pip install --no-cache-dir --upgrade pip setuptools

pip install jupyterlab numpy pandas matplotlib scipy
# or install from a requirements file (example below)
pip install -r requirements.txt
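
If you prefer the requirements.txt route, a minimal file matching the packages above could look like this (add version pins if you want reproducible installs):

jupyterlab
numpy
pandas
matplotlib
scipy

Then start JupyterLab: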

jupyter lab

Then open http://localhost:8888 in your web browser.

If you want to start JupyterLab again later (for example after a reboot), just go to your project folder and run:

source <name of your env>/bin/activate
jupyter lab
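
When you are done working, leave the environment with:

deactivate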

Python env with JupyterLab - Docker version

Cookiecutter project

The GitHub repository shows how to use Cookiecutter to scaffold a data science project (or other kinds of programs).

To build and run the full Dockerfile project, create a Python env and install the cookiecutter library:

python3 -m venv venv
source venv/bin/activate
pip install --no-cache-dir --upgrade pip setuptools
pip install cookiecutter

and run:

cookiecutter https://github.com/sebkaz/jupyterlab-project

As shown above, Cookiecutter runs the template directly from the GitHub repo. Answer its prompts, then build and start the stack:

cd jupyterlab
docker-compose up -d --build
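
You can check that the containers came up (service names depend on the generated docker-compose.yml):

docker-compose ps
# or follow the logs
docker-compose logs -f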

To stop:

docker-compose down

Cookiecutter with a YAML config file

The template comes in two variants (a sketch of a config file follows below):

  1. Python, Julia, R
  2. All of the above + Apache Spark
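
A minimal sketch of such a config file, assuming the template exposes a variable that switches the Spark variant on -- the variable names below are hypothetical, check cookiecutter.json in the repo for the real ones:

# spark_template.yml (hypothetical variable names)
default_context:
  project_name: "jupyterlab"
  include_spark: "yes"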

Clone the repo so that spark_template.yml is available locally, then run (--no-input takes the answers from the config file instead of prompting; --overwrite-if-exists replaces an existing output directory):

python3 -m cookiecutter https://github.com/sebkaz/jupyterlab-project --no-input --config-file=spark_template.yml --overwrite-if-exists

Older Docker version with Jupyter Notebook

From GitHub repository

Take the Dockerfile from the GitHub repository and build the image:

# the trailing dot sets the build context to the current directory
docker build -t docker-data-science .

docker run -d -p 8888:8888 docker-data-science

From Docker Hub repository

You can also run the prebuilt image straight from the Docker Hub repo:

docker run -d -p 8888:8888 sebkaz/docker-data-science

After docker run, go to http://localhost:8888

Password: root

REMEMBER: these commands do not use the -v (volume) option, so nothing you create inside the container is persisted. Save (download) your work regularly.
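
If you do want your work to survive the container, you can add a volume mount yourself. A sketch, assuming the notebooks live under /home/jovyan/work inside the image (the usual Jupyter Docker stacks path; this image may use a different one):

docker run -d -p 8888:8888 -v "$PWD":/home/jovyan/work sebkaz/docker-data-science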

Older version with Spark in Jupyter Notebook

From GitHub repository

Take the Dockerfile from the GitHub repository and build the image:

docker build -t docker-spark-jupyter .

Then you can run it with:

docker run -d -p 8888:8888 docker-spark-jupyter

From Docker Hub repository

You can also run the prebuilt image straight from the Docker Hub repo:

docker run -d -p 8888:8888 sebkaz/docker-spark-jupyter

After docker run, go to http://localhost:8888

Password: root

REMEMBER: again, no -v (volume) option is used, so save your work regularly (the volume-mount workaround shown above applies here too).

Apache Airflow - local mode

mkdir airflow-local
cd airflow-local

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.3.0/docker-compose.yaml'

mkdir -p ./dags ./logs ./plugins

echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env

cat .env
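
The file should contain your numeric user id (the value differs per machine) and group id 0, for example:

AIRFLOW_UID=1000
AIRFLOW_GID=0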

First run (this initializes the Airflow metadata database and creates the default account):

docker-compose up airflow-init

To start the environment:

docker-compose up -d --build
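
Before opening the UI, check that the services are healthy (service names come from the downloaded docker-compose.yaml):

docker-compose ps
# if a service is not healthy, inspect its logs, e.g.:
docker-compose logs airflow-webserver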

Web browser

http://localhost:8080 (with the stock compose file the default login is airflow / airflow)

To stop and clean up (this also removes the volumes, including the Airflow database, and the downloaded images):

docker-compose down --volumes --rmi all