https://github.com/wittline/optimizing-public-transportation
Streaming event pipeline around Apache Kafka and its ecosystem. Using public data from the Chicago Transit Authority we will construct an event pipeline around Kafka that allows us to simulate and display the status of train lines in real time.
kafka stream-processing streaming udacity-nanodegree
- Host: GitHub
- URL: https://github.com/wittline/optimizing-public-transportation
- Owner: Wittline
- Created: 2021-07-01T19:23:41.000Z
- Default Branch: main
- Last Pushed: 2021-07-12T03:43:44.000Z
- Last Synced: 2025-01-29T09:39:29.233Z
- Topics: kafka, stream-processing, streaming, udacity-nanodegree
- Language: Python
- Size: 362 KB
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Monitoring the status of public Transportation with Apache Kafka
We will build a streaming event pipeline around Kafka and its ecosystem that simulates and displays the status of train lines in real time, using public data from the Chicago Transit Authority.
# Data Source
Public data from Chicago Transit Authority
# Architecture

# How to run the project with docker
- Install Docker Desktop on Windows. It also installs **docker compose**, which lets you run multi-container applications.
- Install Git Bash for Windows. Once it is installed, open **git bash** and clone this repository to get the **docker-compose.yaml** file and the other files needed.
## Dependencies
- Kafka
- Zookeeper
- Schema Registry
- REST Proxy
- Kafka Connect
- KSQL
- Kafka Connect UI
- Kafka Topics UI
- Schema Registry UI
- Postgres
The docker-compose file does not run your code; it only starts the dependencies listed above. To start them, navigate to the starter directory containing docker-compose.yaml and run the following commands using git bash:
```
$> cd starter
$> docker-compose up
Starting zookeeper ... done
Starting kafka0 ... done
Starting schema-registry ... done
Starting rest-proxy ... done
Starting connect ... done
Starting ksql ... done
Starting connect-ui ... done
Starting topics-ui ... done
Starting schema-registry-ui ... done
Starting postgres ... done
```
You will see a large amount of text print out in your terminal and continue to scroll. This is normal! This means your dependencies are up and running.
To check the status of your environment, you may run the following command at any time from a separate terminal instance:
```
$> docker-compose ps
Name                           Command                         State   Ports
-----------------------------------------------------------------------------------------------------------------
starter_connect-ui_1           /run.sh                         Up      8000/tcp, 0.0.0.0:8084->8084/tcp
starter_connect_1              /etc/confluent/docker/run       Up      0.0.0.0:8083->8083/tcp, 9092/tcp
starter_kafka0_1               /etc/confluent/docker/run       Up      0.0.0.0:9092->9092/tcp
starter_ksql_1                 /etc/confluent/docker/run       Up      0.0.0.0:8088->8088/tcp
starter_postgres_1             docker-entrypoint.sh postgres   Up      0.0.0.0:5432->5432/tcp
starter_rest-proxy_1           /etc/confluent/docker/run       Up      0.0.0.0:8082->8082/tcp
starter_schema-registry-ui_1   /run.sh                         Up      8000/tcp, 0.0.0.0:8086->8086/tcp
starter_schema-registry_1      /etc/confluent/docker/run       Up      0.0.0.0:8081->8081/tcp
starter_topics-ui_1            /run.sh                         Up      8000/tcp, 0.0.0.0:8085->8085/tcp
starter_zookeeper_1            /etc/confluent/docker/run       Up      0.0.0.0:2181->2181/tcp, 2888/tcp, 3888/tcp
```
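The `Up` state only says that a container is running, not that the service inside it is accepting connections yet. As a rough complement, the sketch below tries a TCP connection to each host port listed in the output above. It assumes the services are reachable on `localhost`; the names `SERVICE_PORTS`, `port_is_open`, and `check_services` are made up for this example and are not part of the repository.

```python
import socket

# Host ports copied from the `docker-compose ps` output above.
SERVICE_PORTS = {
    "zookeeper": 2181,
    "kafka0": 9092,
    "schema-registry": 8081,
    "rest-proxy": 8082,
    "connect": 8083,
    "connect-ui": 8084,
    "topics-ui": 8085,
    "schema-registry-ui": 8086,
    "ksql": 8088,
    "postgres": 5432,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(host="localhost", ports=SERVICE_PORTS):
    """Map each service name to True or False depending on whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

Calling `check_services()` returns a dictionary such as `{"zookeeper": True, ...}`. An open port is only a hint: a service may still be initializing even though its port already accepts connections.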
## Connecting to Services in Docker Compose
Now that your project’s dependencies are running in Docker Compose, we’re ready to get our project up and running.
Windows users only: you must first install librdkafka-dev in your WSL Linux distribution. Run the following command in your Ubuntu terminal:
```
sudo apt-get install librdkafka-dev -y
```
## Stopping Docker Compose and Cleaning Up
When you are ready to stop Docker Compose you can run the following command:
```
$> docker-compose stop
Stopping starter_postgres_1 ... done
Stopping starter_schema-registry-ui_1 ... done
Stopping starter_topics-ui_1 ... done
Stopping starter_connect-ui_1 ... done
Stopping starter_ksql_1 ... done
Stopping starter_connect_1 ... done
Stopping starter_rest-proxy_1 ... done
Stopping starter_schema-registry_1 ... done
Stopping starter_kafka0_1 ... done
Stopping starter_zookeeper_1 ... done
```
If you would like to clean up the containers to reclaim disk space, as well as the volumes containing your data, run:
```
$> docker-compose rm -v
Going to remove starter_postgres_1, starter_schema-registry-ui_1, starter_topics-ui_1, starter_connect-ui_1, starter_ksql_1, starter_connect_1, starter_rest-proxy_1, starter_schema-registry_1, starter_kafka0_1, starter_zookeeper_1
Are you sure? [yN] y
Removing starter_postgres_1 ... done
Removing starter_schema-registry-ui_1 ... done
Removing starter_topics-ui_1 ... done
Removing starter_connect-ui_1 ... done
Removing starter_ksql_1 ... done
Removing starter_connect_1 ... done
Removing starter_rest-proxy_1 ... done
Removing starter_schema-registry_1 ... done
Removing starter_kafka0_1 ... done
Removing starter_zookeeper_1 ... done
```
# Running the producer
```
cd producers
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
python simulation.py
```
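`simulation.py` is not reproduced in this README, so the following is only a sketch of what sending one event to Kafka could look like, not the repository's implementation. It assumes the `confluent-kafka` Python client and a broker reachable at `localhost:9092` (the host port shown by `docker-compose ps`). The topic name, the event fields, and JSON serialization are all hypothetical; the actual producer may serialize differently, for example with Avro, given that Schema Registry is among the dependencies.

```python
import json
import time


def build_arrival_event(station_id, train_id, line):
    """Build a JSON-serializable arrival event (field names are hypothetical)."""
    return {
        "station_id": station_id,
        "train_id": train_id,
        "line": line,
        "timestamp_ms": int(time.time() * 1000),
    }


def send_event(producer, topic, event):
    """Serialize the event as JSON and hand it to a Kafka producer.

    `producer` must expose `produce(topic, key=..., value=...)` and `flush()`,
    as `confluent_kafka.Producer` does.
    """
    producer.produce(
        topic,
        key=str(event["station_id"]).encode("utf-8"),
        value=json.dumps(event).encode("utf-8"),
    )
    # Block until the message has been delivered or has failed.
    producer.flush()


def make_producer(bootstrap_servers="localhost:9092"):
    """Create a producer (assumes `pip install confluent-kafka`)."""
    from confluent_kafka import Producer  # third-party; imported lazily

    return Producer({"bootstrap.servers": bootstrap_servers})
```

With the dependencies running, `send_event(make_producer(), "example.arrivals", build_arrival_event(40010, "BL001", "blue"))` would publish a single message. Passing the producer in as an argument keeps the serialization logic testable without a broker.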
# Running the Faust Stream Processing Application
```
cd consumers
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
faust -A faust_stream worker -l info
```
# Running the KSQL Creation Script
```
cd consumers
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
python ksql.py
```
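The contents of `ksql.py` are not shown here. A script like this typically submits statements to the KSQL server's REST endpoint, so the sketch below shows how such a request could be built with only the standard library. It assumes the server is reachable at `http://localhost:8088` (the host port from `docker-compose ps`). `SHOW TOPICS;` is a harmless placeholder, not the statement the repository runs, and the function names are invented for this example.

```python
import json
import urllib.request

KSQL_URL = "http://localhost:8088"  # assumed from the docker-compose port mapping

# Placeholder statement for illustration; the real statements live in ksql.py.
EXAMPLE_STATEMENT = "SHOW TOPICS;"


def build_ksql_request(statement, ksql_url=KSQL_URL):
    """Build a POST request for the KSQL server's /ksql endpoint."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        f"{ksql_url}/ksql",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json"},
        method="POST",
    )


def run_statement(statement, ksql_url=KSQL_URL):
    """Send the statement and return the decoded JSON response."""
    request = build_ksql_request(statement, ksql_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Once the `ksql` container is up, `run_statement(EXAMPLE_STATEMENT)` should return the server's JSON reply. A failed statement surfaces as a `urllib.error.HTTPError`.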
# Running the consumer
```
cd consumers
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
python server.py
```