https://github.com/san089/optimizing-public-transportation
A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority.
faust kafka-api kafka-application kafka-cluster kafka-connect kafka-consumer kafka-producer kafka-schema-registry kafka-sql kafka-streams kafka-topic
- Host: GitHub
- URL: https://github.com/san089/optimizing-public-transportation
- Owner: san089
- License: mit
- Created: 2020-01-07T18:48:09.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2023-08-14T22:08:50.000Z (over 1 year ago)
- Last Synced: 2024-10-13T00:06:38.376Z (4 months ago)
- Topics: faust, kafka-api, kafka-application, kafka-cluster, kafka-connect, kafka-consumer, kafka-producer, kafka-schema-registry, kafka-sql, kafka-streams, kafka-topic
- Language: Python
- Homepage:
- Size: 496 KB
- Stars: 29
- Watchers: 4
- Forks: 13
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# Optimizing-Public-Transportation
## Architecture
![Architecture](https://github.com/san089/Optimizing-Public-Transportation/blob/master/docs/architecture.png)

#### Overview
In this project we construct a streaming event pipeline around Apache Kafka and its ecosystem. Using a public dataset from the [Chicago Transit Authority](https://www.transitchicago.com/data/), we build an event pipeline around Kafka that simulates and displays the status of trains in real time.

**Arrival and Turnstiles ->** Producers that write train arrival and turnstile events to our Kafka cluster. An arrival event indicates that a train has arrived at a particular station, while a turnstile event indicates that a passenger has entered the station.

**Weather ->** A producer that periodically emits weather data to the Kafka cluster through the REST Proxy.

**Postgres SQL and Kafka Connect ->** Kafka Connect extracts station data from Postgres and pushes it to the Kafka cluster.

**Kafka status server ->** Consumes data from Kafka topics and displays it on the UI.
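
To make the weather component more concrete, here is a minimal sketch of how a produce request to the Confluent REST Proxy (v2 API) could be assembled. It is not taken from this repository: the proxy URL, the topic name `weather`, and the two-field Avro schema are assumptions chosen for illustration. The request is only built, not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request

# Hypothetical values for illustration; the project may use different ones.
REST_PROXY_URL = "http://localhost:8082"
WEATHER_TOPIC = "weather"

# Assumed Avro schema for a single weather reading.
WEATHER_VALUE_SCHEMA = {
    "type": "record",
    "name": "weather_value",
    "fields": [
        {"name": "temperature", "type": "float"},
        {"name": "status", "type": "string"},
    ],
}


def build_weather_request(temperature, status):
    """Build (but do not send) a REST Proxy v2 produce request."""
    body = {
        # The REST Proxy expects the Avro schema as a JSON-encoded string.
        "value_schema": json.dumps(WEATHER_VALUE_SCHEMA),
        "records": [{"value": {"temperature": temperature, "status": status}}],
    }
    return urllib.request.Request(
        url=f"{REST_PROXY_URL}/topics/{WEATHER_TOPIC}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/vnd.kafka.avro.v2+json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_weather_request(temperature=21.5, status="sunny")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it (requires a running REST Proxy):
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
```

Building the request separately from sending it keeps the payload easy to inspect and test before any broker is available.
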
![Results](https://github.com/san089/Optimizing-Public-Transportation/blob/master/docs/results.png)
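
For the Postgres and Kafka Connect component described above, a JDBC source connector is typically registered by sending a JSON definition to the Kafka Connect REST API (`POST /connectors`). The sketch below only builds such a definition. The connector name, connection URL, credentials, table, column, and topic prefix are all placeholders, not values from this repository; the configuration keys themselves are documented options of the Confluent JDBC source connector.

```python
import json


def build_jdbc_source_connector(name, connection_url, user, password,
                                table, id_column, topic_prefix):
    """Return a Kafka Connect connector definition for a JDBC source."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": connection_url,
            "connection.user": user,
            "connection.password": password,
            # Only copy the listed table(s).
            "table.whitelist": table,
            # Detect new rows through a strictly increasing column.
            "mode": "incrementing",
            "incrementing.column.name": id_column,
            # The topic name becomes <topic_prefix><table>.
            "topic.prefix": topic_prefix,
            "poll.interval.ms": "60000",
        },
    }


if __name__ == "__main__":
    # All argument values here are hypothetical.
    definition = build_jdbc_source_connector(
        name="stations",
        connection_url="jdbc:postgresql://localhost:5432/cta",
        user="example_user",
        password="example_password",
        table="stations",
        id_column="stop_id",
        topic_prefix="connect-",
    )
    print(json.dumps(definition, indent=2))
```

The `incrementing` mode suits tables that only receive new rows; a table whose rows are updated in place would need a timestamp-based mode instead.
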
### Environment
- Docker (I used the Bitnami Kafka image available [here](https://hub.docker.com/r/bitnami/kafka))
- Python 3.7

### Running and Testing
First, make sure all the services are up and running.
For Docker, use: `docker-compose up`
Docker Compose will take 3-5 minutes to start, depending on your hardware.
Once Docker Compose is ready, make sure the services are running by connecting to them using the Docker URLs provided below:

![](https://github.com/san089/Optimizing-Public-Transportation/blob/master/docs/services.png)
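
One way to check that the services are up is a quick TCP reachability test, sketched below. The service URLs for this project are given only in the image above, so the host and port pairs in `SERVICES` are placeholders based on common default ports and should be replaced with the real values.

```python
import socket

# Placeholder endpoints (common defaults); replace with the URLs
# listed for this project's Docker services.
SERVICES = {
    "kafka": ("localhost", 9092),
    "schema-registry": ("localhost", 8081),
    "rest-proxy": ("localhost", 8082),
    "kafka-connect": ("localhost", 8083),
    "postgres": ("localhost", 5432),
}


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        state = "up" if is_reachable(host, port) else "DOWN"
        print(f"{name:16s} {host}:{port} {state}")
```

An open port only shows that something is listening; it does not prove the service has finished starting, so a service reported as up may still need a little more time.
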
You also need to install the requirements. Use the commands below to create a virtual environment and install them for the producers:
1. `cd producers`
2. `virtualenv venv`
3. `. venv/bin/activate`
4. `pip install -r requirements.txt`

Set up the environment for the consumers in the same way:
1. `cd consumers`
2. `virtualenv venv`
3. `. venv/bin/activate`
4. `pip install -r requirements.txt`

#### Running Simulation
Run the producers using `simulation.py` in the `producers` folder: `python simulation.py`

Run the Faust stream processing application:
1. `cd consumers`
2. `faust -A faust_stream worker -l info`

Run the KSQL consumer:
1. `cd consumers`
2. `python ksql.py`

Run the consumer server:
1. `cd consumers`
2. `python server.py`

### Resources
- [Confluent Python Client Documentation](https://docs.confluent.io/current/clients/confluent-kafka-python/#)
- [Confluent Python Client Usage and Examples](https://github.com/confluentinc/confluent-kafka-python#usage)
- [REST Proxy API Reference](https://docs.confluent.io/current/kafka-rest)
- [Kafka Connect JDBC Source Connector Configuration Options](https://docs.confluent.io/current/connect/kafka-connect-jdbc/source-connector/source_config_options.html)