https://github.com/dumbremadhura/apache-kafka-city-bike-project
Real-time data pipeline using Apache Kafka and Python to stream Citi Bike NYC station data. Demonstrates producing and consuming messages via Kafka, containerized with Docker. Built as part of a hands-on Kafka learning project.
- Host: GitHub
- URL: https://github.com/dumbremadhura/apache-kafka-city-bike-project
- Owner: dumbremadhura
- Created: 2025-06-28T11:08:41.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-06-28T11:56:34.000Z (4 months ago)
- Last Synced: 2025-06-28T12:25:57.818Z (4 months ago)
- Topics: citibikenyc, data-streaming, docker, gbfs, kafka, kafka-python, real-time
- Language: Python
- Homepage: https://www.linkedin.com/in/madhuradumbre/
- Size: 152 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 🚴‍♂️ City Bike Kafka Streaming Project
This project is a hands-on Kafka streaming pipeline built as part of **Darshil Parmar's Apache Kafka course on DataVidhya**.
It demonstrates how to fetch real-time Citi Bike NYC data from public APIs, produce it to Kafka topics, and consume it with simple Python scripts, helping you understand the core concepts of Kafka in practice.

## 🧩 Project Architecture
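The flow is: Citi Bike GBFS API → producer → Kafka topic → consumer. A minimal end-to-end sketch using kafka-python (the topic name and broker address match the setup steps below; the sample record is illustrative, not the project's actual payload):

```python
import json


def serialize(value: dict) -> bytes:
    # Producer side: encode a dict as UTF-8 JSON bytes for Kafka.
    return json.dumps(value).encode("utf-8")


def deserialize(raw: bytes) -> dict:
    # Consumer side: decode Kafka message bytes back into a dict.
    return json.loads(raw.decode("utf-8"))


def run_pipeline():
    # kafka-python imports kept local so the helpers above
    # can be used (and tested) without a running broker.
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=serialize,
    )
    # Illustrative record only; bikes.py produces real GBFS data.
    producer.send("bikes_station_status",
                  {"station_id": "72", "num_bikes_available": 5})
    producer.flush()

    consumer = KafkaConsumer(
        "bikes_station_status",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        value_deserializer=deserialize,
    )
    for message in consumer:
        print(message.value)

# Call run_pipeline() once the broker from docker-compose is up.
```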

## 📁 Project Structure
```bash
city-bike-project/
│
├── bikes/ # Core logic for fetching and sending bike data
│ └── bikes.py
│
├── kafka_consumer/ # Kafka consumer script
│ └── consumer.py
│
├── kafka_producer/ # Kafka producer script
│ └── producer.py
│
├── services/ # HTTP service wrapper
│ └── http_service.py
│
├── constants/ # API routes and Kafka topic names
│ ├── routes.py
│ └── topics.py
│
├── docker-compose.yml # Docker setup for Kafka and Zookeeper
├── .gitignore # Ignore unneeded files
└── README.md # You're here
```

## 🔧 Prerequisites
- Docker and Docker Compose installed
- Python 3.7+
- `requests` and `kafka-python` Python libraries (`pip install requests kafka-python`)

## 🐳 Setup with Docker
1. **Start Kafka and Zookeeper**
```bash
docker-compose up -d
```

2. **Enter the Kafka container**
```bash
docker exec -it <kafka-container-name> bash
```

**You can find the Kafka container name with:**
```bash
docker ps
```

3. **Create Kafka Topics**
```bash
kafka-topics.sh --create --topic bikes_station_information --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
kafka-topics.sh --create --topic bikes_station_status --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
```

## Run the Project
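If you would rather create the topics from Python than inside the container, kafka-python also ships an admin client. A sketch with the same topic names and settings as the `kafka-topics.sh` commands above (assumes the broker is reachable on `localhost:9092`):

```python
def topic_specs(names, partitions=1, replication=1):
    # Mirror the CLI flags: --partitions 1 --replication-factor 1
    return [
        {"name": n, "num_partitions": partitions, "replication_factor": replication}
        for n in names
    ]


def create_topics(bootstrap="localhost:9092"):
    # kafka-python import kept local so topic_specs() works without a broker.
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers=bootstrap)
    admin.create_topics([
        NewTopic(**spec)
        for spec in topic_specs(["bikes_station_information",
                                 "bikes_station_status"])
    ])

# Call create_topics() once Kafka is running.
```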
***Open multiple terminal windows for the following:***
1. **Start Producer script**
```bash
python3 bikes/bikes.py
# This will fetch data from the Citi Bike API and produce messages to Kafka topics.
```

2. **Start Consumer script**
```bash
python3 kafka_consumer/consumer.py
# This will consume messages from Kafka and print them to the console.
```

## 📚 References

- Built for educational purposes as part of Darshil Parmar's Apache Kafka course: https://courses.analyticsvidhya.com/
- Citi Bike NYC GBFS data: https://gbfs.citibikenyc.com/gbfs/en/
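The GBFS base URL above serves named JSON feeds (e.g. `station_information.json`, `station_status.json`), each wrapping its records under a top-level `data` key. A small parsing sketch, with no network call (the sample payload is illustrative):

```python
import json
from urllib.parse import urljoin

GBFS_BASE = "https://gbfs.citibikenyc.com/gbfs/en/"


def feed_url(feed_name):
    # GBFS feeds are flat JSON files under the language-specific base path.
    return urljoin(GBFS_BASE, f"{feed_name}.json")


def station_records(payload):
    # Station feeds keep their records under data["stations"].
    return json.loads(payload)["data"]["stations"]


sample = '{"data": {"stations": [{"station_id": "72", "name": "W 52 St & 11 Ave"}]}}'
print(feed_url("station_information"))
# → https://gbfs.citibikenyc.com/gbfs/en/station_information.json
print(station_records(sample)[0]["station_id"])
# → 72
```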
## 👩‍💻 Project Author

Madhura Dumbre, Data Engineer
📍 Bengaluru, India

***🔗 https://www.linkedin.com/in/madhuradumbre/***