https://github.com/ahmetfurkandemir/data-engineering-project-with-hdfs-and-kafka
Data Engineering Project with Hadoop HDFS and Kafka
- Host: GitHub
- URL: https://github.com/ahmetfurkandemir/data-engineering-project-with-hdfs-and-kafka
- Owner: AhmetFurkanDEMIR
- License: mit
- Created: 2023-11-04T12:35:07.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-04T12:56:14.000Z (over 1 year ago)
- Last Synced: 2024-05-07T18:17:01.277Z (10 months ago)
- Topics: data, data-engineer, data-engineering, data-engineering-pipeline, docker, docker-compose, hadoop, hadoop-filesystem, hadoop-hdfs, hdfs, hdfs-client, hdfs-dfs, kafka, kafka-consumer, kafka-producer, kafka-ui, kafkaui, pipline, python, python-hdfs-client
- Language: Python
- Homepage:
- Size: 3.46 MB
- Stars: 8
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Data Engineering Project with HDFS and Kafka
A project that builds a data pipeline using data from the Hepsiburada data engineering case study.
* [docker-compose.yml](/docker-compose.yml)
* [config-hadoop](/config-hadoop)
* [Producer](/docker/producer/)
  * [Dockerfile](/docker/producer/Dockerfile)
  * [HB data](/docker/producer/hb-data.json)
  * [Kafka producer](/docker/producer/kafka_producer.py)
  * [requirements](/docker/producer/requirements.txt)
* [Consumer](/docker/consumer/)
  * [Dockerfile](/docker/consumer/Dockerfile)
  * [Kafka consumer](/docker/consumer/kafka_consumer.py)
  * [requirements](/docker/consumer/requirements.txt)
  * [HDFS](/docker/consumer/hdfs.py)

### Steps
Launch an Ubuntu machine on AWS EC2 for the project.

Open the necessary ports on the machine through the firewall.

You also need to open the same ports at the operating-system level:

```bash
sudo ufw allow 9870
sudo ufw allow 8080
sudo ufw allow 8088
```

Then, build and start the Docker containers.

```bash
docker-compose up --build
```

About one minute after the containers start, data begins to be written to the Kafka topic and the pipeline becomes active.
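
To make the producer side of the pipeline more concrete, here is a minimal sketch of how JSON records could be read and published to a topic. It is not the repository's `kafka_producer.py`. The helper names (`iter_records`, `publish`, `main`), the topic name `hb-events`, the broker address `kafka:9092`, the layout of `hb-data.json` (a JSON array or one JSON object per line), and the use of the `kafka-python` client are all assumptions made for illustration.

```python
import json
import time
from typing import Callable, Iterable, Iterator


def iter_records(text: str) -> Iterator[dict]:
    """Yield records from text holding either a JSON array or JSON Lines."""
    stripped = text.strip()
    if stripped.startswith("["):
        yield from json.loads(stripped)
        return
    for line in stripped.splitlines():
        if line.strip():  # skip blank lines between records
            yield json.loads(line)


def publish(
    records: Iterable[dict],
    send: Callable[[bytes], object],
    delay_s: float = 0.0,
) -> int:
    """Serialize each record as UTF-8 JSON, pass it to `send`, return the count."""
    count = 0
    for record in records:
        send(json.dumps(record).encode("utf-8"))
        count += 1
        if delay_s:
            time.sleep(delay_s)  # optional pacing between messages
    return count


def main() -> None:
    # Imported lazily so the helpers above work without kafka-python installed.
    from kafka import KafkaProducer

    # Assumed broker address and topic name; adjust to the compose setup.
    producer = KafkaProducer(bootstrap_servers="kafka:9092")
    with open("hb-data.json", encoding="utf-8") as fh:
        records = iter_records(fh.read())
        sent = publish(
            records,
            lambda payload: producer.send("hb-events", payload),
            delay_s=0.1,
        )
    producer.flush()  # block until buffered messages are delivered
    print(f"sent {sent} records")
```

Calling `main()` inside a container that can reach the broker would stream the file to the topic. Passing `send` as a plain callable keeps the serialization logic independent of any particular Kafka client.
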

Data from the Kafka topic: IP:8080 or [0.0.0.0:8080](http://0.0.0.0:8080)

Hadoop HDFS interface: IP:9870 or [0.0.0.0:9870](http://0.0.0.0:9870)

Data from HDFS: IP:9870 or [0.0.0.0:9870](http://0.0.0.0:9870)

Hadoop cluster interface: IP:8088 or [0.0.0.0:8088](http://0.0.0.0:8088)
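
Since the web interfaces only respond once the containers have finished starting, a small readiness check can save some manual refreshing. The sketch below polls the three ports opened earlier (8080, 9870, 8088) using only the Python standard library. The function names `port_open` and `wait_for_ports`, the timeouts, and the host value are hypothetical and not part of the repository.

```python
import socket
import time


def port_open(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def wait_for_ports(host, ports, deadline_s=120.0, poll_s=2.0) -> dict:
    """Poll until every port accepts connections or the deadline passes."""
    status = {port: False for port in ports}
    end = time.monotonic() + deadline_s
    while True:
        for port in ports:
            if not status[port]:
                status[port] = port_open(host, port)
        if all(status.values()) or time.monotonic() >= end:
            return status
        time.sleep(poll_s)
```

For example, `wait_for_ports("127.0.0.1", [8080, 9870, 8088])` returns a dictionary mapping each port to whether it became reachable; replace the host with the EC2 instance's public IP when checking from another machine.
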

[Ahmet Furkan Demir](https://ahmetfurkandemir.com/)