Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/yosrak5/data-streaming
This project involves the development of a robust data engineering pipeline that orchestrates the seamless ingestion, processing, and storage of data.
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/yosrak5/data-streaming
- Owner: yosrak5
- Created: 2024-10-05T17:59:53.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-10-06T09:37:24.000Z (4 months ago)
- Last Synced: 2024-10-30T06:33:41.915Z (3 months ago)
- Topics: airflow-dags, apache, cassandra, docker, etl, kafka, python, spark
- Language: Python
- Homepage:
- Size: 7.81 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data-Streaming
This project is a data engineering pipeline that handles the ingestion, processing, and storage of data. It is built with Python, uses Apache Airflow for workflow automation and Apache Kafka for real-time data streaming, and follows an ETL (Extract, Transform, Load) process with Cassandra as the distributed database for scalable storage. All components are containerized with Docker so the pipeline can be deployed consistently across environments.
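
The ETL flow described above can be illustrated with a minimal sketch. It is not taken from the repository: the record fields, function names, and in-memory stand-ins for the Kafka topic and the Cassandra table are all assumptions made for illustration, and the code uses only the Python standard library so that it runs without any services.

```python
import json
import uuid
from typing import Callable, Dict, List

# Hypothetical raw record; the README does not say what data is ingested.
SAMPLE_RECORD = {
    "first_name": "Ada",
    "last_name": "Lovelace",
    "email": " ADA@Example.com ",
}


def extract() -> dict:
    """Extract: stand-in for fetching one record from an upstream source."""
    return dict(SAMPLE_RECORD)


def transform(raw: dict) -> dict:
    """Transform: normalise fields and attach a unique row key."""
    return {
        "id": str(uuid.uuid4()),
        "full_name": f"{raw['first_name']} {raw['last_name']}",
        "email": raw["email"].strip().lower(),
    }


def publish(record: dict, send: Callable[[bytes], None]) -> None:
    """Serialise the record to JSON bytes and pass it to a message sink.

    Assumption: in a real deployment `send` would wrap a Kafka producer.
    """
    send(json.dumps(record).encode("utf-8"))


def load(messages: List[bytes], table: Dict[str, dict]) -> int:
    """Load: decode each message and upsert it by id.

    Assumption: `table` stands in for a Cassandra table keyed on `id`.
    """
    for payload in messages:
        row = json.loads(payload)
        table[row["id"]] = row
    return len(table)


def run_pipeline() -> Dict[str, dict]:
    """Run extract -> transform -> publish -> load once, fully in memory."""
    topic: List[bytes] = []        # in-memory stand-in for a Kafka topic
    table: Dict[str, dict] = {}    # in-memory stand-in for a Cassandra table
    publish(transform(extract()), topic.append)
    load(topic, table)
    return table


if __name__ == "__main__":
    for row in run_pipeline().values():
        print(row)
```

In a setup like the one the README describes, each of these functions would typically become a scheduled Airflow task, with the list replaced by a Kafka topic and the dictionary by a Cassandra table. Keeping the stages as plain functions makes them easy to test before wiring in those services.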