https://github.com/jacqueline-dev/crypto-data-engineering
- Host: GitHub
- URL: https://github.com/jacqueline-dev/crypto-data-engineering
- Owner: Jacqueline-dev
- Created: 2025-04-07T12:54:15.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-07T15:13:13.000Z (about 1 year ago)
- Last Synced: 2025-04-07T16:24:52.960Z (about 1 year ago)
- Topics: airflow, coinbase, crypto, dataengineering, docker, etl-pipeline, pandas, postgresql, powerbi, python
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ₿ Crypto ETL Pipeline with Airflow, PostgreSQL, Docker, and Power BI
## Overview
This project is a data engineering pipeline for cryptocurrency market data. It extracts data from the public CoinGecko API, transforms it with Python and Pandas, stores it in PostgreSQL, orchestrates the steps with Apache Airflow, and exposes the results through interactive Power BI dashboards.
The entire solution is containerized using Docker to ensure portability and consistency across different environments.
## Technologies Used
- **Python**: ETL scripting and automation
- **Pandas**: Data cleaning and transformation
- **CoinGecko API**: Public crypto market data
- **PostgreSQL**: Relational database for storage
- **Apache Airflow**: Pipeline orchestration and scheduling
- **Docker**: Containerization of the whole stack
- **Power BI**: Data visualization
- **VS Code + WSL (Ubuntu)**: Development environment
## Pipeline Architecture
- **Ingestion:** Python scripts call the CoinGecko API to fetch real-time crypto data.
- **Transformation:** Data is cleaned and shaped using Pandas.
- **Loading:** Transformed data is loaded into a PostgreSQL database.
- **Orchestration:** Apache Airflow schedules and monitors ETL jobs.
- **Visualization:** Power BI connects to the database to create dashboards.
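
The ingestion and transformation steps above can be sketched roughly as follows. This is a minimal illustration, not the code from `src/extraction.py` or `src/transform.py`: the function names `build_url`, `extract`, and `transform`, the choice of the CoinGecko `/coins/markets` endpoint, and the selected columns are all assumptions made for the example. Loading into PostgreSQL is left out because it depends on connection settings the README does not show.

```python
import json
import urllib.parse
import urllib.request

import pandas as pd

# Assumed endpoint: CoinGecko's public market-data listing.
API_URL = "https://api.coingecko.com/api/v3/coins/markets"

# Hypothetical column selection; the real project may keep other fields.
COLUMNS = [
    "id", "symbol", "name",
    "current_price", "market_cap", "total_volume", "last_updated",
]


def build_url(vs_currency="usd", per_page=50):
    """Build the request URL for one page of market data."""
    query = urllib.parse.urlencode(
        {"vs_currency": vs_currency, "per_page": per_page, "page": 1}
    )
    return f"{API_URL}?{query}"


def extract(url, timeout=30):
    """Fetch the JSON payload (a list of dicts, one per coin)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def transform(records):
    """Keep the columns of interest, fix types, and drop unusable rows."""
    df = pd.DataFrame(records).reindex(columns=COLUMNS)
    df["symbol"] = df["symbol"].str.upper()
    for col in ["current_price", "market_cap", "total_volume"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    df["last_updated"] = pd.to_datetime(
        df["last_updated"], utc=True, errors="coerce"
    )
    # Rows without an id or a price cannot be charted, so drop them,
    # and keep only the first row per coin id.
    df = df.dropna(subset=["id", "current_price"]).drop_duplicates(subset="id")
    return df.reset_index(drop=True)
```

With network access, `transform(extract(build_url()))` would yield a DataFrame ready for the loading step. Keeping `extract` and `transform` as separate functions means each can be wrapped in its own Airflow task and the transformation can be tested offline with sample records.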
## Project Structure
```bash
crypto-etl-pipeline/
├── README.md               # Project documentation
├── requirements.txt        # Python dependencies
├── .gitignore              # Git ignored files and folders
├── docker-compose.yml      # File to spin up containers
├── .env                    # Environment variables
├── src/                    # ETL scripts
│   ├── extraction.py       # Extracts data from CoinGecko API
│   ├── transform.py        # Cleans and transforms data
│   ├── load.py             # Loads data into PostgreSQL
│   ├── config.py           # Configurations and variable reading
│   └── utils.py            # Auxiliary functions
├── dags/                   # Apache Airflow DAGs
│   └── crypto_dag.py       # DAG that orchestrates the ETL pipeline
└── docs/                   # Documentation and diagrams
    ├── arquitetura.pdf     # Architecture diagram
    ├── fluxo_de_dados.png  # Data flow diagram
    └── setup_ambiente.md   # Environment setup guide