
https://github.com/jacqueline-dev/crypto-data-engineering




# ₿ Crypto ETL Pipeline with Airflow, PostgreSQL, Docker, and Power BI

## 📌 Overview

This project is a data engineering pipeline for cryptocurrency market data. It extracts data from the public CoinGecko API, transforms it with Python and Pandas, stores it in PostgreSQL, orchestrates the jobs with Apache Airflow, and feeds interactive dashboards in Power BI.

The entire solution is containerized using Docker to ensure portability and consistency across different environments.

## 🔧 Technologies Used

- **Python** – ETL scripting and automation
- **Pandas** – Data cleaning and transformation
- **CoinGecko API** – Public crypto market data
- **PostgreSQL** – Relational database for storage
- **Apache Airflow** – Pipeline orchestration and scheduling
- **Docker** – Containerization of the whole stack
- **Power BI** – Data visualization
- **VS Code + WSL (Ubuntu)** – Development environment

## 🔁 Pipeline Architecture

- **Ingestion:** Python scripts call the CoinGecko API to fetch real-time crypto data.
- **Transformation:** Data is cleaned and shaped using Pandas.
- **Loading:** Transformed data is loaded into a PostgreSQL database.
- **Orchestration:** Apache Airflow schedules and monitors ETL jobs.
- **Visualization:** Power BI connects to the database to create dashboards.
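
The first three stages above can be sketched as small Python functions. This is a minimal illustration, not the code in `src/`. The function names, the `crypto_prices` table, and the column set are hypothetical. Plain Python stands in for Pandas in the transformation step so the sketch runs with the standard library only. The `%s` placeholders assume a psycopg2-style PostgreSQL driver.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Public CoinGecko endpoint that returns market snapshots per coin.
COINGECKO_MARKETS_URL = "https://api.coingecko.com/api/v3/coins/markets"

# Hypothetical target columns; the repository's real schema is not shown here.
COLUMNS = ("coin_id", "symbol", "price_usd", "market_cap_usd", "fetched_at")


def build_url(vs_currency="usd", per_page=10):
    """Ingestion, part 1: build the request URL."""
    query = urlencode({"vs_currency": vs_currency, "per_page": per_page, "page": 1})
    return f"{COINGECKO_MARKETS_URL}?{query}"


def extract(url, timeout=30):
    """Ingestion, part 2: fetch and decode the JSON payload (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def transform(records, fetched_at):
    """Transformation: keep known fields, skip rows without a price, normalize types."""
    rows = []
    for rec in records:
        if rec.get("current_price") is None:
            continue
        market_cap = rec.get("market_cap")
        rows.append((
            rec["id"],
            rec["symbol"].upper(),
            float(rec["current_price"]),
            float(market_cap) if market_cap is not None else None,
            fetched_at.isoformat(),
        ))
    return rows


def build_insert(table="crypto_prices"):
    """Loading: parameterized INSERT statement for a PostgreSQL driver."""
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    return f"INSERT INTO {table} ({', '.join(COLUMNS)}) VALUES ({placeholders})"
```

In a setup like the one described here, an Airflow DAG would call `extract`, `transform`, and a load step in sequence, passing `datetime.now(timezone.utc)` as `fetched_at`, and the load step would run the statement from `build_insert()` once per row through the database driver.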

## 📂 Project Structure

```bash
crypto-etl-pipeline/
├── README.md               # Project documentation
├── requirements.txt        # Python dependencies
├── .gitignore              # Git ignored files and folders
├── docker-compose.yml      # File to spin up containers
├── .env                    # Environment variables
├── src/                    # ETL scripts
│   ├── extraction.py       # Extracts data from CoinGecko API
│   ├── transform.py        # Cleans and transforms data
│   ├── load.py             # Loads data into PostgreSQL
│   ├── config.py           # Configurations and variable reading
│   └── utils.py            # Auxiliary functions
├── dags/                   # Apache Airflow DAGs
│   └── crypto_dag.py       # DAG that orchestrates the ETL pipeline
└── docs/                   # Documentation and diagrams
    ├── arquitetura.pdf     # Architecture diagram
    ├── fluxo_de_dados.png  # Data flow diagram
    └── setup_ambiente.md   # Environment setup guide