https://github.com/lupusruber/crypto_stats
A project that provides a cloud-native solution for ingesting, transforming, and visualizing cryptocurrency data, utilizing modern tools and workflows for scalability and automation.
- Host: GitHub
- URL: https://github.com/lupusruber/crypto_stats
- Owner: lupusruber
- Created: 2024-08-14T13:16:03.000Z (11 months ago)
- Default Branch: master
- Last Pushed: 2024-12-07T15:04:21.000Z (7 months ago)
- Last Synced: 2025-02-09T15:43:48.538Z (5 months ago)
- Topics: data-engineering, data-streaming, etl-pipeline, gcp, terraform
- Language: Python
- Homepage:
- Size: 536 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
# Crypto Statistics Data Engineering Project
This project is designed to process and visualize cryptocurrency data through a cloud-based architecture using Google Cloud Platform (GCP). The project includes end-to-end workflows for data ingestion, transformation, and visualization, with tools like Terraform for infrastructure setup, Mage for workflow orchestration, dbt for data transformation, and Metabase for interactive dashboards.
It aims to provide an automated, scalable solution for ingesting, transforming, and analyzing large volumes of cryptocurrency data in real time.
## Technologies used
- **Cloud:** GCP (Google Cloud)
- **Infrastructure as code (IaC):** Terraform
- **Workflow orchestration:** Mage
- **Data Warehouse:** Google BigQuery
- **Data Lake:** Google Cloud Storage
- **Data Transformations:** dbt (Data Build Tool)
- **Data Visualizations:** Metabase

## Data Ingestion DAG (Source -> Bucket)
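The ingestion DAG is orchestrated by Mage: it pulls raw cryptocurrency stats from the source API and lands them in the Google Cloud Storage bucket. The repo's actual block code lives in its Mage project; below is only a minimal sketch of what such blocks could look like, assuming the public CoinGecko markets endpoint as the source and a hypothetical bucket name.

```python
# Minimal sketch of Mage ingestion blocks, NOT the repo's actual code.
# Assumptions: CoinGecko as the raw data source and a bucket named
# "crypto-stats-raw"; swap these for the values your pipeline uses.
import pandas as pd
import requests
from google.cloud import storage

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader
if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_loader
def load_crypto_markets(*args, **kwargs) -> pd.DataFrame:
    """Fetch current market stats for the top coins from the (assumed) source API."""
    resp = requests.get(
        'https://api.coingecko.com/api/v3/coins/markets',
        params={'vs_currency': 'usd', 'per_page': 100, 'page': 1},
        timeout=30,
    )
    resp.raise_for_status()
    return pd.DataFrame(resp.json())


@data_exporter
def export_to_gcs(df: pd.DataFrame, *args, **kwargs) -> None:
    """Write the raw snapshot to the data-lake bucket as Parquet."""
    bucket = storage.Client().bucket('crypto-stats-raw')  # hypothetical bucket name
    bucket.blob('raw/crypto_markets.parquet').upload_from_string(
        df.to_parquet(), content_type='application/octet-stream'
    )
```

In an actual Mage project each decorated function lives in its own block file and the blocks are chained together in the pipeline definition.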
## ETL (Bucket -> DWH)
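The ETL step moves the raw files from the bucket into BigQuery. Again as a rough illustration only, here is a minimal load using the BigQuery Python client, with hypothetical project, bucket, dataset, and table names.

```python
# Minimal sketch of loading a raw Parquet file from the data lake into BigQuery.
# The project/bucket/dataset/table names below are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project='your-gcp-project')

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(
    'gs://crypto-stats-raw/raw/crypto_markets.parquet',  # file written by the ingestion DAG
    'your-gcp-project.crypto_stats.raw_crypto_markets',  # target table in the DWH
    job_config=job_config,
)
load_job.result()  # blocks until the load job finishes

table = client.get_table('your-gcp-project.crypto_stats.raw_crypto_markets')
print(f'Loaded {table.num_rows} rows')
```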
## Dashboards
## How to replicate this project?
### 1. Clone the repo
```bash
git clone https://github.com/lupusruber/crypto_stats.git
```

### 2. Create the needed infrastructure
```bash
cd terraform
terraform init
terraform plan
terraform apply
```
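After `terraform apply` finishes, it is worth confirming that the bucket and the BigQuery dataset are reachable before running any pipelines. A small sanity check, assuming hypothetical resource names (use whatever your Terraform variables define):

```python
# Quick sanity check that the Terraform-provisioned resources exist.
# Project, bucket, and dataset names are hypothetical; match them to your Terraform vars.
from google.cloud import bigquery, storage

PROJECT_ID = 'your-gcp-project'
BUCKET_NAME = 'crypto-stats-raw'
DATASET_ID = 'crypto_stats'

storage_client = storage.Client(project=PROJECT_ID)
print('Bucket exists:', storage_client.bucket(BUCKET_NAME).exists())

bq_client = bigquery.Client(project=PROJECT_ID)
dataset = bq_client.get_dataset(DATASET_ID)  # raises NotFound if the dataset is missing
print('Dataset location:', dataset.location)
```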
### 3. Get Mage and run the pipelines
```bash
docker run -it -p 6789:6789 -v $(pwd):/home/src mageai/mageai /app/run_app.sh mage start [project_name]
```
Copy the pipeline scripts into the container.

### 4. Get dbt and run the models
For this project, dbt Cloud was used.
Create a new dbt project and add the models from the repo to the project directory.
Run the command:
```bash
dbt build
```
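If the build succeeds, the models are materialized directly in BigQuery. A quick way to check is to list the tables in the target dataset; a minimal sketch, with a hypothetical project and dataset name:

```python
# List what dbt materialized in the warehouse; names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project='your-gcp-project')
for table in client.list_tables('crypto_stats'):
    print(table.table_id)
```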
The staging models and fact tables should now appear in your BigQuery dataset.

### 5. Get Metabase and create dashboards
```bash
docker run -d -p 3000:3000 --name metabase metabase/metabase
```
Create the dashboards using the data from BigQuery.

## Notes
- You need a Google Cloud (GCP) account
- Create a service account and download its credentials (JSON key)
- Store the credentials in `[project_name]/keys/credentials.json`; they are used by Terraform, Mage, dbt, and Metabase
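For the Python pieces (the Mage blocks and the ad-hoc checks above), the same key file can be wired in either through the standard `GOOGLE_APPLICATION_CREDENTIALS` environment variable or by passing credentials explicitly. A minimal sketch, assuming the key sits at the path suggested above relative to the project root:

```python
# Two common ways to point the GCP Python clients at the downloaded service-account key.
# The relative path mirrors the location suggested above; adjust it to your checkout.
import os

from google.cloud import bigquery
from google.oauth2 import service_account

KEY_PATH = 'keys/credentials.json'

# Option 1: set the standard env var; the GCP client libraries and the Terraform
# Google provider pick the key up from here automatically.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = KEY_PATH

# Option 2: load the key explicitly and hand it to a client.
credentials = service_account.Credentials.from_service_account_file(KEY_PATH)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)
print('Authenticated against project:', client.project)
```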