Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
This project aims to create a data pipeline using Airflow to ingest a dataset from Google BigQuery into a PostgreSQL database. The stack runs in a local environment on Kubernetes.
- Host: GitHub
- URL: https://github.com/acardosolima/crypto-ethereum-tokens
- Owner: acardosolima
- Created: 2024-07-31T21:35:23.000Z (4 months ago)
- Default Branch: master
- Last Pushed: 2024-08-05T15:24:46.000Z (3 months ago)
- Last Synced: 2024-09-30T10:41:09.251Z (about 2 months ago)
- Topics: airflow, bigquery, postgresql, python
- Language: Python
- Homepage:
- Size: 92.8 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Crypto Ethereum Tokens Pipeline
This project aims to create a data pipeline using Airflow to ingest a dataset from Google BigQuery into a PostgreSQL database. The stack runs in a local environment on Kubernetes.

- Project: bigquery-public-data
- Dataset: crypto_ethereum
- Table: tokens
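For context, a minimal standalone sketch of reading this table with the `google-cloud-bigquery` client. This is not the pipeline itself (which runs through Airflow); the column names are taken from the public crypto_ethereum schema.

```
from google.cloud import bigquery

# Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service account
# key with BigQuery access (see the setup steps below).
client = bigquery.Client()
query = """
    SELECT address, symbol, name, decimals
    FROM `bigquery-public-data.crypto_ethereum.tokens`
    LIMIT 5
"""
for row in client.query(query).result():
    print(row.address, row.symbol)
```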
## Table of contents
- [Crypto Ethereum Tokens Pipeline](#crypto-ethereum-tokens-pipeline)
- [Table of contents](#table-of-contents)
- [Setting up](#setting-up)
- [Google Cloud setup](#google-cloud-setup)
- [Install required programs](#install-required-programs)
- [Local environment setup](#local-environment-setup)
- [Configure airflow connections](#configure-airflow-connections)
- [BigQuery](#bigquery)
- [PostgreSQL](#postgresql)
- [Validation process](#validation-process)
- [Pre processing](#pre-processing)
- [Single execution](#single-execution)
- [Backfill](#backfill)
- [Contributors](#contributors)

## Setting up
### Google Cloud setup
1. Install [Google Cloud CLI](https://cloud.google.com/sdk/docs/install) and configure the authentication method
2. Check that the CLI is authenticated and the project is set
```
gcloud auth list
gcloud config list project
```
3. Enable the BigQuery API
```
gcloud services enable bigquery.googleapis.com
```
4. Create environment variable with project id
```
export PROJECT_ID=$(gcloud config get-value core/project)
```
5. Create service account in GCP
```
gcloud iam service-accounts create python-bigquery-sa --display-name "bigquery sa for crypto-ethereum project"
```
6. Create JSON access key for the service account
```
gcloud iam service-accounts keys create ~/key.json --iam-account python-bigquery-sa@${PROJECT_ID}.iam.gserviceaccount.com
```
7. Add IAM roles to access BigQuery datasets
```
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member "serviceAccount:python-bigquery-sa@${PROJECT_ID}.iam.gserviceaccount.com" --role "roles/bigquery.user"
```
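Optionally, sanity-check the key and roles before wiring them into Airflow. This is a sketch: `bq` ships with the Cloud SDK, and activating the key switches your active gcloud credentials.

```
# Authenticate as the service account using the key from step 6
gcloud auth activate-service-account --key-file ~/key.json
# A trivial query against the public dataset confirms BigQuery access
bq query --use_legacy_sql=false \
    'SELECT COUNT(*) FROM `bigquery-public-data.crypto_ethereum.tokens`'
```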
### Install required programs
1. Install [Docker Engine](https://docs.docker.com/engine/install/) and check if it is running
```
docker --version
```
2. Install [Minikube](https://k8s-docs.netlify.app/en/docs/tasks/tools/install-minikube/) and check if it's working
```
minikube status
```
3. Install [Terraform](https://developer.hashicorp.com/terraform/install) and check if it's available
```
terraform -version
```
4. Install [Helm](https://helm.sh/docs/intro/install/) and check if it's available
```
helm version
```
### Local environment setup
1. Clone repo and install python dependencies within virtual environment
```
git clone https://github.com/acardosolima/crypto-ethereum-tokens.git
cd crypto-ethereum-tokens/
python -m venv .env
source .env/Scripts/activate  # Linux/macOS: source .env/bin/activate
pip install -r requirements.txt
```
2. Execute *setup.sh* to initialize both the Postgres and Airflow deployments.
```
source setup.sh
```
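Afterwards, a couple of sanity checks can confirm the deployments came up. Resource names and namespaces depend on what *setup.sh* actually deploys, so treat these as a sketch:

```
kubectl get pods -A      # Airflow and Postgres pods should be Running
minikube service list    # lists exposed services and their URLs
```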
### Configure airflow connections
In the Airflow UI, go to **Admin > Connections > +** and add two connection configs, one for BigQuery and another for Postgres.

#### BigQuery
- Connection Id: bigquery_credentials
- Connection Type: Google Bigquery
- Keyfile JSON: paste the content of the JSON secret key generated in [step 6](#google-cloud-setup) of the Google Cloud setup

#### PostgreSQL
- Connection Id: postgres_credentials
- Connection Type: Postgres
- Host: paste the IP address printed at the end of the *setup.sh* script
- Database: crypto_ethereum
- Login: postgres
- Password: paste the value of the $POSTGRES_PASSWORD environment variable
- Port: 5432
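Alternatively, the Postgres connection can be created with the Airflow CLI. This is a sketch: the connection id must match what the DAG expects, `$POSTGRES_IP` is a hypothetical variable standing in for the address printed by *setup.sh*, and the BigQuery connection is easiest to create in the UI because the keyfile JSON goes into provider-specific extra fields.

```
airflow connections add postgres_credentials \
    --conn-type postgres \
    --conn-host "$POSTGRES_IP" \
    --conn-schema crypto_ethereum \
    --conn-login postgres \
    --conn-password "$POSTGRES_PASSWORD" \
    --conn-port 5432
```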
### Validation process
#### Pre-processing
The table had no rows before the pipeline ran.
![1](https://github.com/user-attachments/assets/26a0576b-eddf-4035-a10d-46f18e141d76)

#### Single execution
A single DAG run inserted data for one day.
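The run shown below can also be started from the CLI. The DAG id here is a placeholder; use the id defined in this repo's DAG file.

```
airflow dags trigger crypto_ethereum_tokens   # DAG id is a placeholder
```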
![2](https://github.com/user-attachments/assets/f0ec01e8-df14-4827-afae-47846a8b5776)
![3](https://github.com/user-attachments/assets/876e4479-4244-448d-86b1-745463a5ff48)

#### Backfill
Executed the backfill process for the last 7 days.
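For reference, the equivalent CLI invocation, with placeholder dates and DAG id:

```
airflow dags backfill \
    --start-date 2024-07-29 \
    --end-date 2024-08-04 \
    crypto_ethereum_tokens   # DAG id is a placeholder
```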
![4](https://github.com/user-attachments/assets/860f866f-fcc4-4e53-8966-d65ac7441dab)
![6](https://github.com/user-attachments/assets/84f1744c-8552-43a0-b9df-8b98252b58dd)

## Contributors
- [Adriano C. Lima](mailto:[email protected])