https://github.com/blockchain-etl/blockchain-etl-architecture
Blockchain ETL Architecture
apache-beam blockchain blockchain-analytics crypto cryptocurrency data-analytics data-engineering ethereum etl gcp gke google-bigquery google-cloud google-cloud-platform google-container-engine google-dataflow google-pubsub kubernetes on-chain-analysis real-time-analytics
Blockchain ETL Architecture
- Host: GitHub
- URL: https://github.com/blockchain-etl/blockchain-etl-architecture
- Owner: blockchain-etl
- License: mit
- Created: 2020-04-09T13:06:44.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2022-10-10T10:47:32.000Z (over 2 years ago)
- Last Synced: 2024-08-01T13:18:10.335Z (6 months ago)
- Topics: apache-beam, blockchain, blockchain-analytics, crypto, cryptocurrency, data-analytics, data-engineering, ethereum, etl, gcp, gke, google-bigquery, google-cloud, google-cloud-platform, google-container-engine, google-dataflow, google-pubsub, kubernetes, on-chain-analysis, real-time-analytics
- Size: 101 KB
- Stars: 43
- Watchers: 5
- Forks: 15
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Blockchain ETL Architecture
![blockchain_etl_architecture.svg](./assets/blockchain_etl_architecture.svg)
[Google Slides version](https://docs.google.com/presentation/d/1ZMTpj_1YKBxSBwvh2Y_0P-GRkHO-KR1mXZi9vyJWvg8/edit?usp=sharing)
1. The nodes are deployed with Terraform and run in Kubernetes.
Refer to these for more details:
- Template repository for deploying Terraform configurations: https://github.com/blockchain-etl/blockchain-terraform-deployment
- Terraform configuration files for running blockchain nodes: https://github.com/blockchain-etl/blockchain-terraform
- Kubernetes manifests for running blockchain nodes: https://github.com/blockchain-etl/blockchain-kubernetes
2. The blockchain data is polled periodically from the nodes and pushed to Google Pub/Sub.
Refer to these for more details:
- Article explaining how to subscribe to public blockchain data in Pub/Sub:
https://medium.com/google-cloud/live-ethereum-and-bitcoin-data-in-google-bigquery-and-pub-sub-765b71cd57b5
- Streaming blockchain data to Google Pub/Sub or Postgres in Kubernetes:
https://github.com/blockchain-etl/blockchain-etl-streaming
- CLI tools for polling blockchain data from nodes:
https://github.com/blockchain-etl/ethereum-etl,
https://github.com/blockchain-etl/bitcoin-etl,
https://github.com/blockchain-etl/eos-etl.
3. Airflow DAGs export and load blockchain data to BigQuery daily.
Refer to these for more details:
- Article explaining how the DAGs work:
https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-how-we-built-dataset.
- Airflow DAGs for exporting, loading, and parsing blockchain data:
https://github.com/blockchain-etl/ethereum-etl-airflow,
https://github.com/blockchain-etl/bitcoin-etl-airflow,
https://github.com/blockchain-etl/eos-etl-airflow.
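The daily batch step can be illustrated with a minimal sketch. It assumes that block timestamps are sorted in ascending order (block `i` has timestamp `block_timestamps[i]`) and that each day's export is loaded into its own date partition. The function names, the in-memory timestamp list, and the `YYYYMMDD` partition label are illustrative assumptions, not code taken from the DAG repositories linked above.

```python
from bisect import bisect_left
from datetime import date, datetime, time, timedelta, timezone


def block_range_for_date(block_timestamps, day: date):
    """Return (first, last) block numbers mined on `day` (UTC), or None.

    Assumption: block_timestamps[i] is the Unix timestamp of block i and the
    list is sorted in ascending order.
    """
    start = datetime.combine(day, time.min, tzinfo=timezone.utc).timestamp()
    end = datetime.combine(
        day + timedelta(days=1), time.min, tzinfo=timezone.utc
    ).timestamp()
    first = bisect_left(block_timestamps, start)   # first block at or after midnight
    last = bisect_left(block_timestamps, end) - 1  # last block before the next midnight
    return (first, last) if first <= last else None


def daily_export_plan(block_timestamps, day: date):
    """Describe one daily run: which blocks to export and where to load them."""
    block_range = block_range_for_date(block_timestamps, day)
    if block_range is None:
        return None  # nothing to export for this day
    first, last = block_range
    return {
        "start_block": first,
        "end_block": last,
        "partition": day.strftime("%Y%m%d"),  # hypothetical partition label
    }
```

Because each run covers exactly one calendar day, rerunning it for the same date targets the same block range and the same partition, which is what makes a daily schedule safe to retry.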
4. Dataflow pipelines pull the blockchain data from Pub/Sub, transform it, and stream it to BigQuery.
Refer to these for more details:
- Dataflow pipelines for connecting Pub/Sub topics with BigQuery tables:
https://github.com/blockchain-etl/blockchain-etl-dataflow.
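Steps 2 and 4 together form the streaming path: poll the node, publish to a topic, pull from the topic, transform, and write rows. The sketch below shows that flow using only the standard library. `InMemoryTopic` is a hypothetical stand-in for a Pub/Sub topic, a plain Python list stands in for the BigQuery table, and the `lag` parameter and the row layout are illustrative assumptions. The block fields (`number`, `hash`, `timestamp`, `transactions`) and their hex encoding follow the Ethereum JSON-RPC block format.

```python
import json
from collections import deque


class InMemoryTopic:
    """Hypothetical stand-in for a Pub/Sub topic: a FIFO queue of byte payloads."""

    def __init__(self):
        self._messages = deque()

    def publish(self, data: bytes) -> None:
        self._messages.append(data)

    def pull(self, max_messages: int):
        batch = []
        while self._messages and len(batch) < max_messages:
            batch.append(self._messages.popleft())
        return batch


def poll_new_blocks(get_latest_block_number, get_block, last_synced, topic, lag=0):
    """Publish blocks in (last_synced, latest - lag] and return the new last_synced.

    `lag` (an assumption of this sketch) holds back the newest blocks so that
    blocks still likely to be reorganized are not published.
    """
    target = get_latest_block_number() - lag
    for number in range(last_synced + 1, target + 1):
        topic.publish(json.dumps(get_block(number)).encode("utf-8"))
    return max(last_synced, target)


def to_row(payload: bytes) -> dict:
    """Transform one message into a flat row, decoding hex quantities to ints."""
    block = json.loads(payload)
    return {
        "number": int(block["number"], 16),
        "hash": block["hash"],
        "timestamp": int(block["timestamp"], 16),
        "transaction_count": len(block["transactions"]),
    }


def stream_to_table(topic, table, batch_size=100):
    """Drain the topic into `table` in batches; return the number of rows written."""
    written = 0
    while True:
        batch = topic.pull(batch_size)
        if not batch:
            return written
        table.extend(to_row(message) for message in batch)
        written += len(batch)
```

Keeping `last_synced` as the only state means a restarted poller resumes from the next unpublished block, and calling `poll_new_blocks` again when no new blocks exist publishes nothing.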
5. Various applications of the public blockchain data:
- Blockchain streaming analytics: https://github.com/blockchain-etl/blockchain-streaming-analytics.
- Parsing Ethereum smart contract data: https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee.
- Blockchain analytics in BigQuery: https://github.com/blockchain-etl/awesome-bigquery-views.
- Clustering Ethereum addresses: https://towardsdatascience.com/clustering-ethereum-addresses-18aeca61919d.
- Twitter bot posting anomalous transactions: https://twitter.com/BlockchainETL.
- ...