https://github.com/blockchain-etl/blockchain-etl-streaming
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
- Host: GitHub
- URL: https://github.com/blockchain-etl/blockchain-etl-streaming
- Owner: blockchain-etl
- License: mit
- Created: 2018-09-17T06:48:27.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2021-12-09T12:37:10.000Z (over 4 years ago)
- Last Synced: 2025-06-24T05:42:40.372Z (12 months ago)
- Topics: apache-beam, bitcoin, blockchain, blockchain-analytics, crypto, cryptocurrency, data-analytics, data-engineering, ethereum, etl, gcp, google-bigquery, google-cloud-platform, google-dataflow, google-pubsub, on-chain-analysis, real-time, real-time-analytics, stream-processing, web3
- Language: Python
- Homepage: https://medium.com/google-cloud/live-ethereum-and-bitcoin-data-in-google-bigquery-and-pub-sub-765b71cd57b5
- Size: 64.5 KB
- Stars: 80
- Watchers: 8
- Forks: 22
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# Blockchain ETL Streaming
Streams the following Ethereum entities to Pub/Sub or Postgres using
[ethereum-etl stream](https://github.com/blockchain-etl/ethereum-etl/tree/develop/docs/commands.md#stream):
- blocks
- transactions
- logs
- token_transfers
- traces
- contracts
- tokens
Streams blocks and transactions to Pub/Sub using
[bitcoin-etl stream](https://github.com/blockchain-etl/bitcoin-etl#stream). Supported chains:
- bitcoin
- bitcoin_cash
- dogecoin
- litecoin
- dash
- zcash
## Deployment Instructions
1. Create a cluster:
```bash
gcloud container clusters create ethereum-etl-streaming \
--zone us-central1-a \
--num-nodes 1 \
--disk-size 10GB \
--machine-type custom-2-4096 \
--network default \
--subnetwork default \
--scopes pubsub,storage-rw,logging-write,monitoring-write,service-management,service-control,trace
```
2. Get `kubectl` credentials:
```bash
gcloud container clusters get-credentials ethereum-etl-streaming \
--zone us-central1-a
```
3. Create the following Pub/Sub topics (use `create_pubsub_topics_ethereum.sh`). Skip this step if you are streaming to Postgres.
- "crypto_ethereum.blocks"
- "crypto_ethereum.transactions"
- "crypto_ethereum.token_transfers"
- "crypto_ethereum.logs"
- "crypto_ethereum.traces"
- "crypto_ethereum.contracts"
- "crypto_ethereum.tokens"
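The repository's `create_pubsub_topics_ethereum.sh` is the intended tool for this step, and its contents are not reproduced here. As a rough stand-in, a minimal sketch that creates the seven topics above with `gcloud pubsub topics create` could look like this. The `DRY_RUN` switch and the `run` helper are additions of this sketch: commands are only printed unless `DRY_RUN=0` is set, and the topics land in the currently active gcloud project.

```bash
#!/usr/bin/env bash
# Sketch: create one Pub/Sub topic per streamed Ethereum entity.
set -euo pipefail

# Print commands instead of running them unless DRY_RUN=0 (assumption of this sketch).
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Entity names from the topic list above; each becomes crypto_ethereum.<entity>.
ENTITIES=(blocks transactions token_transfers logs traces contracts tokens)

create_topics() {
  local entity
  for entity in "${ENTITIES[@]}"; do
    run gcloud pubsub topics create "crypto_ethereum.${entity}"
  done
}

create_topics
```

Review the printed commands first, then rerun with `DRY_RUN=0` to actually create the topics.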
4. Create a GCS bucket. Upload a text file containing the block number you want to start streaming from to
`gs:///ethereum-etl/streaming/last_synced_block.txt`.
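A minimal sketch of this step with `gcloud storage`, assuming a bucket in `us-central1` to match the cluster zone. `BUCKET` and `START_BLOCK` are hypothetical placeholders you must replace, and the `DRY_RUN` switch (print-only by default) is an addition of this sketch, not part of the repo.

```bash
#!/usr/bin/env bash
# Sketch: create the bucket and seed last_synced_block.txt with a single block number.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Hypothetical placeholders: set your own bucket name and starting block.
BUCKET="${BUCKET:-my-etl-bucket}"
START_BLOCK="${START_BLOCK:-14000000}"

seed_last_synced_block() {
  local seed_file
  seed_file="$(mktemp)"
  # The file holds nothing but the block number.
  printf '%s\n' "$START_BLOCK" > "$seed_file"
  # Fails if the bucket already exists; drop this line in that case.
  run gcloud storage buckets create "gs://${BUCKET}" --location=us-central1
  run gcloud storage cp "$seed_file" "gs://${BUCKET}/ethereum-etl/streaming/last_synced_block.txt"
  rm -f "$seed_file"
}

seed_last_synced_block
```

The same bucket name goes into the values files in step 7.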
5. Create an `ethereum-etl-app` service account with the following roles:
- Pub/Sub Editor
- Storage Object Admin
- Cloud SQL Client
Download its JSON key and create a Kubernetes secret from it:
```bash
kubectl create secret generic streaming-app-key --from-file=key.json=$HOME/Downloads/key.json -n eth
```
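The console steps above can also be scripted. The following is a minimal sketch using `gcloud iam` and `kubectl`, assuming the predefined role IDs `roles/pubsub.editor`, `roles/storage.objectAdmin` and `roles/cloudsql.client` for the three roles listed. `PROJECT_ID` is a placeholder, the key is written to `./key.json` rather than `~/Downloads`, and the print-only `DRY_RUN` switch is an addition of this sketch. Because Kubernetes secrets are namespace-scoped, the `eth` namespace has to exist before the secret is created in it.

```bash
#!/usr/bin/env bash
# Sketch: service account, role bindings, JSON key and Kubernetes secret for step 5.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Placeholder project ID (assumption): set PROJECT_ID to your own project.
PROJECT_ID="${PROJECT_ID:-my-project}"
SA_NAME="ethereum-etl-app"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
# Role IDs for Pub/Sub Editor, Storage Object Admin and Cloud SQL Client.
ROLES=(roles/pubsub.editor roles/storage.objectAdmin roles/cloudsql.client)

setup_service_account() {
  local role
  run gcloud iam service-accounts create "$SA_NAME" --project="$PROJECT_ID"
  for role in "${ROLES[@]}"; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:${SA_EMAIL}" --role="$role"
  done
  # Writes the JSON key to ./key.json.
  run gcloud iam service-accounts keys create key.json --iam-account="$SA_EMAIL"
  # Create the namespace only if it is missing (read-only check, runs even in dry-run mode).
  kubectl get namespace eth >/dev/null 2>&1 || run kubectl create namespace eth
  run kubectl create secret generic streaming-app-key --from-file=key.json=key.json -n eth
}

setup_service_account
```

If you also deploy the Bitcoin releases into the `btc` namespace, the chart's pods there can only read a secret created in `btc`, so you will likely need a second copy of the secret.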
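6. Install [Helm](https://github.com/helm/helm#install). These steps assume Helm 2 with Tiller: `helm init` and the `helm install --name` syntax used below were removed in Helm 3, which is what Homebrew's `helm` formula installs today.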
```bash
brew install helm
helm init
bash patch-tiller.sh
```
7. Copy the [example values](example_values) directory to `values` and adjust every file, at minimum with your bucket name and project ID.
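A small sketch of this step. The `grep` pattern is a guess that the settings to edit have keys or values mentioning a bucket or project; the actual key names in `example_values` are not shown here, so check each file rather than relying on the match list. The print-only `DRY_RUN` switch is an addition of this sketch.

```bash
#!/usr/bin/env bash
# Sketch: copy the example values and list lines that probably need editing.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

prepare_values() {
  # Run once: if values/ already exists, cp -r nests example_values inside it.
  run cp -r example_values values
  # Assumption: bucket and project settings contain these words; no match is not an error.
  run grep -rniE 'bucket|project' values || true
}

prepare_values
```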
8. Install the ETL apps with Helm, using the chart from this repo and the values adjusted in the previous step, for example:
```bash
helm install --name btc --namespace btc charts/blockchain-etl-streaming --values values/bitcoin/bitcoin/values.yaml
helm install --name bch --namespace btc charts/blockchain-etl-streaming --values values/bitcoin/bitcoin_cash/values.yaml
helm install --name dash --namespace btc charts/blockchain-etl-streaming --values values/bitcoin/dash/values.yaml
helm install --name dogecoin --namespace btc charts/blockchain-etl-streaming --values values/bitcoin/dogecoin/values.yaml
helm install --name litecoin --namespace btc charts/blockchain-etl-streaming --values values/bitcoin/litecoin/values.yaml
helm install --name zcash --namespace btc charts/blockchain-etl-streaming --values values/bitcoin/zcash/values.yaml
helm install --name eth-blocks --namespace eth charts/blockchain-etl-streaming \
--values values/ethereum/values.yaml --values values/ethereum/block_data/values.yaml
helm install --name eth-traces --namespace eth charts/blockchain-etl-streaming \
--values values/ethereum/values.yaml --values values/ethereum/trace_data/values.yaml
```
Ethereum block data and trace data are streamed by separate releases (`eth-blocks` and `eth-traces`) for higher reliability, so a problem in one stream does not stop the other.
To stream to Postgres:
```bash
helm install --name eth-postgres --namespace eth charts/blockchain-etl-streaming \
--values values/ethereum/values-postgres.yaml
```
Refer to https://github.com/blockchain-etl/ethereum-etl-postgres for table schema and initial data load.
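If you only have Helm 3, the `helm install --name` commands above will not run as written. The following is a hypothetical, untested translation of the same releases: Helm 3 takes the release name as the first positional argument and no longer creates missing namespaces by itself, hence `--create-namespace`. Whether this chart installs cleanly under Helm 3 is an assumption, and the print-only `DRY_RUN` switch is an addition of this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical Helm 3 equivalent of the step 8 installs (not verified against this chart).
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

CHART=charts/blockchain-etl-streaming
# release:directory pairs under values/bitcoin/, matching the Helm 2 commands above.
BTC_RELEASES=(btc:bitcoin bch:bitcoin_cash dash:dash dogecoin:dogecoin litecoin:litecoin zcash:zcash)

install_all() {
  local pair release chain
  for pair in "${BTC_RELEASES[@]}"; do
    release="${pair%%:*}"   # text before the colon
    chain="${pair#*:}"      # text after the colon
    run helm install "$release" "$CHART" --namespace btc --create-namespace \
      --values "values/bitcoin/${chain}/values.yaml"
  done
  # Block and trace data stay in separate releases, as above.
  run helm install eth-blocks "$CHART" --namespace eth --create-namespace \
    --values values/ethereum/values.yaml --values values/ethereum/block_data/values.yaml
  run helm install eth-traces "$CHART" --namespace eth --create-namespace \
    --values values/ethereum/values.yaml --values values/ethereum/trace_data/values.yaml
}

install_all
```

The `eth-postgres` release translates the same way: `helm install eth-postgres charts/blockchain-etl-streaming --namespace eth --values values/ethereum/values-postgres.yaml`.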
9. Use the `describe` command to troubleshoot, for example:
```bash
kubectl describe pods -n btc
kubectl describe node [NODE_NAME]
```
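Besides `describe`, pod status, recent events and container logs are usually the quickest way to see why a streamer is not progressing. A minimal sketch, where `NAMESPACE` defaults to `btc` and `POD` is a hypothetical placeholder to replace with a name from the `kubectl get pods` output. The print-only `DRY_RUN` switch is an addition of this sketch.

```bash
#!/usr/bin/env bash
# Sketch: inspect the streaming pods in one namespace.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

NAMESPACE="${NAMESPACE:-btc}"
# Hypothetical placeholder: copy a real pod name from `kubectl get pods`.
POD="${POD:-POD_NAME}"

inspect_streaming_pods() {
  run kubectl get pods -n "$NAMESPACE"
  # Events sorted by creation time, oldest first.
  run kubectl get events -n "$NAMESPACE" --sort-by=.metadata.creationTimestamp
  # Add -c CONTAINER if the pod runs several containers,
  # or --previous to read the log of a container that crashed and restarted.
  run kubectl logs "$POD" -n "$NAMESPACE" --tail=100
}

inspect_streaming_pods
```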
Refer to [blockchain-etl-dataflow](https://github.com/blockchain-etl/blockchain-etl-dataflow)
for connecting Pub/Sub to BigQuery.
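Before wiring up Dataflow, you can check that messages are actually reaching a topic. The sketch below attaches a throwaway subscription to `crypto_ethereum.blocks`, waits, pulls a few messages and deletes the subscription again. The subscription name and the 60-second wait are assumptions (a new subscription only receives messages published after it was created), and the print-only `DRY_RUN` switch is an addition of this sketch.

```bash
#!/usr/bin/env bash
# Sketch: peek at a few messages on one of the streaming topics.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

TOPIC="${TOPIC:-crypto_ethereum.blocks}"
# Hypothetical throwaway subscription used only for this check.
SUBSCRIPTION="${SUBSCRIPTION:-${TOPIC}.peek}"

peek_topic() {
  run gcloud pubsub subscriptions create "$SUBSCRIPTION" --topic="$TOPIC"
  # Give the streamer time to publish at least one new message.
  run sleep 60
  run gcloud pubsub subscriptions pull "$SUBSCRIPTION" --limit=5 --auto-ack
  run gcloud pubsub subscriptions delete "$SUBSCRIPTION"
}

peek_topic
```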