{"id":13631891,"url":"https://github.com/blockchain-etl/bitcoin-etl","last_synced_at":"2025-04-10T01:06:18.944Z","repository":{"id":41900264,"uuid":"148597387","full_name":"blockchain-etl/bitcoin-etl","owner":"blockchain-etl","description":"ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ","archived":false,"fork":false,"pushed_at":"2025-03-04T12:13:59.000Z","size":356,"stargazers_count":421,"open_issues_count":27,"forks_count":126,"subscribers_count":29,"default_branch":"master","last_synced_at":"2025-04-10T01:06:13.683Z","etag":null,"topics":["apache-beam","bitcoin","bitcoincash","blockchain-analytics","crypto","cryptocurrency","dash","data-analytics","data-engineering","dogecoin","etl","gcp","google-dataflow","google-pubsub","litecoin","on-chain-analysis","web3","zcash"],"latest_commit_sha":null,"homepage":"https://twitter.com/BlockchainETL","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/blockchain-etl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-13T07:13:25.000Z","updated_at":"2025-03-18T20:19:19.000Z","dependencies_parsed_at":"2024-01-14T06:53:40.887Z","dependency_job_id":"eab33cb5-c83f-4e93-b659-75487e0d903a","html_url":"https://github.com/blockchain-etl/bitcoin-etl","commit_stats":{"total_commits":193,"total_committers":9,"mean_commits":"21.444444444444443","dds":"0.17098445595854928","last_synced_commit":"b868c93cd030c086cbd469f71bfd94799094cbf6"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,
"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blockchain-etl%2Fbitcoin-etl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blockchain-etl%2Fbitcoin-etl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blockchain-etl%2Fbitcoin-etl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blockchain-etl%2Fbitcoin-etl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/blockchain-etl","download_url":"https://codeload.github.com/blockchain-etl/bitcoin-etl/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248137888,"owners_count":21053775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-beam","bitcoin","bitcoincash","blockchain-analytics","crypto","cryptocurrency","dash","data-analytics","data-engineering","dogecoin","etl","gcp","google-dataflow","google-pubsub","litecoin","on-chain-analysis","web3","zcash"],"created_at":"2024-08-01T22:02:42.934Z","updated_at":"2025-04-10T01:06:18.917Z","avatar_url":"https://github.com/blockchain-etl.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Bitcoin ETL\n\n[![Join the chat at https://gitter.im/ethereum-eth](https://badges.gitter.im/ethereum-etl.svg)](https://gitter.im/ethereum-etl/Lobby?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n[![Build 
Status](https://travis-ci.com/blockchain-etl/bitcoin-etl.png)](https://travis-ci.com/blockchain-etl/bitcoin-etl)\n[![Join Telegram Group](https://img.shields.io/badge/telegram-join%20chat-blue.svg)](https://t.me/joinchat/GsMpbA3mv1OJ6YMp3T5ORQ)\n\nInstall Bitcoin ETL:\n\n```bash\npip install bitcoin-etl\n```\n\nExport blocks and transactions ([Schema](#blocksjson), [Reference](#export_blocks_and_transactions)):\n\n```bash\n\u003e bitcoinetl export_blocks_and_transactions --start-block 0 --end-block 500000 \\\n--provider-uri http://user:pass@localhost:8332 --chain bitcoin \\\n --blocks-output blocks.json --transactions-output transactions.json\n```\n\nSupported chains:\n- bitcoin\n- bitcoin_cash\n- bitcoin_gold\n- dogecoin\n- litecoin\n- dash\n- zcash\n\nStream blockchain data continuously to the console ([Reference](#stream)):\n\n```bash\n\u003e pip install bitcoin-etl[streaming]\n\u003e bitcoinetl stream -p http://user:pass@localhost:8332 --start-block 500000\n```\n\nStream blockchain data continuously to Google Pub/Sub ([Reference](#stream)):\n\n```bash\n\u003e export GOOGLE_APPLICATION_CREDENTIALS=/path_to_credentials_file.json\n\u003e bitcoinetl stream -p http://user:pass@localhost:8332 --start-block 500000 --output projects/your-project/topics/crypto_bitcoin\n```\n\nFor the latest version, clone the repo and run:\n```bash\n\u003e pip install -e .[streaming]\n\u003e python bitcoinetl.py\n```\n\n## Table of Contents\n\n- [Bitcoin ETL](#bitcoin-etl)\n  - [Table of Contents](#table-of-contents)\n  - [Schema](#schema)\n    - [blocks.json](#blocksjson)\n    - [transactions.json](#transactionsjson)\n    - [transaction_input](#transaction_input)\n    - [transaction_output](#transaction_output)\n  - [Exporting the Blockchain](#exporting-the-blockchain)\n    - [Running in Docker](#running-in-docker)\n    - [Command Reference](#command-reference)\n      - [export_blocks_and_transactions](#export_blocks_and_transactions)\n      - [enrich_transactions](#enrich_transactions)\n    
  - [get_block_range_for_date](#get_block_range_for_date)\n      - [export_all](#export_all)\n      - [stream](#stream)\n    - [Running Tests](#running-tests)\n    - [Running Tox Tests](#running-tox-tests)\n    - [Public Datasets in BigQuery](#public-datasets-in-bigquery)\n\n\n## Schema\n\n### blocks.json\n\nField               | Type            |\n--------------------|-----------------|\nhash                | hex_string      |\nsize                | bigint          |\nstripped_size       | bigint          |\nweight              | bigint          |\nnumber              | bigint          |\nversion             | bigint          |\nmerkle_root         | hex_string      |\ntimestamp           | bigint          |\nnonce               | hex_string      |\nbits                | hex_string      |\ncoinbase_param      | hex_string      |\ntransaction_count   | bigint          |\n\n### transactions.json\n\nField                   | Type                  |\n------------------------|-----------------------|\nhash                    | hex_string            |\nsize                    | bigint                |\nvirtual_size            | bigint                |\nversion                 | bigint                |\nlock_time               | bigint                |\nblock_number            | bigint                |\nblock_hash              | hex_string            |\nblock_timestamp         | bigint                |\nis_coinbase             | boolean               |\nindex                   | bigint                |\ninputs                  | []transaction_input   |\noutputs                 | []transaction_output  |\ninput_count             | bigint                |\noutput_count            | bigint                |\ninput_value             | bigint                |\noutput_value            | bigint                |\nfee                     | bigint                |\n\n### transaction_input\n\nField                   | Type                  
|\n------------------------|-----------------------|\nindex                   | bigint                |\nspent_transaction_hash  | hex_string            |\nspent_output_index      | bigint                |\nscript_asm              | string                |\nscript_hex              | hex_string            |\nsequence                | bigint                |\nrequired_signatures     | bigint                |\ntype                    | string                |\naddresses               | []string              |\nvalue                   | bigint                |\n\n### transaction_output\n\nField                   | Type                  |\n------------------------|-----------------------|\nindex                   | bigint                |\nscript_asm              | string                |\nscript_hex              | hex_string            |\nrequired_signatures     | bigint                |\ntype                    | string                |\naddresses               | []string              |\nvalue                   | bigint                |\n\n\nYou can find column descriptions in [schemas](https://github.com/blockchain-etl/bitcoin-etl-airflow/tree/master/dags/resources/stages/enrich/schemas).\n\n**Notes**:\n\n1. Output values returned by the Dogecoin API lost precision in clients prior to version 1.14,\ndue to https://github.com/dogecoin/dogecoin/issues/1558.\nExplorers that used older versions to export the data may show incorrect address balances and transaction amounts.\n\n1. For Zcash, the `vjoinsplit` and `valueBalance` fields are converted to inputs and outputs with type `shielded`\n(see https://zcash-rpc.github.io/getrawtransaction.html, https://zcash.readthedocs.io/en/latest/rtd_pages/zips/zip-0243.html).\n\n\n## Exporting the Blockchain\n\n1. Install Python 3.5.3+: https://www.python.org/downloads/\n\n1. Install a Bitcoin node: https://hackernoon.com/a-complete-beginners-guide-to-installing-a-bitcoin-full-node-on-linux-2018-edition-cb8e384479ea\n\n1. 
Start the Bitcoin node.\nMake sure it has downloaded the blocks you need by running `$ bitcoin-cli getblockchaininfo` in the terminal.\nYou can export any block below the reported `blocks` value; there is no need to wait for the full sync.\n\n1. Install Bitcoin ETL:\n\n    ```bash\n    \u003e pip install bitcoin-etl\n    ```\n\n1. Export blocks \u0026 transactions:\n\n    ```bash\n    \u003e bitcoinetl export_all --start 0 --end 499999 \\\n    --partition-batch-size 100 \\\n    --provider-uri http://user:pass@localhost:8332 --chain bitcoin\n    ```\n\n    The result will be in the `output` subdirectory, partitioned in Hive style:\n\n    ```bash\n    output/blocks/start_block=00000000/end_block=00000099/blocks_00000000_00000099.csv\n    output/blocks/start_block=00000100/end_block=00000199/blocks_00000100_00000199.csv\n    ...\n    output/transactions/start_block=00000000/end_block=00000099/transactions_00000000_00000099.csv\n    ...\n    ```\n\n    If the `bitcoinetl` command is not available in your PATH, use `python -m bitcoinetl` instead.\n\n### Running in Docker\n\n1. Install Docker: https://docs.docker.com/install/\n\n1. Build a Docker image:\n    ```bash\n    \u003e docker build --platform linux/x86_64 -t bitcoin-etl:latest .\n    \u003e docker image ls\n    ```\n\n1. Run a container from the image:\n    ```bash\n    \u003e docker run --platform linux/x86_64 -v $HOME/output:/bitcoin-etl/output bitcoin-etl:latest export_blocks_and_transactions --start-block 0 --end-block 500000 \\\n        --provider-uri http://user:pass@localhost:8332 --blocks-output output/blocks.json --transactions-output output/transactions.json\n    ```\n\n1. 
Run streaming to the console or Pub/Sub:\n    ```bash\n    \u003e docker build --platform linux/x86_64 -t bitcoin-etl:latest-streaming -f Dockerfile_with_streaming .\n    \u003e echo \"Stream to console\"\n    \u003e docker run --platform linux/x86_64 bitcoin-etl:latest-streaming stream -p http://user:pass@localhost:8332 --start-block 500000\n    \u003e echo \"Stream to Pub/Sub\"\n    \u003e docker run --platform linux/x86_64 -v /path_to_credentials_file/:/bitcoin-etl/ --env GOOGLE_APPLICATION_CREDENTIALS=/bitcoin-etl/credentials_file.json bitcoin-etl:latest-streaming stream -p http://user:pass@localhost:8332 --start-block 500000 --output projects/your-project/topics/crypto_bitcoin\n    ```\n\n1. Refer to https://github.com/blockchain-etl/bitcoin-etl-streaming for deploying the streaming app to\nGoogle Kubernetes Engine.\n\n### Command Reference\n\n- [export_blocks_and_transactions](#export_blocks_and_transactions)\n- [enrich_transactions](#enrich_transactions)\n- [get_block_range_for_date](#get_block_range_for_date)\n- [export_all](#export_all)\n- [stream](#stream)\n\nAll commands accept the `-h` parameter for help, e.g.:\n\n```bash\n\u003e bitcoinetl export_blocks_and_transactions --help\nUsage: bitcoinetl.py export_blocks_and_transactions [OPTIONS]\n\n  Export blocks and transactions.\n\nOptions:\n  -s, --start-block INTEGER   Start block\n  -e, --end-block INTEGER     End block  [required]\n  -b, --batch-size INTEGER    The number of blocks to export at a time.\n  -p, --provider-uri TEXT     The URI of the remote Bitcoin node\n  -w, --max-workers INTEGER   The maximum number of workers.\n  --blocks-output TEXT        The output file for blocks. If not provided\n                              blocks will not be exported. Use \"-\" for stdout\n  --transactions-output TEXT  The output file for transactions. If not\n                              provided transactions will not be exported. 
Use\n                              \"-\" for stdout\n  --help                      Show this message and exit.\n```\n\nFor the `--blocks-output` and `--transactions-output` parameters, the supported format is JSON. The format is inferred from the output file name.\n\n#### export_blocks_and_transactions\n\n```bash\n\u003e bitcoinetl export_blocks_and_transactions --start-block 0 --end-block 500000 \\\n  --provider-uri http://user:pass@localhost:8332 \\\n  --blocks-output blocks.json --transactions-output transactions.json\n```\n\nOmit the `--blocks-output` or `--transactions-output` option if you want to export only transactions or only blocks, respectively.\n\nYou can tune `--batch-size` and `--max-workers` for performance.\n\nNote that the `required_signatures`, `type`, `addresses`, and `value` fields will be empty in transaction inputs.\nUse [enrich_transactions](#enrich_transactions) to populate those fields.\n\n#### enrich_transactions\n\nThis command requires the Bitcoin daemon to be running with the `txindex=1` option.\n\n```bash\n\u003e bitcoinetl enrich_transactions \\\n  --provider-uri http://user:pass@localhost:8332 \\\n  --transactions-input transactions.json --transactions-output enriched_transactions.json\n```\n\nYou can tune `--batch-size` and `--max-workers` for performance.\n\n#### get_block_range_for_date\n\n```bash\n\u003e bitcoinetl get_block_range_for_date --provider-uri http://user:pass@localhost:8332 --date=2017-03-01\n```\n\nThis command is guaranteed to return a block range that covers all blocks with `block.time` on the specified\ndate. However, the returned block range may also contain blocks outside the specified date, because block times are not\nmonotonic (see https://twitter.com/EvgeMedvedev/status/1073844856009576448). 
You can filter\n`blocks.json`/`transactions.json` with the following command (for `transactions.json`, use `item['block_timestamp']` instead of `item['timestamp']`):\n\n```bash\n\u003e bitcoinetl filter_items -i blocks.json -o blocks_filtered.json \\\n-p \"datetime.datetime.fromtimestamp(item['timestamp']).astimezone(datetime.timezone.utc).strftime('%Y-%m-%d') == '2017-03-01'\"\n```\n\n#### export_all\n\n```bash\n\u003e bitcoinetl export_all --provider-uri http://user:pass@localhost:8332 --start 2018-01-01 --end 2018-01-02\n```\n\nYou can tune `--export-batch-size` and `--max-workers` for performance.\n\n#### stream\n\n```bash\n\u003e bitcoinetl stream --provider-uri http://user:pass@localhost:8332 --start-block 500000\n```\n\n- This command outputs blocks and transactions to the console by default.\n- Use the `--output` option to specify the Google Pub/Sub topic to publish blockchain data to,\ne.g. `projects/your-project/topics/crypto_bitcoin`. Blocks and transactions will be pushed to the\n`projects/your-project/topics/crypto_bitcoin.blocks` and `projects/your-project/topics/crypto_bitcoin.transactions`\ntopics.\n- The command periodically saves its state, the last synced block number, to the `last_synced_block.txt` file.\n- Specify either the `--start-block` or the `--last-synced-block-file` option. `--last-synced-block-file` should point to the\nfile in which the block number to start streaming from is saved.\n- Use the `--lag` option to specify how many blocks to lag behind the head of the blockchain. It's the simplest way to\nhandle chain reorganizations: they are less likely the further a block is from the head.\n- Use the `--chain` option to specify the type of the chain, e.g. 
`bitcoin`, `litecoin`, `dash`, `zcash`, etc.\n- You can tune `--period-seconds`, `--batch-size`, and `--max-workers` for performance.\n\n\n### Running Tests\n\n```bash\n\u003e pip install -e .[dev]\n\u003e echo \"The following variables are optional\"\n\u003e export BITCOINETL_BITCOIN_PROVIDER_URI=http://user:pass@localhost:8332\n\u003e export BITCOINETL_LITECOIN_PROVIDER_URI=http://user:pass@localhost:8331\n\u003e export BITCOINETL_DOGECOIN_PROVIDER_URI=http://user:pass@localhost:8330\n\u003e export BITCOINETL_BITCOIN_CASH_PROVIDER_URI=http://user:pass@localhost:8329\n\u003e export BITCOINETL_DASH_PROVIDER_URI=http://user:pass@localhost:8328\n\u003e export BITCOINETL_ZCASH_PROVIDER_URI=http://user:pass@localhost:8327\n\u003e pytest -vv\n```\n\n### Running Tox Tests\n\n```bash\n\u003e pip install tox\n\u003e tox\n```\n\n### Public Datasets in BigQuery\n\nhttps://cloud.google.com/blog/products/data-analytics/introducing-six-new-cryptocurrencies-in-bigquery-public-datasets-and-how-to-analyze-them\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblockchain-etl%2Fbitcoin-etl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblockchain-etl%2Fbitcoin-etl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblockchain-etl%2Fbitcoin-etl/lists"}