{"id":22821843,"url":"https://github.com/blkchain/blkchain","last_synced_at":"2025-04-06T06:08:21.109Z","repository":{"id":54281705,"uuid":"102003644","full_name":"blkchain/blkchain","owner":"blkchain","description":"Fast import of the blockchain into PostgreSQL.","archived":false,"fork":false,"pushed_at":"2024-12-12T00:22:24.000Z","size":154,"stargazers_count":80,"open_issues_count":2,"forks_count":25,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-30T05:07:58.871Z","etag":null,"topics":["bitcoin","blockchain","golang","postgresql"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/blkchain.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-31T13:29:31.000Z","updated_at":"2025-03-20T14:05:46.000Z","dependencies_parsed_at":"2023-11-09T17:29:23.643Z","dependency_job_id":"d37e98a5-c6a1-4468-91bf-af62ed3b99f6","html_url":"https://github.com/blkchain/blkchain","commit_stats":{"total_commits":69,"total_committers":6,"mean_commits":11.5,"dds":0.1159420289855072,"last_synced_commit":"d606a31dc1fb494ec395042951a83dd3d5521bbf"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blkchain%2Fblkchain","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blkchain%2Fblkchain/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blkchain%2Fblkchain/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/blkchain%2Fblkchain/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/blkchain","download_url":"https://codeload.github.com/blkchain/blkchain/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247441051,"owners_count":20939239,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bitcoin","blockchain","golang","postgresql"],"created_at":"2024-12-12T16:09:10.019Z","updated_at":"2025-04-06T06:08:21.090Z","avatar_url":"https://github.com/blkchain.png","language":"Go","funding_links":[],"categories":["Compiled list"],"sub_categories":["plv8:"],"readme":"\n# Fast Bitcoin Blockchain Postgres Import\n\n## Introduction\n\nThis is Go code aimed at importing the blockchain into Postgres as\nfast as possible. Licensed under the Apache License, Version 2.0\n\nThis code can import the entire blockchain from a Bitcoin Core node\ninto a PostgreSQL database in 21 hours on fairly basic hardware.\n\nOnce the data is imported, the tool can append new blocks as they\nappear on the network by connecting to a Core node. You can then run\nthis primitive (but completely private) [Block Explorer](https://github.com/blkchain/blocks)\nagainst this database.\n\nThis project was started mostly for fun and learning and to find out\nwhether putting the blockchain into PostgreSQL is (1) possible and (2)\nuseful. We think \"yes\" on both, but you can draw your own conclusions.\n\n## Quick Overview\n\nThe source of the data is the [Bitcoin Core](https://bitcoin.org/en/download) store. 
You need a Core\ninstance to download the entire blockchain; the `cmd/import` tool\nwill then be able to read the data *directly* (not via RPC) by accessing\nthe LevelDb and the blocks files as well as the UTXO set. (The Core\nprogram cannot run while this happens, but this is only necessary during\nthe initial bulk import of the data.)\n\nYou should be able to build the tool with `go build cmd/import/import.go`, then run\nit as follows (Core should not be running):\n\n```sh\n# Warning - this may take many hours\n./import \\\n     -connstr \"host=192.168.X.X dbname=blocks sslmode=disable\" \\\n     -cache-size 100000000 \\\n     -blocks ~/.bitcoin/blocks\n```\n\nThis will read all blocks and upload them to Postgres. The block\ndescriptors are first read from the LevelDb block index, which contains\nfile names and offsets to actual block data. Using the block index\nlets us read blocks in order, which is essential for the correct\nsetting of tx_id in outputs. For every output we also query the\nLevelDb UTXO set so that we can set the `spent` column correctly.\n\nThe following log excerpt is from an import with the exact parameters\nas above, where the sending machine is a 2019 MacBook Pro with 32GB of\nRAM and the receiving (PostgreSQL server) machine is a 4-core i7 with\n16GB running PostgreSQL version 15; both machines have SSDs.\n\n``` txt\n2023/11/03 16:57:13 Setting open files rlimit of 256 to 1024.\n2023/11/03 16:57:13 Tables created without indexes, which are created at the very end.\n2023/11/03 16:57:13 Setting table parameters: autovacuum_enabled=false\n2023/11/03 16:57:13 Reading block headers from LevelDb (/Users/grisha/Library/Application Support/Bitcoin/blocks/index)...\n2023/11/03 16:57:14 Read 814062 block header entries.\n2023/11/03 16:57:14 Ignoring orphan block 000000000000000000025edbf5ea025e4af2674b318ba82206f70681d97ca162\n2023/11/03 16:57:15 Read 814062 block headers.\n2023/11/03 16:57:18 Height: 64469 Txs: 72672 Time: 2010-07-05 15:34:27 -0400 EDT Tx/s: 14534.391593 
KB/s: 3574.597873 Runtime: 5s\n2023/11/03 16:57:23 Height: 93635 Txs: 191499 Time: 2010-11-24 13:40:43 -0500 EST Tx/s: 19148.955229 KB/s: 5088.048924 Runtime: 10s\n2023/11/03 16:57:28 Height: 111512 Txs: 306357 Time: 2011-03-03 05:08:54 -0500 EST Tx/s: 20422.953787 KB/s: 5733.495734 Runtime: 15s\n2023/11/03 16:57:33 Height: 118521 Txs: 412626 Time: 2011-04-15 15:15:05 -0400 EDT Tx/s: 20630.450404 KB/s: 6071.899925 Runtime: 20s\n2023/11/03 16:57:38 Height: 124440 Txs: 512850 Time: 2011-05-16 17:18:22 -0400 EDT Tx/s: 20510.751291 KB/s: 6473.134698 Runtime: 25s\n2023/11/03 16:57:38 Txid cache hits: 645079 (100.00%) misses: 0 collisions: 0 dupes: 2 evictions: 364709 size: 148139 procmem: 434 MiB\n2023/11/03 16:57:43 Height: 128385 Txs: 613308 Time: 2011-06-03 12:27:53 -0400 EDT Tx/s: 20438.703244 KB/s: 6757.081103 Runtime: 30s\n... snip ...\n2023/11/04 03:27:07 WARNING: Txid cache collision at hash: 0f157800dba58b15ad242b3f7b48b4010079515e2c9e4702384cc701f05cebc0 existing id: 713414812 new id: 739931084 (prefix sz: 7).\n... 
snip ...\n2023/11/04 06:15:13 Height: 813296 Txs: 907828225 Time: 2023-10-22 01:13:19 -0400 EDT Tx/s: 18960.353894 KB/s: 10611.851878 Runtime: 13h18m0s\n2023/11/04 06:15:19 Height: 813321 Txs: 907874418 Time: 2023-10-22 06:24:08 -0400 EDT Tx/s: 18959.286326 KB/s: 10611.507696 Runtime: 13h18m6s\n2023/11/04 06:15:19 Txid cache hits: 2369583564 (99.91%) misses: 2042901 collisions: 1 dupes: 2 evictions: 777933798 size: 105238549 procmem: 16243 MiB\n2023/11/04 06:15:24 Height: 813339 Txs: 907900812 Time: 2023-10-22 09:00:01 -0400 EDT Tx/s: 18957.833769 KB/s: 10610.989629 Runtime: 13h18m11s\n2023/11/04 06:15:29 Closing channel, waiting for workers to finish...\n2023/11/04 06:15:29 Height: 813369 Txs: 907964339 Time: 2023-10-22 13:56:02 -0400 EDT Tx/s: 18956.918563 KB/s: 10610.608260 Runtime: 13h18m16s\n2023/11/04 06:15:30 Closed db channels, waiting for workers to finish...\n2023/11/04 06:15:30 Tx writer channel closed, committing transaction.\n2023/11/04 06:15:30 Block writer channel closed, commiting transaction.\n2023/11/04 06:15:30 TxIn writer channel closed, committing transaction.\n2023/11/04 06:15:30 TxOut writer channel closed, committing transaction.\n2023/11/04 06:15:30 TxOut writer done.\n2023/11/04 06:15:30 Block writer done.\n2023/11/04 06:15:30 TxIn writer done.\n2023/11/04 06:15:30 Tx writer done.\n2023/11/04 06:15:30 Workers finished.\n2023/11/04 06:15:30 Txid cache hits: 2369888696 (99.91%) misses: 2056367 collisions: 1 dupes: 2 evictions: 778046986 size: 105221498 procmem: 16243 MiB\n2023/11/04 06:15:30 The following txids collided:\n2023/11/04 06:15:30 Txid: 0f157800dba58b15ad242b3f7b48b4010079515e2c9e4702384cc701f05cebc0 prefix: c0eb5cf001c74c\n2023/11/04 06:15:30 Cleared the cache.\n2023/11/04 06:15:30 Creating indexes part 1, please be patient, this may take a long time...\n2023/11/04 06:15:30   Starting txins primary key...\n2023/11/04 07:17:54   ...done in 1h2m24.594s. 
Starting txs txid (hash) index...\n2023/11/04 07:41:41   ...done in 23m47.193s.\n2023/11/04 07:41:41 Running ANALYZE txins, _prevout_miss, txs to ensure the next step selects the optimal plan...\n2023/11/04 07:42:22 ...done in 40.348s. Fixing missing prevout_tx_id entries (if needed), this may take a long time..\n2023/11/04 07:42:22   max prevoutMiss id: 2056367 parallel: 8\n2023/11/04 07:42:22   processing range [1, 10001) of 2056367...\n2023/11/04 07:42:22   processing range [10001, 20001) of 2056367...\n... snip ...\n2023/11/04 07:49:32   processing range [2050001, 2060001) of 2056367...\n2023/11/04 07:49:39 ...done in 7m17.348s.\n2023/11/04 07:49:39 Creating indexes part 2, please be patient, this may take a long time...\n2023/11/04 07:49:39   Starting blocks primary key...\n2023/11/04 07:49:41   ...done in 1.89s. Starting blocks prevhash index...\n2023/11/04 07:49:42   ...done in 718ms. Starting blocks hash index...\n2023/11/04 07:49:42   ...done in 695ms. Starting blocks height index...\n2023/11/04 07:49:43   ...done in 450ms. Starting txs primary key...\n2023/11/04 07:59:06   ...done in 9m23.284s. Starting block_txs block_id, n primary key...\n2023/11/04 08:11:27   ...done in 12m20.978s. Starting block_txs tx_id index...\n2023/11/04 08:20:20   ...done in 8m53.257s. Creatng hash_type function...\n2023/11/04 08:20:20   ...done in 40ms. Starting txins (prevout_tx_id, prevout_tx_n) index...\n2023/11/04 09:02:06   ...done in 41m45.629s. Starting txouts primary key...\n2023/11/04 09:42:49   ...done in 40m42.816s. Starting txouts address prefix index...\n2023/11/04 11:12:41   ...done in 1h29m52.436s. Starting txins address prefix index...\n2023/11/04 14:32:54   ...done in 3h20m13.17s.\n2023/11/04 14:32:54 Creating constraints (if needed), please be patient, this may take a long time...\n2023/11/04 14:32:54   Starting block_txs block_id foreign key...\n2023/11/04 14:32:55   ...done in 173ms. 
Starting block_txs tx_id foreign key...\n2023/11/04 14:32:55   ...done in 7ms. Starting txins tx_id foreign key...\n2023/11/04 14:32:55   ...done in 6ms. Starting txouts tx_id foreign key...\n2023/11/04 14:32:55   ...done in 6ms.\n2023/11/04 14:32:55 Creating txins triggers.\n2023/11/04 14:32:55 Dropping _prevout_miss table.\n2023/11/04 14:32:55 Marking orphan blocks (whole chain)...\n2023/11/04 14:33:37 Done marking orphan blocks in 42.163s.\n2023/11/04 14:33:37 Reset table storage parameters: autovacuum_enabled.\n2023/11/04 14:33:37 Indexes and constraints created.\n2023/11/04 14:33:37 All done in 21h36m23.9s.\n```\n\nThere are two phases to this process: the first is just streaming the\ndata into Postgres, the second is building indexes, constraints and\notherwise tying up loose ends.\n\nThe `-cache-size` parameter sets the size of the cache that maps a txid (the SHA256) to the\ndatabase tx_id, which import can set on the fly. This cache is also\nused to identify duplicate transactions. Having a cache of 100M entries\nachieves a hit rate of about 99.9% (as of Nov 2023, see above). The missing ids\nare corrected after the load, but setting as many as possible from the\nbeginning reduces the time that correction takes. A 100M\nentry cache will result in the import process taking up ~16GB of RAM.\n\nAfter the initial import, the tool can \"catch up\" by importing new\nblocks not yet in the database. The catch up is many times slower than\nthe initial import because the indexes and constraints are already in\nplace and must be maintained. The catch up does not read LevelDb; it simply\nuses the Bitcoin protocol to request new blocks from the node. If you\nspecify the `-wait` option, the import will wait for new blocks as\nthey are announced and write them to the DB. 
For example:\n\n``` sh\n# In this example there is a full Core node running on 192.168.A.B;\n# new blocks will be written as they come in.\n./import \\\n    -connstr \"host=192.168.X.X dbname=blocks sslmode=disable\" \\\n    -nodeaddr 192.168.A.B:8333 -wait\n```\n\n## PostgreSQL Tuning\n\n* Do not underestimate the importance of the sending (client) machine\n  performance; it is possible that the client side cannot keep up with\n  Postgres. During the initial data load, all that Postgres needs to\n  do is stream the incoming tuples to disk. The client needs to parse\n  the blocks, format the data for the Postgres writes and maintain a\n  cache of the tx_ids. Once the initial load is done, the burden\n  shifts onto the server, which needs to build indexes. You can\n  specify `-connstr nulldb` to make all database operations no-ops,\n  akin to writing to /dev/null. Try running it this way to see the\n  maximum speed your client is capable of before attempting to tune the\n  Postgres side.\n\n* Using SSDs on the Postgres server (as well as the sending machine) will\n  make this process go much faster. Remember to set `random_page_cost`\n  to `1` or less, depending on how fast your disk really is. The\n  blockchain will occupy more than 600GB on disk and this will grow as\n  time goes on.\n\n* Turning off `synchronous_commit` and setting `commit_delay` to\n  `100000` should make the import faster. Turning `fsync` off entirely\n  might make it faster still (heed the documentation warnings).\n\n* `shared_buffers` should not be set high; PostgreSQL does better\n  relying on the OS disk buffers cache. Shared buffers are faster than\n  OS cache for hits, but more expensive on misses, thus the PG docs\n  advise not relying on it unless the whole working set fits in PG\n  shared buffers. Of course if your PG server has 512GB of RAM, then\n  this advice does not apply.\n\n* Setting `maintenance_work_mem` high should help with speeding up the\n  index building. 
Note that it can be temporarily set right in the\n  connection string (`-connstr \"host=... maintenance_work_mem=2GB\"`).\n  Increasing `max_parallel_maintenance_workers` will also help with\n  index building. Each worker will get `maintenance_work_mem` divided by\n  `max_parallel_maintenance_workers` of memory.\n\n* Setting `wal_writer_delay` to the max value of `10000` and\n  increasing `wal_buffers` and `wal_writer_flush_after` should speed\n  up the initial import in theory.\n\n* Setting `wal_level` to `minimal` may help as well. (You will also\n  need to set `max_wal_senders` to 0 if you use `minimal`.)\n\n## ZFS\n\nUsing a filesystem which supports snapshots is very useful for\ndeveloping this tool because it provides the ability to quickly\nroll back to a snapshot should anything go wrong.\n\nZFS (at least when used on a single disk) seems slower than ext4, but still\nwell worth it. The settings we ended up with are:\n\n``` sh\nzfs set compression=zstd-1 tank/blocks # lz4 if your zfs is old\nzfs set atime=off tank/blocks\nzfs set primarycache=all tank/blocks\nzfs set recordsize=16k tank/blocks\nzfs set logbias=latency tank/blocks\n```\n\nIf you use ZFS, then in the Postgres config it is advisable to set\n`full_page_writes`, `wal_init_zero` and `wal_recycle` to `off`.\n\n## Internals of the Data Stream\n\nThe initial data stream is done via `COPY`, with a separate goroutine\nstreaming to each table. We read blocks in order and iterate over the\ntransactions therein; each transaction is split into inputs, outputs,\netc., and each of those records is sent over a channel to the goroutine\nresponsible for that table. This approach is very performant.\n\nOn catch up the process is slightly more complicated because we need\nto ensure that referential integrity is maintained. 
Each block should\nbe followed by a commit, and all outputs in a block must be committed\nbefore its inputs.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblkchain%2Fblkchain","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblkchain%2Fblkchain","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblkchain%2Fblkchain/lists"}