{"id":47309813,"url":"https://github.com/aidan-bailey/bzpuller.sh","last_synced_at":"2026-03-17T10:23:21.449Z","repository":{"id":61224056,"uuid":"548828179","full_name":"aidan-bailey/bzpuller.sh","owner":"aidan-bailey","description":"A concurrent historical prices zip puller for Binance","archived":false,"fork":false,"pushed_at":"2023-01-27T10:06:39.000Z","size":21,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-01-24T18:54:33.124Z","etag":null,"topics":["bash","bash-script","binance","concurrency","cryptocurrency-exchanges","historical-data"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aidan-bailey.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-10-10T08:44:02.000Z","updated_at":"2023-03-15T17:18:55.000Z","dependencies_parsed_at":"2023-02-15T07:46:36.988Z","dependency_job_id":null,"html_url":"https://github.com/aidan-bailey/bzpuller.sh","commit_stats":null,"previous_names":["aidan-bailey/bzpuller.sh"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aidan-bailey/bzpuller.sh","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aidan-bailey%2Fbzpuller.sh","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aidan-bailey%2Fbzpuller.sh/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aidan-bailey%2Fbzpuller.sh/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aidan-bailey%2Fbzpuller.sh/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/aidan-bailey","download_url":"https://codeload.github.com/aidan-bailey/bzpuller.sh/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aidan-bailey%2Fbzpuller.sh/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30622229,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-17T08:10:05.930Z","status":"ssl_error","status_checked_at":"2026-03-17T08:10:04.972Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bash","bash-script","binance","concurrency","cryptocurrency-exchanges","historical-data"],"created_at":"2026-03-17T10:23:20.968Z","updated_at":"2026-03-17T10:23:21.443Z","avatar_url":"https://github.com/aidan-bailey.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bzpuller.sh\n\nThis script concurrently downloads and validates checksums of [Binance historical data zips](https://data.binance.vision/?prefix=data/).\nIt checks all url date combinations, discarding the ones that return 404's (e.g., if you request a symbol not in that market, it'll have a result equivalent to you not requesting that symbol at all).\nIf a checksum fails, both the zip file and checksum will be redownloaded and the cycle will continue (infinitely) until the checksum succeeds.\n\nI see two main advantages the zips have over Binance's API:\n1. 
No rate limits (other than the normal DDoS protection, I assume).\n2. Symbol data of Spot/Futures symbols that are no longer traded on the exchange (e.g. Luna).\n\nUnfortunately, it's not all sunshine and roses as the zips are not entirely clean:\n- Some have headers (`open_time,open,high,low,close,volume,close_time,quote_volume,count,taker_buy_volume,taker_buy_quote_volume,ignore`), some don't.\n- In one case, a single timestamp for `BZRXUSDT` was duplicated 13 times (timestamp `2021-03-22 11:57:00` on the Spot market to be precise).\n\nI think there are data discrepancies between the API data and zips (even between the daily and monthly zips). \nIf someone can disprove this, please let me know!\n\n**NB: Be careful about**\n1. Where you run this script (or set `OUTDIR` to) - it will fill that directory with zips and checksums to the point there will be too many files to `rm` using a wildcard and you'll either have to delete the entire directory or delete smaller wildcard batches of them until you can fit the rest into a single wildcard (this doesn't sound fun...and it's less fun than it sounds!).\n2. The values of `SWORKERS` and `ZWORKERS`. 
The maximum number of subprocesses running at once is $2 + SWORKERS * ZWORKERS$.\n\n## Usage\n\n``` bash\n$ ./bzpuller.sh\n------------------------------------------------------------------------\n                           BINANCE ZIP PULLER\n------------------------------------------------------------------------\n USAGE:\n           bzpuller.sh \u003cAGGREGATION\u003e \u003cMARKET\u003e \u003cINTERVAL\u003e\n\n ARGUMENTS:\n\n   AGGREGATION\n       the level of aggregation per zip\n       options: monthly daily\n   MARKET\n       market to pull\n       options: um cm spot\n   INTERVAL\n       kline interval\n       options: trades aggTrades 12h 15m 1d 1h 1m 1mo 1w 2h 30m 3d 3m 4h\n                5m 6h 8h\n\n ENV VARS:\n\n   OUTDIR\n       output directory for the csvs\n       default: current directory\n   SYMBOLS\n       symbols to fetch zips for\n       default: fetched from exchange based on market\n   QUOTE\n       skip symbols that don't have this as their quoted currency\n       default: none\n   YEARS\n       years to fetch\n       default: (2017 2018 2019 2020 2021 2022)\n   MONTHS\n       months to fetch\n       default: (01 02 03 04 05 06 07 08 09 10 11 12)\n   DAYS\n       days to fetch\n       default: (01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18\n                               19 20 21 22 23 24 25 26 27 28 29 30 31)\n   SWORKERS\n       number of symbols to fetch concurrently\n       default: half available cores\n   ZWORKERS\n       number of zips to fetch concurrently (per symbol)\n       default: half available cores\n------------------------------------------------------------------------\n```\n\n## Contribution\n\nThis script is not _strenuously_ tested so should anyone find any bugs please inform me, thanks!\n\nAdd other types:\n- [x] aggTrades\n- [ ] indexPriceKlines\n- [ ] markPriceKlines\n- [ ] premiumIndexKlines\n- [x] trades\n\nMisc:\n- [ ] Reduce code redundancy\n\n## Disclaimer\n\nThis is my first substantial Bash script.\nI 
accept my divide-\u0026-conquer implementation may be a bit convoluted, but it serves its current purpose.\nSpecifically, it allows new processes to be spawned right after a process finishes (rather than using a blanket `wait`\nto wait for all processes before launching the next batch or just guessing with `wait \u003cpid\u003e` and potentially being idle a lot longer than need be).\nHopefully a glance at the code will help with making sense of this.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faidan-bailey%2Fbzpuller.sh","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faidan-bailey%2Fbzpuller.sh","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faidan-bailey%2Fbzpuller.sh/lists"}