https://github.com/aidan-bailey/bzpuller.sh
A concurrent historical prices zip puller for Binance
bash bash-script binance concurrency cryptocurrency-exchanges historical-data
- Host: GitHub
- URL: https://github.com/aidan-bailey/bzpuller.sh
- Owner: aidan-bailey
- Created: 2022-10-10T08:44:02.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2023-01-27T10:06:39.000Z (over 3 years ago)
- Last Synced: 2026-01-24T18:54:33.124Z (5 months ago)
- Topics: bash, bash-script, binance, concurrency, cryptocurrency-exchanges, historical-data
- Language: Shell
- Homepage:
- Size: 20.5 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# bzpuller.sh
This script concurrently downloads and validates checksums of [Binance historical data zips](https://data.binance.vision/?prefix=data/).
It checks every URL/date combination and discards the ones that return a 404 (e.g., requesting a symbol that is not in that market has the same result as not requesting that symbol at all).
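The enumeration described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `monthly_kline_urls` and `url_exists` are hypothetical helpers, and the URL layout is an assumption based on the public data.binance.vision listing (futures markets appear to sit under a `futures/` prefix, e.g. `futures/um`).

```bash
#!/usr/bin/env bash
# Sketch: enumerate candidate monthly kline zip URLs and test whether they exist.
# The URL layout below is an assumption, not taken from bzpuller.sh.

BASE_URL="https://data.binance.vision/data"

# Print one candidate URL per symbol/year/month combination.
# usage: monthly_kline_urls MARKET INTERVAL "SYMBOLS" "YEARS" "MONTHS"
monthly_kline_urls() {
    local market=$1 interval=$2 symbol year month
    for symbol in $3; do
        for year in $4; do
            for month in $5; do
                printf '%s/%s/monthly/klines/%s/%s/%s-%s-%s-%s.zip\n' \
                    "$BASE_URL" "$market" "$symbol" "$interval" \
                    "$symbol" "$interval" "$year" "$month"
            done
        done
    done
}

# Succeeds only if the server answers a HEAD request without an HTTP error,
# so a 404 makes this return non-zero.
url_exists() {
    curl --silent --fail --head --output /dev/null "$1"
}
```

A filter such as `monthly_kline_urls spot 1m "BTCUSDT" "2021" "01 02" | while read -r url; do url_exists "$url" && echo "$url"; done` would then print only the combinations that exist on the server.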
If a checksum fails, both the zip file and checksum will be redownloaded and the cycle will continue (infinitely) until the checksum succeeds.
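A rough sketch of that retry cycle is shown below. `fetch_file` and `pull_zip` are hypothetical names, and the sketch assumes the checksum is published next to the zip as `<zip URL>.CHECKSUM` in the standard `sha256sum` format; it also relies on GNU `sha256sum` being installed.

```bash
#!/usr/bin/env bash
# Sketch of a download-and-verify loop. `fetch_file` and `pull_zip` are
# hypothetical helpers, not functions from bzpuller.sh.

# Download URL into DIR, overwriting any previous copy.
fetch_file() {
    local url=$1 dir=$2
    curl --silent --fail --location --output "$dir/${url##*/}" "$url"
}

# Fetch a zip and its checksum file, repeating until sha256sum accepts the pair.
# Assumption: the checksum lives at <zip URL>.CHECKSUM and contains a
# "<sha256>  <filename>" line.
pull_zip() {
    local url=$1 dir=$2 name=${1##*/}
    while true; do
        if fetch_file "$url" "$dir" && fetch_file "$url.CHECKSUM" "$dir" &&
            (cd "$dir" && sha256sum --check --status "$name.CHECKSUM"); then
            return 0
        fi
        sleep 1   # brief pause before redownloading both files
    done
}
```

Because the loop never gives up, it should only be pointed at URLs already known to exist; a URL that returns a 404 would make this sketch spin forever.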
I see two main advantages the zips have over Binance's API:
1. No rate limits (other than the normal DDoS protection, I assume).
2. Data for Spot/Futures symbols that are no longer traded on the exchange (e.g., Luna).
Unfortunately, it's not all sunshine and roses as the zips are not entirely clean:
- Some have headers (`open_time,open,high,low,close,volume,close_time,quote_volume,count,taker_buy_volume,taker_buy_quote_volume,ignore`), some don't.
- In one case, a single timestamp for `BZRXUSDT` was duplicated 13 times (timestamp `2021-03-22 11:57:00` on the Spot market to be precise).
I think there are data discrepancies between the API data and the zips (and even between the daily and monthly zips).
If someone can disprove this, please let me know!
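For the two cleanliness issues listed above (optional header rows and duplicated timestamps), a small post-processing filter might look like this. `clean_klines` is a hypothetical helper, and keeping the first row for each `open_time` is an assumption; duplicated rows may not be identical, so check before discarding them.

```bash
#!/usr/bin/env bash
# Sketch: normalise a kline CSV read from stdin.
# Drops a leading header row if present and keeps only the first row seen for
# each open_time (column 1). `clean_klines` is a hypothetical helper.
clean_klines() {
    awk -F, '
        NR == 1 && $1 == "open_time" { next }   # skip the optional header row
        !seen[$1]++                             # print first row per timestamp
    '
}
```

It could be combined with `unzip -p`, for example `unzip -p BZRXUSDT-1m-2021-03.zip | clean_klines > BZRXUSDT-1m-2021-03.csv`, where the zip name is only illustrative.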
**NB: Be careful about**
1. Where you run this script (or what you set `OUTDIR` to). It will fill that directory with zips and checksums, to the point where there are too many files to `rm` with a single wildcard. You will then have to either delete the entire directory or delete smaller wildcard batches until the rest fits into a single wildcard (this doesn't sound fun... and it's less fun than it sounds!).
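2. The values of `SWORKERS` and `ZWORKERS`. The maximum number of subprocesses running at once is $2 + SWORKERS * ZWORKERS$. For example, on an 8-core machine both default to 4, which allows up to $2 + 4 * 4 = 18$ subprocesses.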
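On the first point, the `rm` wildcard fails because the shell expands it into more arguments than the system allows ("Argument list too long"). One way around this is to let `find` do the matching itself, as in the sketch below. `purge_zips` is a hypothetical helper, the `.zip.CHECKSUM` suffix is an assumption about how the checksum files are named, and `-maxdepth` and `-delete` are GNU/BSD `find` extensions rather than POSIX.

```bash
#!/usr/bin/env bash
# Sketch: delete downloaded zips and checksum files without expanding a
# wildcard on the command line. `purge_zips` is a hypothetical helper.
purge_zips() {
    local dir=${1:-.}
    # find matches the names itself, so no huge argument list is ever built.
    find "$dir" -maxdepth 1 -type f \
        \( -name '*.zip' -o -name '*.zip.CHECKSUM' \) -delete
}
```

Calling `purge_zips "$OUTDIR"` would remove only matching files directly inside that directory and leave everything else (extracted CSVs, for instance) in place.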
## Usage
``` bash
$ ./bzpuller.sh
------------------------------------------------------------------------
BINANCE ZIP PULLER
------------------------------------------------------------------------
USAGE:
    bzpuller.sh
ARGUMENTS:
    AGGREGATION
        the level of aggregation per zip
        options: monthly daily
    MARKET
        market to pull
        options: um cm spot
    INTERVAL
        kline interval
        options: trades aggTrades 12h 15m 1d 1h 1m 1mo 1w 2h 30m 3d 3m 4h
                 5m 6h 8h
ENV VARS:
    OUTDIR
        output directory for the csvs
        default: current directory
    SYMBOLS
        symbols to fetch zips for
        default: fetched from exchange based on market
    QUOTE
        skip symbols that don't have this as their quoted currency
        default: none
    YEARS
        years to fetch
        default: (2017 2018 2019 2020 2021 2022)
    MONTHS
        months to fetch
        default: (01 02 03 04 05 06 07 08 09 10 11 12)
    DAYS
        days to fetch
        default: (01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18
                 19 20 21 22 23 24 25 26 27 28 29 30 31)
    SWORKERS
        number of symbols to fetch concurrently
        default: half available cores
    ZWORKERS
        number of zips to fetch concurrently (per symbol)
        default: half available cores
------------------------------------------------------------------------
```
## Contribution
This script is not _strenuously_ tested, so should anyone find any bugs, please inform me. Thanks!
Add other types:
- [x] aggTrades
- [ ] indexPriceKlines
- [ ] markPriceKlines
- [ ] premiumIndexKlines
- [x] trades
Misc:
- [ ] Reduce code redundancy
## Disclaimer
This is my first substantial Bash script.
I accept my divide-&-conquer implementation may be a bit convoluted, but it serves its current purpose.
Specifically, it allows new processes to be spawned right after a process finishes (rather than using a blanket `wait`
to wait for all processes before launching the next batch, or guessing with `wait ` and potentially sitting idle a lot longer than need be).
Hopefully a glance at the code will help make sense of this.
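For comparison, the same goal (starting a new process as soon as any running one finishes) can also be sketched with `wait -n`, which is available in bash 4.3 and newer. This is a different technique from the divide-and-conquer approach the script uses, and `run_pool` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch of a bounded worker pool using `wait -n` (bash 4.3 or newer).
# This is NOT how bzpuller.sh is implemented; it is shown only for comparison.

# usage: run_pool MAX_JOBS COMMAND ARG...
# Runs COMMAND once per ARG, keeping at most MAX_JOBS running at a time.
run_pool() {
    local max=$1 cmd=$2 arg running=0
    shift 2
    for arg in "$@"; do
        if [ "$running" -ge "$max" ]; then
            wait -n                      # block until any one job exits
            running=$((running - 1))
        fi
        "$cmd" "$arg" &                  # start the next job in the background
        running=$((running + 1))
    done
    wait                                 # drain the remaining jobs
}
```

For example, `run_pool 4 some_download_function url1 url2 url3` would keep up to four background jobs busy, where `some_download_function` stands for whatever per-item work is needed.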