https://github.com/melroy89/bitcoin-core-web-scraper
Web Spider for mirroring Bitcoin Core bin folder. Live: https://bitcoin.melroy.org/bin/ (mirror of bitcoincore.org/bin)
- Host: GitHub
- URL: https://github.com/melroy89/bitcoin-core-web-scraper
- Owner: melroy89
- License: mit
- Created: 2021-12-23T23:28:14.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-12-23T23:29:51.000Z (over 3 years ago)
- Last Synced: 2025-01-04T19:11:51.449Z (5 months ago)
- Topics: binary, bitcoin, bitcoin-core, download, mirror, mirroring, scrapy, sync, web-crawler, webscraper, webscraping, webscrapping
- Language: Python
- Homepage: https://gitlab.melroy.org/bitcoin-dot-org/bitcoin-core-web-scraper
- Size: 17.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Bitcoin Core Web Scraper
A script for web scraping and downloading the [Bitcoin Core `bin`](https://bitcoincore.org/bin) directory.
Ideal for creating your own mirror!

## Usage
### Dependencies
Run-time dependencies:
* Python3 + pip (`python3 python3-dev python3-pip`)
* Additional libraries for Scrapy (`libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev`)

More packages will be downloaded via `pip`; see the next section.
### Prepare
I advise you to use a [Python virtual environment](https://docs.python.org/3/library/venv.html#); create and activate one via:
```sh
python3 -m venv env
source env/bin/activate
```

Next, install the required packages via:
```sh
pip install -r requirements.txt
```

### Run scraper

Execute the scraper and **start downloading**:
```sh
scrapy crawl bitcoincore
```

Or run `./start_spider.py`.
*Note:* Files are stored in the `bin` sub-folder of the project root.
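For orientation, a Scrapy spider that mirrors a directory listing like `bitcoincore.org/bin/` can be fairly small. The sketch below is hypothetical and is not the project's actual spider code; it assumes the links sit inside a `<pre>` block (as the shell example further down suggests) and that downloading is delegated to Scrapy's `FilesPipeline` via `file_urls`:

```py
import scrapy


class BitcoinCoreSpider(scrapy.Spider):
    """Hypothetical sketch, not the project's actual spider code."""

    name = "bitcoincore"
    start_urls = ["https://bitcoincore.org/bin/"]

    def parse(self, response):
        # The directory index lists its entries as <a> tags inside a <pre> block.
        for href in response.css("pre a::attr(href)").getall():
            if href.startswith(".."):
                continue  # skip the parent-directory link
            url = response.urljoin(href)
            if href.endswith("/"):
                # Sub-directory: follow it and parse recursively.
                yield scrapy.Request(url, callback=self.parse)
            else:
                # File: yield an item a FilesPipeline could download.
                yield {"file_urls": [url]}
```

With `FilesPipeline` enabled, Scrapy's `FILES_STORE` setting would control where the downloaded files land (the real project stores them in `bin`, as noted above).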
Optionally, execute the scraper and output the metadata to a "feed" file (e.g. a JSON file):
```sh
scrapy crawl bitcoincore -O bitcoincore.json
```
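A quick way to inspect such a feed afterwards is to load it with Python's `json` module. This is just an illustration; the exact fields depend on what the spider yields:

```py
import json

# -O overwrites the output file on each run (use -o to append);
# for a .json feed, Scrapy writes a single JSON array of items.
with open("bitcoincore.json") as fh:
    items = json.load(fh)

print(f"{len(items)} items scraped")
print(items[0])  # inspect the fields of one item
```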
## Docker Image

The Docker image is [available on DockerHub](https://hub.docker.com/r/danger89/bitcoinscraper).
*Note:* The Docker image starts the crawler via a cron job, so the Bitcoin spider runs automatically once a week.
I provided a [docker-compose file](bitcoinscraper-compose.yml) for convenience.
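The APScheduler cron docs linked under External Links suggest the weekly run is scheduled with an APScheduler cron trigger inside the container. A minimal sketch of such a scheduler, with a placeholder `run_spider()` helper (both hypothetical, not taken from the project):

```py
from apscheduler.schedulers.blocking import BlockingScheduler

scheduler = BlockingScheduler()


def run_spider():
    # Placeholder: launch the crawl here, e.g. via scrapy.crawler.CrawlerProcess
    # or by invoking `scrapy crawl bitcoincore` as a subprocess.
    ...


# Run once a week; the day and hour here are made up, the real schedule may differ.
scheduler.add_job(run_spider, "cron", day_of_week="sun", hour=3)
scheduler.start()
```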
**Building Docker image**
Create a Docker image locally using:
```sh
docker build -t danger89/bitcoinscraper .
```## Learn & Debug
You can use the Scrapy shell to help debugging or learn how to extract data when using `scrapy`:
```sh
scrapy shell 'https://bitcoincore.org/bin/'
```

Check the `response` object for data. For example:
```py
response.css('pre a')[3].get()
```
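Other selector queries that may be useful in the shell (standard Scrapy/parsel calls, not taken from the project):

```py
# All link targets in the directory listing at once:
response.css("pre a::attr(href)").getall()

# Only the link text of the first few entries:
response.css("pre a::text").getall()[:5]
```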
## External Links

More info:
* [Scrapy homepage](https://scrapy.org)
* [Scrapy Tutorial docs](https://docs.scrapy.org/en/latest/intro/tutorial.html) (ideal for beginners)
* [APScheduler Cron docs](https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html)