https://github.com/russmckendrick/discogs-scraper
A basic scraper for generating files for my website 🎸.
- Host: GitHub
- URL: https://github.com/russmckendrick/discogs-scraper
- Owner: russmckendrick
- Created: 2023-04-16T11:28:06.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-25T14:15:04.000Z (4 months ago)
- Last Synced: 2024-08-26T13:50:37.524Z (4 months ago)
- Topics: discogs, discogs-dump, scraper
- Language: Python
- Homepage: https://www.mckendrick.rocks
- Size: 260 KB
- Stars: 1
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Discogs Scraper 🎵
A basic scraper for generating files for [https://www.russ.fm/](https://www.russ.fm/) 🎸. While this was initially created for personal use, feel free to use it if you find it helpful! 😃 Although the documentation is minimal, the code is fairly straightforward.
You can find the repo containing the website files and config at [russmckendrick/records](https://github.com/russmckendrick/records/). It's a [Hugo-powered](https://gohugo.io/) site and there are a LOT of files.
## Getting Started 🚀
1. Clone the repository to your local machine.
2. Install the required dependencies using `pip install -r requirements.txt`.
3. Run the `discogs_scraper.py` script to start the scraper.
## Configuration ⚙️
To customize the scraper for your needs, create a copy of the `secrets.json.example` file, name it `secrets.json`, and fill in the details.
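If you want a quick sanity check after editing `secrets.json`, something like the sketch below may help. It is not part of the scraper: `find_unfilled_secrets` is a hypothetical helper, and it assumes both files hold a flat JSON object. It makes no assumption about which keys exist; it only lists keys whose value is still identical to the example.

```python
import json
from pathlib import Path


def find_unfilled_secrets(directory="."):
    """Return keys in secrets.json that still match secrets.json.example.

    Hypothetical helper: assumes both files contain a JSON object.
    """
    base = Path(directory)
    example = json.loads((base / "secrets.json.example").read_text(encoding="utf-8"))
    secrets = json.loads((base / "secrets.json").read_text(encoding="utf-8"))

    # A key counts as unfilled if it is missing or unchanged from the example.
    return [key for key, value in example.items() if secrets.get(key, value) == value]


if __name__ == "__main__":
    unfilled = find_unfilled_secrets()
    if unfilled:
        print("Still using example values for:", ", ".join(unfilled))
```

An empty list means every key from the example has been given its own value.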
## How it Works 🛠
The scraper fetches data from the Discogs API and processes the information to generate markdown files and download images. This data can then be used to create a static site showcasing your music collection 🎧.
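To give a rough idea of the "generate markdown files" step, here is a minimal sketch of turning one release into a Hugo-style page. It is an illustration only, not the code in `discogs_scraper.py`: the function names, the field names (`id`, `title`, `artist`, `year`, `tracklist`) and the front matter layout are all assumptions, and the API fetch and image download are left out.

```python
import json
from pathlib import Path


def render_release(release):
    """Return a Markdown page with YAML front matter for one release.

    The field names used here are hypothetical, not the Discogs API schema.
    """
    # json.dumps emits double-quoted strings, which are also valid YAML scalars.
    front_matter = [
        "---",
        f"title: {json.dumps(release['title'])}",
        f"artist: {json.dumps(release['artist'])}",
        f"year: {release['year']}",
        "---",
    ]
    body = [f"- {track}" for track in release.get("tracklist", [])]
    return "\n".join(front_matter + [""] + body) + "\n"


def write_release(release, output_dir):
    """Write the rendered page to <output_dir>/<id>.md and return its path."""
    target = Path(output_dir) / f"{release['id']}.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_release(release), encoding="utf-8")
    return target
```

Keeping the rendering separate from the writing means the page text can be checked without touching the disk.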
## Running the Scraper 🏃‍♂️
The scraper can be run using the following commands:
Run the script without any flags to process just 10 releases, with the default 2 second delay between requests:
```bash
$ python3 discogs_scraper.py
```
You can add the `--all` flag to process all releases in your collection:
```bash
$ python3 discogs_scraper.py --all
```
You can also add the `--num-items` flag to process a specific number of releases:
```bash
$ python3 discogs_scraper.py --num-items 100
```
Finally, you can override the default 2 second delay between requests using the `--delay` flag. This is not recommended, as it may cause issues with the Discogs API, so be careful:
```bash
$ python3 discogs_scraper.py --delay 0
```
You can also combine the flags, for example to process all releases without any delay:
```bash
$ python3 discogs_scraper.py --all --delay 0
```
## Contribution 🤝
If you'd like to contribute or suggest improvements, feel free to submit a pull request or open an issue on GitHub. We appreciate your input! 🌟
Enjoy scraping and building your music collection website! 🎶
## One More Thing... 🤖
Oh yeah, it was mostly written by ChatGPT 💬 with me debugging 🐛 it and adding some features. 🤓
## Some random links
These are handy when you're reviewing wrong matches and need to move a release from your `collection_cache.json` file to the `collection_cache_overrides.json` file.
- [https://jsonlint.com/](https://jsonlint.com/)
- [https://www.text-utils.com/json-formatter/](https://www.text-utils.com/json-formatter/)
- [https://tools.applemediaservices.com/?country=gb](https://tools.applemediaservices.com/?country=gb)
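If you'd rather script the move between the two cache files than edit the JSON by hand, a sketch like the one below might do. It rests on a big assumption: that both files are JSON objects keyed by release ID. Check your own files first, as the real layout may differ. `move_to_overrides` is a hypothetical helper, not something the scraper provides.

```python
import json
from pathlib import Path


def move_to_overrides(release_id,
                      cache_path="collection_cache.json",
                      overrides_path="collection_cache_overrides.json"):
    """Move one entry from the cache file to the overrides file.

    Assumes both files are JSON objects keyed by release ID (an assumption).
    """
    cache_file = Path(cache_path)
    overrides_file = Path(overrides_path)

    cache = json.loads(cache_file.read_text(encoding="utf-8"))
    overrides = {}
    if overrides_file.exists():
        overrides = json.loads(overrides_file.read_text(encoding="utf-8"))

    # pop raises KeyError if the ID is not cached, before any file is rewritten.
    key = str(release_id)
    overrides[key] = cache.pop(key)

    # Write the overrides first so the entry is never lost if the second write fails.
    overrides_file.write_text(
        json.dumps(overrides, indent=2, ensure_ascii=False), encoding="utf-8")
    cache_file.write_text(
        json.dumps(cache, indent=2, ensure_ascii=False), encoding="utf-8")
    return overrides[key]
```

After the move you can edit the entry in `collection_cache_overrides.json` to correct the match.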