{"id":22482045,"url":"https://github.com/russmckendrick/discogs-scraper","last_synced_at":"2025-03-27T19:18:18.819Z","repository":{"id":159197214,"uuid":"628573108","full_name":"russmckendrick/discogs-scraper","owner":"russmckendrick","description":"A basic scraper for generating files for my website 🎸.","archived":false,"fork":false,"pushed_at":"2025-02-09T11:23:06.000Z","size":317,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-09T11:25:56.303Z","etag":null,"topics":["discogs","discogs-dump","scraper"],"latest_commit_sha":null,"homepage":"https://www.russ.fm","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/russmckendrick.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-16T11:28:06.000Z","updated_at":"2025-02-09T11:23:09.000Z","dependencies_parsed_at":"2024-02-10T10:27:31.293Z","dependency_job_id":"424d78a5-62f6-4260-bc05-c7c2af48b48c","html_url":"https://github.com/russmckendrick/discogs-scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/russmckendrick%2Fdiscogs-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/russmckendrick%2Fdiscogs-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/russmckendrick%2Fdiscogs-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/russmckendrick%2Fdiscogs-scr
aper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/russmckendrick","download_url":"https://codeload.github.com/russmckendrick/discogs-scraper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245907558,"owners_count":20691956,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["discogs","discogs-dump","scraper"],"created_at":"2024-12-06T16:18:47.415Z","updated_at":"2025-03-27T19:18:18.811Z","avatar_url":"https://github.com/russmckendrick.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Discogs Scraper 🎵\n\nA Python application for managing a vinyl record collection, generating content for [https://www.russ.fm/](https://www.russ.fm/) 🎸. While initially created for personal use, feel free to use it if you find it helpful! 
The site is powered by [Hugo](https://gohugo.io/) and you can find the website files and config at [russmckendrick/records](https://github.com/russmckendrick/records/).\n\n## Features ✨\n\n### Data Collection\n- Fetches collection data from Discogs API\n- Enriches data with information from:\n  - Apple Music API\n  - Spotify API\n  - Wikipedia API\n- Downloads and processes album artwork and artist images\n- Caches data in SQLite database to avoid rate limiting\n\n### Web Interface\nThe Flask-based web interface provides:\n\n#### Core Features\n- Traditional multi-page layout with Bootstrap styling\n- Database backup on application launch (timestamped copies in `backups/` folder)\n- Comprehensive logging to dated files in `logs/` directory\n\n#### Release Management\n- Full CRUD operations for releases\n- Searchable and sortable release listing\n- Rich preview with album artwork, track listings, and metadata\n- Links to external services (Discogs, Apple Music, Spotify)\n- Default sorting by Date Added (newest first)\n\n#### Artist Management\n- Full CRUD operations for artists\n- Searchable artist listing (by ID, name, or slug)\n- Rich preview showing artist images, bio, and related information\n- Integration with Apple Music, Discogs, and Wikipedia data\n\n#### Editor Features\n- CodeMirror-based JSON editor with:\n  - Syntax highlighting\n  - Real-time validation\n  - Auto-formatting\n  - Error highlighting\n  - Line numbers and bracket matching\n- Preview-first layout with collapsible raw data view\n\n## Getting Started 🚀\n\n1. Clone the repository\n2. Create and activate a Python virtual environment:\n   ```bash\n   python -m venv venv\n   source venv/bin/activate  # On Windows: venv\\Scripts\\activate\n   ```\n3. Install dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n4. 
Copy `secrets.json.example` to `secrets.json` and fill in your API credentials:\n   - Discogs access token\n   - Spotify client ID and secret\n   - Apple Music client ID and team ID\n   - Apple Music private key (place in `backups/apple_private_key.p8`)\n\n## Running the Application 🏃‍♂️\n\n### Web Interface\nStart the Flask web application:\n```bash\npython app.py\n```\n\nAdd the `--debug-data` flag to enable detailed debugging output:\n```bash\npython app.py --debug-data\n```\n\n### Discogs Scraper\nThe scraper supports various modes:\n\n```bash\n# Process just 10 releases (default)\npython discogs_scraper.py\n\n# Process all releases\npython discogs_scraper.py --all\n\n# Process a specific number of releases\npython discogs_scraper.py --num-items 100\n\n# Adjust request delay (default: 2 seconds)\npython discogs_scraper.py --delay 1\n\n# Regenerate artist pages only\npython discogs_scraper.py --artists-only\n\n# Regenerate a specific artist\npython discogs_scraper.py --regenerate-artist \"Artist Name\"\n\n# Migrate artist data\npython discogs_scraper.py --migrate-artists\n```\n\n## Project Structure 📁\n\n- `app.py` - Flask web application\n- `discogs_scraper.py` - Main scraper script\n- `db_handler.py` - Database operations\n- `utils.py` - Shared utility functions\n- `templates/` - Flask HTML templates\n- `logs/` - Application logs\n- `backups/` - Database backups\n- `website/` - Generated Hugo content\n\n## Useful Links 🔗\n\n- [JSON Lint](https://jsonlint.com/)\n- [JSON Formatter](https://www.text-utils.com/json-formatter/)\n- [Apple Media Services Tools](https://tools.applemediaservices.com/?country=gb)\n\n## One More Thing... 🤖\n\nThis project was initially developed with assistance from ChatGPT 💬, with subsequent debugging 🐛 and feature additions. 🤓\n\n## Contributing 🤝\n\nFeel free to submit issues and pull requests. 
The project uses comprehensive logging and maintains a structured approach to data handling.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frussmckendrick%2Fdiscogs-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frussmckendrick%2Fdiscogs-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frussmckendrick%2Fdiscogs-scraper/lists"}