Board game data scraper
https://github.com/recommend-games/board-game-scraper
- Host: GitHub
- URL: https://github.com/recommend-games/board-game-scraper
- Owner: recommend-games
- License: MIT
- Created: 2019-01-23T07:27:58.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2026-03-27T19:14:40.000Z (12 days ago)
- Last Synced: 2026-03-28T02:39:36.412Z (12 days ago)
- Topics: bgg, bgg-rating, board-game, board-games, boardgame, boardgamegeek, boardgamegeek-dataset, boardgames, data-set, data-sets, dataset, datasets, python, scraped-data, scraper, scrapers, scraping, scrapy, spider, tabletop-games
- Language: Python
- Homepage: https://recommend.games/
- Size: 1.96 MB
- Stars: 29
- Watchers: 1
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# 🎲 Board Game Scraper 🕸
Scraping data about board games from the web. View the data live at
[Recommend.Games](https://recommend.games/)! Install via
```bash
pip install board-game-scraper
```
## Sources
* [BoardGameGeek](https://boardgamegeek.com/) (`bgg`)
* [DBpedia](https://wiki.dbpedia.org/) (`dbpedia`)
* [Luding.org](https://luding.org/) (`luding`)
* [Spielen.de](https://gesellschaftsspiele.spielen.de/) (`spielen`)
* [Wikidata](https://www.wikidata.org/) (`wikidata`)
## Run scrapers
[Requires Python 3](https://pythonclock.org/). Make sure
[Pipenv](https://docs.pipenv.org/) is installed and create the virtual
environment:
```bash
python3 -m pip install --upgrade pipenv
pipenv install --dev
pipenv shell
```
Run a spider like so:
```bash
JOBDIR="jobs/${SPIDER}/$(date --utc +'%Y-%m-%dT%H-%M-%S')"
scrapy crawl "${SPIDER}" \
--output 'feeds/%(name)s/%(time)s/%(class)s.csv' \
--set "JOBDIR=${JOBDIR}"
```
where `$SPIDER` is one of the IDs above.
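The timestamped `JOBDIR` gives every run its own working directory, so repeated runs of the same spider never collide. As a quick illustration of the naming scheme (using the `bgg` spider ID from the list above):

```bash
# Reproduce the per-run job directory naming used above
# ("bgg" is one of the spider IDs from the source list).
SPIDER='bgg'
JOBDIR="jobs/${SPIDER}/$(date --utc +'%Y-%m-%dT%H-%M-%S')"
echo "${JOBDIR}"
```

Scrapy persists the crawl state (pending requests, seen URLs) inside this directory, which is what makes pausing and resuming possible.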
Run all the spiders with the [`run_scrapers.sh`](run_scrapers.sh) script. Get a
list of the running scrapers' PIDs with the [`processes.sh`](processes.sh)
script. Stop all running scrapers via
```bash
./processes.sh stop
```
and resume them later.
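Resuming works through Scrapy's persistent job state: re-issue the same crawl command with the same `JOBDIR` as the stopped run, and the spider picks up where it left off. A minimal sketch (the job directory name below is a made-up example; use the one from the original run):

```bash
# The job directory of the stopped run (example timestamp; use the real one).
JOBDIR='jobs/bgg/2026-01-01T00-00-00'

# Resume only if that job directory actually exists.
if [ -d "${JOBDIR}" ]; then
    scrapy crawl bgg \
        --output 'feeds/%(name)s/%(time)s/%(class)s.csv' \
        --set "JOBDIR=${JOBDIR}"
fi
```

The existence check guards against starting a fresh run by mistake when the directory name is mistyped.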
## Tests
You can run `scrapy check` to perform contract tests for all spiders, or
`scrapy check $SPIDER` to test one particular spider. If a test fails, the
website has most likely changed and the spider needs updating.
## Board game datasets
If you are interested in using any of the datasets produced by this scraper,
take a look at the
[BoardGameGeek guild](https://boardgamegeek.com/thread/2287371/boardgamegeek-games-and-ratings-datasets).
A subset of the data can also be found on [Kaggle](https://www.kaggle.com/mshepherd/board-games).
## Links
* [board-game-scraper](https://gitlab.com/recommend.games/board-game-scraper):
This repository
* [Recommend.Games](https://recommend.games/): board game recommender using the
scraped data
* [recommend-games-server](https://gitlab.com/recommend.games/recommend-games-server):
Server code for [Recommend.Games](https://recommend.games/)
* [board-game-recommender](https://gitlab.com/recommend.games/board-game-recommender):
Recommender code for [Recommend.Games](https://recommend.games/)