https://github.com/recommend-games/board-game-scraper

Board game data scraper

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/recommend-games/board-game-scraper
Owner: recommend-games
License: mit
Created: 2019-01-23T07:27:58.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2026-03-27T19:14:40.000Z (4 months ago)
Last Synced: 2026-03-28T02:39:36.412Z (4 months ago)
Topics: bgg, bgg-rating, board-game, board-games, boardgame, boardgamegeek, boardgamegeek-dataset, boardgames, data-set, data-sets, dataset, datasets, python, scraped-data, scraper, scrapers, scraping, scrapy, spider, tabletop-games
Language: Python
Homepage: https://recommend.games/
Size: 1.96 MB
Stars: 29
Watchers: 1
Forks: 6
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md

README

# 🎲 Board Game Scraper 🕸

Scraping data about board games from the web. View the data live at
[Recommend.Games](https://recommend.games/)! Install via

```bash
pip install board-game-scraper
```

## Sources

* [BoardGameGeek](https://boardgamegeek.com/) (`bgg`)
* [DBpedia](https://wiki.dbpedia.org/) (`dbpedia`)
* [Luding.org](https://luding.org/) (`luding`)
* [Spielen.de](https://gesellschaftsspiele.spielen.de/) (`spielen`)
* [Wikidata](https://www.wikidata.org/) (`wikidata`)

## Run scrapers

[Requires Python 3](https://pythonclock.org/). Make sure
[Pipenv](https://docs.pipenv.org/) is installed and create the virtual
environment:

```bash
python3 -m pip install --upgrade pipenv
pipenv install --dev
pipenv shell
```

Run a spider like so:

```bash
JOBDIR="jobs/${SPIDER}/$(date --utc +'%Y-%m-%dT%H-%M-%S')"
scrapy crawl "${SPIDER}" \
    --output 'feeds/%(name)s/%(time)s/%(class)s.csv' \
    --set "JOBDIR=${JOBDIR}"
```

where `$SPIDER` is one of the IDs above.
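
The same invocation can also be assembled from Python, for example when launching spiders from a script. This is a minimal sketch and not part of this repository: the helper name `build_crawl_command` is hypothetical, and it only builds the argument list from the shell snippet above without running Scrapy.

```python
from datetime import datetime, timezone

# Spider IDs as listed in the "Sources" section.
SPIDERS = ("bgg", "dbpedia", "luding", "spielen", "wikidata")


def build_crawl_command(spider, now=None):
    """Return the `scrapy crawl` argument list for one spider (hypothetical helper)."""
    if spider not in SPIDERS:
        raise ValueError(f"unknown spider: {spider!r}")
    now = now or datetime.now(timezone.utc)
    # Same timestamp layout as the shell snippet, e.g. 2020-01-02T03-04-05.
    jobdir = f"jobs/{spider}/{now.strftime('%Y-%m-%dT%H-%M-%S')}"
    return [
        "scrapy", "crawl", spider,
        # The %(...)s placeholders are passed through unchanged;
        # they are expanded later by the crawl itself, not by this helper.
        "--output", "feeds/%(name)s/%(time)s/%(class)s.csv",
        "--set", f"JOBDIR={jobdir}",
    ]
```

Inside the activated Pipenv environment, the result could be handed to `subprocess.run(build_crawl_command("bgg"), check=True)`.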

Run all the spiders with the [`run_scrapers.sh`](run_scrapers.sh) script. Get a
list of the running scrapers' PIDs with the [`processes.sh`](processes.sh)
script. You can close all the running scrapers via

```bash
./processes.sh stop
```

and resume them later.
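
Scrapy keeps crawl state in the directory given by `JOBDIR` and continues from it when started again with the same value. How `run_scrapers.sh` chooses that directory is not shown here, so the following is only a sketch under an assumption: job directories follow the `jobs/<spider>/<timestamp>` layout from the crawl command above. The helper name `latest_jobdir` is hypothetical.

```python
from pathlib import Path


def latest_jobdir(spider, base="jobs"):
    """Return the newest job directory for a spider, or None (hypothetical helper)."""
    root = Path(base) / spider
    if not root.is_dir():
        return None
    # Names such as 2020-01-02T03-04-05 sort chronologically as plain strings.
    candidates = sorted(
        (path for path in root.iterdir() if path.is_dir()),
        key=lambda path: path.name,
    )
    return candidates[-1] if candidates else None
```

Passing the returned path as `--set "JOBDIR=..."` to `scrapy crawl` would then pick up the most recent stopped job for that spider.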

## Tests

You can run `scrapy check` to perform contract tests for all spiders, or
`scrapy check $SPIDER` to test one particular spider. If tests fail, the
website has most likely changed and the spider needs updating.
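
To see at a glance which spiders need attention, the per-spider check can be looped from Python. This is a minimal sketch, assuming `scrapy check` exits with a non-zero status when a contract fails; the helper name `failing_spiders` and its `run` parameter are hypothetical and exist only so the command runner can be swapped out.

```python
import subprocess


def failing_spiders(spiders, run=subprocess.run):
    """Run `scrapy check` for each spider and return the IDs that fail."""
    failed = []
    for spider in spiders:
        # Assumption: a non-zero exit status means at least one contract failed.
        result = run(["scrapy", "check", spider])
        if result.returncode != 0:
            failed.append(spider)
    return failed
```

Calling `failing_spiders(["bgg", "dbpedia", "luding", "spielen", "wikidata"])` inside the activated environment would return the IDs whose contracts no longer pass.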

## Board game datasets

If you are interested in using any of the datasets produced by this scraper,
take a look at the
[BoardGameGeek guild](https://boardgamegeek.com/thread/2287371/boardgamegeek-games-and-ratings-datasets).
A subset of the data can also be found on [Kaggle](https://www.kaggle.com/mshepherd/board-games).

## Links

* [board-game-scraper](https://gitlab.com/recommend.games/board-game-scraper):
  This repository
* [Recommend.Games](https://recommend.games/): board game recommender using the
  scraped data
* [recommend-games-server](https://gitlab.com/recommend.games/recommend-games-server):
  Server code for [Recommend.Games](https://recommend.games/)
* [board-game-recommender](https://gitlab.com/recommend.games/board-game-recommender):
  Recommender code for [Recommend.Games](https://recommend.games/)
