https://github.com/sodascience/lichess_db
Lichess game header data as parquet files
- Host: GitHub
- URL: https://github.com/sodascience/lichess_db
- Owner: sodascience
- License: mit
- Created: 2023-11-06T08:12:45.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-10T18:44:20.000Z (5 months ago)
- Last Synced: 2024-07-10T22:31:16.974Z (5 months ago)
- Topics: database, lichess, lichess-database
- Language: Jupyter Notebook
- Homepage:
- Size: 1.3 MB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: readme.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# Lichess database to parquet converter
This repository contains code to efficiently download millions of chess games from [database.lichess.org](https://database.lichess.org) to your computer as Apache `parquet` files, which can be loaded into tables efficiently using [`polars`](https://pola.rs):
```python
import polars as pl
chess_df = pl.scan_parquet("lichess_parquet/*.parquet")
chess_df.head().collect()
```

| | ID | UTCDate | UTCTime | White | Black | Result | WhiteElo | BlackElo | WhiteRatingDiff | BlackRatingDiff | ECO | Opening | TimeControl | Termination |
|---:|:---------|:--------------------|:----------|:-----------------|:------------------|:---------|-----------:|-----------:|------------------:|------------------:|:------|:--------------------------------------------|:--------------|:--------------|
| 0 | j1dkb5dw | 2012-12-31 00:00:00 | 23:01:03 | BFG9k | mamalak | 1-0 | 1639 | 1403 | 5 | -8 | C00 | French Defense: Normal Variation | 600+8 | Normal |
| 1 | a9tcp02g | 2012-12-31 00:00:00 | 23:04:12 | Desmond_Wilson | savinka59 | 1-0 | 1654 | 1919 | 19 | -22 | D04 | Queen's Pawn Game: Colle System, Anti-Colle | 480+2 | Normal |
| 2 | szom2tog | 2012-12-31 00:00:00 | 23:03:15 | Kozakmamay007 | VanillaShamanilla | 1-0 | 1643 | 1747 | 13 | -94 | C50 | Four Knights Game: Italian Variation | 420+17 | Normal |
| 3 | rklpc7mk | 2012-12-31 00:00:00 | 23:04:57 | Naitero_Nagasaki | 800 | 0-1 | 1824 | 1973 | -6 | 8 | B12 | Caro-Kann Defense: Goldman Variation | 60+1 | Normal |
| 4 | 1xb3os63 | 2012-12-31 00:00:00 | 23:02:37 | nichiren1967 | Naitero_Nagasaki | 0-1 | 1765 | 1815 | -9 | 9 | C00 | French Defense: La Bourdonnais Variation | 60+1 | Normal |

## Installation and usage
1. Clone or download this repository
2. Install requirements: `pip install -r requirements.txt`
3. `python ingest_lichess.py`

   The command accepts the following arguments:
   + `--start` start year for download (default: 2013)
   + `--end` end year (default: current year)
   + `--months` months to download; list months by number, separated by spaces (example: `--months 1 2 3` for the first quarter) (optional; defaults to all months)
   + `--parquet-dir` path to write Parquet files to (default: `./lichess_parquet`)
   + `--include-moves` whether to include each game's moves in the data (default: False). Take care, this increases the size of the data dramatically.
   + `--debug` display debug info while downloading (default: False)
4. Wait a good while (this results in tens of gigabytes of data!). To avoid memory problems while downloading, there is a limit of 1M games per Parquet file. Hence, there will be multiple files per year/month (`2023_05_001.parquet`, `2023_05_002.parquet`, etc).
5. Open [`eda.ipynb`](eda.ipynb) for examples on how to read, filter and visualize the data
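
The steps above can be tied together in a short script. The following is a minimal sketch, not part of the repository: the helper names `build_ingest_command`, `month_glob` and `top_openings` are hypothetical, and it assumes that `--include-moves` and `--debug` are bare switches without a value. The `top_openings` helper additionally assumes a recent `polars` version that provides `pl.len()`, and uses the `Opening` column shown in the table above.

```python
from pathlib import Path


def build_ingest_command(start=None, end=None, months=None,
                         parquet_dir=None, include_moves=False, debug=False):
    """Assemble an argument list for ingest_lichess.py from the documented flags.

    Flags that are left unset are omitted, so the script's own defaults apply.
    """
    cmd = ["python", "ingest_lichess.py"]
    if start is not None:
        cmd += ["--start", str(start)]
    if end is not None:
        cmd += ["--end", str(end)]
    if months:
        # Months are listed by number, separated by spaces.
        cmd += ["--months", *[str(m) for m in months]]
    if parquet_dir is not None:
        cmd += ["--parquet-dir", str(parquet_dir)]
    if include_moves:
        # Assumption: the flag is a bare switch (no value after it).
        cmd.append("--include-moves")
    if debug:
        # Assumption: the flag is a bare switch (no value after it).
        cmd.append("--debug")
    return cmd


def month_glob(parquet_dir, year, month):
    """Glob pattern matching all chunk files of one month (2023_05_001.parquet, ...)."""
    return (Path(parquet_dir) / f"{year:04d}_{month:02d}_*.parquet").as_posix()


def top_openings(parquet_dir, year, month, n=10):
    """Count games per opening for one month; requires polars to be installed."""
    import polars as pl  # imported here so the helpers above work without polars

    return (
        pl.scan_parquet(month_glob(parquet_dir, year, month))
        .group_by("Opening")
        .agg(pl.len().alias("games"))
        .sort("games", descending=True)
        .head(n)
        .collect()
    )


# Download only May 2023 into the default directory.
cmd = build_ingest_command(start=2023, end=2023, months=[5])
print(" ".join(cmd))
# To actually start the download: subprocess.run(cmd, check=True)
```

Once the download has finished, `top_openings("lichess_parquet", 2023, 5)` would scan only the chunk files for that month instead of the whole directory.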
## Some plots
The daily number of games played is increasing strongly over the years.
![](img/gamecount_plot.png)

The most popular openings are as follows:

![](img/opening_plot.png)

## Contributing
Contributions are what make the open source community an amazing place
to learn, inspire, and create. Any contributions you make are **greatly
appreciated**.

Please refer to the
[CONTRIBUTING](https://github.com/sodascience/lichess_db/blob/main/CONTRIBUTING.md)
file for more information on issues and pull requests.

## License and citation
The package `lichess_db` is published under an MIT license.
The [lichess data](https://database.lichess.org/) is licensed under the [Creative Commons CC0 license](https://creativecommons.org/publicdomain/zero/1.0/).

## Contact
This project is developed and maintained by the [ODISSEI Social Data
Science (SoDa)](https://odissei-data.nl/nl/soda/) team.

Do you have questions, suggestions, or remarks? File an issue in the issue
tracker or feel free to contact the team via
https://odissei-data.nl/en/using-soda/.