# osu! Data on Docker

`pip install osu-data; osu-data -m mania -v top_1000 -ymd YYYY_MM_DD`

**Docker must be installed and running on your machine.**

Retrieves database data from https://data.ppy.sh/ and hosts it on a local MySQL
server.
Optionally, store all ranked/loved `.osu` files in a separate `files` service with
the `-f` flag.

## Get Started

**IMPORTANT**: MySQL data persists across runs.
Recreate the MySQL Service if you changed the data used.

1) Install via pip: `pip install osu-data`

2) Minimally, specify:
- `-m`, `--mode`:
The game mode to build the database with. `osu`, `taiko`, `catch`
or `mania`
- `-v`, `--version`:
The database version. `top_1000`, `top_10000` or `random_10000`
3) Optionally, specify:
- `-ymd`, `--year_month_day`:
The year, month, day of the database in the format `YYYY_MM_DD`
- `-p`, `--port`:
The port to expose MySQL on. Default is `3308`
- `-f`, `--files`:
Whether to download `.osu` files.
- `-np`, `--nginx-port`:
The port to expose the nginx service on. Default is `8080`.
Not used if `-f` is not specified.
- `--...`:
See the table below; these are optional flags to include or exclude more
data. **Specifying a flag INVERTS its default value.**

| Option | Default Value |
|-----------------------------------|---------------|
| `--beatmap-difficulty-attribs` | False |
| `--beatmap-difficulty` | False |
| `--scores` | True |
| `--beatmap-failtimes` | False |
| `--user-beatmap-playcount` | False |
| `--beatmaps` | True |
| `--beatmapsets` | True |
| `--user-stats` | True |
| `--sample-users` | True |
| `--counts` | True |
| `--difficulty-attribs` | True |
| `--beatmap-performance-blacklist` | True |

These defaults are chosen to be the most useful for analysis while keeping import performance reasonable.

For example:

```bash
osu-data \
-m osu -v top_1000 -ymd 2023_08_01 -p 3308 -f \
--beatmap-difficulty
```

- Download the top 1000 osu! standard beatmaps
- from 1st August 2023
- expose MySQL on port 3308
- download `.osu` files
- include beatmap difficulty data
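
Conversely, because `--scores` defaults to `True`, passing the flag excludes
score data. A minimal sketch, using the same dump as above:

```bash
osu-data \
-m osu -v top_1000 -ymd 2023_08_01 \
--scores
```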

4) Connect on:
- `localhost:<port>` for MySQL (default port `3308`)
- `localhost:<nginx-port>` for the nginx `.osu` file service (only if `-f` is specified, default port `8080`)
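
For example, with the stock `mysql` client (a minimal sketch; the schema names
depend on the dump you imported, so list them first):

```bash
# No password is set by default (see Common Issues below).
# The port matches -p/--port (default 3308).
mysql --host=127.0.0.1 --port=3308 --user=root -e "SHOW DATABASES;"
```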

## Common Issues

- **Docker daemon is not running**. Make sure that Docker is installed and
running. If you're using Docker Desktop, make sure it's actually started.
- **MySQL data isn't correct**. A few possible reasons:
  - *Import was abruptly stopped*. This can cause some `.sql` files to be
    missing or incomplete. Delete the whole compose project and try again.
  - *Didn't specify the optional flags to include files*. By default, some
    `.sql` files are not loaded. Take a look at `osu-data -h` and specify the
    optional flags to include them.
  - *Data is outdated*. By default, the data is preserved on every re-run of
    `osu-data`. To update it, delete the whole compose project and try again
    (see the sketch after this list).
- **wget: server returned error: HTTP/1.1 404 Not Found**. This happens when
  you try to pull a `YYYY_MM_DD` that doesn't exist, which is common at the
  start of each month before the new dump is ready.
  Check https://data.ppy.sh/ to see which `YYYY_MM_DD` dates are available.
- **`rm: can't remove '../osu.mysql.init/*'`**: This is safe to ignore.
- **MySQL Credentials**. By default, the MySQL server doesn't have a password;
  use `root` as the username and leave the password blank.
- **No `files` service**. This is the default; the `files` service is optional
  and must be enabled with the `-f` flag. Run `osu-data -h` for more info.
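
When you do need to wipe and re-import (for example after an aborted import, or
to pull a newer dump), here is a hedged sketch of removing the compose project
with Docker Compose v2; the exact project name is whatever `docker compose ls`
reports for this tool:

```bash
# Find the compose project created by osu-data.
docker compose ls

# Remove its containers and volumes so the next `osu-data` run re-imports
# everything from scratch. Replace <project-name> with the name listed above.
docker compose -p <project-name> down --volumes
```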

## `mysql.cnf`

The database is tuned for fast import speed and therefore shouldn't be used in
production. Notably, we set `innodb_doublewrite = 0`, which can compromise
data integrity in the event of a crash. If you want to use this in production,
we recommend setting it up from this Git repo and tweaking `mysql.cnf`.
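
To confirm what a running instance is actually using, you can query the
variable directly (same connection details as in Get Started; a quick check,
not a tuning guide):

```bash
# OFF / 0 means the InnoDB doublewrite buffer is disabled, as set by this
# project's mysql.cnf; production setups generally want it enabled.
mysql --host=127.0.0.1 --port=3308 --user=root \
-e "SHOW VARIABLES LIKE 'innodb_doublewrite';"
```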

## Important Matters

1) Do not distribute the built images, as per peppy's request.
Instead, you can share the code used to build your image, which should yield
the same result.
2) This database is meant for analysis; it's not tuned for production.
Tweak `mysql.cnf` after importing for further MySQL customization.
3) Finally, be mindful of the conclusions you draw from the data.

## Changelog

- **0.1.5**:
  - Allowed a wider range of Python versions: `3.9 ~ 4.0`.
- **0.2.0**:
  - Added GitHub Actions to automatically create the dataset on workflow dispatch.
  - Year, Month specification is now Year, Month, Day because some data dumps
    don't fall exactly on day 1.
    - `-ym` -> `-ymd`, `--year-month` -> `--year-month-day`
  - The default of `-ymd` was removed to encourage users to check the source of the data.