https://github.com/ppasupat/pimthaiyakshift

# พิมพ์ไทยยาก Shift

A simple Thai typing game.

The score (number of shift characters) is based on the
[Kedmanee keyboard layout](https://en.wikipedia.org/wiki/Thai_Kedmanee_keyboard_layout).
Feel free to fork to support other layouts!

## Current word lists

* `words.json`: Words from the PyThaiNLP word list, generated by

```bash
python3 filter.py pythainlp -o data/words.json -m 5 -M 15 -s 2 -r .3 -e .5
```

The command above collects every word in the
[PyThaiNLP](https://pythainlp.github.io/) word list that is 5 to 15 characters long
and contains at least 2 shift characters.
It then greedily removes the words with the lowest shift-to-non-shift ratios
until the overall ratio of shift characters to all characters exceeds 0.3.
Frequent shift characters are down-weighted: each counts as 0.5 of any other shift character.
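
The filtering steps described above can be sketched roughly as follows. This is a minimal illustration and not the actual `filter.py`: the character sets are a small hand-picked sample of the Kedmanee shift layer, the choice of "frequent" characters is hypothetical, and the function names, the use of code-point length, and the exact way the 0.5 weight enters the ratio are all assumptions.

```python
from typing import Iterable, List

# Illustrative subset of characters on the Kedmanee shift layer.
# Assumption: this is NOT the project's actual table, just a sample.
SHIFT_CHARS = set("ธณญฐศษซฉฮโ็์ู๊๋")
# Hypothetical choice of "frequent" shift characters (down-weighted).
FREQUENT_SHIFT_CHARS = set("็์")


def raw_shift_count(word: str) -> int:
    """Unweighted number of shift characters in the word."""
    return sum(ch in SHIFT_CHARS for ch in word)


def shift_score(word: str, frequent_weight: float = 0.5) -> float:
    """Weighted shift count: frequent shift characters count for less."""
    score = 0.0
    for ch in word:
        if ch in FREQUENT_SHIFT_CHARS:
            score += frequent_weight
        elif ch in SHIFT_CHARS:
            score += 1.0
    return score


def filter_words(
    words: Iterable[str],
    min_len: int = 5,
    max_len: int = 15,
    min_shift: int = 2,
    target_ratio: float = 0.3,
    frequent_weight: float = 0.5,
) -> List[str]:
    # Step 1: keep words in the length range with enough shift characters.
    # Assumption: length is measured in Unicode code points.
    kept = [
        w for w in set(words)
        if min_len <= len(w) <= max_len and raw_shift_count(w) >= min_shift
    ]
    # Step 2: order by per-word ratio so the lowest-ratio word is last.
    # Simplification: uses weighted score / length as the per-word ratio.
    kept.sort(key=lambda w: (shift_score(w, frequent_weight) / len(w), w),
              reverse=True)
    total_score = sum(shift_score(w, frequent_weight) for w in kept)
    total_len = sum(len(w) for w in kept)
    # Step 3: greedily drop the lowest-ratio word until the overall
    # ratio of (weighted) shift characters to all characters exceeds the target.
    while kept and total_score / total_len <= target_ratio:
        worst = kept.pop()
        total_score -= shift_score(worst, frequent_weight)
        total_len -= len(worst)
    return kept
```

Calling `filter_words(word_list)` with the defaults mirrors the numbers in the command above (5, 15, 2, 0.3, 0.5); raising `target_ratio` discards more low-ratio words.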

* `wikititles.json`: Thai Wikipedia titles

```bash
python3 filter.py wikititles -i -o data/wikititles.json -m 5 -M 15 -s 2 -r .3 -e .5
```

The titles were extracted by running [WikiExtractor](https://github.com/attardi/wikiextractor)
on the Thai Wikipedia dump (October 2021) and then running:

```bash
grep -r 'title=".*"' -o -h text/ | sed 's/title="\(.*\)"/\1/' > wiki-titles.txt
```
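
For readers less familiar with `grep` and `sed`, the same extraction can be sketched in Python. This is only an illustration of what the pipeline above does; the function name `extract_titles` is hypothetical, and the sample header line assumes WikiExtractor emits `<doc id="..." url="..." title="...">` lines with `title` as the last attribute.

```python
import re
from typing import Iterable, List

# Same greedy pattern as the grep/sed pipeline: everything between
# title=" and the last double quote on the line.
TITLE_RE = re.compile(r'title="(.*)"')


def extract_titles(lines: Iterable[str]) -> List[str]:
    """Return the title attribute from each line that contains one."""
    titles = []
    for line in lines:
        match = TITLE_RE.search(line)
        if match:
            titles.append(match.group(1))
    return titles


if __name__ == "__main__":
    # Assumed shape of a WikiExtractor document header.
    sample = [
        '<doc id="1" url="https://th.wikipedia.org/wiki?curid=1" title="ประเทศไทย">\n',
        "ประเทศไทย\n",
        "</doc>\n",
    ]
    print(extract_titles(sample))
```

Like the shell version, the greedy match relies on `title` being the final quoted attribute on the line.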

* `wikititles-arabic.json`: Thai Wikipedia titles, but with Arabic numerals

```bash
python3 filter.py wikititles -i -o data/wikititles-arabic.json -m 5 -M 15 -s 2 -r .3 -e .5 -a
```

Note that Arabic numerals don't count toward the score in the game.

* `names.json`: Thai names from PyThaiNLP

```bash
python3 filter.py thainames -o data/names.json -m 5 -M 15 -s 2 -r .3 -e .5
```

* `skoy.json`: Skoy language.
I wanted to use authentic Skoy, but the original Facebook page (sowhateiei: ษม่ค่ล์มนิ๋ญฒสก๊อย) has been nuked.
(The Skoy language is pretty old, you know.)
So I manually collected parallel sentences from what remains of its usage.