https://github.com/t-systems-on-site-services-gmbh/fasttext-on-wikipedia

fastText trained on Wikipedia text corpus

# fastText on Wikipedia
In this repository we publish several fastText embeddings trained on Wikipedia data.
Software and data used:
- fastText: [v0.9.2](https://github.com/facebookresearch/fastText/releases/tag/v0.9.2)
- Wikipedia text corpus from: [GermanT5/wikipedia2corpus](https://github.com/GermanT5/wikipedia2corpus)

## Commands
- `fasttext skipgram -input data/dewiki-20220201-clean.txt -output de-wikipedia-skipgram-64 -dim 64`
- `fasttext skipgram -input data/ft-train-de/train.txt -output de-wikipedia-skipgram-64 -dim 64 -autotune-validation data/ft-train-de/val.txt -autotune-duration 172800`
- `fasttext skipgram -input data/ft-train-en/train.txt -output en-wikipedia-skipgram-64 -dim 64 -autotune-validation data/ft-train-en/val.txt -autotune-duration 345600`
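
The `-autotune-duration` values above are given in seconds, so 172800 corresponds to 48 hours and 345600 to 96 hours. The following is a minimal sketch of how the autotuned commands could be assembled from Python with the budget stated in hours. The helper name `skipgram_command` and its parameters are hypothetical and not part of this repository or of fastText; only the CLI flags are taken from the commands above.

```python
import shlex


def skipgram_command(input_path, output, dim=64,
                     validation_path=None, autotune_hours=None):
    """Build the argument list for a `fasttext skipgram` run.

    Hypothetical helper for illustration; it only assembles arguments
    and does not call fastText itself.
    """
    args = ["fasttext", "skipgram",
            "-input", input_path,
            "-output", output,
            "-dim", str(dim)]
    if validation_path is not None:
        args += ["-autotune-validation", validation_path]
    if autotune_hours is not None:
        # The fastText CLI takes the autotune budget in seconds.
        args += ["-autotune-duration", str(int(autotune_hours * 3600))]
    return args


german = skipgram_command(
    "data/ft-train-de/train.txt", "de-wikipedia-skipgram-64",
    validation_path="data/ft-train-de/val.txt", autotune_hours=48)
english = skipgram_command(
    "data/ft-train-en/train.txt", "en-wikipedia-skipgram-64",
    validation_path="data/ft-train-en/val.txt", autotune_hours=96)

# Print the commands in a form that can be pasted into a shell.
print(shlex.join(german))
print(shlex.join(english))
```

The returned lists can also be passed to `subprocess.run(..., check=True)` on a machine where the `fasttext` binary is on the `PATH`.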