https://github.com/t-systems-on-site-services-gmbh/fasttext-on-wikipedia
fastText trained on Wikipedia text corpus
- Host: GitHub
- URL: https://github.com/t-systems-on-site-services-gmbh/fasttext-on-wikipedia
- Owner: t-systems-on-site-services-gmbh
- Created: 2022-02-23T12:34:12.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-03-04T08:24:00.000Z (about 3 years ago)
- Last Synced: 2025-01-21T13:11:18.285Z (4 months ago)
- Size: 4.88 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# fastText on Wikipedia
In this repository we publish several fastText embeddings trained on Wikipedia data.
Software and data used:
- fastText: [v0.9.2](https://github.com/facebookresearch/fastText/releases/tag/v0.9.2)
- Wikipedia text corpus from: [GermanT5/wikipedia2corpus](https://github.com/GermanT5/wikipedia2corpus)

## Commands
- `fasttext skipgram -input data/dewiki-20220201-clean.txt -output de-wikipedia-skipgram-64 -dim 64`
- `fasttext skipgram -input data/ft-train-de/train.txt -output de-wikipedia-skipgram-64 -dim 64 -autotune-validation data/ft-train-de/val.txt -autotune-duration 172800`
- `fasttext skipgram -input data/ft-train-en/train.txt -output en-wikipedia-skipgram-64 -dim 64 -autotune-validation data/ft-train-en/val.txt -autotune-duration 345600`
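The commands above share one shape and differ only in paths and autotune budget. As a rough sketch (not part of this repository), the following Python snippet assembles the same argument lists and converts the `-autotune-duration` values, which the fastText CLI takes in seconds, into hours. The helper names `skipgram_command` and `seconds_to_hours` are hypothetical and exist only for illustration; the snippet builds the command line but does not run fastText.

```python
import shlex


def skipgram_command(input_path, output_prefix, dim=64,
                     validation_path=None, autotune_seconds=None):
    """Build the argument list for a `fasttext skipgram` call.

    Hypothetical helper: it only mirrors the flags used in the
    commands listed above and does not invoke fastText itself.
    """
    args = ["fasttext", "skipgram",
            "-input", input_path,
            "-output", output_prefix,
            "-dim", str(dim)]
    if validation_path is not None:
        args += ["-autotune-validation", validation_path]
    if autotune_seconds is not None:
        args += ["-autotune-duration", str(autotune_seconds)]
    return args


def seconds_to_hours(seconds):
    """Convert an autotune budget given in seconds to hours."""
    return seconds / 3600


# German run with autotune, using the paths from the list above.
de_cmd = skipgram_command(
    "data/ft-train-de/train.txt",
    "de-wikipedia-skipgram-64",
    validation_path="data/ft-train-de/val.txt",
    autotune_seconds=172800,
)

print(shlex.join(de_cmd))
print(seconds_to_hours(172800))  # German autotune budget in hours
print(seconds_to_hours(345600))  # English autotune budget in hours
```

Read this way, the German autotune run has a budget of 48 hours and the English run a budget of 96 hours.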