Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/GermanT5/wikipedia2corpus
Wikipedia text corpus for self-supervised NLP model training
Last synced: 13 days ago
- Host: GitHub
- URL: https://github.com/GermanT5/wikipedia2corpus
- Owner: GermanT5
- License: MIT
- Created: 2022-02-20T19:05:24.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-07-17T07:25:32.000Z (over 2 years ago)
- Last Synced: 2024-08-01T19:35:12.470Z (3 months ago)
- Topics: corpus, german-nlp, machine-learning, nlp, somajo, wikipedia, wikipedia-corpus
- Language: Python
- Homepage:
- Size: 49.8 KB
- Stars: 36
- Watchers: 2
- Forks: 3
- Open Issues: 0