Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with wikipedia-corpus
A curated list of projects in awesome lists tagged with wikipedia-corpus.
https://github.com/GermanT5/wikipedia2corpus
Wikipedia text corpus for self-supervised NLP model training
corpus german-nlp machine-learning nlp somajo wikipedia wikipedia-corpus
Last synced: 31 Oct 2024
https://github.com/macbre/mediawiki-dump
Python package for working with MediaWiki XML content dumps
fandom mediawiki-dump python python3-library wikia wikipedia wikipedia-corpus wikipedia-dump xml-dump
Last synced: 02 Nov 2024
https://github.com/ayushidalmia/wikipedia-search-engine
A search engine built on the 2013 Wikipedia data dump (43 GB). Search results are returned in real time.
information-retrieval python search-engine wikipedia-corpus
Last synced: 09 Nov 2024
https://github.com/tomeraberbach/wikipedia-ngrams
📚 A Kotlin project that extracts n-gram counts from Wikipedia data dumps.
cli extracts-ngram-counts kotlin ngram ngrams nlp wikiextractor wikipedia wikipedia-corpus wikipedia-data-dump wikipedia-dump wikipedia-ngrams
Last synced: 07 Nov 2024
https://github.com/macbre/faroese-corpus
Faroese language statistics derived from the fo.wikipedia.org content dump
corpus-linguistics faroe faroese faroese-language linguistic-analysis linguistics python3-script wikipedia-corpus wikipedia-dump
Last synced: 09 Nov 2024