Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dbklim/russian_subtitles_dataset
Preprocessing of a dataset of 347 TV series subtitles (thanks to the Taiga Corpus) for building a word2vec model or a JamSpell model, training neural networks or chat bots, or any other NLP task.
- Host: GitHub
- URL: https://github.com/dbklim/russian_subtitles_dataset
- Owner: dbklim
- License: apache-2.0
- Created: 2019-02-20T15:26:36.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-06-14T18:44:55.000Z (over 5 years ago)
- Last Synced: 2024-08-07T23:28:02.262Z (6 months ago)
- Topics: bot, cnn, corpus, dataset, lstm, machine-learning, ml, natural-language-processing, nlp, nlu, rnn, russian, subtitles, text, text-analysis, text-processing, word2vec
- Language: Python
- Homepage: https://tatianashavrina.github.io/taiga_site/downloads
- Size: 135 MB
- Stars: 22
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE