https://github.com/dbklim/russian_subtitles_dataset

Preprocessing of a dataset of 347 TV series subtitles (thanks to the Taiga Corpus) for building a word2vec model or a JamSpell model, training a neural network or a chat bot, or any other NLP task.
Host: GitHub
URL: https://github.com/dbklim/russian_subtitles_dataset
Owner: dbklim
License: apache-2.0
Created: 2019-02-20T15:26:36.000Z
Default Branch: master
Last Pushed: 2019-06-14T18:44:55.000Z
Last Synced: 2025-04-05T17:51:13.481Z
Topics: bot, cnn, corpus, dataset, lstm, machine-learning, ml, natural-language-processing, nlp, nlu, rnn, russian, subtitles, text, text-analysis, text-processing, word2vec
Language: Python
Homepage: https://tatianashavrina.github.io/taiga_site/downloads
Size: 135 MB
Stars: 23
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project