https://github.com/rajspeaks/deep-learning-approach-to-english-corpus-text-visualization-using-word2vec-model
English Corpus Text-Visualization using Word2Vec Model from Gensim. A mini project under the mentorship of Prof. Sandipan Ganguly, HIT-K.
- Host: GitHub
- URL: https://github.com/rajspeaks/deep-learning-approach-to-english-corpus-text-visualization-using-word2vec-model
- Owner: Rajspeaks
- License: gpl-3.0
- Created: 2022-03-23T18:24:45.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-05-02T06:43:54.000Z (over 3 years ago)
- Last Synced: 2025-03-21T23:34:09.135Z (7 months ago)
- Topics: gensim, gensim-library, gensim-word2vec, machine-learning, ml, natural-language-processing, natural-language-understanding, nlp, nlp-machine-learning, rajdeep-das, rajspeaks, text-mining, visualization, word2vec, word2vec-algorithm, word2vec-model
- Language: Jupyter Notebook
- Homepage:
- Size: 198 KB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# English Corpus Text Visualization using Word2Vec Model
A machine learning approach to visualizing an English text corpus with the Word2Vec model from the Gensim library.
This project was done to test the accuracy of the Word2Vec model on an English corpus.

## Library requirements:
1. Scikit-learn (sklearn): used for data preprocessing, model selection, classification, regression, and clustering.
2. Matplotlib: used for 2D or 3D plotting, such as histograms and bar charts.
3. Gensim: open-source library for text analysis, including Word2Vec and Doc2Vec.
4. The Melon Honey font is used; sample texts were collected from the Internet.

## Word2Vec
The Word2Vec model is used for word embedding. Here I used the Gensim library and Matplotlib's pyplot for 2D visualization of the corpus.
## Methodology:
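
The punctuation removal and splitting steps in this section could look roughly like the sketch below. It is not the notebook's actual code: the helper names `remove_punctuation` and `tokenize_corpus` are hypothetical, and it assumes one sentence per line, lowercasing, and whitespace splitting, none of which the README specifies.

```python
import string


def remove_punctuation(text):
    """Delete every ASCII punctuation character from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))


def tokenize_corpus(text):
    """Turn raw text into a list of token lists, one per non-empty line."""
    sentences = []
    for line in text.splitlines():
        tokens = remove_punctuation(line).lower().split()
        if tokens:  # skip lines that are empty after cleaning
            sentences.append(tokens)
    return sentences


sample = "Hello, world!\nWord2Vec maps words to vectors."
corpus = tokenize_corpus(sample)
# corpus == [['hello', 'world'], ['word2vec', 'maps', 'words', 'to', 'vectors']]
```

A list of token lists in this shape is what Gensim's `Word2Vec` accepts as its `sentences` argument.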
1. First, I took an English corpus and applied a punctuation remover.
2. Split the data and visualized the corpus using Word2Vec and Matplotlib's pyplot.
3. Repeated the process with a larger corpus.

## Tools:
1. Google Colab/Jupyter Notebook
2. Language: Python
3. Word2Vec from Gensim
4. Matplotlib | Pyplot

### Mentor
Prof. Sandipan Ganguly, HIT-K.

### Developer
Rajdeep Das

### Thank you