https://github.com/runtime-error786/text-vectorization
This repository demonstrates several text vectorization techniques, including Bag of Words (BoW), TF-IDF, N-grams, and Word2Vec (CBOW and Skip-gram), using NLTK, Gensim, and scikit-learn. The steps outlined here show how to convert textual data into numerical vectors, which machine learning models require as input.
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/runtime-error786/text-vectorization
- Owner: runtime-error786
- Created: 2025-01-01T10:52:06.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-02T21:47:46.000Z (about 1 year ago)
- Last Synced: 2025-03-23T19:17:26.126Z (10 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 56.1 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Text Vectorization Techniques using NLTK, Gensim, and scikit-learn
This repository demonstrates several text vectorization techniques, including Bag of Words (BoW), TF-IDF, N-grams, and Word2Vec (CBOW), using NLTK, Gensim, and scikit-learn. The steps outlined here show how to convert textual data into numerical vectors, which machine learning models require as input.

Word2Vec is a popular word embedding technique that uses either the Continuous Bag of Words (CBOW) model or the Skip-gram model to learn vector representations of words from their context. The CBOW model predicts the target word from its context words, while the Skip-gram model does the reverse, using a word to predict its surrounding context.
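
To make the difference between CBOW and Skip-gram concrete, the sketch below builds the training examples each model would see for one short sentence. It is plain Python for illustration only, not code from this repository: the helper names (`context_window`, `cbow_pairs`, `skipgram_pairs`), the example sentence, and the window size are all assumptions, and no embedding is actually trained.

```python
def context_window(tokens, i, window):
    """Return the tokens within `window` positions of index i, excluding tokens[i]."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def cbow_pairs(tokens, window=2):
    """CBOW: each example maps the list of context words to the target word."""
    return [(context_window(tokens, i, window), target)
            for i, target in enumerate(tokens)]


def skipgram_pairs(tokens, window=2):
    """Skip-gram: each example maps the centre word to one of its context words."""
    return [(center, ctx)
            for i, center in enumerate(tokens)
            for ctx in context_window(tokens, i, window)]


# Hypothetical example sentence, tokenized by whitespace for simplicity.
tokens = "the cat sat on the mat".split()
cbow = cbow_pairs(tokens, window=1)
skipgram = skipgram_pairs(tokens, window=1)

print(cbow[1])       # (['the', 'sat'], 'cat')
print(skipgram[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
```

CBOW produces one example per word (context in, target out), whereas Skip-gram produces one example per (word, neighbour) pair. In Gensim's `Word2Vec` class, the `sg` parameter selects between the two: `sg=0` for CBOW and `sg=1` for Skip-gram.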