https://github.com/besrym/words-embedding-visualization
A simple Python script to train and visualize a word embedding.
- Host: GitHub
- URL: https://github.com/besrym/words-embedding-visualization
- Owner: besrym
- Created: 2022-01-29T16:30:56.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-01-30T23:41:42.000Z (over 3 years ago)
- Last Synced: 2025-02-08T16:25:33.413Z (3 months ago)
- Topics: embedding, matplotlib, nlp, numpy, python, seaborn, sklearn, torch, tsne-visualization
- Language: Python
- Homepage:
- Size: 435 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Words Embedding & Visualization
# Installation
```
conda create -n we-venv -y python=3.7 && conda activate we-venv
pip install -r requirements.txt
```
# Usage
Insert the project path, e.g.:
```
/Users/your_name/Desktop/WordsEmbedding
```
Then choose between a tiny dataset (100 sentences) and a big dataset (15,927 sentences) by uncommenting the corresponding variable in main.py. The script is computationally intensive and was written for educational purposes, so I recommend the tiny dataset.
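The dataset switch is just a variable in main.py; as a rough illustration, the two options could look something like this (the variable names and paths below are hypothetical, not the repository's actual ones):
```
# Hypothetical names: uncomment the dataset you want to train on.
DATASET = "data/tiny_dataset.txt"    # 100 sentences, runs quickly
# DATASET = "data/big_dataset.txt"   # 15,927 sentences, slow on CPU
```
Then run the script: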
```
python main.py
```

# Training
I used:
- optimizer: Stochastic gradient descent
- epochs: 10
- learning rate: 0.01
- momentum: 0.9
The training loop in pseudocode:
```
for targetword in sentence:
    for contextword around targetword:
        embedding = matmul(E, targetword)
        tmp = matmul(W, embedding)
        predicted_contextword = softmax(tmp)
        minimize(predicted_contextword, contextword)
```
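As a concrete illustration, one update of that loop could look roughly like this in PyTorch. This is a sketch, not the repository's exact code: the sizes are placeholders, and `CrossEntropyLoss` stands in for the explicit softmax from the pseudocode (it applies log-softmax internally):
```
import torch
import torch.nn as nn

# Placeholder sizes; the real values depend on the vocabulary in main.py.
vocab_size, embedding_dim = 5000, 64

E = nn.Embedding(vocab_size, embedding_dim)  # embedding matrix E
W = nn.Linear(embedding_dim, vocab_size)     # output projection W
optimizer = torch.optim.SGD(
    list(E.parameters()) + list(W.parameters()), lr=0.01, momentum=0.9
)
# CrossEntropyLoss folds the softmax from the pseudocode into the loss.
loss_fn = nn.CrossEntropyLoss()

def train_step(target_idx, context_idx):
    """One (target word, context word) update from the loop above."""
    optimizer.zero_grad()
    embedding = E(target_idx)            # embedding = matmul(E, targetword)
    logits = W(embedding)                # tmp = matmul(W, embedding)
    loss = loss_fn(logits, context_idx)  # minimize(predicted, contextword)
    loss.backward()
    optimizer.step()
    return loss.item()

# e.g. target word with index 42, context word with index 7:
train_step(torch.tensor([42]), torch.tensor([7]))
```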
# Visualization
For the visualization I used the dimensionality-reduction method t-SNE.
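A minimal, self-contained sketch of that step with scikit-learn and matplotlib; random vectors stand in for the trained embedding matrix, whereas the actual script would plot the learned vectors labeled with their words:
```
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Stand-in data: in the real script these would be the trained
# embedding matrix and the corresponding vocabulary words.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 64))       # (num_words, embedding_dim)
words = [f"word_{i}" for i in range(100)]

# t-SNE reduces the 64-dimensional vectors to 2D for plotting.
coords = TSNE(n_components=2, perplexity=30,
              random_state=0).fit_transform(vectors)

plt.figure(figsize=(10, 10))
plt.scatter(coords[:, 0], coords[:, 1], s=10)
for (x, y), word in zip(coords, words):
    plt.annotate(word, (x, y), fontsize=8)
plt.title("t-SNE projection of the word embeddings")
plt.show()
```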
# Contact
If you have any input on how to make the training more efficient or better, feel free to contact me.