https://github.com/assad2008/word2vec
Automatically exported from code.google.com/p/word2vec
- Host: GitHub
- URL: https://github.com/assad2008/word2vec
- Owner: assad2008
- License: apache-2.0
- Created: 2015-04-04T08:56:07.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2015-04-04T08:58:40.000Z (about 10 years ago)
- Last Synced: 2025-01-27T06:12:15.322Z (3 months ago)
- Language: C
- Size: 258 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 27
Metadata Files:
- Readme: README.txt
- License: LICENSE
README
Tools for computing distributed representation of words
------------------------------------------------------

We provide an implementation of the Continuous Bag-of-Words (CBOW) and the Skip-gram (SG) models, as well as several demo scripts.
Given a text corpus, the word2vec tool learns a vector for every word in the vocabulary using the Continuous
Bag-of-Words or the Skip-Gram neural network architecture. The user should specify the following:
- desired vector dimensionality
- the size of the context window for either the Skip-Gram or the Continuous Bag-of-Words model
- training algorithm: hierarchical softmax and / or negative sampling
- threshold for downsampling the frequent words
- number of threads to use
- the format of the output word vector file (text or binary)

Usually, the other hyper-parameters, such as the learning rate, do not need to be tuned for different training sets.
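For reference, a training run can be launched from the command line; the invocation below mirrors the settings used by the bundled demo scripts (the text8 corpus name and the exact flag values are illustrative, not required):

  ./word2vec -train text8 -output vectors.bin -cbow 1 -size 200 -window 8 \
    -negative 25 -hs 0 -sample 1e-4 -threads 20 -binary 1 -iter 15

Here -size sets the vector dimensionality, -window the context size, -hs and -negative select the training algorithm, -sample the downsampling threshold, -threads the number of threads, and -binary the output format.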
The script demo-word.sh downloads a small (100 MB) text corpus from the web and trains a small word vector model. After the training
is finished, the user can interactively explore the similarity of the words. More information about the scripts is provided at https://code.google.com/p/word2vec/
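As an illustration, the distance tool built alongside word2vec loads a trained vector file and, for a word typed at its prompt, prints the closest words by cosine similarity (vectors.bin here is assumed to be the output of the training run above):

  ./distance vectors.bin

Entering a word such as france at the prompt then lists its nearest neighbors in the learned vector space.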