https://github.com/yjmade/word2vec
Automatically exported from code.google.com/p/word2vec
- Host: GitHub
- URL: https://github.com/yjmade/word2vec
- Owner: yjmade
- License: apache-2.0
- Created: 2016-03-25T10:48:50.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2016-03-25T10:51:22.000Z (almost 9 years ago)
- Last Synced: 2024-11-05T13:17:55.346Z (2 months ago)
- Language: C
- Size: 122 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 35
Metadata Files:
- Readme: README.txt
- License: LICENSE
README
Tools for computing distributed representations of words
--------------------------------------------------------

We provide an implementation of the Continuous Bag-of-Words (CBOW) and the Skip-gram model (SG), as well as several demo scripts.

Given a text corpus, the word2vec tool learns a vector for every word in the vocabulary using the Continuous
Bag-of-Words or the Skip-Gram neural network architectures. The user should specify the following:
- desired vector dimensionality
- the size of the context window for either the Skip-Gram or the Continuous Bag-of-Words model
- training algorithm: hierarchical softmax and / or negative sampling
- threshold for downsampling the frequent words
- number of threads to use
- the format of the output word vector file (text or binary)

Usually, the other hyper-parameters such as the learning rate do not need to be tuned for different training sets.

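To make the "size of the context window" setting above more concrete, here is a minimal sketch in C of how the context positions around one word could be enumerated. The function name context_positions and its parameters are hypothetical and are not taken from the word2vec sources. The sketch assumes a fixed window; an actual implementation may also vary the effective window from word to word, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of word2vec: collect the positions of the
   context words around `center` in a sentence of `length` tokens, looking
   at most `window` tokens to each side.
   `out` must have room for at least 2 * window entries.
   Returns the number of positions written to `out`. */
int context_positions(int center, int length, int window, int *out) {
    assert(out != NULL && window >= 0);
    int n = 0;
    for (int pos = center - window; pos <= center + window; pos++) {
        if (pos == center) continue;             /* the center word is not its own context */
        if (pos < 0 || pos >= length) continue;  /* clip at the sentence boundaries */
        out[n++] = pos;
    }
    return n;
}
```

For a six-token sentence and a window of 2, the word at position 3 gets the context positions 1, 2, 4 and 5, while the word at position 0 only gets positions 1 and 2 because the window is clipped at the start of the sentence.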
The script demo-word.sh downloads a small (100MB) text corpus from the web, and trains a small word vector model. After the training
is finished, the user can interactively explore the similarity of the words.

More information about the scripts is provided at https://code.google.com/p/word2vec/
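
As an illustration of what exploring word similarity can mean in practice, the following is a minimal sketch in C of a nearest-word lookup over trained vectors. It is not the code used by the demo scripts. The names dot and nearest_word are hypothetical, and the sketch assumes the vectors are already loaded into memory as one flat array and scaled to unit length, so that the dot product equals the cosine similarity.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two vectors of length `dim`.
   For unit-length vectors this equals their cosine similarity. */
double dot(const float *a, const float *b, int dim) {
    double sum = 0.0;
    for (int i = 0; i < dim; i++) sum += (double)a[i] * b[i];
    return sum;
}

/* Hypothetical helper, not part of word2vec: return the index of the word
   most similar to word `query`, excluding the query itself.
   `vectors` holds `vocab_size` rows of `dim` floats, each assumed to be
   unit length. Returns -1 if the vocabulary has no other word. */
int nearest_word(const float *vectors, int vocab_size, int dim, int query) {
    assert(vectors != NULL && query >= 0 && query < vocab_size);
    int best = -1;
    double best_score = -2.0;  /* cosine similarity is never below -1 */
    for (int w = 0; w < vocab_size; w++) {
        if (w == query) continue;
        double score = dot(&vectors[(size_t)w * dim],
                           &vectors[(size_t)query * dim], dim);
        if (score > best_score) {
            best_score = score;
            best = w;
        }
    }
    return best;
}
```

With three two-dimensional unit vectors (1, 0), (0.6, 0.8) and (0, 1) stored as words 0, 1 and 2, the nearest word to word 0 is word 1, and the nearest word to word 1 is word 2. A linear scan like this is enough for small vocabularies; ranking the top few matches instead of only the best one would need a small sorted buffer.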