https://github.com/nlpir-team/word2vec-cpp
- Host: GitHub
- URL: https://github.com/nlpir-team/word2vec-cpp
- Owner: NLPIR-team
- Created: 2017-07-24T04:57:28.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2017-08-01T13:22:36.000Z (about 8 years ago)
- Last Synced: 2025-03-11T06:58:57.201Z (7 months ago)
- Topics: word-embedding, word-vectors, word2vec
- Language: C++
- Size: 6.3 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Word2vec/Word Embedding
This library is an optimized version of the Word2vec algorithm that generates word vectors for a given set of words.
The method speeds up training and gives better results.
# Word2vec API
1. Initialization
int InitWord2vec(const char * data,const char * sLicenseCode);
2. Initialize the training parameters
//size: dimensionality of the word vectors
//train: training corpus
//entity: dictionary
//model: 1 for CBOW, 0 for Skip-gram
//outfile: output model
//threads: number of threads
int InitPara(int size,const char * train,const char * entity,int model,float alp,const char * outfile,int win,float sam,int h,int neg,int threads, int it,int mincount);
3. Training
//train the model
int Train();
4. Load the model
//model: location of the model
int LoadModel(const char * model);
5. Compute word vectors
//word: input word; N: number of output words
const char * CalculateWord(const char * word,int N);
# Build Train File
1. train file
Segment the corpus with NLPIR so that, like English text, the words are separated by spaces (" ") and carry no POS tags, then save the result to a file.
2. make dict
The algorithm is optimized by adding a dictionary: the words in the dictionary are the ones projected into the vector space, and skipping the words outside it speeds up training. Collect the words that need vectors into the dictionary; you can also use every word in the corpus as the dictionary.
3. set the out model
The outfile parameter names the model that training generates.
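
The steps above can be put together roughly as follows. This is only a sketch under several assumptions, not the library's own sample code. `Word2vecApi`, `Word2vecPaths`, `collectDictionary` and `trainAndQuery` are hypothetical names introduced here; the struct of `std::function` slots exists only so the sketch compiles without linking the library. The README does not document `alp`, `win`, `sam`, `h`, `neg` or `it`, so they are guessed from their names (learning rate, window, subsampling, hierarchical softmax, negative samples, iterations), and the values loosely follow the defaults of the original word2vec tool. The README also does not say what the `int` return codes mean, so the sketch does not interpret them.

```cpp
#include <functional>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical scaffolding (not part of the library): one slot per function
// listed in the README, so the sketch compiles without linking the library.
struct Word2vecApi {
    std::function<int(const char*, const char*)> initWord2vec;
    std::function<int(int, const char*, const char*, int, float, const char*,
                      int, float, int, int, int, int, int)> initPara;
    std::function<int()> train;
    std::function<int(const char*)> loadModel;
    std::function<const char*(const char*, int)> calculateWord;
};

// Hypothetical bundle of the paths and license string the calls need.
struct Word2vecPaths {
    std::string dataDir, licenseCode, corpus, dictionary, model;
};

// Collect every distinct token of a space-separated corpus that occurs at
// least minCount times. std::map keeps the result in sorted order.
std::vector<std::string> collectDictionary(std::istream& corpus, int minCount) {
    std::map<std::string, int> counts;
    std::string token;
    while (corpus >> token) ++counts[token];
    std::vector<std::string> words;
    for (const auto& entry : counts)
        if (entry.second >= minCount) words.push_back(entry.first);
    return words;
}

// Call the five functions in the order the README lists them and return the
// text produced by CalculateWord. Return codes are not documented in the
// README, so this sketch does not interpret them.
std::string trainAndQuery(const Word2vecApi& api, const Word2vecPaths& p,
                          const std::string& word, int topN) {
    api.initWord2vec(p.dataDir.c_str(), p.licenseCode.c_str());
    // Assumed values; the meaning of alp, win, sam, h, neg and it is guessed.
    api.initPara(/*size=*/100, p.corpus.c_str(), p.dictionary.c_str(),
                 /*model=*/1 /* CBOW */, /*alp=*/0.05f, p.model.c_str(),
                 /*win=*/5, /*sam=*/1e-3f, /*h=*/0, /*neg=*/5,
                 /*threads=*/4, /*it=*/5, /*mincount=*/5);
    api.train();
    api.loadModel(p.model.c_str());
    const char* result = api.calculateWord(word.c_str(), topN);
    return result ? std::string(result) : std::string();
}
```

With the library's declarations in scope, the slots could be bound directly, for example `Word2vecApi api{InitWord2vec, InitPara, Train, LoadModel, CalculateWord};`. The words returned by `collectDictionary` would still have to be written to the dictionary file; the README does not specify that file's format, so check it against the library before relying on any particular layout.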