https://github.com/nlpir-team/word2vec-cpp

word-embedding word-vectors word2vec


Host: GitHub
URL: https://github.com/nlpir-team/word2vec-cpp
Owner: NLPIR-team
Created: 2017-07-24T04:57:28.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2017-08-01T13:22:36.000Z (about 8 years ago)
Last Synced: 2025-03-11T06:58:57.201Z (7 months ago)
Topics: word-embedding, word-vectors, word2vec
Language: C++
Size: 6.3 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# Word2vec/Word Embedding

An optimized Word2vec implementation that generates word vectors for a specified set of words.

This approach speeds up training and, according to the authors, produces better results.

# Word2vec API

1. Initialize

```cpp
int InitWord2vec(const char * data, const char * sLicenseCode);
```

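2. Set training parameters

- `size`: dimension of the word vectors
- `train`: training corpus
- `entity`: dictionary
- `model`: `1` for CBOW, `0` for Skip-gram
- `outfile`: output model
- `threads`: number of threads

```cpp
int InitPara(int size, const char * train, const char * entity, int model, float alp, const char * outfile, int win, float sam, int h, int neg, int threads, int it, int mincount);
```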
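The `model` flag chooses between the two standard word2vec architectures. As a rough illustration (not this library's code), the sketch below builds the training examples each architecture would see from one segmented sentence. It assumes `win` is the context window size, as in the original word2vec tool; the other undocumented parameters (`alp`, `sam`, `h`, `neg`, `it`, `mincount`) likewise appear, from their names alone, to mirror that tool's learning rate, sub-sampling threshold, hierarchical softmax flag, negative samples, iterations and minimum count, but this is an assumption. The helper names are hypothetical, and the window is fixed here, whereas the original tool shrinks it at random for each word.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// CBOW example: the surrounding context words jointly predict the center word.
struct CbowExample {
    std::vector<std::string> context;
    std::string target;
};

// Words within `win` positions of position i, excluding i itself.
// Requires i < s.size().
std::vector<std::string> contextOf(const std::vector<std::string>& s,
                                   std::size_t i, std::size_t win) {
    std::vector<std::string> ctx;
    std::size_t lo = i >= win ? i - win : 0;
    std::size_t hi = std::min(s.size() - 1, i + win);
    for (std::size_t j = lo; j <= hi; ++j)
        if (j != i) ctx.push_back(s[j]);
    return ctx;
}

// model = 1 (CBOW): one example per position, context -> center word.
std::vector<CbowExample> cbowExamples(const std::vector<std::string>& s,
                                      std::size_t win) {
    std::vector<CbowExample> out;
    for (std::size_t i = 0; i < s.size(); ++i)
        out.push_back({contextOf(s, i, win), s[i]});
    return out;
}

// model = 0 (Skip-gram): one (center, context) pair per neighbor,
// so the center word predicts each neighbor separately.
std::vector<std::pair<std::string, std::string>> skipGramPairs(
        const std::vector<std::string>& s, std::size_t win) {
    std::vector<std::pair<std::string, std::string>> out;
    for (std::size_t i = 0; i < s.size(); ++i)
        for (const auto& c : contextOf(s, i, win))
            out.emplace_back(s[i], c);
    return out;
}
```

For the sentence `the cat sat` with a window of 1, CBOW yields three examples (one per word), while Skip-gram yields four pairs, one for each neighbor of each word.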

3. Train

```cpp
int Train();
```

4. Load a model

`model`: path to the model file.

```cpp
int LoadModel(const char * model);
```
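The README does not specify the model file format. If the model is written in the text format of the original word2vec tool (a header line with vocabulary size and dimension, then one word followed by its vector components per line), it could be read back roughly as follows. Both the format and the helper name `loadTextVectors` are assumptions, not part of this API.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Embeddings = std::unordered_map<std::string, std::vector<float>>;

// Reads "<vocab> <dim>" followed by <vocab> entries of "<word> <dim floats>".
// Returns false if the stream ends early or a value fails to parse.
bool loadTextVectors(std::istream& in, Embeddings& out) {
    std::size_t vocab = 0, dim = 0;
    if (!(in >> vocab >> dim)) return false;
    for (std::size_t w = 0; w < vocab; ++w) {
        std::string word;
        if (!(in >> word)) return false;
        std::vector<float> vec(dim);
        for (float& x : vec)
            if (!(in >> x)) return false;
        out[word] = std::move(vec);
    }
    return true;
}
```

For a real model, pass a `std::ifstream`; a `std::istringstream` is convenient for quick checks.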

5. Compute word vectors

`word`: input word; `N`: number of output words.

```cpp
const char * CalculateWord(const char * word, int N);
```
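Beyond `N` being the number of output words, the README does not say what `CalculateWord` returns. Assuming it behaves like the original word2vec `distance` tool and lists the `N` words closest to `word` by cosine similarity, the core computation can be sketched as follows. `nearestWords` is a hypothetical helper, and the real function's output string format is not documented.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Embeddings = std::unordered_map<std::string, std::vector<float>>;

// Cosine similarity of two equal-length vectors (0 if either has zero norm).
double cosine(const std::vector<float>& a, const std::vector<float>& b) {
    double dot = 0, na = 0, nb = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    if (na == 0 || nb == 0) return 0;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}

// Returns up to n words most similar to `query`, best first.
// Returns an empty list if `query` is not in the vocabulary.
std::vector<std::pair<std::string, double>> nearestWords(
        const Embeddings& emb, const std::string& query, std::size_t n) {
    std::vector<std::pair<std::string, double>> scored;
    auto it = emb.find(query);
    if (it == emb.end()) return scored;
    for (const auto& [word, vec] : emb) {
        if (word == query || vec.size() != it->second.size()) continue;
        scored.emplace_back(word, cosine(it->second, vec));
    }
    std::size_t k = std::min(n, scored.size());
    // Only the top k need to be ordered, so partial_sort avoids a full sort.
    std::partial_sort(scored.begin(),
                      scored.begin() + static_cast<std::ptrdiff_t>(k),
                      scored.end(),
                      [](const auto& x, const auto& y) { return x.second > y.second; });
    scored.resize(k);
    return scored;
}
```

This is a linear scan over the whole vocabulary per query, which is what the original `distance` tool does; large vocabularies would call for normalizing vectors once up front or using an approximate index.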

# Build the Training File

1. Training file

Segment the corpus with NLPIR (the Chinese Academy of Sciences word segmenter) and save it to a file in which, as in English, words are separated by spaces, with no part-of-speech (POS) tags.
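NLPIR can also emit POS-tagged text. Assuming tagged output in the common `word/tag` form (for example `北京/ns`), each line can be turned into a training line by dropping everything from the last `/` of each token. `stripPosTags` is a hypothetical helper. Searching for the byte `/` is safe on UTF-8 Chinese text, because that byte never appears inside a multi-byte character.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Turns one line of tagged segmenter output ("词/n 词/v ...") into a
// space-separated training line with the POS tags removed.
std::string stripPosTags(const std::string& taggedLine) {
    std::istringstream in(taggedLine);
    std::string token, out;
    while (in >> token) {
        // Split at the last '/' so a token such as "//w" keeps the word "/".
        std::size_t slash = token.rfind('/');
        if (slash != std::string::npos && slash > 0) token.erase(slash);
        if (!out.empty()) out += ' ';
        out += token;
    }
    return out;
}
```

Extra whitespace is collapsed to single spaces, and untagged tokens pass through unchanged.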

2. Build the dictionary

Collect the words that need vectors into a dictionary (the `entity` parameter). Only dictionary words are projected into the vector space, so leaving other words out of the dictionary speeds up training. You can also use every word in the corpus as the dictionary.
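The README does not specify the dictionary file format. Assuming one word per line, a dictionary of all sufficiently frequent words in a space-separated corpus could be generated like this; `buildDictionary` and its `minCount` cutoff are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: counts the space-separated words of a corpus and returns
// a dictionary listing each word seen at least minCount times, one per line,
// most frequent first (ties broken alphabetically for stable output).
std::string buildDictionary(std::istream& corpus, std::size_t minCount) {
    std::unordered_map<std::string, std::size_t> counts;
    std::string word;
    while (corpus >> word) ++counts[word];

    std::vector<std::pair<std::string, std::size_t>> kept;
    for (const auto& entry : counts)
        if (entry.second >= minCount) kept.emplace_back(entry.first, entry.second);

    std::sort(kept.begin(), kept.end(), [](const auto& a, const auto& b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });

    std::ostringstream dict;
    for (const auto& entry : kept) dict << entry.first << '\n';
    return dict.str();
}
```

Passing `1` keeps every word, which matches the README's option of using all words as the dictionary; a higher cutoff gives a smaller dictionary and, per the README, faster training.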

3. Set the output model

`outfile` is the path of the model file produced by training.
