https://github.com/multimeric/doc2vec_agg
Generates simple document vectors from word2vec embeddings
- Host: GitHub
- URL: https://github.com/multimeric/doc2vec_agg
- Owner: multimeric
- License: gpl-3.0
- Created: 2019-08-31T08:34:28.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-08-31T14:15:11.000Z (over 6 years ago)
- Last Synced: 2025-03-21T05:23:30.755Z (about 1 year ago)
- Language: Python
- Size: 16.6 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# doc2vec_agg
## Installation
```bash
pip install git+https://github.com/TMiguelT/doc2vec_agg.git
```
## Usage
```python
from scipy.spatial import distance
from word2vec_agg.word2vec import docvector

# Generate the document vector
doc_vector = docvector(
    word2vec='./GoogleNews-vectors-negative300.bin',  # Path to pretrained word2vec embeddings, in binary format
    text=['passenger', 'terminal', 'building'],  # List of preprocessed tokens representing the document
    max=True,  # True if you want the maximum of each dimension in the final output
    mean=True,  # True if you want the mean of each dimension in the final output
    min=True,  # True if you want the minimum of each dimension in the final output
)

# Do operations with the vector, e.g. compare two documents.
# doc_vector_1 and doc_vector_2 are each the result of a docvector() call like the one above.
# Note that distance.cosine returns the cosine distance (1 - cosine similarity).
cosine_dist = distance.cosine(doc_vector_1, doc_vector_2)
```
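To illustrate what the `max`, `mean` and `min` options describe, here is a minimal, dependency-free sketch of aggregating per-token vectors into one document vector. It is not taken from this package's source: the helper names `aggregate_doc_vector` and `cosine_distance`, the toy 3-dimensional embeddings, and the choice to concatenate the results in max, mean, min order are all assumptions made for illustration.

```python
import math


def aggregate_doc_vector(token_vectors, use_max=True, use_mean=True, use_min=True):
    """Hypothetical helper: combine per-token vectors into one document vector.

    Each enabled statistic is computed per dimension, and the results are
    concatenated (assumed order: max, mean, min).
    """
    if not token_vectors:
        raise ValueError("at least one token vector is required")
    if not (use_max or use_mean or use_min):
        raise ValueError("enable at least one of max, mean, min")

    # Transpose so that each tuple holds one dimension across all tokens
    columns = list(zip(*token_vectors))

    parts = []
    if use_max:
        parts.extend(max(col) for col in columns)
    if use_mean:
        parts.extend(sum(col) / len(col) for col in columns)
    if use_min:
        parts.extend(min(col) for col in columns)
    return parts


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy 3-dimensional embeddings standing in for real word2vec vectors (made-up values)
toy_embeddings = {
    'passenger': [1.0, 0.0, 2.0],
    'terminal': [3.0, -1.0, 0.0],
    'building': [2.0, 4.0, 1.0],
}

tokens = ['passenger', 'terminal', 'building']
doc_vector = aggregate_doc_vector([toy_embeddings[t] for t in tokens])
print(len(doc_vector))  # 9: three statistics times three dimensions
```

With all three statistics enabled, the output under this sketch is three times as long as a single word vector, so any two document vectors being compared need to be built with the same flags.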