Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lopuhin/python-adagram
AdaGram (adaptive skip-gram) for Python
https://github.com/lopuhin/python-adagram
Last synced: 17 days ago
JSON representation
AdaGram (adaptive skip-gram) for Python
- Host: GitHub
- URL: https://github.com/lopuhin/python-adagram
- Owner: lopuhin
- Created: 2016-07-27T22:57:10.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-05-09T08:33:42.000Z (over 7 years ago)
- Last Synced: 2024-10-14T12:51:28.616Z (about 1 month ago)
- Language: Python
- Size: 87.9 KB
- Stars: 74
- Watchers: 9
- Forks: 16
- Open Issues: 8
-
Metadata Files:
- Readme: README.rst
Awesome Lists containing this project
README
Python-AdaGram
==============Python-AdaGram is an implementation of AdaGram (adaptive skip-gram) for Python.
It borrows a lot of C code from the original AdaGram implementation in Julia
(https://github.com/sbos/AdaGram.jl). AdaGram was introduced in a paper by
Sergey Bartunov, Dmitry Kondrashkin, Anton Osokin and Dmitry Vetrov
at http://arxiv.org/abs/1502.07257.**Note**: this is a work in progress: it lacks tests,
and training is not working correctly yet.
But it can already load AdaGram.jl models, perform disambiguation,
search for nearest neighbours, etc.
If you have a more mature implementation or want to help,
please get in touch.Install
-------The package is not on PyPI yet, please install it from source in the meantime::
$ pip install Cython numpy
$ pip install git+https://github.com/lopuhin/python-adagram.gitUsage
-----Train a model from command line::
$ adagram-train tokenized.txt out.pkl
Input corpus must be already tokenized, with tokens (usually words)
separated by spaces.
There are many options available, see ``adagram-train --help``.Load model::
>>> import adagram
>>> vm = adagram.VectorModel.load('out.pkl')Get sense probabilities for some word::
>>> vm.word_sense_probs('apple')
[0.341832, 0.658164]Get sense neighbors::
>>> vm.sense_neighbors('apple', 0)
[('almond', 0, 0.70396507),
('cherry', 1, 0.69193166),
('plum', 0, 0.690269),
('apricot', 0, 0.6882005),
('orange', 3, 0.6739181),
('pecan', 0, 0.6662803),
('pomegranate', 0, 0.6580653)
('blueberry', 0, 0.6509351),
('pear', 0, 0.6484747),
('peach', 0, 0.6313036)]>>> vm.sense_neighbors('apple', 1)
[('macintosh', 0, 0.79053026),
('iifx', 0, 0.71349466),
('iigs', 0, 0.7030192),
('computers', 0, 0.6952761),
('kaypro', 0, 0.6938647),
('ipad', 0, 0.6914306),
('pc', 3, 0.6801078),
('ibm', 0, 0.66797054),
('powerpc-based', 0, 0.66319686),
('ibm-compatible', 0, 0.66120595)]Get sense vector::
>>> vm.sense_vector('apple', 1)
array([...], dtype=float32)Converting models built with AdaGram.jl
---------------------------------------First, install AdaGram.jl as described here https://github.com/sbos/AdaGram.jl.
Install JSON package::$ julia
julia> Pkg.add("JSON")Run the script that converts a julia model to JSON::
$ julia adagram/dump_julia.jl julia-model out-directory
This will save two JSON files to ``out-directory``.
Next, to convert model to python format, run::
$ ./adagram/load_julia.py out-directory model.joblib