https://github.com/lopuhin/sensefreq
Sense frequencies and WSD
- Host: GitHub
- URL: https://github.com/lopuhin/sensefreq
- Owner: lopuhin
- Created: 2016-02-12T20:14:08.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2022-10-15T14:19:24.000Z (about 2 years ago)
- Last Synced: 2024-09-19T23:18:50.382Z (about 2 months ago)
- Language: Python
- Size: 1.2 MB
- Stars: 4
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.rst
Sense frequencies and WSD
=========================

This repository contains scripts and experiments related to the
Sense frequencies project, and an ``rlwsd``
Python package for WSD (word sense disambiguation) for the Russian language.

rlwsd package
-------------

This package can perform WSD for Russian nouns described in the
Active Dictionary of Russian (currently, only the first volume is published,
covering the letters "А" - "Г").

Installation
~~~~~~~~~~~~

The package currently works only on CPython 3.4+. Install with pip::

    pip3 install rlwsd

The package requires models that are not hosted on PyPI and must be
downloaded separately (about 2.3 GB total)::

    python3 -m rlwsd.download

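
Because the download is large and the command fetches everything again on each run, it can help to check for existing models first. The following is a minimal sketch, not part of ``rlwsd``: the helper names ``models_present``, ``run_download`` and ``ensure_models`` are hypothetical, and it assumes the models live in a ``models`` folder inside the installed package folder, as this section describes.

```python
import os
import subprocess
import sys


def models_present(package_dir):
    """Return True if <package_dir>/models exists and is not empty.

    Assumption: a non-empty folder means the models were extracted;
    this does not verify that the download was complete.
    """
    models_dir = os.path.join(package_dir, 'models')
    return os.path.isdir(models_dir) and bool(os.listdir(models_dir))


def run_download():
    """Run the documented download command with the current interpreter."""
    subprocess.check_call([sys.executable, '-m', 'rlwsd.download'])


def ensure_models(package_dir, download=run_download):
    """Call ``download`` only when no models are found.

    Returns True if the download step ran, False if it was skipped.
    """
    if models_present(package_dir):
        return False
    download()
    return True
```

With the package installed, the package folder could be passed as ``os.path.dirname(rlwsd.__file__)``. An interrupted download may still leave a non-empty folder, so this check is only a rough guard.
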
Models are re-downloaded even if they are already present.
In case of problems (the download does not finish, etc.) you can download the models
manually from ``rlwsd.download.MODELS_URL``
and extract them into the ``models`` folder inside the ``rlwsd`` package folder.

Usage
~~~~~

Most functionality is provided by the model class. The model for each word
must be loaded separately::

    >>> import rlwsd
    >>> model = rlwsd.SphericalModel.load('альбом')
    >>> model.senses
    {'1': {'meaning': 'Вещь в виде большой тетради ...',
           'name': 'альбом 1'},
     '2': {'meaning': 'Книга тематически связанных изобразительных материалов ...',
           'name': 'альбом 2.1'},
     '3': {'meaning': 'Собрание музыкальных произведений ...',
           'name': 'альбом 2.2'}}
    >>> model.disambiguate('она задумчиво листала', 'альбом', 'с фотографиями')
    '2'

You can also get a list of all words with models::

    >>> import rlwsd
    >>> rlwsd.list_words()
    ['абрикос',
     'абсурд',
     'авангард',
     ...
     'гусь',
     'гуща']

A large word2vec model is used internally. By default it is loaded once,
on the first call to the ``.disambiguate`` method, which takes noticeable time.
There is an option to load the word2vec
model in a separate process by running the ``w2v-server`` command, which starts
a server, and exporting the ``W2VSRV`` environment variable with any non-empty value::

    # in the first terminal window
    $ w2v-server
    running...
    # in the second terminal window
    $ export W2VSRV=yes
    $ python

In this way you can leave the ``w2v-server`` running and save time on word2vec
model reloads.

License
~~~~~~~

License is MIT.