Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/lopuhin/sensefreq

Sense frequencies and WSD
https://github.com/lopuhin/sensefreq

Last synced: about 1 month ago
JSON representation

Sense frequencies and WSD

Host: GitHub
URL: https://github.com/lopuhin/sensefreq
Owner: lopuhin
Created: 2016-02-12T20:14:08.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2022-10-15T14:19:24.000Z (about 2 years ago)
Last Synced: 2024-09-19T23:18:50.382Z (about 2 months ago)
Language: Python
Size: 1.2 MB
Stars: 4
Watchers: 3
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.rst

Awesome Lists containing this project

README

        Sense frequencies and WSD

=========================

This repository contains scripts and expriments related to the

`Sense frequencies project `_, and an ``rlwsd``

python package for WSD (word sense disambiguation) for Russian language.

rlwsd package

-------------

This package can perform WSD for Russian nouns described in the

of Active Dictionary of Russian (currently, only the first volume is published

with letters "А" - "Г").

Installation

~~~~~~~~~~~~

The package currently works only on CPython 3.4+. Install with pip::

    pip3 install rlwsd

The package requires models that are not hosted on PyPI and most be

downloaded separately (about 2.3 Gb total)::

    python3 -m rlwsd.download

Models are re-downloaded even if they are already present.

In case of problems (download does not finish, etc.) you can download models

manually from ``rlwsd.download.MODELS_URL``

and extract them into the ``models`` folder inside ``rlwsd`` (package) folder.

Usage

~~~~~

Most functionality is provided by the model class. Model for each word

must be loaded separately::

    >>> import rlwsd

    >>> model = rlwsd.SphericalModel.load('альбом')

    >>> model.senses

    {'1': {'meaning': 'Вещь в виде большой тетради ...',

           'name': 'альбом 1'},

     '2': {'meaning': 'Книга тематически связанных изобразительных материалов ...',

           'name': 'альбом 2.1'},

     '3': {'meaning': 'Собрание музыкальных произведений ...',

           'name': 'альбом 2.2'}}

    >>> model.disambiguate('она задумчиво листала', 'альбом', 'с фотографиями')

    '2'

You can also get a list of all words with models::

    >>> import rlwsd

    >>> rlwsd.list_words()

    ['абрикос',

     'абсурд',

     'авангард',

     ...

     'гусь',

     'гуща']

A large word2vec model is used internally. By default it is loaded once,

one the first call to ``.disambiguate`` method, which takes noticeable time.

There is an option to load word2vec

model in a separate process by running ``w2v-server`` command, which starts

a server, and exporting ``W2VSRV`` environment variable with any non-empty value::

    # in the first terminal window

    $ w2v-server

    running...

    # in the second terminal window

    $ export W2VSRV=yes

    $ python

In this way you can leave the ``w2v-server`` running and save time on word2vec

model reloads.

License

~~~~~~~

License is MIT