https://github.com/lopuhin/ruslang-wsd-labeled

Labeled contexts of Russian polysemous words

Host: GitHub
URL: https://github.com/lopuhin/ruslang-wsd-labeled
Owner: lopuhin
Created: 2016-02-20T18:47:46.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2017-08-28T19:11:11.000Z (about 7 years ago)
Last Synced: 2024-08-16T12:47:05.365Z (3 months ago)
Language: Python
Size: 1.28 MB
Stars: 2
Watchers: 4
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.rst


Labeled Russian Context for WSD
===============================

Contexts are sampled from RuTenTen and RNC. Sense definitions are taken from the Active Dictionary.

Some words have two annotators. The number of contexts is 100 for most words
and 500 for 7 words.

Annotators (words):

- Анастасия Лопухина (47)

- Константин Лопухин (11)

- Александра Удальцова (2)

- Анастасия К. (2)

- Анна Кот (2)

- Анна Татаренко (2)

- Борис Иомдин (2)

- Иван Самойленко (1)

Contexts are stored in ``rl_wsd_labeled/``::

    rl_wsd_labeled
    ├── adjectives
    │   └── RuTenTen
    ├── nouns
    │   ├── RNC
    │   └── RuTenTen
    └── verbs
        └── RuTenTen

A Python interface is provided. Install the package first::

    pip install rl_wsd_labeled

and then in order to get labeled contexts::

    >>> import rl_wsd_labeled
    >>> f = rl_wsd_labeled.contexts_filename('nouns', 'RuTenTen', 'горшок')
    >>> rl_wsd_labeled.get_contexts(f)
    ({'1': 'Округлый глиняный сосуд для приготовления пищи (печной горшок)',
      '2': 'Расширяющийся кверху сосуд с отверстием в дне (цветочный горшок)',
      '3': 'Ночной горшок'},
     [(('телевизор, - ковер, , - музыкальный центр, - стол, - аквариум, - 3 шкафа, - цветы в',
        ' горшках',
        ', - мелкие аксессуары.'),
      '2'),
      ...
      (('ибо настанет срок и оно будет разрушено течением времени либо войною, будто старый',
        ' горшок',
        ' с вином в трюме торгового корабля, попавшего в бурю и разбившегося о скалы.'),
      '1')
     ])
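
The value returned above is a pair: a dict mapping sense ids to definitions,
and a list of ``((left, word, right), sense_id)`` items. A minimal sketch of
consuming that shape is given here. It uses a small hand-made sample instead of
real package output, and the helpers ``sense_counts`` and ``full_text`` are
illustrative names, not part of ``rl_wsd_labeled``:

```python
from collections import Counter

# Hand-made sample with the same shape as the get_contexts() output above.
# The texts are placeholders, not real corpus data.
senses = {'1': 'sense one', '2': 'sense two', '3': 'sense three'}
contexts = [
    (('left text', ' word', ' right text'), '2'),
    (('another left', ' word', ' another right'), '1'),
    (('third left', ' word', ' third right'), '2'),
]


def sense_counts(contexts):
    """Count how many contexts were labeled with each sense id."""
    return Counter(sense_id for _, sense_id in contexts)


def full_text(context):
    """Join the (left, word, right) triple back into a single string."""
    left, word, right = context
    return left + word + right


counts = sense_counts(contexts)
for sense_id, n in sorted(counts.items()):
    # Print the sense id, its definition and the number of contexts.
    print(sense_id, senses[sense_id], n)
```

With real data, ``senses`` and ``contexts`` would come from
``rl_wsd_labeled.get_contexts(f)`` instead of the sample literals.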

Apart from senses, there are two special annotations: "0" means
"I don't know/the context is unclear/the context is invalid", and "max sense + 1"
means "other sense, not listed among the given senses". Contexts marked as "0" or "other"
are not returned unless ``with_skipped=True`` is passed.
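
To make the two special labels concrete, here is a rough sketch of how they
could be detected. It assumes sense ids are integer-like strings, as in the
example output above. The helper ``is_skipped`` is hypothetical and only
illustrates the rule; it is not the package's own implementation:

```python
def is_skipped(label, senses):
    """Return True for "0" (unclear) or "max sense + 1" (other sense).

    Assumes sense ids are integer-like strings such as '1', '2', '3'.
    """
    other = str(max(int(sense_id) for sense_id in senses) + 1)
    return label in ('0', other)


# Placeholder sense inventory with three senses, so "other" is '4'.
senses = {'1': 'first', '2': 'second', '3': 'third'}
labels = ['2', '0', '4', '1']

# Keep only labels that refer to a concrete listed sense.
kept = [label for label in labels if not is_skipped(label, senses)]
print(kept)
```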

If there was more than one annotator, contexts where the annotators did not agree are also
not included. The function ``rl_wsd_labeled.get_agreement`` returns the
ratio of contexts where both annotators either gave the
same concrete sense or both skipped the context (so "0" and "other" are considered equal).
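
The agreement rule described above can be sketched as follows. This is an
illustration under assumptions, not the code behind
``rl_wsd_labeled.get_agreement``: the function ``agreement_ratio`` and its
arguments are hypothetical, the two label lists are assumed to be aligned by
context, and sense ids are assumed to be integer-like strings:

```python
def agreement_ratio(labels_a, labels_b, senses):
    """Share of contexts on which two annotators agree.

    "0" and "max sense + 1" are both treated as one "skipped" label,
    so two different kinds of skip still count as agreement.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError('label lists must have the same length')
    if not labels_a:
        raise ValueError('no labels to compare')
    other = str(max(int(sense_id) for sense_id in senses) + 1)

    def normalize(label):
        # Collapse both special annotations into a single marker.
        return 'skipped' if label in ('0', other) else label

    matches = sum(
        normalize(a) == normalize(b) for a, b in zip(labels_a, labels_b)
    )
    return matches / len(labels_a)


# Placeholder data: three senses, so the "other" label is '4'.
senses = {'1': 'first', '2': 'second', '3': 'third'}
annotator_a = ['1', '0', '2', '3']
annotator_b = ['1', '4', '3', '3']
ratio = agreement_ratio(annotator_a, annotator_b, senses)
print(ratio)
```

In this sample the annotators agree on the first, second and fourth contexts
(the second because both skipped it) and disagree on the third.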