https://github.com/natasha/yargy

Rule-based facts extraction for Russian language

earley-parser information-extraction morphology nlp python russian tomita tomita-parser

Host: GitHub
URL: https://github.com/natasha/yargy
Owner: natasha
License: mit
Created: 2016-08-05T15:49:24.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2023-07-24T10:07:03.000Z (over 1 year ago)
Last Synced: 2024-04-27T23:37:08.199Z (9 months ago)
Topics: earley-parser, information-extraction, morphology, nlp, python, russian, tomita, tomita-parser
Language: Python
Homepage:
Size: 655 KB
Stars: 307
Watchers: 19
Forks: 41
Open Issues: 16
Metadata Files:
- Readme: README.md
- License: LICENSE

README

![CI](https://github.com/natasha/yargy/actions/workflows/test.yml/badge.svg)

Yargy uses rules and dictionaries to extract structured information from Russian-language text. It is similar to the Tomita parser.

## Install

Yargy supports Python 3.7+ and PyPy 3, and depends only on Pymorphy2.

```bash
$ pip install yargy
```

## Usage

```python
from yargy import Parser, rule, and_, not_
from yargy.interpretation import fact
from yargy.predicates import gram
from yargy.relations import gnc_relation
from yargy.pipelines import morph_pipeline

# Fact schemas: the structured records the parser produces
Name = fact(
    'Name',
    ['first', 'last'],
)
Person = fact(
    'Person',
    ['position', 'name']
)

# Token predicates: a surname or first name that is not an abbreviation
LAST = and_(
    gram('Surn'),
    not_(gram('Abbr')),
)
FIRST = and_(
    gram('Name'),
    not_(gram('Abbr')),
)

# Dictionary of positions ("managing director", "vice-mayor")
POSITION = morph_pipeline([
    'управляющий директор',
    'вице-мэр'
])

# Matched parts must agree in gender, number and case
gnc = gnc_relation()

NAME = rule(
    FIRST.interpretation(
        Name.first
    ).match(gnc),
    LAST.interpretation(
        Name.last
    ).match(gnc)
).interpretation(
    Name
)

PERSON = rule(
    POSITION.interpretation(
        Person.position
    ).match(gnc),
    NAME.interpretation(
        Person.name
    )
).interpretation(
    Person
)

parser = Parser(PERSON)
match = parser.match('управляющий директор Иван Ульянов')
print(match.fact)

# The extracted fact, shown indented for readability:
# Person(
#     position='управляющий директор',
#     name=Name(
#         first='Иван',
#         last='Ульянов'
#     )
# )
```

## Documentation

All materials are in Russian:

* Overview
* Video from workshop
* Getting started
* Reference
* Cookbook
* Examples
* Code snippets

## Support

- Chat: https://t.me/natural_language_processing
- Issues: https://github.com/natasha/yargy/issues
- Commercial support: https://lab.alexkuk.ru

## Development

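Set up the development environment:

```bash
# Graphviz, installed here with Homebrew
brew install graphviz

# Create and activate a virtual environment
python -m venv ~/.venvs/natasha-yargy
source ~/.venvs/natasha-yargy/bin/activate

# Install dev dependencies and the package in editable mode
pip install -r requirements/dev.txt
pip install -e .

# Register a Jupyter kernel for the environment
python -m ipykernel install --user --name natasha-yargy
```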

Run tests and linters:

```bash
make test
```

Update docs:

```bash
make exec-docs
# Manually review `git diff docs/`, then commit
```

Release:

```bash
# Update the version in setup.py first
git commit -am 'Up version'
git tag v0.16.0

git push
git push --tags
# A GitHub Actions workflow builds the dist and publishes it to PyPI
```