Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/natasha/yargy
Rule-based facts extraction for Russian language
- Host: GitHub
- URL: https://github.com/natasha/yargy
- Owner: natasha
- License: mit
- Created: 2016-08-05T15:49:24.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2023-07-24T10:07:03.000Z (over 1 year ago)
- Last Synced: 2024-04-27T23:37:08.199Z (7 months ago)
- Topics: earley-parser, information-extraction, morphology, nlp, python, russian, tomita, tomita-parser
- Language: Python
- Homepage:
- Size: 655 KB
- Stars: 307
- Watchers: 19
- Forks: 41
- Open Issues: 16
Metadata Files:
- Readme: README.md
- License: LICENSE
README
![CI](https://github.com/natasha/yargy/actions/workflows/test.yml/badge.svg)
Yargy uses rules and dictionaries to extract structured information from Russian texts. It is similar to the Tomita parser.
## Install
Yargy supports Python 3.7+ and PyPy 3, and depends only on Pymorphy2.
```bash
$ pip install yargy
```

## Usage
```python
from yargy import Parser, rule, and_, not_
from yargy.interpretation import fact
from yargy.predicates import gram
from yargy.relations import gnc_relation
from yargy.pipelines import morph_pipeline

# Facts: the structured records that matched spans are interpreted into.
Name = fact(
    'Name',
    ['first', 'last'],
)
Person = fact(
    'Person',
    ['position', 'name']
)

# Token predicates on Pymorphy2 grammemes: surname / first name, not abbreviations.
LAST = and_(
    gram('Surn'),
    not_(gram('Abbr')),
)
FIRST = and_(
    gram('Name'),
    not_(gram('Abbr')),
)

# Dictionary of positions; morph_pipeline also matches inflected forms.
POSITION = morph_pipeline([
    'управляющий директор',
    'вице-мэр'
])

# Parts marked with .match(gnc) must agree in gender, number and case.
gnc = gnc_relation()
NAME = rule(
    FIRST.interpretation(
        Name.first
    ).match(gnc),
    LAST.interpretation(
        Name.last
    ).match(gnc)
).interpretation(
    Name
)

PERSON = rule(
    POSITION.interpretation(
        Person.position
    ).match(gnc),
    NAME.interpretation(
        Person.name
    )
).interpretation(
    Person
)

parser = Parser(PERSON)
match = parser.match('управляющий директор Иван Ульянов')
print(match.fact)

# Person(
#     position='управляющий директор',
#     name=Name(
#         first='Иван',
#         last='Ульянов'
#     )
# )
```
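The example above relies on two ideas that are easy to miss: `morph_pipeline` matches dictionary phrases in inflected forms too, and `gnc_relation` requires the marked parts to agree in gender, number and case. The sketch below is a toy, pure-Python illustration of those two ideas only; it does not use or reproduce Yargy. The hand-written `MORPH` table is a hypothetical stand-in for Pymorphy2, lemma-based phrase lookup stands in for `morph_pipeline`, a fixed word-sequence check replaces Yargy's Earley parser, and `agree`, `match_person`, `lemmas` and `POSITIONS` are illustrative names. The agreement rule used here (some gender-number-case triple is possible for every word) is a simplifying assumption, not Yargy's exact semantics.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# A parse is (lemma, category, gender, number, case) with OpenCorpora-style tags.
# In Pymorphy2 'Name'/'Surn' are grammemes; here they are a coarse category.
Parse = Tuple[str, str, str, str, str]

# Hypothetical, hand-written stand-in for Pymorphy2 (lowercase keys, few forms).
MORPH = {
    "управляющий": [("управляющий", "ADJF", "masc", "sing", "nomn")],
    "управляющего": [("управляющий", "ADJF", "masc", "sing", "gent")],
    "директор": [("директор", "NOUN", "masc", "sing", "nomn")],
    "директора": [("директор", "NOUN", "masc", "sing", "gent")],
    "вице-мэр": [("вице-мэр", "NOUN", "masc", "sing", "nomn")],
    "иван": [("иван", "Name", "masc", "sing", "nomn")],
    "ивана": [
        ("иван", "Name", "masc", "sing", "gent"),
        ("иван", "Name", "masc", "sing", "accs"),
    ],
    "анна": [("анна", "Name", "femn", "sing", "nomn")],
    "ульянов": [("ульянов", "Surn", "masc", "sing", "nomn")],
    "ульянова": [
        ("ульянов", "Surn", "masc", "sing", "gent"),
        ("ульянов", "Surn", "masc", "sing", "accs"),
        ("ульянова", "Surn", "femn", "sing", "nomn"),
    ],
}

# Position dictionary stored as lemma sequences (toy analogue of morph_pipeline).
POSITIONS = [("управляющий", "директор"), ("вице-мэр",)]


@dataclass
class Name:
    first: str
    last: str


@dataclass
class Person:
    position: str
    name: Name


def lemmas(word: str) -> Set[str]:
    """All lemmas the toy dictionary knows for a word."""
    return {p[0] for p in MORPH.get(word.lower(), [])}


def parses_with_category(word: str, category: str) -> List[Parse]:
    """Parses of a word restricted to one category tag."""
    return [p for p in MORPH.get(word.lower(), []) if p[1] == category]


def agree(*parse_lists: List[Parse]) -> bool:
    """Toy agreement: some (gender, number, case) triple is possible for every word."""
    triples = [{p[2:] for p in parses} for parses in parse_lists]
    return bool(set.intersection(*triples))


def match_person(text: str) -> Optional[Person]:
    """Match the whole text as: position phrase, first name, surname."""
    words = text.split()
    for phrase in POSITIONS:
        n = len(phrase)
        if len(words) != n + 2:
            continue
        # Compare lemmas, so inflected forms of the phrase also match.
        if not all(phrase[i] in lemmas(words[i]) for i in range(n)):
            continue
        first, last = words[n], words[n + 1]
        first_parses = parses_with_category(first, "Name")
        last_parses = parses_with_category(last, "Surn")
        if first_parses and last_parses and agree(first_parses, last_parses):
            # Keep the surface text as written (no normalization).
            return Person(" ".join(words[:n]), Name(first, last))
    return None


print(match_person("управляющий директор Иван Ульянов"))
# Person(position='управляющий директор', name=Name(first='Иван', last='Ульянов'))
```

With this toy table, the genitive phrase `управляющего директора Ивана Ульянова` still matches because both names can be genitive, while `вице-мэр Иван Ульянова` is rejected: `Иван` is only masculine nominative here, and `Ульянова` is either masculine genitive/accusative or feminine nominative.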
## Documentation
All materials are in Russian:
* Overview
* Video from workshop
* Getting started
* Reference
* Cookbook
* Examples
* Code snippets

## Support
- Chat: https://t.me/natural_language_processing
- Issues: https://github.com/natasha/yargy/issues
- Commercial support: https://lab.alexkuk.ru

## Development
Dev env

```bash
brew install graphviz

python -m venv ~/.venvs/natasha-yargy
source ~/.venvs/natasha-yargy/bin/activate

pip install -r requirements/dev.txt
pip install -e .

python -m ipykernel install --user --name natasha-yargy
```

Test + lint

```bash
make test
```

Update docs

```bash
make exec-docs

# Manually check git diff docs/, commit
```

Release
```bash
# Update setup.py version

git commit -am 'Up version'
git tag v0.16.0

git push
git push --tags

# GitHub Action builds dist and publishes to PyPI
```