https://github.com/ppke-nlpg/emmorphpy

A wrapper, a lemmatizer and REST API implemented in Python for emMorph (Humor) Hungarian morphological analyzer

Host: GitHub
URL: https://github.com/ppke-nlpg/emmorphpy
Owner: ppke-nlpg
License: lgpl-3.0
Created: 2018-06-04T14:52:41.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2019-11-06T20:30:23.000Z (over 5 years ago)
Last Synced: 2024-11-17T01:33:16.168Z (8 months ago)
Language: Python
Size: 25.6 MB
Stars: 3
Watchers: 7
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-hungarian-nlp - emMorphPy

README

# __Warning: This repository might not contain the newest version of the source code. Development continues at https://github.com/dlt-rilmta/emmorphpy__

---

# emMorphPy

A wrapper and lemmatizer implemented in Python for the ___emMorph__ (Humor) Hungarian morphological analyzer_

## Requirements

  - (Included in this repository) The compiled FST (hu.hfstol): go to https://github.com/dlt-rilmta/emMorph for compilation details

  - (Included in this repository) The lemmatizer config file: available at https://github.com/dlt-rilmta/hunlp-GATE/blob/master/Lang_Hungarian/resources/hfst/hfst-wrapper.props

  - _hfst-lookup 0.6 (hfst 3.13.0)_ or higher: on Ubuntu 18.04 LTS or later, run `sudo apt install hfst`

  - Python 3 (>=3.5, tested with 3.6)

  - Pip to install the additional requirements in requirements.txt

  - (Optional) a cloud service like [Heroku](https://heroku.com) for hosting the API
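
The requirements above can be checked before first use. The following is a minimal sketch, not part of emMorphPy: the function name `missing_requirements` is hypothetical, and it only tests that the Python version is new enough and that an `hfst-lookup` executable is on `PATH`. It does not verify the hfst-lookup version.

```python
import shutil
import sys

MIN_PYTHON = (3, 5)  # minimum Python version stated in the requirements list


def missing_requirements(binary="hfst-lookup", min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    # Compare only (major, minor) against the required minimum.
    if sys.version_info[:2] < min_python:
        problems.append("Python {}.{} or newer is required".format(*min_python))
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(binary) is None:
        problems.append("'{}' was not found on PATH".format(binary))
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```

Running the file as a script prints one line per missing requirement and prints nothing when both checks pass.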

## Features

 - Stemming and returning the detailed morphological analyses with the proper transducer and config file

 - Handling extra and exceptional lexicons statically and dynamically (see [emmorphpy/emmorphpy.py](https://github.com/ppke-nlpg/emmorphpy/blob/master/emmorphpy/emmorphpy.py) for details)

 - Can be used through a REST API (using [xtsv](https://github.com/dlt-rilmta/xtsv)), or from Python as a library (see usage examples below)

## Install on local machine

  - Clone the repository

  - Run: `sudo pip3 install -r requirements.txt`

  - Use from Python

## Install to Heroku

  - Register

  - Download Heroku CLI

  - Login to Heroku from the CLI

  - Create an app

  - Clone the repository

  - Add Heroku as remote origin

  - Add APT buildpack: `heroku buildpacks:add --index 1 https://github.com/heroku/heroku-buildpack-apt`

  - Add Python buildpack: `heroku buildpacks:add --index 2 heroku/python`

  - Push the repository to Heroku

  - Enjoy!

## Usage

  - From a browser, or from any other HTTP client, through the REST API:

     - Lemmatization: https://emmorph.herokuapp.com/stem/működik

     - Detailed analysis: https://emmorph.herokuapp.com/analyze/működik

     - Lemmatization with the corresponding detailed analysis: https://emmorph.herokuapp.com/dstem/működik

     - The API also supports HTTP POST requests to handle multiple words at once. (See the examples for details.)

	```python
	>>> import requests
	>>> import json
	>>> word = 'működik'
	>>> json.loads(requests.get('https://emmorph.herokuapp.com/stem/' + word).text)[word]
	[{'lemma': 'működik', 'tag': '[/V][Prs.Def.3Pl]'}, {'lemma': 'működik', 'tag': '[/V][Prs.NDef.3Sg]'}]
	>>> json.loads(requests.get('https://emmorph.herokuapp.com/analyze/' + word).text)[word]
	[{'morphana': 'működik[/V]=működ+ik[Prs.Def.3Pl]=ik'}, {'morphana': 'működik[/V]=működ+ik[Prs.NDef.3Sg]=ik'}]
	>>> json.loads(requests.get('https://emmorph.herokuapp.com/dstem/' + word).text)[word]
	[{'lemma': 'működik', 'tag': '[/V][Prs.Def.3Pl]', 'morphana': 'működik[/V]=működ+ik[Prs.Def.3Pl]=ik', 'readable': 'működik[/V]=működ + ik[Prs.Def.3Pl]', 'twolevel': 'm:m ű:ű k:k ö:ö d:d :i :k :[/V] i:i k:k :[Prs.Def.3Pl]'}, {'lemma': 'működik', 'tag': '[/V][Prs.NDef.3Sg]', 'morphana': 'működik[/V]=működ+ik[Prs.NDef.3Sg]=ik', 'readable': 'működik[/V]=működ + ik[Prs.NDef.3Sg]', 'twolevel': 'm:m ű:ű k:k ö:ö d:d :i :k :[/V] i:i k:k :[Prs.NDef.3Sg]'}]
	>>> words = '\n'.join(('form', word, 'word2', ''))  # One word per line (first line is header, trailing newline is needed!)
	>>> words_out = requests.post('https://emmorph.herokuapp.com/stem', files={'file': words}).text.split('\n')
	>>> print(words_out[1].split('\t'))
	['működik', '[{"lemma": "működik", "tag": "[/V][Prs.Def.3Pl]"}, {"lemma": "működik", "tag": "[/V][Prs.NDef.3Sg]"}]']
	>>> words_out = requests.post('https://emmorph.herokuapp.com/analyze', files={'file': words}).text.split('\n')
	>>> print(words_out[1].split('\t'))
	['működik', '[{"morphana": "működik[/V]=működ+ik[Prs.Def.3Pl]=ik"}, {"morphana": "működik[/V]=működ+ik[Prs.NDef.3Sg]=ik"}]']
	>>> words_out = requests.post('https://emmorph.herokuapp.com/dstem', files={'file': words}).text.split('\n')
	>>> print(words_out[1].split('\t'))
	['működik', '[{"lemma": "működik", "tag": "[/V][Prs.Def.3Pl]", "morphana": "működik[/V]=működ+ik[Prs.Def.3Pl]=ik", "readable": "működik[/V]=működ + ik[Prs.Def.3Pl]", "twolevel": "m:m ű:ű k:k ö:ö d:d :i :k :[/V] i:i k:k :[Prs.Def.3Pl]"}, {"lemma": "működik", "tag": "[/V][Prs.NDef.3Sg]", "morphana": "működik[/V]=működ+ik[Prs.NDef.3Sg]=ik", "readable": "működik[/V]=működ + ik[Prs.NDef.3Sg]", "twolevel": "m:m ű:ű k:k ö:ö d:d :i :k :[/V] i:i k:k :[Prs.NDef.3Sg]"}]']
	```

 

  - From Python:

	```python
	>>> import emmorphpy.emmorphpy as emmorph
	>>> m = emmorph.EmMorphPy()
	>>> m.stem('működik')     # Returns a list of lemmatizations (stem and tag pairs)
	[('működik', '[/V][Prs.Def.3Pl]'), ('működik', '[/V][Prs.NDef.3Sg]')]
	>>> m.analyze('működik')  # Returns a list of detailed analyses (the word split into morphemes)
	['működik[/V]=működ+ik[Prs.Def.3Pl]=ik', 'működik[/V]=működ+ik[Prs.NDef.3Sg]=ik']
	>>> m.dstem('működik')    # Returns a list of lemmatizations with the corresponding detailed analyses (stem, tag, and detailed analyses)
	[('működik', '[/V][Prs.Def.3Pl]', 'működik[/V]=működ+ik[Prs.Def.3Pl]=ik', 'működik[/V]=működ + ik[Prs.Def.3Pl]', 'm:m ű:ű k:k ö:ö d:d :i :k :[/V] i:i k:k :[Prs.Def.3Pl]'), ('működik', '[/V][Prs.NDef.3Sg]', 'működik[/V]=működ+ik[Prs.NDef.3Sg]=ik', 'működik[/V]=működ + ik[Prs.NDef.3Sg]', 'm:m ű:ű k:k ö:ö d:d :i :k :[/V] i:i k:k :[Prs.NDef.3Sg]')]
	>>> # Add new analyses to the lexicon (Not a paradigm, but a single analysis!) Format: [('STEM', 'TAG', 'DETAILED_ANALYSIS', 'HFST-OUTPUT')]
	>>> m.lexicon['Obamával'] = [('Obama', '[/N][Nom]', '', ''), ('Obam', '[/N][Nom]', '', ''), ('Obamá', '[/N][Nom]', '', '')]
	>>> # Add new exceptions to the lexicon (Exact matches will be filtered out ASAP!) Format: ('HFST-OUTPUT')
	>>> m.exceptions['almával'] = {'a:a l:l :o m:m :[/N] á:a :[Poss.3Sg] v:v a:a l:l :[Ins]'}
	```
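
The POST examples above build the request body and split the response by hand. The sketch below wraps both steps in two small helpers. The names `build_payload` and `parse_response` are hypothetical and not part of emMorphPy or xtsv. It assumes, based only on the example output above, that the response is tab-separated with one header line followed by one `word<TAB>JSON` line per input word.

```python
import json


def build_payload(words, header="form"):
    """Join words into the one-word-per-line body used by the POST examples."""
    # The first line is the column header; the final empty string
    # produces the required trailing newline.
    return "\n".join([header] + list(words) + [""])


def parse_response(text):
    """Map each word in a tab-separated response to its decoded JSON analyses."""
    results = {}
    for line in text.split("\n")[1:]:  # assumption: the first line is a header
        if "\t" not in line:
            continue  # skip blank or malformed lines
        word, _, payload = line.partition("\t")
        results[word] = json.loads(payload)
    return results
```

With `requests`, the body would be sent as in the examples above, for instance `requests.post(url, files={'file': build_payload(['működik'])})`, and the `.text` of the response passed to `parse_response`.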

## License

This Python wrapper and the lemmatizer implementation are licensed under the LGPL 3.0 license.

xtsv, HFST, the database, and the lemmatizer configuration have their own licenses.
