https://github.com/marty1885/eoparser
Word level Esperanto parser - helps computer understand the structure of Esperanto
https://github.com/marty1885/eoparser
conlang constructed-language esperanto language-parsing nlp
Last synced: 3 months ago
JSON representation
Word level Esperanto parser - helps computer understand the structure of Esperanto
- Host: GitHub
- URL: https://github.com/marty1885/eoparser
- Owner: marty1885
- License: bsd-3-clause
- Created: 2021-01-09T08:15:14.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2023-03-15T08:10:00.000Z (over 2 years ago)
- Last Synced: 2025-03-27T02:30:47.044Z (7 months ago)
- Topics: conlang, constructed-language, esperanto, language-parsing, nlp
- Language: Python
- Homepage:
- Size: 97.7 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# EOParser
|  |
|--------------------------------------------------------------|
| EOParser parsing an article from Vikipedio |## What is it?
`EOParser` is a python library that decomposes Esperanto words into it's particles. This is possible soly due to Esperanto's regular grammar! For example, the word "malsanulo" (sick person) will be decomposed into `mal` (opposite of), `san` (health), `ul` (suffix for a person) and `o` (marking it as a noun).
## Really, what is it?
Techincally `EOParser` is a handwritten GLL parser and disambiguator with a big list of poiible prefixes, suffixes, roots, etc... It uses the LL parser to parse words into it's particles.
## Limitations
EOParser is a word-level analyser. It does not care about the overall grammar and context of a sentence. Hence it may not disambiguate gwords correctly in some cases.
## How to use
```python
>>> import eoparser
>>> parser = eoparser.eoparser()
>>> parser.parse('geparto') # English: parent
[['ge', 'prefix'], ['part', 'root']]
```The `eoparser.parse` method return a list of lists. Storing the particles and their in type order. In the example, `geparto` is decomposed into `ge` (the prefix for both sexes), `part` (the word root for parent) and the part-of-speech marker is omited. Meaning both father and mother.
Use the `keep_ending_marker` if you wish to keep the `-o` part of speech marker.
```python
>>> parser.parse('geparto', keep_ending_marker=True)
[['ge', 'prefix'], ['part', 'root'], ['o', 'pos_marker']]
```This is the list of particle types
| Types |
|----------------------------|
| root |
| pos_marker |
| suffix |
| prefix |
| word (special single word) |### Using a different dictionary
EOParser uses severial dictionaries to track which particles are which. By default it loads from wherever the library lives.
```python
parser = eoparser.eoparser(numbers="my_list_of_numbers.txt")
# Or
my_list_of_numbers = load_numbers()
parser = eoparser.eoparser(numbers=list(my_list_of_numbers))
```| Parameter |
|------------|
| numbers |
| prefixes |
| suffixes |
| roots |
| full_words |
|correlatives|## Contribution
EOParser is by no means perfect! Feel free to submit an issue, open a PR or anything!