https://github.com/fidelisrafael/esperanto-analyzer
Morphological and syntactic analysis of Esperanto sentences
- Host: GitHub
- URL: https://github.com/fidelisrafael/esperanto-analyzer
- Owner: fidelisrafael
- License: bsd-2-clause
- Created: 2018-08-29T20:22:29.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2021-05-06T19:15:21.000Z (about 5 years ago)
- Last Synced: 2025-08-20T19:08:04.073Z (10 months ago)
- Topics: analyzer, esperanto, grammar, grammar-checker, grammar-learning, grammar-parser, grammars, language, language-learning, languages, linguistic-analysis, linguistics, natural-language, natural-language-interface, natural-language-processing, natural-language-programming, python, tokenizer
- Language: Python
- Homepage: https://pypi.org/project/esperanto-analyzer/
- Size: 209 KB
- Stars: 33
- Watchers: 4
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Esperanto Analyzer
----

## Build Status:
### Development:
[Travis CI build](https://travis-ci.com/fidelisrafael/esperanto-analyzer)
[Codecov coverage](https://codecov.io/gh/fidelisrafael/esperanto-analyzer)
### Master:
[Travis CI build](https://travis-ci.com/fidelisrafael/esperanto-analyzer)
[Codecov coverage](https://codecov.io/gh/fidelisrafael/esperanto-analyzer)
---
### Atendu! Kio estas Esperanto? (_Wait! What is Esperanto?_)
That is a fair question! Esperanto is the most widely spoken constructed international auxiliary language (_conlang_) in the world. It was created back in **1887** by a Polish-Jewish man named Ludwik Lejzer Zamenhof (_often referred to as L. L. Zamenhof_). Zamenhof's goal was to **create an easy and flexible language** that would serve as a universal second language to foster peace and international understanding among people from all around the world.
The phonology, grammar, vocabulary, and semantics are based on the **Indo-European** languages spoken in Europe (_Italian_, _Spanish_, _French_, _Catalan_, _Russian_, _German_...). The sound inventory is essentially **Slavic**, as is much of the semantics, whereas the vocabulary derives primarily from the **Romance languages**, with a lesser contribution from **Germanic languages** and minor contributions from **Slavic languages** and **Greek**.
The language now has more than **130 years of history** and culture, and a very active community as well.
Esperanto is a SUPER regular language: it has no **irregular verbs** and no **gender distinction for articles**. Besides this, Esperanto has only **16 grammar rules**.
For example, one of the rules: ALL **Nouns** MUST end with the vowel `o`, e.g.:
- `domo`
- `homo`
- `komputilo`
- `komputilisto`
Likewise, ALL **Adjectives** MUST end with the letter `a`, e.g.:
- `bela`
- `granda`
- `varma`
- `malvarma`
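The two ending rules above are regular enough to sketch in a few lines of Python. The helper below is a hypothetical illustration, not the code this library uses. It also assumes the standard plural `j` and accusative `n` suffixes, which the rules quoted above do not mention.

```python
import re

# Hypothetical patterns for illustration only; not taken from esperanto_analyzer.
# Assumption: a word may also carry the plural `j` and/or accusative `n` suffix.
NOUN_PATTERN = re.compile(r"^\w+o(j)?(n)?$")
ADJECTIVE_PATTERN = re.compile(r"^\w+a(j)?(n)?$")


def classify_by_ending(word):
    """Return 'Noun', 'Adjective' or None, looking only at the word ending."""
    lowered = word.lower()
    if NOUN_PATTERN.match(lowered):
        return "Noun"
    if ADJECTIVE_PATTERN.match(lowered):
        return "Adjective"
    return None


for word in ["domo", "komputilisto", "bela", "malvarma", "estas"]:
    print(word, classify_by_ending(word))
```

A check this naive would misclassify short function words such as the article `la`, which is one reason a real analyzer needs more than ending rules.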
If you want to know (or learn) more about Esperanto, you should read the following links:
- [Esperanto at Wikipedia](https://www.wikiwand.com/en/Esperanto)
- [Kio estas Esperanto? (in Esperanto)](https://lernu.net/eo/esperanto) or in [English](https://lernu.net/es/esperanto)
- Esperanto course at Duolingo for: [[English speakers]](https://www.duolingo.com/course/eo/en/Learn-Esperanto-Online), [[Portuguese speakers]](https://www.duolingo.com/course/eo/pt/Learn-Esperanto-Online), [[Spanish speakers]](https://www.duolingo.com/course/eo/es/Learn-Esperanto-Online)
- [Esperanto course at Lernu.net](http://lernu.net/kurso)
- [YouTube series: Esperanto estas...](https://www.youtube.com/watch?v=RlftmTm8I18&list=PL83728C14BFC5822F)
---
## About this project
The aim of this project is to create a tool that can read and grammatically classify Esperanto sentences.
The first part of the project consists of the **Morphological Analysis** of Esperanto words; the next step is to create a **Syntactical Analyzer** for the language as well.
---
## How to use it?
### Demo
You can check out the demo application built with React: [Online Demo](https://fidelisrafael.github.io/esperanto-analyzer-react/) or [GitHub Repository](https://github.com/fidelisrafael/esperanto-analyzer-react/)
Or you can try the demo API hosted on Heroku:
[https://esperanto-analyzer-api.herokuapp.com/analyze?sentence=Esperanto%20estas%20tre%20facila%20lingvo%20al%20lerni](https://esperanto-analyzer-api.herokuapp.com/analyze?sentence=Esperanto%20estas%20tre%20facila%20lingvo%20al%20lerni)
---
## Installation
First, install it:
```bash
$ pip install esperanto-analyzer
```
## CLI usage:
[TODO] (Skip it for now)
Now you will have the library's source files on your system, as well as the `eo-analyzer` executable available from the command line. Test it:
```bash
$ eo-analyzer --version
> Version: 0.0.1
```
```sh
$ eo-analyzer "Jen la alfabeto de Esperanto. Ĉiu litero ĉiam sonas same kaj literumado estas perfekte regula. Klaku la ekzemplojn por aŭdi la elparolon!"
```

Pretty cool, huh?
## Python library usage
Ok, so now you want to import this library in your project, right? That's super simple, just drop these lines in your project:
### Morphological analysis of sentences
```py
from esperanto_analyzer import MorphologicalSentenceAnalyzer
# Create an instance to morphologically analyze one sentence
analyzer = MorphologicalSentenceAnalyzer("Esperanto estas tre facila lingvo al lerni.")
analyzer.analyze() # => Returns True/False
# This is the simplest human-readable form of the morphological analysis results
print(analyzer.simple_results())
# => [['Esperanto', 'Noun'], ['estas', 'Verb'], ['tre', 'Adverb'], ['facila', 'Adjective'], ['lingvo', 'Noun'], ['al', 'Preposition'], ['lerni', 'Verb']]
```
But you can always work with a more complex result set if you (or rather, your software) want or need to:
```py
# This call returns a list with a more complex structure than the `simple_results()` method
results = analyzer.analyzes_results()
first_analyze = results[0]
# Returns a list of `AnalyzeResult` objects
print(results)
# => [, ,(...)]
print(first_analyze)
# =>
# Rich and detailed results using `AnalyzeResult`
print(first_analyze.result)
# =>
# Get any information that you might need using the response objects API
print((first_analyze.result.raw_word, first_analyze.result.matches, first_analyze.result.word_class() ))
# => ('Esperanto', , )
```
---
### Morphological analysis of a single word
You can also use the internal **word** analyzers directly if you want to, e.g.:
```py
from esperanto_analyzer.analyzers.morphological import AdjectiveMorphologicalAnalyzer, NumeralMorphologicalAnalyzer
# There are `10` morphological analyzers in total, such as `VerbMorphologicalAnalyzer` and `NumeralMorphologicalAnalyzer`
analyzer = AdjectiveMorphologicalAnalyzer('belajn')
# If it returns True, the input word is a valid adjective. False otherwise
analyzer.analyze() # => returns True/False
print(analyzer.matches)
# =>
print(analyzer.raw_word) # => 'belajn'
# The `word` property is an instance of a class that inherits from the `Word` class.
print(analyzer.word)
# =>
# Get the base class name for the detected 'Part of Speech' class
print(analyzer.word.__class__.__name__) # => 'Adjective'
numeral_analyzer = NumeralMorphologicalAnalyzer('naŭcent')
numeral_analyzer.analyze() # => True
print(numeral_analyzer.word)
# =>
print(numeral_analyzer.matches)
# =>
```
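When the word class is not known in advance, one option is to try several analyzers in turn and keep the first one that matches. The sketch below assumes only what the example above shows: an analyzer takes the word in its constructor and exposes `analyze()` returning `True`/`False`. The `Fake...` classes are hypothetical stand-ins so the code runs without the library. Passing real classes such as `AdjectiveMorphologicalAnalyzer` should work the same way, but that is untested here.

```python
class EndsWithAnalyzer:
    """Stand-in analyzer (hypothetical) so the sketch runs without the library."""
    suffix = ""

    def __init__(self, raw_word):
        self.raw_word = raw_word

    def analyze(self):
        return self.raw_word.lower().endswith(self.suffix)


class FakeNounAnalyzer(EndsWithAnalyzer):
    suffix = "o"


class FakeAdjectiveAnalyzer(EndsWithAnalyzer):
    suffix = "a"


def first_matching_analyzer(word, analyzer_classes):
    """Return the first analyzer instance whose analyze() is truthy, else None."""
    for analyzer_class in analyzer_classes:
        analyzer = analyzer_class(word)
        if analyzer.analyze():
            return analyzer
    return None


match = first_matching_analyzer("bela", [FakeNounAnalyzer, FakeAdjectiveAnalyzer])
print(type(match).__name__)  # => FakeAdjectiveAnalyzer
```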
---
### Parts of Speech: Word, Article, Adverb, Adjective, Verb...
You can even use the **Parts of Speech** (such as `Article`, `Adverb`, `Pronoun`, `Conjunction`) of the language:
```py
# `esperanto_analyzer.speech` is home for all parts-of-speech classes
from esperanto_analyzer.speech import Article
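# Raises an `InvalidArticleError` exception, since 'lo' is not an Esperanto article
try:
    article = Article('lo')
except Exception as error:  # caught broadly so the rest of the example keeps running
    print(type(error).__name__)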
# 'La' is the ONLY valid article in Esperanto
valid_article = Article('la')
# All `esperanto_analyzer.speech` objects inherit from the `esperanto_analyzer.speech.word.Word` class
print(valid_article.__class__.__bases__) # => (esperanto_analyzer.speech.word.Word,)
# 'La' is an invariable article: it is the same in singular and plural sentences, e.g.:
# 'La domo' # The house
# 'La domoj' # The houses
print(valid_article.plural) # => False
# You can provide some `context` when creating the `Part of Speech` so it can determine whether the word should be plural or singular, e.g.:
print(Article('la', 'domoj').plural) # => True
```
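To make the behaviour above more concrete, here is a minimal sketch of the validate-on-construction pattern it describes. `SketchArticle` and `SketchInvalidArticleError` are hypothetical names, and the plural rule (a context word ending in `j` or `jn`) is an assumption. This is not how the library itself is implemented.

```python
class SketchInvalidArticleError(Exception):
    """Hypothetical error type mirroring the `InvalidArticleError` named above."""


class SketchArticle:
    """Hypothetical stand-in for `Article`; not the library's implementation."""

    VALID_ARTICLES = ("la",)

    def __init__(self, content, context=None):
        if content.lower() not in self.VALID_ARTICLES:
            raise SketchInvalidArticleError(f"{content!r} is not an Esperanto article")
        self.content = content
        # Assumption: the context word is plural when it ends in `j` or `jn`.
        self.plural = bool(context) and context.lower().endswith(("j", "jn"))


print(SketchArticle("la").plural)           # => False
print(SketchArticle("la", "domoj").plural)  # => True
```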
---
## Development Setup
Clone this repository:
```bash
$ git clone https://github.com/fidelisrafael/esperanto-analyzer.git
$ cd esperanto-analyzer
```
Make sure you have `python` >= `3.7.0` and `virtualenv` >= `16.0.0` installed:
```bash
$ python --version
> Python 3.7.0
$ virtualenv --version
> 16.0.0
```
Otherwise, [install it](https://virtualenv.pypa.io/en/stable/installation/).
Then, create a new `virtualenv` and activate it:
```bash
$ virtualenv venv
$ source venv/bin/activate
```
Install the dependencies for the development and test environments:
```bash
# If you just want to install the needed dependencies for production, just run: `make init`
$ make init_dev
> pip install -r development_requirements.txt
> pip install -r test_requirements.txt
> pip install -r requirements.txt
```
Run the tests:
```bash
$ make test
> pytest tests --cov-config .coveragerc --cov=esperanto_analyzer --cov-report=html
> =============================================================================== test session starts ================================================================================
> platform darwin -- Python 3.7.0, pytest-3.7.4, py-1.6.0, pluggy-0.7.1
> rootdir: /(...)/esperanto_analyzer, inifile:
> plugins: cov-2.5.1
> collected 492 items
> (...)
> ====================================================================== 492 passed, 2 warnings in 2.61 seconds ======================================================================
```
You can follow the code coverage stats by opening `coverage/index.html`.
### Note: This library has **100%** code coverage at the time of this writing!
---
### Built-in JSON Web API
**_Note: This web API will be published as a separate package in the near future._**
This library comes with a very simple HTTP server built on top of Flask to provide a web API for integration with other systems. You can start the HTTP server by running the following make task in the root folder of the project:
```bash
$ make web_api # or simply run: python esperanto_analyzer/web/runserver.py
> python esperanto_analyzer/web/runserver.py
> * Serving Flask app "esperanto_analyzer.web.api.server" (lazy loading)
> * Environment: production
> WARNING: Do not use the development server in a production environment.
> Use a production WSGI server instead.
> * Debug mode: on
> * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
```
Or you can just run it from inside any python project with:
```py
from esperanto_analyzer.web import run_app
run_app(debug=True, port=9090)
# * Serving Flask app "esperanto_analyzer.web.api.server" (lazy loading)
# * Environment: production
# WARNING: Do not use the development server in a production environment.
# Use a production WSGI server instead.
# * Debug mode: off
# * Running on http://127.0.0.1:9090/ (Press CTRL+C to quit)
```
This server has auto-reload (or hot-reload) enabled by default, so you don't need to restart the server when you change the source code.
To test it:
```bash
curl "http://127.0.0.1:5000/analyze?sentence=Kio%20estas%20Esperanto%3F%20%C4%9Ci%20estas%20lingvo%20tre%20ta%C5%ADga%20por%20internacia%20komunikado."
```
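The sentence in the request above has to be percent-encoded, including the non-ASCII Esperanto letters. A minimal sketch of building such a URL from Python with the standard library follows. The `/analyze` path and the `sentence` parameter come from the `curl` example, while the helper name `build_analyze_url` is hypothetical.

```python
from urllib.parse import quote


def build_analyze_url(base_url, sentence):
    """Build an `/analyze` URL with the sentence percent-encoded as UTF-8."""
    return f"{base_url.rstrip('/')}/analyze?sentence={quote(sentence, safe='')}"


url = build_analyze_url("http://127.0.0.1:5000", "Kiel vi fartas?")
print(url)  # => http://127.0.0.1:5000/analyze?sentence=Kiel%20vi%20fartas%3F
```

The resulting string can then be passed to any HTTP client while the server is running.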
### HTTP API Deploy
If you need an API (like [this one](https://esperanto-analyzer-api.herokuapp.com/analyze?sentence=Esperanto%20estas%20tre%20facila%20lingvo%20al%20lerni)), you can easily deploy this project to `Heroku`, since it comes with a `Procfile`. This takes no more than 4 commands:
Note: You will need [Heroku's CLI](https://devcenter.heroku.com/articles/heroku-cli) for this.
```bash
$ git clone https://github.com/fidelisrafael/esperanto-analyzer.git
$ cd esperanto-analyzer
$ heroku create my-esperanto-analyzer
> Creating ⬢ my-esperanto-analyzer... done
$ git push heroku master:master
# Open https://my-esperanto-analyzer.herokuapp.com/analyze?sentence=Kiel%20vi%20fartas
$ heroku open '/analyze?sentence=Kiel%20vi%20fartas?'
```
---
## How does it work?
This library can be used in a myriad of ways to analyze Esperanto sentences and words. For a complete reference of the API and all the possibilities, check the 'Full API' section.
[TODO]
---
## Roadmap
- :white_medium_small_square: Create syntactical analyzers
- :white_medium_small_square: Update this Roadmap with more plans
- :white_check_mark: Front-end application. (Done, [check it out](https://fidelisrafael.github.io/esperanto-analyzer-react/))
---
## :thumbsup: Contributing
Bug reports and pull requests are welcome on GitHub at http://github.com/fidelisrafael/esperanto-analyzer. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
---
## :memo: License
The library is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).