https://github.com/mideind/yfirlestur
The yfirlestur.is web application.
# Yfirlestur
### Spelling and grammar correction for Icelandic
*Yfirlestur.is* is a web application where you can enter or submit
Icelandic text and have it checked for spelling and grammar errors.
The tool also gives hints on words and structures that might not be appropriate,
depending on the intended audience for the text.

Try Yfirlestur (in Icelandic) at [https://yfirlestur.is](https://yfirlestur.is)!

*Text with annotations, as displayed by Yfirlestur.is*

The core spelling and grammar checking functionality of Yfirlestur.is is provided by the
[GreynirCorrect](https://github.com/mideind/GreynirCorrect) engine, by the same authors.

User feedback is greatly appreciated, either through GitHub Issues
or by e-mail to [mideind@mideind.is](mailto:mideind@mideind.is).

## HTTPS API
In addition to its graphical web front-end, Yfirlestur.is exposes a public
HTTPS/JSON application programming interface (API) to perform spelling and grammar
checking.

### From the command line

This API can, for example, be accessed with `curl` from the Linux/macOS command line
as follows (try it!):

```bash
curl https://yfirlestur.is/correct.api -d "text=Manninum á verkstæðinu vantar hamar"
```

...or, of course, via an HTTPS `POST` from your own code; see below.
All text is assumed to be encoded in UTF-8.

The example returns the following JSON (shown indented, for ease of reading):
```json
{
  "result": [
    [
      {
        "annotations": [
          {
            "code":"P_WRONG_CASE_þgf_þf",
            "detail":"Sögnin 'að vanta' er ópersónuleg. Frumlag hennar á að vera í þolfalli í stað þágufalls.",
            "start":0,
            "end":2,
            "start_char":0,
            "end_char":21,
            "suggest":"Manninn á verkstæðinu",
            "text":"Á líklega að vera 'Manninn á verkstæðinu'"
          }
        ],
        "corrected":"Manninum á verkstæðinu vantar hamar",
        "tokens": [
          {"k":6,"x":"Manninum"},
          {"k":6,"x":"á"},
          {"k":6,"x":"verkstæðinu"},
          {"k":6,"x":"vantar"},
          {"k":6,"x":"hamar"}
        ]
      }
    ]
  ],
  "stats":
  {
    "ambiguity":1.0,
    "num_parsed":1,
    "num_sentences":1,
    "num_tokens":5
  },
  "text":"Manninum á verkstæðinu vantar hamar",
  "valid":true
}
```

The `result` field contains the result of the annotation, as a list of paragraphs,
each containing a list of sentences, each containing a list of annotations (under
the `annotations` field). Of course, if a sentence is correct and has no annotations,
its annotation list will be empty. An overview of error codes used in annotations is available [here](https://github.com/mideind/GreynirCorrect/blob/master/doc/errorcodes.rst).

Each sentence entry has a field containing a `corrected` version of it, where
likely errors have been corrected. The `corrected` string includes corrections
of most spelling errors but only a subset of suspected grammar errors;
the system is intentionally less aggressive about automatically applying those
(as can be seen in the example above).

Sentence entries also contain a list of `tokens`. The tokens
originate in the [Tokenizer package](https://github.com/mideind/Tokenizer) and contain the following fields:

- `i`: Character index of token start.
- `k`: Number identifying the token type (WORD, DATEREL, AMOUNT, etc.). The mapping from numbers to token types can be found in the documentation for the [Tokenizer package](https://github.com/mideind/Tokenizer).
- `o`: Original token text.
- `x`: Corrected text of token.

Other possible fields:

- `s`: Lemma of word. It can contain '-' if the lemma does not appear in BÍN and the word has been identified as a compound word.
- `c`: Part-of-speech (kk/kvk/hk, so, lo, ao, fs, st, etc.).
- `b`: Inflectional form given in BÍN. Can be '-' if the word cannot be inflected.
- `t`: Terminal that the token is connected to in the CFG.
- `v`: Token value (if applicable). Number, amount, date or name of currency.
- `f`: BÍN category (alm, ism, fyr, örn, etc.).

Each annotation applies to a span of sentence tokens, starting
at the token whose index is given in `start` and ending with the token
whose index is in `end`. Both indices are 0-based and inclusive.
Also, a starting character index is found in `start_char` and an ending
index in `end_char`. Again, both are 0-based and inclusive. Note that these
are character indices within the original source string, not byte indices.

An annotation has a `code` which uniquely determines the type
of error or warning. If the code ends with `/w`, it is a warning; otherwise
it is an error.

An annotation has a short, human-readable `text` field which describes
the annotation succinctly, as well as a `detail` field which has further detail
on the annotation, possibly containing grammatical explanations.

Finally, some annotations contain a `suggest` field with text that could
replace the text within the token span, if the user agrees with
the suggestion being made.

The result JSON further includes a `stats` field with information about
the annotation job, such as the number of tokens and sentences processed,
and how many of those sentences could be parsed. The `valid` field is
`true` if the request was correctly formatted and could be processed
without error, or `false` if there was a problem.

#### Options
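
Options are sent in the same request as the `text` field. As a rough sketch of the `application/json` variant, the following builds such a request, without sending it, using only Python's standard library. The helper name `build_request` is hypothetical and not part of Yfirlestur; the option keys are the ones documented in this section, and the word passed to `ignore_wordlist` is only an illustration.

```python
import json
import urllib.request

API_URL = "https://yfirlestur.is/correct.api"


def build_request(text, **options):
    """Build (but do not send) a JSON POST request for /correct.api.

    Hypothetical helper: option names are passed through unchanged
    as JSON properties next to the mandatory "text" property.
    """
    payload = {"text": text, **options}
    # The API assumes UTF-8, so encode the JSON body explicitly
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request(
    "Manninum á verkstæðinu vantar hamar",
    suppress_suggestions=True,
    ignore_wordlist=["hamar"],
)
print(req.get_method(), req.full_url)
print(sorted(json.loads(req.data.decode("utf-8"))))

# To actually send the request (requires network access):
# with urllib.request.urlopen(req) as f:
#     resp = json.load(f)
```

Building the request separately from sending it makes it easy to inspect the exact payload before any network traffic takes place.
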
The `/correct.api` endpoint supports several options that can be included
with the request data, either as additional form fields (for `x-www-form-urlencoded`
requests) or JSON properties (for `application/json` requests).

| Key                         | Type | Default | Explanation                               |
| --------------------------- | ---- | ------- | ----------------------------------------- |
| annotate_unparsed_sentences | bool | true    | Annotate sentence even when parsing fails |
| suppress_suggestions        | bool | false   | Don't return suggestions                  |
| ignore_wordlist             | list | []      | Words to accept without comment           |
| ignore_rules                | list | []      | Rules to ignore when annotating           |

As an example, to suppress suggestions:

```bash
curl https://yfirlestur.is/correct.api -d "text=Manninum á verkstæðinu vantar hamar&suppress_suggestions=true"
```

### From Python
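
The response fields described in the command-line section (inclusive token and character indices, `/w` warning codes, `suggest`) map onto a few small Python helpers. The following is a minimal offline sketch, not part of Yfirlestur itself: the helper names are hypothetical, and `SAMPLE` is an abridged copy of the sample response shown earlier, so no network access is needed.

```python
# Abridged copy of the sample response shown earlier in this README
SOURCE = "Manninum á verkstæðinu vantar hamar"
SAMPLE = {
    "result": [[{
        "annotations": [{
            "code": "P_WRONG_CASE_þgf_þf",
            "start": 0, "end": 2,
            "start_char": 0, "end_char": 21,
            "suggest": "Manninn á verkstæðinu",
            "text": "Á líklega að vera 'Manninn á verkstæðinu'",
        }],
        "corrected": SOURCE,
        "tokens": [{"k": 6, "x": w} for w in SOURCE.split()],
    }]],
    "text": SOURCE,
    "valid": True,
}


def is_warning(ann):
    """Codes ending in '/w' are warnings; all others are errors."""
    return ann["code"].endswith("/w")


def char_span(source, ann):
    """Source text covered by an annotation; end_char is inclusive, hence +1."""
    return source[ann["start_char"]:ann["end_char"] + 1]


def token_span(sent, ann):
    """Token texts covered by an annotation; end is inclusive, hence +1."""
    return [t["x"] for t in sent["tokens"][ann["start"]:ann["end"] + 1]]


def apply_suggestion(source, ann):
    """Replace the annotated characters with 'suggest', if the field is present."""
    if "suggest" not in ann:
        return source
    return source[:ann["start_char"]] + ann["suggest"] + source[ann["end_char"] + 1:]


for pg in SAMPLE["result"]:
    for sent in pg:
        for ann in sent["annotations"]:
            kind = "warning" if is_warning(ann) else "error"
            print(kind, ann["code"], token_span(sent, ann))
            print(apply_suggestion(SAMPLE["text"], ann))
```

Note that `apply_suggestion` handles a single annotation. When applying several suggestions to the same text, working from the last annotation to the first keeps the earlier character indices valid.
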
As an example of accessing the Yfirlestur API from Python, here is
a short demo program which submits two paragraphs of text to the
spelling and grammar checker:

```python
# $ pip install requests
import requests

# The text to check, two paragraphs of two and one sentences, respectively
my_text = (
    "Manninum á verkstæðinu vanntar hamar. Guðjón setti kókið í kælir.\n"
    "Mér dreimdi stórann brauðhleyf."
)

# Make the POST request, submitting the text
# Include additional keys in the dict if you want to specify options,
# such as dict(text=my_text, suppress_suggestions=True)
rq = requests.post("https://yfirlestur.is/correct.api", data=dict(text=my_text))

# Retrieve the JSON response
resp = rq.json()

# Enumerate through the returned paragraphs, sentences and annotations
for ix, pg in enumerate(resp["result"]):
    print(f"\n{ix+1}. efnisgrein")
    for sent in pg:
        print(f" {sent['corrected']}")
        for ann in sent["annotations"]:
            print(
                f" {ann['start']:03} {ann['end']:03} "
                f"{ann['code']:20} {ann['text']}"
            )
```

This program prints the following output:
```bash
$ python test.py

1. efnisgrein
 Manninum á verkstæðinu vantar hamar.
 000 002 P_WRONG_CASE_þgf_þf  Á líklega að vera 'Manninn á verkstæðinu'
 003 003 S004                 Orðið 'vanntar' var leiðrétt í 'vantar'
 Guðjón setti kókið í kælir.
 004 004 P_NT_EndingIR        Á sennilega að vera 'kæli'

2. efnisgrein
 Mér dreymdi stóran brauðhleif.
 000 000 P_WRONG_CASE_þgf_þf  Á líklega að vera 'Mig'
 001 001 S004                 Orðið 'dreimdi' var leiðrétt í 'dreymdi'
 002 002 S001                 Orðið 'stórann' var leiðrétt í 'stóran'
 003 003 S004                 Orðið 'brauðhleyf' var leiðrétt í 'brauðhleif'
```

The open source *GreynirCorrect* engine that powers Yfirlestur.is
is further [documented here](https://yfirlestur.is/doc/).

## Running for development
The service can be packaged and started in development mode using
[Docker](https://www.docker.com). Run the following commands to start the service
and expose it via HTTP on port 5002:```bash
# Set internal Gunicorn (WSGI web server) user and password
if [ ! -f "./gunicorn_user.txt" ]; then
echo 'root' > gunicorn_user.txt
echo 'root' >> gunicorn_user.txt
fidocker build -t yfirlestur:latest .
docker run -it -p 5002:5002 yfirlestur
```For production use, the Docker module should be packaged inside a robust server
such as [nginx](https://www.nginx.com), and the [Gunicorn](https://gunicorn.org)
user should be configured appropriately.## Acknowledgements
Parts of this software were developed under the auspices of the
Icelandic Government's 5-year Language Technology Programme for Icelandic,
managed by Almannarómur. The LT Programme is described
[here](https://www.stjornarradid.is/lisalib/getfile.aspx?itemid=56f6368e-54f0-11e7-941a-005056bc530c)
(English version [here](https://clarin.is/media/uploads/mlt-en.pdf)).

## Copyright and licensing

Yfirlestur.is is Copyright © 2023 [Miðeind ehf.](https://mideind.is)
The original author of this software is *Vilhjálmur Þorsteinsson*.

This software is licensed under the **MIT License**:

*Permission is hereby granted, free of charge, to any person*
*obtaining a copy of this software and associated documentation*
*files (the "Software"), to deal in the Software without restriction,*
*including without limitation the rights to use, copy, modify, merge,*
*publish, distribute, sublicense, and/or sell copies of the Software,*
*and to permit persons to whom the Software is furnished to do so,*
*subject to the following conditions:*

**The above copyright notice and this permission notice shall be**
**included in all copies or substantial portions of the Software.**

*THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,*
*EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF*
*MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.*
*IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY*
*CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,*
*TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE*
*SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.*

If you would like to use this software in ways that are incompatible
with the standard MIT license, [contact Miðeind ehf.](mailto:mideind@mideind.is)
to negotiate custom arrangements.