{"id":20673716,"url":"https://github.com/cpg314/ltapiserv-rs","last_synced_at":"2025-04-19T19:55:45.639Z","repository":{"id":184635088,"uuid":"671980511","full_name":"cpg314/ltapiserv-rs","owner":"cpg314","description":"Server implementation of the LanguageTool API for offline grammar and spell checking, based on nlprule and symspell. And a small graphical command-line client.","archived":false,"fork":false,"pushed_at":"2024-08-18T19:45:15.000Z","size":509,"stargazers_count":17,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-29T12:35:05.907Z","etag":null,"topics":["api","grammar","languagetool","rust","spellchecking"],"latest_commit_sha":null,"homepage":"https://c.pgdm.ch/code","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cpg314.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-28T15:44:53.000Z","updated_at":"2025-03-15T14:17:17.000Z","dependencies_parsed_at":null,"dependency_job_id":"552461e8-54ae-4fad-b041-986ddd5b76a5","html_url":"https://github.com/cpg314/ltapiserv-rs","commit_stats":null,"previous_names":["cpg314/ltapiserv-rs"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cpg314%2Fltapiserv-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cpg314%2Fltapiserv-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cpg314%2Fltapiserv-rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/cpg314%2Fltapiserv-rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cpg314","download_url":"https://codeload.github.com/cpg314/ltapiserv-rs/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249787037,"owners_count":21325569,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","grammar","languagetool","rust","spellchecking"],"created_at":"2024-11-16T20:42:20.099Z","updated_at":"2025-04-19T19:55:45.592Z","avatar_url":"https://github.com/cpg314.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"![ltapiserv-rs](doc/logo.webp)\n\nThis provides a **lightweight alternative backend implementation** of the LanguageTool API for **offline grammar and spell checking**, based on:\n\n- [nlprule](https://github.com/bminixhofer/nlprule) for grammar and style checking, using the [LanguageTool rules](https://github.com/languagetool-org/languagetool).\n- [symspell](https://github.com/reneklacan/symspell) for spell-checking.\n\nA simple command-line client, `ltapi-client`, is also provided, displaying results graphically with [miette](https://docs.rs/miette/latest/miette/).\n\n## Quick start\n\n1. [Install the server](#installation), which is a single binary with no dependencies, which can be run as a systemd service. Debian/Ubuntu and Arch packages are provided. Alternatively, a Docker image is available.\n2. [Configure your client](#usage-clients) (browser, editor, `ltapi-client`, ...) 
to use the local server.\n\n## Screenshots\n\n![Illustration](doc/illustration.png) \\\n_Using the `ltapiserv-rs` server with the official LanguageTool browser extension._\n\n![Command line interface](doc/client.png)\n_Output from the client CLI_\n\n## Background\n\n### LanguageTool\n\n[LanguageTool](https://languagetool.org/) is an open-source alternative to [Grammarly](https://www.grammarly.com/) for natural language linting (spelling, grammar, style), with a [large set of rules](https://community.languagetool.org/). Multiple clients exist for its API, bringing its functionality to [Firefox](https://addons.mozilla.org/firefox/addon/languagetool/), [Chrome](https://chrome.google.com/webstore/detail/grammar-and-spell-checker/oldceeleldhonbafppcapldpdifcinji?utm_source=lt-homepage\u0026utm_medium=referral), [LibreOffice](https://languagetool.org/libre-office), [Thunderbird](https://languagetool.org/thunderbird), [emacs](https://github.com/emacs-languagetool/flycheck-languagetool), and many more.\n\n### Self-hosting LanguageTool\n\nWhile most users access LanguageTool through the official hosted server (with a free or paid plan), the Java API server can be [hosted locally](https://dev.languagetool.org/http-server), which can be particularly desirable for privacy reasons (e.g. when editing confidential documents).\n\nEven though the browser extensions are unfortunately [closed-source](https://forum.languagetool.org/t/license-and-source-code-for-firefox-add-on/3851), they still allow custom servers to be specified.\n\n### Lightweight LanguageTool API server\n\n[Benjamin Minixhofer](https://bmin.ai/) wrote a Rust crate, [`nlprule`](https://github.com/bminixhofer/nlprule), that is able to parse and then apply LanguageTool rules noticeably faster than the original Java implementation (see [this benchmark](https://github.com/bminixhofer/nlprule)). 
More complex rules written in Java are not supported and spellchecking is not implemented, but nevertheless roughly 85% of the LanguageTool grammar rules (as of 2021) are available.\n\nUsing `nlprule` and [`symspell`](https://crates.io/crates/symspell) (for spell-checking), we can implement a simple LanguageTool API server in Rust that can then be called from a variety of contexts using LanguageTool clients.\n\nThe code and binaries can be found on \u003chttps://github.com/cpg314/ltapiserv-rs\u003e.  \nSee the `README` there for the configuration as a `systemd` service as well as the setup of the clients.\n\n### Comparison with the Java server\n\nRunning [H.G. Wells' War of the Worlds](https://en.wikipedia.org/wiki/The_War_of_the_Worlds) (~6k lines and 62k words) through the two servers, using [hyperfine](https://github.com/sharkdp/hyperfine) and [httpie](https://httpie.io/docs/cli), we get:\n\n```console\n$ docker pull erikvl87/languagetool\n$ docker run --rm -p 8010:8010 erikvl87/languagetool\n$ for port in 8875 8010; do http --form  POST http://localhost:$port/v2/check \\\n                            language=en-us text=@wells.txt | jq \".matches|length\"; done\n1490\n1045\n$ hyperfine -L port 8875,8010 --runs 10 'http --ignore-stdin --meta --form  POST \\\n                   http://localhost:{port}/v2/check language=en-us text=@wells.txt'\n```\n\nThe additional matches reported by `ltapiserv-rs` (1490, against 1045 for the Java server) seem to be mostly false positives from the spell-checking.\n\n| Command        |       Mean [s] | Min [s] | Max [s] |    Relative |\n| :------------- | -------------: | ------: | ------: | ----------: |\n| `ltapiserv-rs` | 16.002 ± 0.629 |  15.566 |  17.745 |        1.00 |\n| `java`         | 30.594 ± 2.372 |  29.569 |  37.296 | 1.91 ± 0.17 |\n\nWith only a paragraph (to simulate something close to the normal use of LanguageTool, say in emails):\n\n| Command        |  
 Mean [ms] | Min [ms] | Max [ms] | Relative |\n| :------------- | ----------: | -------: | -------: | -------: |\n| `ltapiserv-rs` | 379.7 ± 9.3 |    362.6 |    393.4 |     1.00 |\n\n## Installation\n\nAny of the following methods will make a server available at http://localhost:8875\n\nBy default, the custom dictionary is located in `~/.local/share/ltapiserv-rs/dictionary.txt`. A different path can be passed via the `--dictionary` option. The contents are automatically reloaded on file change.\n\n### Docker\n\n```console\n$ docker run -d --name ltapiserv-rs -p 8875:8875 -v ~/.local/share/ltapiserv-rs:/data ghcr.io/cpg314/ltapiserv-rs:0.2.2\n$ docker logs -f ltapiserv-rs\n```\n\n### Debian/Ubuntu and Arch packages\n\nPackages are available from the [releases page](https://github.com/cpg314/ltapiserv-rs/releases), containing the server and `ltapi-client`. They will install a `systemd` service definition in `/usr/lib/systemd/user/ltapiserv-rs.service`, which can be enabled with:\n\n```console\n$ systemctl --user enable --now ltapiserv-rs\n$ # Check status\n$ systemctl --user status ltapiserv-rs\n$ # Check logs\n$ journalctl --user -u ltapiserv-rs -f\n$ # After updating:\n$ systemctl --user restart ltapiserv-rs\n```\n\n### tar.gz archive\n\nFor other distributions, standalone binaries are also available from the [releases page](https://github.com/cpg314/ltapiserv-rs/releases).\n\n```console\n$ sudo cp ltapiserv-rs /usr/local/bin\n$ sudo chmod +x /usr/local/bin/ltapiserv-rs\n$ ln -s $(pwd)/ltapiserv-rs.service ~/.config/systemd/user/ltapiserv-rs.service\n$ systemctl --user daemon-reload \u0026\u0026 systemctl --user enable --now ltapiserv-rs\n$ systemctl --user status ltapiserv-rs\n$ # After updating:\n$ systemctl --user restart ltapiserv-rs\n```\n\nSee the above remark about the custom dictionary.\n\n## Usage / clients\n\nThe following clients have been tested. 
The server should be compatible with others, but there might be idiosyncrasies; don't hesitate to send a PR.\n\n### Browser extensions\n\nInstall the official LanguageTool browser extension (e.g. for [Chrome](https://languagetool.org/chrome) or [Firefox](https://languagetool.org/firefox)) and configure it to use your local server:\n\n![Chrome extension settings](doc/chrome_ext.png)\n\n### Command line client\n\nA command line client, `ltapi-client`, is also included in this project.\n\n```\nRun text through a LanguageTool server and display the results\n\nUsage: ltapi-client [OPTIONS] --server \u003cSERVER\u003e [FILENAME]\n\nArguments:\n  [FILENAME]  Filename; if not provided, will read from stdin\n\nOptions:\n  -l, --language \u003cLANGUAGE\u003e        [default: en-US]\n  -s, --server \u003cSERVER\u003e            Server base URL [env: LTAPI_SERVER=http://localhost:8875]\n      --json                       JSON output\n      --suggestions \u003cSUGGESTIONS\u003e  Number of suggestions to display [default: 3]\n      --pandoc                     Convert to plaintext with pandoc, removing code blocks. Line numbers are not preserved.\n  -h, --help                       Print help\n```\n\n- The return code will be `1` if any error is detected. 
The server address can be configured through the `LTAPI_SERVER` environment variable.\n- If `pandoc` is installed, the client can use it to convert input files into plain text.\n\nThe client uses [miette](https://docs.rs/miette/latest/miette/index.html) to get a nice graphical reporting of the errors:\n\n![Command line interface](doc/client.png)\n\n#### Example usage\n\n```console\n$ export LTAPI_SERVER=http://localhost:8875\n$ cat text.txt | ltapi-client\n$ ltapi-client test.txt\n$ ltapi-client --pandoc test.md\n```\n\n### flycheck-languagetool (emacs)\n\nSee \u003chttps://github.com/emacs-languagetool/flycheck-languagetool\u003e\n\n```emacs-lisp\n(use-package flycheck-languagetool\n  :ensure t\n  :hook ((text-mode gfm-mode markdown-mode) . flycheck-languagetool-setup)\n  :init\n  (setq flycheck-languagetool-url \"http://127.0.0.1:8875\")\n  :custom\n  (flycheck-languagetool-active-modes '(text-mode gfm-mode markdown-mode))\n  )\n```\n\n### ltex-ls (language server protocol for markup)\n\nSee \u003chttps://github.com/valentjn/ltex-ls\u003e.\n\nThis currently requires [this patch](https://github.com/valentjn/ltex-ls/pull/276) to send the proper content type in the requests (this also could be done in `ltapiserv-rs` with an axum middleware to edit the content type).\n\nUse the `ltex.languageToolHttpServerUri` variable to set the URL, e.g. with [lsp-ltex](https://github.com/emacs-languagetool/lsp-ltex) in emacs:\n\n```emacs-lisp\n(use-package lsp-ltex\n  :ensure t\n  :hook (text-mode . 
(lambda ()\n                       (require 'lsp-ltex)\n                       (lsp)))  ; or lsp-deferred\n  :init\n  (setq lsp-ltex-version \"16.0.0\"\n        lsp-ltex-languagetool-http-server-uri \"http://localhost:8875\"\n        )\n)\n```\n\n### Tools based on `languagetool-rust`\n\nUnfortunately, tools such as [cargo-languagetool](https://github.com/rnbguy/cargo-languagetool/) and [languagetool-code-comments](https://github.com/dustinblackman/languagetool-code-comments), based on the [languagetool-rust](https://github.com/jeertmans/languagetool-rust) client, are not yet compatible with this server. There are two reasons:\n\n- The queries are sent as URL parameters rather than as form data. Even though the former matches the [API specifications](https://languagetool.org/http-api/swagger-ui/#!/default/post_check), the latter is also supported by the official server.\n- The client expects all fields to be present in the response, whereas this server only sends a subset.\n\nThe solution to the first issue is simple (support query parameters depending on the `Content-Type` header and/or request contents). For the second, one should either expand the messages defined here, or replace them with the `languagetool-rust` ones (which might cause serialization issues in the `form-data` path), or add conversions.\n\n## Implementation details\n\n### API endpoint\n\nThe LanguageTool API is documented [here](https://languagetool.org/http-api/swagger-ui/#!/default/post_check). 
It suffices to implement the HTTP POST `/v2/check` endpoint that processes\n\n```rust\npub struct Request {\n    text: Option\u003cString\u003e,\n    data: Option\u003cString\u003e,\n    language: String,\n}\n```\n\nand returns\n\n```rust\npub struct Response {\n    pub matches: Vec\u003cMatch\u003e,\n    pub language: LanguageResponse,\n}\npub struct Match {\n    pub message: String,\n    pub short_message: String,\n    pub offset: usize,\n    pub length: usize,\n    pub replacements: Vec\u003cReplacement\u003e,\n    pub sentence: String,\n    pub context_for_sure_match: usize,\n    pub ignore_for_incomplete_sentence: bool,\n    pub r#type: MatchType,\n    pub rule: Rule,\n}\n```\n\nThe most important fields in `Match` are `offset` and `length` (defining the span of the suggestion), `message`, `replacements`, and `rule`.\n\nThere are a couple of small tricks required to get the closed-source browser extensions to behave as expected, e.g. in displaying grammar and spelling errors with the right colours and showing tooltips.\n\n![LanguageTool in the browser](doc/screenshot1.png)\n\n![LanguageTool in the browser](doc/screenshot2.png)\n\n### Grammar, spelling, and repetition checkers\n\nThe main functionality, returning suggestions based on an input text, can be reduced to the following method:\n\n```rust\npub fn suggest(\u0026self, text: \u0026str) -\u003e Vec\u003capi::Match\u003e {\n    let mut suggestions = Vec::new();\n    for sentence in self.tokenizer.pipe(text) {\n        debug!(\"Processing sentence {:#?}\", sentence);\n        // Grammar suggestions from nlprule\n        suggestions\n            .extend(self.rules.apply(\u0026sentence).into_iter().map(Match::from));\n        // Spelling and repetitions, processing the sentence token by token.\n        let tokens = sentence.tokens();\n        for (i, token) in tokens.iter().enumerate() {\n            // ...\n        }\n    }\n    suggestions\n}\n```\n\nThe `Match::from` method converts an 
[`nlprule::Suggestion`](https://docs.rs/nlprule/0.6.4/nlprule/types/struct.Suggestion.html) to a `Match`, essentially copying over the span and the message.\n\nThe `nlprule` crate does not yet support [spell checking](https://github.com/bminixhofer/nlprule/issues/2), but we can add a basic version using the [`symspell`](https://crates.io/crates/symspell) crate and leveraging the tokenization we already have from `nlprule`. Similarly, the tokenization allows us to implement a word repetition rule that did not seem present in `nlprule`.\n\n## Compile from source\n\nBinaries can also be built from source as follows:\n\n```console\n$ cargo make build\n```\n\n## Future work\n\n- LanguageTool (even the original implementation with all rules) seems to be failing to identify more subtle grammatical errors:\n\n  \u003e \"No one would have believed in the last years of the nineteenth century that this world were being watched keenly and closely by intelligences greater than man's\"\n\n  \u003e \"With infinite complacency men went to and fro over this globe about his little affairs, serene in their assurance of their empire over matter.\"\n\n  It would be interesting to understand what the state of the art is (under a fast processing constraint).\n\n- Support more languages. German is already supported in `nlprule`, but adding more languages is actually non-trivial because of language-specific assumptions, see [this issue](https://github.com/bminixhofer/nlprule/issues/46) and [this one](https://github.com/bminixhofer/nlprule/issues/14).\n- Support addition and deletion of words to the dictionary. This is pretty simple and corresponds to the `/words/add` and `/words/delete` API endpoints. 
However, the browser extension seems to store the dictionary locally, unless one logs in to LanguageTool Premium.\n- Reduce the number of false positives of the spellchecker.\n- Expand tests\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcpg314%2Fltapiserv-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcpg314%2Fltapiserv-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcpg314%2Fltapiserv-rs/lists"}