{"id":13469396,"url":"https://github.com/bminixhofer/nlprule","last_synced_at":"2025-05-15T16:05:58.803Z","repository":{"id":37846912,"uuid":"280904831","full_name":"bminixhofer/nlprule","owner":"bminixhofer","description":"A fast, low-resource Natural Language Processing and Text Correction library written in Rust.","archived":false,"fork":false,"pushed_at":"2023-05-23T00:44:59.000Z","size":920,"stargazers_count":624,"open_issues_count":27,"forks_count":39,"subscribers_count":14,"default_branch":"main","last_synced_at":"2025-05-08T20:57:54.393Z","etag":null,"topics":["grammar","grammatical-error-correction","machine-learning","natural-language-processing","nlp","proofreading","rust","spellcheck","style-checker"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bminixhofer.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-07-19T16:24:54.000Z","updated_at":"2025-05-03T19:28:56.000Z","dependencies_parsed_at":"2024-01-18T20:05:32.554Z","dependency_job_id":null,"html_url":"https://github.com/bminixhofer/nlprule","commit_stats":{"total_commits":355,"total_committers":8,"mean_commits":44.375,"dds":0.08169014084507042,"last_synced_commit":"ebb54617ea5f3378c0dc63486631724b7e97e13c"},"previous_names":[],"tags_count":30,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bminixhofer%2Fnlprule","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bminixhofer%2Fnlprule/tags","releases_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/bminixhofer%2Fnlprule/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bminixhofer%2Fnlprule/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bminixhofer","download_url":"https://codeload.github.com/bminixhofer/nlprule/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254374461,"owners_count":22060611,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["grammar","grammatical-error-correction","machine-learning","natural-language-processing","nlp","proofreading","rust","spellcheck","style-checker"],"created_at":"2024-07-31T15:01:37.588Z","updated_at":"2025-05-15T16:05:58.786Z","avatar_url":"https://github.com/bminixhofer.png","language":"Rust","funding_links":[],"categories":["Rust","Text Processing"],"sub_categories":[],"readme":"\u003ch1 align='center'\u003e\n  nlprule\n\u003c/h1\u003e\n\n\u003cp align='center'\u003e\n    \u003ca href=\"https://pypi.org/project/nlprule\"\u003e\n        \u003cimg src=\"https://img.shields.io/pypi/v/nlprule\" alt=\"PyPI\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://crates.io/crates/nlprule\"\u003e\n        \u003cimg src=\"https://img.shields.io/crates/v/nlprule\" alt=\"Crates.io\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://docs.rs/nlprule\"\u003e\n        \u003cimg src=\"https://docs.rs/nlprule/badge.svg\" alt=\"Docs.rs\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://pepy.tech/project/nlprule\"\u003e\n        \u003cimg 
src=\"https://pepy.tech/badge/nlprule/month\" alt=\"PyPI Downloads\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"\"\u003e\n        \u003cimg src=\"https://img.shields.io/crates/l/nlprule\" alt=\"License\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\nA fast, low-resource Natural Language Processing and Error Correction library written in Rust. nlprule implements a rule- and lookup-based approach to NLP using resources from [LanguageTool](https://github.com/languagetool-org/languagetool).\n\n\u003cdetails\u003e\n  \u003csummary\u003ePython Usage\u003c/summary\u003e\n\nInstall: `pip install nlprule`\n\nUse:\n```python\nfrom nlprule import Tokenizer, Rules\n\ntokenizer = Tokenizer.load(\"en\")\nrules = Rules.load(\"en\", tokenizer)\n```\n```python\nrules.correct(\"He wants that you send him an email.\")\n# returns: 'He wants you to send him an email.'\n\nrules.correct(\"I can due his homework.\")\n# returns: 'I can do his homework.'\n\nfor s in rules.suggest(\"She was not been here since Monday.\"):\n    print(s.start, s.end, s.replacements, s.source, s.message)\n# prints:\n# 4 16 ['was not', 'has not been'] WAS_BEEN.1 Did you mean was not or has not been?\n```\n```python\nfor sentence in tokenizer.pipe(\"A brief example is shown.\"):\n    for token in sentence:\n        print(\n            repr(token.text).ljust(10),\n            repr(token.span).ljust(10),\n            repr(token.tags).ljust(24),\n            repr(token.lemmas).ljust(24),\n            repr(token.chunks).ljust(24),\n        )\n# prints:\n# 'A'        (0, 1)     ['DT']                   ['A', 'a']               ['B-NP-singular']       \n# 'brief'    (2, 7)     ['JJ']                   ['brief']                ['I-NP-singular']       \n# 'example'  (8, 15)    ['NN:UN']                ['example']              ['E-NP-singular']       \n# 'is'       (16, 18)   ['VBZ']                  ['be', 'is']             ['B-VP']                \n# 'shown'    (19, 24)   ['VBN']                  ['show', 'shown']     
   ['I-VP']                \n# '.'        (24, 25)   ['.', 'PCT', 'SENT_END'] ['.']                    ['O']\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003eRust Usage\u003c/summary\u003e\n\nRecommended setup:\n\n`Cargo.toml`\n```toml\n[dependencies]\nnlprule = \"\u003cversion\u003e\"\n\n[build-dependencies]\nnlprule-build = \"\u003cversion\u003e\" # must be the same as the nlprule version!\n```\n\n`build.rs`\n```rust\nfn main() -\u003e Result\u003c(), nlprule_build::Error\u003e {\n    println!(\"cargo:rerun-if-changed=build.rs\");\n\n    nlprule_build::BinaryBuilder::new(\n        \u0026[\"en\"],\n        std::env::var(\"OUT_DIR\").expect(\"OUT_DIR is set when build.rs is running\"),\n    )\n    .build()?\n    .validate()\n}\n```\n\n`src/main.rs`\n```rust\nuse nlprule::{Rules, Tokenizer, tokenizer_filename, rules_filename};\n\nfn main() {\n    let mut tokenizer_bytes: \u0026'static [u8] = include_bytes!(concat!(\n        env!(\"OUT_DIR\"),\n        \"/\",\n        tokenizer_filename!(\"en\")\n    ));\n    let mut rules_bytes: \u0026'static [u8] = include_bytes!(concat!(\n        env!(\"OUT_DIR\"),\n        \"/\",\n        rules_filename!(\"en\")\n    ));\n\n    let tokenizer = Tokenizer::from_reader(\u0026mut tokenizer_bytes).expect(\"tokenizer binary is valid\");\n    let rules = Rules::from_reader(\u0026mut rules_bytes).expect(\"rules binary is valid\");\n\n    assert_eq!(\n        rules.correct(\"She was not been here since Monday.\", \u0026tokenizer),\n        String::from(\"She was not here since Monday.\")\n    );\n}\n```\n\n`nlprule` and `nlprule-build` versions are kept in sync.\n\n\u003c/details\u003e\n\n## Main features\n\n- Rule-based Grammatical Error Correction through several thousand rules.\n- A text processing pipeline for sentence segmentation, part-of-speech tagging, lemmatization, chunking and disambiguation.\n- Support for English, German and Spanish.\n- Spellchecking. 
(*in progress*)\n\n## Goals\n\n- A single place to apply spellchecking and grammatical error correction for a downstream task.\n- Fast, low-resource NLP suited for running:\n    1. as a pre- / postprocessing step for more sophisticated (i.e., ML) approaches.\n    2. in the background of another application with low overhead.\n    3. client-side in the browser via WebAssembly.\n- 100% Rust code and dependencies.\n\n## Comparison to LanguageTool\n\n|         | \\|Disambiguation rules\\|                      | \\|Grammar rules\\| | LT version | nlprule time (relative) | LanguageTool time (relative) |\n| ------- | --------------------------------------------- | ----------------- | ---------- | ----------------------- | ---------------------------- |\n| English | 843 (100%)                                    | 3725 (~ 85%)      | 5.2        | 1                       | 1.7 - 2.0                    |\n| German  | 486 (100%)                                    | 2970 (~ 90%)      | 5.2        | 1                       | 2.4 - 2.8                    |\n| Spanish | *Experimental support. Not fully tested yet.* |\n\nSee the [benchmark issue](https://github.com/bminixhofer/nlprule/issues/6) for details.\n\n## Projects using nlprule\n\n- [prosemd](https://github.com/kitten/prosemd-lsp): a proofreading and linting language server for markdown files with VSCode integration.\n- [cargo-spellcheck](https://github.com/drahnr/cargo-spellcheck): a tool to check all your Rust documentation for spelling and grammar mistakes.\n\nPlease submit a PR to add your project!\n\n## Acknowledgements\n\nAll credit for the resources used in nlprule goes to [LanguageTool](https://github.com/languagetool-org/languagetool) who have made a Herculean effort to create high-quality resources for Grammatical Error Correction and broader NLP.\n\n## License\n\nnlprule is licensed under the MIT license or Apache-2.0 license, at your option.\n\nThe nlprule binaries (`*.bin`) are derived from LanguageTool v5.2 and licensed under the LGPLv2.1 license. 
nlprule statically and dynamically links to these binaries. Under LGPLv2.1 §6(a) this does not have any implications on the license of nlprule itself.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbminixhofer%2Fnlprule","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbminixhofer%2Fnlprule","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbminixhofer%2Fnlprule/lists"}