{"id":16715451,"url":"https://github.com/philippeitis/nlp_specifier","last_synced_at":"2026-04-17T06:34:10.322Z","repository":{"id":111799545,"uuid":"379139594","full_name":"philippeitis/nlp_specifier","owner":"philippeitis","description":"Formal verification for natural language software documentation","archived":false,"fork":false,"pushed_at":"2022-11-15T07:33:52.000Z","size":924,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-15T08:48:33.889Z","etag":null,"topics":["natural-language-processing","nlp","spacy"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/philippeitis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-22T04:19:29.000Z","updated_at":"2022-11-18T21:41:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"bf7b8d32-17b6-4b5a-bfc6-4e0a5ddfe0a7","html_url":"https://github.com/philippeitis/nlp_specifier","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/philippeitis/nlp_specifier","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philippeitis%2Fnlp_specifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philippeitis%2Fnlp_specifier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philippeitis%2Fnlp_specifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philippeitis%2Fnlp_specifier/man
ifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/philippeitis","download_url":"https://codeload.github.com/philippeitis/nlp_specifier/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philippeitis%2Fnlp_specifier/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267361535,"owners_count":24074950,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-27T02:00:11.917Z","response_time":82,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["natural-language-processing","nlp","spacy"],"created_at":"2024-10-12T21:09:26.218Z","updated_at":"2026-04-17T06:34:05.280Z","avatar_url":"https://github.com/philippeitis.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Introduction\nFormal verification allows users to verify properties of program execution - for instance, they can verify properties of a function's output, or that a program does not crash. This is useful for a variety of purposes - from formally verifying the behaviour of cryptography libraries to ensuring that sensitive medical applications do not crash.\n\nNLPSpecifier is an application that allows you to produce formal specifications from software documentation. These formal specifications can then be used by Prusti to verify the behaviour of your source code. 
In particular, NLPSpecifier can parse documentation embedded within Rust documentation pages (produced through `cargo doc`, or downloaded from `rustup`) and Rust source code.\n\n# Installation\nPython 3.6+ and Rust should already be installed for the parsing steps. To perform verification of output specifications, Prusti should also be installed.\nDependencies for individual components of the system are specified below, or are otherwise manually installed using provided `setup.sh` scripts.\nNote that all .sh files and commands provided are specific to Linux. \n\nAlso note that to avoid contaminating your Python installation, it is best to use `venv`.\nThe main functionality of this application is available through [src/doc_parser/](src/doc_parser/). To build this executable for your system, use \n```bash\ncd ./src/doc_parser/ \u0026\u0026 cargo build --release ; cd ..\n```\n# Usage\nOnce installation is complete, this project can be used through `doc_parser`. To run the program, use `./doc_parser`, or `cargo run --release` at the \nappropriate locations. 
Run the following command to see a list of all possible commands.\n```console\nfoo@bar:~$ ./doc_parser -h\ndoc_parser\n\nUSAGE:\n    doc_parser \u003cSUBCOMMAND\u003e\n\nFLAGS:\n    -h, --help       Print help information\n    -V, --version    Print version information\n\nSUBCOMMANDS:\n    end-to-end    Demonstrates entire pipeline from start to end on provided file, writing\n                  output to terminal\n    help          Print this message or the help of the given subcommand(s)\n    render        Visualization of various components in the system's pipeline\n    specify       Creates specifications for a variety of sources\n```\n\nTo see more specific help, do the following:\n```console\nfoo@bar:~$ ./doc_parser end-to-end --help\ndoc_parser-end-to-end \n\nDemonstrates entire pipeline from start to end on provided file, writing output to terminal\n\nUSAGE:\n    doc_parser end-to-end [OPTIONS]\n\nFLAGS:\n    -h, --help       Print help information\n    -V, --version    Print version information\n\nOPTIONS:\n    -p, --path \u003cPATH\u003e    Source file to specify [default: ../../data/test3.rs]\n```\n\n## NLP Processing\nThe NLP parsing code tokenizes, assigns parts of speech tags to tokens, and detects named entities using spaCy. 
To set up the dependencies, run [src/setup.sh](src/setup.sh):\n```bash\ncd ./src/ \u0026\u0026 sudo chmod +x ./setup.sh \u0026\u0026 ./setup.sh \u0026\u0026 cd ..\n```\n\nIf you also need to use SRL and NER capabilities, run [jml_nlp/setup.sh](jml_nlp/setup.sh):\n```bash\ncd ./jml_nlp/ \u0026\u0026 sudo chmod +x ./setup.sh \u0026\u0026 ./setup.sh \u0026\u0026 cd ..\n```\n\nThen launch the base services, which include a web interface and a REST API used in doc_parser for NLP:\n```bash\nsudo docker-compose --env-file nlp.env up -d --build\n```\n\nAdd `--profile jml_nlp` to also launch the SRL and NER services:\n```bash\nsudo docker-compose --env-file nlp.env --profile jml_nlp up -d --build\n```\n\nAlternatively, it is possible to run the base services directly on your computer, without using Docker.\n\n## Python Implementation\nThe Python implementation is available at https://github.com/philippeitis/nlp_specifier/tree/62d4d51a30c173f65daaf631b7acca0ffbf572a3\n\n### WordNet POS tags reference\nThis link provides a useful reference for the POS tags generated by spaCy:\nhttp://erwinkomen.ruhosting.nl/eng/2014_Longdale-Labels.htm\n\n## Named-entity Recognition and Semantic Role Labelling\n### Requirements\nTo use NER and SRL analysis for documentation, Docker and Docker Compose must be installed. Additionally, downloading the relevant models requires installing Git and Git LFS. All other dependencies for this are set up using [jml_nlp/setup.sh](jml_nlp/setup.sh).\n```bash\ncd ./jml_nlp/ \u0026\u0026 sudo chmod +x ./setup.sh \u0026\u0026 ./setup.sh \u0026\u0026 cd ..\n```\nAfter running this script, the SRL service will be available at 127.0.0.8:701, and the NER service will be available at 127.0.0.8:702.\n[src/nlp/ner.py](src/nlp/ner.py) provides functions for annotating text using these services. 
The Tokenizer class in [src/nlp/tokenizer.py](doc_parser/doc_parser.py) transforms these annotations to a format that can be rendered by spaCy's displaCy tool.\n\nThe NER and SRL models are sourced from `Combining formal and machine learning techniques for the generation of JML specifications`.\n\n## Major TODOs:\n- Build server interface for NLP\n- Tree formatting\n- Convert codegen to Rust build.rs (to ensure that we don't introduce inconsistencies when parsing grammar)\n- Build synonym sets for this problem domain\n- Train model which correctly tags VB cases (currently tagged as \"NN\")\n- Build pseudo-compiler to iteratively resolve specification\n\n- Detect fn purity\n- Build tool to clean up resulting specifications\n- Detect vacuously true specifications\n- Quality of specification?\n- Desugar for loops?\n\n## Plan\n1. Straight up search using current methods, and plug in symbols\n- No slotting at all (but methods / struct fields should be attached as .x, while lone functions are wrapping)\n\n## Examples of unaccepted (maybe should be accepted?)\n- Replaces first N matches of a pattern with another string\n- `::ffff:a.b.c.d` becomes `a.b.c.d`\n- The elements are passed in opposite order from their order in the slice , so if `same_bucket(a, b)` returns `true` , `a` is moved at the end of the slice\n- Returns the capacity this `OsString` can hold without reallocating\n- Note that the capacity of `self` does not change\n- Returns `true` if the associated `Once` was poisoned prior to the invocation of the closure passed to `Once::call_once_force()`\n\n## Examples of Unverifiable spec\n- `replace_with` does not need to be the same length as `range`\n- This method is equivalent to `CString::new` except that no runtime assertion is made that `v` contains no 0 bytes , and it requires an actual byte vector , not anything that can be converted to one with Into\n- See `send` for notes about guarantees of whether the receiver has received the data or not if this 
function is successful\n\n\n## Options\n- Find matches which do not include the sentence (this approach does not handle interior skipwords)\n- Preprocess sentence before tree - detect and remove common fillers (already done in part with SPACE)\n- Process fragments, have grammar for interior fragments (e.g. \"otherwise x is true\")\n- Allow chaining (requires Context)\n- Add error messages when parsing (e.g. \"You typed '`self.capacity` is unchanged', missing 'must' or 'will' to specify that this is a pre/post condition\")\n\nAllow using \"but\"\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilippeitis%2Fnlp_specifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphilippeitis%2Fnlp_specifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilippeitis%2Fnlp_specifier/lists"}