{"id":13740537,"url":"https://github.com/divvun/divvunspell","last_synced_at":"2025-07-16T04:07:00.010Z","repository":{"id":42124414,"uuid":"87927599","full_name":"divvun/divvunspell","owner":"divvun","description":"Spell checking library for ZHFST/BHFST spellers, with case handling and tokenization support. (Spell checking derived from hfst-ospell)","archived":false,"fork":false,"pushed_at":"2025-05-23T12:48:01.000Z","size":2550,"stargazers_count":14,"open_issues_count":11,"forks_count":7,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-23T14:19:07.425Z","etag":null,"topics":["box","fst","hfst","rust","spellchecking"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/divvun.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":"support/accuracy-viewer/.gitignore","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-04-11T11:43:22.000Z","updated_at":"2025-05-06T12:50:15.000Z","dependencies_parsed_at":"2024-05-30T16:08:44.538Z","dependency_job_id":"ea7b7909-fa0a-4b0f-8d81-0ab1c145cf18","html_url":"https://github.com/divvun/divvunspell","commit_stats":{"total_commits":308,"total_committers":14,"mean_commits":22.0,"dds":0.3928571428571429,"last_synced_commit":"557085cf432212421dd912ed12147adb41b26363"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/divvun/divvunspell","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divvun%2Fdivvunspell","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divv
un%2Fdivvunspell/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divvun%2Fdivvunspell/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divvun%2Fdivvunspell/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/divvun","download_url":"https://codeload.github.com/divvun/divvunspell/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divvun%2Fdivvunspell/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265480757,"owners_count":23773781,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["box","fst","hfst","rust","spellchecking"],"created_at":"2024-08-03T04:00:49.392Z","updated_at":"2025-07-16T04:06:59.966Z","avatar_url":"https://github.com/divvun.png","language":"Rust","readme":"# divvunspell\n\nAn implementation of [hfst-ospell](https://github.com/hfst/hfst-ospell) in Rust, with added features like tokenization, case handling, and parallelisation.\n\n[![CI](https://github.com/divvun/divvunspell/actions/workflows/ci.yml/badge.svg)](https://github.com/divvun/divvunspell/actions/workflows/ci.yml)\n\n## Building and installing command-line tools\n\n```sh\n# For the `divvunspell` binary:\ncargo install divvunspell-bin\n\n# For the `thfst-tools` binary (most people can skip this one):\ncargo install thfst-tools\n\n# To build the development version from this source, cd into the relevant directory and:\ncargo install --path .\n```\n\n### Building with `gpt2` support on macOS aarch64\n\n(Skip this if you are not 
experimenting with gpt2 support. So skip. Now.)\n\nClone this repo, then run:\n\n```bash\nbrew install libtorch\nLIBTORCH=/opt/homebrew/opt/libtorch cargo build --features gpt2 --bin divvunspell\n```\n\n### No Rust?\n\n```sh\ncurl https://sh.rustup.rs -sSf | sh\nsource $HOME/.cargo/env\nrustup default stable\ncargo build --release\n```\n\n### divvunspell\n\nUsage:\n\n```sh\nUsage: divvunspell SUBCOMMAND [OPTIONS]\n\nOptional arguments:\n  -h, --help  print help message\n\nAvailable subcommands:\n  suggest   get suggestions for provided input\n  tokenize  print input in word-separated tokenized form\n  predict   predict next words using GPT2 model\n\n$ divvunspell suggest -h\nUsage: divvunspell suggest [OPTIONS]\n\nPositional arguments:\n  inputs                 words to be processed\n\nOptional arguments:\n  -h, --help             print help message\n  -a, --archive ARCHIVE  BHFST or ZHFST archive to be used\n  -S, --always-suggest   always show suggestions even if word is correct\n  -w, --weight WEIGHT    maximum weight limit for suggestions\n  -n, --nbest NBEST      maximum number of results\n  --no-reweighting       disables reweighting algorithm (makes results more like hfst-ospell)\n  --no-recase            disables recasing algorithm (makes results more like hfst-ospell)\n  --json                 output in JSON format\n```\n\nTo debug divvunspell's behaviour, enable Rust's logging by setting\n`RUST_LOG=trace` in your shell's environment.\n\n### accuracy\n\nBuilding:\n\n```sh\ncd accuracy/\ncargo install --path .\n```\n\nThe resulting `accuracy` binary is placed in `$HOME/.cargo/bin/`; make sure that directory is on your `PATH`.\n\nUsage:\n\n```\ndivvunspell-accuracy 1.0.0-beta.1\nAccuracy testing for DivvunSpell.\n\nUSAGE:\n    accuracy [OPTIONS] [ARGS]\n\nFLAGS:\n    -h, --help       Prints help information\n    -V, --version    Prints version information\n\nOPTIONS:\n    -c \u003cconfig\u003e             Provide JSON config file to 
override test defaults\n    -o \u003cJSON-OUTPUT\u003e        The file path for the JSON report output\n    -w \u003cmax-words\u003e          Truncate typos list to max number of words specified\n    -t \u003cTSV-OUTPUT\u003e         The file path for the TSV line append\n\nARGS:\n    \u003cWORDS\u003e    The 'input -\u003e expected' list in tab-delimited value file (TSV)\n    \u003cZHFST\u003e    Use the given ZHFST file\n```\n\n### thfst-tools\n\nConvert HFST and ZHFST files to the THFST and BHFST formats.\n\n- **thfst**: byte-aligned HFST for fast, efficient loading and memory mapping; required to run `divvunspell` on ARM processors\n- **bhfst**: THFST files wrapped in a [box](https://github.com/bbqsrc/box) container; when a ZHFST file is converted to BHFST, its metadata file (`index.xml` in the ZHFST archive) is converted to JSON for faster and leaner processing by the `divvunspell` library.\n\nUsage:\n\n```\nthfst-tools 1.0.0-alpha.5\nTromsø-Helsinki Finite State Transducer toolkit.\n\nUSAGE:\n    thfst-tools \u003cSUBCOMMAND\u003e\n\nFLAGS:\n    -h, --help       Prints help information\n    -V, --version    Prints version information\n\nSUBCOMMANDS:\n    bhfst-info         Print metadata for BHFST\n    help               Prints this message or the help of the given subcommand(s)\n    hfst-to-thfst      Convert an HFST file to THFST\n    thfsts-to-bhfst    Convert a THFST acceptor/errmodel pair to BHFST\n    zhfst-to-bhfst     Convert a ZHFST file to BHFST\n```\n\n## Speller testing\n\nThere's a prototype-level testing tool in `support/accuracy-viewer`. 
Use it like:\n\n```sh\naccuracy -o support/accuracy-viewer/public/report.json typos.txt sma.zhfst\ncd support/accuracy-viewer\nnpm i \u0026\u0026 npm run dev\n```\n\nThen open `http://localhost:5000` in your browser.\n\n`typos.txt` is a TSV file with typos in the first column and the expected correction in the second.\nRun `accuracy --help` for more information.\n\n## License\n\nThe crate `divvunspell` is licensed under either of\n\n * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)\n * MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)\n\nat your option.\n\nThe `divvunspell`, `thfst-tools` and `accuracy` binaries are licensed under the GPL, version 3.\n\n## More docs?\n\nWe have a [GitHub Pages site](https://divvun.github.io/divvunspell/) for\ndivvunspell with some more technical docs (WIP).\n","funding_links":[],"categories":["Software"],"sub_categories":["Utilities"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdivvun%2Fdivvunspell","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdivvun%2Fdivvunspell","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdivvun%2Fdivvunspell/lists"}