{"id":15519071,"url":"https://github.com/danieldk/alpino-tokenizer","last_synced_at":"2025-04-23T04:18:58.236Z","repository":{"id":35161312,"uuid":"213572688","full_name":"danieldk/alpino-tokenizer","owner":"danieldk","description":"Rust wrapper for the Alpino tokenizer","archived":false,"fork":false,"pushed_at":"2023-11-06T09:51:47.000Z","size":14620,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-23T04:18:53.061Z","etag":null,"topics":["alpino","dutch","tokenizer"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danieldk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-2.0.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-10-08T07:11:10.000Z","updated_at":"2022-08-09T07:08:05.000Z","dependencies_parsed_at":"2022-07-24T18:17:29.833Z","dependency_job_id":null,"html_url":"https://github.com/danieldk/alpino-tokenizer","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Falpino-tokenizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Falpino-tokenizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Falpino-tokenizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Falpino-tokenizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/danieldk","download_url":"https://codeload.github.com/danieldk/alpino-tokenizer/tar.gz/refs/heads/master","host":{"name"
:"GitHub","url":"https://github.com","kind":"github","repositories_count":250366715,"owners_count":21418772,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alpino","dutch","tokenizer"],"created_at":"2024-10-02T10:19:57.750Z","updated_at":"2025-04-23T04:18:58.217Z","avatar_url":"https://github.com/danieldk.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"## alpino-tokenizer\n\nThis Rust crate provides a tokenizer based on finite state transducers.\nIt is primarily designed to use the\n[Alpino](https://www.let.rug.nl/vannoord/alp/Alpino/) tokenizer for\nDutch, but in principle, you could load a tokenizer for any language.\n\nThe transducer of the Alpino tokenizer can be\n[downloaded](https://github.com/danieldk/alpino-tokenizer/releases/download/0.3.0/alpino-tokenizer-20200315.proto.gz).\nWe will synchronize the transducer regularly as the tokenizer in\nAlpino is updated.\n\nYou can use the [alpino-tokenizer](https://crates.io/crates/alpino-tokenizer)\ncrate to integrate the tokenizer in your Rust programs.\n\nFor convenience, an\n[alpino-tokenize](https://crates.io/crates/alpino-tokenize)\ncommand-line utility is provided for tokenizing text from the shell\nor in shell scripts.\n\n## Installing the `alpino-tokenize` command-line utility\n\n### cargo\n\nThe `alpino-tokenize` utility can be installed with\n[cargo](https://rustup.rs/):\n\n```shell\n$ cargo install alpino-tokenize\n```\n\n### Nix\n\nThis repository is also a Nix flake. 
If you use a Nix version that\nsupports flakes, you can start a shell with `alpino-tokenize` as\nfollows:\n\n```shell\n$ nix shell github:danieldk/alpino-tokenizer\n```\n\n## License\n\nCopyright 2019-2020 Daniël de Kok\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\nhttp://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieldk%2Falpino-tokenizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanieldk%2Falpino-tokenizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieldk%2Falpino-tokenizer/lists"}