# 🐍 python-vibrato 🎤

[Vibrato](https://github.com/daac-tools/vibrato) is a fast implementation of tokenization (or morphological analysis) based on the Viterbi algorithm.
This is a Python wrapper for Vibrato.

[![PyPI](https://img.shields.io/pypi/v/vibrato)](https://pypi.org/project/vibrato/)
[![Build Status](https://github.com/daac-tools/python-vibrato/actions/workflows/CI.yml/badge.svg)](https://github.com/daac-tools/python-vibrato/actions)
[![Documentation Status](https://readthedocs.org/projects/python-vibrato/badge/?version=latest)](https://python-vibrato.readthedocs.io/en/latest/?badge=latest)

## Installation

### Install pre-built package from PyPI

Run the following command:

```
$ pip install vibrato
```

### Build from source

You need to install the Rust compiler beforehand, following [the documentation](https://www.rust-lang.org/tools/install).
vibrato uses `pyproject.toml`, so you also need to upgrade pip to version 19 or later.

```
$ pip install --upgrade pip
```

After setting up the environment, you can install vibrato as follows:

```
$ pip install git+https://github.com/daac-tools/python-vibrato
```

## Example Usage

python-vibrato does not contain model files.
To perform tokenization, follow [the documentation of Vibrato](https://github.com/daac-tools/vibrato) to download the distributed models or train your own models beforehand.

Check the version number as shown below to use compatible models:

```python
>>> import vibrato
>>> vibrato.VIBRATO_VERSION
'0.5.1'

```

Examples:

```python
>>> import vibrato

>>> with open('tests/data/system.dic', 'rb') as fp:
...     tokenizer = vibrato.Vibrato(fp.read())

>>> tokens = tokenizer.tokenize('社長は火星猫だ')

>>> len(tokens)
5

>>> tokens[0]
Token { surface: "社長", feature: "名詞,普通名詞,一般,*" }

>>> tokens[0].surface()
'社長'

>>> tokens[0].feature()
'名詞,普通名詞,一般,*'

>>> tokens[0].start()
0

>>> tokens[0].end()
2

```

## Note for distributed models

The distributed models are compressed in zstd format. To load these compressed models,
you must decompress them yourself before passing the data to the API, for example with the `zstandard` package:

```python
>>> import vibrato
>>> import zstandard  # zstandard package in PyPI

>>> dctx = zstandard.ZstdDecompressor()
>>> with open('tests/data/system.dic.zst', 'rb') as fp:
...     with dctx.stream_reader(fp) as dict_reader:
...         tokenizer = vibrato.Vibrato(dict_reader.read())

```

## License

Licensed under either of

 * Apache License, Version 2.0
   ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license
   ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.
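As a supplement to the note on distributed models, here is a minimal sketch of a hypothetical helper, `read_dictionary` (it is not part of python-vibrato), that accepts either a plain or a zstd-compressed dictionary file. It assumes a compressed file starts with a Zstandard frame, recognizable by the standard magic bytes `28 B5 2F FD`, and it imports the third-party `zstandard` package only when such a file is detected.

```python
# Zstandard frames begin with the magic number 0xFD2FB528, stored little-endian.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def is_zstd(data: bytes) -> bool:
    """Return True if `data` starts with a Zstandard frame header."""
    return data[:4] == ZSTD_MAGIC


def read_dictionary(path: str) -> bytes:
    """Read a dictionary file, decompressing it first if it is zstd-compressed.

    Hypothetical helper (not part of python-vibrato); the returned bytes are
    meant to be passed to vibrato.Vibrato(...).
    """
    with open(path, "rb") as fp:
        head = fp.read(4)
        fp.seek(0)  # rewind so the full file is read or decompressed below
        if not is_zstd(head):
            # Plain (uncompressed) dictionary: return the raw bytes unchanged.
            return fp.read()
        # Imported lazily so uncompressed dictionaries need no extra package.
        import zstandard

        with zstandard.ZstdDecompressor().stream_reader(fp) as reader:
            return reader.read()
```

With this helper, `vibrato.Vibrato(read_dictionary('tests/data/system.dic.zst'))` and `vibrato.Vibrato(read_dictionary('tests/data/system.dic'))` would both work, assuming `zstandard` is installed for the compressed case. Sniffing the magic bytes rather than the file extension keeps the helper correct even if a compressed file is renamed.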