{"id":19646151,"url":"https://github.com/yoctol/strpipe","last_synced_at":"2025-04-28T15:30:29.631Z","repository":{"id":57471908,"uuid":"145549389","full_name":"Yoctol/strpipe","owner":"Yoctol","description":"text preprocessing pipeline","archived":false,"fork":false,"pushed_at":"2018-11-30T11:23:02.000Z","size":711,"stargazers_count":5,"open_issues_count":9,"forks_count":0,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-25T04:50:03.178Z","etag":null,"topics":["cython","natural-language-processing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Yoctol.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-08-21T10:37:42.000Z","updated_at":"2022-06-19T00:14:13.000Z","dependencies_parsed_at":"2022-08-30T13:52:09.018Z","dependency_job_id":null,"html_url":"https://github.com/Yoctol/strpipe","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yoctol%2Fstrpipe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yoctol%2Fstrpipe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yoctol%2Fstrpipe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yoctol%2Fstrpipe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Yoctol","download_url":"https://codeload.github.com/Yoctol/strpipe/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251338493,"owners_count"
:21573566,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cython","natural-language-processing"],"created_at":"2024-11-11T14:37:06.579Z","updated_at":"2025-04-28T15:30:28.654Z","avatar_url":"https://github.com/Yoctol.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# strpipe\n\n[![travis-ci][travis-image]][travis-url]    [![pypi-version][pypi-image]][pypi-url]    [![codecov][codecov-image]][codecov-url]\n\n[travis-image]: https://travis-ci.org/Yoctol/strpipe.svg?branch=master\n[travis-url]: https://travis-ci.org/Yoctol/strpipe\n\n[pypi-image]: https://badge.fury.io/py/strpipe.svg\n[pypi-url]: https://badge.fury.io/py/strpipe\n\n[codecov-image]: https://codecov.io/gh/Yoctol/strpipe/branch/master/graph/badge.svg\n[codecov-url]: https://codecov.io/gh/Yoctol/strpipe\n\n\nReversible string processing pipe. 
Featuring reproducibility, serializability and performance.\n\n## Installation\n\n```\npip install strpipe\n```\n\n## Usage\n\n```python\nimport strpipe as sp\n\np = sp.Pipe()\np.add_step_by_op_name('ZhCharTokenizer')\np.add_step_by_op_name('AddSosEos')\np.add_checkpoint()\np.add_step_by_op_name('Pad')\np.add_step_by_op_name('TokenToIndex')\n\ndata = [\n    '你好啊',\n    '早安',\n    '你早上好',\n]\n\np.fit(data)\nresult, tx_info, intermediates = p.transform(data)  # convention: tx =\u003e transform\nback_data = p.inverse_transform(result, tx_info)\n```\n\n### Serialization\n```python\n# Save it\np.save_json('/path/of/pipe')\n\n# Load it\np = sp.Pipe.restore_from_json('/path/of/pipe')\nresult, tx_info, intermediates = p.transform(['你好'])\n```\n\n## Test\n\n```\n$ make test\n```\n\n## Docs\n\n```\n$ make docs\n```\n\nDocs are built in the `docs/build/html` folder. (Note: this also reinstalls the package, because the Cython code needs to be rebuilt.)\n\n## Extend Ops\n\n1. Create the new op by extending `BaseOp`\n2. Define `input_type`, `output_type`\n3. Implement op creation\n4. Implement `fit`, `transform`, and `inverse_transform`. If the op is stateless, the `fit` method should return `None`.\n\n\u003e Note: An op's functionality can often be decomposed into several functions. Write these functions in (or import them from) the `toolkit` package for easy reuse. Ops in the `ops` package will, for the most part, be wrappers for functions in `toolkit`.\n\n5. Write tests\n6. Register to `op_factory`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyoctol%2Fstrpipe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyoctol%2Fstrpipe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyoctol%2Fstrpipe/lists"}