{"id":13838768,"url":"https://github.com/jacksonllee/pycantonese","last_synced_at":"2026-01-21T22:38:36.973Z","repository":{"id":24565805,"uuid":"27973172","full_name":"jacksonllee/pycantonese","owner":"jacksonllee","description":"Cantonese Linguistics and NLP","archived":false,"fork":false,"pushed_at":"2024-05-23T12:48:59.000Z","size":15833,"stargazers_count":391,"open_issues_count":12,"forks_count":43,"subscribers_count":20,"default_branch":"main","last_synced_at":"2025-09-25T16:27:24.267Z","etag":null,"topics":["cantonese","computational-linguistics","jyutping","linguistics","natural-language-processing","nlp","part-of-speech-tagging","pycantonese","python","stop-words","word-segmentation"],"latest_commit_sha":null,"homepage":"https://pycantonese.org","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jacksonllee.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":["jacksonllee"],"custom":["https://www.buymeacoffee.com/pycantonese"]}},"created_at":"2014-12-13T20:40:56.000Z","updated_at":"2025-09-23T15:08:00.000Z","dependencies_parsed_at":"2024-04-23T04:46:23.985Z","dependency_job_id":"a7cdd7a9-2d98-438f-ab8d-57f4f9e17fe1","html_url":"https://github.com/jacksonllee/pycantonese","commit_stats":{"total_commits":309,"total_committers":5,"mean_commits":61.8,"dds":"0.24595469255663427","last_synced_commit":"9585c919f9e4c9798cbcf17e743220db59a2eb3a"},"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"purl":"pkg:gith
ub/jacksonllee/pycantonese","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonllee%2Fpycantonese","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonllee%2Fpycantonese/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonllee%2Fpycantonese/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonllee%2Fpycantonese/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jacksonllee","download_url":"https://codeload.github.com/jacksonllee/pycantonese/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonllee%2Fpycantonese/sbom","scorecard":{"id":500868,"data":{"date":"2025-08-11","repo":{"name":"github.com/jacksonllee/pycantonese","commit":"3485a6c78495d1415d5228b2c94dd1a37c58ed9a"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3,"checks":[{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and 
uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Code-Review","score":0,"reason":"Found 1/29 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.txt:0","Info: FSF or OSI recognized license: MIT License: LICENSE.txt:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'main'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared 
and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 2 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-19T21:48:56.462Z","repository_id":24565805,"created_at":"2025-08-19T21:48:56.462Z","updated_at":"2025-08-19T21:48:56.462Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28645551,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-21T21:29:11.980Z","status":"ssl_error","status_checked_at":"2026-01-21T21:24:31.872Z","response_time":86,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cantonese","computational-linguistics","jyutping","linguistics","natural-language-processing","nlp","part-of-speech-tagging","pycantonese","python","stop-words","word-segmentation"],"created_at":"2024-08-04T16:00:32.876Z","updated_at":"2026-01-21T22:38:36.958Z","avatar_url":"https://github.com/jacksonllee.png","language":"Python","funding_links":["https://github.com/sponsors/jacksonllee","https://www.buymeacoffee.com/pycantonese"],"categories":["工具 Tools","Open Source"],"sub_categories":["Categories"],"readme":"PyCantonese: Cantonese Linguistics and NLP in Python\n====================================================\n\n.. image:: https://jacksonllee.com/logos/pycantonese-logo.png\n   :width: 250px\n\nFull Documentation: https://pycantonese.org\n\n|\n\n.. image:: https://badge.fury.io/py/pycantonese.svg\n   :target: https://pypi.python.org/pypi/pycantonese\n   :alt: PyPI version\n\n.. image:: https://img.shields.io/pypi/pyversions/pycantonese.svg\n   :target: https://pypi.python.org/pypi/pycantonese\n   :alt: Supported Python versions\n\n.. image:: https://circleci.com/gh/jacksonllee/pycantonese.svg?style=shield\n   :target: https://circleci.com/gh/jacksonllee/pycantonese\n   :alt: CircleCI Builds\n\n|\n\n.. start-sphinx-website-index-page\n\nPyCantonese is a Python library for Cantonese linguistics and natural language\nprocessing (NLP). 
Currently implemented features (more to come!):\n\n- Accessing and searching corpus data\n- Parsing and conversion tools for Jyutping romanization\n- Parsing Cantonese text\n- Stop words\n- Word segmentation\n- Part-of-speech tagging\n\n.. _download_install:\n\nDownload and Install\n--------------------\n\nTo download and install the stable, most recent version::\n\n    $ pip install --upgrade pycantonese\n\nReady for more?\nCheck out the `Quickstart \u003chttps://pycantonese.org/quickstart.html\u003e`_ page.\n\nConsulting\n----------\n\nIf your team would like professional assistance in using PyCantonese,\nfreelance consulting and training services are available for both academic and commercial groups.\nPlease email `Jackson L. Lee \u003chttps://jacksonllee.com\u003e`_.\n\nSupport\n-------\n\nIf you have found PyCantonese useful and would like to offer support,\n`buying me a coffee \u003chttps://www.buymeacoffee.com/pycantonese\u003e`_ would go a long way!\n\nLinks\n-----\n\n* Source code: https://github.com/jacksonllee/pycantonese\n* Bug tracker: https://github.com/jacksonllee/pycantonese/issues\n* Social media:\n  `Facebook \u003chttps://www.facebook.com/pycantonese\u003e`_\n  and `Twitter \u003chttps://twitter.com/pycantonese\u003e`_\n\nHow to Cite\n-----------\n\nPyCantonese is authored and maintained by `Jackson L. Lee \u003chttps://jacksonllee.com\u003e`_.\n\nLee, Jackson L., Litong Chen, Charles Lam, Chaak Ming Lau, and Tsz-Him Tsui. 2022.\n`PyCantonese: Cantonese Linguistics and NLP in Python \u003chttps://jacksonllee.com/papers/pycantonese_lrec_2022-05-06.pdf\u003e`_.\n*Proceedings of the 13th Language Resources and Evaluation Conference*.\n\n.. code-block:: latex\n\n      @inproceedings{lee-etal-2022-pycantonese,\n         title = \"PyCantonese: Cantonese Linguistics and NLP in Python\",\n         author = \"Lee, Jackson L.  
and\n            Chen, Litong  and\n            Lam, Charles  and\n            Lau, Chaak Ming  and\n            Tsui, Tsz-Him\",\n         booktitle = \"Proceedings of The 13th Language Resources and Evaluation Conference\",\n         month = jun,\n         year = \"2022\",\n         publisher = \"European Language Resources Association\",\n         language = \"English\",\n      }\n\nLicense\n-------\n\nMIT License. Please see ``LICENSE.txt`` in the GitHub source code for details.\n\nThe HKCanCor dataset included in PyCantonese is substantially modified from\nits source in terms of format. The original dataset has a CC BY license.\nPlease see ``pycantonese/data/hkcancor/README.md``\nin the GitHub source code for details.\n\nThe rime-cantonese data (release 2021.05.16) is\nincorporated into PyCantonese for word segmentation and\ncharacters-to-Jyutping conversion.\nThis data has a CC BY 4.0 license.\nPlease see ``pycantonese/data/rime_cantonese/README.md``\nin the GitHub source code for details.\n\nLogo\n----\n\nThe PyCantonese logo is the Chinese character 粵 meaning Cantonese,\nwith artistic design by albino.snowman (Instagram handle).\n\nAcknowledgments\n---------------\n\nWonderful resources with a permissive license that have been incorporated into PyCantonese:\n\n- HKCanCor\n- rime-cantonese\n\nIndividuals who have contributed pull requests, bug reports, and other feedback\n(in alphabetical order of last names):\n\n- @cathug\n- Francis Bond\n- Jenny Chim\n- Eric Dong\n- @g-traveller\n- @graphemecluster\n- Rachel Han\n- Ryan Lai\n- @laubonghaudoi\n- Katrina Li\n- Kevin Li\n- @ZhanruiLiang\n- Hill Ma\n- @richielo\n- @rylanchiu\n- Stephan Stiller\n- Robin Yuen\n\n.. 
end-sphinx-website-index-page\n\nChangelog\n---------\n\nPlease see ``CHANGELOG.md``.\n\nSetting up a Development Environment\n------------------------------------\n\nThe latest code under development is available on GitHub at\n`jacksonllee/pycantonese \u003chttps://github.com/jacksonllee/pycantonese\u003e`_.\nTo obtain this version for experimental features or for development:\n\n.. code-block:: bash\n\n   $ git clone https://github.com/jacksonllee/pycantonese.git\n   $ cd pycantonese\n   $ pip install -e \".[dev]\"\n\nTo run tests and styling checks:\n\n.. code-block:: bash\n\n   $ pytest\n   $ flake8 src tests\n   $ black --check src tests\n\nTo build the documentation website files:\n\n.. code-block:: bash\n\n    $ python docs/source/build_docs.py\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacksonllee%2Fpycantonese","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjacksonllee%2Fpycantonese","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacksonllee%2Fpycantonese/lists"}