{"id":18704883,"url":"https://github.com/xiaohk/pinyin_data","last_synced_at":"2025-04-12T10:06:13.557Z","repository":{"id":134719897,"uuid":"82696863","full_name":"xiaohk/pinyin_data","owner":"xiaohk","description":"🐼 Easy to use and portable pronunciation data for Hanzi characters.","archived":false,"fork":false,"pushed_at":"2017-02-27T16:58:51.000Z","size":3671,"stargazers_count":14,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-12T10:02:02.606Z","etag":null,"topics":["chinese","cjk","pinyin","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xiaohk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-02-21T15:40:09.000Z","updated_at":"2024-08-10T14:48:11.000Z","dependencies_parsed_at":null,"dependency_job_id":"8b922eed-3f29-49fa-b3e3-07417daed5bb","html_url":"https://github.com/xiaohk/pinyin_data","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xiaohk%2Fpinyin_data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xiaohk%2Fpinyin_data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xiaohk%2Fpinyin_data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xiaohk%2Fpinyin_data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xiaohk","download_url":"https://codelo
ad.github.com/xiaohk/pinyin_data/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248550634,"owners_count":21122933,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chinese","cjk","pinyin","python"],"created_at":"2024-11-07T12:08:57.022Z","updated_at":"2025-04-12T10:06:13.552Z","avatar_url":"https://github.com/xiaohk.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pinyin Data\nEasy to use and portable pronunciation data for Hanzi characters.\n\n## Pinyin\n`./pinyin/pinyin.json` and `./pinyin/pinyin.yaml` contain the same Pinyin records\nfor 41216 Hanzi characters (both traditional and simplified).\n\nEach file is a dictionary mapping each Hanzi character to a list of its Pinyin readings.\n```python\n{'长' : ['zhǎng', 'cháng'],\n '長' : ['zhǎng', 'cháng', 'zhàng']}\n```\n\n- The first element of each Pinyin list is the most frequently used pronunciation.\n- All Pinyin records are from the `kMandarin`, `kXHC1983`(\"现代汉语词典\"), \n`kHanyuPinlu`(\"现代汉语频率词典\"), and `kHanyuPinyin`(\"汉语大字典\") fields of \nthe Unihan reading database.\n- Unihan reading database version: `2016-06-01 07:01:48 GMT`\n\n## Polyphone\nSome Hanzi characters have multiple pronunciations. \n`./polyphone/polyphone.json` and `./polyphone/polyphone.yaml` map\neach pronunciation to the word contexts in which it is used.\n\nEach file is a dictionary mapping each Hanzi character to an inner dictionary. The\ninner dictionary maps each Pinyin reading to a list containing three lists of words. 
The three \nlists contain the words in which the Hanzi character appears at the beginning, in the \nmiddle, or at the end.\n\n```python\n{'会': {'huì': [['会合'], [], ['都会']],\n        'kuài': [['会计'], [], ['财会']]}}\n```\n\nIn this version, all polyphone data are parsed from this [website](http://www.fuhaoku.com/duoyinzi/). The overall coverage is still limited, so you are more than\nwelcome to add more example words and entries to the polyphone collection.\n\n1. You can parse data from other websites and add non-duplicate words to the \nPolyphone dictionary using the same structure. Just a heads up, there might be\nlots of errors on those websites.\n2. You can simply add new words to the correct list in \n`./polyphone/polyphone.yaml`, then run `./parse/update_json.py` to sync it to\n`./polyphone/polyphone.json`. \n\n## Use\nClone the repository, then copy the data you need into your project.\n\nUse of the Pinyin information should follow the [Unicode® Terms of Use](http://www.unicode.org/copyright.html). All other code is under the MIT license.\n\n## TODO List\n- Add Jyutping records\n\n\n## How to Contribute:\n1. Create an issue.\n2. Add words to the Polyphone collection, fix bugs, or add features, then open a pull request.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxiaohk%2Fpinyin_data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxiaohk%2Fpinyin_data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxiaohk%2Fpinyin_data/lists"}