https://github.com/dahlia/unihan-json
JSON data files parsed from the Unicode Han Database (Unihan)
https://github.com/dahlia/unihan-json
chinese-characters cjk-characters hanja hanzi kanji unicode-data unihan unihan-database
Last synced: 7 days ago
JSON representation
JSON data files parsed from the Unicode Han Database (Unihan)
- Host: GitHub
- URL: https://github.com/dahlia/unihan-json
- Owner: dahlia
- Created: 2018-04-04T20:46:22.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-05-26T20:50:11.000Z (over 6 years ago)
- Last Synced: 2025-05-12T19:21:28.096Z (5 months ago)
- Topics: chinese-characters, cjk-characters, hanja, hanzi, kanji, unicode-data, unihan, unihan-database
- Language: Python
- Size: 7.96 MB
- Stars: 7
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
Unihan JSON
===========[![Build Status][ci-badge]][ci]
This project generates JSON data files parsed from the [Unicode Han Database]
(Unihan) in a structured way. Although it's automated through the *process.py*
script, the goal of this project is not the script, but JSON data files.To download JSON data files, see the [latest release]. To load them from a web
page through XHR or `window.fetch()` function, request the following URI
(replace `` with a property name, e.g., `kSimplifiedVariant`):https://dahlia.github.io/unihan-json/12.1.0/.json
Each JSON file corresponds to a property, and is an object which represents
a table from Unicode characters to values for the property. For example,
*kCantonese.json* is like:~~~~~~~~ json
{
"香": ["hoeng1"],
"港": ["gong2"],
...
}
~~~~~~~~The following some properties are parsed further into structured values:
kAccountingNumeric
kCantonese
kFrequency
kGB0
kGB1
kGB3
kGB5
kGB7
kGB8
kGradeLevel
kHangul
kHanyuPinlu
kHanyuPinyin
kJapaneseKun
kJapaneseOn
kLau
kNelson
kOtherNumeric
kPrimaryNumeric
kSimplifiedVariant
kTaiwanTelegraph
kTang
kTotalStrokes
kTraditionalVariant
kVietnameseThe rest properties are merely parsed into string values. Contributing more
parsers are welcome; see also the `PROP_PARSERS` map in the *process.py* script.[ci-badge]: https://travis-ci.org/dahlia/unihan-json.svg?branch=master
[ci]: https://travis-ci.org/dahlia/unihan-json
[Unicode Han Database]: https://www.unicode.org/reports/tr38/
[latest release]: https://github.com/dahlia/unihan-json/releases/latest