https://github.com/pusnow/ks-x-1026-python
Python implementation for KS X 1026-1 and KS X 1026-2
https://github.com/pusnow/ks-x-1026-python
hangul korean korean-standard oldhangul python unicode
Last synced: 3 months ago
JSON representation
Python implementation for KS X 1026-1 and KS X 1026-2
- Host: GitHub
- URL: https://github.com/pusnow/ks-x-1026-python
- Owner: Pusnow
- License: mit
- Created: 2016-08-22T12:32:22.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2022-11-24T01:59:09.000Z (over 2 years ago)
- Last Synced: 2025-03-04T08:02:55.273Z (3 months ago)
- Topics: hangul, korean, korean-standard, oldhangul, python, unicode
- Language: Python
- Size: 43.9 KB
- Stars: 1
- Watchers: 3
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# KS X 1026 Python
[](https://travis-ci.org/Pusnow/KS-X-1026-Python)Python implementation for KS X 1026-1.
## KS X 1026-1
KS X 1026-1 is a Korean standard for Hangul processing guide for information interchange. More informations are available [here](http://www.unicode.org/L2/L2008/08225-n3422.pdf).## Installation
KS X 1026 Python is available via PyPipip install ksx1026
or setup.py
python setup.py install
## Normalizations
### Hangul Decomposition
Returns a Johab Modern Hangul Syllable Block for the given Wanseong Modern Hangul Syllable Blockchar S: Single character Hangul Syllable. If not, return input.
>>> from ksx1026.normalization import decomposeHangul
>>> c = "\uAC01"
>>> d = decomposeHangul(c)
>>> print(d.encode('raw_unicode_escape'))
b'\\u1100\\u1161\\u11a8'### Hangul Decomposition String
Returns a Johab Modern Hangul Syllable string for the given Wanseong Modern Hangul Syllable stringstring source: unicode string.
>>> from ksx1026.normalization import decomposeHangulStr
>>> source = "\uAC01\uAC01"
>>> d = decomposeHangul(source)
>>> print(d.encode('raw_unicode_escape'))
b'\\u1100\\u1161\\u11a8\\u1100\\u1161\\u11a8'### Hangul Composition
Returns a Wanseong Modern Hangul Syllable Block for the given Johab Modern Hangul Syllable Block. Even when a portion of an Old Hangul Syllable Block is a Modern Hangul Syllable Block,unlike UAX #15, that portion is not transformed to a Wanseong Modern Hangul Syllable Block.
string source: unicode string.
>>> from ksx1026.normalization import composeHangul
>>> source = "\u1100\u1161\u11a8"
>>> d = composeHangul(source)
>>> print(d.encode('raw_unicode_escape'))
b'\\uac01'
>>> source = "\u1100\u1161\u11c3"
>>> d = composeHangul(source)
>>> print(d.encode('raw_unicode_escape'))
b'\\u1100\\u1161\\u11c3'### Hangul Recomposition
If one uses a UAX #15 algorithm instead of the above composeHangul function for normalization, an Old Hangul Syllable Block can be decomposed into a Wanseong Modern Hangul Syllable Block and Johab Hangul Letter(s). In such cases, after applying, one can use the following recomposition algorithm to restore a character string in Normalization Form NFC or NFKC to an L V T format.
string source: unicode string>>> from ksx1026.normalization import recomposeHangul
>>> source = "\uac00\u11c3"
>>> d = recomposeHangul(source)
>>> print(d.encode('raw_unicode_escape'))
b'\\u1100\\u1161\\u11c3'### Normalization of Compatibility/Halfwidth Hangul Letters and Hangul-embedded symbols
Normalizing Compatibility/Halfwidth Hangul Letters and Hangul-embedded symbols (NormalizeJamoKDKC)
string source: unicode string
>>> from ksx1026.normalization import normalizeJamoKDKC
>>> source = "\u3200"
>>> d = normalizeJamoKDKC(source)
>>> print(d.encode('raw_unicode_escape'))
>>> b'(\\u1100\\u1160)