Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sillsdev/khmer-normalizer
Normalize Khmer strings according to https://www.unicode.org/L2/L2022/22290-khmer-encoding.pdf
https://github.com/sillsdev/khmer-normalizer
Last synced: 3 months ago
JSON representation
Normalize Khmer strings according to https://www.unicode.org/L2/L2022/22290-khmer-encoding.pdf
- Host: GitHub
- URL: https://github.com/sillsdev/khmer-normalizer
- Owner: sillsdev
- License: other
- Created: 2024-08-10T09:49:33.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-08-10T10:16:54.000Z (3 months ago)
- Last Synced: 2024-08-10T11:30:31.584Z (3 months ago)
- Language: TypeScript
- Size: 53.7 KB
- Stars: 1
- Watchers: 9
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
- awesome-khmer-language - sillsdev/khmer-normalizer - khmer-encoding.pdf (Awesome Khmer Language / 2. Toolkit)
README
# khmer-normalizer
This module normalizes Khmer text according to the proposed normal encoding structure at https://www.unicode.org/L2/L2022/22290-khmer-encoding.pdf.
It does not attempt to identify faulty text, merely to ensure that two strings that would have rendered the same are output as the same string.
A live online version is available at https://convert.ភាសាខ្មែរ.com/ (see https://ភាសាខ្មែរ.com for related tools).
## Example
ខែ្មរ is corrected to ខ្មែរ:
* Input: ខ `U+1781` ែ `U+17C2` ្ `U+17D2` ម `U+1798` រ `U+179A`
* Output: ខ `U+1781` ្ `U+17D2` ម `U+1798` ែ `U+17C2` រ `U+179A`## Installation
```
npm install khmer-normalizer
```## API Usage
```ts
import { khnormal } from 'khmer-normalizer';
// Normal use -- Modern Khmer
const cleanKhmer = khnormal(inputKhmer);// Specifying the Modern Khmer language tag is optional
const cleanKhmer = khnormal(inputKhmer, 'km');// For Middle Khmer, use the language tag 'xhm'
const cleanMiddleKhmerText = khnormal(inputMiddleKhmerText, 'xhm');```
## Command line usage
```
khnormal [options] [inputFile...]If no input files are specified, reads from stdin, utf-8
# Options
--outfile, -o Write concatenated output to file; if not specified, writes to stdout, utf-8
--fail, -f Highlight places where khnormal was unable to regularize text
--lang, -l Specify processing language, km (Modern Khmer, default) or xhm (Middle Khmer)
--help, -h Print this help
```