https://github.com/hkhc/ccc
Chinese Commercial Code mapping
- Host: GitHub
- URL: https://github.com/hkhc/ccc
- Owner: hkhc
- Created: 2021-04-23T19:53:06.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-06-04T02:45:08.000Z (over 3 years ago)
- Last Synced: 2023-07-04T14:04:38.960Z (over 1 year ago)
- Language: Java
- Size: 341 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Chinese Commercial Code mapping library
## Setup
```gradle
dependencies {
    implementation 'io.hkhc.ccc:ccc:1.0'
}
```

### Introduction
Before the computer era, people sent text messages around the world by telegraph. Sending messages in the English alphabet was straightforward; sending Chinese characters was not. People invented a coding system that encodes the most frequently used Chinese characters as 4-digit codes, known as the Chinese Commercial Code. It is not friendly for data input, but it is very space efficient, which was far more valuable in those days.

Nowadays we have better coding systems for Chinese characters, as part of Unicode. However, the Chinese Commercial Code is still in use today to specify the legal names of people on the Hong Kong Identity Card (HKID) and passport. The Chinese Commercial Code corresponding to the Chinese name is printed on the documents. It helps non-Chinese law enforcement agencies recognize the Chinese names on the documents. It also helps camera-based HKID data collection, which can recognize digits rather than Chinese characters.

However, mapping between Chinese characters and Chinese Commercial Codes is not a common facility of modern programming platforms. This library fills the gap for applications that have this need.

Note that the mapping between Chinese characters and Chinese Commercial Codes is not one-to-one. One code may represent several different Chinese characters, and this is not particularly rare. Most of the time, the Traditional Chinese and Simplified Chinese versions of the same character are mapped to the same code. So it is not practical to map a code back to Chinese characters mechanically, but the mapping is mostly good enough for identifying personal names. It is also possible for a Chinese character available in Unicode to have no mapping in the Chinese Commercial Code: there are far more characters than a 4-digit coding system can hold.

## Examples
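The many-to-one mapping described in the introduction can be illustrated with a small self-contained sketch before turning to the library call itself. It does not use the library: `CccAmbiguityDemo`, `forwardTable` and `invert` are hypothetical names, and the table is hand-written. The three traditional entries come from the example in this README; the entry for the simplified form 陈 is an assumption made for illustration, following the statement that both forms usually share a code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a tiny hand-written table, not the library's data or API.
class CccAmbiguityDemo {

    // Forward table: character -> 4-digit code.
    static Map<String, String> forwardTable() {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("陳", "7115"); // from the README example
        table.put("陈", "7115"); // assumed: simplified form shares the code
        table.put("大", "1129"); // from the README example
        table.put("文", "2429"); // from the README example
        return table;
    }

    // Invert the table. A single code may collect several candidate characters,
    // which is why decoding cannot be done mechanically.
    static Map<String, List<String>> invert(Map<String, String> forward) {
        Map<String, List<String>> reverse = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : forward.entrySet()) {
            reverse.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                   .add(entry.getKey());
        }
        return reverse;
    }

    public static void main(String[] args) {
        Map<String, List<String>> reverse = invert(forwardTable());
        System.out.println(reverse.get("7115")); // two candidate characters
        System.out.println(reverse.get("1129")); // a single candidate
    }
}
```

Encoding is a plain lookup, while decoding returns a list of candidates that a person or some additional context has to choose from.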
```java
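// Load the mapping data before the first lookup.
CCCDB db = new CCCDB();
db.load();
// Each character is translated to a 4-digit code; codes are separated by spaces.
Assert.assertEquals("7115 1129 2429", db.getCCCs("陳大文"));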
```

If a Chinese character has no corresponding Chinese Commercial Code, the result is "0000".

## Source data
The source data for the mapping files is placed under the `data` folder. The encoding mapping file is generated by the `CCCGenerator` class.

## Building the library
The library is a trivial piece of code to build with Gradle. Deployment of the library to Maven Central is done by another project of mine, [Jarbird](https://github.com/hkhc/jarbird), which is undergoing a major update. Stay tuned.
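
## Handling unmapped characters

Because characters without a code come back as "0000", a caller may want to check a result before storing it. The following is a minimal sketch, not part of the library: `UnmappedCodeCheck` and `unmappedPositions` are hypothetical names. It assumes the result string holds one space-separated code per input character, as in the `getCCCs` example in this README, and that unmapped characters appear in that string as "0000".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the library: inspects a result string
// in the space-separated format shown in the README example.
class UnmappedCodeCheck {

    // Placeholder code that the README states is returned for unmapped characters.
    static final String UNMAPPED = "0000";

    // Returns the zero-based positions whose code is the "0000" placeholder.
    static List<Integer> unmappedPositions(String codes) {
        List<Integer> positions = new ArrayList<>();
        String trimmed = codes.trim();
        if (trimmed.isEmpty()) {
            return positions;
        }
        String[] parts = trimmed.split("\\s+");
        for (int i = 0; i < parts.length; i++) {
            if (UNMAPPED.equals(parts[i])) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(unmappedPositions("7115 1129 2429")); // prints []
        System.out.println(unmappedPositions("7115 0000 2429")); // prints [1]
    }
}
```

Under the stated assumption, each returned position is the index of the character in the original name that needs manual attention.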