https://github.com/rusq/gomojimoji
A Go port of a fast converter between Japanese hankaku and zenkaku characters.
https://github.com/rusq/gomojimoji
converter go golang hankaku japanese japanese-language zenkaku
Last synced: 3 months ago
JSON representation
A Go port of a fast converter between Japanese hankaku and zenkaku characters.
- Host: GitHub
- URL: https://github.com/rusq/gomojimoji
- Owner: rusq
- License: apache-2.0
- Created: 2022-09-25T22:02:30.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-09-25T22:05:36.000Z (over 2 years ago)
- Last Synced: 2025-03-28T10:18:10.684Z (3 months ago)
- Topics: converter, go, golang, hankaku, japanese, japanese-language, zenkaku
- Language: Go
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# (go) mojimoji
[](https://pkg.go.dev/github.com/rusq/gomojimoji)
This is a port of the excellent [mojimoji][1] library written in Python
to Golang.It provides two functions:
- **HanToZen** - half-width to full-width character conversion.
- **ZenToHan** - half-width to full-width character conversion.Each of the functions allow the following options:
- **ASCII** - enable or disable ASCII translation.
- **Digits** - enable or disable Digits translation.
- **Kana** - enable or disable Kana translation.All options are enabled by default, see examples on their usage.
Logic is implemented as of commit [aca2661][2].
## Examples
### HanToZen
```go
fmt.Println(HanToZen("ニュージーランド"))
fmt.Println(HanToZen("ニュージーランド Auckland 6012", ASCII(true), Digits(false), Kana(false)))// Output:
// ニュージーランド
// ニュージーランド Auckland 6012
```### ZenToHan
```go
fmt.Println(ZenToHan("ニュージーランド"))
fmt.Println(ZenToHan("ニュージーランド Auckland 0123", Kana(false), Digits(true)))// Output:
// ニュージーランド
// ニュージーランド Auckland 0123
```## Benchmark
### Original library etc.
Original mojimoji, zenhan and unicodedata on my system, for comparison:
```python
In [4]: s = u'ABCDEFG012345' * 10In [5]: %time for n in range(1000000): mojimoji.zen_to_han(s)
CPU times: user 3.24 s, sys: 1.28 ms, total: 3.24 s
Wall time: 3.24 sIn [6]: %time for n in range(1000000): zenhan.z2h(s)
CPU times: user 26.2 s, sys: 16.3 ms, total: 26.2 s
Wall time: 26.2 sIn [7]: %time for n in range(1000000): unicodedata.normalize('NFKC', s)
CPU times: user 3.12 s, sys: 15.4 ms, total: 3.13 s
Wall time: 3.14 s
```### This library
ZenToHan and HanToZen use different approaches:- ZenToHan uses string.Builder, which is simpler to implement.
- HanToZen uses direct slice operations to allow for seeking when needed.ZenToHan:
```
mojimoji (master)> go test -bench=BenchmarkZenToHanConv
goos: darwin
goarch: amd64
pkg: github.com/rusq/gomojimoji
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkZenToHanConv-16 1 2880823810 ns/op
--- BENCH: BenchmarkZenToHanConv-16
mojimoji_test.go:98: 2.88079814s
PASS
ok github.com/rusq/gomojimoji 2.977s
```HanToZen:
```
mojimoji (master)> go test -bench=BenchmarkHanToZen
goos: darwin
goarch: amd64
pkg: github.com/rusq/gomojimoji
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkHanToZenConv-16 1 2712209539 ns/op
--- BENCH: BenchmarkHanToZenConv-16
mojimoji_test.go:107: 2.712166151s
PASS
ok github.com/rusq/gomojimoji 2.804s
```[1]: https://github.com/studio-ousia/mojimoji
[2]: https://github.com/studio-ousia/mojimoji/tree/aca26614f4a7a90a845f3a3c384c27d0a925efce