https://github.com/egegungordu/jaime
A fast, lightweight Japanese IME engine for Zig projects that converts romaji to hiragana, kanji, and full-width characters. Supports Google 日本語入力-style input patterns with an easy-to-use API.
https://github.com/egegungordu/jaime
hiragana ime input-method-engine kanji romaji text-conversion transliteration zig
Last synced: 3 months ago
JSON representation
A fast, lightweight Japanese IME engine for Zig projects that converts romaji to hiragana, kanji, and full-width characters. Supports Google 日本語入力-style input patterns with an easy-to-use API.
- Host: GitHub
- URL: https://github.com/egegungordu/jaime
- Owner: egegungordu
- License: mit
- Created: 2025-01-04T00:06:30.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-02-05T20:35:09.000Z (4 months ago)
- Last Synced: 2025-02-24T01:44:17.677Z (3 months ago)
- Topics: hiragana, ime, input-method-engine, kanji, romaji, text-conversion, transliteration, zig
- Language: Zig
- Homepage: https://jaime-wasm.pages.dev/
- Size: 17.4 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Jaime
A headless Japanese IME (Input Method Editor) engine for Zig projects that provides:
- Romaji to hiragana/katakana conversion
eiennni → えいえんに
- Full-width character conversion
abc123 → abc123
- Dictionary-based word conversion
かんじ → 漢字
- Built-in cursor and buffer management
Based on Google 日本語入力 behavior.
On the **terminal** with libvaxis
[View repository](https://github.com/egegungordu/ja-ime-terminal-demo)
On the **web** with webassembly
[Online demo](https://jaime-wasm.pages.dev/)
## Zig Version
The minimum Zig version required is 0.13.0.
## Licensing Information
This project includes the **IPADIC dictionary**, which is provided under the license terms stated in the accompanying `COPYING` file. The IPADIC license imposes additional restrictions and requirements on its usage and redistribution. If your application cannot comply with the terms of the IPADIC license, consider using the `ime_core` module with a custom dictionary implementation instead.
## Integrating jaime into your Zig Project
You can add jaime as a dependency in your `build.zig.zon` file in two ways:
### Development Version
```bash
# Get the latest development version from main branch
zig fetch --save git+https://github.com/egegungordu/jaime
```### Release Version
```bash
# Get a specific release version (replace x.y.z with desired version)
zig fetch --save https://github.com/egegungordu/jaime/archive/refs/tags/vx.y.z.tar.gz
```Then instantiate the dependency in your `build.zig`:
```zig
const jaime = b.dependency("jaime", .{});
exe.root_module.addImport("kana", jaime.module("kana")); // For simple kana conversion
exe.root_module.addImport("ime_core", jaime.module("ime_core")); // For IME without dictionary
exe.root_module.addImport("ime_ipadic", jaime.module("ime_ipadic")); // For IME with IPADIC dictionary
```## Usage
The library provides three modules for different use cases:
### 1. Kana Module - Simple Conversions
For simple romaji to hiragana conversions without IME functionality:
```zig
const kana = @import("kana");// Using a provided buffer (no allocations)
var buf: [100]u8 = undefined;
const result = try kana.convertBuf(&buf, "konnnichiha");
try std.testing.expectEqualStrings("こんにちは", result);// Using an allocator (returns owned slice)
const result2 = try kana.convert(allocator, "konnnichiha");
defer allocator.free(result2);
try std.testing.expectEqualStrings("こんにちは", result2);
```### 2. IME IPADIC Module - Full Featured IME
For applications that want to use the full-featured IME with the IPADIC dictionary:
```zig
const ime_ipadic = @import("ime_ipadic");// Using owned buffer (with allocator)
var ime = ime_ipadic.Ime(.owned).init(allocator);
defer ime.deinit();// Using borrowed buffer (fixed size, no allocations)
var buf: [100]u8 = undefined;
var ime = ime_ipadic.Ime(.borrowed).init(&buf);// Common IME operations
const result = try ime.insert("k");
const result2 = try ime.insert("o");
const result3 = try ime.insert("n");
try std.testing.expectEqualStrings("こん", ime.input.buf.items());// Dictionary Matches
if (ime.getMatches()) |matches| {
// Get suggested conversions from the dictionary
// Returns []WordEntry containing possible word matches
}
try ime.applyMatch(); // Apply the best dictionary match to the current input// Cursor Movement and Editing
ime.moveCursorBack(1); // Move cursor left n positions
ime.moveCursorForward(1);// Move cursor right n positions
try ime.insert("y"); // Insert at cursor position
ime.clear(); // Clear the input buffer
try ime.deleteBack(); // Delete one character before cursor
try ime.deleteForward(); // Delete one character after cursor
```> [!WARNING]
> The IPADIC dictionary is subject to its own license terms. If you need to use a different dictionary or want to avoid IPADIC's license requirements, use the `ime_core` module with your own dictionary implementation.### 3. IME Core Module - Custom Dictionary
For applications that want to use IME functionality with their own dictionary implementation:
```zig
const ime_core = @import("ime_core");// Create your own dictionary loader that implements the required interface
const MyDictLoader = struct {
pub fn loadDictionary(allocator: std.mem.Allocator) !Dictionary {
// Your dictionary loading logic here
}pub fn freeDictionary(dict: *Dictionary) void {
// Your dictionary cleanup logic here
}
};// Use the IME with your custom dictionary
var ime = ime_core.Ime(MyDictLoader).init(allocator);
defer ime.deinit();
```## WebAssembly Bindings
For web applications, you can build the WebAssembly bindings:
```bash
# Build the WebAssembly library
zig build
```The WebAssembly library uses the IPADIC dictionary by default. For a complete example of how to use the WebAssembly bindings in a web application, check out the [web example](examples/web/index.js).
The WebAssembly library provides the following functions:
```javascript
// Initialize the IME
init();// Get pointer to input buffer for writing input text
getInputBufferPointer();// Insert text at current position
// length: number of bytes to read from input buffer
insert(length);// Get information about the last insertion
getDeletedCodepoints(); // Number of codepoints deleted
getInsertedTextLength(); // Length of inserted text in bytes
getInsertedTextPointer(); // Pointer to inserted text// Cursor movement and editing
deleteBack(); // Delete character before cursor
deleteForward(); // Delete character after cursor
moveCursorBack(n); // Move cursor back n positions
moveCursorForward(n); // Move cursor forward n positions
```Example usage in JavaScript:
```javascript
// Initialize
init();// Get input buffer
const inputPtr = getInputBufferPointer();
const inputBuffer = new Uint8Array(memory.buffer, inputPtr, 64);// Write and insert characters one by one
const text = "ka";
for (const char of text) {
// Write single character to buffer
const bytes = new TextEncoder().encode(char);
inputBuffer.set(bytes);// Insert and get result
insert(bytes.length);// Get the inserted text
const insertedLength = getInsertedTextLength();
const insertedPtr = getInsertedTextPointer();
const insertedText = new TextDecoder().decode(
new Uint8Array(memory.buffer, insertedPtr, insertedLength)
);// Check if any characters were deleted
const deletedCount = getDeletedCodepoints();console.log({
inserted: insertedText,
deleted: deletedCount,
});
}
// Final result is "か"
```## Testing
To run the test suite:
```bash
zig build test --summary all
```## Features
- Romaji to hiragana/full-width character conversion based on Google 日本語入力 mapping
- Basic hiragana (あ、い、う、え、お、か、き、く...)
- a -> あ
- ka -> か
- Small hiragana (ゃ、ゅ、ょ...)
- xya -> や
- li -> ぃ
- Sokuon (っ)
- tte -> って
- Full-width characters
- k -> k
- 1 -> 1
- Punctuation
- . -> 。
- ? -> ?
- [ -> 「## Contributing
Contributions are welcome! Please feel free to open an issue or submit a Pull Request.
## Acknowledgments
- Based on [Google 日本語入力](https://www.google.co.jp/ime/) transliteration mappings
- [mozc](https://github.com/google/mozc) - Google's open source Japanese Input Method Editor, which provided valuable insights for IME implementation
- The following projects were used as a reference for the codebase structure:
- [chipz - 8-bit emulator in zig](https://github.com/floooh/chipz)
- [zg - Unicode text processing for zig](https://codeberg.org/atman/zg)## Further Reading & References
For those interested in the data structures and algorithms used in this project, or looking to implement similar functionality, the following academic papers provide excellent background:
- [Efficient dictionary and language model compression for input method editors](https://aclanthology.org/W11-3503/) - Describes techniques for compressing IME dictionaries while maintaining fast lookup times
- [Space-efficient static trees and graphs](https://doi.org/10.1109/SFCS.1989.63533) - Introduces fundamental succinct data structure techniques that enable near-optimal space usage while supporting fast operations
- [最小コスト法に基づく形態素解析における CPU キャッシュの効率化](https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/C2-4.pdf) - Discusses CPU cache optimization techniques for morphological analysis using minimum-cost methods
- [Vibrato](https://github.com/daac-tools/vibrato) - Viterbi-based accelerated tokenizer
- [Vaporetto: 点予測法に基づく高速な日本語トークナイザ](https://www.anlp.jp/proceedings/annual_meeting/2022/pdf_dir/D2-5.pdf) - Presents a fast tokenization approach using linear classification and point prediction methods and three novel score preprocessing techniques
- [Vaporetto](https://github.com/daac-tools/vaporetto?tab=readme-ov-file) - Implementation of the pointwise prediction tokenizer described in the paper above