https://github.com/azu/kuromojin
Provide a high-level wrapper for kuromoji.js. Cache/Promise API
japanese javascript kuromoji promise
- Host: GitHub
- URL: https://github.com/azu/kuromojin
- Owner: azu
- License: mit
- Created: 2015-11-13T02:52:23.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2024-01-06T04:07:34.000Z (about 1 year ago)
- Last Synced: 2025-01-13T15:12:59.265Z (11 days ago)
- Topics: japanese, javascript, kuromoji, promise
- Language: CSS
- Homepage: https://kuromojin.netlify.app/
- Size: 14.6 MB
- Stars: 89
- Watchers: 5
- Forks: 10
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# kuromojin [![Actions Status: test](https://github.com/azu/kuromojin/workflows/test/badge.svg)](https://github.com/azu/kuromojin/actions?query=workflow%3A"test")
Provides a high-level wrapper for [kuromoji.js](https://github.com/takuyaa/kuromoji.js "kuromoji.js").
## Features
- Promise based API
- Cache Layer
- Fetches the dictionary only once
- Returns the same tokens for the same text

## Installation

    npm install kuromojin
## Online Playground
📝 Requires a browser that supports [DecompressionStream](https://developer.mozilla.org/ja/docs/Web/API/DecompressionStream).
## Usage
kuromojin exports two functions:

- `getTokenizer()` returns a `Promise` that resolves with kuromoji.js's `tokenizer` instance.
- `tokenize()` returns a `Promise` that resolves with the analyzed tokens.

```js
import {tokenize, getTokenizer} from "kuromojin";

getTokenizer().then(tokenizer => {
    // kuromoji.js's `tokenizer` instance
});

// `text` is the Japanese string to analyze
const text = "黒文字";
tokenize(text).then(tokens => {
    console.log(tokens)
    /*
    [ {
        word_id: 509800,       // word ID in the dictionary
        word_type: 'KNOWN',    // word type (KNOWN if the word is in the dictionary, UNKNOWN otherwise)
        word_position: 1,      // start position of the word
        surface_form: '黒文字', // surface form
        pos: '名詞',            // part of speech
        pos_detail_1: '一般',   // part-of-speech subcategory 1
        pos_detail_2: '*',     // part-of-speech subcategory 2
        pos_detail_3: '*',     // part-of-speech subcategory 3
        conjugated_type: '*',  // conjugation type
        conjugated_form: '*',  // conjugation form
        basic_form: '黒文字',   // base form
        reading: 'クロモジ',    // reading
        pronunciation: 'クロモジ' // pronunciation
    } ]
    */
});
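
// A hedged sketch, not part of kuromojin's API: `readingOf` and `nounsOf` are
// hypothetical helpers that rely only on the token fields shown above.

// Join the readings of all tokens. Falling back to `surface_form` is an
// assumption for tokens that carry no `reading` (this may happen for UNKNOWN words).
function readingOf(tokens) {
    return tokens.map(token => token.reading || token.surface_form).join("");
}

// Collect the surface forms of the tokens whose part of speech is 名詞 (noun).
function nounsOf(tokens) {
    return tokens
        .filter(token => token.pos === "名詞")
        .map(token => token.surface_form);
}

// Assumed usage together with the `tokenize` call above:
// tokenize(text).then(tokens => console.log(readingOf(tokens), nounsOf(tokens)));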
```

### For browser/global options

If `window.kuromojin.dicPath` is defined, kuromojin uses it as the default dictionary path.

```js
import {getTokenizer} from "kuromojin";
// Affects every module that uses kuromojin.
window.kuromojin = {
    dicPath: "https://cdn.jsdelivr.net/npm/[email protected]/dict"
};
// This `getTokenizer()` call uses the `dicPath` set on `window.kuromojin` above,
getTokenizer();
// which is equivalent to:
getTokenizer({dicPath: "https://cdn.jsdelivr.net/npm/[email protected]/dict"})
```

:memo: Test dictionary URLs

- "https://cdn.jsdelivr.net/npm/[email protected]/dict"
    - CDN-hosted dictionary for kuromoji.js
- https://kuromojin.netlify.com/dict/*.dat.gz
    - example: https://kuromojin.netlify.com/dict/base.dat.gz

### Note: backward compatibility for <= 1.1.0

kuromojin v1.1.0 and earlier exported `tokenize` as the default export.
kuromojin v2.0.0 removed the default export.
```js
import kuromojin from "kuromojin";
// kuromojin === tokenize
```

Recommended: use `import {tokenize} from "kuromojin"` instead.

```js
import {tokenize} from "kuromojin";
```

### Note: kuromoji version is pinned

kuromojin pins the version of kuromoji.
This is meant to dedupe kuromoji's dictionary: the dictionary is large, so duplicate copies should be avoided.

## Related

- [azu/morpheme-match: match function that matches tokens (morphological analysis) with a sentence.](https://github.com/azu/morpheme-match/tree/master)
## Tests

    npm test

## Contributing
1. Fork it!
2. Create your feature branch: `git checkout -b my-new-feature`
3. Commit your changes: `git commit -am 'Add some feature'`
4. Push to the branch: `git push origin my-new-feature`
5. Submit a pull request :D

## License

MIT