https://github.com/takuyaa/kuromoji.js
JavaScript implementation of Japanese morphological analyzer
- Host: GitHub
- URL: https://github.com/takuyaa/kuromoji.js
- Owner: takuyaa
- Created: 2014-12-04T08:29:12.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2023-11-12T07:55:12.000Z (11 months ago)
- Last Synced: 2024-07-18T13:21:27.243Z (2 months ago)
- Language: JavaScript
- Size: 43.8 MB
- Stars: 831
- Watchers: 21
- Forks: 117
- Open Issues: 20
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE-2.0.txt
kuromoji.js
===========

[![Build Status](https://travis-ci.org/takuyaa/kuromoji.js.svg?branch=master)](https://travis-ci.org/takuyaa/kuromoji.js)
[![Coverage Status](https://coveralls.io/repos/github/takuyaa/kuromoji.js/badge.svg?branch=master)](https://coveralls.io/github/takuyaa/kuromoji.js?branch=master)
[![npm version](https://badge.fury.io/js/kuromoji.svg)](https://badge.fury.io/js/kuromoji)
[![dependencies](https://david-dm.org/takuyaa/kuromoji.js.svg)](https://david-dm.org/takuyaa/kuromoji.js)
[![Code Climate](https://codeclimate.com/github/takuyaa/kuromoji.js/badges/gpa.svg)](https://codeclimate.com/github/takuyaa/kuromoji.js)
[![Downloads](https://img.shields.io/npm/dm/kuromoji.svg)](https://www.npmjs.com/package/kuromoji)

JavaScript implementation of a Japanese morphological analyzer.
This is a pure JavaScript port of [Kuromoji](https://www.atilika.com/ja/kuromoji/).

You can see how kuromoji.js works on the [demo site](https://takuyaa.github.io/kuromoji.js/demo/tokenize.html).

Directory
---------

Directory tree is as follows:

    build/
      kuromoji.js -- JavaScript file for browser (Browserified)
    demo/         -- Demo
    dict/         -- Dictionaries for tokenizer (gzipped)
    example/      -- Examples to use in Node.js
    src/          -- JavaScript source
    test/         -- Unit test

Usage
-----

You can tokenize sentences with only 5 lines of code.
If you need working examples, you can see the files under the demo or example directory.

### Node.js

Install with the npm package manager:

    npm install kuromoji

Load this library as follows:

    var kuromoji = require("kuromoji");

You can prepare a tokenizer like this:

    kuromoji.builder({ dicPath: "path/to/dictionary/dir/" }).build(function (err, tokenizer) {
        // tokenizer is ready
        var path = tokenizer.tokenize("すもももももももものうち");
        console.log(path);
    });

### Browser

You only need the build/kuromoji.js and dict/*.dat.gz files.

Install with the Bower package manager:

    bower install kuromoji

Or you can use the kuromoji.js file and dictionary files from the GitHub repository.

In your HTML, load build/kuromoji.js with a `<script>` tag.

In your JavaScript:


    kuromoji.builder({ dicPath: "/url/to/dictionary/dir/" }).build(function (err, tokenizer) {
        // tokenizer is ready
        var path = tokenizer.tokenize("すもももももももものうち");
        console.log(path);
    });

API
---

The function tokenize() returns an array of token objects like this:

    [ {
        word_id: 509800,  // ID of the word in the dictionary
        word_type: 'KNOWN',  // Word type (KNOWN if the word is registered in the dictionary, UNKNOWN otherwise)
        word_position: 1,  // Start position of the word
        surface_form: '黒文字',  // Surface form
        pos: '名詞',  // Part of speech
        pos_detail_1: '一般',  // Part-of-speech subcategory 1
        pos_detail_2: '*',  // Part-of-speech subcategory 2
        pos_detail_3: '*',  // Part-of-speech subcategory 3
        conjugated_type: '*',  // Conjugation type
        conjugated_form: '*',  // Conjugation form
        basic_form: '黒文字',  // Base form
        reading: 'クロモジ',  // Reading
        pronunciation: 'クロモジ'  // Pronunciation
    } ]

(This is defined in src/util/IpadicFormatter.js)

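As an illustration of working with this output, the sketch below joins the `reading` fields of a token array and picks out the nouns. The helper names `toReading` and `nouns` are hypothetical and not part of kuromoji.js, and the sample token is a trimmed copy of the object above. Falling back to `surface_form` when `reading` is missing or `'*'` is an assumption about how tokens without a reading should be handled, not documented behavior.

```javascript
// Hypothetical helpers; they rely only on the field names shown above.

// Join the readings of all tokens into one string.
// Assumption: a token without a usable reading falls back to its surface form.
function toReading(tokens) {
  return tokens
    .map(function (t) {
      return t.reading && t.reading !== "*" ? t.reading : t.surface_form;
    })
    .join("");
}

// Keep only tokens whose part of speech is 名詞 (noun).
function nouns(tokens) {
  return tokens.filter(function (t) {
    return t.pos === "名詞";
  });
}

// Sample data: a trimmed copy of the token shown above.
var sample = [
  { surface_form: "黒文字", pos: "名詞", basic_form: "黒文字", reading: "クロモジ" }
];

console.log(toReading(sample));
console.log(nouns(sample).map(function (t) { return t.surface_form; }));
```

In real use, `sample` would be replaced by the array returned from `tokenizer.tokenize(...)`.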
See the [JSDoc page](https://takuyaa.github.io/kuromoji.js/jsdoc/) for details.
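The callback passed to `build()` in the Usage section can also be wrapped in a Promise, which makes it easier to combine with other asynchronous code. This is a minimal sketch: `buildTokenizer` is a hypothetical helper, not part of kuromoji.js, and it takes the loaded module as an argument so that it assumes nothing beyond the `builder({ dicPath: ... }).build(callback)` call shown above.

```javascript
// Hypothetical helper: wraps the callback-style build() in a Promise.
// `kuromoji` is the loaded module object (for example from require("kuromoji")).
// `dicPath` is the dictionary directory, as in the Usage section.
function buildTokenizer(kuromoji, dicPath) {
  return new Promise(function (resolve, reject) {
    kuromoji.builder({ dicPath: dicPath }).build(function (err, tokenizer) {
      if (err) {
        reject(err); // building the tokenizer failed
      } else {
        resolve(tokenizer); // tokenizer is ready
      }
    });
  });
}
```

With the package installed, usage might look like `buildTokenizer(require("kuromoji"), "path/to/dictionary/dir/").then(function (tokenizer) { console.log(tokenizer.tokenize("すもももももももものうち")); });`.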