Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/messense/jieba-rs
The Jieba Chinese Word Segmentation Implemented in Rust
chinese-word-segmentation jieba jieba-chinese nlp wasm
- Host: GitHub
- URL: https://github.com/messense/jieba-rs
- Owner: messense
- License: mit
- Created: 2018-05-06T09:41:22.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2024-04-22T06:38:43.000Z (7 months ago)
- Last Synced: 2024-05-02T00:55:18.598Z (7 months ago)
- Topics: chinese-word-segmentation, jieba, jieba-chinese, nlp, wasm
- Language: Rust
- Size: 2.86 MB
- Stars: 686
- Watchers: 13
- Forks: 40
- Open Issues: 10
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
# jieba-rs
[![GitHub Actions](https://github.com/messense/jieba-rs/workflows/CI/badge.svg)](https://github.com/messense/jieba-rs/actions?query=workflow%3ACI)
[![codecov](https://codecov.io/gh/messense/jieba-rs/branch/master/graph/badge.svg)](https://codecov.io/gh/messense/jieba-rs)
[![Crates.io](https://img.shields.io/crates/v/jieba-rs.svg)](https://crates.io/crates/jieba-rs)
[![docs.rs](https://docs.rs/jieba-rs/badge.svg)](https://docs.rs/jieba-rs/)

> 🚀 Help me to become a full-time open-source developer by [sponsoring me on GitHub](https://github.com/sponsors/messense)

The Jieba Chinese Word Segmentation Implemented in Rust
## Installation
Add it to your ``Cargo.toml``:
```toml
[dependencies]
jieba-rs = "0.7"
```

Then you are good to go. If you are using Rust 2015, you also need to add ``extern crate jieba_rs;`` to your crate root.
## Example
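```rust
use jieba_rs::Jieba;

fn main() {
    // Load the embedded default dictionary.
    let jieba = Jieba::new();
    // Segment the sentence; the second argument (`false`) turns the HMM off.
    let words = jieba.cut("我们中出了一个叛徒", false);
    assert_eq!(words, vec!["我们", "中", "出", "了", "一个", "叛徒"]);
}
```

## Enabling Additional Features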
* The `default-dict` feature enables the embedded dictionary. It is enabled by default.
* The `tfidf` feature enables the TF-IDF keyword extractor.
* The `textrank` feature enables the TextRank keyword extractor.

```toml
[dependencies]
jieba-rs = { version = "0.7", features = ["tfidf", "textrank"] }
```

## Run benchmark
```bash
cargo bench --all-features
```

## Benchmark: Compare with cppjieba
* [Optimizing jieba-rs to be 33% faster than cppjieba](https://blog.paulme.ng/posts/2019-06-30-optimizing-jieba-rs-to-be-33percents-faster-than-cppjieba.html)
* [Optimizing jieba-rs: Chinese word segmentation performance evaluation (in Simplified Chinese)](https://blog.paulme.ng/posts/2019-07-01-%E4%BC%98%E5%8C%96-jieba-rs-%E4%B8%AD%E6%96%87%E5%88%86%E8%AF%8D-%E6%80%A7%E8%83%BD%E8%AF%84%E6%B5%8B%EF%BC%88%E5%BF%AB%E4%BA%8E-cppjieba-33percent%29.html)
* [Optimizing jieba-rs: Chinese word segmentation performance test (in Traditional Chinese)](https://blog.paulme.ng/posts/2019-07-01-%E6%9C%80%E4%BD%B3%E5%8C%96jieba-rs%E4%B8%AD%E6%96%87%E6%96%B7%E8%A9%9E%E6%80%A7%E8%83%BD%E6%B8%AC%E8%A9%A6%28%E5%BF%AB%E4%BA%8Ecppjieba-33%25%29.html)

## `jieba-rs` bindings
* [`@node-rs/jieba` NodeJS binding](https://github.com/napi-rs/node-rs/tree/main/packages/jieba)
* [`jieba-php` PHP binding](https://github.com/binaryoung/jieba-php)
* [`rjieba-py` Python binding](https://github.com/messense/rjieba-py)
* [`cang-jie` Chinese tokenizer for tantivy](https://github.com/DCjanus/cang-jie)
* [`tantivy-jieba` An adapter that bridges between tantivy and jieba-rs](https://github.com/jiegec/tantivy-jieba)
* [`jieba-wasm` the WebAssembly binding](https://github.com/fengkx/jieba-wasm)

## License
This work is released under the MIT license. A copy of the license is provided in the [LICENSE](./LICENSE) file.