Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/andelf/rust-darts
Double Array Trie in Rust
https://github.com/andelf/rust-darts
data-structures
Last synced: 3 months ago
JSON representation
Double Array Trie in Rust
- Host: GitHub
- URL: https://github.com/andelf/rust-darts
- Owner: andelf
- License: mit
- Created: 2016-09-10T04:25:44.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2019-07-07T15:06:42.000Z (over 5 years ago)
- Last Synced: 2024-10-27T01:51:23.007Z (3 months ago)
- Topics: data-structures
- Language: Rust
- Homepage:
- Size: 18.2 MB
- Stars: 51
- Watchers: 6
- Forks: 7
- Open Issues: 6
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# rust-darts: Double-Array Trie Rust implementation.
This library is in alpha state, PRs are welcomed. An optional Forward Maximum Matching Searcher is provided when enabled by features.
[![Build Status](https://travis-ci.org/andelf/rust-darts.svg?branch=master)](https://travis-ci.org/andelf/rust-darts)
[![codecov](https://codecov.io/gh/andelf/rust-darts/branch/master/graph/badge.svg)](https://codecov.io/gh/andelf/rust-darts)
[![Crates.io](https://img.shields.io/crates/v/darts.svg)](https://crates.io/crates/darts)
[![docs.rs](https://docs.rs/darts/badge.svg)](https://docs.rs/darts/)## Installation
Add it to your `Cargo.toml`:
```toml
[dependencies]
darts = "0.1"
```then you are good to go. If you are using Rust 2015 you have to `extern crate darts` to your crate root as well.
## Example
```rust
use std::fs::File;
use darts::DoubleArrayTrie;fn main() {
let mut f = File::open("./priv/dict.big.bincode").unwrap();
let da = DoubleArrayTrie::load(&mut f).unwrap();
let string = "中华人民共和国";
let prefixes = da.common_prefix_search(string).map(|matches| {
matches
.iter()
.map(|(end_idx, v)| {
&string[..end_idx]
})
.collect();
}).unwrap_or(vec![]);
assert_eq!(vec!["中", "中华", "中华人民", "中华人民共和国"], prefixes);
}
``````rust
use std::fs::File;
use darts::DoubleArrayTrie;fn main() {
let mut f = File::open("./priv/dict.big.bincode").unwrap();
let da = DoubleArrayTrie::load(&mut f).unwrap();
assert!(da.exact_match_search("东湖高新技术开发区").is_some());
}
```## Enabling Additional Features
* `searcher` feature enables searcher for maximum forward matcher
* `serialization` feature enables saving and loading serialized `DoubleArrayTrie` data```toml
[dependencies]
darts = { version = "0.1", features = ["searcher", "serialization"] }
```## To Rebuild Dictionary
```bash
# It would take minutes, be patient.
time cargo test --all-features -- --nocapture --ignored test_dat_basic
```## To run benchmark tests
```bash
cargo bench --all-features
```## License
This work is released under the MIT license. A copy of the license is provided in the LICENSE file.
## Reference
- [hankcs/HanLP](https://github.com/hankcs/HanLP)
- [Aho Corasick自动机结合DoubleArrayTrie极速多模式匹配](http://www.hankcs.com/program/algorithm/aho-corasick-double-array-trie.html)
- [DoubleArrayTrie和AhoCorasickDoubleArrayTrie的实用性对比](http://www.hankcs.com/program/algorithm/double-array-trie-vs-aho-corasick-double-array-trie.html)
- [Darts: Double-Array Trie System](http://chasen.org/~taku/software/darts/)
- [An Implementation of Double-Array Trie](https://linux.thai.net/~thep/datrie/datrie.html)