https://github.com/cldellow/segmenter
Segment short strings into words.
- Host: GitHub
- URL: https://github.com/cldellow/segmenter
- Owner: cldellow
- License: apache-2.0
- Created: 2019-01-12T21:28:49.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-02-10T04:50:19.000Z (over 7 years ago)
- Last Synced: 2025-03-29T12:13:21.044Z (over 1 year ago)
- Topics: nlp
- Language: Java
- Size: 14.6 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# segmenter
[Build status](https://travis-ci.org/cldellow/segmenter)
[Code coverage](https://codecov.io/gh/cldellow/segmenter)
[Maven artifact](https://mvnrepository.com/artifact/com.cldellow/segmenter)
Segment short strings into words.
## Usage
The easiest way to get started is to create a map of word
probabilities:
```
import java.util.HashMap;

// Map each known word to its probability.
HashMap<String, Double> probabilities = new HashMap<>();
probabilities.put("eats", 0.2);
probabilities.put("at", 0.2);
probabilities.put("eat", 0.1);
probabilities.put("sat", 0.1);
Segmenter segmenter = new Segmenter(probabilities);
Result result = segmenter.segment("eatsat", 2, 2, 0);
result.getPhrase(0); // "eats at"
result.getPhrase(1); // "eat sat"
```
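To see why "eats at" could rank ahead of "eat sat", here is a minimal sketch of probability-based segmentation. It is not the library's implementation: `NaiveSegmenter`, `Candidate`, and the scoring rule (the product of the word probabilities) are assumptions made for illustration, and the README does not say how `Segmenter` actually scores phrases.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the segmenter library.
class NaiveSegmenter {
    /** One way of splitting the input, with its assumed score. */
    static final class Candidate {
        final String phrase;
        final double score;

        Candidate(String phrase, double score) {
            this.phrase = phrase;
            this.score = score;
        }
    }

    /** Returns every split of text into known words, highest score first. */
    static List<Candidate> segment(String text, Map<String, Double> probabilities) {
        List<Candidate> out = new ArrayList<>();
        collect(text, 0, "", 1.0, probabilities, out);
        out.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
        return out;
    }

    // Tries every known word that starts at `start`, then recurses on the rest.
    private static void collect(String text, int start, String prefix, double score,
                                Map<String, Double> probabilities, List<Candidate> out) {
        if (start == text.length()) {
            out.add(new Candidate(prefix, score));
            return;
        }
        for (int end = start + 1; end <= text.length(); end++) {
            String word = text.substring(start, end);
            Double p = probabilities.get(word);
            if (p != null) {
                String next = prefix.isEmpty() ? word : prefix + " " + word;
                // Assumed scoring rule: multiply the probabilities of the words used.
                collect(text, end, next, score * p, probabilities, out);
            }
        }
    }
}
```

Under that assumed rule and the probabilities above, "eats at" scores 0.2 × 0.2 = 0.04 and "eat sat" scores 0.1 × 0.1 = 0.01, which matches the order shown in the usage example. This brute-force version enumerates every split, so it only suits short inputs.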
Under the hood, the `Segmenter` converts the map into a trie. Building
the trie is slow, so you can instead pass an already-constructed trie
(for example, one deserialized from an earlier construction) to skip
that step.
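As a rough picture of what turning the map into a trie can look like, here is a minimal sketch. `ProbabilityTrie`, `fromMap`, and `wordsAt` are hypothetical names for illustration; the library's own trie type, its layout, and its serialization format may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the trie class used by the library.
class ProbabilityTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        Double probability; // non-null only where a complete word ends
    }

    private final Node root = new Node();

    /** Builds the trie once from a word-to-probability map. */
    static ProbabilityTrie fromMap(Map<String, Double> probabilities) {
        ProbabilityTrie trie = new ProbabilityTrie();
        for (Map.Entry<String, Double> e : probabilities.entrySet()) {
            Node node = trie.root;
            for (char c : e.getKey().toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.probability = e.getValue();
        }
        return trie;
    }

    /** Finds every known word that begins at position start, in one walk down the trie. */
    Map<String, Double> wordsAt(String text, int start) {
        Map<String, Double> found = new HashMap<>();
        Node node = root;
        for (int i = start; i < text.length(); i++) {
            node = node.children.get(text.charAt(i));
            if (node == null) {
                break; // no known word continues with this character
            }
            if (node.probability != null) {
                found.put(text.substring(start, i + 1), node.probability);
            }
        }
        return found;
    }
}
```

With the four words from the usage example, `wordsAt("eatsat", 0)` finds both "eat" and "eats" in a single pass. Prefix lookups like this are one reason to pay the construction cost once and reuse the result.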
The `Segmenter` class is thread-safe.