https://github.com/rakutentech/pisah
Sentence Splitter Library (C++ port of pySBD)
nlp nmt sentence-splitter
- Host: GitHub
- URL: https://github.com/rakutentech/pisah
- Owner: rakutentech
- License: MIT
- Created: 2023-05-24T07:11:29.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-07-07T06:56:57.000Z (almost 3 years ago)
- Last Synced: 2025-09-01T10:14:09.024Z (10 months ago)
- Topics: nlp, nmt, sentence-splitter
- Language: C++
- Homepage:
- Size: 132 KB
- Stars: 5
- Watchers: 11
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
README
# PISAH
Pisah (Malay for "separate") is a tool for splitting a natural-language document into sentences.
Pisah currently supports only English.
The library is mostly based on:
* Pragmatic Sentence Segmenter (Ruby): https://github.com/diasks2/pragmatic_segmenter
* pySBD (Python): https://github.com/nipunsadvilkar/pySBD/
#### Installation of PCRE library
* Ubuntu: run `sudo apt-get install libpcre3 libpcre3-dev`
* macOS: run `brew install pcre`
#### BUILD
```
mkdir build && cd build
cmake ..
make -j
```
#### TEST
```
echo "How are you, Mr. John? It has been so long since we talked." | ./pisah
```
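
The test sentence contains `Mr.`, which a naive "split on every period" approach would wrongly treat as a sentence end. The sketch below illustrates the general idea behind rule-based splitting with a tiny abbreviation list. It is not Pisah's implementation or API: `split_sentences` and the abbreviation list are hypothetical names chosen for this example, and the real rule sets in pySBD and Pragmatic Sentence Segmenter cover far more cases.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of Pisah: treats '.', '!' or '?' as a
// sentence boundary when it is followed by whitespace or end of input,
// unless the token ending at that mark is a known abbreviation.
std::vector<std::string> split_sentences(const std::string& text) {
    // Tiny illustrative list (an assumption); real rule sets are much larger.
    static const std::set<std::string> abbreviations = {"Mr.", "Mrs.", "Ms.", "Dr."};

    std::vector<std::string> sentences;
    std::size_t start = 0;  // index where the current sentence begins

    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c != '.' && c != '!' && c != '?') continue;

        // A boundary needs end of input or whitespace right after the mark.
        const bool at_end = (i + 1 == text.size());
        if (!at_end && !std::isspace(static_cast<unsigned char>(text[i + 1]))) continue;

        // Walk back to the previous whitespace to get the token ending at i.
        std::size_t token_start = i;
        while (token_start > start &&
               !std::isspace(static_cast<unsigned char>(text[token_start - 1]))) {
            --token_start;
        }
        const std::string token = text.substr(token_start, i - token_start + 1);
        if (abbreviations.count(token) > 0) continue;  // e.g. "Mr." is not a boundary

        sentences.push_back(text.substr(start, i - start + 1));

        // Skip the whitespace that separates this sentence from the next.
        start = i + 1;
        while (start < text.size() &&
               std::isspace(static_cast<unsigned char>(text[start]))) {
            ++start;
        }
    }

    // Keep any trailing text that has no terminating punctuation.
    if (start < text.size()) sentences.push_back(text.substr(start));
    return sentences;
}
```

Calling `split_sentences` on the test sentence returns two strings, `How are you, Mr. John?` and `It has been so long since we talked.`, because `Mr.` is skipped as an abbreviation.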