https://github.com/louismullie/scalpel
A fast and accurate rule-based sentence segmentation tool for Ruby.
- Host: GitHub
- URL: https://github.com/louismullie/scalpel
- Owner: louismullie
- License: other
- Created: 2012-08-15T05:14:20.000Z (over 13 years ago)
- Default Branch: master
- Last Pushed: 2015-12-22T04:30:05.000Z (almost 10 years ago)
- Last Synced: 2025-03-26T09:51:15.840Z (8 months ago)
- Language: Ruby
- Size: 6.84 KB
- Stars: 51
- Watchers: 7
- Forks: 5
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- nlp-with-ruby - scalpel - (NLP Pipeline Subtasks / Segmentation)
README
**About**
Scalpel is the result of my inability to find a simple and elegant solution to sentence segmentation in Ruby. Machine learning approaches - both unsupervised ([punkt-segmenter](https://github.com/lfcipriani/punkt-segmenter)) and supervised ([tactful_tokenizer](https://github.com/SlyShy/Tactful_Tokenizer)) - depend on proper domain-specific training to work well. Stanford's tokenize-first, group-later method ([stanford-core-nlp](https://github.com/louismullie/stanford-core-nlp)) does not work well on ill-formatted content. Finally, extensive rule-based methods ([srx-english](https://github.com/apohllo/srx-english)) are very accurate but slow.
Scalpel is based on a very simple principle that reduces the complexity of sentence segmentation. The idea is that it is simpler and more efficient to find occurrences of periods that do __not__ indicate the end of a sentence than to find those that do. These occurrences are temporarily replaced by "placeholder" characters, the text is then split into sentences, and finally the placeholder characters are replaced by the original characters.
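The mask, split, restore idea above can be illustrated with a minimal sketch. This is not Scalpel's actual implementation or rule set: the `naive_cut` method, the `ABBREVIATIONS` list, and the `PLACEHOLDER` character are hypothetical names chosen for illustration, and the sketch assumes the placeholder character never occurs in the input.

```ruby
# Illustrative sketch only; Scalpel's real rules are far more extensive.

# Hypothetical placeholder, assumed never to occur in the input text.
PLACEHOLDER = "\u0001"

# Hypothetical, deliberately tiny list of abbreviations whose trailing
# period does not end a sentence.
ABBREVIATIONS = %w[Mr Mrs Dr Prof etc].freeze

def naive_cut(text)
  # 1. Mask periods that do NOT end a sentence: abbreviations and decimals.
  abbreviation_pattern = /\b(#{ABBREVIATIONS.join('|')})\./
  masked = text.gsub(abbreviation_pattern) { "#{$1}#{PLACEHOLDER}" }
  masked = masked.gsub(/(\d)\.(\d)/) { "#{$1}#{PLACEHOLDER}#{$2}" }

  # 2. Split after the remaining sentence-final punctuation.
  sentences = masked.split(/(?<=[.!?])\s+/)

  # 3. Restore the masked periods.
  sentences.map { |sentence| sentence.tr(PLACEHOLDER, '.') }
end

p naive_cut("Dr. Smith paid 3.5 dollars. He left.")
```

Because step 2 only ever sees periods that survived masking, the splitting rule itself can stay trivial; all of the domain knowledge lives in the masking rules. A sketch this small will mis-handle many cases (for example, a sentence that genuinely ends in "etc.").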
**Usage**
    gem install scalpel
```ruby
require 'scalpel'
Scalpel.cut("some text")
```
**Contributing**
Feel free to fork the project and send me a pull request!