https://github.com/zencephalon/tactful_tokenizer
Accurate Bayesian sentence tokenizer in Ruby.
- Host: GitHub
- URL: https://github.com/zencephalon/tactful_tokenizer
- Owner: zencephalon
- Created: 2010-03-10T03:17:37.000Z (over 14 years ago)
- Default Branch: release
- Last Pushed: 2014-04-30T13:40:53.000Z (over 10 years ago)
- Last Synced: 2024-10-31T14:46:41.097Z (17 days ago)
- Topics: nlp, ruby, rubynlp
- Language: Ruby
- Homepage:
- Size: 25.4 MB
- Stars: 80
- Watchers: 5
- Forks: 13
- Open Issues: 0
Metadata Files:
- Readme: README.rdoc
= TactfulTokenizer

TactfulTokenizer is a Ruby library for high-quality sentence
tokenization. It uses a Naive Bayesian statistical model and
is based on Splitta[http://code.google.com/p/splitta/], but
adds support for '?' and '!' as well as primitive handling of
XHTML markup. Better support for XHTML parsing is coming shortly.

It also supports tokenization of Unicode text.

== Usage

  require "tactful_tokenizer"
  m = TactfulTokenizer::Model.new
  m.tokenize_text("Here in the U.S. Senate we prefer to eat our friends. Is it easier that way? Yes. Maybe!")
  #=> ["Here in the U.S. Senate we prefer to eat our friends.", "Is it easier that way?", "Yes.", "Maybe!"]

The input text is expected to consist of paragraphs delimited
by line breaks.

== Installation

  gem install tactful_tokenizer

== Author

Copyright (c) 2010 Matthew Bunday. All rights reserved.
Released under the {GNU GPL v3}[http://www.gnu.org/licenses/gpl.html].
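
== Input Format Example

The Usage section notes that input should consist of paragraphs delimited
by line breaks. Text that has been hard-wrapped at a fixed column may
therefore need to be normalized before tokenizing. The following is a
minimal sketch of one way to do that, assuming paragraphs in the source
text are separated by blank lines. The helper +unwrap_paragraphs+ is
hypothetical and is not part of the tactful_tokenizer gem.

```ruby
# Hypothetical helper (not part of tactful_tokenizer): joins hard-wrapped
# lines so that each paragraph occupies exactly one line.
# Assumption: paragraphs in the source text are separated by blank lines.
def unwrap_paragraphs(text)
  text
    .split(/\n\s*\n/)                     # split into paragraphs on blank lines
    .map { |para| para.split.join(" ") }  # collapse line breaks and extra spaces
    .reject(&:empty?)                     # drop paragraphs that were only whitespace
    .join("\n")                           # one paragraph per line
end

wrapped = "Here in the U.S. Senate we prefer\nto eat our friends.\n\n" \
          "Is it easier that way?\nYes. Maybe!\n"

prepared = unwrap_paragraphs(wrapped)
puts prepared

# With the gem installed, the prepared text could then be tokenized:
#   require "tactful_tokenizer"
#   TactfulTokenizer::Model.new.tokenize_text(prepared)
```

The helper only reshapes whitespace; sentence boundary detection is left
entirely to the tokenizer.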