{"id":20574906,"url":"https://github.com/zencephalon/tactful_tokenizer","last_synced_at":"2025-08-20T02:32:35.819Z","repository":{"id":835428,"uuid":"555329","full_name":"zencephalon/Tactful_Tokenizer","owner":"zencephalon","description":"Accurate Bayesian sentence tokenizer in Ruby.","archived":false,"fork":false,"pushed_at":"2014-04-30T13:40:53.000Z","size":26642,"stargazers_count":80,"open_issues_count":0,"forks_count":13,"subscribers_count":4,"default_branch":"release","last_synced_at":"2025-08-09T21:30:29.575Z","etag":null,"topics":["nlp","ruby","rubynlp"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"typus/typus","license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zencephalon.png","metadata":{"files":{"readme":"README.rdoc","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2010-03-10T03:17:37.000Z","updated_at":"2024-05-08T10:27:41.000Z","dependencies_parsed_at":"2022-08-16T11:05:18.932Z","dependency_job_id":null,"html_url":"https://github.com/zencephalon/Tactful_Tokenizer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zencephalon/Tactful_Tokenizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zencephalon%2FTactful_Tokenizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zencephalon%2FTactful_Tokenizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zencephalon%2FTactful_Tokenizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zencephalon%2FTactful_Tokenizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zencephalon","download_url":"https://c
odeload.github.com/zencephalon/Tactful_Tokenizer/tar.gz/refs/heads/release","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zencephalon%2FTactful_Tokenizer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271254649,"owners_count":24727382,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-20T02:00:09.606Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nlp","ruby","rubynlp"],"created_at":"2024-11-16T05:37:31.650Z","updated_at":"2025-08-20T02:32:35.147Z","avatar_url":"https://github.com/zencephalon.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"= TactfulTokenizer\n\n{\u003cimg src=\"https://badge.fury.io/rb/tactful_tokenizer.png\" alt=\"Gem Version\" /\u003e}[http://badge.fury.io/rb/tactful_tokenizer]\n{\u003cimg src=\"https://travis-ci.org/zencephalon/Tactful_Tokenizer.png?branch=release\" alt=\"Build Status\" /\u003e}[https://travis-ci.org/zencephalon/Tactful_Tokenizer]\n{\u003cimg src=\"https://codeclimate.com/github/zencephalon/Tactful_Tokenizer.png\" /\u003e}[https://codeclimate.com/github/zencephalon/Tactful_Tokenizer]\n{\u003cimg src=\"https://coveralls.io/repos/zencephalon/Tactful_Tokenizer/badge.png?branch=release\" alt=\"Coverage Status\" /\u003e}[https://coveralls.io/r/zencephalon/Tactful_Tokenizer?branch=release]\n\nTactfulTokenizer is a Ruby 
library for high-quality sentence\ntokenization. It uses a Naive Bayesian statistical model, and\nis based on Splitta[http://code.google.com/p/splitta/], but\nhas support for '?' and '!' as well as primitive handling of\nXHTML markup. Better support for XHTML parsing is coming shortly.\n\nIt also supports tokenization of Unicode text.\n\n== Usage\n\n require \"tactful_tokenizer\"\n m = TactfulTokenizer::Model.new\n m.tokenize_text(\"Here in the U.S. Senate we prefer to eat our friends. Is it easier that way? \u003cem\u003eYes.\u003c/em\u003e \u003cem\u003eMaybe\u003c/em\u003e!\")\n #=\u003e [\"Here in the U.S. Senate we prefer to eat our friends.\", \"Is it easier that way?\", \"\u003cem\u003eYes.\u003c/em\u003e\", \"\u003cem\u003eMaybe\u003c/em\u003e!\"]\n\nThe input text is expected to consist of paragraphs delimited\nby line breaks.\n\n== Installation\n  gem install tactful_tokenizer\n\n== Author\n\nCopyright (c) 2010 Matthew Bunday. All rights reserved.\nReleased under the {GNU GPL v3}[http://www.gnu.org/licenses/gpl.html].\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzencephalon%2Ftactful_tokenizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzencephalon%2Ftactful_tokenizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzencephalon%2Ftactful_tokenizer/lists"}