{"id":17044853,"url":"https://github.com/louismullie/scalpel","last_synced_at":"2025-04-12T15:22:01.977Z","repository":{"id":4291961,"uuid":"5422326","full_name":"louismullie/scalpel","owner":"louismullie","description":"A fast and accurate rule-based sentence segmentation tool for Ruby. ","archived":false,"fork":false,"pushed_at":"2015-12-22T04:30:05.000Z","size":7,"stargazers_count":51,"open_issues_count":2,"forks_count":5,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-26T09:51:15.840Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/louismullie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-08-15T05:14:20.000Z","updated_at":"2024-06-30T00:07:15.000Z","dependencies_parsed_at":"2022-07-21T12:18:06.479Z","dependency_job_id":null,"html_url":"https://github.com/louismullie/scalpel","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/louismullie%2Fscalpel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/louismullie%2Fscalpel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/louismullie%2Fscalpel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/louismullie%2Fscalpel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/louismullie","download_url":"https://codeload.github.com/louismullie/scalpel/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"
github","repositories_count":247947859,"owners_count":21023066,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-14T09:35:42.498Z","updated_at":"2025-04-12T15:22:01.944Z","avatar_url":"https://github.com/louismullie.png","language":"Ruby","funding_links":[],"categories":["NLP Pipeline Subtasks"],"sub_categories":["Segmentation"],"readme":"[![Build Status](https://secure.travis-ci.org/louismullie/scalpel.png)](http://travis-ci.org/#!/louismullie/scalpel)\n\n**About**\n\nScalpel is the result of my inability to find a simple and elegant solution to sentence segmentation in Ruby. Machine learning approaches - both unsupervised ([punkt-segmenter](https://github.com/lfcipriani/punkt-segmenter)) and supervised ([tactful_tokenizer](https://github.com/SlyShy/Tactful_Tokenizer)) - depend on proper domain-specific training to work well. Stanford's tokenize-first, group-later method ([stanford-core-nlp](https://github.com/louismullie/stanford-core-nlp)) does not cope well with ill-formatted content. Finally, extensive rule-based methods ([srx-english](https://github.com/apohllo/srx-english)) are very accurate but suffer from poor performance. \n\nScalpel is based on a very simple principle that reduces the complexity of performing sentence segmentation. The idea is that it is simpler and more efficient to find occurrences of periods that do __not__ indicate the end of a sentence, rather than those that do. These occurrences are temporarily replaced by \"placeholder\" characters, and sentence splitting is subsequently performed. 
The placeholder characters are then replaced by the original characters.\n\n**Usage**\n\n    gem install scalpel\n\n```ruby\nrequire 'scalpel'\nScalpel.cut(\"some text\")\n```\n\n**Contributing**\n\nFeel free to fork the project and send me a pull request!","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flouismullie%2Fscalpel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flouismullie%2Fscalpel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flouismullie%2Fscalpel/lists"}