{"id":13806224,"url":"https://github.com/louismullie/open-nlp","last_synced_at":"2025-05-08T21:23:48.585Z","repository":{"id":6010704,"uuid":"7233981","full_name":"louismullie/open-nlp","owner":"louismullie","description":"Ruby bindings to the OpenNLP Java toolkit.","archived":false,"fork":false,"pushed_at":"2014-10-01T02:33:34.000Z","size":1180,"stargazers_count":91,"open_issues_count":2,"forks_count":11,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-04-11T00:04:04.037Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/louismullie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-12-19T02:44:58.000Z","updated_at":"2023-06-21T06:03:41.000Z","dependencies_parsed_at":"2022-09-12T09:51:48.590Z","dependency_job_id":null,"html_url":"https://github.com/louismullie/open-nlp","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/louismullie%2Fopen-nlp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/louismullie%2Fopen-nlp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/louismullie%2Fopen-nlp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/louismullie%2Fopen-nlp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/louismullie","download_url":"https://codeload.github.com/louismullie/open-nlp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repos
itories_count":238044094,"owners_count":19407128,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-04T01:01:09.081Z","updated_at":"2025-02-10T02:09:23.634Z","avatar_url":"https://github.com/louismullie.png","language":"Ruby","funding_links":[],"categories":["NLP Pipeline Subtasks","Ruby"],"sub_categories":["Multipurpose Engines"],"readme":"[![Build Status](https://secure.travis-ci.org/louismullie/open-nlp.png)](http://travis-ci.org/louismullie/open-nlp)\n\n###About\n\nThis library provides high-level Ruby bindings to the Open NLP package, a Java machine learning toolkit for natural language processing (NLP). This gem is compatible with Ruby 1.9.2 and 1.9.3 as well as JRuby 1.7.1. It is tested on both Java 6 and Java 7.\n\n###Installing\n\nFirst, install the gem: `gem install open-nlp`. Then, download [the JARs and English language models](http://louismullie.com/treat/open-nlp-english.zip) in one package (80 MB).\n\nPlace the contents of the extracted archive inside the /bin/ folder of the `open-nlp` gem (e.g. 
[...]/gems/open-nlp-0.x.x/bin/).\n\nAlternatively, from a terminal window, `cd` to the gem's folder and run:\n\n```\nwget http://www.louismullie.com/treat/open-nlp-english.zip\nunzip -o open-nlp-english.zip -d bin/\n```\n\nAfterwards, you may individually download the appropriate models for other languages from the [open-nlp website](http://opennlp.sourceforge.net/models-1.5/).\n\n###Configuring\n\nAfter installing and requiring the gem (`require 'open-nlp'`), you may want to set some of the following configuration options.\n\n```ruby\n# Set an alternative path to look for the JAR files.\n# Default is gem's bin folder.\nOpenNLP.jar_path = '/path_to_jars/'\n\n# Set an alternative path to look for the model files.\n# Default is gem's bin folder.\nOpenNLP.model_path = '/path_to_models/'\n\n# Pass some alternative arguments to the Java VM.\n# Default is ['-Xms512M', '-Xmx1024M'].\nOpenNLP.jvm_args = ['-option1', '-option2']\n\n# Redirect VM output to log.txt\nOpenNLP.log_file = 'log.txt'\n\n# Set default models for a language.\nOpenNLP.use :language\n```\n\n###Examples\n\n\n**Simple tokenizer**\n\n```ruby\nOpenNLP.load\n\nsent = \"The death of the poet was kept from his poems.\"\ntokenizer = OpenNLP::SimpleTokenizer.new\n\ntokens = tokenizer.tokenize(sent).to_a\n# =\u003e %w[The death of the poet was kept from his poems .]\n```\n\n**Maximum entropy tokenizer, chunker and POS tagger**\n\n```ruby\n\nOpenNLP.load\n\nchunker   = OpenNLP::ChunkerME.new\ntokenizer = OpenNLP::TokenizerME.new\ntagger    = OpenNLP::POSTaggerME.new\n\nsent   = \"The death of the poet was kept from his poems.\"\n\ntokens = tokenizer.tokenize(sent).to_a\n# =\u003e %w[The death of the poet was kept from his poems .]\n\ntags   = tagger.tag(tokens).to_a\n# =\u003e %w[DT NN IN DT NN VBD VBN IN PRP$ NNS .]\n\nchunks = chunker.chunk(tokens, tags).to_a\n# =\u003e %w[B-NP I-NP B-PP B-NP I-NP B-VP I-VP B-PP B-NP I-NP O]\n```\n\n**Abstract Bottom-Up Parser**\n\n```ruby\nOpenNLP.load\n\nsent      = \"The 
death of the poet was kept from his poems.\"\nparser = OpenNLP::Parser.new\nparse = parser.parse(sent)\n\nparse.get_text.should eql sent\n\nparse.get_span.get_start.should eql 0\nparse.get_span.get_end.should eql 46\nparse.get_child_count.should eql 1\n\nchild = parse.get_children[0]\n\nchild.text # =\u003e \"The death of the poet was kept from his poems.\"\nchild.get_child_count # =\u003e 3\nchild.get_head_index # =\u003e 5\nchild.get_type # =\u003e \"S\"\n```\n\n**Maximum Entropy Name Finder**\n\n```ruby\nOpenNLP.load\n\ntext = File.read('./spec/sample.txt').gsub(\"\\n\", \"\")\n\ntokenizer   = OpenNLP::TokenizerME.new\nsegmenter   = OpenNLP::SentenceDetectorME.new\nner_models  = ['person', 'time', 'money']\n\nner_finders = ner_models.map do |model|\n  OpenNLP::NameFinderME.new(\"en-ner-#{model}.bin\")\nend\n\nsentences = segmenter.sent_detect(text)\nnamed_entities = []\n\nsentences.each do |sentence|\n\n  tokens = tokenizer.tokenize(sentence)\n  \n  ner_models.each_with_index do |model,i|\n    finder = ner_finders[i]\n    name_spans = finder.find(tokens)\n    name_probs = finder.probs()\n    name_spans.each_with_index do |name_span,j|\n      start = name_span.get_start\n      stop  = name_span.get_end-1\n      slice = tokens[start..stop].to_a\n      prob  = name_probs[j]\n      named_entities \u003c\u003c [slice, model, prob]\n    end\n  end\n\nend\n```\n\n**Loading specific models**\n\nJust pass the name of the model file to the constructor. The gem will search for the file in the `OpenNLP.model_path` folder.\n\n```ruby\nOpenNLP.load\n\ntokenizer = OpenNLP::TokenizerME.new('en-token.bin')\ntagger = OpenNLP::POSTaggerME.new('en-pos-perceptron.bin')\nname_finder = OpenNLP::NameFinderME.new('en-ner-person.bin')\n# etc.\n```\n\n**Loading specific classes**\n\nYou may want to load specific classes from the OpenNLP library that are not loaded by default. 
The gem provides an API to do this:\n\n```ruby\n# Default base class is opennlp.tools.\nOpenNLP.load_class('SomeClassName')  \n# =\u003e OpenNLP::SomeClassName\n\n# Here, we specify another base class.\nOpenNLP.load_class('SomeOtherClass', 'opennlp.tools.namefind')\n# =\u003e OpenNLP::SomeOtherClass\n```\n\n**Contributing**\n\nFork the project and send me a pull request! Config updates for other languages are welcome.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flouismullie%2Fopen-nlp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flouismullie%2Fopen-nlp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flouismullie%2Fopen-nlp/lists"}