{"id":21815542,"url":"https://github.com/arbox/treetagger-ruby","last_synced_at":"2025-07-29T23:14:48.260Z","repository":{"id":1966670,"uuid":"2897561","full_name":"arbox/treetagger-ruby","owner":"arbox","description":"The Ruby based wrapper for the TreeTagger by Helmut Schmid.","archived":false,"fork":false,"pushed_at":"2015-11-12T11:33:39.000Z","size":184,"stargazers_count":16,"open_issues_count":0,"forks_count":3,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-07-02T07:02:30.623Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arbox.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.rdoc","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2011-12-02T09:50:08.000Z","updated_at":"2021-02-12T13:42:13.000Z","dependencies_parsed_at":"2022-09-09T10:41:18.443Z","dependency_job_id":null,"html_url":"https://github.com/arbox/treetagger-ruby","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/arbox/treetagger-ruby","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arbox%2Ftreetagger-ruby","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arbox%2Ftreetagger-ruby/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arbox%2Ftreetagger-ruby/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arbox%2Ftreetagger-ruby/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arbox","download_url":"https://codeload.github.com/arbox/treetagger-ruby/tar.gz/refs/heads/master","sbom_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/arbox%2Ftreetagger-ruby/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267779064,"owners_count":24143176,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-27T15:19:40.456Z","updated_at":"2025-07-29T23:14:48.199Z","avatar_url":"https://github.com/arbox.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TreeTagger for Ruby\n\n[RubyGems](http://rubygems.org/gems/treetagger-ruby) | [RTT Project Page](http://bu.chsta.be/projects/treetagger-ruby/) |\n[Source Code](https://github.com/arbox/treetagger-ruby) | [Bug Tracker](https://github.com/arbox/treetagger-ruby/issues)\n\n[\u003cimg src=\"https://badge.fury.io/rb/treetagger-ruby.png\" alt=\"Gem Version\" /\u003e](http://badge.fury.io/rb/treetagger-ruby)\n[\u003cimg src=\"https://travis-ci.org/arbox/treetagger-ruby.png\" alt=\"Build Status\" /\u003e](https://travis-ci.org/arbox/treetagger-ruby)\n[\u003cimg src=\"https://codeclimate.com/github/arbox/treetagger-ruby.png\" alt=\"Code Climate\" /\u003e](https://codeclimate.com/github/arbox/treetagger-ruby)\n\n## DESCRIPTION\nA Ruby based wrapper for the TreeTagger by Helmut Schmid.\n\nCheck it out if you are interested in Natural Language Processing (NLP) and/or Human 
Language Technology (HLT).\n\nThis library provides comprehensive bindings for the\n[TreeTagger](http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/),\na statistical, language-independent POS tagging and chunking tool.\n\nTreeTagger is language agnostic: it never guesses the language of your input, so you have to supply a matching language model.\n\nThe tagger is described in the following two papers:\n\n* Helmut Schmid (1995): [Improvements in Part-of-Speech Tagging with an Application to German.](http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/tree-tagger2.pdf) Proceedings of the ACL SIGDAT-Workshop. Dublin, Ireland.\n\n* Helmut Schmid (1994): [Probabilistic Part-of-Speech Tagging Using Decision Trees.](http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/tree-tagger1.pdf) Proceedings of International Conference on New Methods in Language Processing, Manchester, UK.\n\n## INSTALLATION\nBefore you install the \u003ctt\u003etreetagger-ruby\u003c/tt\u003e package, please ensure\nyou have downloaded and installed the\n[TreeTagger](http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/)\nitself.\n\nThe [TreeTagger](http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/)\nis copyrighted software by Helmut Schmid and\n[IMS](http://www.ims.uni-stuttgart.de/). Please read the license\nagreement before you download the TreeTagger package and language models.\n\nAfter installing the \u003ctt\u003eTreeTagger\u003c/tt\u003e, set the environment variable\n\u003ctt\u003eTREETAGGER_BINARY\u003c/tt\u003e to the location where the binary \u003ctt\u003etree-tagger\u003c/tt\u003e\nresides. 
Usually this binary is located under the \u003ctt\u003ebin\u003c/tt\u003e directory in the\nmain installation directory of the \u003ctt\u003eTreeTagger\u003c/tt\u003e.\n\nYou also have to set the variable \u003ctt\u003eTREETAGGER_MODEL\u003c/tt\u003e to the location of\nthe appropriate language model you have acquired in the training step.\n\nFor instance, you may add the following lines to your \u003ctt\u003e.profile\u003c/tt\u003e file:\n\n    export TREETAGGER_BINARY='/path/to/your/TreeTagger/bin/tree-tagger'\n    export TREETAGGER_MODEL='/path/to/your/TreeTagger/lib/german.par'\n\nIt is convenient to work with a default language model, but you can override\nit whenever you instantiate a new tagger.\n\n`treetagger-ruby` is provided as a `.gem` package. Simply install it via\n[RubyGems](http://rubygems.org/gems/treetagger-ruby).\nTo install \u003ctt\u003etreetagger-ruby\u003c/tt\u003e, issue the following command:\n\n    $ gem install treetagger-ruby\n\nIf you want to do a system-wide installation, do this as root\n(possibly using `sudo`).\n\nAlternatively, use your Gemfile for dependency management.\n\n\n## SYNOPSIS\n### Basic Usage\n\nBasic usage is very simple:\n\n    require 'treetagger'\n    # Instantiate a tagger instance with default values.\n    tagger = TreeTagger::Tagger.new\n    # Process an array of tokens.\n    tagger.process(%w{Ich gehe in die Schule})\n    # Flush the pipeline.\n    tagger.flush\n    # Get the processed data.\n    tagger.get_output\n\n### Input Format\n\nYou have to provide a tokenized sequence, optionally with additional\ninformation on the lexical classes of tokens and on their probabilities. Every token\nhas to be on a separate line. Due to technical limitations SGML tags\n(i.e. 
sequences with leading \u003c and trailing \u003e) cannot be valid tokens since\nthey are used internally for delimiting meaningful content from flush tokens.\nThis implies the use of the \u003ctt\u003e-sgml\u003c/tt\u003e option, which cannot be changed by the user.\nIt is a limitation of \u003cem\u003ethis\u003c/em\u003e library. If you do need to process tags,\nfall back to using the TreeTagger as a standalone program, possibly employing\ntemp files to store your input and output. This behaviour will also be\nimplemented in further versions of \u003ctt\u003etreetagger-ruby\u003c/tt\u003e.\n\nEvery token may occur alone on the line or be followed by additional\ninformation:\n* token;\n* token (\\\\tab tag)+;\n* token (\\\\tab tag \\\\space lemma)+;\n* token (\\\\tab tag \\\\space probability)+;\n* token (\\\\tab tag \\\\space probability \\\\space lemma)+.\n\nYour input may look like the following sentence:\n\n    Die     ART 0.99\n    neuen   ADJA neu\n    Hunde  NN NP\n    stehen  VVFIN 0.99 stehen\n    an\n    den\n    Mauern  NN Mauer\n    .\n\n\nThis wrapper accepts the input as `String` or `Array`.\n\nIf you want to use strings, you are responsible for the proper delimiters inside\nthe string: \u003ctt\u003e\"Die\\\\tART 0.99\\\\nneuen\\\\tADJA neu\\\\nHunde\\\\tNN NP\\\\nstehen\\\\t\nVVFIN 0.99 stehen\\\\nan\\\\nden\\\\nMauern\\\\tNN Mauer\\\\n.\\\\n\"\u003c/tt\u003e\nCurrently \u003ctt\u003etreetagger-ruby\u003c/tt\u003e does not check your markup for correctness and will\npossibly report a \u003ctt\u003eTreeTagger::ExternalError\u003c/tt\u003e if the TreeTagger process\ndies due to input errors.\n\nUsing arrays is more convenient since they can be built programmatically.\n\nArrays should have the following structure:\n* ['token', 'token', 'token'];\n* ['token', ['token', ['POS', 'lemma'], ['POS', 'lemma']], 'token'];\n* ['token', ['token', ['POS', prob], ['POS', 'prob']], 'token'];\n* ['token', ['token', ['POS', prob, 'lemma'], ['POS', 'prob', 'lemma']]].\n\nIt is internally converted into 
the sequence \u003ctt\u003etoken\\\\ntoken\\\\tPOS lemma\\\\t\nPOS lemma\\\\ntoken\\\\n\u003c/tt\u003e, i.e. in the enriched version alternatives are\ntab separated and entries are blank separated.\n\nNote that probabilities may be strings or integers.\n\nThe lexicon lookup is *not* implemented for now, that is, the latter three forms\nof input arrays are not supported yet.\n\n### Output Format\nFor now you'll get an array of strings. However, the precise string\nstructure depends on the command line arguments you've provided during the tagger\ninstantiation.\n\nFor instance, for the input \u003ctt\u003e[\"Veruntreute\", \"die\", \"AWO\", \"Spendengeld\", \"?\"]\n\u003c/tt\u003e you'll get the following output with the default command line arguments:\n\n\u003ctt\u003e[\"Veruntreute\tNN\tVeruntreute\", \"die\tART\td\", \"AWO\tNN\t\u003cunknown\u003e\",\n\"Spendengeld\tNN\tSpendengeld\", \"?\t$.\t?\"]\u003c/tt\u003e\n\nSee the documentation of the TreeTagger::Tagger class for details\non particular methods.\n\n## Exception Hierarchy\n\nWhile using TreeTagger you may encounter the following errors:\n* `TreeTagger::UserError`;\n* `TreeTagger::RuntimeError`;\n* `TreeTagger::ExternalError`.\n\nThese three kinds of errors all subclass \u003ctt\u003eTreeTagger::Error\u003c/tt\u003e, which\nin turn is a subclass of \u003ctt\u003eStandardError\u003c/tt\u003e. 
For an end user this means that\nit is possible to intercept all errors from `treetagger-ruby` with\na simple `rescue` clause.\n\n### Implemented Features\n\nPlease have a look at the [CHANGELOG](CHANGELOG.rdoc) file for details on implemented\nand planned features.\n\n\n## SUPPORT\nIf you have questions, bug reports or suggestions, please drop me an email :)\n\n## HOW TO CONTRIBUTE\nPlease contact me to suggest ideas, report bugs, or discuss features you would\nlike to implement in future releases of this library.\n\nPlease don't feel offended if I cannot accept all your pull requests: I have\nto review them and find the appropriate time and place in the code base to\nincorporate your valuable changes.\n\nAny help is deeply appreciated!\n\n## LICENSE\n\nRTT is copyrighted software by Andrei Beliankou, 2011-\n\nYou may use, redistribute and change it under the terms\nprovided in the [LICENSE](LICENSE.rdoc) file.\n\n\n# TODO:\n\n* How to use TreeTagger in the wild;\n* Input and output format, tokenization;\n* The current German parameter file has been estimated on single-byte encoded data.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farbox%2Ftreetagger-ruby","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farbox%2Ftreetagger-ruby","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farbox%2Ftreetagger-ruby/lists"}