{"id":18862123,"url":"https://github.com/yohasebe/engtagger","last_synced_at":"2025-04-13T13:14:14.747Z","repository":{"id":3502509,"uuid":"4559464","full_name":"yohasebe/engtagger","owner":"yohasebe","description":"English Part-of-Speech Tagger Library; a Ruby port of Lingua::EN::Tagger","archived":false,"fork":false,"pushed_at":"2025-01-16T11:11:48.000Z","size":1492,"stargazers_count":271,"open_issues_count":2,"forks_count":49,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-04-13T13:14:04.645Z","etag":null,"topics":["english","nlp","pos-tagging","ruby","rubynlp"],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yohasebe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2012-06-05T12:11:19.000Z","updated_at":"2025-04-06T01:47:27.000Z","dependencies_parsed_at":"2024-06-18T13:55:49.841Z","dependency_job_id":"39aeb9f5-f971-4eef-817f-2c8f095140fa","html_url":"https://github.com/yohasebe/engtagger","commit_stats":{"total_commits":38,"total_committers":11,"mean_commits":"3.4545454545454546","dds":0.5,"last_synced_commit":"77fef40a9c2a355286a1d428bc955be958b3a151"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yohasebe%2Fengtagger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yohasebe%2Fengtagger/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yohasebe%2Fengtagger/releases","manifests_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yohasebe%2Fengtagger/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yohasebe","download_url":"https://codeload.github.com/yohasebe/engtagger/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248717237,"owners_count":21150389,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["english","nlp","pos-tagging","ruby","rubynlp"],"created_at":"2024-11-08T04:33:21.707Z","updated_at":"2025-04-13T13:14:14.725Z","avatar_url":"https://github.com/yohasebe.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# EngTagger\n\nEnglish Part-of-Speech Tagger Library; a Ruby port of Lingua::EN::Tagger\n\n## Description\n\nA Ruby port of Perl Lingua::EN::Tagger, a probability based, corpus-trained\ntagger that assigns POS tags to English text based on a lookup dictionary and\na set of probability values. The tagger assigns appropriate tags based on\nconditional probabilities--it examines the preceding tag to determine the\nappropriate tag for the current word. 
Unknown words are classified according to\nword morphology or can be set to be treated as nouns or other parts of speech.\nThe tagger also extracts as many nouns and noun phrases as it can, using a set\nof regular expressions.\n\n## Features\n\n* Assigns POS tags to English text\n* Extracts noun phrases from tagged text\n* etc.\n\n## Synopsis\n\n```ruby\nrequire 'engtagger'\n\n# Create a parser object\ntgr = EngTagger.new\n\n# Sample text\ntext = \"Alice chased the big fat cat.\"\n\n# Add part-of-speech tags to text\ntagged = tgr.add_tags(text)\n\n#=\u003e \"\u003cnnp\u003eAlice\u003c/nnp\u003e \u003cvbd\u003echased\u003c/vbd\u003e \u003cdet\u003ethe\u003c/det\u003e \u003cjj\u003ebig\u003c/jj\u003e \u003cjj\u003efat\u003c/jj\u003e \u003cnn\u003ecat\u003c/nn\u003e \u003cpp\u003e.\u003c/pp\u003e\"\n\n# Get a list of all nouns and noun phrases with occurrence counts\nword_list = tgr.get_words(text)\n\n#=\u003e {\"Alice\"=\u003e1, \"cat\"=\u003e1, \"fat cat\"=\u003e1, \"big fat cat\"=\u003e1}\n\n# Get a readable version of the tagged text\nreadable = tgr.get_readable(text)\n\n#=\u003e \"Alice/NNP chased/VBD the/DET big/JJ fat/JJ cat/NN ./PP\"\n\n# Get all nouns from a tagged output\nnouns = tgr.get_nouns(tagged)\n\n#=\u003e {\"cat\"=\u003e1, \"Alice\"=\u003e1}\n\n# Get all proper nouns\nproper = tgr.get_proper_nouns(tagged)\n\n#=\u003e {\"Alice\"=\u003e1}\n\n# Get all past tense verbs\npt_verbs = tgr.get_past_tense_verbs(tagged)\n\n#=\u003e {\"chased\"=\u003e1}\n\n# Get all the adjectives\nadj = tgr.get_adjectives(tagged)\n\n#=\u003e {\"big\"=\u003e1, \"fat\"=\u003e1}\n\n# Get all noun phrases of any syntactic level\n# (same as word_list but takes a tagged input)\nnps = tgr.get_noun_phrases(tagged)\n\n#=\u003e {\"Alice\"=\u003e1, \"cat\"=\u003e1, \"fat cat\"=\u003e1, \"big fat cat\"=\u003e1}\n```\n\n## Tag Set\n\nThe set of POS tags used here is a modified version of the Penn Treebank tagset. 
Tags with non-letter characters have been redefined to work better in our data structures. Also, the \"Determiner\" tag (DET) has been changed from 'DT', in order to avoid confusion with the HTML tag, `\u003cDT\u003e`.\n\n    CC      Conjunction, coordinating               and, or\n    CD      Adjective, cardinal number              3, fifteen\n    DET     Determiner                              this, each, some\n    EX      Pronoun, existential there              there\n    FW      Foreign words\n    IN      Preposition / Conjunction               for, of, although, that\n    JJ      Adjective                               happy, bad\n    JJR     Adjective, comparative                  happier, worse\n    JJS     Adjective, superlative                  happiest, worst\n    LS      Symbol, list item                       A, A.\n    MD      Verb, modal                             can, could, 'll\n    NN      Noun                                    aircraft, data\n    NNP     Noun, proper                            London, Michael\n    NNPS    Noun, proper, plural                    Australians, Methodists\n    NNS     Noun, plural                            women, books\n    PDT     Determiner, prequalifier                quite, all, half\n    POS     Possessive                              's, '\n    PRP     Determiner, possessive second           mine, yours\n    PRPS    Determiner, possessive                  their, your\n    RB      Adverb                                  often, not, very, here\n    RBR     Adverb, comparative                     faster\n    RBS     Adverb, superlative                     fastest\n    RP      Adverb, particle                        up, off, out\n    SYM     Symbol                                  *\n    TO      Preposition                             to\n    UH      Interjection                            oh, yes, mmm\n    VB      Verb, infinitive                        take, live\n    VBD     Verb, past tense                    
    took, lived\n    VBG     Verb, gerund                            taking, living\n    VBN     Verb, past/passive participle           taken, lived\n    VBP     Verb, base present form                 take, live\n    VBZ     Verb, present 3SG -s form               takes, lives\n    WDT     Determiner, question                    which, whatever\n    WP      Pronoun, question                       who, whoever\n    WPS     Determiner, possessive \u0026 question       whose\n    WRB     Adverb, question                        when, how, however\n\n    PP      Punctuation, sentence ender             ., !, ?\n    PPC     Punctuation, comma                      ,\n    PPD     Punctuation, dollar sign                $\n    PPL     Punctuation, quotation mark left        ``\n    PPR     Punctuation, quotation mark right       ''\n    PPS     Punctuation, colon, semicolon, ellipsis :, ..., -\n    LRB     Punctuation, left bracket               (, {, [\n    RRB     Punctuation, right bracket              ), }, ]\n\n## Installation\n\n**Recommended Approach (without sudo):**\n\nIt is recommended to install the `engtagger` gem within your user environment without root privileges. This ensures proper file permissions and avoids potential issues. You can achieve this by using Ruby version managers like `rbenv` or `rvm` to manage your Ruby versions and gemsets.\n\nTo install without `sudo`, simply run:\n\n```bash\ngem install engtagger\n```\n\n**Alternative Approach (with sudo):**\n\nIf you must use `sudo` for installation, you'll need to adjust file permissions afterward to ensure accessibility.\n\n1. Install the gem with `sudo`:\n\n```bash\nsudo gem install engtagger\n```\n\n2. Grant necessary permissions to your user:\n\n```bash\nsudo chown -R $(whoami) /Library/Ruby/Gems/2.6.0/gems/engtagger-0.4.2\n```\n\n**Note:** The path above assumes you are using Ruby version 2.6.0.  If you are using a different version, you will need to modify the path accordingly.  
You can find your Ruby version by running `ruby -v`. \n\n## Troubleshooting\n\n**Permission Issues:**\n\nIf you encounter \"cannot load such file\" errors after installation, it might be due to incorrect file permissions. Ensure you've followed the instructions for adjusting permissions if you used `sudo` during installation.\n\n## Author\n\nYoichiro Hasebe (yohasebe [at] gmail.com)\n\n## Contributors\n\nMany thanks to the collaborators listed in the right column of this GitHub page.\n\n## Acknowledgement\n\nThis Ruby library is a direct port of Lingua::EN::Tagger available at CPAN.\nThe credit for the crucial part of its algorithm/design therefore goes to\nAaron Coburn, the author of the original Perl version.\n\n## License\n\nThis library is distributed under the GPL.  Please see the LICENSE file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyohasebe%2Fengtagger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyohasebe%2Fengtagger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyohasebe%2Fengtagger/lists"}