{"id":22607873,"url":"https://github.com/patch/lingua-stem-any-pm5","last_synced_at":"2025-03-28T22:20:02.844Z","repository":{"id":7693533,"uuid":"9057595","full_name":"patch/lingua-stem-any-pm5","owner":"patch","description":"Lingua::Stem::Any (Perl 5): Unified interface to any stemmer on CPAN (Perl 5)","archived":false,"fork":false,"pushed_at":"2014-08-29T06:24:52.000Z","size":448,"stargazers_count":4,"open_issues_count":3,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-03T08:48:05.562Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://metacpan.org/pod/Lingua::Stem::Any","language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/patch.png","metadata":{"files":{"readme":"README.md","changelog":"Changes","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-03-27T15:52:57.000Z","updated_at":"2022-03-26T17:35:05.000Z","dependencies_parsed_at":"2022-07-09T19:31:44.787Z","dependency_job_id":null,"html_url":"https://github.com/patch/lingua-stem-any-pm5","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patch%2Flingua-stem-any-pm5","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patch%2Flingua-stem-any-pm5/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patch%2Flingua-stem-any-pm5/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patch%2Flingua-stem-any-pm5/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/patch","download_url":"https://codeload.github.com/patch/lingua-stem-any-pm5/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246106790,"owners_count":20724412,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-08T14:22:37.412Z","updated_at":"2025-03-28T22:20:02.826Z","avatar_url":"https://github.com/patch.png","language":"Perl","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build status](https://travis-ci.org/patch/lingua-stem-any-pm5.png)](https://travis-ci.org/patch/lingua-stem-any-pm5)\n[![Coverage status](https://coveralls.io/repos/patch/lingua-stem-any-pm5/badge.png)](https://coveralls.io/r/patch/lingua-stem-any-pm5)\n[![CPAN version](https://badge.fury.io/pl/Lingua-Stem-Any.png)](http://badge.fury.io/pl/Lingua-Stem-Any)\n\n# NAME\n\nLingua::Stem::Any - Unified interface to any stemmer on CPAN\n\n# VERSION\n\nThis document describes Lingua::Stem::Any v0.05.\n\n# SYNOPSIS\n\n```perl\nuse Lingua::Stem::Any;\n\n# create German stemmer using the default source module\n$stemmer = Lingua::Stem::Any-\u003enew(language =\u003e 'de');\n\n# create German stemmer explicitly using Lingua::Stem::Snowball\n$stemmer = Lingua::Stem::Any-\u003enew(\n    language =\u003e 'de',\n    source   =\u003e 'Lingua::Stem::Snowball',\n);\n\n# get stem for word\n$stem = $stemmer-\u003estem($word);\n\n# get list of stems for list of words\n@stems = $stemmer-\u003estem(@words);\n```\n\n# DESCRIPTION\n\nThis module aims to provide a simple unified interface to any stemmer on CPAN.\nIt will provide a default available source module when a language is requested\nbut no source is requested.\n\n## Attributes\n\n- language\n\n    The following language codes are currently supported.\n\n        ┌────────────┬────┐\n        │ Bulgarian  │ bg │\n        │ Czech      │ cs │\n        │ Danish     │ da │\n        │ Dutch      │ nl │\n        │ English    │ en │\n        │ Esperanto  │ eo │\n        │ Finnish    │ fi │\n        │ French     │ fr │\n        │ Galician   │ gl │\n        │ German     │ de │\n        │ Hungarian  │ hu │\n        │ Ido        │ io │\n        │ Italian    │ it │\n        │ Latin      │ la │\n        │ Norwegian  │ no │\n        │ Persian    │ fa │\n        │ Polish     │ pl │\n        │ Portuguese │ pt │\n        │ Romanian   │ ro │\n        │ Russian    │ ru │\n        │ Spanish    │ es │\n        │ Swedish    │ sv │\n        │ Turkish    │ tr │\n        └────────────┴────┘\n\n    They are in the two-letter ISO 639-1 format and are case-insensitive but are\n    always returned in lowercase when requested.\n\n    ```perl\n    # instantiate a stemmer object\n    $stemmer = Lingua::Stem::Any-\u003enew(language =\u003e $language);\n\n    # get current language\n    $language = $stemmer-\u003elanguage;\n\n    # change language\n    $stemmer-\u003elanguage($language);\n    ```\n\n    The default language is `en` (English). The values `nb` (Norwegian Bokmål)\n    and `nn` (Norwegian Nynorsk) are aliases for `no` (Norwegian). Country codes\n    such as `CZ` for the Czech Republic are not supported, as opposed to `cs` for\n    the Czech language, nor are full IETF language tags or Unicode locale\n    identifiers such as `pt-PT` or `pt_BR`.\n\n- source\n\n    The following source modules are currently supported.\n\n        ┌────────────────────────┬──────────────────────────────────────────────┐\n        │ Module                 │ Languages                                    │\n        ├────────────────────────┼──────────────────────────────────────────────┤\n        │ Lingua::Stem::Snowball │ da de en es fi fr hu it nl no pt ro ru sv tr │\n        │ Lingua::Stem::UniNE    │ bg cs de fa                                  │\n        │ Lingua::Stem           │ da de en fr gl it no pt ru sv                │\n        │ Lingua::Stem::Patch    │ eo io pl                                     │\n        └────────────────────────┴──────────────────────────────────────────────┘\n\n    A module name is used to specify the source. If no source is specified, the\n    first available source in the above list with support for the current language\n    is used.\n\n    ```perl\n    # get current source\n    $source = $stemmer-\u003esource;\n\n    # change source\n    $stemmer-\u003esource('Lingua::Stem::UniNE');\n    ```\n\n- cache\n\n    Boolean value specifying whether to cache the stem for each word. This will\n    increase performance when stemming the same word multiple times at the expense\n    of increased memory consumption. When enabled, the stems are cached for the life\n    of the object or until the [\"clear\\_cache\"](#clear_cache) method is called. The same cache is\n    not shared among different languages, sources, or different instances of the\n    stemmer object.\n\n- exceptions\n\n    Exceptions may be desired to bypass stemming for specific words and use\n    predefined stems. For example, the plural English word `mice` will not stem to\n    the singular word `mouse` unless it is specified in the exception dictionary.\n    Another example is that by default the word `pants` will stem to `pant` even\n    though stemming is normally not desired in this example. The exception\n    dictionary can be provided as a hashref where the keys are language codes and\n    the values are hashrefs of exceptions.\n\n    ```perl\n    # instantiate stemmer object with exceptions\n    $stemmer = Lingua::Stem::Any-\u003enew(\n        language   =\u003e 'en',\n        exceptions =\u003e {\n            en =\u003e {\n                mice  =\u003e 'mouse',\n                pants =\u003e 'pants',\n            }\n        }\n    );\n\n    # add/change exceptions\n    $stemmer-\u003eexceptions(\n        en =\u003e {\n            mice  =\u003e 'mouse',\n            pants =\u003e 'pants',\n        }\n    );\n\n    # alternately...\n    $stemmer-\u003eexceptions-\u003e{en} = {\n        mice  =\u003e 'mouse',\n        pants =\u003e 'pants',\n    };\n    ```\n\n- casefold\n\n    Boolean value specifying whether to apply Unicode casefolding to words before\n    stemming them. This is enabled by default and is performed before normalization\n    when also enabled.\n\n- normalize\n\n    Boolean value specifying whether to apply Unicode NFC normalization to words\n    before stemming them. This is enabled by default and is performed after\n    casefolding when also enabled.\n\n## Methods\n\n- stem\n\n    Accepts a list of strings, stems each string, and returns a list of stems. The\n    list returned will always have the same number of elements in the same order as\n    the list provided. When no stemming rules apply to a word, the original word is\n    returned.\n\n    ```perl\n    @stems = $stemmer-\u003estem(@words);\n\n    # get the stem for a single word\n    $stem = $stemmer-\u003estem($word);\n    ```\n\n    The words should be provided as character strings and the stems are returned as\n    character strings. Byte strings in arbitrary character encodings are not\n    supported.\n\n- stem\\_in\\_place\n\n    Accepts an array reference, stems each element, and replaces them with the\n    resulting stems.\n\n    ```perl\n    $stemmer-\u003estem_in_place(\\@words);\n    ```\n\n    This method is provided for potential optimization when a large array of words\n    is to be stemmed. The return value is not defined.\n\n- languages\n\n    Returns a list of supported two-letter language codes using lowercase letters.\n\n    ```perl\n    # all languages\n    @languages = $stemmer-\u003elanguages;\n\n    # languages supported by Lingua::Stem::Snowball\n    @languages = $stemmer-\u003elanguages('Lingua::Stem::Snowball');\n    ```\n\n- sources\n\n    Returns a list of supported source module names.\n\n    ```perl\n    # all sources\n    @sources = $stemmer-\u003esources;\n\n    # sources that support English\n    @sources = $stemmer-\u003esources('en');\n    ```\n\n- clear\\_cache\n\n    Clears the stem cache for all languages and sources of this object instance when\n    the [\"cache\"](#cache) attribute is enabled. Does not affect whether caching is enabled.\n\n# SEE ALSO\n\n[Lingua::Stem::Snowball](https://metacpan.org/pod/Lingua::Stem::Snowball), [Lingua::Stem::UniNE](https://metacpan.org/pod/Lingua::Stem::UniNE), [Lingua::Stem](https://metacpan.org/pod/Lingua::Stem), [Lingua::Stem::Patch](https://metacpan.org/pod/Lingua::Stem::Patch)\n\n# AUTHOR\n\nNick Patch \u003cpatch@cpan.org\u003e\n\nThis project is brought to you by [Shutterstock](http://www.shutterstock.com/).\nAdditional open source projects from Shutterstock can be found at\n[code.shutterstock.com](http://code.shutterstock.com/).\n\n# COPYRIGHT AND LICENSE\n\n© 2013–2014 Shutterstock, Inc.\n\nThis library is free software; you can redistribute it and/or modify it under\nthe same terms as Perl itself.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpatch%2Flingua-stem-any-pm5","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpatch%2Flingua-stem-any-pm5","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpatch%2Flingua-stem-any-pm5/lists"}