{"id":21241482,"url":"https://github.com/pommedeterresautee/unine","last_synced_at":"2025-08-04T05:38:47.868Z","repository":{"id":56936991,"uuid":"174957564","full_name":"pommedeterresautee/unine","owner":"pommedeterresautee","description":"Unine light stemmer for French, German, Italian, Spanish, Portuguese, Finnish, Swedish","archived":false,"fork":false,"pushed_at":"2019-04-14T07:17:37.000Z","size":288,"stargazers_count":4,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-11T01:02:46.958Z","etag":null,"topics":["cran","finish","french","german","information-retrieval","ir","italian","nlp","portuguese","rstats","spanish","stemmer","swedish"],"latest_commit_sha":null,"homepage":"https://pommedeterresautee.github.io/unine/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pommedeterresautee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-11T08:32:52.000Z","updated_at":"2022-11-24T05:42:33.000Z","dependencies_parsed_at":"2022-08-21T06:21:01.044Z","dependency_job_id":null,"html_url":"https://github.com/pommedeterresautee/unine","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/pommedeterresautee/unine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pommedeterresautee%2Funine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pommedeterresautee%2Funine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pommedeterresautee%2Funine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/pommedeterresautee%2Funine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pommedeterresautee","download_url":"https://codeload.github.com/pommedeterresautee/unine/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pommedeterresautee%2Funine/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268655128,"owners_count":24285128,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-04T02:00:09.867Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cran","finish","french","german","information-retrieval","ir","italian","nlp","portuguese","rstats","spanish","stemmer","swedish"],"created_at":"2024-11-21T00:55:52.146Z","updated_at":"2025-08-04T05:38:47.841Z","avatar_url":"https://github.com/pommedeterresautee.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"![UNINE](https://github.com/pommedeterresautee/unine/raw/master/tools/logo_unine.png) \n=========\n\n[![Travis build status](https://travis-ci.org/pommedeterresautee/unine.svg?branch=master)](https://travis-ci.org/pommedeterresautee/unine)\n[![Build status](https://ci.appveyor.com/api/projects/status/gole8beawqyw3tvy?svg=true)](https://ci.appveyor.com/project/pommedeterresautee/unine)\n[![Coverage 
status](https://codecov.io/gh/pommedeterresautee/unine/branch/master/graph/badge.svg)](https://codecov.io/github/pommedeterresautee/unine?branch=master)\n[![CRAN status](https://www.r-pkg.org/badges/version/unine)](https://cran.r-project.org/package=unine)\n[![CRAN_Download](http://cranlogs.r-pkg.org/badges/unine)](http://cran.rstudio.com/web/packages/unine/index.html) \n[![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Follow](https://img.shields.io/twitter/follow/pommedeterre33.svg?style=social)](https://twitter.com/intent/follow?screen_name=pommedeterre33)\n\nImplementation of \"light\" stemmers for **French, German, Italian, Spanish, Portuguese, Finnish, Swedish**.  \nThey are based on the same work as the \"light\" stemmers found in [SolR](https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchLightStemmer.java) or [ElasticSearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html).  \nA \"light\" stemmer removes inflections only from nouns and adjectives.  \nFor these languages, indexing verbs is less important than indexing nouns and adjectives. \n\nThe procedures used in these stemmers are described below:  \n\n* the stemming procedure for French is described in (Savoy, 1999).  \n* in Italian, the main inflectional rule is to modify the final character (e.g., «-o», «-a» or «-e») into another (e.g., «-i», «-e»). As a second rule, Italian morphology may also alter the final two letters (e.g., «-io» into «-o», «-co» into «-chi», «-ga» into «-ghe»).  
\n* in German, a few rules may be applied to obtain the plural form of words (e.g., \"Frau\" into \"Frauen\" (woman), \"Bild\" into \"Bilder\" (picture), \"Sohn\" into \"Söhne\" (son), \"Apfel\" into \"Äpfel\" (apple)), but the suggested algorithms do not account for person and tense variations, or for the morphological variations used by verbs.  \n\nOnline tests are available [on this website](http://yomguithereal.github.io/talisman/stemmers/french).\n\n### Installation\n\nYou can install the released version of unine from [CRAN](https://CRAN.R-project.org) with:\n\n``` r\ninstall.packages(\"unine\")\n```\n\n... or the latest version from [GitHub](https://github.com/pommedeterresautee/unine):\n\n``` r\ndevtools::install_github(\"pommedeterresautee/unine\")\n```\n\n\n### Example\n\nBelow are some examples for French and a comparison with the Porter French stemmer.\n\n``` r\nfrench_stemmer(words = c(\"complète\", \"caissière\"))\n# [1] \"complet\"  \"caisier\"\n# Note that double letters are deduplicated: caissière -\u003e caisier\n\nfrench_stemmer(words = c(\"tester\", \"testament\", \"chevaux\", \"aromatique\", \"personnel\", \"folle\"))\n# [1] \"test\"      \"testament\" \"cheval\"    \"aromat\"    \"personel\" \"fou\" \n# Note that double letters are deduplicated: personnel -\u003e personel\n\n# Look at how \"testament\" and \"tester\" have been stemmed above.\n# Now with the Porter stemmer:\nSnowballC::wordStem(c(\"testament\", \"tester\"), language = \"french\")\n# [1] \"test\" \"test\"\n\n```\n\n### References\n\nPlease cite [1] if using this R package.\n\n[1] J. 
Savoy, [*A stemming procedure and stopword list for general French corpora*](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.7093\u0026rep=rep1\u0026type=pdf)\n\n```\n@article{savoy1999stemming,\n  title={A stemming procedure and stopword list for general French corpora},\n  author={Savoy, Jacques},\n  journal={Journal of the American Society for Information Science},\n  volume={50},\n  number={10},\n  pages={944--952},\n  year={1999}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpommedeterresautee%2Funine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpommedeterresautee%2Funine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpommedeterresautee%2Funine/lists"}