{"id":20375935,"url":"https://github.com/stcarrez/ada-stemmer","last_synced_at":"2025-07-10T01:43:43.926Z","repository":{"id":88796696,"uuid":"262580198","full_name":"stcarrez/ada-stemmer","owner":"stcarrez","description":"Multi natural language stemmer with Snowball generator","archived":false,"fork":false,"pushed_at":"2024-09-29T07:11:00.000Z","size":35302,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-01-15T07:09:31.118Z","etag":null,"topics":["ada","stemmer"],"latest_commit_sha":null,"homepage":"","language":"Ada","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stcarrez.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-09T13:45:21.000Z","updated_at":"2024-09-29T07:11:04.000Z","dependencies_parsed_at":"2025-01-15T06:16:44.196Z","dependency_job_id":"13dada65-1f0a-4e5b-87cb-aa60a468de8b","html_url":"https://github.com/stcarrez/ada-stemmer","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stcarrez%2Fada-stemmer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stcarrez%2Fada-stemmer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stcarrez%2Fada-stemmer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stcarrez%2Fada-stemmer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stcarrez","download_u
rl":"https://codeload.github.com/stcarrez/ada-stemmer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241924434,"owners_count":20043216,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ada","stemmer"],"created_at":"2024-11-15T01:34:14.103Z","updated_at":"2025-03-04T21:28:56.035Z","avatar_url":"https://github.com/stcarrez.png","language":"Ada","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Ada Stemmer Library\n\n[![Build Status](https://img.shields.io/endpoint?url=https://porion.vacs.fr/porion/api/v1/projects/ada-stemmer/badges/build.json)](https://porion.vacs.fr/porion/projects/view/ada-stemmer/summary)\n[![Test Status](https://img.shields.io/endpoint?url=https://porion.vacs.fr/porion/api/v1/projects/ada-stemmer/badges/tests.json)](https://porion.vacs.fr/porion/projects/view/ada-stemmer/xunits)\n[![Coverage](https://img.shields.io/endpoint?url=https://porion.vacs.fr/porion/api/v1/projects/ada-stemmer/badges/coverage.json)](https://porion.vacs.fr/porion/projects/view/ada-stemmer/summary)\n[![License](https://img.shields.io/badge/license-APACHE2-blue.svg)](LICENSE)\n\nThe Ada Stemmer Library is a [stemming](https://en.wikipedia.org/wiki/Stemming) processor for several\nnatural languages.  It is based on the [Snowball compiler and stemming algorithms](https://snowballstem.org/)\nwhich has been adapted to generate Ada code ([Snowball Ada](https://github.com/stcarrez/snowball/tree/ada-support)).\nA stemming algorithm is used in natural language analysis to find\nthe base or root form of a word. 
 Such an algorithm is specific to each natural language.\nThe [Porter Stemmer](https://tartarus.org/martin/PorterStemmer/) algorithm is specific to the English language\nand will not work for French, Greek or Russian.\n\nThe Ada Stemmer Library integrates stemming algorithms for:\nArabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek,\nHindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian,\nPortuguese, Romanian, Russian, Serbian, Spanish, Swedish, Tamil, Turkish.\n\nExamples of stemming:\n\n|Language | Word          | Stem       |\n|---------|---------------|------------|\n|French   | chienne       | chien      |\n|French   | affectionnait | affection  |\n|English  | zealously     | zealous    |\n|English  | transitional  | transit    |\n|Greek    | ποσοτητα      | ποσοτητ    |\n|Greek    | μνημειωδεσ    | μνημειωδ   |\n|Russian  | ячменный      | ячмен      |\n|Russian  | адом          | ад         |\n\n\n## Version 1.2.1 - Under development\n\n* Update to build with Alire\n\n## Version 1.2.0 - May 2022\n\n* Update to use Snowball 2.2 (the Ada code generator has been integrated in Snowball 2.2!)\n* Improvement to help in running the tests\n\n## Version 1.1.0 - Oct 2020\n\n* Add support for Arabic, Basque, Catalan, Finnish, Hindi, Hungarian, Indonesian,\n  Irish, Lithuanian, Nepali, Norwegian, Porter, Portuguese, Romanian,\n  Tamil, Turkish\n\n## Version 1.0.0 - May 2020\n\n* First implementation of the Ada Stemmer Library\n\n# Build\n\nBuild with the following command:\n\n```sh\nmake\n```\n\n## Unit test\n\nTo build the unit tests, you will need the [Ada Utility Library](https://github.com/stcarrez/ada-util).\nThe `make test` target will clone the git repository locally and configure the GNAT project\naccordingly to use and build the unit tests.\n\n```sh\nmake build test HAVE_ADA_UTIL=yes ADA_PROJECT_PATH=./ada-util/.alire:./ada-util:./ada-util/.alire/unit\n```\n\nThe unit tests are executed 
with:\n\n```sh\nmake test\n```\n\nThe unit tests contain several reference files in `regtests/files` that come from the\n[Lucene](https://lucene.apache.org) search engine unit tests.\n\n# Examples\n\nThe samples can be built using:\n\n```sh\ngnatmake -Psamples\n```\n\nYou will get two programs:\n\n* `bin/stemargs` prints the stem of each word given as a program argument,\n* `bin/stemwords` reads a file, stems its words and prints the result.\n\nThe first argument is the language.  For example:\n\n```sh\nbin/stemargs french chienne\n```\n\nor:\n\n```sh\nbin/stemwords english LICENSE.txt\n```\n\n\n# Simple example\n\nThe Ada Stemmer library does not split words.  You must pass it one word at a time,\nand it returns either the word itself or its stem.  The `Stemmer.Factory` package is\nthe multi-language entry point.  The stemmer algorithm is created for each call.\n\n```ada\nwith Ada.Text_IO;\nwith Stemmer.Factory; use Stemmer.Factory;\n\n  --  Inside a subprogram body: stem a single French word and print it.\n  Ada.Text_IO.Put_Line (Stem (L_FRENCH, \"chienne\"));\n```\n\nIt is also possible to instantiate a specific stemmer algorithm and then reuse it to stem\nseveral words.\n\n```ada\nwith Ada.Text_IO;\nwith Stemmer.English;\n\n  --  Declarations: one English stemmer context and a success flag.\n  Ctx : Stemmer.English.Context_Type;\n  Result : Boolean;\n\n  --  Statements: stem the word and print the stem when Stem_Word reports success.\n  Ctx.Stem_Word (\"zealously\", Result);\n  if Result then\n     Ada.Text_IO.Put_Line (Ctx.Get_Result);\n  end if;\n```\n\n# References\n\n* [The Porter Stemming Algorithm](https://tartarus.org/martin/PorterStemmer/)\n* [Snowball Manual](https://snowballstem.org/compiler/snowman.html)\n* [Lucene text analysis](https://lucene.apache.org/core/8_5_1/core/org/apache/lucene/analysis/package-summary.html#package.description)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstcarrez%2Fada-stemmer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstcarrez%2Fada-stemmer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstcarrez%2Fada-stemmer/lists"}