{"id":23029692,"url":"https://github.com/antononcube/raku-lingua-numericwordforms","last_synced_at":"2025-08-14T12:34:46.985Z","repository":{"id":71383720,"uuid":"356714024","full_name":"antononcube/Raku-Lingua-NumericWordForms","owner":"antononcube","description":"Raku functions that generate, parse, and interpret numeric word forms in different languages.","archived":false,"fork":false,"pushed_at":"2024-06-06T13:04:38.000Z","size":421,"stargazers_count":2,"open_issues_count":1,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-06-07T03:50:35.892Z","etag":null,"topics":["numeric-words-conversion"],"latest_commit_sha":null,"homepage":"","language":"Raku","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"artistic-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/antononcube.png","metadata":{"files":{"readme":"README-work.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-04-10T22:46:46.000Z","updated_at":"2024-06-06T13:04:41.000Z","dependencies_parsed_at":"2024-06-05T03:54:59.394Z","dependency_job_id":null,"html_url":"https://github.com/antononcube/Raku-Lingua-NumericWordForms","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-Lingua-NumericWordForms","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-Lingua-NumericWordForms/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-Lingua-NumericWordForms/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/antononcube%2FRaku-Lingua-NumericWordForms/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/antononcube","download_url":"https://codeload.github.com/antononcube/Raku-Lingua-NumericWordForms/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229827651,"owners_count":18130395,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["numeric-words-conversion"],"created_at":"2024-12-15T14:16:45.946Z","updated_at":"2025-08-14T12:34:46.966Z","avatar_url":"https://github.com/antononcube.png","language":"Raku","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Raku 
Lingua::NumericWordForms\n\n[![MacOS](https://github.com/antononcube/Raku-Lingua-NumericWordForms/actions/workflows/macos.yml/badge.svg)](https://github.com/antononcube/Raku-Lingua-NumericWordForms/actions/workflows/macos.yml)\n[![Linux](https://github.com/antononcube/Raku-Lingua-NumericWordForms/actions/workflows/linux.yml/badge.svg)](https://github.com/antononcube/Raku-Lingua-NumericWordForms/actions/workflows/linux.yml)\n[![Win64](https://github.com/antononcube/Raku-Lingua-NumericWordForms/actions/workflows/windows.yml/badge.svg)](https://github.com/antononcube/Raku-Lingua-NumericWordForms/actions/workflows/windows.yml)\n[![https://raku.land/zef:antononcube/Lingua::NumericWordForms](https://raku.land/zef:antononcube/Lingua::NumericWordForms/badges/version)](https://raku.land/zef:antononcube/Lingua::NumericWordForms)\n[![https://raku.land/zef:antononcube/Lingua::NumericWordForms](https://raku.land/zef:antononcube/Lingua::NumericWordForms/badges/downloads)](https://raku.land/zef:antononcube/Lingua::NumericWordForms)\n[![License: Artistic-2.0](https://img.shields.io/badge/License-Artistic%202.0-0298c3.svg)](https://opensource.org/licenses/Artistic-2.0)\n\n🇦🇲 🇦🇿 :bulgaria: 🇨🇿 🇬🇧 🇫🇷 🇩🇪 :greece: :iran: 🇯🇵 🇰🇿 :poland: 🇵🇹 🇷🇴 🇷🇺 🇪🇸 :ukraine:\n\n## Introduction\n\nThis repository provides a Raku package with functions for the \ngeneration, parsing, and interpretation of numeric word forms in different languages.\n\nThe initial versions of the code in this repository can be found in the GitHub repository \\[AAr1\\].\n\nThe Raku package \n[`Lingua::Number`](https://github.com/labster/p6-Lingua-Number), [BL1],\nprovides word forms (cardinal, ordinal, etc.) generation in many languages. \n(But at least for one language the produced forms are incorrect.)\n\nThe Raku package \n[`Lingua::EN::Numbers`](https://github.com/thundergnat/Lingua-EN-Numbers), [SS1],\nalso provides word forms (cardinal, ordinal, etc.) generation in English. 
\n\nThe parsers and interpreters of this package can be seen as complementary\nto the functions in [BL1, SS1].\n\n**Remark:** Maybe a more complete version of this package should be merged with \n[`Lingua::Number`](https://github.com/labster/p6-Lingua-Number), [BL1].\n\n**Remark:** I can judge the quality of the results only for the languages\nBulgarian, English, and Russian. The numeric word form interpreters for the rest of the languages\npass their tests, but they might still have many deficiencies \n(which people fluent in those languages would easily detect).\n\n**Remark:** The package also \"understands\" (i.e. parses and translates to)\n[Koremutake](https://shorl.com/koremutake.php).\n\n------\n\n## Installation\n\nPackage installations from both sources use the [zef installer](https://github.com/ugexe/zef)\n(which should be bundled with the \"standard\" [Rakudo](https://rakudo.org) installation file).\n\nTo install the package via Zef's ecosystem use the shell command:\n\n```\nzef install Lingua::NumericWordForms\n```\n\nTo install the package from the GitHub repository use the shell command:\n\n```\nzef install https://github.com/antononcube/Raku-Lingua-NumericWordForms.git\n```\n\n------\n\n## Examples\n\n### Generation \n\nThe generation of numeric word forms is a *secondary* goal of this package.\nCurrently, generation is implemented only for Bulgarian, English, Japanese, [Koremutake](https://shorl.com/koremutake.php), and Russian \nnumeric word forms. 
\nHere are examples:\n\n```perl6\nuse Lingua::NumericWordForms;\nsay to-numeric-word-form(8093);\nsay to-numeric-word-form(8093, 'Bulgarian');\nsay to-numeric-word-form(8093, 'Koremutake');\nsay to-numeric-word-form(8093, 'Russian');\nsay to-numeric-word-form(8093, 'Japanese');\n```\n\nThe first argument of `to-numeric-word-form` can be:\n\n- An integer\n- A string that can be parsed into an integer\n- A string of numbers separated by \";\"\n- A list of numbers or strings\n\nHere are examples of the latter two:\n\n```perl6\nto-numeric-word-form('123; 232; 898_934').join('; ');\n```\n\n```perl6\nto-numeric-word-form([321, '992', 100_904]).join('; ');\n```\n\n### Interpretation\n\nInterpretation of numeric word forms is the *primary* goal of this package.\nMultiple languages are supported. Here are examples:\n\n```perl6\nuse Lingua::NumericWordForms;\nsay from-numeric-word-form('one thousand and twenty three');\nsay from-numeric-word-form('хиляда двадесет и три', 'Bulgarian');\nsay from-numeric-word-form('tysiąc dwadzieścia trzy', 'Polish');\nsay from-numeric-word-form('одна тысяча двадцать три', lang =\u003e 'Russian');\nsay from-numeric-word-form('mil veintitrés', lang =\u003e 'Spanish');\n```\n\nThe function `from-numeric-word-form` can take as a first argument:\n\n- A string that is a numeric word form\n  \n- A string comprised of numeric word forms separated by \";\"\n  \n- A list or an array of strings \n\nThe language can be specified as a second positional argument or with the named argument \"lang\".\nIn addition to the names of the supported languages, the value of the language argument can also be `Whatever` or \"Automatic\".\n\nHere are corresponding examples:\n\n```perl6\nfrom-numeric-word-form('twenty six');\n```\n\n```perl6\nfrom-numeric-word-form(['mil veintitrés', 'dos mil setenta y dos']);\n```\n\n```perl6\nfrom-numeric-word-form('two hundred and five; триста четиридесет и две; 二十万六十五'):p;\n```\n\nFor more examples see the file 
\n[NumericWordForms-examples.raku](examples/NumericWordForms-parsing-examples.raku).\n\nHere we retrieve a list of all supported languages:\n\n```perl6\nfrom-numeric-word-form('languages').sort\n```\n\n**Remark:** In the returned list some languages appear twice, with both their English and native names.\n\n#### Type of the result\n\nThe returned result can be an `Int` object or a `Str` object -- that is controlled with\nthe adverb `number` (which by default is `True`). Here is an example:\n\n```perl6\nmy $res = from-numeric-word-form('one thousand and twenty three'); \nsay $res, ' ', $res.WHAT;\n```\n\n```perl6\n$res = from-numeric-word-form('one thousand and twenty three', :!number); \nsay $res, ' ', $res.WHAT;\n```\n\n#### Automatic language detection\n\nAutomatic language detection is invoked if the second argument is `Whatever` or \"Automatic\":\n\n```perl6\nsay from-numeric-word-form('tysiąc dwadzieścia trzy', Whatever):p;\nsay from-numeric-word-form('триста двадесет и три', lang =\u003e 'Automatic'):p;\n```\n\n```perl6\nsay from-numeric-word-form(['tysiąc dwadzieścia trzy', 'twenty three']):p;\n```\n\nThe adverb `:pairs` (`:p`) specifies whether the result should be a `Pair` object or a `List` of `Pair` objects\nwith the detected languages as keys.\n\n### Translation\n\nTranslation from one language to another:\n\n```perl6\ntranslate-numeric-word-form('хиляда двадесет и три', 'Bulgarian' =\u003e 'English');\n```\n\n```perl6\ntranslate-numeric-word-form('two hundred thousand and five', 'English' =\u003e 'Bulgarian');\n```\n\n**Remark:** Currently that function translates to Bulgarian, English, \n[Koremutake](https://shorl.com/koremutake.php), and Russian\nonly (from any of the package languages).\n\nHere is a Spanish to Koremutake example:\n\n```perl6\nmy $numForm = \"tres mil ochocientos noventa\";\nmy $trRes = translate-numeric-word-form($numForm, 'Automatic' =\u003e 'Koremutake');\nsay \"Given           : $numForm\";\nsay \"To Koremutake   : $trRes\";\nsay 
\"From Koremutake : {from-numeric-word-form($trRes)}\";\n```\n\nThe named arguments \"from\" and \"to\" can be also used:\n\n```perl6\ntranslate-numeric-word-form($numForm, from =\u003e Whatever, to =\u003e 'English');\n```\n\n------\n\n## Roles\n\nThis package provides (exports) roles that can be used in grammars or roles in other packages, applications, etc.\n\nFor example, see the roles:\n\n```\nLingua::NumericWordForms::Roles::Bulgarian::WordedNumberSpec\nLingua::NumericWordForms::Roles::English::WordedNumberSpec\n```\n\nA grammar or role that does the roles above should use the rule:\n\n```\n\u003cnumeric-word-form\u003e\n```\n\nFor code examples see the file \n[Parsing-examples.raku](./examples/Parsing-examples.raku).\n\n**Remark:** The role `Lingua::NumericWordForms::Roles::WordedNumberSpec` and the corresponding\nactions class `Lingua::NumericWordForms::Actions::WordedNumberSpec` are \"abstract\".\nThey were introduced in order to have simpler roles and actions code \n(and non-duplicated implementations.) Hence, that role and class *should not* be used in\ngrammars and roles outside of this package.\n\n------\n\n## CLI\n\nThe package provides two Command Line Interface (CLI) functions:\n`from-numeric-word-form` and `to-numeric-word-form`.\n\nCorresponding usage messages and examples are given below.\n\n### `from-numeric-word-form`\n\n#### Usage message\n\n```shell\nfrom-numeric-word-form --help\n```\n\n#### Example\n\n```shell\nfrom-numeric-word-form two hundred and five\n```\n\n### `to-numeric-word-form`\n\n#### Usage message\n\n```shell\nto-numeric-word-form --help\n```\n\n#### Example\n\n```shell\nto-numeric-word-form 33 124 99832 --lang Bulgarian\n```\n\n------\n\n## TODO\n\nThe following TODO items are ordered by priority, the most important are on top. \n \n1. [ ] TODO Expand parsing beyond trillions\n\n2. [X] DONE Automatic determination of the language\n\n3. 
[X] DONE Word form generation:\n   - [X] DONE Bulgarian\n   - [X] DONE English\n   - [X] DONE Japanese\n   - [X] DONE Koremutake\n   - [X] DONE Russian\n   - [X] CANCELED General algorithm\n       - Canceled because it is a hard problem and Large Language Models (LLMs) can do it.\n\n4. [ ] TODO Documentation of the general programming approach.\n\n   - [ ] TODO What are the main challenges?\n   - [ ] TODO How the chosen software architecture decisions address them?\n   - [ ] TODO Concrete implementations walk-through.\n   - [ ] TODO How to implement / include a new language?\n   - [ ] TODO How the random numbers test files were made?\n   - [ ] TODO Profiling, limitations, alternatives.\n   - [ ] TODO Comparison with LLM-based conversions.\n   \n5. [ ] TODO Full, consistent Persian numbers parsing. \n   - Currently, Persian number parsing works only for numbers less than 101.  \n   \n6. [X] DONE General strategy for parsing and interpretation of \n   numeric word forms of East Asia languages  \n   - Those languages use groupings based on 10^4 instead of 10^3. \n   - [X] DONE Implementation for Japanese.\n   \n7. [ ] TODO Implement parsing of ordinal numeric word forms \n\n   - [X] DONE English, French, Greek, and Spanish\n   \n   - [X] DONE Bulgarian\n    \n   - [X] DONE Czech, Russian, Ukrainian, Polish\n   \n   - [X] DONE Japanese\n\n   - [X] DONE Koremutake\n   \n   - [X] DONE Portuguese\n   \n   - [X] DONE Azerbaijani\n   \n   - [X] DONE Kazakh\n        - Very similar to Azerbaijani.\n          - The Kazakh action class should inherit the Azerbaijani one.\n          \n   - [X] DONE German\n        - As expected, required some refactoring to handle the agglutinative word forms. \n     \n   - [X] DONE Romanian\n   \n   - [X] DONE Armenian\n   \n   - [ ] TODO Korean\n   \n     - Implemented to a point.\n     \n   - [ ] TODO Persian\n    \n     - Implemented to a point.\n   \n   - [ ] TODO Sanskrit\n   \n       \n8. 
[ ] TODO Implement parsing of year \"shortcut\" word forms, like \"twenty o three\" \n\n9. [ ] TODO Implement parsing of numeric word forms for rationals, like \"five twelfths\" \n\n10. [X] DONE Translation function (from one language to another)\n\n------\n\n## Collaboration notes\n\n- The **main rule** is that the main branch should always be installable and pass all of its tests.\n\n- From the main rule it follows that new features are developed in separate branches or forks.\n    \n- The easiest way to collaborate is to create and commit new test files or corrections \n  to existing test files.\n  \n  - Then I would change the corresponding grammar rules and actions \n    so that the package passes the tests.\n    \n- Please use [*Conventional Commits* (CC)](https://www.conventionalcommits.org/en/v1.0.0/). \n  \n  - Here is the CC short form stencil (in Raku):\n    `\u003ctype\u003e ['(' \u003cscope\u003e ')']? ':' \u003cdescription\u003e`.\n      \n  - See the recent commits in this repository for examples.\n    \n  - Here are additional examples of CC messages (each line is a separate message):\n \n```text  \nfeat:Implemented the parsing of Danish numeric word forms.\ndocs:Added documentation of right-to-left word forms parsing.\nfix(Persian):Corrected tests for numbers larger than 1000.\ntest:Added new corner cases tests.\ntest(Ukrainian):Added new tests.\n```   \n\n------\n\n## Acknowledgements\n\n- Thanks to [spyrettas](https://github.com/spyrettas) for:\n  - Riding \"shotgun\" during the initial implementation of the Greek role, actions, and tests\n  - Proofreading and correcting Greek tests and role\n- Thanks to [Denis](https://github.com/DenisVCode) for:\n  - Proofreading the Czech language unit tests and suggesting corrections.\n- Thanks to Aikerim Belispayeva, [aikerimbelis](https://github.com/aikerimbelis), for:\n  - Proofreading the Kazakh language unit tests and suggesting corrections.\n- Thanks to Herbert Breunung, 
[lichtkind](https://github.com/lichtkind), for:\n  - Proofreading the German language unit tests\n  - Suggesting corrections and extensions\n  - Verifying the German numeric word forms parsing with the [DSL Translations](https://antononcube.shinyapps.io/DSL-evaluations/) interface\n- Thanks to Nora Popescu for:\n  - Bug reporting and suggestions for the Romanian language parser\n  - Verifying the Romanian numeric word forms parsing with the [DSL Translations](https://antononcube.shinyapps.io/DSL-evaluations/) interface\n  \n------\n\n## References\n\n[AAr1] Anton Antonov, \n[Raku::DSL::Shared](https://github.com/antononcube/Raku-DSL-Shared). \n\n[BL1] Brent \"Labster\" Laabs, \n[`Lingua::Number`](https://github.com/labster/p6-Lingua-Number).\n\n[SS1] Larry Wall, Steve Schulze, \n[Lingua::EN::Numbers](https://github.com/thundergnat/Lingua-EN-Numbers).\n\n------\n\nAnton Antonov   \nFlorida, USA   \nApril-May, 2021   \nOctober, 2022 (updated, separate executable doc)   \nMarch, 2023 (updated, Azerbaijani parsing)   \nJune, 2024 (updated, Bulgarian generation)   \nMarch-April, 2025 (updated; Kazakh, German, and Romanian parsing; Russian generation)     \nJune, 2025 (updated; Armenian parsing; Armenian and Japanese generation)   ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantononcube%2Fraku-lingua-numericwordforms","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantononcube%2Fraku-lingua-numericwordforms","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantononcube%2Fraku-lingua-numericwordforms/lists"}