https://github.com/rhasspy/unicode-rbnf
A pure Python implementation of ICU's rule-based number format engine
- Host: GitHub
- URL: https://github.com/rhasspy/unicode-rbnf
- Owner: rhasspy
- License: mit
- Created: 2023-10-31T20:02:42.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2025-03-03T19:15:35.000Z (about 2 months ago)
- Last Synced: 2025-03-28T08:11:46.240Z (27 days ago)
- Language: Python
- Size: 183 KB
- Stars: 2
- Watchers: 3
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.md
# Unicode RBNF
A pure Python implementation of [rule based number formatting](https://icu-project.org/docs/papers/a_rule_based_approach_to_number_spellout/) (RBNF) using the [Unicode Common Locale Data Repository](https://cldr.unicode.org) (CLDR).
This lets you spell out numbers for a large number of locales:
``` python
from unicode_rbnf import RbnfEngine

engine = RbnfEngine.for_language("en")
assert engine.format_number(1234).text == "one thousand two hundred thirty-four"
```
Different formatting purposes are supported as well, depending on the locale:
``` python
from unicode_rbnf import RbnfEngine, FormatPurpose

engine = RbnfEngine.for_language("en")
assert engine.format_number(1999, FormatPurpose.CARDINAL).text == "one thousand nine hundred ninety-nine"
assert engine.format_number(1999, FormatPurpose.YEAR).text == "nineteen ninety-nine"
assert engine.format_number(11, FormatPurpose.ORDINAL).text == "eleventh"
```
For locales with multiple genders, cases, etc., the different texts are accessible in the result of `format_number`:
``` python
from unicode_rbnf import RbnfEngine

engine = RbnfEngine.for_language("de")
print(engine.format_number(1))
```
Result:
```
FormatResult(
    text='eins',
    text_by_ruleset={
        'spellout-numbering': 'eins',
        'spellout-cardinal-neuter': 'ein',
        'spellout-cardinal-masculine': 'ein',
        'spellout-cardinal-feminine': 'eine',
        'spellout-cardinal-n': 'einen',
        'spellout-cardinal-r': 'einer',
        'spellout-cardinal-s': 'eines',
        'spellout-cardinal-m': 'einem'
    }
)
```
The `text` property of the result holds the text of the ruleset with the shortest name (least specific).
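The "shortest ruleset name" rule can be illustrated without the library. The sketch below copies the German result shown above into a plain dictionary. `pick_default_text` is a hypothetical helper written for this illustration, not part of `unicode_rbnf`, and the library's own selection code may differ in detail.

``` python
# Data copied from the German `format_number(1)` result shown above.
text_by_ruleset = {
    "spellout-numbering": "eins",
    "spellout-cardinal-neuter": "ein",
    "spellout-cardinal-masculine": "ein",
    "spellout-cardinal-feminine": "eine",
    "spellout-cardinal-n": "einen",
    "spellout-cardinal-r": "einer",
    "spellout-cardinal-s": "eines",
    "spellout-cardinal-m": "einem",
}


def pick_default_text(texts: dict[str, str]) -> str:
    """Hypothetical helper: return the text of the shortest ruleset name."""
    shortest_name = min(texts, key=len)
    return texts[shortest_name]


default_text = pick_default_text(text_by_ruleset)
print(default_text)  # eins

# A specific gender or case is looked up by its ruleset name.
feminine = text_by_ruleset["spellout-cardinal-feminine"]
print(feminine)  # eine
```

With a real engine, the same lookups would presumably be done on the `text_by_ruleset` field of the result.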
## Supported locales
See: https://github.com/unicode-org/cldr/tree/release-44/common/rbnf
## Engine implementation
Not [all features](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classRuleBasedNumberFormat.html) of the RBNF engine are implemented. The following features are available:
* Literal text (`hundred`)
* Quotient substitution (`<<` or `←←`)
* Remainder substitution (`>>` or `→→`)
* Optional substitution (`[...]`)
* Rule substitution (`←%ruleset_name←`)
* Rule replacement (`=%ruleset_name=`)
* Special rules:
    * Negative numbers (`-x`)
    * Improper fractions (`x.x`)
    * Not a number (`NaN`)
    * Infinity (`Inf`)
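To make the features in the list above more concrete, here is a toy interpreter for literal text, quotient substitution (`<<`), remainder substitution (`>>`), optional substitution (`[...]`), and a negative-number rule. It is a minimal sketch and not this library's implementation. The rule table, `divisor_for`, and `spell` are hypothetical names written in the style of RBNF rules rather than copied from CLDR, and only integers between -999,999 and 999,999 are handled.

``` python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()

# Toy rules: base value -> rule text (illustrative, not the CLDR data).
RULES = {value: word for value, word in enumerate(ONES)}
RULES.update({20 + 10 * i: word + "[->>]" for i, word in enumerate(TENS)})
RULES[100] = "<< hundred[ >>]"
RULES[1000] = "<< thousand[ >>]"
NEGATIVE_RULE = "minus >>"  # plays the role of the special `-x` rule


def divisor_for(base: int) -> int:
    """Largest power of 10 that is <= base (1 for base values below 10)."""
    divisor = 1
    while divisor * 10 <= base:
        divisor *= 10
    return divisor


def spell(number: int) -> str:
    if not -1_000_000 < number < 1_000_000:
        raise ValueError("toy rules only cover -999,999..999,999")
    if number < 0:
        # Negative rule: `>>` stands for the absolute value.
        return NEGATIVE_RULE.replace(">>", spell(-number))

    # Pick the rule with the largest base value not exceeding the number.
    base = max(b for b in RULES if b <= number)
    text = RULES[base]
    quotient, remainder = divmod(number, divisor_for(base))

    # Optional substitution: keep the bracketed part only if the
    # remainder is non-zero.
    text = re.sub(r"\[(.*?)\]", lambda m: m.group(1) if remainder else "", text)
    if "<<" in text:  # quotient substitution
        text = text.replace("<<", spell(quotient))
    if ">>" in text:  # remainder substitution
        text = text.replace(">>", spell(remainder))
    return text


print(spell(1234))  # one thousand two hundred thirty-four
print(spell(-42))   # minus forty-two
```

For 1234 the rule with base value 1000 is selected, the quotient 1 fills `<<`, and the remainder 234 is formatted recursively through the rules for 100 and 30.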
Some features that will need to be added eventually:
* Proper fraction rules (`0.x`)
* Preceding remainder substitution (`>>>` or `→→→`)
* Number format strings (`==`)
* Decimal format patterns (`#,##0.00`)
* Plural replacements (`$(ordinal,one{st}...)`)