{"id":18487883,"url":"https://github.com/allo-media/text2num","last_synced_at":"2025-05-15T17:01:30.261Z","repository":{"id":37431282,"uuid":"150243703","full_name":"allo-media/text2num","owner":"allo-media","description":"Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation.","archived":false,"fork":false,"pushed_at":"2025-02-12T13:36:49.000Z","size":243,"stargazers_count":105,"open_issues_count":15,"forks_count":48,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-03-31T22:23:31.125Z","etag":null,"topics":["english-nlp","french-nlp","python","words-to-numbers"],"latest_commit_sha":null,"homepage":"https://text2num.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/allo-media.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":"docs/contributing.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-25T09:55:52.000Z","updated_at":"2025-02-12T13:36:54.000Z","dependencies_parsed_at":"2023-11-23T15:26:33.662Z","dependency_job_id":"1246b758-b516-42d0-97e0-1be5e45eaa57","html_url":"https://github.com/allo-media/text2num","commit_stats":{"total_commits":189,"total_committers":14,"mean_commits":13.5,"dds":0.4021164021164021,"last_synced_commit":"636301c729879b2104d2532b6ffda05b5b732108"},"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allo-media%2Ftext2num","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allo-media%2Ftext2num/tags","
releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allo-media%2Ftext2num/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allo-media%2Ftext2num/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/allo-media","download_url":"https://codeload.github.com/allo-media/text2num/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247744294,"owners_count":20988781,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["english-nlp","french-nlp","python","words-to-numbers"],"created_at":"2024-11-06T12:50:55.893Z","updated_at":"2025-04-07T23:04:20.137Z","avatar_url":"https://github.com/allo-media.png","language":"Python","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"text2num\n========\n\n|docs|\n\n\n``text2num`` is a Python package that provides functions and parser classes for:\n\n- Parsing numbers expressed as words in French, English, Spanish, Portuguese, German and Catalan and converting them to integer values.\n- Detecting ordinal, cardinal and decimal numbers in a stream of French, English, Spanish and Portuguese words and transcribing them into their decimal digit representations. NOTE: Spanish does not support ordinal numbers yet.\n- Detecting ordinal, cardinal and decimal numbers in a German text (BETA). NOTE: 'relaxed=False' is not supported yet (it behaves as if 'relaxed=True').\n\nCompatibility\n-------------\n\nTested on Python 3.7. 
Requires Python \u003e= 3.6.\n\nLicense\n-------\n\nThis software is distributed under the MIT license, of which you should have received a copy (see the LICENSE file in this repository).\n\nInstallation\n------------\n\n``text2num`` does not depend on any other third-party package.\n\nTo install text2num in your (virtual) environment::\n\n    pip install text2num\n\nThat's all folks!\n\nUsage examples\n--------------\n\nParse and convert\n~~~~~~~~~~~~~~~~~\n\n\nFrench examples:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e from text_to_num import text2num\n    \u003e\u003e\u003e text2num('quatre-vingt-quinze', \"fr\")\n    95\n\n    \u003e\u003e\u003e text2num('nonante-cinq', \"fr\")\n    95\n\n    \u003e\u003e\u003e text2num('mille neuf cent quatre-vingt dix-neuf', \"fr\")\n    1999\n\n    \u003e\u003e\u003e text2num('dix-neuf cent quatre-vingt dix-neuf', \"fr\")\n    1999\n\n    \u003e\u003e\u003e text2num(\"cinquante et un million cinq cent soixante dix-huit mille trois cent deux\", \"fr\")\n    51578302\n\n    \u003e\u003e\u003e text2num('mille mille deux cents', \"fr\")\n    ValueError: invalid literal for text2num: 'mille mille deux cents'\n\n\nEnglish examples:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e from text_to_num import text2num\n\n    \u003e\u003e\u003e text2num(\"fifty-one million five hundred seventy-eight thousand three hundred two\", \"en\")\n    51578302\n\n    \u003e\u003e\u003e text2num(\"eighty-one\", \"en\")\n    81\n\n\nRussian examples:\n\n.. 
code-block:: python\n\n    \u003e\u003e\u003e from text_to_num import text2num\n\n    \u003e\u003e\u003e text2num(\"пятьдесят один миллион пятьсот семьдесят восемь тысяч триста два\", \"ru\")\n    51578302\n\n    \u003e\u003e\u003e text2num(\"миллиард миллион тысяча один\", \"ru\")\n    1001001001\n\n    \u003e\u003e\u003e text2num(\"один миллиард один миллион одна тысяча один\", \"ru\")\n    1001001001\n\n    \u003e\u003e\u003e text2num(\"восемьдесят один\", \"ru\")\n    81\n\n\nSpanish examples:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e from text_to_num import text2num\n    \u003e\u003e\u003e text2num(\"ochenta y uno\", \"es\")\n    81\n\n    \u003e\u003e\u003e text2num(\"nueve mil novecientos noventa y nueve\", \"es\")\n    9999\n\n    \u003e\u003e\u003e text2num(\"cincuenta y tres millones doscientos cuarenta y tres mil setecientos veinticuatro\", \"es\")\n    53243724\n\n\nPortuguese examples:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e from text_to_num import text2num\n    \u003e\u003e\u003e text2num(\"trinta e dois\", \"pt\")\n    32\n\n    \u003e\u003e\u003e text2num(\"mil novecentos e seis\", \"pt\")\n    1906\n\n    \u003e\u003e\u003e text2num(\"vinte e quatro milhões duzentos mil quarenta e sete\", \"pt\")\n    24200047\n\n\nGerman examples:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e from text_to_num import text2num\n\n    \u003e\u003e\u003e text2num(\"einundfünfzigmillionenfünfhundertachtundsiebzigtausenddreihundertzwei\", \"de\")\n    51578302\n\n    \u003e\u003e\u003e text2num(\"ein und achtzig\", \"de\")\n    81\n\n\nCatalan examples:\n\n.. 
code-block:: python\n\n    \u003e\u003e\u003e from text_to_num import text2num\n    \u003e\u003e\u003e text2num('noranta-cinc', \"ca\")\n    95\n\n    \u003e\u003e\u003e text2num('huitanta-u', \"ca\")\n    81\n\n    \u003e\u003e\u003e text2num('mil nou-cents noranta-nou', \"ca\")\n    1999\n\n    \u003e\u003e\u003e text2num(\"cinquanta-un milions cinc-cents setanta-vuit mil tres-cents dos\", \"ca\")\n    51578302\n\n    \u003e\u003e\u003e text2num('mil mil dos-cents', \"ca\")\n    ValueError: invalid literal for text2num: 'mil mil dos-cents'\n\n\nFind and transcribe\n~~~~~~~~~~~~~~~~~~~\n\nAny numbers, even ordinals.\n\nFrench:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e from text_to_num import alpha2digit\n    \u003e\u003e\u003e sentence = (\n    ...     \"Huit cent quarante-deux pommes, vingt-cinq chiens, mille trois chevaux, \"\n    ...     \"douze mille six cent quatre-vingt-dix-huit clous.\\n\"\n    ...     \"Quatre-vingt-quinze vaut nonante-cinq. On tolère l'absence de tirets avant les unités : \"\n    ...     \"soixante seize vaut septante six.\\n\"\n    ...     \"Nombres en série : douze quinze zéro zéro quatre vingt cinquante-deux cent trois cinquante deux \"\n    ...     \"trente et un.\\n\"\n    ...     \"Ordinaux: cinquième troisième vingt et unième centième mille deux cent trentième.\\n\"\n    ...     \"Décimaux: douze virgule quatre-vingt dix-neuf, cent vingt virgule zéro cinq ; \"\n    ...     \"mais soixante zéro deux.\"\n    ... )\n    \u003e\u003e\u003e print(alpha2digit(sentence, \"fr\", ordinal_threshold=0))\n    842 pommes, 25 chiens, 1003 chevaux, 12698 clous.\n    95 vaut 95. 
On tolère l'absence de tirets avant les unités : 76 vaut 76.\n    Nombres en série : 12 15 004 20 52 103 52 31.\n    Ordinaux: 5ème 3ème 21ème 100ème 1230ème.\n    Décimaux: 12,99, 120,05 ; mais 60 02.\n\n    \u003e\u003e\u003e sentence = \"Cinquième premier second troisième vingt et unième centième mille deux cent trentième.\"\n    \u003e\u003e\u003e print(alpha2digit(sentence, \"fr\", ordinal_threshold=3))\n    5ème premier second troisième 21ème 100ème 1230ème.\n\n\nEnglish:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e from text_to_num import alpha2digit\n\n    \u003e\u003e\u003e text = \"On May twenty-third, I bought twenty-five cows, twelve chickens and one hundred twenty five point forty kg of potatoes.\"\n    \u003e\u003e\u003e alpha2digit(text, \"en\")\n    'On May 23rd, I bought 25 cows, 12 chickens and 125.40 kg of potatoes.'\n\n\nRussian:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e from text_to_num import alpha2digit\n\n    \u003e\u003e\u003e # The fractional part does not handle qualifiers such as \"пять десятых\" (\"five tenths\"), \"двенадцать сотых\" (\"twelve hundredths\") or \"сколько-то стотысячных\" (\"so many hundred-thousandths\"), so it is best to leave them out.\n    \u003e\u003e\u003e text = \"Двадцать пять коров, двенадцать сотен цыплят и сто двадцать пять точка сорок кг картофеля.\"\n    \u003e\u003e\u003e alpha2digit(text, \"ru\")\n    '25 коров, 1200 цыплят и 125.40 кг картофеля.'\n\n    \u003e\u003e\u003e text = \"каждый пятый на первый второй расчитайсь!\"\n    \u003e\u003e\u003e alpha2digit(text, 'ru', ordinal_threshold=0)\n    'каждый 5ый на 1ый 2ой расчитайсь!'\n\n\nSpanish (ordinals not supported yet):\n\n.. 
code-block:: python\n\n    \u003e\u003e\u003e from text_to_num import alpha2digit\n\n    \u003e\u003e\u003e text = \"Compramos veinticinco vacas, doce gallinas y ciento veinticinco coma cuarenta kg de patatas.\"\n    \u003e\u003e\u003e alpha2digit(text, \"es\")\n    'Compramos 25 vacas, 12 gallinas y 125.40 kg de patatas.'\n\n    \u003e\u003e\u003e text = \"Tenemos mas veinte grados dentro y menos quince fuera.\"\n    \u003e\u003e\u003e alpha2digit(text, \"es\")\n    'Tenemos +20 grados dentro y -15 fuera.'\n\n\nPortuguese:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e from text_to_num import alpha2digit\n\n    \u003e\u003e\u003e text = \"Comprámos vinte e cinco vacas, doze galinhas e cento vinte e cinco vírgula quarenta kg de batatas.\"\n    \u003e\u003e\u003e alpha2digit(text, \"pt\")\n    'Comprámos 25 vacas, 12 galinhas e 125,40 kg de batatas.'\n\n    \u003e\u003e\u003e text = \"Temos mais vinte graus dentro e menos quinze fora.\"\n    \u003e\u003e\u003e alpha2digit(text, \"pt\")\n    'Temos +20 graus dentro e -15 fora.'\n\n    \u003e\u003e\u003e text = \"Ordinais: quinto, terceiro, vigésimo, vigésimo primeiro, centésimo quarto\"\n    \u003e\u003e\u003e alpha2digit(text, \"pt\")\n    'Ordinais: 5º, terceiro, 20ª, 21º, 104º'\n\n\nGerman (BETA, Note: 'relaxed' parameter is not supported yet and 'True' by default):\n\n.. 
code-block:: python\n\n    \u003e\u003e\u003e from text_to_num import alpha2digit\n\n    \u003e\u003e\u003e text = \"Ich habe fünfundzwanzig Kühe, zwölf Hühner und einhundertfünfundzwanzig kg Kartoffeln gekauft.\"\n    \u003e\u003e\u003e alpha2digit(text, \"de\")\n    'Ich habe 25 Kühe, 12 Hühner und 125 kg Kartoffeln gekauft.'\n\n    \u003e\u003e\u003e text = \"Die Temperatur beträgt minus fünfzehn Grad.\"\n    \u003e\u003e\u003e alpha2digit(text, \"de\")\n    'Die Temperatur beträgt -15 Grad.'\n\n    \u003e\u003e\u003e text = \"Die Telefonnummer lautet plus dreiunddreißig neun sechzig null sechs zwölf einundzwanzig.\"\n    \u003e\u003e\u003e alpha2digit(text, \"de\")\n    'Die Telefonnummer lautet +33 9 60 0 6 12 21.'\n\n    \u003e\u003e\u003e text = \"Der zweiundzwanzigste Januar zweitausendzweiundzwanzig.\"\n    \u003e\u003e\u003e alpha2digit(text, \"de\")\n    '22. Januar 2022'\n\n    \u003e\u003e\u003e text = \"Es ist ein Buch mit dreitausend Seiten aber nicht das erste.\"\n    \u003e\u003e\u003e alpha2digit(text, \"de\", ordinal_threshold=0)\n    'Es ist ein Buch mit 3000 Seiten aber nicht das 1..'\n\n    \u003e\u003e\u003e text = \"Pi ist drei Komma eins vier und so weiter, aber nicht drei Komma vierzehn :-p\"\n    \u003e\u003e\u003e alpha2digit(text, \"de\", ordinal_threshold=0)\n    'Pi ist 3,14 und so weiter, aber nicht 3 Komma 14 :-p'\n\n\nCatalan:\n\n.. 
code-block:: python\n\n    \u003e\u003e\u003e from text_to_num import alpha2digit\n    \u003e\u003e\u003e text = (\"Huit-centes quaranta-dos pomes, vint-i-cinc gossos, mil tres cavalls, dotze mil sis-cents noranta-huit claus.\\n Vuitanta-u és igual a huitanta-u.\\n Nombres en sèrie: dotze quinze zero zero quatre vint cinquanta-dos cent tres cinquanta-dos trenta-u.\\n Ordinals: cinquè tercera vint-i-uena centè mil dos-cents trentena.\\n Decimals: dotze coma noranta-nou, cent vint coma zero cinc; però seixanta zero dos.\")\n    \u003e\u003e\u003e print(alpha2digit(text, \"ca\", ordinal_threshold=0))\n    842 pomes, 25 gossos, 1003 cavalls, 12698 claus.\n    81 és igual a 81.\n    Nombres en sèrie: 12 15 004 20 52 103 52 31.\n    Ordinals: 5è 3a 21a 100è 1230a.\n    Decimals: 12,99, 120,05; però 60 02.\n\n    \u003e\u003e\u003e text = \"Cinqué primera segona tercer vint-i-ué centena mil dos-cents trenté.\"\n    \u003e\u003e\u003e print(alpha2digit(text, \"ca\", ordinal_threshold=3))\n    5é primera segona tercer 21é 100a 1230é.\n\n    \u003e\u003e\u003e text = \"Compràrem vint-i-cinc vaques, dotze gallines i cent vint-i-cinc coma quaranta kg de creïlles.\"\n    \u003e\u003e\u003e alpha2digit(text, \"ca\")\n    'Compràrem 25 vaques, 12 gallines i 125,40 kg de creïlles.'\n\n    \u003e\u003e\u003e text = \"Fa més vint graus dins i menys quinze fora.\"\n    \u003e\u003e\u003e alpha2digit(text, \"ca\")\n    'Fa +20 graus dins i -15 fora.'\n\n\nRead the complete documentation on `ReadTheDocs \u003chttp://text2num.readthedocs.io/\u003e`_.\n\nContribute\n----------\n\nJoin us on https://github.com/allo-media/text2num\n\n\n.. 
|docs| image:: https://readthedocs.org/projects/text2num/badge/?version=latest\n    :target: https://text2num.readthedocs.io/en/latest/?badge=latest\n    :alt: Documentation Status\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fallo-media%2Ftext2num","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fallo-media%2Ftext2num","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fallo-media%2Ftext2num/lists"}