https://github.com/jaidevd/numerizer
A Python module to convert natural language numerics into ints and floats.
- Host: GitHub
- URL: https://github.com/jaidevd/numerizer
- Owner: jaidevd
- License: mit
- Created: 2019-12-02T07:00:34.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2024-09-26T08:33:18.000Z (4 months ago)
- Last Synced: 2025-01-11T03:06:54.375Z (14 days ago)
- Topics: information-extraction, nlp, regular-expressions, spacy, spacy-extension
- Language: Python
- Homepage:
- Size: 50.8 KB
- Stars: 226
- Watchers: 8
- Forks: 24
- Open Issues: 6
Metadata Files:
- Readme: README.rst
- License: LICENSE
numerizer
=========

A Python module to convert natural language numerics into ints and floats.
This is a port of the Ruby gem numerizer.

Numerizer has been tested on Python 3.9, 3.10 and 3.11.

Installation
------------

The numerizer library can be installed from PyPI as follows:

.. code:: bash

   $ pip install numerizer

Usage
-----

.. code:: python

   >>> from numerizer import numerize
   >>> numerize('forty two')
   '42'
   >>> numerize('forty-two')
   '42'
   >>> numerize('four hundred and sixty two')
   '462'
   >>> numerize('one fifty')
   '150'
   >>> numerize('twelve hundred')
   '1200'
   >>> numerize('twenty one thousand four hundred and seventy three')
   '21473'
   >>> numerize('one million two hundred and fifty thousand and seven')
   '1250007'
   >>> numerize('one billion and one')
   '1000000001'
   >>> numerize('nine and three quarters')
   '9.75'
   >>> numerize('platform nine and three quarters')
   'platform 9.75'

Using the SpaCy extension
^^^^^^^^^^^^^^^^^^^^^^^^^

Since version 0.2, numerizer is available as a SpaCy extension.
Any named entities of a quantitative nature within a SpaCy document can be numerized as follows:

.. code:: python

   >>> from spacy import load
   >>> import numerizer  # importing numerizer is assumed to register the spaCy extensions
   >>> nlp = load('en_core_web_sm')  # or load any other spaCy model
   >>> doc = nlp('The projected revenue for the next quarter is over two million dollars.')
   >>> doc._.numerize()
   {the next quarter: 'the next 1/4', over two million dollars: 'over 2000000 dollars'}

Users can specify which entity types are to be numerized by using the ``labels`` argument in the extension function, as follows:

.. code:: python

   >>> doc._.numerize(labels=['MONEY'])  # only numerize entities of type 'MONEY'
   {over two million dollars: 'over 2000000 dollars'}

The extension is available for tokens and spans as well.

.. code:: python

   >>> two_million = doc[-4:-2]  # span corresponding to "two million"
   >>> two_million._.numerize()
   '2000000'
   >>> quarter = doc[6]  # token corresponding to "quarter"
   >>> quarter._.numerized
   '1/4'

Extras
------

For R users, a wrapper library has been developed by @amrrs.
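
The examples above show that ``numerize`` returns strings such as ``'42'``, ``'9.75'`` or ``'1/4'`` rather than numeric objects. Code that needs actual numbers can convert the result afterwards. The following is a minimal sketch using only the standard library; ``to_number`` is a hypothetical helper and not part of numerizer, and it assumes the input is a fully numerized string with no surrounding words.

```python
from fractions import Fraction


def to_number(text):
    """Convert a numerized string to an int, float or Fraction.

    Hypothetical helper, not part of numerizer. It assumes the whole
    string is numeric, e.g. '42', '9.75' or '1/4'.
    """
    text = text.strip()
    if '/' in text:
        # Fractional output such as '1/4' is kept exact.
        return Fraction(text)
    try:
        # Whole numbers such as '42' or '1250007'.
        return int(text)
    except ValueError:
        # Decimal output such as '9.75'.
        return float(text)
```

Strings that still contain words, such as ``'platform 9.75'``, raise ``ValueError`` in this sketch, so callers would need to isolate the numeric part first.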