Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/SethMMorton/fastnumbers

Super-fast and clean conversions to numbers for Python.
https://github.com/SethMMorton/fastnumbers

c-extension conversion numbers optimization python utility utility-library

Last synced: about 2 months ago
JSON representation

Super-fast and clean conversions to numbers for Python.

Awesome Lists containing this project

README

        

fastnumbers
===========

.. image:: https://img.shields.io/pypi/v/fastnumbers.svg
:target: https://pypi.org/project/fastnumbers/

.. image:: https://img.shields.io/pypi/pyversions/fastnumbers.svg
:target: https://pypi.org/project/fastnumbers/

.. image:: https://img.shields.io/pypi/l/fastnumbers.svg
:target: https://github.com/SethMMorton/fastnumbers/blob/main/LICENSE

.. image:: https://github.com/SethMMorton/fastnumbers/workflows/Tests/badge.svg
:target: https://github.com/SethMMorton/fastnumbers/actions

.. image:: https://codecov.io/gh/SethMMorton/fastnumbers/branch/main/graph/badge.svg
:target: https://codecov.io/gh/SethMMorton/fastnumbers

.. image:: https://img.shields.io/pypi/dw/fastnumbers.svg
:target: https://pypi.org/project/fastnumbers/

Super-fast and clean conversions to numbers.

- Source Code: https://github.com/SethMMorton/fastnumbers
- Downloads: https://pypi.org/project/fastnumbers/
- Documentation: https://fastnumbers.readthedocs.io/
- `Quick Start`_
- `Timing`_
- `High-level Algorithm`_
- `How To Run Tests`_
- `History`_

``fastnumbers`` is a module with the following three objectives (in order
of decreasing importance as to why the module was created):

#. Provide a set of convenience functions that wrap calls to
``int`` and ``float`` and provides easy, concise, powerful, fast
and flexible error handling.
#. Provide a set of functions that can be used to rapidly identify if
an input *could* be converted to *int* or *float*.
#. Provide drop-in replacements for the Python built-in ``int`` and
``float`` that are on par or faster with the Python equivalents
(see the `Timing`_ section for details). These functions
should behave *identically* to the Python built-ins except for a few
specific corner-cases as mentioned in the
`API documentation for those functions `_.

- **PLEASE** read the quick start for these functions to fully
understand the caveats before using them.

**What kind of speedups can you expect?** Here are some highlights, but please
see the `Timing`_ section for the raw data if you want details.

- Up to 2x faster conversion of strings to integers than the built-in
``int()`` function
- Up to 5x faster conversion of strings to floats than the built-in
``float()`` function (possibly greater for very long strings)
- Up to 10x faster handling of errors during conversion than using
user-side error handling
- On top of the above, operations to convert a list of strings
(with the ``map`` option or ``try_array`` function) is 2x faster
than the equivalent list comprehension.

**NOTICE**: As of ``fastnumbers`` version 4.0.0, only Python >= 3.7 is
supported.

**NOTICE**: As of ``fastnumbers`` version 4.0.0, the functions ``fast_real``,
``fast_float``, ``fast_int``, ``fast_forceint``, ``isreal``, ``isfloat``,
``isint``, and ``isintlike`` have been deprecated and are replaced with
``try_real``, ``try_float``, ``try_int``, ``try_forceint``, ``check_real``,
``check_float``, ``check_int``, and ``check_intlike``, respectively. These
new functions have more flexible APIs and have names that better reflect
the intent of the functions. The old functions can still be used (they will
*never* be removed from ``fastnumbers``), but the new ones should be
preferred for new development.

**NOTICE**: As of ``fastnumbers`` version 4.0.0, ``query_type`` now sets
``allow_underscores`` to ``False`` by default instead of ``True``.

Quick Start
-----------

- `Error-handling Functions`_
- `Checking Functions`_
- `Drop-in Replacement Functions`_

There are three broad categories of functions exposed by ``fastnumbers``.
The below quick start will demonstrate each of these categories. The
quick start is "by example", and will show a sample interactive session
using the ``fastnumbers`` API.

Error-Handling Functions
++++++++++++++++++++++++

- `Error-handling function API `_
- `Fast operations on lists and other iterables`_
- `About the on_fail option`_
- `About the denoise option`_

``try_float`` will be used to demonstrate the functionality of the
``try_*`` functions.

.. code-block:: python

>>> from fastnumbers import RAISE, try_float
>>> # Convert string to a float
>>> try_float('56.07')
56.07
>>> # Integers are converted to floats
>>> try_float(54)
54.0
>>>
>>> # Unconvertable string returned as-is by default
>>> try_float('bad input')
'bad input'
>>> # Unconvertable strings can trigger a default value
>>> try_float('bad input', on_fail=0)
0
>>>
>>> # One can ask inf or nan to be substituted with another value
>>> try_float('nan')
nan
>>> try_float('nan', nan=0.0)
0.0
>>> try_float(float('nan'), nan=0.0)
0.0
>>> try_float('56.07', nan=0.0)
56.07
>>>
>>> # The default built-in float behavior can be triggered with
>>> # RAISE given to "on_fail".
>>> try_float('bad input', on_fail=RAISE) #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError: invalid literal for float(): bad input
>>>
>>> # A function can be used to return an alternate value for invalid input
>>> try_float('bad input', on_fail=len)
9
>>> try_float(54, on_fail=len)
54.0
>>>
>>> # Single unicode characters can be converted.
>>> try_float('\u2164') # Roman numeral 5 (V)
5.0
>>> try_float('\u2466') # 7 enclosed in a circle
7.0

``try_int`` behaves the same as ``try_float``, but for integers.

.. code-block:: python

>>> from fastnumbers import try_int
>>> try_int('1234')
1234
>>> try_int('\u2466')
7

``try_real`` is like ``try_float`` or ``try_int`` depending
on if there is any fractional component of thi return value.

.. code-block:: python

>>> from fastnumbers import try_real
>>> try_real('56')
56
>>> try_real('56.0')
56
>>> try_real('56.0', coerce=False)
56.0
>>> try_real('56.07')
56.07
>>> try_real(56.07)
56.07
>>> try_real(56.0)
56
>>> try_real(56.0, coerce=False)
56.0

``try_forceint`` always returns an integer.

.. code-block:: python

>>> from fastnumbers import try_forceint
>>> try_forceint('56')
56
>>> try_forceint('56.0')
56
>>> try_forceint('56.07')
56
>>> try_forceint(56.07)
56

Fast operations on lists and other iterables
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each of the ``try_*`` functions have a ``map`` option causes the function
to accept an iterable of items to convert and returns a list. Using
``try_float`` as an example, the following are all functionally equivalent.

.. code-block:: python

>>> from fastnumbers import try_float
>>> iterable = ["5", "4.5", "34567.6", "32"]
>>> try_float(iterable, map=list) == list(map(try_float, iterable))
True
>>> try_float(iterable, map=list) == [try_float(x) for x in iterable]
True
>>> try_float(iterable, map=list) == list(try_float(iterable, map=True))
True

The difference is that the ``map`` option is 2x the speed of the list
comprehension method, and 1.5x the speed of the ``map`` method. The reason
is that it avoids Python function call overhead on each iteration. Note that
*True* causes the function to return an iterator, and *list* causes it to
return a ``list``. In practice the performance of these are similar
(see `Timing`_ for raw data).

If you need to store your output in a ``numpy`` array, you can use
``try_array`` to do this conversion directly. This function has some
additional handling for overflow that is not present in the other
``fastnumbers`` functions that may come in handy when dealing with
``numpy`` arrays.

.. code-block:: python

>>> from fastnumbers import try_array
>>> import numpy as np
>>> iterable = ["5", "4.5", "34567.6", "32"]
>>> np.array_equal(np.array(try_float(iterable, map=list), dtype=np.float64), try_array(iterable))
True

You will see about a 2x speedup of doing this in one step over converting
to a list then converting that list to an array.

About the ``on_fail`` option
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``on_fail`` option is a way for you to do *anything* in the event that
the given input cannot be converted to a number. It can

* return given object as-is if set to ``fastnumbers.INPUT`` (this is the default)
* raise a ``ValueError`` if set to ``fastnumbers.RAISE``
* return a default value if given any non-callable object
* call a function with the given object if given a single-argument callable

Below are a couple of ideas to get you thinking.

**NOTE**:: There is also an ``on_type_error`` option that behaves the same as
``on_fail`` except that a) it is triggered when the given object is of an
invalid type and b) the default value is ``fastnumbers.RAISE``, not
``fastnumbers.INPUT``.

.. code-block:: python

>>> from fastnumbers import INPUT, RAISE, try_float
>>> # You want to convert strings that can be converted to numbers, but
>>> # leave the rest as strings. Use fastnumbers.INPUT (the default)
>>> try_float('45.6')
45.6
>>> try_float('invalid input')
'invalid input'
>>> try_float('invalid input', on_fail=INPUT)
'invalid input'
>>>
>>>
>>>
>>> # You want to convert any invalid string to NaN
>>> try_float('45.6', on_fail=float('nan'))
45.6
>>> try_float('invalid input', on_fail=float('nan'))
nan
>>>
>>>
>>>
>>> # Simple callable case, send the input through some function to generate a number.
>>> try_float('invalid input', on_fail=lambda x: float(x.count('i'))) # count the 'i's
3.0
>>>
>>>
>>>
>>> # Suppose we know that our input could either be a number, or if not
>>> # then we know we just have to strip off parens to get to the number
>>> # e.g. the input could be '45' or '(45)'. Also, suppose that if it
>>> # still cannot be converted to a number we want to raise an exception.
>>> def strip_parens_and_try_again(x):
... return try_float(x.strip('()'), on_fail=RAISE)
...
>>> try_float('45', on_fail=strip_parens_and_try_again)
45.0
>>> try_float('(45)', on_fail=strip_parens_and_try_again)
45.0
>>> try_float('invalid input', on_fail=strip_parens_and_try_again) #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError: invalid literal for float(): invalid input
>>>
>>>
>>>
>>> # Suppose that whenever an invalid input is given, it needs to be
>>> # logged and then a default value is returned.
>>> def log_and_default(x, log_method=print, default=0.0):
... log_method("The input {!r} is not valid!".format(x))
... return default
...
>>> try_float('45', on_fail=log_and_default)
45.0
>>> try_float('invalid input', on_fail=log_and_default)
The input 'invalid input' is not valid!
0.0
>>> try_float('invalid input', on_fail=lambda x: log_and_default(x, default=float('nan')))
The input 'invalid input' is not valid!
nan

About the ``denoise`` option
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``denoise`` option is available on the ``try_real`` and ``try_forceint`` options.
To best understand its usage, consider the following native Python behavior:

.. code-block:: python

>>> int(3.453e21)
3452999999999999737856
>>> int(float("3.453e21"))
3452999999999999737856
>>> # Most users would likely expect this result from decimal.Decimal
>>> import decimal
>>> int(decimal.Decimal("3.453e21"))
3453000000000000000000
>>> # But watch out, even decimal.Decimal doesn't help for float input
>>> import decimal
>>> int(decimal.Decimal(3.453e21))
3452999999999999737856

Because the conversion of a float to an int goes through the C ``double`` data type which
has inherent limitations on accuracy (See
`this Stack Overflow question for examples `_)
the resulting ``int`` result has "noise" digits that are not part of the original float
representation.

For functions where this makes sense, ``fastnumbers`` provides the ``denoise`` option to
give you the results that ``decimal.Decimal`` would give for strings containing floats.

.. code-block:: python

>>> from fastnumbers import try_real
>>> try_real(3.453e21)
3452999999999999737856
>>> try_real("3.453e21")
3452999999999999737856
>>> try_real(3.453e21, denoise=True)
3453000000000000000000
>>> try_real("3.453e21", denoise=True)
3453000000000000000000

Two things to keep in mind:

1. The ``denoise`` option adds additional overhead to the conversion calculation, so please consider
the trade-offs between speed and accuracy when determining whether or not to use it. It is
*significantly* faster than using ``decimal.Decimal``, but much slower than not using it at all.
2. For string input, ``denoise`` will return results identical to ``decimal.Decimal``. For float
input, ``denoise`` will return results that are accurate to about 15 digits (C ``double`` can
only store 16 decimal digits, so this means that only the last possible digit may not be accurate).

Checking Functions
++++++++++++++++++

- `Checking function API `_

``check_float`` will be used to demonstrate the functionality of the
``check_*`` functions. There is also the ``query_type`` function.

.. code-block:: python

>>> from fastnumbers import check_float
>>> from fastnumbers import ALLOWED, DISALLOWED, NUMBER_ONLY, STRING_ONLY
>>> # Check that a string can be converted to a float
>>> check_float('56')
True
>>> check_float('56', strict=True)
False
>>> check_float('56.07')
True
>>> check_float('56.07 lb')
False
>>>
>>> # Check if a given number is a float
>>> check_float(56.07)
True
>>> check_float(56)
False
>>>
>>> # Specify if only strings or only numbers are allowed
>>> check_float(56.07, consider=STRING_ONLY)
False
>>> check_float('56.07', consider=NUMBER_ONLY)
False
>>>
>>> # Customize handling for nan or inf (see API for more details)
>>> check_float('nan')
False
>>> check_float('nan', nan=ALLOWED)
True
>>> check_float(float('nan'))
True
>>> check_float(float('nan'), nan=DISALLOWED)
False

``check_int`` works the same as ``check_float``, but for integers.

.. code-block:: python

>>> from fastnumbers import check_int
>>> check_int('56')
True
>>> check_int(56)
True
>>> check_int('56.0')
False
>>> check_int(56.0)
False

``check_real`` is very permissive - any float or integer is accepted.

.. code-block:: python

>>> from fastnumbers import check_real
>>> check_real('56.0')
True
>>> check_real('56')
True
>>> check_real(56.0)
True
>>> check_real(56)
True

``check_intlike`` checks if a number is "int-like", if it has no
fractional component.

.. code-block:: python

>>> from fastnumbers import check_intlike
>>> check_intlike('56.0')
True
>>> check_intlike('56.7')
False
>>> check_intlike(56.0)
True
>>> check_intlike(56.7)
False

The ``query_type`` function can be used if you need to determine if
a value is one of many types, rather than whether or not it is one specific
type.

.. code-block:: python

>>> from fastnumbers import query_type
>>> query_type('56.0')

>>> query_type('56')

>>> query_type(56.0)

>>> query_type(56)

>>> query_type(56.0, coerce=True)

>>> query_type('56.0', allowed_types=(float, int))

>>> query_type('hey')

>>> query_type('hey', allowed_types=(float, int)) # returns None

Drop-in Replacement Functions
+++++++++++++++++++++++++++++

- `Drop-in replacement function API `_

**PLEASE** do not take it for granted that these functions will provide you
with a speedup - they may not. Every platform, compiler, and data-set is
different, and you should perform a timing test on your system with your data
to evaluate if you will see a benefit. As you can see from the data linked in
the `Timing`_ section, the amount of speedup you will get is particularly
data-dependent. *In general* you will see a performance boost for floats (and
this boost increases as the size of the float increases), but for integers it
is largely dependent on the length of the integer. You will likely *not* see
a performance boost if the input are already numbers instead of strings.

**NOTE**: in the below examples, we use ``from fastnumbers import int`` instead
of ``import fastnumbers``. This is because calling ``fastnumbers.int()`` is a
bit slower than just ``int()`` because Python has to first find ``fastnumbers``
in your namespace, then find ``int`` in the ``fastnumbers`` namespace, instead
of just finding ``int`` in your namespace - this will slow down the function
call and defeat the purpose of using ``fastnumbers``. If you do not want to
actually shadow the built-in ``int`` function, you can do
``from fastnumbers import int as fn_int`` or something like that.

.. code-block:: python

>>> # Use is identical to the built-in functions
>>> from fastnumbers import float, int
>>> float('10')
10.0
>>> int('10')
10
>>> float('bad input') #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError: invalid literal for float(): bad input

``real`` is provided to give a float or int depending
on the fractional component of the input.

.. code-block:: python

>>> from fastnumbers import real
>>> real('56.0')
56
>>> real('56.7')
56.7
>>> real('56.0', coerce=False)
56.0

Timing
------

Just how much faster is ``fastnumbers`` than a pure python implementation?
Please look https://github.com/SethMMorton/fastnumbers/tree/main/profiling.

High-Level Algorithm
--------------------

For integers, CPython goes to great lengths to ensure that your string input
is converted to a number *correctly* and *losslessly* (you can prove this to
yourself by examining the source code for
`integer conversions `_).
This extra effort is only needed for integers that cannot fit into a 64-bit
integer data type - for those that can, a naive algorithm of < 10 lines
of C code is sufficient and significantly faster. ``fastnumbers`` uses a
heuristic to determine if the input can be safely converted with the much
faster naive algorithm, and if so it does so, falling back on
the CPython implementation for longer input strings.
Most real-world numbers pass the heuristic and so you should generally see
improved performance with ``fastnumbers`` for integers.

For floats, ``fastnumbers`` utilizes the ultra-fast
`fast_float::from_chars `_ function
to convert strings representing floats into a C ``double`` both quickly *and
safely* - the conversion provides the same accuracy as the CPython
`float conversion function `_
but instead of scaling linearly with length of the input string it seems
to have roughly constant performance. By completely bypassing the CPython
converter we get significant performance gains with no penalty, so you
should always see improved performance with ``fastnumbers`` for floats.

Installation
------------

Use ``pip``!

.. code-block::

$ pip install fastnumbers

How to Run Tests
----------------

Please note that ``fastnumbers`` is NOT set-up to support
``python setup.py test``.

The recommended way to run tests is with
`tox `_.
Suppose you want to run tests for Python 3.8 - you can run tests by simply
executing the following:

.. code-block:: sh

$ tox run -e py38

``tox`` will create virtual a virtual environment for your tests and install
all the needed testing requirements for you.

If you want to run testing on all supported Python versions you can simply execute

.. code-block:: sh

$ tox run

You can change the how much "random" input your tests will try with

.. code-block:: sh

# Run fewer tests with "random" input - much faster
$ tox run -- --hypothesis-profile fast

# Run more tests with "random" input - takes much longer but is more thorough
$ tox run -- --hypothesis-profile thorough

If you want to run the performce analysis yourself, you can execute

.. code-block:: sh

# This assumes Python 3.9 - adjust for the version you want to profile
$ tox run -e py39-prof

If you do not wish to use ``tox``, you can install the testing dependencies with the
``dev-requirements.txt`` file and then run the tests manually using
`pytest `_.

.. code-block:: sh

$ pip install -r dev/requirements.txt
$ pytest

Author
------

Seth M. Morton

History
-------

Please visit the changelog `on GitHub `_.