https://github.com/SethMMorton/fastnumbers

Super-fast and clean conversions to numbers for Python.
Host: GitHub
URL: https://github.com/SethMMorton/fastnumbers
Owner: SethMMorton
License: mit
Created: 2014-08-05T00:44:53.000Z (over 11 years ago)
Default Branch: main
Last Pushed: 2024-10-24T02:21:38.000Z (about 1 year ago)
Last Synced: 2024-10-24T21:53:45.941Z (about 1 year ago)
Topics: c-extension, conversion, numbers, optimization, python, utility, utility-library
Language: C++
Homepage: https://pypi.org/project/fastnumbers/
Size: 1.5 MB
Stars: 105
Watchers: 6
Forks: 13
Open Issues: 2
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md

fastnumbers
===========

.. image:: https://img.shields.io/pypi/v/fastnumbers.svg
    :target: https://pypi.org/project/fastnumbers/

.. image:: https://img.shields.io/pypi/pyversions/fastnumbers.svg
    :target: https://pypi.org/project/fastnumbers/

.. image:: https://img.shields.io/pypi/l/fastnumbers.svg
    :target: https://github.com/SethMMorton/fastnumbers/blob/main/LICENSE

.. image:: https://github.com/SethMMorton/fastnumbers/workflows/Tests/badge.svg
    :target: https://github.com/SethMMorton/fastnumbers/actions

.. image:: https://codecov.io/gh/SethMMorton/fastnumbers/branch/main/graph/badge.svg
    :target: https://codecov.io/gh/SethMMorton/fastnumbers

.. image:: https://img.shields.io/pypi/dw/fastnumbers.svg
    :target: https://pypi.org/project/fastnumbers/

Super-fast and clean conversions to numbers.

    - Source Code: https://github.com/SethMMorton/fastnumbers

    - Downloads: https://pypi.org/project/fastnumbers/

    - Documentation: https://fastnumbers.readthedocs.io/

    - `Quick Start`_

    - `Timing`_

    - `High-level Algorithm`_

    - `How To Run Tests`_

    - `History`_

``fastnumbers`` is a module with the following three objectives (in order
of decreasing importance as to why the module was created):

    #. Provide a set of convenience functions that wrap calls to
       ``int`` and ``float`` and provide easy, concise, powerful, fast
       and flexible error handling.

    #. Provide a set of functions that can be used to rapidly identify if
       an input *could* be converted to *int* or *float*.

    #. Provide drop-in replacements for the Python built-in ``int`` and
       ``float`` that are on par with or faster than the Python equivalents
       (see the `Timing`_ section for details). These functions
       should behave *identically* to the Python built-ins except for a few
       specific corner-cases as mentioned in the
       `API documentation for those functions `_.

       - **PLEASE** read the quick start for these functions to fully
         understand the caveats before using them.

**What kind of speedups can you expect?** Here are some highlights, but please
see the `Timing`_ section for the raw data if you want details.

    - Up to 2x faster conversion of strings to integers than the built-in
      ``int()`` function

    - Up to 5x faster conversion of strings to floats than the built-in
      ``float()`` function (possibly greater for very long strings)

    - Up to 10x faster handling of errors during conversion than using
      user-side error handling

    - On top of the above, operations to convert a list of strings
      (with the ``map`` option or ``try_array`` function) are 2x faster
      than the equivalent list comprehension.

**NOTICE**: As of ``fastnumbers`` version 4.0.0, only Python >= 3.7 is
supported.

**NOTICE**: As of ``fastnumbers`` version 4.0.0, the functions ``fast_real``,
``fast_float``, ``fast_int``, ``fast_forceint``, ``isreal``, ``isfloat``,
``isint``, and ``isintlike`` have been deprecated and are replaced with
``try_real``, ``try_float``, ``try_int``, ``try_forceint``, ``check_real``,
``check_float``, ``check_int``, and ``check_intlike``, respectively. These
new functions have more flexible APIs and have names that better reflect
the intent of the functions. The old functions can still be used (they will
*never* be removed from ``fastnumbers``), but the new ones should be
preferred for new development.

**NOTICE**: As of ``fastnumbers`` version 4.0.0, ``query_type`` now sets
``allow_underscores`` to ``False`` by default instead of ``True``.

Quick Start
-----------

- `Error-handling Functions`_
- `Checking Functions`_
- `Drop-in Replacement Functions`_

There are three broad categories of functions exposed by ``fastnumbers``.
The below quick start will demonstrate each of these categories. The
quick start is "by example", and will show a sample interactive session
using the ``fastnumbers`` API.

Error-Handling Functions
++++++++++++++++++++++++

- `Error-handling function API `_
- `Fast operations on lists and other iterables`_
- `About the on_fail option`_
- `About the denoise option`_

``try_float`` will be used to demonstrate the functionality of the
``try_*`` functions.

.. code-block:: python

    >>> from fastnumbers import RAISE, try_float
    >>> # Convert string to a float
    >>> try_float('56.07')
    56.07
    >>> # Integers are converted to floats
    >>> try_float(54)
    54.0
    >>>
    >>> # Unconvertible string returned as-is by default
    >>> try_float('bad input')
    'bad input'
    >>> # Unconvertible strings can trigger a default value
    >>> try_float('bad input', on_fail=0)
    0
    >>>
    >>> # One can ask inf or nan to be substituted with another value
    >>> try_float('nan')
    nan
    >>> try_float('nan', nan=0.0)
    0.0
    >>> try_float(float('nan'), nan=0.0)
    0.0
    >>> try_float('56.07', nan=0.0)
    56.07
    >>>
    >>> # The default built-in float behavior can be triggered with
    >>> # RAISE given to "on_fail".
    >>> try_float('bad input', on_fail=RAISE) #doctest: +IGNORE_EXCEPTION_DETAIL
    Traceback (most recent call last):
      ...
    ValueError: invalid literal for float(): bad input
    >>>
    >>> # A function can be used to return an alternate value for invalid input
    >>> try_float('bad input', on_fail=len)
    9
    >>> try_float(54, on_fail=len)
    54.0
    >>>
    >>> # Single unicode characters can be converted.
    >>> try_float('\u2164')  # Roman numeral 5 (V)
    5.0
    >>> try_float('\u2466')  # 7 enclosed in a circle
    7.0

``try_int`` behaves the same as ``try_float``, but for integers.

.. code-block:: python

    >>> from fastnumbers import try_int
    >>> try_int('1234')
    1234
    >>> try_int('\u2466')
    7

``try_real`` is like ``try_float`` or ``try_int`` depending
on whether the return value has any fractional component.

.. code-block:: python

    >>> from fastnumbers import try_real
    >>> try_real('56')
    56
    >>> try_real('56.0')
    56
    >>> try_real('56.0', coerce=False)
    56.0
    >>> try_real('56.07')
    56.07
    >>> try_real(56.07)
    56.07
    >>> try_real(56.0)
    56
    >>> try_real(56.0, coerce=False)
    56.0
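
To make the coercion rule concrete, here is a rough pure-Python approximation.
It is a sketch for illustration only, not how ``fastnumbers`` is implemented:
``naive_try_real`` is a hypothetical helper, it returns unconvertible input
unchanged (mirroring the default ``on_fail`` behavior), and it ignores the other
``try_real`` options such as ``on_fail`` and ``denoise``.

.. code-block:: python

    def naive_try_real(value, coerce=True):
        """Hypothetical pure-Python approximation of try_real (not the library's code)."""
        if isinstance(value, int):
            return value  # integers pass through unchanged
        if isinstance(value, float):
            result = value
        else:
            try:
                # Strings that parse as integers stay integers
                return int(value)
            except (TypeError, ValueError):
                pass
            try:
                result = float(value)
            except (TypeError, ValueError):
                # Mirror the default on_fail behavior: return the input as-is
                return value
        # With coerce=True, a float with no fractional part becomes an int
        if coerce and result.is_integer():
            return int(result)
        return result

Because every step goes through the Python-level ``int()`` and ``float()``
built-ins, this sketch only helps with understanding the behavior; it offers
none of the speed of ``try_real``.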

``try_forceint`` always returns an integer.

.. code-block:: python

    >>> from fastnumbers import try_forceint
    >>> try_forceint('56')
    56
    >>> try_forceint('56.0')
    56
    >>> try_forceint('56.07')
    56
    >>> try_forceint(56.07)
    56

Fast operations on lists and other iterables

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each of the ``try_*`` functions has a ``map`` option that causes the function
to accept an iterable of items to convert and return the converted items. Using
``try_float`` as an example, the following are all functionally equivalent.

.. code-block:: python

    >>> from fastnumbers import try_float
    >>> iterable = ["5", "4.5", "34567.6", "32"]
    >>> try_float(iterable, map=list) == list(map(try_float, iterable))
    True
    >>> try_float(iterable, map=list) == [try_float(x) for x in iterable]
    True
    >>> try_float(iterable, map=list) == list(try_float(iterable, map=True))
    True

The difference is that the ``map`` option is 2x the speed of the list
comprehension method, and 1.5x the speed of the ``map`` method, because it
avoids Python function call overhead on each iteration. Note that
``map=True`` causes the function to return an iterator, and ``map=list``
causes it to return a ``list``. In practice the performance of these is
similar (see `Timing`_ for raw data).
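
The two ``map`` modes differ only in the return type. The following sketch
reproduces that contract with a hypothetical ``convert_many`` helper that takes
any single-item converter. It is pure Python, so unlike the real ``map`` option
it still pays the per-item function-call overhead described above.

.. code-block:: python

    def convert_many(items, convert, map=True):
        """Hypothetical helper mimicking the map=True / map=list return types."""
        # A generator expression converts items lazily, one at a time
        converted = (convert(item) for item in items)
        if map is list:
            return list(converted)  # eager: materialize every result now
        if map is True:
            return converted  # lazy: the caller pulls results on demand
        raise ValueError("this sketch only supports map=True or map=list")

Returning a lazy iterator for ``map=True`` lets a caller stream through a very
large input without holding every converted value in memory at once.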

If you need to store your output in a ``numpy`` array, you can use
``try_array`` to do this conversion directly. This function has some
additional handling for overflow that is not present in the other
``fastnumbers`` functions that may come in handy when dealing with
``numpy`` arrays.

.. code-block:: python

    >>> from fastnumbers import try_array
    >>> import numpy as np
    >>> iterable = ["5", "4.5", "34567.6", "32"]
    >>> np.array_equal(np.array(try_float(iterable, map=list), dtype=np.float64), try_array(iterable))
    True

You will see about a 2x speedup of doing this in one step over converting
to a list then converting that list to an array.

About the ``on_fail`` option

^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``on_fail`` option is a way for you to do *anything* in the event that
the given input cannot be converted to a number. It can

* return the given object as-is if set to ``fastnumbers.INPUT`` (this is the default)
* raise a ``ValueError`` if set to ``fastnumbers.RAISE``
* return a default value if given any non-callable object
* call a function with the given object if given a single-argument callable

Below are a couple of ideas to get you thinking.

**NOTE**: There is also an ``on_type_error`` option that behaves the same as
``on_fail`` except that a) it is triggered when the given object is of an
invalid type and b) the default value is ``fastnumbers.RAISE``, not
``fastnumbers.INPUT``.

.. code-block:: python

    >>> from fastnumbers import INPUT, RAISE, try_float
    >>> # You want to convert strings that can be converted to numbers, but
    >>> # leave the rest as strings. Use fastnumbers.INPUT (the default)
    >>> try_float('45.6')
    45.6
    >>> try_float('invalid input')
    'invalid input'
    >>> try_float('invalid input', on_fail=INPUT)
    'invalid input'
    >>>
    >>>
    >>>
    >>> # You want to convert any invalid string to NaN
    >>> try_float('45.6', on_fail=float('nan'))
    45.6
    >>> try_float('invalid input', on_fail=float('nan'))
    nan
    >>>
    >>>
    >>>
    >>> # Simple callable case, send the input through some function to generate a number.
    >>> try_float('invalid input', on_fail=lambda x: float(x.count('i')))  # count the 'i's
    3.0
    >>>
    >>>
    >>>
    >>> # Suppose we know that our input could either be a number, or if not
    >>> # then we know we just have to strip off parens to get to the number
    >>> # e.g. the input could be '45' or '(45)'. Also, suppose that if it
    >>> # still cannot be converted to a number we want to raise an exception.
    >>> def strip_parens_and_try_again(x):
    ...     return try_float(x.strip('()'), on_fail=RAISE)
    ...
    >>> try_float('45', on_fail=strip_parens_and_try_again)
    45.0
    >>> try_float('(45)', on_fail=strip_parens_and_try_again)
    45.0
    >>> try_float('invalid input', on_fail=strip_parens_and_try_again) #doctest: +IGNORE_EXCEPTION_DETAIL
    Traceback (most recent call last):
      ...
    ValueError: invalid literal for float(): invalid input
    >>>
    >>>
    >>>
    >>> # Suppose that whenever an invalid input is given, it needs to be
    >>> # logged and then a default value is returned.
    >>> def log_and_default(x, log_method=print, default=0.0):
    ...     log_method("The input {!r} is not valid!".format(x))
    ...     return default
    ...
    >>> try_float('45', on_fail=log_and_default)
    45.0
    >>> try_float('invalid input', on_fail=log_and_default)
    The input 'invalid input' is not valid!
    0.0
    >>> try_float('invalid input', on_fail=lambda x: log_and_default(x, default=float('nan')))
    The input 'invalid input' is not valid!
    nan
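
The four ``on_fail`` behaviors listed earlier boil down to a dispatch on the
value of ``on_fail``. The sketch below mirrors that dispatch in pure Python for
illustration only; ``naive_try_float`` and the ``INPUT``/``RAISE`` sentinels are
hypothetical stand-ins defined here, not the objects exported by
``fastnumbers``. Only ``ValueError`` is handled, so a ``TypeError`` for an
invalid type propagates, in the spirit of the ``on_type_error`` default of
``RAISE``.

.. code-block:: python

    class _Sentinel:
        """Unique marker object; compared by identity, never by value."""

        def __init__(self, name):
            self.name = name

        def __repr__(self):
            return self.name


    # Hypothetical stand-ins for fastnumbers.INPUT and fastnumbers.RAISE
    INPUT = _Sentinel("INPUT")
    RAISE = _Sentinel("RAISE")


    def naive_try_float(value, on_fail=INPUT):
        """Pure-Python approximation of the on_fail dispatch (not the library's code)."""
        try:
            return float(value)
        except ValueError:
            if on_fail is INPUT:
                return value  # hand back the original object unchanged
            if on_fail is RAISE:
                raise  # re-raise the ValueError produced by float()
            if callable(on_fail):
                return on_fail(value)  # let the callable compute a replacement
            return on_fail  # any other object is used as the default value

Checking the sentinels by identity first means any other object, such as ``0``
or ``float('nan')``, can serve as a default value without ambiguity.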

About the ``denoise`` option

^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``denoise`` option is available on the ``try_real`` and ``try_forceint`` functions.
To best understand its usage, consider the following native Python behavior:

.. code-block:: python

    >>> int(3.453e21)
    3452999999999999737856
    >>> int(float("3.453e21"))
    3452999999999999737856
    >>> # Most users would likely expect this result from decimal.Decimal
    >>> import decimal
    >>> int(decimal.Decimal("3.453e21"))
    3453000000000000000000
    >>> # But watch out, even decimal.Decimal doesn't help for float input
    >>> import decimal
    >>> int(decimal.Decimal(3.453e21))
    3452999999999999737856

Because the conversion of a float to an int goes through the C ``double`` data type, which
has inherent limitations on accuracy (see
`this Stack Overflow question for examples `_),
the resulting ``int`` has "noise" digits that are not part of the original float
representation.

For functions where this makes sense, ``fastnumbers`` provides the ``denoise`` option to
give you the results that ``decimal.Decimal`` would give for strings containing floats.

.. code-block:: python

    >>> from fastnumbers import try_real
    >>> try_real(3.453e21)
    3452999999999999737856
    >>> try_real("3.453e21")
    3452999999999999737856
    >>> try_real(3.453e21, denoise=True)
    3453000000000000000000
    >>> try_real("3.453e21", denoise=True)
    3453000000000000000000

Two things to keep in mind:

1. The ``denoise`` option adds additional overhead to the conversion calculation, so please consider
   the trade-offs between speed and accuracy when determining whether or not to use it. It is
   *significantly* faster than using ``decimal.Decimal``, but much slower than not using it at all.

2. For string input, ``denoise`` will return results identical to ``decimal.Decimal``. For float
   input, ``denoise`` will return results that are accurate to about 15 digits (C ``double`` can
   only store 16 decimal digits, so this means that only the last possible digit may not be accurate).
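
For comparison, a pure-Python way to get similar denoised integers from float
input is to round-trip through ``repr()``, which on CPython produces the
shortest string that converts back to the same ``double``. This is a different
technique from whatever ``fastnumbers`` does internally, it carries the full
cost of ``decimal.Decimal``, and ``denoised_int`` is a hypothetical name used
only for this illustration.

.. code-block:: python

    import decimal


    def denoised_int(value):
        """Hypothetical helper: integer built from the shortest decimal form of value."""
        if isinstance(value, float):
            # repr() yields the shortest string that round-trips to the same double,
            # e.g. '3.453e+21', so the binary "noise" digits never appear
            value = repr(value)
        # Decimal parses the string exactly; int() then truncates toward zero
        return int(decimal.Decimal(value))

Passing the float directly to ``decimal.Decimal`` would instead capture the
exact binary value, noise digits included, as the earlier ``int(decimal.Decimal(3.453e21))``
example shows.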

Checking Functions
++++++++++++++++++

- `Checking function API `_

``check_float`` will be used to demonstrate the functionality of the
``check_*`` functions. There is also the ``query_type`` function.

.. code-block:: python

    >>> from fastnumbers import check_float
    >>> from fastnumbers import ALLOWED, DISALLOWED, NUMBER_ONLY, STRING_ONLY
    >>> # Check that a string can be converted to a float
    >>> check_float('56')
    True
    >>> check_float('56', strict=True)
    False
    >>> check_float('56.07')
    True
    >>> check_float('56.07 lb')
    False
    >>>
    >>> # Check if a given number is a float
    >>> check_float(56.07)
    True
    >>> check_float(56)
    False
    >>>
    >>> # Specify if only strings or only numbers are allowed
    >>> check_float(56.07, consider=STRING_ONLY)
    False
    >>> check_float('56.07', consider=NUMBER_ONLY)
    False
    >>>
    >>> # Customize handling for nan or inf (see API for more details)
    >>> check_float('nan')
    False
    >>> check_float('nan', nan=ALLOWED)
    True
    >>> check_float(float('nan'))
    True
    >>> check_float(float('nan'), nan=DISALLOWED)
    False
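
For comparison, the default behavior shown above can be approximated with the
built-in ``float()``. ``naive_check_float`` is a hypothetical helper written for
this illustration, not the library's implementation. It assumes the defaults
seen in the examples (NaN accepted for ``float`` objects but rejected for
strings, integers rejected), additionally assumes that infinity strings are
treated like NaN, and interprets ``strict=True`` as "the string must not also
parse as an integer", which is an assumption rather than the documented
definition.

.. code-block:: python

    import math


    def naive_check_float(value, strict=False):
        """Hypothetical pure-Python approximation of check_float's defaults."""
        if isinstance(value, float):
            return True  # float objects pass, including nan and inf
        if not isinstance(value, str):
            return False  # int objects (and anything else) are not floats
        try:
            number = float(value)
        except ValueError:
            return False  # e.g. '56.07 lb'
        if not math.isfinite(number):
            return False  # assumption: 'nan'/'inf' strings are rejected by default
        if strict:
            try:
                int(value)
            except ValueError:
                return True  # looks like a float and not like an integer
            return False  # e.g. '56' parses as an int, so it is not strictly a float
        return True

The ``try``/``except`` pattern here is exactly the user-side error handling that
``check_float`` is designed to replace with a single fast call.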

``check_int`` works the same as ``check_float``, but for integers.

.. code-block:: python

    >>> from fastnumbers import check_int
    >>> check_int('56')
    True
    >>> check_int(56)
    True
    >>> check_int('56.0')
    False
    >>> check_int(56.0)
    False

``check_real`` is very permissive - any float or integer is accepted.

.. code-block:: python

    >>> from fastnumbers import check_real
    >>> check_real('56.0')
    True
    >>> check_real('56')
    True
    >>> check_real(56.0)
    True
    >>> check_real(56)
    True

``check_intlike`` checks whether a number is "int-like", i.e. whether it has no
fractional component.

.. code-block:: python

    >>> from fastnumbers import check_intlike
    >>> check_intlike('56.0')
    True
    >>> check_intlike('56.7')
    False
    >>> check_intlike(56.0)
    True
    >>> check_intlike(56.7)
    False
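
The "int-like" test amounts to asking whether the value, viewed as a float, has
a zero fractional part. Below is a minimal pure-Python sketch of that idea;
``naive_check_intlike`` is a hypothetical helper, not the library's code, and
because it routes strings through a C ``double`` it can misjudge very long
strings whose fractional digits are lost in rounding.

.. code-block:: python

    def naive_check_intlike(value):
        """Hypothetical pure-Python approximation of check_intlike."""
        if isinstance(value, int):
            return True  # integers trivially have no fractional component
        try:
            # float.is_integer() is False for '56.7' as well as for nan and inf
            return float(value).is_integer()
        except (TypeError, ValueError):
            return False  # not numeric at all

Handling ``int`` objects before calling ``float()`` avoids an ``OverflowError``
for integers too large to fit in a ``double``.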

The ``query_type`` function can be used if you need to determine if
a value is one of many types, rather than whether or not it is one specific
type.

.. code-block:: python

    >>> from fastnumbers import query_type
    >>> query_type('56.0')
    <class 'float'>
    >>> query_type('56')
    <class 'int'>
    >>> query_type(56.0)
    <class 'float'>
    >>> query_type(56)
    <class 'int'>
    >>> query_type(56.0, coerce=True)
    <class 'int'>
    >>> query_type('56.0', allowed_types=(float, int))
    <class 'float'>
    >>> query_type('hey')
    <class 'str'>
    >>> query_type('hey', allowed_types=(float, int))  # returns None
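
The decision ``query_type`` makes can be approximated in pure Python by trying
``int()`` first, then ``float()``, then falling back to the input's own type.
``naive_query_type`` is a hypothetical helper for illustration, not the
library's implementation. Because the built-in ``int()`` and ``float()`` accept
underscores, the sketch rejects strings containing ``_`` to mirror the
``allow_underscores=False`` default mentioned in the notices; that mapping is an
assumption.

.. code-block:: python

    def naive_query_type(value, allowed_types=None, coerce=False):
        """Hypothetical pure-Python approximation of query_type (not the library's code)."""
        result_type = type(value)
        if isinstance(value, str) and "_" not in value:
            # Assumption: underscores are rejected, mirroring allow_underscores=False
            try:
                int(value)
                result_type = int
            except ValueError:
                try:
                    number = float(value)
                except ValueError:
                    pass  # not numeric, so the type stays str
                else:
                    is_whole = coerce and number.is_integer()
                    result_type = int if is_whole else float
        elif isinstance(value, float) and coerce and value.is_integer():
            result_type = int  # coerce=True reports whole floats as int
        if allowed_types is not None and result_type not in allowed_types:
            return None  # not one of the types the caller asked about
        return result_type

Returning ``None`` rather than raising when the type is not allowed lets
callers use the result directly in a conditional.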

Drop-in Replacement Functions
+++++++++++++++++++++++++++++

- `Drop-in replacement function API `_

**PLEASE** do not take it for granted that these functions will provide you
with a speedup - they may not. Every platform, compiler, and data-set is
different, and you should perform a timing test on your system with your data
to evaluate if you will see a benefit. As you can see from the data linked in
the `Timing`_ section, the amount of speedup you will get is particularly
data-dependent. *In general* you will see a performance boost for floats (and
this boost increases as the size of the float increases), but for integers it
is largely dependent on the length of the integer. You will likely *not* see
a performance boost if the inputs are already numbers instead of strings.

**NOTE**: in the below examples, we use ``from fastnumbers import int`` instead
of ``import fastnumbers``. This is because calling ``fastnumbers.int()`` is a
bit slower than just ``int()`` because Python has to first find ``fastnumbers``
in your namespace, then find ``int`` in the ``fastnumbers`` namespace, instead
of just finding ``int`` in your namespace - this will slow down the function
call and defeat the purpose of using ``fastnumbers``. If you do not want to
actually shadow the built-in ``int`` function, you can do
``from fastnumbers import int as fn_int`` or something like that.

.. code-block:: python

    >>> # Use is identical to the built-in functions
    >>> from fastnumbers import float, int
    >>> float('10')
    10.0
    >>> int('10')
    10
    >>> float('bad input') #doctest: +IGNORE_EXCEPTION_DETAIL
    Traceback (most recent call last):
      ...
    ValueError: invalid literal for float(): bad input

``real`` is provided to give a float or int depending
on the fractional component of the input.

.. code-block:: python

    >>> from fastnumbers import real
    >>> real('56.0')
    56
    >>> real('56.7')
    56.7
    >>> real('56.0', coerce=False)
    56.0

Timing
------

Just how much faster is ``fastnumbers`` than a pure Python implementation?
Please see the profiling results at https://github.com/SethMMorton/fastnumbers/tree/main/profiling.
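
Because speedups depend on platform and data, a small ``timeit`` harness is a
convenient way to check on your own inputs. The sketch below is a hypothetical
helper (``compare_timings`` and ``user_side_float`` are illustrative names, not
part of ``fastnumbers``); it times any set of single-item converters over the
same data and prints no claimed numbers, since results vary by machine.

.. code-block:: python

    import timeit


    def compare_timings(candidates, data, number=1000):
        """Return the average seconds per pass over data for each converter."""
        results = {}
        for name, convert in candidates.items():
            # One pass converts every item once; repeat `number` times and average
            total = timeit.timeit(lambda: [convert(x) for x in data], number=number)
            results[name] = total / number
        return results


    def user_side_float(value):
        """Typical hand-written error handling: fall back to the input on failure."""
        try:
            return float(value)
        except ValueError:
            return value


    sample = ["5", "4.5", "34567.6", "32", "bad input"]
    timings = compare_timings({"float + try/except": user_side_float}, sample)
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1e6:.2f} microseconds per pass")

With ``fastnumbers`` installed, add an entry such as ``"try_float": try_float``
(after ``from fastnumbers import try_float``) to the candidates and use a
representative sample of your real data. The list comprehension adds the same
overhead to every candidate, so the comparison stays fair.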

High-Level Algorithm
--------------------

For integers, CPython goes to great lengths to ensure that your string input
is converted to a number *correctly* and *losslessly* (you can prove this to
yourself by examining the source code for
`integer conversions `_).
This extra effort is only needed for integers that cannot fit into a 64-bit
integer data type - for those that can, a naive algorithm of < 10 lines
of C code is sufficient and significantly faster. ``fastnumbers`` uses a
heuristic to determine if the input can be safely converted with the much
faster naive algorithm, and if so it does so, falling back on
the CPython implementation for longer input strings.

Most real-world numbers pass the heuristic and so you should generally see
improved performance with ``fastnumbers`` for integers.
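
The integer strategy can be sketched in Python as "fast path for short digit
strings, full converter otherwise". The exact heuristic ``fastnumbers`` uses is
not described here, so the 18-digit cutoff below is an assumption: any string of
at most 18 decimal digits is below :math:`10^{18} < 2^{63}`, so it always fits in
a signed 64-bit integer. ``naive_str_to_int`` is a hypothetical name, and in
Python this is not faster than ``int()``; it only shows the control flow that
the C code benefits from.

.. code-block:: python

    def naive_str_to_int(text):
        """Hypothetical sketch: naive fast path for short inputs, fallback otherwise."""
        digits = text.strip()
        sign = 1
        if digits[:1] in ("+", "-"):
            sign = -1 if digits[0] == "-" else 1
            digits = digits[1:]
        # Assumed heuristic: <= 18 ASCII digits always fits in a signed 64-bit integer
        if 0 < len(digits) <= 18 and all("0" <= c <= "9" for c in digits):
            value = 0
            for c in digits:
                value = value * 10 + (ord(c) - ord("0"))  # naive digit accumulation
            return sign * value
        # Long input, underscores, non-ASCII digits, or invalid text: use the full converter
        return int(text)

Anything the fast path does not recognize, including invalid input, is handed
to the built-in ``int()``, so error behavior is unchanged.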

For floats, ``fastnumbers`` utilizes the ultra-fast
`fast_float::from_chars `_ function
to convert strings representing floats into a C ``double`` both quickly *and
safely* - the conversion provides the same accuracy as the CPython
`float conversion function `_
but instead of scaling linearly with length of the input string it seems
to have roughly constant performance. By completely bypassing the CPython
converter we get significant performance gains with no penalty, so you
should always see improved performance with ``fastnumbers`` for floats.

Installation
------------

Use ``pip``!

.. code-block:: sh

    $ pip install fastnumbers

How to Run Tests
----------------

Please note that ``fastnumbers`` is NOT set up to support
``python setup.py test``.

The recommended way to run tests is with
`tox `_.

Suppose you want to run tests for Python 3.8 - you can run tests by simply
executing the following:

.. code-block:: sh

    $ tox run -e py38

``tox`` will create a virtual environment for your tests and install
all the needed testing requirements for you.

If you want to run testing on all supported Python versions you can simply execute

.. code-block:: sh

    $ tox run

You can change how much "random" input your tests will try with

.. code-block:: sh

    # Run fewer tests with "random" input - much faster
    $ tox run -- --hypothesis-profile fast

    # Run more tests with "random" input - takes much longer but is more thorough
    $ tox run -- --hypothesis-profile thorough

If you want to run the performance analysis yourself, you can execute

.. code-block:: sh

    # This assumes Python 3.9 - adjust for the version you want to profile
    $ tox run -e py39-prof

If you do not wish to use ``tox``, you can install the testing dependencies with the
``dev/requirements.txt`` file and then run the tests manually using
`pytest `_.

.. code-block:: sh

    $ pip install -r dev/requirements.txt
    $ pytest

Author
------

Seth M. Morton

History
-------

Please visit the changelog `on GitHub `_.