An open API service indexing awesome lists of open source software.

https://github.com/thombashi/dataproperty

A Python library for extract property from data.
https://github.com/thombashi/dataproperty

data property python python-library

Last synced: 6 months ago
JSON representation

A Python library for extract property from data.

Awesome Lists containing this project

README

          

.. contents:: **DataProperty**
:backlinks: top
:local:

Summary
=======
A Python library for extract property from data.

.. image:: https://badge.fury.io/py/DataProperty.svg
:target: https://badge.fury.io/py/DataProperty
:alt: PyPI package version

.. image:: https://anaconda.org/conda-forge/DataProperty/badges/version.svg
:target: https://anaconda.org/conda-forge/DataProperty
:alt: conda-forge package version

.. image:: https://img.shields.io/pypi/pyversions/DataProperty.svg
:target: https://pypi.org/project/DataProperty
:alt: Supported Python versions

.. image:: https://img.shields.io/pypi/implementation/DataProperty.svg
:target: https://pypi.org/project/DataProperty
:alt: Supported Python implementations

.. image:: https://github.com/thombashi/DataProperty/actions/workflows/ci.yml/badge.svg
:target: https://github.com/thombashi/DataProperty/actions/workflows/ci.yml
:alt: CI status of Linux/macOS/Windows

.. image:: https://coveralls.io/repos/github/thombashi/DataProperty/badge.svg?branch=master
:target: https://coveralls.io/github/thombashi/DataProperty?branch=master
:alt: Test coverage

.. image:: https://github.com/thombashi/DataProperty/actions/workflows/github-code-scanning/codeql/badge.svg
:target: https://github.com/thombashi/DataProperty/actions/workflows/github-code-scanning/codeql
:alt: CodeQL

Installation
============

Installation: pip
------------------------------
::

pip install DataProperty

Installation: conda
------------------------------
::

conda install -c conda-forge dataproperty

Installation: apt
------------------------------
::

sudo add-apt-repository ppa:thombashi/ppa
sudo apt update
sudo apt install python3-dataproperty

Usage
=====

Extract property of data
------------------------

e.g. Extract a ``float`` value property
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python

>>> from dataproperty import DataProperty
>>> DataProperty(-1.1)
data=-1.1, type=REAL_NUMBER, align=right, ascii_width=4, int_digits=1, decimal_places=1, extra_len=1

e.g. Extract a ``int`` value property
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python

>>> from dataproperty import DataProperty
>>> DataProperty(123456789)
data=123456789, type=INTEGER, align=right, ascii_width=9, int_digits=9, decimal_places=0, extra_len=0

e.g. Extract a ``str`` (ascii) value property
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python

>>> from dataproperty import DataProperty
>>> DataProperty("sample string")
data=sample string, type=STRING, align=left, length=13, ascii_width=13, extra_len=0

e.g. Extract a ``str`` (multi-byte) value property
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python

>>> from dataproperty import DataProperty
>>> str(DataProperty("吾輩は猫である"))
data=吾輩は猫である, type=STRING, align=left, length=7, ascii_width=14, extra_len=0

e.g. Extract a time (``datetime``) value property
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python

>>> import datetime
>>> from dataproperty import DataProperty
>>> DataProperty(datetime.datetime(2017, 1, 1, 0, 0, 0))
data=2017-01-01 00:00:00, type=DATETIME, align=left, ascii_width=19, extra_len=0

e.g. Extract a ``bool`` value property
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python

>>> from dataproperty import DataProperty
>>> DataProperty(True)
data=True, type=BOOL, align=left, ascii_width=4, extra_len=0

Extract data property for each element from a matrix
----------------------------------------------------
``DataPropertyExtractor.to_dp_matrix`` method returns a matrix of ``DataProperty`` instances from a data matrix.
An example data set and the result are as follows:

:Sample Code:
.. code:: python

import datetime
from dataproperty import DataPropertyExtractor

dp_extractor = DataPropertyExtractor()
dt = datetime.datetime(2017, 1, 1, 0, 0, 0)
inf = float("inf")
nan = float("nan")

dp_matrix = dp_extractor.to_dp_matrix([
[1, 1.1, "aa", 1, 1, True, inf, nan, dt],
[2, 2.2, "bbb", 2.2, 2.2, False, "inf", "nan", dt],
[3, 3.33, "cccc", -3, "ccc", "true", inf, "NAN", "2017-01-01T01:23:45+0900"],
])

for row, dp_list in enumerate(dp_matrix):
for col, dp in enumerate(dp_list):
print("row={:d}, col={:d}, {}".format(row, col, str(dp)))

:Output:
::

row=0, col=0, data=1, type=INTEGER, align=right, ascii_width=1, int_digits=1, decimal_places=0, extra_len=0
row=0, col=1, data=1.1, type=REAL_NUMBER, align=right, ascii_width=3, int_digits=1, decimal_places=1, extra_len=0
row=0, col=2, data=aa, type=STRING, align=left, ascii_width=2, length=2, extra_len=0
row=0, col=3, data=1, type=INTEGER, align=right, ascii_width=1, int_digits=1, decimal_places=0, extra_len=0
row=0, col=4, data=1, type=INTEGER, align=right, ascii_width=1, int_digits=1, decimal_places=0, extra_len=0
row=0, col=5, data=True, type=BOOL, align=left, ascii_width=4, extra_len=0
row=0, col=6, data=Infinity, type=INFINITY, align=left, ascii_width=8, extra_len=0
row=0, col=7, data=NaN, type=NAN, align=left, ascii_width=3, extra_len=0
row=0, col=8, data=2017-01-01 00:00:00, type=DATETIME, align=left, ascii_width=19, extra_len=0
row=1, col=0, data=2, type=INTEGER, align=right, ascii_width=1, int_digits=1, decimal_places=0, extra_len=0
row=1, col=1, data=2.2, type=REAL_NUMBER, align=right, ascii_width=3, int_digits=1, decimal_places=1, extra_len=0
row=1, col=2, data=bbb, type=STRING, align=left, ascii_width=3, length=3, extra_len=0
row=1, col=3, data=2.2, type=REAL_NUMBER, align=right, ascii_width=3, int_digits=1, decimal_places=1, extra_len=0
row=1, col=4, data=2.2, type=REAL_NUMBER, align=right, ascii_width=3, int_digits=1, decimal_places=1, extra_len=0
row=1, col=5, data=False, type=BOOL, align=left, ascii_width=5, extra_len=0
row=1, col=6, data=Infinity, type=INFINITY, align=left, ascii_width=8, extra_len=0
row=1, col=7, data=NaN, type=NAN, align=left, ascii_width=3, extra_len=0
row=1, col=8, data=2017-01-01 00:00:00, type=DATETIME, align=left, ascii_width=19, extra_len=0
row=2, col=0, data=3, type=INTEGER, align=right, ascii_width=1, int_digits=1, decimal_places=0, extra_len=0
row=2, col=1, data=3.33, type=REAL_NUMBER, align=right, ascii_width=4, int_digits=1, decimal_places=2, extra_len=0
row=2, col=2, data=cccc, type=STRING, align=left, ascii_width=4, length=4, extra_len=0
row=2, col=3, data=-3, type=INTEGER, align=right, ascii_width=2, int_digits=1, decimal_places=0, extra_len=1
row=2, col=4, data=ccc, type=STRING, align=left, ascii_width=3, length=3, extra_len=0
row=2, col=5, data=True, type=BOOL, align=left, ascii_width=4, extra_len=0
row=2, col=6, data=Infinity, type=INFINITY, align=left, ascii_width=8, extra_len=0
row=2, col=7, data=NaN, type=NAN, align=left, ascii_width=3, extra_len=0
row=2, col=8, data=2017-01-01T01:23:45+0900, type=STRING, align=left, ascii_width=24, length=24, extra_len=0

Full example source code can be found at *examples/py/to_dp_matrix.py*

Extract properties for each column from a matrix
------------------------------------------------------
``DataPropertyExtractor.to_column_dp_list`` method returns a list of ``DataProperty`` instances from a data matrix. The list represents the properties for each column.
An example data set and the result are as follows:

Example data set and result are as follows:

:Sample Code:
.. code:: python

import datetime
from dataproperty import DataPropertyExtractor

dp_extractor = DataPropertyExtractor()
dt = datetime.datetime(2017, 1, 1, 0, 0, 0)
inf = float("inf")
nan = float("nan")

data_matrix = [
[1, 1.1, "aa", 1, 1, True, inf, nan, dt],
[2, 2.2, "bbb", 2.2, 2.2, False, "inf", "nan", dt],
[3, 3.33, "cccc", -3, "ccc", "true", inf, "NAN", "2017-01-01T01:23:45+0900"],
]

dp_extractor.headers = ["int", "float", "str", "num", "mix", "bool", "inf", "nan", "time"]
col_dp_list = dp_extractor.to_column_dp_list(dp_extractor.to_dp_matrix(dp_matrix))

for col_idx, col_dp in enumerate(col_dp_list):
print(str(col_dp))

:Output:
::

column=0, type=INTEGER, align=right, ascii_width=3, bit_len=2, int_digits=1, decimal_places=0
column=1, type=REAL_NUMBER, align=right, ascii_width=5, int_digits=1, decimal_places=(min=1, max=2)
column=2, type=STRING, align=left, ascii_width=4
column=3, type=REAL_NUMBER, align=right, ascii_width=4, int_digits=1, decimal_places=(min=0, max=1), extra_len=(min=0, max=1)
column=4, type=STRING, align=left, ascii_width=3, int_digits=1, decimal_places=(min=0, max=1)
column=5, type=BOOL, align=left, ascii_width=5
column=6, type=INFINITY, align=left, ascii_width=8
column=7, type=NAN, align=left, ascii_width=3
column=8, type=STRING, align=left, ascii_width=24

Full example source code can be found at *examples/py/to_column_dp_list.py*

Dependencies
============
- Python 3.9+
- `Python package dependencies (automatically installed) `__

Optional dependencies
---------------------
- `loguru `__
- Used for logging if the package installed