Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/louisdebruijn/waterfall-logging
a Python package to log (distinct) column counts in a DataFrame, export it as a Markdown table and plot a Waterfall statistics figure.
https://github.com/louisdebruijn/waterfall-logging
data-quality-checks logging markdown mkdocs pandas pyspark waterfall
Last synced: about 1 month ago
JSON representation
a Python package to log (distinct) column counts in a DataFrame, export it as a Markdown table and plot a Waterfall statistics figure.
- Host: GitHub
- URL: https://github.com/louisdebruijn/waterfall-logging
- Owner: LouisdeBruijn
- License: agpl-3.0
- Created: 2023-02-15T09:11:25.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2023-03-23T09:22:14.000Z (over 1 year ago)
- Last Synced: 2024-10-10T21:41:29.224Z (about 1 month ago)
- Topics: data-quality-checks, logging, markdown, mkdocs, pandas, pyspark, waterfall
- Language: Python
- Homepage: https://louisdebruijn.github.io/waterfall-logging/
- Size: 577 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
[![Version](https://img.shields.io/pypi/v/waterfall-logging)](https://pypi.org/project/waterfall-logging/)
[![](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![Downloads](https://pepy.tech/badge/waterfall-logging)](https://pepy.tech/project/waterfall-logging)
[![Docs - GitHub.io](https://img.shields.io/static/v1?logo=readthdocs&style=flat&color=blue&label=docs&message=waterfall-statistics)][#docs-package][#docs-package]: https://LouisdeBruijn.github.io/waterfall-logging/
# Waterfall-logging
Waterfall-logging is a Python package to log (distinct) column counts in a DataFrame, export it as a Markdown table and plot a Waterfall statistics figure.
It provides an implementation in Pandas `PandasWaterfall` and PySpark `SparkWaterfall`.
Documentation with examples can be found [here](https://LouisdeBruijn.github.io/waterfall-logging).
Developed by Louis de Bruijn, https://louisdebruijn.com.
## Installation
### Install to use
Install Waterfall-logging using PyPi:```commandline
pip install waterfall-logging
```### Install to contribute
```commandline
git clone https://github.com/LouisdeBruijn/waterfall-logging
python -m pip install -e .pre-commit install --hook-type pre-commit --hook-type pre-push
```## Documentation
Documentation can be created via
```commandline
mkdocs serve
```## Usage
Instructions are provided in the [documentation](https://LouisdeBruijn.github.io/waterfall-logging/).
```python
import pandas as pd
from waterfall_logging.log import PandasWaterfallbicycle_rides = pd.DataFrame(data=[
['Shimano', 'race', 28, '2023-02-13', 1],
['Gazelle', 'comfort', 31, '2023-02-15', 1],
['Shimano', 'race', 31, '2023-02-16', 2],
['Batavia', 'comfort', 30, '2023-02-17', 3],
], columns=['brand', 'ride_type', 'wheel_size', 'date', 'bike_id']
)bicycle_rides_log = PandasWaterfall(table_name='rides', columns=['brand', 'ride_type', 'wheel_size'],
distinct_columns=['bike_id'])
bicycle_rides_log.log(table=bicycle_rides, reason='Logging initial column values', configuration_flag='')bicycle_rides = bicycle_rides.loc[lambda row: row['wheel_size'] > 30]
bicycle_rides_log.log(table=bicycle_rides, reason='Remove small wheels',
configuration_flag='small_wheel=False')print(bicycle_rides_log.to_markdown())
'''
| Table | brand | Δ brand | ride_type | Δ ride_type | wheel_size | Δ wheel_size | bike_id | Δ bike_id | Rows | Δ Rows | Reason | Configurations flag |
|:--------|--------:|----------:|------------:|--------------:|-------------:|---------------:|----------:|------------:|-------:|---------:|:------------------------------|:----------------------|
| rides | 4 | 0 | 4 | 0 | 4 | 0 | 3 | 0 | 4 | 0 | Logging initial column values | |
| rides | 2 | -2 | 2 | -2 | 2 | -2 | 2 | -1 | 2 | -2 | Remove small wheels | small_wheel=False |
'''
```