https://github.com/lmuffato/jiboia
Jiboia is a Python package for automatically normalizing and optimizing DataFrames efficiently.
- Host: GitHub
- URL: https://github.com/lmuffato/jiboia
- Owner: lmuffato
- License: mit
- Created: 2025-09-04T18:17:48.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-09-05T15:57:46.000Z (10 months ago)
- Last Synced: 2025-09-06T11:54:13.299Z (10 months ago)
- Topics: data-analysis, data-science, dataframe, normalization, pandas, python
- Language: Python
- Homepage: https://pypi.org/project/jiboia-gpu/
- Size: 11.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Jiboia
**Jiboia** is a Python package for **automatically normalizing and optimizing pandas DataFrames** efficiently.
Key features:
- **String normalization**:
  - Removes extra spaces.
  - Strips leading and trailing spaces.
  - Detects data pollution (e.g., columns that should be numeric but contain strings).
- **Type conversion**:
  - Converts numeric strings and floats ending in `.0` to integers (`int8`, `int16`, `int32`, …).
  - Converts floats and integers to the most memory-efficient type.
  - Converts date strings in various formats (`yyyy?mm?dd`, `dd?mm?yyyy`, `yyyymmdd`, `dd?mm?yy`, where `?` marks a separator character) to `datetime`.
  - Converts time strings (`hhmm UTC`, `hh:mm:ss`, `hh:mm:ss.s`) to `timedelta`.
- **Null standardization**: converts different null representations to `pd.NA`.
- **Automatic CSV detection**:
  - Detects the delimiter.
  - Detects the encoding.
- **Memory optimization**:
  - Reports memory usage for DataFrames.
  - Converts columns to the most compact types possible.
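A minimal sketch of the string-normalization steps listed above, in plain Python. The function names and the numeric-share `threshold` used to flag pollution are hypothetical illustrations, not Jiboia's API or its actual rule.

```python
import re

_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_string(value: str) -> str:
    """Collapse every run of whitespace to one space and trim both ends."""
    return _WHITESPACE_RUN.sub(" ", value).strip()


def _is_number(text: str) -> bool:
    try:
        float(text)
    except ValueError:
        return False
    return True


def polluted_values(values: list[str], threshold: float = 0.9) -> list[str]:
    """Return the non-numeric entries of a mostly numeric column.

    Hypothetical heuristic: if at least `threshold` of the non-blank values
    parse as numbers, the column is treated as numeric and the remaining
    values are reported as pollution. Otherwise nothing is reported.
    """
    cleaned = [normalize_string(v) for v in values if v.strip()]
    if not cleaned:
        return []
    bad = [v for v in cleaned if not _is_number(v)]
    if (len(cleaned) - len(bad)) / len(cleaned) >= threshold:
        return bad
    return []
```

For example, `polluted_values(["10", " 12 ", "1O", "15"], threshold=0.7)` flags `"1O"` (a letter O typed instead of a zero), because three of the four values are numeric.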
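The integer-conversion rule can be sketched the same way: accept integer strings and whole-valued floats, then pick the smallest signed type whose range covers the whole column. Only signed NumPy-style type names are considered here; that restriction and the helper names are assumptions for illustration.

```python
from typing import Iterable, Optional

# Signed integer types from smallest to largest, with their inclusive ranges
# (the same bounds as NumPy's int8 ... int64).
_INT_TYPES = [
    ("int8", -(2**7), 2**7 - 1),
    ("int16", -(2**15), 2**15 - 1),
    ("int32", -(2**31), 2**31 - 1),
    ("int64", -(2**63), 2**63 - 1),
]


def as_whole_number(value) -> Optional[int]:
    """Return an int for inputs like "42", "42.0" or 42.0; otherwise None."""
    if isinstance(value, str):
        try:
            return int(value)  # exact, even for integers too large for a float
        except ValueError:
            pass  # maybe "42.0"; fall through to the float check
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    # is_integer() is False for fractional values, inf and nan.
    return int(number) if number.is_integer() else None


def smallest_int_type(values: Iterable) -> Optional[str]:
    """Name the smallest signed integer type that can hold every value."""
    numbers = [as_whole_number(v) for v in values]
    if not numbers or any(n is None for n in numbers):
        return None  # empty, or at least one value is not a whole number
    low, high = min(numbers), max(numbers)
    for name, lower, upper in _INT_TYPES:
        if lower <= low and high <= upper:
            return name
    return None  # outside the int64 range
```

Parsing strings with `int()` first keeps large integers exact; going straight through `float()` would silently round values above 2**53.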
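Date and time parsing for the listed formats might look like the following standard-library sketch. It assumes `?` means any single non-digit separator and that two-digit years follow Python's `%y` rule (00 to 68 map to the 2000s). The pattern tables and function names are hypothetical, not Jiboia's implementation.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# "?" is assumed to mean any single non-digit separator ("-", "/", "." ...).
# Order matters: a string such as "24.03.15" is read as dd?mm?yy.
_DATE_PATTERNS = [
    (re.compile(r"(\d{4})\D(\d{2})\D(\d{2})"), "%Y-%m-%d"),  # yyyy?mm?dd
    (re.compile(r"(\d{2})\D(\d{2})\D(\d{4})"), "%d-%m-%Y"),  # dd?mm?yyyy
    (re.compile(r"(\d{4})(\d{2})(\d{2})"), "%Y-%m-%d"),  # yyyymmdd
    (re.compile(r"(\d{2})\D(\d{2})\D(\d{2})"), "%d-%m-%y"),  # dd?mm?yy
]

_TIME_PATTERNS = [
    re.compile(r"(?P<h>\d{2})(?P<m>\d{2}) UTC"),  # hhmm UTC
    re.compile(r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}(?:\.\d+)?)"),  # hh:mm:ss[.s]
]


def parse_date(text: str) -> Optional[datetime]:
    text = text.strip()
    for pattern, fmt in _DATE_PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            try:
                # Rejoin the captured parts with "-" so one strptime format fits.
                return datetime.strptime("-".join(match.groups()), fmt)
            except ValueError:  # e.g. month 13 or 30 February
                return None
    return None


def parse_time(text: str) -> Optional[timedelta]:
    text = text.strip()
    for pattern in _TIME_PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            parts = match.groupdict()
            return timedelta(
                hours=int(parts["h"]),
                minutes=int(parts["m"]),
                seconds=float(parts.get("s") or 0),  # "hhmm UTC" has no seconds
            )
    return None
```

A column-level converter would apply these parsers to every value and only switch the column type when (almost) all non-null values parse.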
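Null standardization could be approximated in pandas as below. The `NULL_TOKENS` set is a hypothetical example of placeholder spellings, since the README does not list the representations Jiboia recognizes. The sketch assumes the affected columns hold text and stores them with pandas' nullable `"string"` dtype, whose missing value is `pd.NA`.

```python
import math

import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype

# Hypothetical set of placeholder spellings treated as missing (case-insensitive).
NULL_TOKENS = frozenset({"", "na", "n/a", "nan", "null", "none", "-"})


def _to_na(value):
    """Map None, float NaN and placeholder strings to pd.NA; keep anything else."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return pd.NA
    if isinstance(value, str) and value.strip().lower() in NULL_TOKENS:
        return pd.NA
    return value


def standardize_nulls(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose text columns use pd.NA for every missing marker."""
    out = df.copy()
    for col in out.columns:
        if is_object_dtype(out[col].dtype) or is_string_dtype(out[col].dtype):
            # Assumes these columns hold text: non-string values become strings
            # and missing entries are stored as pd.NA by the "string" dtype.
            out[col] = out[col].map(_to_na).astype("string")
    return out
```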
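Delimiter and encoding detection can be approximated with the standard library: `csv.Sniffer` guesses the delimiter from a text sample, and trial decoding picks the first candidate encoding that succeeds. The candidate lists and the comma fallback are assumptions for illustration, not Jiboia's documented behavior.

```python
import codecs
import csv

# Hypothetical candidates, tried in order. latin-1 maps every byte to a
# character, so it is the catch-all and must come last.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def detect_encoding(raw: bytes) -> str:
    """Return the first candidate encoding that decodes `raw` without errors."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # UTF-8 with a byte-order mark
    for encoding in CANDIDATE_ENCODINGS:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return "latin-1"  # unreachable while latin-1 is a candidate


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV text sample, falling back to a comma."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:  # the Sniffer could not decide
        return ","
```

The two results can then be passed to `pd.read_csv(path, sep=..., encoding=...)`. When only the first few kilobytes of a file are sampled, the cut can split a multi-byte UTF-8 character, so a production version would need to tolerate an incomplete final character.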
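For memory optimization, pandas already provides the building blocks: `DataFrame.memory_usage(deep=True)` for measurement and `pd.to_numeric(..., downcast="integer")` for shrinking integer columns. This sketch combines them under hypothetical helper names; the README does not say whether Jiboia relies on these calls internally.

```python
import numpy as np
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Total memory in MiB; deep=True also counts Python objects in object columns."""
    return df.memory_usage(deep=True).sum() / 1024**2


def downcast_integers(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose integer columns use the smallest signed type that fits."""
    out = df.copy()
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    return out
```

Comparing `memory_mb(df)` before and after the conversion shows how much the compact types save on a given dataset.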
---
## Example Usage
```python
import pandas as pd
import jiboia as jb

# Load the raw data, then normalize the DataFrame with Jiboia.
df = pd.read_csv("data.csv")
df = jb.normalize_df(df)
```