Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/eurobios-mews-labs/dataframe-memory
This tool aims to provide a simple way to save memory when using pandas DataFrames.
- Host: GitHub
- URL: https://github.com/eurobios-mews-labs/dataframe-memory
- Owner: eurobios-mews-labs
- License: apache-2.0
- Created: 2023-11-03T17:35:08.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-21T09:33:33.000Z (12 months ago)
- Last Synced: 2024-09-22T21:21:39.851Z (about 2 months ago)
- Topics: data, data-science, memory-usage, pandas-dataframe, python3
- Language: Python
- Homepage:
- Size: 27.3 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Dataframe Memory Project
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://GitHub.com/eurobios-mews-labs/dataframe-memory/graphs/commit-activity)
[![PyPI version](https://badge.fury.io/py/dataframe-memory.svg)](https://badge.fury.io/py/dataframe-memory)

This tool aims to provide a simple way to save memory when using pandas DataFrames.
It is heavily inspired by this [kaggle post](https://www.kaggle.com/gemartin/load-data-reduce-memory-usage).

> [!IMPORTANT]
> The basic principle: for each column, this tool reduces int and float precision as much as possible, under one of two constraints:
>
> 1. **Approximate method** `method='approx'`: no duplicated values appear and the minimum and maximum can be re-encoded.
> 2. **Exact method** `method='exact'`: absolute information is preserved by testing every value.
>
> For the object data type, the function tries to create a category.

### Installation
```shell
pip install dataframe-memory
```

### Usage
````python
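from data_memory import reduce_memory
import numpy as np
import pandas as pd

# Mixing ints and strings in one NumPy array coerces every value to a string,
# so all three columns start out holding strings.
df = pd.DataFrame(
    np.array(
        [[1, 2, "aaa"],
         [4, 5, "bbb"],
         [7, 8, "ccc"]] * 10000),
    columns=['a', 'b', 'c'])

reduce_memory(df, method="exact", verbose=True)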
````
This yields the following decrease in memory usage:
````text
Memory usage input: 5.04 MB
Memory usage output: 0.09 MB
Decreased by: 98.28 %
````
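Figures of this kind can be checked independently with plain pandas. The snippet below is a minimal sketch, not the library's implementation: it rebuilds the same frame, uses `astype("category")` as a stand-in for the conversion, and measures the footprint with `memory_usage(deep=True)`. The helper name `megabytes` is an assumption made for illustration, and the exact numbers depend on the pandas version.

````python
import numpy as np
import pandas as pd

# Same construction as the usage example: every value ends up as a string
# (object dtype on pandas 2.x; newer versions may infer a string dtype).
raw = pd.DataFrame(
    np.array([[1, 2, "aaa"], [4, 5, "bbb"], [7, 8, "ccc"]] * 10000),
    columns=["a", "b", "c"],
)

# Plain pandas conversion, used here as a stand-in for the library call.
as_category = raw.astype("category")


def megabytes(frame: pd.DataFrame) -> float:
    """Hypothetical helper: total footprint in MB, including string payloads."""
    return frame.memory_usage(deep=True).sum() / 1e6


before, after = megabytes(raw), megabytes(as_category)
print(f"string columns:   {before:.2f} MB")
print(f"category columns: {after:.2f} MB")
print(f"decreased by:     {100 * (1 - after / before):.2f} %")
````

`deep=True` matters here: without it, pandas counts only the 8-byte pointers of an object column and not the Python strings they point to, which understates the input size.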
````python
df.info()
````

````text
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       30000 non-null  category
 1   b       30000 non-null  category
 2   c       30000 non-null  category
````

> [!WARNING]
> With `method='approx'`:
> 1. This tool **destroys** information and **should not be applied automatically** to any dataframe except big ones.
> 2. It preserves relative but not absolute information.
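
The difference between the two methods can be illustrated on a single float column. The sketch below is an assumption about how such checks might look, not the code used by `reduce_memory`: the helper `shrink_float` and the candidate list are hypothetical, and missing values are not handled. The "exact" branch accepts a smaller dtype only if every value survives the round trip; the "approx" branch only requires that the values stay finite and that the number of distinct values does not change.

````python
import numpy as np
import pandas as pd

FLOAT_CANDIDATES = (np.float16, np.float32)  # tried from smallest to largest


def shrink_float(col: pd.Series, method: str = "exact") -> pd.Series:
    """Hypothetical helper: smallest float dtype that passes the chosen check."""
    for dtype in FLOAT_CANDIDATES:
        with np.errstate(over="ignore"):  # overflow to inf is caught by the checks
            small = col.astype(dtype)
        if method == "exact":
            # Every value must be unchanged after the round trip.
            ok = bool((small.astype(np.float64) == col).all())
        else:
            # Min/max still encodable (finite) and no two values collapse.
            ok = bool(np.isfinite(small).all()) and small.nunique() == col.nunique()
        if ok:
            return small
    return col  # nothing smaller is acceptable


halves = pd.Series([0.5, 1.5, 2.5])  # exactly representable in float16
prices = pd.Series([0.1, 0.2, 0.3])  # not exactly representable in float16/32

print(shrink_float(halves, "exact").dtype)   # float16
print(shrink_float(prices, "exact").dtype)   # float64
print(shrink_float(prices, "approx").dtype)  # float16
````

In the last case the ordering and distinctness of the values survive, but the stored numbers are no longer equal to the originals, which is the loss of absolute information the warning describes.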