https://github.com/mews-labs/dataframe-memory
This tool aims to provide a simple way to save memory when using pandas DataFrames.
- Host: GitHub
- URL: https://github.com/mews-labs/dataframe-memory
- Owner: mews-labs
- License: apache-2.0
- Created: 2023-11-03T17:35:08.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-21T09:33:33.000Z (about 2 years ago)
- Last Synced: 2025-09-27T15:14:02.159Z (5 months ago)
- Topics: data, data-science, memory-usage, pandas-dataframe, python3
- Language: Python
- Size: 27.3 KB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Dataframe Memory Project
This tool aims to provide a simple way to save memory when using pandas DataFrames.
It is heavily inspired by this [kaggle post](https://www.kaggle.com/gemartin/load-data-reduce-memory-usage).
> [!IMPORTANT]
> The basic principle: for each column, this tool reduces integer and float precision as much as possible, under a condition that depends on the method:
>
> 1. **Approximate method** (`method='approx'`): the column's minimum and maximum can still be encoded, and no duplicated values appear after the conversion.
> 2. **Exact method** (`method='exact'`): every value is tested, so absolute information is preserved.
>
> For columns of `object` dtype, the function tries to convert them to `category`.
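
The principle above can be sketched in a few lines of plain NumPy/pandas. This is a minimal illustration under our own assumptions, not the package's implementation: the helper name `downcast_series`, the candidate dtype lists, and the exact form of the "no duplicated values" test (comparing `nunique()` before and after) are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical candidate dtypes, smallest first (assumption, not the package's list).
INT_CANDIDATES = [np.int8, np.int16, np.int32, np.int64]
FLOAT_CANDIDATES = [np.float16, np.float32, np.float64]


def downcast_series(s: pd.Series, method: str = "exact") -> pd.Series:
    """Hypothetical sketch: return `s` in the smallest dtype allowed by `method`."""
    if pd.api.types.is_integer_dtype(s):
        lo, hi = s.min(), s.max()
        for dtype in INT_CANDIDATES:
            info = np.iinfo(dtype)
            # In-range integers convert exactly, so both methods agree here.
            if info.min <= lo and hi <= info.max:
                return s.astype(dtype)
        return s
    if pd.api.types.is_float_dtype(s):
        lo, hi = s.min(), s.max()
        for dtype in FLOAT_CANDIDATES:
            info = np.finfo(dtype)
            if not (info.min <= lo and hi <= info.max):
                continue  # min/max cannot be encoded in this dtype
            candidate = s.astype(dtype)
            if method == "approx":
                # Approximate: accept if no two distinct values collapse together.
                ok = candidate.nunique() == s.nunique()
            else:
                # Exact: every value must survive the round trip unchanged.
                ok = candidate.astype(s.dtype).equals(s)
            if ok:
                return candidate
        return s
    if pd.api.types.is_object_dtype(s):
        # Object columns: try a categorical encoding.
        return s.astype("category")
    return s
```

Integers only need a range check, because any in-range integer converts exactly; floats are where the two methods diverge. For example, `pd.Series([3.141592653589793, 2.718281828459045])` stays `float64` with `method="exact"` but becomes `float16` with `method="approx"`, since the two values remain distinct after rounding.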
### Installation
```shell
pip install dataframe-memory
```
### Usage
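````python
from data_memory import reduce_memory
import numpy as np
import pandas as pd

# Example frame: 3 columns, 30,000 rows of repeated values
df = pd.DataFrame(
    np.array(
        [[1, 2, "aaa"],
         [4, 5, "bbb"],
         [7, 8, "ccc"]] * 10000),
    columns=['a', 'b', 'c'])

# Reduce memory column by column with the exact method; verbose=True prints a report
reduce_memory(df, method="exact", verbose=True)
````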
This yields the following decrease in memory usage:
````text
Memory usage input: 5.04 MB
Memory usage output: 0.09 MB
Decreased by: 98.28 %
````
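To reproduce a before/after comparison like this yourself, pandas' `DataFrame.memory_usage(deep=True)` reports per-column bytes, including the Python string objects held in `object` columns. The sketch below does not call the package: as a hypothetical stand-in for `reduce_memory`, it simply converts every column of the example frame to `category`, and it assumes 1 MB = 1024² bytes. Its figures will therefore not necessarily match the report above.

```python
import numpy as np
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Total memory in MB (1024**2 bytes), counting object contents via deep=True."""
    return df.memory_usage(deep=True).sum() / 1024**2


df = pd.DataFrame(
    np.array([[1, 2, "aaa"], [4, 5, "bbb"], [7, 8, "ccc"]] * 10000),
    columns=["a", "b", "c"],
)
before = memory_mb(df)

# Hypothetical stand-in for reduce_memory: categorical encoding of every column.
reduced = df.astype("category")
after = memory_mb(reduced)

print(f"Memory usage input: {before:.2f} MB")
print(f"Memory usage output: {after:.2f} MB")
print(f"Decreased by: {100 * (before - after) / before:.2f} %")
```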
````python
df.info()
````
````text
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       30000 non-null  category
 1   b       30000 non-null  category
 2   c       30000 non-null  category
````
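All three columns end up as `category`, including the numeric-looking `a` and `b`. A likely reason, which is standard NumPy behaviour rather than anything specific to this package: `np.array` stores a list that mixes integers and strings in a single string dtype, so every column of the example frame holds strings, not integers. The check below shows this and builds a variant whose `a` and `b` columns really are integers, which is the case where integer downcasting, rather than categorical encoding, would apply.

```python
import numpy as np
import pandas as pd

rows = [[1, 2, "aaa"], [4, 5, "bbb"], [7, 8, "ccc"]] * 10000

# np.array coerces the mixed rows to one fixed-width unicode dtype.
arr = np.array(rows)
print(arr.dtype.kind)  # 'U': every cell, including 1 and 2, is now a string

df_strings = pd.DataFrame(arr, columns=["a", "b", "c"])
print(df_strings["a"].iloc[0] == "1")  # True: column a holds the string "1"

# Building the frame from the list directly keeps a and b as integers.
df_numeric = pd.DataFrame(rows, columns=["a", "b", "c"])
print(df_numeric.dtypes)
```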
> [!WARNING]
> With `method='approx'`:
> 1. This tool **destroys** information and **should not be applied automatically** to every DataFrame, only to large ones.
> 2. It preserves relative but not absolute information.
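
To make the warning concrete: a downcast can keep values distinct and in range, which is what the approximate criterion checks, while still changing them. The sketch below uses plain NumPy/pandas rather than the package, and reads "relative but not absolute" as "distinct values stay distinct, but their exact values are lost". It casts a `float64` column to `float16`, whose 11-bit significand cannot represent every value exactly (for instance, 2049 rounds to 2048).

```python
import numpy as np
import pandas as pd

s = pd.Series([0.1, 1000.3, 2049.0])
s16 = s.astype(np.float16)

# Values stay distinct and within range, so an approx-style check would accept float16...
print(s16.nunique() == s.nunique())  # True

# ...but every absolute value has changed.
restored = s16.astype(np.float64)
print(restored.tolist())
print((restored - s).abs().max())
```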