https://github.com/eurobios-mews-labs/dataframe-memory

This tool aims to provide a simple way to save memory when using pandas DataFrames.

data data-science memory-usage pandas-dataframe python3


Host: GitHub
URL: https://github.com/eurobios-mews-labs/dataframe-memory
Owner: eurobios-mews-labs
License: apache-2.0
Created: 2023-11-03T17:35:08.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2023-11-21T09:33:33.000Z (about 1 year ago)
Last Synced: 2024-12-08T15:03:57.721Z (about 1 month ago)
Topics: data, data-science, memory-usage, pandas-dataframe, python3
Language: Python
Homepage:
Size: 27.3 KB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# Dataframe Memory Project

[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://GitHub.com/eurobios-mews-labs/dataframe-memory/graphs/commit-activity)

[![PyPI version](https://badge.fury.io/py/dataframe-memory.svg)](https://badge.fury.io/py/dataframe-memory)

This tool aims to provide a simple way to save memory when using pandas DataFrames.

It is heavily inspired by this [Kaggle post](https://www.kaggle.com/gemartin/load-data-reduce-memory-usage).

> [!IMPORTANT]
> The basic principle: for each column, this tool reduces int and float precision as much as possible, subject to one of two constraints:
>
> 1. **Approximate method** `method='approx'`: no duplicated values appear and the minimum and maximum can still be re-encoded.
> 2. **Exact method** `method='exact'`: absolute information is preserved by testing every value.
>
> For columns of object data type, the function tries to convert them to category.
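
The exact principle can be illustrated with plain pandas and NumPy. The sketch below is not this package's implementation; `downcast_exact` is a hypothetical helper written only for illustration, and it assumes that "testing every value" means checking that a round trip through the smaller dtype returns the original values.

```python
import numpy as np
import pandas as pd


def downcast_exact(series: pd.Series) -> pd.Series:
    """Hypothetical helper: shrink one column without changing any value."""
    if pd.api.types.is_integer_dtype(series):
        # pandas picks the smallest signed integer dtype that holds min and max.
        return pd.to_numeric(series, downcast="integer")
    if pd.api.types.is_float_dtype(series):
        smaller = series.astype(np.float32)
        # Keep float32 only if every value survives the round trip.
        # NaN compares unequal, so columns with missing values stay as they are.
        if (smaller.astype(np.float64) == series).all():
            return smaller
        return series
    if pd.api.types.is_object_dtype(series):
        # Repeated objects (typically strings) are stored once as categories.
        return series.astype("category")
    return series


df = pd.DataFrame({
    "small_int": pd.Series([1, 2, 3], dtype=np.int64),
    "exact_float": pd.Series([0.5, 1.25, 2.0], dtype=np.float64),
    "inexact_float": pd.Series([0.1, 0.2, 0.3], dtype=np.float64),
    "label": pd.Series(["aaa", "bbb", "ccc"], dtype=object),
})

# Build the result column by column so each column keeps its own dtype.
reduced = pd.DataFrame({name: downcast_exact(df[name]) for name in df.columns})
print(reduced.dtypes)
```

In this sketch `inexact_float` stays at 64 bits because 0.1, 0.2 and 0.3 are not exactly representable in float32, while 0.5, 1.25 and 2.0 are.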

### Installation

```shell
pip install dataframe-memory
```

### Usage

````python
from data_memory import reduce_memory
import numpy as np
import pandas as pd

# np.array coerces the mixed int/str rows to strings,
# so all three columns hold strings rather than numbers.
df = pd.DataFrame(
    np.array(
        [[1, 2, "aaa"],
         [4, 5, "bbb"],
         [7, 8, "ccc"]] * 10000),
    columns=['a', 'b', 'c'])

reduce_memory(df, method="exact", verbose=True)
````

This yields the following decrease in memory usage:

````text
Memory usage input: 5.04 MB
Memory usage output: 0.09 MB
Decreased by: 98.28 %
````

````python
df.info()
````

````text
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   a       30000 non-null  category
 1   b       30000 non-null  category
 2   c       30000 non-null  category
````
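
To check a reduction like this independently of the tool, memory can be measured with pandas' own `DataFrame.memory_usage(deep=True)`. The sketch below rebuilds the same frame and converts it to category by hand; `memory_mb` is a hypothetical helper, and the exact figures depend on the pandas version, so none are quoted here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(
        [[1, 2, "aaa"],
         [4, 5, "bbb"],
         [7, 8, "ccc"]] * 10000),
    columns=["a", "b", "c"])


def memory_mb(frame: pd.DataFrame) -> float:
    """Hypothetical helper: total memory in MB, counting string contents."""
    return frame.memory_usage(deep=True).sum() / 1024 ** 2


before = memory_mb(df)
# Manual conversion, standing in for what the tool reports for this frame.
as_category = df.astype("category")
after = memory_mb(as_category)

print(f"before: {before:.2f} MB, after: {after:.2f} MB")
print(f"decreased by: {100 * (1 - after / before):.2f} %")
```

Passing `deep=True` matters here: without it, pandas may count only the pointers of string columns and not the strings themselves.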

> [!WARNING]
> With `method='approx'`:
>
> 1. This tool **destroys** information and **should not be applied automatically** to any dataframe other than large ones.
> 2. It preserves relative but not absolute information.
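
The difference between relative and absolute information can be seen with NumPy alone. This is only an illustration of the kind of loss to expect, not the tool's code, and the choice of `float16` as the target type is an assumption made for the example.

```python
import numpy as np

values = np.array([0.1, 0.2, 0.3], dtype=np.float64)

# Lossy downcast: each value is rounded to the nearest float16.
approx = values.astype(np.float16)
back = approx.astype(np.float64)

# Absolute information is lost: the stored numbers are no longer the originals.
absolute_changed = not np.array_equal(back, values)

# Relative information is kept: same ordering, and no two values collapse.
order_kept = bool(np.all(np.diff(back) > 0))
no_new_duplicates = len(np.unique(approx)) == len(np.unique(values))

print(back)
print(absolute_changed, order_kept, no_new_duplicates)
```

Any downstream code that compares against exact constants, or joins on such a column, would see different values after this kind of conversion.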