https://github.com/justinhchae/pd-helper
A helpful package to streamline Pandas DataFrame optimization.
bigdata dataframes developer-tools optimization-tools pandas python3
- Host: GitHub
- URL: https://github.com/justinhchae/pd-helper
- Owner: justinhchae
- License: mit
- Created: 2021-04-07T21:36:35.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2022-01-19T18:39:23.000Z (over 4 years ago)
- Last Synced: 2025-03-18T16:14:45.382Z (about 1 year ago)
- Topics: bigdata, dataframes, developer-tools, optimization-tools, pandas, python3
- Language: Python
- Homepage:
- Size: 95.7 KB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# pd-helper
A helpful package to streamline Pandas DataFrame optimization.
Running the optimizer can reduce DataFrame memory usage by roughly 50-75%.
Automatically assign an appropriate dtype to each column with **helper**.
Generate a DataFrame of controlled random variables for testing with **maker**.
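
To make the idea of dtype optimization concrete, here is a minimal sketch in plain pandas. It is not pd-helper's implementation: the function name `shrink_dtypes` and the `max_unique_ratio` threshold are hypothetical, and the rules (downcast integers, convert low-cardinality text to `category`) are only an assumption about the kind of ruleset such an optimizer might apply.

```python
import numpy as np
import pandas as pd
from pandas.api import types as pdt


def shrink_dtypes(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with smaller dtypes where possible."""
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pdt.is_integer_dtype(series):
            # Pick the smallest integer type that can hold the column's values
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pdt.is_object_dtype(series) or pdt.is_string_dtype(series):
            # Text columns with few distinct values are cheaper as categoricals
            ratio = series.nunique(dropna=True) / max(len(series), 1)
            if ratio <= max_unique_ratio:
                out[col] = series.astype("category")
    return out


demo = pd.DataFrame({
    "sector": np.arange(1000, dtype="int64") % 5,
    "condition": ["Junk", "Excellent"] * 500,
})
small = shrink_dtypes(demo)
print(small.dtypes)
```

Columns that match neither rule are left untouched, so the sketch never changes values, only how they are stored.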
## Install
```bash
pip install pd-helper
```
## Basic Usage to Iterate over DataFrame
```python
from pd_helper.maker import MakeData
from pd_helper.helper import optimize
faker = MakeData()

if __name__ == "__main__":
    # MakeData() generates a fake dataframe, convenient for testing
    df = faker.make_df()
    df = optimize(df)
```
## Better Usage With Multiprocessing
```python
from pd_helper.maker import MakeData
from pd_helper.helper import optimize
faker = MakeData()

if __name__ == "__main__":
    # MakeData() generates a fake dataframe, convenient for testing
    df = faker.make_df()
    df = optimize(df, enable_mp=True)
```
## Specify Special Mappings
```python
from pd_helper.maker import MakeData
from pd_helper.helper import optimize
faker = MakeData()

if __name__ == "__main__":
    # MakeData() generates a fake dataframe, convenient for testing
    df = faker.make_df()
    # Map each target dtype to the columns that should be cast to it
    special_mappings = {
        'string': ['object_id'],
        'category': ['item_name'],
    }
    # Columns listed in special_mappings are cast to the given dtype instead of
    # going through the optimize ruleset, and are included in the returned DataFrame.
    df = optimize(
        df,
        enable_mp=True,
        special_mappings=special_mappings,
    )
```
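
For readers who want to see what a dtype-to-columns mapping amounts to, the sketch below applies the same dictionary shape with plain pandas `astype`. The function `apply_special_mappings` is a hypothetical stand-in, not part of pd-helper, and the way `optimize` handles the mapping internally may differ.

```python
import pandas as pd


def apply_special_mappings(df: pd.DataFrame, special_mappings: dict) -> pd.DataFrame:
    """Hypothetical stand-in: cast each listed column to its mapped dtype."""
    out = df.copy()
    for dtype, columns in special_mappings.items():
        for col in columns:
            # Skip names that are not in the frame rather than raising
            if col in out.columns:
                out[col] = out[col].astype(dtype)
    return out


records = pd.DataFrame({
    "object_id": ["a1", "b2", "c3"],
    "item_name": ["Toaster", "Lighter", "Toaster"],
})
mapped = apply_special_mappings(
    records,
    {"string": ["object_id"], "category": ["item_name"]},
)
print(mapped.dtypes)
```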
## Sample Results with Helper
```bash
Starting with 175.63 MB memory.
After optmization.
Ending with 65.33 MB memory.
```
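
The sample figures above correspond to a reduction of about 63% (175.63 MB down to 65.33 MB). To check the effect on your own data, a before-and-after measurement can be sketched with pandas' `memory_usage(deep=True)`. The helper name `memory_mb` and the choice of 1024² bytes per MB are assumptions; the package may compute its figures differently.

```python
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Hypothetical helper: total DataFrame memory in MB, counting string contents."""
    return df.memory_usage(deep=True).sum() / 1024 ** 2


before = pd.DataFrame({"item_name": ["Toaster", "Lighter"] * 50_000})
# Stand-in for an optimization step: store the repeated strings as a categorical
after = before.astype({"item_name": "category"})

print(f"Starting with {memory_mb(before):.2f} MB memory.")
print(f"Ending with {memory_mb(after):.2f} MB memory.")
```

Passing `deep=True` matters for text columns, because otherwise pandas may count only the object pointers and not the strings they refer to.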
## Generating a Randomly Imperfect DataFrame with Maker
Maker provides a class, `MakeData()`, that generates a table of made-up records.
Each row is an event in which an item was retrieved.
It has options to make the table imperfectly random in various ways.
A sample table is shown below:
| | Retrieved Date | Item Name | Retrieved | Condition | Sector |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| Example | 2019-01-01, 2019-03-4 | Toaster, Lighter | True, False | Junk, Excellent | 1, 2 |
| Data Type | String | String | String | String | Integer |
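
A table with the same columns can be approximated with NumPy and pandas, as in the sketch below. This is not how `MakeData()` works internally: the function `make_fake_records`, its parameters, and the choice to blank out cells in the `Condition` column are all hypothetical, chosen only to illustrate one way of producing imperfect test data.

```python
import numpy as np
import pandas as pd


def make_fake_records(n_rows: int = 1000, missing_rate: float = 0.1, seed: int = 0) -> pd.DataFrame:
    """Hypothetical generator for a retrieval-event table with some missing cells."""
    rng = np.random.default_rng(seed)
    dates = list(pd.date_range("2019-01-01", periods=90, freq="D").strftime("%Y-%m-%d"))
    df = pd.DataFrame({
        "Retrieved Date": rng.choice(dates, size=n_rows),
        "Item Name": rng.choice(["Toaster", "Lighter"], size=n_rows),
        "Retrieved": rng.choice(["True", "False"], size=n_rows),
        "Condition": rng.choice(["Junk", "Excellent"], size=n_rows),
        "Sector": rng.integers(1, 3, size=n_rows),  # upper bound is exclusive: 1 or 2
    })
    # Blank out a random share of one column to imitate imperfect data
    mask = rng.random(n_rows) < missing_rate
    df.loc[mask, "Condition"] = None
    return df


fake = make_fake_records(200)
print(fake.head())
```

Fixing the seed keeps the generated table reproducible across test runs.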
## References
* Pandas Categorical:
* Pandas Pickle:
* Pandas CSV:
* Pandas Datetime:
### TODO
* Improve efficiency of iterating on DataFrame.
* Allow user to toggle logging.
* Provide tools for imputing missing data.