https://github.com/jlehrer1/instanteda

Instantly generate common exploratory data plots without worrying about cleaning your DataFrame.

eda pandas python visualization

Last synced: 5 months ago

Host: GitHub
URL: https://github.com/jlehrer1/instanteda
Owner: jlehrer1
License: mit
Created: 2020-05-17T04:33:21.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2022-01-07T20:15:47.000Z (over 3 years ago)
Last Synced: 2025-02-02T15:49:04.926Z (5 months ago)
Topics: eda, pandas, python, visualization
Language: Python
Homepage:
Size: 147 MB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Instant EDA
Instantly generate common exploratory data plots without having to worry about cleaning your DataFrame.

The package is published on PyPI (the Python Package Index) as [quickplotter](https://pypi.org/project/quickplotter/1.0/).

It can be installed by running:
```shell
pip install quickplotter==1.0
```

To set up the development environment, run:
```
conda env create -f environment.yml
conda update pip
```

To run the test suite, run `pytest`.

## 1. Usage:
```python3
import pandas as pd
import quickplotter

# df is assumed to be an existing pd.DataFrame
plotter = quickplotter.QuickPlotter(df)  # creates a QuickPlotter object with the given DataFrame

plotter.common(subset=['correlation', 'percent_nan'])  # plots correlation between features, and percent NaN in each column

plotter.distribution(column_subset=df.columns[0:4])  # plots distributions for the first four columns in the DataFrame

plotter.common(column_subset=['body_mass_index', 'blood_type'])  # plots common plots for the given columns
```

**Remember, this is meant to be a quick and dirty tool for exploration, not one that treats each data entry delicately.** Therefore, if `NaN` values make up 5% or less of all values in the DataFrame, the rows containing `NaN` are dropped and the plots are generated without them.
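
The rule above can be sketched in plain pandas. This is only an illustration of the described behavior, not the package's actual implementation: the helper name `drop_sparse_nans` and its `threshold` parameter are hypothetical, and it assumes "total values" means every cell in the DataFrame.

```python
import numpy as np
import pandas as pd


def drop_sparse_nans(df: pd.DataFrame, threshold: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper: drop NaN rows only when NaNs are rare.

    Assumes the NaN share is measured over all cells (rows * columns).
    """
    if df.size == 0:
        return df
    # Fraction of cells that are NaN across the whole DataFrame
    nan_fraction = df.isna().sum().sum() / df.size
    if nan_fraction <= threshold:
        # Few NaNs: drop every row that contains at least one
        return df.dropna()
    # Too many NaNs: leave the data untouched
    return df


# 20 rows x 2 columns = 40 cells, one of which is NaN (2.5%)
df = pd.DataFrame({"a": np.arange(20, dtype=float), "b": np.arange(20, dtype=float)})
df.loc[0, "a"] = np.nan
cleaned = drop_sparse_nans(df)
```

Here the single `NaN` falls under the 5% threshold, so the row containing it is dropped; a DataFrame with a larger share of missing values is returned unchanged.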

## 2. subset & diff lists
The quickplotter module works mainly with two kinds of list arguments, `subset` and `diff`.

For any `subset`-like list, the items in the list will be used. For any `diff`-like list, all items *except* those in the list will be used.

The options are as follows:
- `subset`: Use only the plots specified in the list
- `diff`: Use all plots *except* those specified in the list
- `subset_columns`: Use only the columns specified in the list. Columns can be given either as a `df.columns` slice or by name.
- `diff_columns`: Use all columns *except* those specified in the list. Columns can be given either as a `df.columns` slice or by name.
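
The subset/diff semantics can be sketched as a small selection function. This is an illustrative sketch only, not code from the package: `resolve_selection` is a hypothetical name, the column names `age` and `height` are made up for the example, and raising an error when both lists are given is an assumption rather than documented behavior.

```python
from typing import List, Optional, Sequence


def resolve_selection(
    available: Sequence[str],
    subset: Optional[Sequence[str]] = None,
    diff: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper showing subset-like vs. diff-like lists."""
    if subset is not None and diff is not None:
        # Assumption: the two kinds of list are mutually exclusive
        raise ValueError("pass either a subset list or a diff list, not both")
    if subset is not None:
        wanted = set(subset)
        # subset-like: keep only the listed items, in their original order
        return [item for item in available if item in wanted]
    if diff is not None:
        excluded = set(diff)
        # diff-like: keep everything except the listed items
        return [item for item in available if item not in excluded]
    # Neither given: use everything
    return list(available)


# 'age' and 'height' are invented column names for illustration
columns = ["body_mass_index", "blood_type", "age", "height"]
only_two = resolve_selection(columns, subset=["body_mass_index", "blood_type"])
all_but_age = resolve_selection(columns, diff=["age"])
```

The same logic applies whether the items are plot names (`subset`, `diff`) or column names (`subset_columns`, `diff_columns`).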

## 3. Contributing

If you have read this far, I hope you've found this tool useful. I am always looking to learn more and develop as a programmer, so if you have any ideas or contributions, feel free to open a feature request or a pull request.
