https://github.com/jlehrer1/instanteda
Instantly generate common exploratory data plots without worrying about cleaning your DataFrame.
eda pandas python visualization
- Host: GitHub
- URL: https://github.com/jlehrer1/instanteda
- Owner: jlehrer1
- License: mit
- Created: 2020-05-17T04:33:21.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-01-07T20:15:47.000Z (almost 3 years ago)
- Last Synced: 2024-12-09T02:49:11.236Z (27 days ago)
- Topics: eda, pandas, python, visualization
- Language: Python
- Homepage:
- Size: 147 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Instant EDA
Instantly generate common exploratory data plots without having to worry about cleaning your DataFrame.

The code is hosted on PyPI, the Python Package Index, [here](https://pypi.org/project/quickplotter/1.0/). It can be installed by running
```shell
pip install quickplotter==1.0
```

To set up the proper development environment, run
```
conda env create -f environment.yml
conda update pip
```

To run the test suite, run `pytest`.
## 1. Usage:
```python3
import quickplotter

# df is assumed to be an existing pd.DataFrame holding your data
plotter = quickplotter.QuickPlotter(df)  # creates a QuickPlotter object with the given DataFrame

plotter.common(subset=['correlation', 'percent_nan'])  # plots correlation between features, and percent NaN in each column
plotter.distribution(column_subset=df.columns[0:4])  # plots distributions for the first four columns in the DataFrame
plotter.common(column_subset=['body_mass_index', 'blood_type'])  # plots common plots for the given columns
```

**Remember, this is meant to be a quick and dirty tool for exploration, not one for being delicate with each data entry.** Therefore, if `NaN` values make up `<= 5%` of the total values in the DataFrame, the rows containing them will be dropped and the plots will be generated without them.
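To make the 5% rule concrete, here is a minimal sketch of how such a check could be written in plain pandas. The helper name `drop_sparse_nans` and its `threshold` parameter are hypothetical and are not part of quickplotter's API; the package's own implementation may differ.

```python
import numpy as np
import pandas as pd


def drop_sparse_nans(df: pd.DataFrame, threshold: float = 0.05) -> pd.DataFrame:
    """Drop rows containing NaN when NaNs are at most `threshold` of all values.

    Hypothetical helper for illustration only; not quickplotter's actual code.
    """
    if df.size == 0:
        return df
    # Fraction of all cells (rows x columns) that are NaN
    nan_fraction = df.isna().sum().sum() / df.size
    if nan_fraction <= threshold:
        return df.dropna()
    # Too many NaNs to drop silently: leave the data untouched
    return df


# 40 cells with a single NaN (2.5%), so the affected row is dropped
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0] + [5.0] * 16,
    "b": list(range(20)),
})
cleaned = drop_sparse_nans(df)

# 4 cells with two NaNs (50%), so nothing is dropped
sparse = pd.DataFrame({"a": [np.nan, 1.0], "b": [np.nan, 2.0]})
untouched = drop_sparse_nans(sparse)
```

In this sketch a DataFrame above the threshold is returned unchanged; what quickplotter does in that case is not stated here.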
## 2. subset & diff lists
The quickplotter module works mainly with two kinds of list arguments, `subset` and `diff`. For any `subset`-like list, only the items in the list will be used. For any `diff`-like list, all items *except* those in the list will be used.
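The difference between the two kinds of list can be sketched in a few lines of plain Python. The function `resolve_selection` is a hypothetical illustration, not quickplotter's own code, and raising an error when both lists are passed is an assumption. The plot name `'distribution'` is used here only as a placeholder.

```python
from typing import Iterable, List, Optional


def resolve_selection(
    available: Iterable[str],
    subset: Optional[Iterable[str]] = None,
    diff: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the items to use, keeping the order of `available`.

    Hypothetical helper for illustration only; not quickplotter's actual code.
    """
    available = list(available)
    if subset is not None and diff is not None:
        # Assumption: the two kinds of list are mutually exclusive
        raise ValueError("pass either subset or diff, not both")
    if subset is not None:
        wanted = set(subset)
        return [item for item in available if item in wanted]
    if diff is not None:
        excluded = set(diff)
        return [item for item in available if item not in excluded]
    # Neither list given: use everything
    return available


plots = ["correlation", "percent_nan", "distribution"]
only_these = resolve_selection(plots, subset=["correlation", "percent_nan"])
all_but = resolve_selection(plots, diff=["percent_nan"])
print(only_these)  # ['correlation', 'percent_nan']
print(all_but)     # ['correlation', 'distribution']
```

The same logic applies to column names, for example by passing `list(df.columns)` as `available`.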
The options are as follows:
- `subset`: Use only the plots specified in the list
- `diff`: Use all plots *except* those specified in the list
- `subset_columns`: Use only the columns specified in the list. Can be given either as a `df.columns` slice or by name.
- `diff_columns`: Use all columns *except* those specified in the list. Can be given either as a `df.columns` slice or by name.

## 3. Contributing
If you have read this far, I hope you've found this tool useful. I am always looking to learn more and develop as a programmer, so if you have any ideas or contributions, feel free to open a feature request or pull request.