Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/faizantkhan/automated-eda

This repository showcases tools for automatic Exploratory Data Analysis (EDA) in Python. These tools help you quickly understand your datasets and generate insightful reports.
https://github.com/faizantkhan/automated-eda

automatic automation autoviz data-analysis data-analysis-python data-science data-visualization dtale dtale-library eda exploratory-data-analysis ml pandas pandas-profiling python python-library sweetviz

Last synced: about 10 hours ago
JSON representation

This repository showcases tools for automatic Exploratory Data Analysis (EDA) in Python. These tools help you quickly understand your datasets and generate insightful reports.

Awesome Lists containing this project

README

        

---

# Automatic EDA Tools

Welcome to the **Automatic EDA Tools** repository! This repository contains examples and documentation for various tools that automate Exploratory Data Analysis (EDA) in Python. These tools help data scientists and analysts quickly understand their datasets and generate insightful reports with minimal effort.

## Table of Contents

- [Introduction](#introduction)
- [Tools](#tools)
- [Pandas Profiling](#pandas-profiling)
- [AutoViz](#autoviz)
- [Sweetviz](#sweetviz)
- [D-Tale](#d-tale)
- [Installation](#installation)
- [Usage](#usage)
- [Contributing](#contributing)
- [License](#license)

## Introduction

Exploratory Data Analysis (EDA) is a crucial step in the data analysis process. It involves summarizing the main characteristics of a dataset, often using visual methods. Automating EDA can save time and provide a comprehensive overview of the data. This repository showcases four popular Python tools for automatic EDA: Pandas Profiling, AutoViz, Sweetviz, and D-Tale.

## Tools

### Pandas Profiling

**Pandas Profiling** generates profile reports from a pandas DataFrame. The report includes a variety of statistics and visualizations, such as missing values, correlations, and distributions.

- **Features**:
- Descriptive statistics for each column
- Visualizations for distributions and correlations
- Interactive HTML reports

- **Example**:
```python
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.read_csv('your_dataset.csv')
profile = ProfileReport(df, title="Pandas Profiling Report")
profile.to_file("output.html")
```

### AutoViz

**AutoViz** automatically visualizes any dataset with a single line of code. It can handle both small and large datasets and provides a variety of plots to understand the data better.

- **Features**:
- Automatic visualization of data
- Handles large datasets efficiently
- Variety of plots including scatter, bar, and box plots

- **Example**:
```python
from autoviz.AutoViz_Class import AutoViz_Class

AV = AutoViz_Class()
df = AV.AutoViz('your_dataset.csv')
```

### Sweetviz

**Sweetviz** creates beautiful, high-density visualizations to help you understand your data quickly. It provides a detailed analysis of each feature and compares datasets.

- **Features**:
- High-density visualizations
- Detailed feature analysis
- Comparison of datasets

- **Example**:
```python
import pandas as pd
import sweetviz as sv

df = pd.read_csv('your_dataset.csv')
report = sv.analyze(df)
report.show_html('sweetviz_report.html')
```

### D-Tale

**D-Tale** is a powerful tool that combines the capabilities of a pandas DataFrame with an interactive web-based interface. It allows you to explore and analyze your data in a user-friendly environment.

- **Features**:
- Interactive web-based interface
- Real-time data manipulation and visualization
- Integration with pandas DataFrames

- **Example**:
```python
import dtale
import pandas as pd

df = pd.read_csv('your_dataset.csv')
d = dtale.show(df)
d.open_browser()
```

## Installation

To install these tools, you can use pip:

```bash
pip install pandas-profiling autoviz sweetviz dtale
```

## Usage

Refer to the examples provided above for each tool to get started with automatic EDA. You can also check the official documentation for more detailed usage instructions.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request or open an Issue if you have any suggestions or improvements.

## License

This repository is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.

---

Source:
(1) 4 Ways to Automate Exploratory Data Analysis (EDA) in Python. https://builtin.com/data-science/EDA-python.
(2) Streamlining Data Exploration: A Comparison of Profiling Tools for .... https://dataroots.io/blog/streamlining-data-exploration-a-comparison-of-pandas-profiler-sweet-viz-and-pandas-gui-for-effective-eda.
(3) Tools to Automate EDA - GeeksforGeeks. https://www.geeksforgeeks.org/tools-to-automate-eda/.
(4) Sweetviz: Automate Exploratory Data Analysis (EDA) - CoderzColumn. https://coderzcolumn.com/tutorials/data-science/sweetviz-automate-exploratory-data-analysis-eda.
(5) Automated EDA with Python - Open Source Automation. https://theautomatic.net/2021/07/02/automated-eda-with-python/.