https://github.com/amitbisht99/ydata-profiling

This repository showcases my learning process of automating EDA using 'ydata-profiling'

data-analytics data-profiling eda pandas python3 ydata-profiling

# Automating EDA with `ydata-profiling`

This repository demonstrates how to automate **Exploratory Data Analysis (EDA)** using the `ydata-profiling` library (formerly known as `pandas-profiling`). The library generates a comprehensive EDA report from a pandas DataFrame in a few lines of code, which saves time and helps keep the analysis thorough.

## 🚀 Features of `ydata-profiling`
The tool provides the following capabilities:
- **Type Inference**: Automatically detects data types (Categorical, Numerical, Date, etc.).
- **Warnings**: Identifies data challenges like missing values, inaccuracies, skewness, and more.
- **Univariate Analysis**: Generates descriptive statistics (mean, median, mode, etc.) and visualizations like histograms.
- **Multivariate Analysis**: Includes correlation analysis, missing data summaries, duplicate rows detection, and pairwise variable interactions.
- **Time-Series Analysis**: Provides insights such as auto-correlation, seasonality, and ACF/PACF plots.
- **Text Analysis**: Detects the most common character categories (uppercase, lowercase, separator), scripts (e.g., Latin), and Unicode blocks (e.g., ASCII).
- **File & Image Analysis**: Reviews file sizes, creation dates, dimensions, and EXIF metadata.
- **Dataset Comparison**: Quickly compares datasets in one line of code.
- **Flexible Output Formats**: Reports can be exported as:
  - **HTML**: Easily shareable interactive reports
  - **JSON**: Suitable for automation systems
  - **Jupyter Notebook Widgets**
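
A few of the features above can be sketched in code. The example below is a minimal illustration, not the code used in this repository: the helper names (`report_paths`, `export_report`, `compare_datasets`), the file names, and the `output` directory default are assumptions made for the example. It relies on `ProfileReport`, `to_file()` (which picks HTML or JSON from the file extension), and `compare()` from `ydata-profiling`.

```python
from pathlib import Path


def report_paths(output_dir, stem):
    """Return the HTML and JSON destinations for one report (hypothetical helper)."""
    out = Path(output_dir)
    return {"html": out / f"{stem}.html", "json": out / f"{stem}.json"}


def export_report(df, output_dir="output", stem="report", title="EDA Report"):
    """Profile a pandas DataFrame and write the report as HTML and JSON."""
    # Deferred import: ydata-profiling is only needed when a report is built.
    from ydata_profiling import ProfileReport

    paths = report_paths(output_dir, stem)
    paths["html"].parent.mkdir(parents=True, exist_ok=True)

    profile = ProfileReport(df, title=title)
    # to_file() chooses the output format from the file extension.
    profile.to_file(paths["html"])
    profile.to_file(paths["json"])
    return profile


def compare_datasets(df_a, df_b, output_file="output/comparison.html"):
    """Build a side-by-side comparison report for two DataFrames."""
    from ydata_profiling import ProfileReport

    report_a = ProfileReport(df_a, title="Dataset A")
    report_b = ProfileReport(df_b, title="Dataset B")
    # compare() returns a new report; the frames should share their columns.
    comparison = report_a.compare(report_b)
    Path(output_file).parent.mkdir(parents=True, exist_ok=True)
    comparison.to_file(output_file)
    return comparison
```

Inside a Jupyter notebook, calling `to_widgets()` on the returned profile should render the widget view instead of writing a file.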

## 📂 Project Structure
- **`data/`:** Contains sample datasets used for demonstration.
- **`notebooks/`:** Jupyter Notebooks showcasing how to use `ydata-profiling`.
- **`output/`:** Stores generated EDA reports.
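
With this layout, profiling every dataset in `data/` and saving the results to `output/` could look roughly like the sketch below. It assumes the sample datasets are CSV files and uses a hypothetical `<name>_report.html` naming scheme; neither detail is confirmed by this repository, and the function names are illustrative.

```python
from pathlib import Path


def plan_reports(data_dir="data", output_dir="output"):
    """Pair every CSV in data_dir with an HTML report path in output_dir."""
    out = Path(output_dir)
    return [
        (csv_path, out / f"{csv_path.stem}_report.html")
        for csv_path in sorted(Path(data_dir).glob("*.csv"))
    ]


def generate_reports(data_dir="data", output_dir="output"):
    """Write one HTML profiling report per CSV file."""
    # Deferred imports: only needed when reports are actually generated.
    import pandas as pd
    from ydata_profiling import ProfileReport

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for csv_path, html_path in plan_reports(data_dir, output_dir):
        df = pd.read_csv(csv_path)
        ProfileReport(df, title=f"EDA: {csv_path.stem}").to_file(html_path)
```

Keeping the path planning in its own function makes it easy to check which reports would be written before running the slower profiling step.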

## 🛠️ Getting Started

For prerequisites and instructions on running the code, refer to the official `ydata-profiling` repository: https://github.com/ydataai/ydata-profiling

## 📊 Sample Output
The `output/` folder contains example reports generated with `ydata-profiling`.

Reports include:
- Data summary (missing values, duplicates, etc.)
- Visualizations (correlations, distributions, etc.)
- Detailed variable analysis
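
If a report has been exported as JSON, the data summary can also be read programmatically. The sketch below is an assumption-heavy illustration: the key names (`table`, `n`, `n_var`, `n_cells_missing`, `n_duplicates`, `alerts`) reflect what the JSON export is expected to contain and may differ between library versions, so every lookup falls back gracefully when a key is absent. The function name is hypothetical.

```python
import json


def summarize_report(report_json):
    """Extract a few headline numbers from a JSON report string."""
    data = json.loads(report_json)
    # Assumed layout: dataset-level statistics live under the "table" key.
    table = data.get("table", {})
    return {
        "rows": table.get("n"),
        "columns": table.get("n_var"),
        "missing_cells": table.get("n_cells_missing"),
        "duplicate_rows": table.get("n_duplicates"),
        # Assumed layout: "alerts" is a list of warning entries.
        "alerts": len(data.get("alerts", [])),
    }
```

The function takes the file contents as a string, for example the text read from a JSON report saved in `output/`.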

**🎥 Credits:** Big thanks to **https://www.youtube.com/@CodeWithHarry** for his excellent tutorial **https://www.youtube.com/watch?v=sGQfiyXOvF0&t=1136s** on pandas profiling, which inspired this project.

**🤝 Contributing:** Contributions are welcome! If you have suggestions, feel free to open an issue or submit a pull request.

**📜 License:** This project is licensed under the MIT License.

**💬 Feedback:** If you find this project helpful or have any questions, feel free to reach out!