https://github.com/amitbisht99/ydata-profiling
This repository showcases my learning process of automating EDA using 'ydata-profiling'
data-analytics data-profiling eda pandas python3 ydata-profiling
- Host: GitHub
- URL: https://github.com/amitbisht99/ydata-profiling
- Owner: amitbisht99
- License: mit
- Created: 2024-11-16T04:41:16.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-16T04:44:33.000Z (2 months ago)
- Last Synced: 2024-11-18T11:35:04.761Z (2 months ago)
- Topics: data-analytics, data-profiling, eda, pandas, python3, ydata-profiling
- Language: HTML
- Homepage:
- Size: 812 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Automating EDA with `ydata-profiling`
This repository demonstrates how to automate **Exploratory Data Analysis (EDA)** using the `ydata-profiling` library (formerly known as pandas-profiling). The library generates a comprehensive EDA report from a pandas DataFrame in a few lines of code, which saves time and helps make the analysis more thorough.
## 🚀 Features of `ydata-profiling`
The tool provides the following capabilities:
- **Type Inference**: Automatically detects data types (Categorical, Numerical, Date, etc.).
- **Warnings**: Identifies data challenges like missing values, inaccuracies, skewness, and more.
- **Univariate Analysis**: Generates descriptive statistics (mean, median, mode, etc.) and visualizations like histograms.
- **Multivariate Analysis**: Includes correlation analysis, missing data summaries, duplicate rows detection, and pairwise variable interactions.
- **Time-Series Analysis**: Provides insights such as auto-correlation, seasonality, and ACF/PACF plots.
- **Text Analysis**: Detects the most common categories (uppercase, lowercase, separator), scripts (e.g., Latin, Cyrillic), and Unicode blocks (e.g., ASCII, Cyrillic).
- **File & Image Analysis**: Reviews file sizes, creation dates, dimensions, and EXIF metadata.
- **Dataset Comparison**: Quickly compares datasets in one line of code.
- **Flexible Output Formats**: Reports can be exported as:
  - **HTML**: Easily shareable interactive reports
  - **JSON**: Suitable for automation systems
  - **Jupyter Notebook Widgets**

## 📂 Project Structure
- **`data/`:** Contains sample datasets used for demonstration.
- **`notebooks/`:** Jupyter Notebooks showcasing how to use `ydata-profiling`.
- **`output/`:** Stores generated EDA reports.

## 🛠️ Getting Started
### **For prerequisites and instructions on running the code, refer to:** https://github.com/ydataai/ydata-profiling
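As a rough starting point, the workflow this repository describes (read a dataset from `data/`, profile it, and write the reports to `output/`) might look like the sketch below. It assumes `pandas` and `ydata-profiling` are installed. The helper names `report_paths` and `profile_csv`, the `_report` file-name suffix, and any dataset file name are illustrative assumptions, not part of this repository or of the library.

```python
from pathlib import Path
from typing import Dict


def report_paths(dataset_path: str, output_dir: str = "output") -> Dict[str, Path]:
    """Derive HTML and JSON report paths from the dataset's file name.

    The "<name>_report.<ext>" naming scheme is an assumption for this sketch.
    """
    stem = Path(dataset_path).stem  # "data/titanic.csv" -> "titanic"
    out = Path(output_dir)
    return {
        "html": out / f"{stem}_report.html",
        "json": out / f"{stem}_report.json",
    }


def profile_csv(
    dataset_path: str, output_dir: str = "output", minimal: bool = False
) -> Dict[str, Path]:
    """Profile a CSV file and write the report as both HTML and JSON."""
    # Third-party imports are kept local so report_paths() can be used
    # even where pandas / ydata-profiling are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(dataset_path)

    # minimal=True switches off the more expensive computations,
    # which can help with larger datasets.
    profile = ProfileReport(
        df,
        title=f"EDA report: {Path(dataset_path).name}",
        minimal=minimal,
    )

    paths = report_paths(dataset_path, output_dir)
    paths["html"].parent.mkdir(parents=True, exist_ok=True)
    for path in paths.values():
        # to_file() picks the output format from the file extension.
        profile.to_file(path)
    return paths
```

Calling `profile_csv("data/sample.csv")` (a placeholder file name) would then write `output/sample_report.html` and `output/sample_report.json`. Inside a Jupyter notebook, `profile.to_widgets()` or `profile.to_notebook_iframe()` can display the report inline instead.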
## 📊 Sample Output
The `output/` folder contains example reports generated with `ydata-profiling`. Reports include:
- Data summary (missing values, duplicates, etc.)
- Visualizations (correlations, distributions, etc.)
- Detailed variable analysis

**🎥 Credits:** Big thanks to **https://www.youtube.com/@CodeWithHarry** for his excellent tutorial **https://www.youtube.com/watch?v=sGQfiyXOvF0&t=1136s** on pandas profiling, which inspired this project.
**🤝 Contributing:** Contributions are welcome! If you have suggestions, feel free to open an issue or submit a pull request.
**📜 License:** This project is licensed under the MIT License.
**💬 Feedback:** If you find this project helpful or have any questions, feel free to reach out!