https://github.com/takk8is/datasetanalysiseda

A robust Python tool for comprehensive dataset analysis and machine learning model evaluation. This project automates the process of data preprocessing, exploratory data analysis (EDA), and predictive modeling, with a focus on handling common data inconsistencies.
analytics analyzer chart csv-files data-science data-visualization datascience dataset datasets davidccavalcante eda fjallstoppur graphics machine-learning python python3 takk-ag takk-design takk8is xlsx-files

# Dataset Analysis EDA 📊

[![Version](https://img.shields.io/badge/version-1.0.0-blue.svg)](https://github.com/Takk8IS/DatasetAnalysisEDA)
[![Licence](https://img.shields.io/badge/licence-CC--BY--4.0-green.svg)](https://creativecommons.org/licenses/by/4.0/)
[![GitHub issues](https://img.shields.io/github/issues/Takk8IS/DatasetAnalysisEDA.svg)](https://github.com/Takk8IS/DatasetAnalysisEDA/issues)
[![GitHub stars](https://img.shields.io/github/stars/Takk8IS/DatasetAnalysisEDA.svg)](https://github.com/Takk8IS/DatasetAnalysisEDA/stargazers)

Dataset Analysis EDA is a Python tool for exploratory data analysis (EDA) and machine learning model evaluation. It reads CSV and Excel datasets, preprocesses them, computes statistical summaries, and generates visualizations.

![Dataset Analysis EDA](https://github.com/Takk8IS/DatasetAnalysisEDA/blob/main/images/screenshot-01.png?raw=true)
![Dataset Analysis EDA](https://github.com/Takk8IS/DatasetAnalysisEDA/blob/main/images/screenshot-02.png?raw=true)
![Dataset Analysis EDA](https://github.com/Takk8IS/DatasetAnalysisEDA/blob/main/images/screenshot-03.png?raw=true)
![Dataset Analysis EDA](https://github.com/Takk8IS/DatasetAnalysisEDA/blob/main/images/screenshot-04.png?raw=true)
![Dataset Analysis EDA](https://github.com/Takk8IS/DatasetAnalysisEDA/blob/main/images/screenshot-05.png?raw=true)
![Dataset Analysis EDA](https://github.com/Takk8IS/DatasetAnalysisEDA/blob/main/images/screenshot-06.png?raw=true)
![Dataset Analysis EDA](https://github.com/Takk8IS/DatasetAnalysisEDA/blob/main/images/screenshot-07.png?raw=true)
![Dataset Analysis EDA](https://github.com/Takk8IS/DatasetAnalysisEDA/blob/main/images/screenshot-08.png?raw=true)
![Dataset Analysis EDA](https://github.com/Takk8IS/DatasetAnalysisEDA/blob/main/images/screenshot-09.png?raw=true)
![Dataset Analysis EDA](https://github.com/Takk8IS/DatasetAnalysisEDA/blob/main/images/screenshot-10.png?raw=true)
![Dataset Analysis EDA](https://github.com/Takk8IS/DatasetAnalysisEDA/blob/main/images/screenshot-11.png?raw=true)
![Dataset Analysis EDA](https://github.com/Takk8IS/DatasetAnalysisEDA/blob/main/images/screenshot-12.png?raw=true)
![Dataset Analysis EDA](https://github.com/Takk8IS/DatasetAnalysisEDA/blob/main/images/screenshot-13.png?raw=true)
![Dataset Analysis EDA](https://github.com/Takk8IS/DatasetAnalysisEDA/blob/main/images/screenshot-14.png?raw=true)

## 🌟 Key Features

- 📄 **Multi-format Data Processing**: Handle various file formats including CSV and Excel.
- 🧹 **Automated Data Preprocessing**: Includes grammar correction, handling of missing values, and feature encoding.
- 📊 **Comprehensive EDA**: Generates statistical summaries, correlation analyses, and various visualizations.
- 🤖 **Machine Learning Model Evaluation**: Implements Random Forest classification with cross-validation.
- 📈 **Feature Importance Analysis**: Provides insights into the most influential features in the dataset.
- 📉 **Advanced Visualizations**: Includes histograms, heatmaps, confusion matrices, and feature importance plots.
- 🛠️ **Robust Error Handling**: Comprehensive error management to ensure smooth operation with various datasets.
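
The preprocessing and model-evaluation features listed above can be illustrated with a short sketch. This is not the code in `DatasetAnalysis.py`, whose internals this README does not show. The helper names (`preprocess`, `evaluate`, `make_demo_frame`), the column names, the median/mode imputation, and the one-hot encoding are all assumptions chosen for illustration. It relies on pandas, NumPy, and scikit-learn being installed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def preprocess(df, target):
    """Fill missing values and one-hot encode features (hypothetical helper)."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # Assumption: numeric gaps are filled with the column median.
            df[col] = df[col].fillna(df[col].median())
        else:
            # Assumption: other gaps are filled with the most frequent value.
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    # One-hot encode categorical feature columns as floats.
    X = pd.get_dummies(df.drop(columns=[target]), dtype=float)
    return X, df[target]


def evaluate(X, y, folds=5):
    """Cross-validate a Random Forest, then rank feature importances."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(model, X, y, cv=folds)
    # Fit once on all rows so the importances can be inspected.
    model.fit(X, y)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return scores, importances.sort_values(ascending=False)


def make_demo_frame(n=60):
    """Build a small synthetic dataset with a few missing values."""
    rng = np.random.default_rng(0)
    age = np.linspace(18, 70, n)
    df = pd.DataFrame({
        "age": age,
        "plan": rng.choice(["basic", "pro"], size=n),
        "churned": (age > 45).astype(int),
    })
    df.loc[df.index[::10], "age"] = np.nan
    df.loc[df.index[5::15], "plan"] = np.nan
    return df


if __name__ == "__main__":
    X, y = preprocess(make_demo_frame(), target="churned")
    scores, importances = evaluate(X, y)
    print("Mean cross-validated accuracy:", scores.mean())
    print(importances)
```

With an integer `cv`, `cross_val_score` uses stratified folds for classifiers, so each class needs at least as many rows as there are folds.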

## 📦 Project Structure

```plaintext
├── AUTHORS.md
├── DatasetAnalysis.py
├── FUNDING.yml
├── INFO.md
├── LICENSE.md
├── PRIVACY.md
├── PlanilhaModelagem.csv
├── PlanilhaModelagem.xlsx
├── README.md
├── images
│   ├── screenshot-01.png
│   ├── screenshot-02.png
│   ├── screenshot-03.png
│   ├── screenshot-04.png
│   ├── screenshot-05.png
│   ├── screenshot-06.png
│   ├── screenshot-07.png
│   ├── screenshot-08.png
│   ├── screenshot-09.png
│   ├── screenshot-10.png
│   ├── screenshot-11.png
│   ├── screenshot-12.png
│   ├── screenshot-13.png
│   └── screenshot-14.png
└── requirements.txt
```

## 🏃‍♂️ How to Use

1. **Clone the Repository**:

```sh
git clone https://github.com/Takk8IS/DatasetAnalysisEDA.git
cd DatasetAnalysisEDA
```

2. **Install Dependencies**:

```sh
pip install -r requirements.txt
```

3. **Run the Analysis**:

```sh
python DatasetAnalysis.py PlanilhaModelagem.xlsx
```

4. **Review the Results**:
   - The script generates various plots and prints the analysis results to the console.
   - Review the generated visualizations for insights about your dataset.
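
The run command above passes the dataset path as the first argument. As a rough sketch of how a script could accept either CSV or Excel input and print a basic summary to the console, consider the following. The functions `load_dataset`, `summarize`, and `main` are hypothetical names and are not taken from `DatasetAnalysis.py`. Reading `.xlsx` files with pandas also assumes an Excel engine such as `openpyxl` is installed.

```python
import sys
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Read a CSV or Excel file into a DataFrame, chosen by file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xlsx", ".xls"}:
        # pandas delegates Excel parsing to an optional engine (e.g. openpyxl).
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")


def summarize(df):
    """Collect basic EDA outputs: shape, missing counts, stats, correlations."""
    numeric = df.select_dtypes(include="number")
    return {
        "shape": df.shape,
        "missing": df.isna().sum().to_dict(),
        "describe": numeric.describe(),
        "correlation": numeric.corr(),
    }


def main(argv):
    """Print a summary for the dataset path given as the first argument."""
    if not argv:
        print("usage: python sketch.py <dataset.csv|dataset.xlsx>")
        return 1
    report = summarize(load_dataset(argv[0]))
    print("Rows, columns:", report["shape"])
    print("Missing values per column:", report["missing"])
    print(report["describe"])
    print(report["correlation"])
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Dispatching on the file extension keeps the command-line interface to a single positional argument, which matches the invocation shown in step 3.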

## Contributing

We welcome contributions from the community! If you'd like to contribute, please:

1. Fork the repository.
2. Create your feature branch (`git checkout -b feature/AmazingFeature`).
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`).
4. Push to the branch (`git push origin feature/AmazingFeature`).
5. Open a Pull Request.

## Donations

If this project has been helpful, consider making a donation:

**USDT (TRC-20)**: `TGpiWetnYK2VQpxNGPR27D9vfM6Mei5vNA`

Your support helps us continue to develop innovative data analysis tools.

## License

This project is licensed under the CC-BY-4.0 License. See the [LICENSE](LICENSE.md) file for more details.

## About Takk™ Innovate Studio

Leading the Digital Revolution as the Pioneering 100% Artificial Intelligence Team.

- Author: David C Cavalcante
- LinkedIn: [linkedin.com/in/hellodav](https://www.linkedin.com/in/hellodav/)
- X: [@Takk8IS](https://twitter.com/takk8is/)
- Medium: [takk8is.medium.com](https://takk8is.medium.com/)
- Website: [takk.ag](https://takk.ag/)