Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dadvaiahpavan/ai-data-scientist-
AI-powered tool for dataset analysis, featuring data preprocessing, classification, regression, anomaly detection, and text analysis. Built with scikit-learn, pandas, and Plotly for visualization. Includes an interactive Streamlit web interface for real-time data analysis.
ai anomaly-detection classification data-analysis data-science machine-learning panda plotu regression scikit-learn sentiment-analysis streamlit
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/dadvaiahpavan/ai-data-scientist-
- Owner: DadvaiahPavan
- Created: 2024-11-27T09:28:18.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-11-27T11:38:58.000Z (about 1 month ago)
- Last Synced: 2024-12-05T00:06:58.447Z (about 1 month ago)
- Topics: ai, anomaly-detection, classification, data-analysis, data-science, machine-learning, panda, plotu, regression, scikit-learn, sentiment-analysis, streamlit
- Language: Python
- Homepage: https://ai-data-assistant.streamlit.app/
- Size: 72.3 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# AI Data Analysis Tool
A powerful data analysis tool that leverages AI and machine learning to provide comprehensive insights into your datasets. This application combines the power of scikit-learn, pandas, and Streamlit to offer an intuitive interface for data analysis, visualization, and machine learning tasks.
![Web UI](https://i.ibb.co/dKtg0W8/Screenshot-2024-11-27-163725.png)
## Getting Started
### Clone the Repository
```bash
# Clone into AI_Data_Tool so the directory name matches the project structure
git clone https://github.com/DadvaiahPavan/AI-Data-Scientist.git AI_Data_Tool
cd AI_Data_Tool
```
## Features
### 1. Data Processing
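
As a rough illustration of the format support and missing value handling listed in this section, the sketch below picks a pandas reader by file extension and fills gaps with simple defaults. The `READERS` mapping and the `load_dataset` and `fill_missing` helpers are hypothetical names chosen for this example; they are not taken from the project's code, and median/mode filling is only one possible strategy.

```python
from pathlib import Path

import pandas as pd

# Hypothetical mapping from file extension to a pandas reader (an assumption,
# not the project's actual loader).
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,  # needs openpyxl installed
    ".json": pd.read_json,
}


def load_dataset(path):
    """Load a file into a DataFrame, choosing the reader by extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file format: {suffix}")
    return READERS[suffix](path)


def fill_missing(df):
    """Fill numeric gaps with the column median and other gaps with the mode."""
    out = df.copy()
    for col in out.columns:
        if out[col].isna().sum() == 0:
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode()
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out
```

A typical call would be `fill_missing(load_dataset("data/example.csv"))`, where the file name is only a placeholder.
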
- Support for multiple file formats (CSV, Excel, etc.)
- Automatic data type detection and preprocessing
- Missing value handling and visualization
- Memory-efficient processing for large datasets
### 2. AI Analysis Capabilities
- **Classification Analysis**: Using Random Forest Classifier for categorical predictions
- **Regression Analysis**: Random Forest Regressor for numerical predictions
- **Anomaly Detection**: Isolation Forest implementation for outlier detection
- **Text Analysis**: Sentiment analysis and text processing using TextBlob
- **Feature Importance**: Visualization of most influential features in your dataset
### 3. Visualization
- Interactive plots using Plotly
- Missing value analysis
- Feature correlation heatmaps
- Distribution analysis
- Custom visualization options
### 4. User Interface
- Modern, responsive Streamlit interface
- Intuitive navigation
- Real-time analysis updates
- Interactive parameter tuning
- Progress tracking for long operations
## Project Structure
```
AI_Data_Tool/
├── ai_analyzer.py            # Core AI analysis implementation
├── streamlit_app.py          # Main application entry point
├── requirements.txt          # Project dependencies
├── setup_dirs.py             # Directory setup utility
├── src/
│   └── ui/
│       └── streamlit_app.py  # UI implementation
├── data/                     # Data directory
└── logs/                     # Application logs
```
## Setup Instructions
1. Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: .\venv\Scripts\activate
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the application:
```bash
streamlit run streamlit_app.py
```
## Requirements
- Python 3.8+
- Dependencies listed in requirements.txt:
  - streamlit==1.24.0
  - pandas==2.0.3
  - numpy==1.25.0
  - plotly==5.15.0
  - scikit-learn==1.3.0
  - textblob==0.17.1
  - xmltodict==0.13.0
  - openpyxl==3.1.2
  - lxml==4.9.3
  - torch>=2.0.0
  - transformers>=4.30.0
## Usage
1. Launch the application using `streamlit run streamlit_app.py`
2. Upload your dataset through the web interface
3. Select the type of analysis you want to perform:
   - Data Overview
   - Classification Analysis
   - Regression Analysis
   - Anomaly Detection
   - Text Analysis
4. Configure analysis parameters if needed
5. View and interact with the results and visualizations
## Features in Detail
### Data Overview
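
To make the overview items in this section concrete, here is a minimal sketch of how such a summary could be assembled with pandas. The `overview` function and the sample columns are assumptions made for illustration and do not come from the project.

```python
import numpy as np
import pandas as pd


def overview(df):
    """Collect shape, dtypes, missing counts, basic statistics and correlations."""
    numeric = df.select_dtypes(include="number")
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
        "stats": numeric.describe(),
        "correlation": numeric.corr(),
    }


# Small made-up dataset, for illustration only.
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, np.nan],
    "weight": [50.0, 60.0, 70.0, 80.0],
    "group": ["a", "b", "a", "b"],
})
report = overview(df)
print(report["missing"])
print(report["correlation"])
```

The correlation matrix returned here is the kind of table a heatmap would be drawn from.
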
- Basic statistics
- Data type information
- Missing value analysis
- Correlation analysis
- Distribution plots
### Classification Analysis
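
The classification workflow in this section could look roughly like the sketch below, which uses scikit-learn's `RandomForestClassifier` (the model named under Features). The synthetic data, the split ratio, and the hyperparameters are assumptions for the example, not the tool's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data stands in for an uploaded dataset.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, pred)
cm = confusion_matrix(y_test, pred)      # rows: true class, columns: predicted class
importances = clf.feature_importances_   # one score per feature, summing to 1

print(f"accuracy: {accuracy:.3f}")
print(cm)
print(importances)
```
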
- Automatic feature preprocessing
- Model training and evaluation
- Performance metrics
- Confusion matrix visualization
- Feature importance analysis
### Regression Analysis
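
For the regression workflow in this section, a minimal sketch with scikit-learn's `RandomForestRegressor` might look like this. The synthetic data and parameter values are assumptions; the project's own preprocessing and validation steps may differ.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for an uploaded dataset.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X_train, y_train)
pred = reg.predict(X_test)

mse = mean_squared_error(y_test, pred)
rmse = float(np.sqrt(mse))  # RMSE is in the same units as the target
r2 = r2_score(y_test, pred)

# Rank features from most to least important.
ranking = np.argsort(reg.feature_importances_)[::-1]
print(f"MSE={mse:.2f} RMSE={rmse:.2f} R2={r2:.3f}")
print("feature ranking:", ranking)
```
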
- Numerical feature preprocessing
- Model training and validation
- Error metrics (MSE, RMSE, etc.)
- Prediction visualization
- Feature importance ranking
### Anomaly Detection
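
The sketch below shows one way the anomaly detection in this section could work with scikit-learn's `IsolationForest`, including the contamination factor. The data is synthetic and `contamination=0.05` is an arbitrary choice for the example, not the tool's default.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
# A few isolated points placed far from the main cluster.
injected = np.array([[8.0, 8.0], [-8.0, 8.0], [8.0, -8.0], [-8.0, -8.0], [0.0, 10.0]])
X = np.vstack([normal, injected])

# contamination is the expected share of anomalies in the data.
model = IsolationForest(contamination=0.05, random_state=0)
labels = model.fit_predict(X)        # -1 marks an anomaly, 1 marks an inlier
scores = model.decision_function(X)  # lower scores are more anomalous

anomaly_idx = np.where(labels == -1)[0]
print("flagged rows:", anomaly_idx)
```

Raising `contamination` flags more rows as anomalies; lowering it flags fewer.
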
- Outlier identification
- Anomaly scoring
- Visualization of anomalous data points
- Configurable contamination factor
### Text Analysis
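
As an illustration of the text preprocessing and word frequency steps in this section, the sketch below uses only the Python standard library. The stop-word list, the `tokenize` and `word_frequencies` helpers, and the sample reviews are all made up for this example; sentiment scoring with TextBlob is not shown here.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "is", "and", "it", "this", "to", "of"}


def tokenize(text):
    """Lowercase the text and keep word tokens that are not stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def word_frequencies(texts, top_n=5):
    """Count tokens across all texts and return the top_n most common."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(top_n)


reviews = [
    "The product is great",
    "Great value and great support",
    "This is not great",
]
print(word_frequencies(reviews, top_n=3))
```
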
- Sentiment analysis
- Text preprocessing
- Word frequency analysis
- Sentiment distribution visualization
## Performance Considerations
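
Chunked processing, as mentioned in this section, can be sketched with the `chunksize` parameter of `pandas.read_csv`. The `chunked_mean` helper below is a hypothetical example of the idea, not the project's implementation; it computes a column mean while holding only one chunk in memory at a time.

```python
import io

import pandas as pd


def chunked_mean(source, column, chunksize=1000):
    """Compute the mean of one column without loading the whole file at once."""
    total = 0.0
    count = 0
    for chunk in pd.read_csv(source, usecols=[column], chunksize=chunksize):
        values = chunk[column].dropna()
        total += values.sum()
        count += len(values)
    return total / count if count else float("nan")


# In-memory CSV used in place of a large file on disk.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))
print(chunked_mean(io.StringIO(csv_text), "value", chunksize=3))
```
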
- The tool is optimized for memory efficiency
- Large datasets are processed in chunks
- Parallel processing is utilized where possible
- Model parameters are tuned for a balance of accuracy and performance
## Contributing
Feel free to submit issues and enhancement requests!