{"id":48335919,"url":"https://github.com/eugen-goebel/predictive-analytics-agent","last_synced_at":"2026-04-05T02:03:10.271Z","repository":{"id":347714857,"uuid":"1193095461","full_name":"eugen-goebel/predictive-analytics-agent","owner":"eugen-goebel","description":"Automated ML pipeline — data profiling, preprocessing, model training, and evaluation report generation","archived":false,"fork":false,"pushed_at":"2026-03-29T04:36:57.000Z","size":28,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-29T08:31:51.150Z","etag":null,"topics":["automation","data-science","docker","machine-learning","predictive-analytics","python","scikit-learn","streamlit"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eugen-goebel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-26T21:45:25.000Z","updated_at":"2026-03-29T04:37:36.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/eugen-goebel/predictive-analytics-agent","commit_stats":null,"previous_names":["eugen-goebel/predictive-analytics-agent"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/eugen-goebel/predictive-analytics-agent","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eugen-goebel%2Fpredictive-analytics-agent","tags_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/eugen-goebel%2Fpredictive-analytics-agent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eugen-goebel%2Fpredictive-analytics-agent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eugen-goebel%2Fpredictive-analytics-agent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eugen-goebel","download_url":"https://codeload.github.com/eugen-goebel/predictive-analytics-agent/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eugen-goebel%2Fpredictive-analytics-agent/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31421870,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T00:25:07.052Z","status":"online","status_checked_at":"2026-04-05T02:00:05.211Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","data-science","docker","machine-learning","predictive-analytics","python","scikit-learn","streamlit"],"created_at":"2026-04-05T02:03:09.710Z","updated_at":"2026-04-05T02:03:10.252Z","avatar_url":"https://github.com/eugen-goebel.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Predictive Analytics Agent\n\nAn automated machine learning pipeline that profiles datasets, preprocesses data, selects features, trains and compares multiple 
models, and generates a professional evaluation report — all without requiring an API key.\n\n![CI](https://github.com/eugen-goebel/predictive-analytics-agent/actions/workflows/tests.yml/badge.svg)\n![Python](https://img.shields.io/badge/Python-3.10+-blue)\n![Tests](https://img.shields.io/badge/Tests-35_passed-brightgreen)\n![scikit--learn](https://img.shields.io/badge/scikit--learn-1.5+-f7931e)\n![Streamlit](https://img.shields.io/badge/Streamlit-1.40+-red)\n![License](https://img.shields.io/badge/License-MIT-green)\n\n## Features\n\n- **Auto-Detection**: Automatically identifies the target column and task type (classification or regression)\n- **Data Profiling**: Analyzes data quality, distributions, missing values, and column statistics\n- **Smart Preprocessing**: Handles missing values, encodes categoricals, and scales features\n- **Feature Selection**: Applies variance thresholding and statistical feature selection (SelectKBest)\n- **Model Comparison**: Trains 4 models with 5-fold cross-validation and selects the best\n- **Evaluation**: Generates confusion matrices, feature importance charts, model comparison plots, and overfitting detection\n- **Report Generation**: Creates a professional DOCX report with all results and visualizations\n- **Web Interface**: Interactive Streamlit app for uploading data and exploring results\n- **No API Key Required**: Runs entirely locally using scikit-learn\n\n## Architecture\n\nThe project uses a multi-agent architecture where each agent handles one pipeline phase:\n\n```\nMLPipelineOrchestrator\n├── DataProfiler          → Dataset analysis \u0026 target detection\n├── PreprocessorAgent     → Cleaning, encoding, scaling\n├── FeatureEngineerAgent  → Feature selection \u0026 ranking\n├── ModelTrainerAgent     → Training \u0026 cross-validation (4 models)\n├── EvaluatorAgent        → Metrics, charts, overfitting detection\n└── ReportGenerator       → Professional DOCX report\n```\n\n### Models Used\n\n**Classification:**\n- 
Logistic Regression\n- Random Forest Classifier\n- Gradient Boosting Classifier\n- K-Nearest Neighbors\n\n**Regression:**\n- Linear Regression\n- Random Forest Regressor\n- Gradient Boosting Regressor\n- K-Nearest Neighbors Regressor\n\n## Quick Start\n\n### Installation\n\n```bash\npython -m venv venv\nsource venv/bin/activate   # Windows: venv\\Scripts\\activate\npip install -r requirements.txt\n```\n\n### CLI Usage\n\n```bash\n# Run with sample dataset\npython main.py\n\n# Run with your own data\npython main.py path/to/your_data.csv\n\n# Specify output directory\npython main.py data.csv --output reports/\n```\n\n### Web Interface\n\n```bash\nstreamlit run app.py\n```\n\nUpload a CSV/Excel file or use the built-in sample dataset. The app displays:\n- Data profile with quality metrics\n- Preprocessing steps applied\n- Model comparison table with scores\n- Evaluation charts (confusion matrix, feature importance, etc.)\n- Download button for the full DOCX report\n\n## Sample Dataset\n\nIncludes a customer churn dataset (`data/sample_customers.csv`) with 80 rows and 11 columns (10 features plus the `churn` target):\n\n| Feature | Description |\n|---------|-------------|\n| age | Customer age |\n| income | Annual income |\n| credit_score | Credit score |\n| years_customer | Years as customer |\n| num_products | Number of products |\n| has_mortgage | Has mortgage (0/1) |\n| has_online_banking | Uses online banking (0/1) |\n| monthly_charges | Monthly charges |\n| total_charges | Total charges |\n| support_calls | Number of support calls |\n| **churn** | **Target** — whether customer churned (0/1) |\n\n## Testing\n\n```bash\npytest tests/ -v\n```\n\n35 tests covering all agents and the end-to-end pipeline.\n\n## Project Structure\n\n```\npredictive-analytics-agent/\n├── agents/\n│   ├── data_profiler.py        # Dataset profiling \u0026 analysis\n│   ├── preprocessor.py         # Data cleaning \u0026 transformation\n│   ├── feature_engineer.py     # Feature selection \u0026 ranking\n│   ├── 
model_trainer.py        # Model training \u0026 comparison\n│   ├── evaluator.py            # Model evaluation \u0026 charts\n│   └── orchestrator.py         # Pipeline coordinator\n├── utils/\n│   └── report_generator.py     # DOCX report generation\n├── tests/\n│   ├── test_profiler.py\n│   ├── test_preprocessor.py\n│   ├── test_feature_engineer.py\n│   ├── test_model_trainer.py\n│   ├── test_evaluator.py\n│   └── test_pipeline.py\n├── data/\n│   └── sample_customers.csv\n├── app.py                      # Streamlit web interface\n├── main.py                     # CLI entry point\n└── requirements.txt\n```\n\n## Tech Stack\n\n- **scikit-learn** — Machine learning models, preprocessing, evaluation\n- **pandas** — Data manipulation\n- **matplotlib** — Chart generation\n- **Streamlit** — Web interface\n- **python-docx** — Report generation\n- **Pydantic** — Data validation with typed models\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feugen-goebel%2Fpredictive-analytics-agent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feugen-goebel%2Fpredictive-analytics-agent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feugen-goebel%2Fpredictive-analytics-agent/lists"}