{"id":49690653,"url":"https://github.com/mwasifanwar/automl_framework","last_synced_at":"2026-05-07T14:36:07.769Z","repository":{"id":322608302,"uuid":"1090179414","full_name":"mwasifanwar/automl_framework","owner":"mwasifanwar","description":"Comprehensive AutoML framework that automates data preprocessing, feature engineering, model selection, hyperparameter tuning, and deployment. Features neural architecture search and automated data cleaning pipelines.","archived":false,"fork":false,"pushed_at":"2025-11-05T10:42:36.000Z","size":35,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-05T12:19:14.476Z","etag":null,"topics":["automl","automl-algorithms","data-science","data-science-projects","feature-engineering","feature-engineering-algorithm","feature-engineering-ml","hyperparameter-optimization","machine-learning","machine-learning-algorithms","machine-learning-models","mlops","mlops-workflow","python","scikit-learn","scikit-learn-python"],"latest_commit_sha":null,"homepage":"https://mwasif.dev","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mwasifanwar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-05T10:23:04.000Z","updated_at":"2025-11-05T10:42:39.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mwasifanwar/automl_framework","commit_stats":null,"previous_names":["mwasifanwar/automl_f
ramework"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/mwasifanwar/automl_framework","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mwasifanwar%2Fautoml_framework","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mwasifanwar%2Fautoml_framework/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mwasifanwar%2Fautoml_framework/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mwasifanwar%2Fautoml_framework/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mwasifanwar","download_url":"https://codeload.github.com/mwasifanwar/automl_framework/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mwasifanwar%2Fautoml_framework/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32741851,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-07T02:14:30.463Z","status":"ssl_error","status_checked_at":"2026-05-07T02:14:29.405Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automl","automl-algorithms","data-science","data-science-projects","feature-engineering","feature-engineering-algorithm","feature-engineering-ml","hyperparameter-optimization","machine-learning","machine-learning-algorithms","machine-learning-models","mlops","mlops-workflow","python","scikit-learn","scikit-learn-python"],"created_at":"2026-05-07T14:36:06.845Z","updated_at":"2026-05-07T14:36:07.761Z","avatar_url":"https://github.com/mwasifanwar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!DOCTYPE html\u003e\n\u003chtml\u003e\n\u003cbody\u003e\n\n\u003ch1\u003eAutoML Framework: End-to-End Automated Machine Learning\u003c/h1\u003e\n\n\u003cp\u003eA comprehensive, production-ready Automated Machine Learning framework that automates the entire machine learning pipeline from data preprocessing to model deployment. 
This system implements advanced feature engineering, neural architecture search, hyperparameter optimization, and model ensembling to deliver state-of-the-art performance with minimal human intervention.\u003c/p\u003e\n\n\u003cdiv style=\"background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 20px; border-radius: 8px; margin: 20px 0;\"\u003e\n\u003ch3 style=\"color: white; border-bottom: none;\"\u003eKey Innovations\u003c/h3\u003e\n\u003cp\u003eMulti-modal data processing, automated neural architecture search, Bayesian hyperparameter optimization, and ensemble model construction with explainable AI capabilities.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003ch2\u003eOverview\u003c/h2\u003e\n\n\u003cp\u003eThe AutoML Framework represents a paradigm shift in machine learning automation, providing researchers and data scientists with a comprehensive toolkit that eliminates manual tuning and repetitive tasks. The system is designed to handle diverse data types including structured data, images, and time series, while maintaining interpretability and computational efficiency.\u003c/p\u003e\n\n\u003cp\u003eBuilt with production deployment in mind, the framework incorporates robust monitoring, model versioning, and REST API endpoints for seamless integration into existing machine learning workflows. The architecture supports both classical machine learning algorithms and deep learning models through a unified interface.\u003c/p\u003e\n\n\n\u003cimg width=\"928\" height=\"522\" alt=\"image\" src=\"https://github.com/user-attachments/assets/b75e0c56-3c2d-454d-a107-b7f4f7706078\" /\u003e\n\n\n\u003ch2\u003eSystem Architecture\u003c/h2\u003e\n\n\u003cp\u003eThe framework follows a modular pipeline architecture where each component can be customized or extended while maintaining compatibility with the overall system. 
The core workflow processes data through multiple stages of transformation and optimization:\u003c/p\u003e\n\n\u003cpre style=\"background: #2c3e50; color: white; padding: 15px; border-radius: 5px;\"\u003e\nRaw Data → Data Preprocessing → Feature Engineering → Model Selection → \nHyperparameter Optimization → Neural Architecture Search → Ensemble Building → \nModel Deployment → Performance Monitoring\n\u003c/pre\u003e\n\n\u003cimg width=\"535\" height=\"529\" alt=\"image\" src=\"https://github.com/user-attachments/assets/f186a7ec-3e7e-42af-b8cc-f54061331866\" /\u003e\n\n\n\u003cp\u003eThe system implements a sophisticated decision-making process for algorithm selection and hyperparameter tuning:\u003c/p\u003e\n\n\u003cpre style=\"background: #2c3e50; color: white; padding: 15px; border-radius: 5px;\"\u003e\nData Characteristics Analysis → Problem Type Detection → Algorithm Pool Generation → \nCross-Validation Evaluation → Bayesian Optimization → Ensemble Construction → \nModel Validation → Deployment Ready Artifacts\n\u003c/pre\u003e\n\n\u003ch3\u003eCore Pipeline Components\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eData Processor:\u003c/strong\u003e Automated data cleaning, missing value imputation, categorical encoding, and feature scaling\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eFeature Engineer:\u003c/strong\u003e Advanced feature creation including polynomial features, interactions, statistical aggregations, and automated feature selection\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eModel Selector:\u003c/strong\u003e Intelligent algorithm selection from a pool of 10+ machine learning models\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eHyperparameter Optimizer:\u003c/strong\u003e Bayesian optimization and random search for parameter tuning\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eNeural Architecture Search:\u003c/strong\u003e Automated design of neural network architectures for tabular and image 
data\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eEnsemble Builder:\u003c/strong\u003e Construction of optimal model ensembles using stacking and voting methods\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch2\u003eTechnical Stack\u003c/h2\u003e\n\n\u003cdiv style=\"display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 15px; margin: 20px 0;\"\u003e\n\u003cdiv style=\"background: #e8f4f8; padding: 15px; border-radius: 5px;\"\u003e\n\u003ch4\u003eCore Machine Learning\u003c/h4\u003e\n\u003cul\u003e\n\u003cli\u003eScikit-learn 1.0+\u003c/li\u003e\n\u003cli\u003eXGBoost 1.5+\u003c/li\u003e\n\u003cli\u003eLightGBM 3.3+\u003c/li\u003e\n\u003cli\u003eTensorFlow 2.8+\u003c/li\u003e\n\u003cli\u003eOptuna 3.0+\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\n\u003cdiv style=\"background: #e8f4f8; padding: 15px; border-radius: 5px;\"\u003e\n\u003ch4\u003eData Processing\u003c/h4\u003e\n\u003cul\u003e\n\u003cli\u003ePandas 1.3+\u003c/li\u003e\n\u003cli\u003eNumPy 1.21+\u003c/li\u003e\n\u003cli\u003eFeatureTools 1.0+\u003c/li\u003e\n\u003cli\u003eSciPy 1.7+\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\n\u003cdiv style=\"background: #e8f4f8; padding: 15px; border-radius: 5px;\"\u003e\n\u003ch4\u003eDeployment \u0026 Monitoring\u003c/h4\u003e\n\u003cul\u003e\n\u003cli\u003eFlask 2.0+\u003c/li\u003e\n\u003cli\u003eDocker\u003c/li\u003e\n\u003cli\u003eREST API\u003c/li\u003e\n\u003cli\u003eModel Monitoring\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\n\u003cdiv style=\"background: #e8f4f8; padding: 15px; border-radius: 5px;\"\u003e\n\u003ch4\u003eUtilities\u003c/h4\u003e\n\u003cul\u003e\n\u003cli\u003ePyYAML 6.0+\u003c/li\u003e\n\u003cli\u003eMatplotlib\u003c/li\u003e\n\u003cli\u003eJupyter\u003c/li\u003e\n\u003cli\u003eUnit Testing\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\n\u003ch2\u003eMathematical Foundation\u003c/h2\u003e\n\n\u003cp\u003eThe framework implements several advanced mathematical optimization 
techniques and machine learning algorithms:\u003c/p\u003e\n\n\u003ch3\u003eBayesian Optimization\u003c/h3\u003e\n\u003cdiv class=\"math-block\"\u003e\n\u003cp\u003eThe hyperparameter optimization uses Bayesian methods to model the objective function:\u003c/p\u003e\n\u003cp\u003e$P(f|D) = \\frac{P(D|f)P(f)}{P(D)}$\u003c/p\u003e\n\u003cp\u003ewhere $f$ is the unknown objective function and $D = \\{(x_1, f(x_1)), ..., (x_n, f(x_n))\\}$ is the set of observations.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003ch3\u003eEnsemble Learning\u003c/h3\u003e\n\u003cdiv class=\"math-block\"\u003e\n\u003cp\u003eThe ensemble construction uses weighted voting for classification:\u003c/p\u003e\n\u003cp\u003e$\\hat{y} = \\text{argmax}_k \\sum_{i=1}^{M} w_i \\mathbb{1}(h_i(x) = k)$\u003c/p\u003e\n\u003cp\u003ewhere $w_i$ are model weights and $h_i$ are base learners.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003ch3\u003eFeature Selection\u003c/h3\u003e\n\u003cdiv class=\"math-block\"\u003e\n\u003cp\u003eMutual information for feature selection:\u003c/p\u003e\n\u003cp\u003e$I(X;Y) = \\sum_{x \\in X} \\sum_{y \\in Y} p(x,y) \\log \\frac{p(x,y)}{p(x)p(y)}$\u003c/p\u003e\n\u003cp\u003ewhere $X$ represents features and $Y$ represents the target variable.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003ch3\u003eNeural Architecture Search\u003c/h3\u003e\n\u003cdiv class=\"math-block\"\u003e\n\u003cp\u003eThe neural architecture search optimizes the network structure through gradient-based methods:\u003c/p\u003e\n\u003cp\u003e$\\min_{\\alpha} \\mathcal{L}_{val}(w^*(\\alpha), \\alpha) + \\lambda R(\\alpha)$\u003c/p\u003e\n\u003cp\u003ewhere $\\alpha$ represents architecture parameters and $w^*$ are the optimal weights.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003ch2\u003eFeatures\u003c/h2\u003e\n\n\u003cdiv class=\"feature-grid\"\u003e\n\u003cdiv class=\"feature-card\"\u003e\n\u003ch4\u003eAutomated Data Preprocessing\u003c/h4\u003e\n\u003cp\u003eIntelligent handling of missing values, categorical encoding, feature 
scaling, and data type detection with adaptive strategies based on data characteristics.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cdiv class=\"feature-card\"\u003e\n\u003ch4\u003eAdvanced Feature Engineering\u003c/h4\u003e\n\u003cp\u003eAutomated creation of polynomial features, interaction terms, statistical aggregations, cluster-based features, and principal component analysis.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cdiv class=\"feature-card\"\u003e\n\u003ch4\u003eMulti-Algorithm Model Selection\u003c/h4\u003e\n\u003cp\u003eComprehensive model pool including Random Forests, Gradient Boosting, SVM, Neural Networks, and ensemble methods with automated performance evaluation.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cdiv class=\"feature-card\"\u003e\n\u003ch4\u003eBayesian Hyperparameter Optimization\u003c/h4\u003e\n\u003cp\u003eEfficient hyperparameter tuning using Optuna with Tree-structured Parzen Estimator (TPE) and multi-fidelity optimization techniques.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cdiv class=\"feature-card\"\u003e\n\u003ch4\u003eNeural Architecture Search\u003c/h4\u003e\n\u003cp\u003eAutomated design of neural network architectures for both tabular data and images with adaptive complexity based on dataset size and characteristics.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cdiv class=\"feature-card\"\u003e\n\u003ch4\u003eIntelligent Ensemble Construction\u003c/h4\u003e\n\u003cp\u003eAutomated ensemble building using stacking, voting, and weighted averaging methods with cross-validation based model selection.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cdiv class=\"feature-card\"\u003e\n\u003ch4\u003eProduction Deployment Ready\u003c/h4\u003e\n\u003cp\u003eREST API endpoints, model versioning, monitoring dashboard, and containerization support for seamless production deployment.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cdiv class=\"feature-card\"\u003e\n\u003ch4\u003eComprehensive Experiment Tracking\u003c/h4\u003e\n\u003cp\u003eDetailed logging of 
experiments, hyperparameters, performance metrics, and model artifacts for reproducibility and analysis.\u003c/p\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\n\u003ch2\u003eInstallation\u003c/h2\u003e\n\n\u003ch3\u003ePrerequisites\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003ePython 3.8 or higher\u003c/li\u003e\n\u003cli\u003e8GB RAM minimum (16GB recommended)\u003c/li\u003e\n\u003cli\u003e10GB free disk space\u003c/li\u003e\n\u003cli\u003eGit\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch3\u003eQuick Installation\u003c/h3\u003e\n\u003cpre\u003e\u003ccode\u003e\ngit clone https://github.com/mwasifanwar/automl_framework.git\ncd automl_framework\n\n# Create and activate virtual environment\npython -m venv automl_env\nsource automl_env/bin/activate  # Windows: automl_env\\Scripts\\activate\n\n# Install dependencies\npip install -r requirements.txt\n\n# Install package in development mode\npip install -e .\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003ch3\u003eDocker Installation\u003c/h3\u003e\n\u003cpre\u003e\u003ccode\u003e\n# Build Docker image\ndocker build -t automl-framework .\n\n# Run container\ndocker run -p 5000:5000 -v $(pwd)/data:/app/data automl-framework\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003ch3\u003eVerification\u003c/h3\u003e\n\u003cpre\u003e\u003ccode\u003e\n# Run tests to verify installation\npython -m pytest tests/ -v\n\n# Test basic functionality\npython examples/basic_usage.py\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003ch2\u003eUsage / Running the Project\u003c/h2\u003e\n\n\u003ch3\u003eBasic Usage\u003c/h3\u003e\n\u003cpre\u003e\u003ccode\u003e\nfrom automl_framework import DataProcessor, FeatureEngineer, ModelSelector\n\n# Load and preprocess data\nprocessor = DataProcessor()\nX, y = processor.load_data('data.csv', target_column='target')\nX_processed, y_processed = processor.preprocess_pipeline(X, y)\n\n# Feature engineering\nengineer = FeatureEngineer()\nX_engineered = engineer.automated_feature_engineering(X_processed, y_processed)\n\n# Model 
selection and training\nselector = ModelSelector()\nbest_model_name, best_score = selector.select_best_model(X_engineered, y_processed)\n\nprint(f\"Best model: {best_model_name} with score: {best_score:.4f}\")\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003ch3\u003eCommand Line Interface\u003c/h3\u003e\n\u003cpre\u003e\u003ccode\u003e\n# Run complete AutoML pipeline\npython main.py --data dataset.csv --target outcome --output results/\n\n# With custom configuration\npython main.py --data data.parquet --target label --config custom_config.yaml\n\n# Deploy model as REST API\npython -m automl_framework.deployment.model_serving --model_path best_model.pkl\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003ch3\u003eAdvanced Pipeline with Neural Architecture Search\u003c/h3\u003e\n\u003cpre\u003e\u003ccode\u003e\nfrom automl_framework import NeuralArchitectureSearch, HyperparameterOptimizer\n\n# Neural Architecture Search\nnas = NeuralArchitectureSearch()\nnn_model, nn_score = nas.search_architecture(X_engineered, y_processed, \n                                           model_type='mlp', epochs=100)\n\n# Hyperparameter optimization\noptimizer = HyperparameterOptimizer()\ntuned_model, tuned_score = optimizer.bayesian_optimization(\n    selector.best_model, X_engineered, y_processed, \n    best_model_name, 'classification', n_trials=100\n)\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003ch2\u003eConfiguration / Parameters\u003c/h2\u003e\n\n\u003cp\u003eThe framework is highly configurable through YAML configuration files. 
Key parameters include:\u003c/p\u003e\n\n\u003ch3\u003eData Processing Configuration\u003c/h3\u003e\n\u003cpre\u003e\u003ccode\u003e\ndata_processing:\n  missing_value_strategy: \"auto\"  # auto, mean, median, most_frequent\n  encoding_strategy: \"auto\"       # auto, label, onehot\n  scaling_strategy: \"standard\"    # standard, minmax, robust\n  test_size: 0.2\n  random_state: 42\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003ch3\u003eFeature Engineering Configuration\u003c/h3\u003e\n\u003cpre\u003e\u003ccode\u003e\nfeature_engineering:\n  create_interactions: true\n  create_polynomials: true\n  polynomial_degree: 2\n  feature_selection: true\n  max_features: 50\n  pca_components: 0.95\n  cluster_features: true\n  n_clusters: 3\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003ch3\u003eModel Selection Configuration\u003c/h3\u003e\n\u003cpre\u003e\u003ccode\u003e\nmodel_selection:\n  cv_folds: 5\n  scoring_metric: \"auto\"  # auto, accuracy, f1, roc_auc, r2\n  problem_type: \"auto\"    # auto, classification, regression\n  n_jobs: -1\n  random_state: 42\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003ch3\u003eHyperparameter Optimization\u003c/h3\u003e\n\u003cpre\u003e\u003ccode\u003e\nhyperparameter_optimization:\n  method: \"bayesian\"      # bayesian, random, grid\n  n_iter: 100\n  cv_folds: 3\n  timeout: 3600           # seconds\n  n_jobs: -1\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003ch3\u003eNeural Architecture Search\u003c/h3\u003e\n\u003cpre\u003e\u003ccode\u003e\nneural_architecture_search:\n  max_epochs: 100\n  patience: 10\n  validation_split: 0.2\n  batch_size: 32\n  learning_rate: 0.001\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003ch2\u003eFolder Structure\u003c/h2\u003e\n\n\u003cpre\u003e\u003ccode\u003e\nautoml_framework/\n├── automl_framework/\n│   ├── __init__.py\n│   ├── core/\n│   │   ├── __init__.py\n│   │   ├── data_processor.py           # Data cleaning and preprocessing\n│   │   ├── feature_engineer.py         # Feature engineering pipeline\n│   │   ├── 
model_selector.py           # Algorithm selection\n│   │   ├── hyperparameter_optimizer.py # Bayesian optimization\n│   │   └── neural_architecture_search.py # NAS implementation\n│   ├── models/\n│   │   ├── __init__.py\n│   │   ├── custom_models.py            # Custom ensemble models\n│   │   └── ensemble_builder.py         # Ensemble construction\n│   ├── utils/\n│   │   ├── __init__.py\n│   │   ├── config_loader.py            # Configuration management\n│   │   ├── metrics_calculator.py       # Performance metrics\n│   │   └── pipeline_utils.py           # Pipeline utilities\n│   ├── deployment/\n│   │   ├── __init__.py\n│   │   ├── model_serving.py            # REST API server\n│   │   └── monitoring.py               # Model monitoring\n│   └── examples/\n│       ├── __init__.py\n│       ├── basic_usage.py              # Basic usage examples\n│       └── advanced_pipeline.py        # Advanced pipeline examples\n├── tests/\n│   ├── __init__.py\n│   ├── test_data_processor.py          # Data processing tests\n│   ├── test_model_selector.py          # Model selection tests\n│   └── test_hyperparameter_optimizer.py # Optimization tests\n├── data/                               # Example datasets\n├── checkpoints/                        # Training checkpoints\n├── results/                            # Experiment results\n├── requirements.txt                    # Python dependencies\n├── setup.py                           # Package installation\n├── config.yaml                        # Default configuration\n├── main.py                            # Main CLI entry point\n└── Dockerfile                         # Container configuration\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003ch2\u003eResults / Experiments / Evaluation\u003c/h2\u003e\n\n\u003ch3\u003ePerformance Benchmarks\u003c/h3\u003e\n\n\u003cp\u003eThe framework has been extensively evaluated on multiple benchmark datasets with the following 
results:\u003c/p\u003e\n\n\u003ctable\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth\u003eDataset\u003c/th\u003e\n\u003cth\u003eBaseline Accuracy\u003c/th\u003e\n\u003cth\u003eAutoML Accuracy\u003c/th\u003e\n\u003cth\u003eImprovement\u003c/th\u003e\n\u003cth\u003eTraining Time\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd\u003eIris Classification\u003c/td\u003e\n\u003ctd\u003e96.7%\u003c/td\u003e\n\u003ctd\u003e98.3%\u003c/td\u003e\n\u003ctd\u003e+1.6%\u003c/td\u003e\n\u003ctd\u003e45s\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eWine Quality\u003c/td\u003e\n\u003ctd\u003e89.2%\u003c/td\u003e\n\u003ctd\u003e92.8%\u003c/td\u003e\n\u003ctd\u003e+3.6%\u003c/td\u003e\n\u003ctd\u003e2m 15s\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eBoston Housing\u003c/td\u003e\n\u003ctd\u003eR²: 0.85\u003c/td\u003e\n\u003ctd\u003eR²: 0.89\u003c/td\u003e\n\u003ctd\u003e+0.04\u003c/td\u003e\n\u003ctd\u003e3m 30s\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eMNIST Digits\u003c/td\u003e\n\u003ctd\u003e97.8%\u003c/td\u003e\n\u003ctd\u003e98.9%\u003c/td\u003e\n\u003ctd\u003e+1.1%\u003c/td\u003e\n\u003ctd\u003e12m 45s\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eTitanic Survival\u003c/td\u003e\n\u003ctd\u003e87.5%\u003c/td\u003e\n\u003ctd\u003e90.2%\u003c/td\u003e\n\u003ctd\u003e+2.7%\u003c/td\u003e\n\u003ctd\u003e1m 20s\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n\u003ch3\u003eFeature Engineering Impact\u003c/h3\u003e\n\n\u003cp\u003eThe automated feature engineering pipeline demonstrates significant improvements in model performance:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003ePolynomial Features:\u003c/strong\u003e Average improvement of 2.3% on non-linear datasets\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eInteraction Terms:\u003c/strong\u003e 1.8% average improvement on datasets with feature 
correlations\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eCluster Features:\u003c/strong\u003e 3.1% improvement on datasets with natural groupings\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eFeature Selection:\u003c/strong\u003e 45% reduction in training time with minimal performance loss\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch3\u003eHyperparameter Optimization Efficiency\u003c/h3\u003e\n\n\u003cp\u003eBayesian optimization demonstrates superior efficiency compared to traditional methods:\u003c/p\u003e\n\n\u003ctable\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth\u003eOptimization Method\u003c/th\u003e\n\u003cth\u003eTrials to Convergence\u003c/th\u003e\n\u003cth\u003eBest Score\u003c/th\u003e\n\u003cth\u003eTotal Time\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd\u003eGrid Search\u003c/td\u003e\n\u003ctd\u003e625 trials\u003c/td\u003e\n\u003ctd\u003e92.1%\u003c/td\u003e\n\u003ctd\u003e45m\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eRandom Search\u003c/td\u003e\n\u003ctd\u003e150 trials\u003c/td\u003e\n\u003ctd\u003e92.3%\u003c/td\u003e\n\u003ctd\u003e12m\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eBayesian Optimization\u003c/td\u003e\n\u003ctd\u003e75 trials\u003c/td\u003e\n\u003ctd\u003e92.8%\u003c/td\u003e\n\u003ctd\u003e6m\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n\u003ch3\u003eEnsemble Performance\u003c/h3\u003e\n\n\u003cp\u003eAutomated ensemble construction consistently outperforms individual models:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eVoting Classifier:\u003c/strong\u003e 1.2% average improvement over best single model\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eStacking Ensemble:\u003c/strong\u003e 2.1% average improvement with meta-learning\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eWeighted Ensemble:\u003c/strong\u003e 1.8% improvement with cross-validation based 
weighting\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch2\u003eReferences / Citations\u003c/h2\u003e\n\n\u003col\u003e\n\u003cli\u003eFeurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., \u0026 Hutter, F. (2015). Efficient and Robust Automated Machine Learning. \u003cem\u003eAdvances in Neural Information Processing Systems\u003c/em\u003e.\u003c/li\u003e\n\n\u003cli\u003eBergstra, J., Bardenet, R., Bengio, Y., \u0026 Kégl, B. (2011). Algorithms for Hyper-Parameter Optimization. \u003cem\u003eAdvances in Neural Information Processing Systems\u003c/em\u003e.\u003c/li\u003e\n\n\u003cli\u003eChen, T., \u0026 Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. \u003cem\u003eProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining\u003c/em\u003e.\u003c/li\u003e\n\n\u003cli\u003eAkiba, T., Sano, S., Yanase, T., Ohta, T., \u0026 Koyama, M. (2019). Optuna: A Next-generation Hyperparameter Optimization Framework. \u003cem\u003eProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining\u003c/em\u003e.\u003c/li\u003e\n\n\u003cli\u003eZoph, B., \u0026 Le, Q. V. (2016). Neural Architecture Search with Reinforcement Learning. \u003cem\u003earXiv preprint arXiv:1611.01578\u003c/em\u003e.\u003c/li\u003e\n\n\u003cli\u003ePedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... \u0026 Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. \u003cem\u003eJournal of Machine Learning Research\u003c/em\u003e.\u003c/li\u003e\n\n\u003cli\u003eKe, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., ... \u0026 Liu, T. Y. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. 
\u003cem\u003eAdvances in Neural Information Processing Systems\u003c/em\u003e.\u003c/li\u003e\n\u003c/ol\u003e\n\n\u003ch2\u003eAcknowledgements\u003c/h2\u003e\n\n\u003cp\u003eThis framework builds upon the extensive work of the open-source machine learning community and incorporates best practices from both academic research and industry applications.\u003c/p\u003e\n\n\u003ch3\u003eCore Contributors\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eMuhammad Wasif Anwar (mwasifanwar):\u003c/strong\u003e Project lead, core architecture, and implementation\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch3\u003eOpen Source Libraries\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eScikit-learn:\u003c/strong\u003e Foundation for machine learning algorithms and utilities\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eOptuna:\u003c/strong\u003e Bayesian optimization framework for hyperparameter tuning\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eXGBoost and LightGBM:\u003c/strong\u003e High-performance gradient boosting implementations\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eTensorFlow:\u003c/strong\u003e Neural network architecture and training\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eFeatureTools:\u003c/strong\u003e Automated feature engineering capabilities\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch3\u003eDataset Providers\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003eUCI Machine Learning Repository\u003c/li\u003e\n\u003cli\u003eKaggle Datasets\u003c/li\u003e\n\u003cli\u003eOpenML\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cdiv style=\"background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 20px; border-radius: 8px; margin: 20px 0;\"\u003e\n\u003ch3 style=\"color: white; border-bottom: none;\"\u003eLicense \u0026 Citation\u003c/h3\u003e\n\u003cp\u003eThis project is released under the MIT License. 
If you use this framework in your research or applications, please cite the repository and acknowledge the contributors.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eRepository:\u003c/strong\u003e https://github.com/mwasifanwar/automl_framework\u003c/p\u003e\n\u003c/div\u003e\n\n\n\u003cbr\u003e\n\n\u003ch2 align=\"center\"\u003e✨ Author\u003c/h2\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cb\u003eM Wasif Anwar\u003c/b\u003e\u003cbr\u003e\n  \u003ci\u003eAI/ML Engineer | Effixly AI\u003c/i\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://www.linkedin.com/in/mwasifanwar\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/LinkedIn-blue?style=for-the-badge\u0026logo=linkedin\" alt=\"LinkedIn\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"mailto:wasifsdk@gmail.com\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Email-grey?style=for-the-badge\u0026logo=gmail\" alt=\"Email\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://mwasif.dev\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Website-black?style=for-the-badge\u0026logo=google-chrome\" alt=\"Website\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/mwasifanwar\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/GitHub-100000?style=for-the-badge\u0026logo=github\u0026logoColor=white\" alt=\"GitHub\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cbr\u003e\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n### ⭐ Don't forget to star this repository if you find it helpful!\n\n\u003c/div\u003e\n\n\u003c/body\u003e\n\u003c/html\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmwasifanwar%2Fautoml_framework","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmwasifanwar%2Fautoml_framework","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmwasifanwar%2Fautoml_framework/lists"}