{"id":32734854,"url":"https://github.com/bahar15984/obesity-classification","last_synced_at":"2025-11-03T07:02:41.349Z","repository":{"id":321125371,"uuid":"1084480206","full_name":"Bahar15984/Obesity-Classification","owner":"Bahar15984","description":"Machine Learning Pipeline for Obesity Classification using Azure ML \u0026 Python","archived":false,"fork":false,"pushed_at":"2025-10-27T22:09:23.000Z","size":4271,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-28T00:12:46.668Z","etag":null,"topics":["azure","azure-ml","classification","data-science","healthcare","machine-learning","mlops","obesity","pandas","pipeline","python","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Bahar15984.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-27T18:29:11.000Z","updated_at":"2025-10-27T22:09:27.000Z","dependencies_parsed_at":"2025-10-28T00:22:55.094Z","dependency_job_id":null,"html_url":"https://github.com/Bahar15984/Obesity-Classification","commit_stats":null,"previous_names":["bahar15984/obesity-classification"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Bahar15984/Obesity-Classification","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bahar15984%2FObesity-Classification","tags_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/Bahar15984%2FObesity-Classification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bahar15984%2FObesity-Classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bahar15984%2FObesity-Classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Bahar15984","download_url":"https://codeload.github.com/Bahar15984/Obesity-Classification/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bahar15984%2FObesity-Classification/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":282415958,"owners_count":26665441,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-03T02:00:05.676Z","response_time":108,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure","azure-ml","classification","data-science","healthcare","machine-learning","mlops","obesity","pandas","pipeline","python","scikit-learn"],"created_at":"2025-11-03T07:01:15.459Z","updated_at":"2025-11-03T07:02:41.341Z","avatar_url":"https://github.com/Bahar15984.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Obesity Classification using Azure Machine Learning and Databricks\n\nThis project demonstrates an end-to-end Machine Learning pipeline for predicting obesity levels based on 
demographic and biometric data such as Age, Gender, Height, Weight, and BMI.  \nThe solution integrates Azure Machine Learning, Databricks Lakeflow Jobs, and Python to enable automated training, deployment, and real-time inference in a cloud environment.\n\n---\n\n## Project Overview\n\nThe goal of this project is to classify individuals into four categories:\n- Underweight  \n- Normal weight  \n- Overweight  \n- Obese  \n\nThe project covers the complete lifecycle of model development:\n1. Data ingestion and preprocessing  \n2. Exploratory Data Analysis (EDA)  \n3. Model training and evaluation  \n4. Model registration and deployment in Azure ML  \n5. Pipeline orchestration using Lakeflow Jobs in Databricks  \n\n---\n\n## Technology Stack\n\n| Category | Tools and Frameworks |\n|-----------|----------------------|\n| Cloud Platform | Azure Machine Learning, Azure Databricks, Azure Blob Storage |\n| Language | Python |\n| Machine Learning | Scikit-learn, Pandas, NumPy, Seaborn, Matplotlib |\n| Workflow Automation | Lakeflow Jobs, Azure ML Pipelines, Compute Clusters (AmlCompute) |\n| Version Control | Git, GitHub |\n| Deployment | Azure ML Endpoints |\n\n---\n\n## Machine Learning Pipeline Architecture\n\n1. **Data Ingestion:** CSV dataset uploaded to Azure Blob Storage.  \n2. **Preprocessing:** Missing value treatment, encoding categorical variables, feature scaling.  \n3. **Modeling:** Decision Tree Classifier trained and validated on Azure ML compute cluster.  \n4. **Evaluation:** Accuracy, precision, recall, F1-score, confusion matrix.  \n5. **Deployment:** Model registered and deployed for real-time prediction.  \n6. 
**Orchestration:** Automated execution and monitoring using Lakeflow Jobs.\n\n---\n\n## Results\n\n| Metric | Value |\n|---------|-------|\n| Accuracy | 93.4% |\n| Precision (macro/micro) | 0.92 / 0.93 |\n| Recall (macro/micro) | 0.91 / 0.93 |\n\nThe similarity between macro and micro metrics indicates balanced performance across all weight categories.\n\n---\n\n## Dataset\n\n- **Source:** Custom dataset representing obesity levels and related attributes.  \n- **Features:** Age, Gender, Height, Weight, BMI  \n- **Target Variable:** Label (Underweight, Normal, Overweight, Obese)\n\n---\n\n## Deployment Details\n\n- Workspace: BaharML-Canada  \n- Resource Group: databricks-lab-rg  \n- Compute Cluster: cpu-cluster  \n- Pipeline ID: febf487e-a1e2-4f8b-92e7-02f7f46a54fd  \n- Experiment Name: ObesityPrediction_Run  \n\nExample execution in Azure ML:\n\n```python\nfrom azureml.core import Experiment, Workspace\nfrom azureml.pipeline.core import PublishedPipeline\n\n# Assumes a config.json for the workspace is available locally\nws = Workspace.from_config()\n# Pipeline ID from the deployment details above\npipeline_id = \"febf487e-a1e2-4f8b-92e7-02f7f46a54fd\"\n\npublished_pipeline = PublishedPipeline.get(ws, id=pipeline_id)\nexperiment = Experiment(workspace=ws, name=\"ObesityPrediction_Run\")\nrun = experiment.submit(published_pipeline)\nrun.wait_for_completion(show_output=True)\n```\n\n---\n\n## Integration with Databricks Lakeflow Jobs\n\nLakeflow Jobs are used to orchestrate the workflow and automate:\n- Data preparation and validation  \n- Model retraining and deployment  \n- Periodic monitoring and retriggering of pipelines  \n\nThis approach ensures scalability, reproducibility, and adherence to MLOps best practices.\n\n---\n\n## Repository Structure\n\n```\nObesity-Classification/\n│\n├── obesity_pipeline.py          # Main pipeline code\n├── Obesity Classification.csv   # Dataset\n├── CloudProject_Bahar.pptx      # Presentation slides\n├── requirements.txt             # Dependencies\n├── README.md                    # Project documentation\n└── LICENSE                      # MIT License\n```\n\n---\n\n## Author\n\n**Bahar Almasi**  \nToronto, Canada  \nData Science and 
Analytics | Cloud Machine Learning | Azure ML | Databricks  \nLinkedIn: [linkedin.com/in/bahar-almasi](https://linkedin.com/in/bahar-almasi)  \nGitHub: [github.com/Bahar15984](https://github.com/Bahar15984)\n\n---\n\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbahar15984%2Fobesity-classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbahar15984%2Fobesity-classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbahar15984%2Fobesity-classification/lists"}