{"id":20306451,"url":"https://github.com/drkbluescience/autogluon_cameroon_air_quality","last_synced_at":"2025-03-04T07:25:18.459Z","repository":{"id":262625154,"uuid":"887862748","full_name":"drkbluescience/AutoGluon_Cameroon_Air_Quality","owner":"drkbluescience","description":"Finished 5th in the Cameroon Air Quality Prediction competition, later refining the model to achieve a score better than the 1st place submission using AutoGluon.","archived":false,"fork":false,"pushed_at":"2024-11-14T10:06:17.000Z","size":4361,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-14T11:33:07.255Z","etag":null,"topics":["autogluon","automl","feature-engineering","machine-learning","regression-analysis","tabular-data"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/drkbluescience.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-13T12:16:29.000Z","updated_at":"2024-11-14T10:06:21.000Z","dependencies_parsed_at":"2024-11-13T12:36:04.881Z","dependency_job_id":"3da554cd-d256-4e74-abe1-805d8721bd65","html_url":"https://github.com/drkbluescience/AutoGluon_Cameroon_Air_Quality","commit_stats":null,"previous_names":["drkbluescience/autogluon_cameroon_air_quality"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drkbluescience%2FAutoGluon_Cameroon_Air_Quality","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/drkbluescience%2FAutoGluon_Cameroon_Air_Quality/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drkbluescience%2FAutoGluon_Cameroon_Air_Quality/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drkbluescience%2FAutoGluon_Cameroon_Air_Quality/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/drkbluescience","download_url":"https://codeload.github.com/drkbluescience/AutoGluon_Cameroon_Air_Quality/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241803054,"owners_count":20022766,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autogluon","automl","feature-engineering","machine-learning","regression-analysis","tabular-data"],"created_at":"2024-11-14T17:13:24.956Z","updated_at":"2025-03-04T07:25:18.428Z","avatar_url":"https://github.com/drkbluescience.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **Cameroon Air Quality Prediction - AutoGluon**\n\n## Introduction\n\nThis study focuses on predicting air quality in Cameroon, specifically the concentration of particulate matter (**PM2.5**), using various machine learning techniques. The dataset includes weather and air quality features collected from different cities across Cameroon.\n\n## Leaderboard Achievement\n\nHere’s a snapshot of the position on the leaderboard during the **Cameroon Air Quality Prediction** competition, showing the score in 5th place at the end of the competition. 
After further model improvements, the 1st-place score was surpassed.\n![Leaderboard](images/leaderboard.png)\n\n## Methodology\n\nThe analysis began with an exploration of the dataset, during which **data inconsistencies** were addressed. Features that held a **single constant value**, such as **'sunrise'**, **'sunset'**, and **'snowfall_sum'**, were removed. Redundant variables, including **city**, **longitude**, and **latitude**, were also dropped to reduce unnecessary complexity in the models.\n\n### Feature Engineering\n\nTo improve predictive power, the distribution of **PM2.5** concentrations across cities was analyzed, which led to one new feature:\n\n- **Distance from Bafoussam**, the city with the highest PM2.5 levels.\n\n## Models\n\nSeveral machine learning models were initially employed to predict **PM2.5** levels, including:\n\n- **CatBoost**\n- **LightGBM (LGBM)**\n- **XGBoost (XGB)**\n- **GradientBoostingRegressor**\n- **ExtraTreesRegressor**\n- **RandomForestRegressor**\n- **AdaBoostRegressor**\n- **MLPRegressor**\n\nThese models were evaluated using a **9-split RepeatedKFold cross-validation** strategy to obtain more stable error estimates.\n\nAfter this initial testing, **AutoGluon** was introduced and ultimately provided the best performance, surpassing all other models in predictive accuracy.\n\n## Results\n\nAmong all models tested, **CatBoost** performed well, achieving a **root mean squared error (RMSE)** of **3.11078**. 
However, **AutoGluon** outperformed every other model, achieving the **lowest RMSE of 2.97008**.\n\n### Key Result Comparison\n\n| Model                     | RMSE      |\n| -------------------------- | --------- |\n| **CatBoost**               | 3.11078   |\n| **AutoGluon**              | **2.97008** |\n\nThis result demonstrates a **significant improvement** over the other models, indicating the superior predictive capabilities of **AutoGluon** for this particular task.\n\nWhile other models, such as **CatBoost** and **ExtraTrees**, provided competitive results, **AutoGluon’s automatic model selection and hyperparameter tuning** led to the best performance, further validating its effectiveness as an **AutoML tool** for air quality prediction.\n\n## Conclusion\n\nThis study highlights the critical role of **feature engineering** in improving model performance, as well as the superiority of **AutoGluon** over other machine learning models for the task of predicting **PM2.5** concentrations. AutoGluon’s automated approach to model selection and optimization resulted in a more accurate prediction, achieving the best RMSE score and outperforming traditional models like **CatBoost** and **XGBoost**.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrkbluescience%2Fautogluon_cameroon_air_quality","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdrkbluescience%2Fautogluon_cameroon_air_quality","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrkbluescience%2Fautogluon_cameroon_air_quality/lists"}