{"id":19525426,"url":"https://github.com/tderick/android-malware-detection","last_synced_at":"2025-08-26T12:13:12.156Z","repository":{"id":259589732,"uuid":"878138760","full_name":"tderick/android-malware-detection","owner":"tderick","description":"This project aims to build an effective classification model to classify a mobile application as Benign or Malware. To do so, we'll evaluate multiple classification models using different metrics and select the best model with better performance for our dataset. Finally, we deployed our model as a REST API using FastAPI.","archived":false,"fork":false,"pushed_at":"2024-11-05T16:47:54.000Z","size":5822,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-08T15:24:39.790Z","etag":null,"topics":["android","androidmalware","classification","fastapi","machine-learning","machine-learning-algorithms","xgboost-classifier"],"latest_commit_sha":null,"homepage":"","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tderick.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-24T20:57:11.000Z","updated_at":"2024-11-06T09:16:19.000Z","dependencies_parsed_at":"2024-10-26T19:19:03.058Z","dependency_job_id":"ac266b43-5ec5-4692-babd-37137eb7ec10","html_url":"https://github.com/tderick/android-malware-detection","commit_stats":null,"previous_names":["tderick/android-malware-detection-api","tderick/android-malware-detection"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tderick%2Fandroid-malware-detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tderick%2Fandroid-malware-detection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tderick%2Fandroid-malware-detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tderick%2Fandroid-malware-detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tderick","download_url":"https://codeload.github.com/tderick/android-malware-detection/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240777555,"owners_count":19855856,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["android","androidmalware","classification","fastapi","machine-learning","machine-learning-algorithms","xgboost-classifier"],"created_at":"2024-11-11T01:04:24.452Z","updated_at":"2025-02-26T01:43:32.353Z","avatar_url":"https://github.com/tderick.png","language":"Jupyter Notebook","readme":"# Android Malware Detection Using Machine Learning\n\n## Project Overview\nThis project builds a classification model that labels a mobile application as **Benign** or **Malware**. We evaluate multiple classification models using several metrics, select the one that performs best on our dataset, and deploy it as a REST API using FastAPI.\n\n## Dataset\n\nThe dataset used in this project, hosted on [FigShare](https://figshare.com/articles/dataset/Android_malware_dataset_for_machine_learning_2/5854653), contains feature vectors of 215 attributes extracted from 15,036 mobile applications: 5,560 malware samples from the [Drebin](https://drebin.mlsec.org/) project and 9,476 benign apps. It is structured with 215 columns and 15,036 rows and is designed for binary classification, where the target variable distinguishes **Malware (S)** from **Benign (B)** apps. Each attribute is binary: 0 indicates the attribute's absence and 1 its presence. 
The class distribution is shown below:\n\n![Class Distribution](assets/datadistribution.png \"Class Distribution\")\n\nThe 215 features of the dataset fall into four categories: **API Call Signature, Manifest Permission, Intent, Commands signature**.\n\n![Group Feature](assets/groupoffeature.png)\n\n## Machine Learning Models\nSeveral machine learning models were tested, including:\n\n- Random Forest\n- XGBoost\n- LightGBM\n- Extra Tree Classifier\n- Logistic Regression\n- Support Vector Machine\n- AdaBoost\n- Decision Tree\n- Bagging\n- Bayesian\n\n## Model Comparison\n\nThe models were evaluated on accuracy, precision, recall, F1-score, and ROC AUC. The XGBoost model emerged as the best performer, with the following metrics:\n\n- **Accuracy**: 0.986698\n- **Precision**: 0.98914\n- **Recall**: 0.975022\n- **F1 Score**: 0.982031\n- **ROC AUC**: 0.998764\n\n## Fine-tuning\n\nUsing GridSearchCV, the XGBoost hyperparameters were tuned to maximize recall. The optimal parameters were:\n\n- **colsample_bytree**: 0.8\n- **learning_rate**: 0.2\n- **max_depth**: 7\n- **n_estimators**: 200\n- **subsample**: 1.0\n\n## Deployment\n\nTo deploy our model, we package everything in a Docker container and expose the model as an API. To get a prediction, a user submits an APK to the API. The API first reverse-engineers the APK to extract the features needed for the prediction, then uses those features to classify the application as benign or malware. The complete workflow is illustrated in the figure below:\n\n![](assets/android-malware-deploiement.png)\n\nTo access the application, follow these steps:\n\n1. Have Docker installed on your computer\n2. Run the following command: `docker run -p 8080:8000 tderick/android-malware-detection`\n3. 
Go to [http://localhost:8080/docs](http://localhost:8080/docs) to test the application.\n\nThe following pictures show the analysis of the WhatsApp APK:\n\n![](assets/apk_execution_1.png)\n![](assets/apk_execution_2.png)\n\n\nTo test with other apps, you can download APKs from [https://apkpure.com](https://apkpure.com).\n\n## Build the Docker image\n```\ndocker build -t tderick/android-malware-detection:latest .\n```\n\n## Run the image\n```\ndocker run -p 8080:8000 tderick/android-malware-detection:latest\n```\n\n## Push to Docker Hub\n```\ndocker push tderick/android-malware-detection:latest\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftderick%2Fandroid-malware-detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftderick%2Fandroid-malware-detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftderick%2Fandroid-malware-detection/lists"}