{"id":27789166,"url":"https://github.com/aleenagibi/honeytrap","last_synced_at":"2026-04-13T21:32:18.349Z","repository":{"id":290168539,"uuid":"973570825","full_name":"aleenagibi/HoneyTrap","owner":"aleenagibi","description":"Machine learning model for real-time attack detection based on AWS honeypot data, achieving over 91% accuracy using Random Forest and feature engineering techniques","archived":false,"fork":false,"pushed_at":"2025-04-27T09:53:27.000Z","size":6383,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-28T01:40:44.375Z","etag":null,"topics":["aws","cybersecurity","honeypot","machine-learning","random-forest-classifier"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aleenagibi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-27T09:21:02.000Z","updated_at":"2025-04-27T10:25:11.000Z","dependencies_parsed_at":"2025-04-30T17:48:40.990Z","dependency_job_id":null,"html_url":"https://github.com/aleenagibi/HoneyTrap","commit_stats":null,"previous_names":["aleenagibi/honeytrap"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aleenagibi/HoneyTrap","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aleenagibi%2FHoneyTrap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aleenagibi%2FHoneyTrap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aleenagibi%2FHon
eyTrap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aleenagibi%2FHoneyTrap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aleenagibi","download_url":"https://codeload.github.com/aleenagibi/HoneyTrap/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aleenagibi%2FHoneyTrap/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271746551,"owners_count":24813570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-23T02:00:09.327Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","cybersecurity","honeypot","machine-learning","random-forest-classifier"],"created_at":"2025-04-30T17:49:08.509Z","updated_at":"2026-04-13T21:32:18.299Z","avatar_url":"https://github.com/aleenagibi.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HoneyTrap: AWS Honeypot Attack Detection Using Machine Learning\n\n\n##  Overview\n\nCybersecurity threats are becoming increasingly sophisticated, and cloud infrastructures like AWS are popular targets.  
\nThis project builds a machine learning model to detect and predict attacks in real-world honeypot data collected from AWS servers.\n\nUsing a pipeline that covers data preprocessing, feature engineering, model training, hyperparameter optimization, and evaluation, we aim to accurately distinguish malicious activity from normal traffic.\n\nThe project demonstrates that traditional machine learning techniques can still perform reliably (about 91% accuracy on a held-out test set) in a high-stakes domain such as cybersecurity.\n\n\n##  Dataset\n\n- **Source**: AWS Honeypot Attack Data\n- **Nature**: Real-world network traffic logs\n- **Features**:\n  - IP addresses\n  - Ports accessed\n  - Protocols (TCP, UDP)\n  - Attack categories (e.g., brute force, botnets)\n  - Geolocation metadata\n- **Label**:\n  - `Attack` (1): malicious activity\n  - `No Attack` (0): normal traffic\n- **Challenges**:\n  - Highly imbalanced classes\n  - Outliers and noisy records\n\n\n##  Methodology\n\n###  Data Preprocessing\n- **Label Encoding**: Converted categorical fields into integer codes.\n- **SMOTE Oversampling**: Addressed class imbalance by generating synthetic samples for the minority class.\n- **Scaling**: Applied `RobustScaler`, which centers each feature on its median and scales it by the interquartile range, to reduce the influence of outliers.\n\n###  Feature Engineering\n- **Feature Selection**: Ranked features by Random Forest feature importance and kept the most informative ones.\n\n###  Model Building\n- **Model Chosen**: Random Forest Classifier\n- **Hyperparameter Tuning**: Used `GridSearchCV` with 5-fold cross-validation to tune:\n  - `n_estimators`\n  - `max_depth`\n  - `min_samples_split`\n  - `min_samples_leaf`\n\n###  Model Evaluation\n- **Cross-Validation**: 5-fold cross-validation on the training set.\n- **Testing**: A separate holdout test set, used to estimate performance on unseen data.\n\n**Metrics Used**:\n- Accuracy\n- Precision\n- Recall\n- F1 score\n- ROC AUC\n\n\n##  Results and Analysis\n\n| 
Metric | Value |\n|:-----|:------|\n| **Training cross-validation mean score** | ~90–91% |\n| **Test set accuracy** | ~91% |\n| **ROC AUC** | ~0.93 |\n| **Precision (attack class)** | Above 90% |\n| **Recall (attack class)** | Above 90% |\n\n\nConfusion matrix:\n\n![Confusion matrix](https://github.com/user-attachments/assets/5d34bca3-2378-43d7-8e2f-efecb029e859)\n\n\nROC curve:\n\n![ROC curve](https://github.com/user-attachments/assets/0c95c25d-76e0-4caa-a897-7c8ffc0772b2)\n\n\nFeature importance:\n\n![Feature importance](https://github.com/user-attachments/assets/64e0144e-237c-4515-8e7f-99e5e1938fd6)\n\n\n###  Detailed Insights\n\n- **High accuracy (~91%)**:\n  - Suggests the model generalizes well to unseen data in the held-out test set.\n- **High ROC AUC (~0.93)**:\n  - Indicates a strong ability to separate attack traffic from normal traffic across decision thresholds.\n- **Balanced precision and recall**:\n  - Both matter in cybersecurity: high recall means few real attacks go undetected, and high precision means few false alarms.\n\n###  Observations\n\n- SMOTE oversampling mitigated the class imbalance.\n- Feature selection improved both speed and accuracy.\n- The tuned Random Forest outperformed the simpler baseline models that were tried.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faleenagibi%2Fhoneytrap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faleenagibi%2Fhoneytrap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faleenagibi%2Fhoneytrap/lists"}