{"id":18449543,"url":"https://github.com/kianoushamirpour/intrusion_detection_with_unsupervised_learning","last_synced_at":"2025-04-16T19:42:29.897Z","repository":{"id":166292032,"uuid":"541084364","full_name":"KianoushAmirpour/Intrusion_Detection_with_Unsupervised_Learning","owner":"KianoushAmirpour","description":"Using unsupervised learning methods to detect anomalies in a system based on logs collected in real-time from the log aggregation systems of an enterprise.","archived":false,"fork":false,"pushed_at":"2023-08-25T18:18:12.000Z","size":3428,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-16T13:35:11.197Z","etag":null,"topics":["bot-detection","unsupervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KianoushAmirpour.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-25T07:04:13.000Z","updated_at":"2024-02-22T05:31:45.000Z","dependencies_parsed_at":"2024-11-06T07:34:11.497Z","dependency_job_id":null,"html_url":"https://github.com/KianoushAmirpour/Intrusion_Detection_with_Unsupervised_Learning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KianoushAmirpour%2FIntrusion_Detection_with_Unsupervised_Learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KianoushAmirpour%2FIntrusion_Detection_with_Unsupervised_Lear
ning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KianoushAmirpour%2FIntrusion_Detection_with_Unsupervised_Learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KianoushAmirpour%2FIntrusion_Detection_with_Unsupervised_Learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KianoushAmirpour","download_url":"https://codeload.github.com/KianoushAmirpour/Intrusion_Detection_with_Unsupervised_Learning/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249128348,"owners_count":21217126,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bot-detection","unsupervised-learning"],"created_at":"2024-11-06T07:20:32.779Z","updated_at":"2025-04-15T18:28:09.470Z","avatar_url":"https://github.com/KianoushAmirpour.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Intrusion detection with unsupervised learning\n\nThis is the final project for the advanced machine learning course presented by [Rahnema College](https://rahnemacollege.com/). In this project, we were tasked with identifying intrusions in a system by analyzing its logs. 
Since ground-truth labels for anomalous behavior weren't provided, we had to use unsupervised anomaly detection methods.\n\n## Dataset\nBecause the dataset cannot be shared publicly, we've included a few samples below to give you an idea of its content.\n\n* 207.213.193.143 [2021-5-12T5:6:0.0+0430] [Get /cdn/profiles/1026106239] 304 0 [[Googlebot-Image/1.0]] 32\n* 207.213.193.143 [2021-5-12T5:6:0.0+0430] [Get images/badge.png] 304 0 [[Googlebot-Image/1.0]] 4\n\n## Project structure\n- EDA\n  - Data_Cleaning_and_Basic_EDA.ipynb\n  - Distributions.ipynb\n  - Feature_Generation_and_EDA_based_on_them.ipynb\n\n- modes\n  - AutoEncoder.ipynb\n  - Gaussian_Mixture_Models.ipynb\n  - IsolationForest.ipynb\n\n- utils\n  - Gaussian_mixture_from_scratch.py\n  - build_features.py\n  - scraping_crawlers.py\n  - utils.py\n\n## Workflow\n- Data Cleaning and EDA:\n  - We cleaned the data by removing unnecessary characters, correcting data types, and identifying and handling missing values, using visualizations to guide each step.\n- Finding Sessions:\n  - We identified sessions for each unique pair of IP address and user agent, using a 30-minute interval to separate two consecutive sessions.\n- Feature Engineering:\n  - Number of requests (`num_requests`)\n  - Image-to-request ratio\n  - Percentage of `4xx` error responses\n  - Percentage of `HTTP` requests of type `HEAD`\n  - Standard deviation of the requested page’s depth\n  - Percentage of consecutively repeated `HTTP` requests\n  - Average and sum of response length and response time for each session\n  - Session duration\n  - Average time per page\n  - `robots.txt` file request\n- Scraped Well-Known Crawlers.\n- Data Transformation Experimentation:\n  - We experimented with various data transformation techniques, including Power, Quantile, Logarithmic, Reciprocal, Square Root, Exponential, and Box-Cox transformations.\n- Anomaly Detection:\n  - Isolation Forest\n  - Gaussian mixture models\n  - 
Autoencoders\n\n\n\n\n   \n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkianoushamirpour%2Fintrusion_detection_with_unsupervised_learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkianoushamirpour%2Fintrusion_detection_with_unsupervised_learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkianoushamirpour%2Fintrusion_detection_with_unsupervised_learning/lists"}