{"id":34689063,"url":"https://github.com/githubfoam/unsw-nb15-anomaly-detection","last_synced_at":"2026-05-28T05:31:01.396Z","repository":{"id":308664748,"uuid":"1031886312","full_name":"githubfoam/unsw-nb15-anomaly-detection","owner":"githubfoam","description":null,"archived":false,"fork":false,"pushed_at":"2025-08-07T07:23:05.000Z","size":260,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-07T07:25:11.998Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/githubfoam.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-04T13:31:39.000Z","updated_at":"2025-08-06T12:57:56.000Z","dependencies_parsed_at":"2025-08-07T07:25:22.993Z","dependency_job_id":"46ccdbd5-7137-4b9a-9c63-b9f139e5dd9c","html_url":"https://github.com/githubfoam/unsw-nb15-anomaly-detection","commit_stats":null,"previous_names":["githubfoam/unsw-nb15-anomaly-detection"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/githubfoam/unsw-nb15-anomaly-detection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/githubfoam%2Funsw-nb15-anomaly-detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/githubfoam%2Funsw-nb15-anomaly-detection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/githubfoam%2Funsw-nb15-anomaly-detection/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/githubfoam%2Funsw-nb15-anomaly-detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/githubfoam","download_url":"https://codeload.github.com/githubfoam/unsw-nb15-anomaly-detection/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/githubfoam%2Funsw-nb15-anomaly-detection/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33596316,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-24T21:56:50.497Z","updated_at":"2026-05-28T05:31:01.390Z","avatar_url":"https://github.com/githubfoam.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# unsw-nb15-anomaly-detection\n[![Python CI](https://github.com/githubfoam/unsw-nb15-anomaly-detection/actions/workflows/python-ci.yml/badge.svg)](https://github.com/githubfoam/unsw-nb15-anomaly-detection/actions/workflows/python-ci.yml)\n\nUNSW-NB15 Anomaly Detection\n\nIntroduction\n\nThis repository contains a data science project for network anomaly detection using the UNSW-NB15 dataset. 
The project leverages an Isolation Forest model to identify suspicious network traffic patterns, which is a key task in cybersecurity.\n\nDataset\n\nThis project uses the UNSW-NB15 dataset, which was created by the IXIA PerfectStorm tool in the Cyber Range Lab of UNSW Canberra. The raw network packets were captured using the tcpdump tool, resulting in 100 GB of raw traffic. The dataset contains nine types of attacks, including Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms.\n\nUsing the Argus and Bro-IDS tools, 49 features were generated, which are described in the UNSW-NB15_features.csv file. The total dataset consists of over 2.5 million records, distributed across four CSV files: UNSW-NB15_1.csv, UNSW-NB15_2.csv, UNSW-NB15_3.csv, and UNSW-NB15_4.csv. Ground truth information is available in the UNSW-NB15_GT.csv file, and a list of events is in UNSW-NB15_LIST_EVENTS.csv. The dataset is also available in pre-partitioned training and testing sets.\n\nThe official source for the dataset and its documentation can be found here:\nhttps://research.unsw.edu.au/projects/unsw-nb15-dataset\n\nRepository Structure\n\nThe project's directory structure is organized as follows:\n\n.github/workflows/\n├── python-ci.yml           # Stable CI workflow\nnotebooks/\n├── isolation_forest_model.pkl    # Pre-trained Isolation Forest model\n├── scaler.pkl                  # Data scaler for preprocessing\n└── unsw_nb15_anomaly_detection.ipynb\n.gitignore\nLICENSE\nREADME.md\nrequirements.txt\n\n    .github/workflows/: Contains the GitHub Actions workflow files for Continuous Integration.\n\n    notebooks/: Holds the Jupyter notebook and the project's assets.\n\n        Why are they stored?\n        The isolation_forest_model.pkl (the trained model) and scaler.pkl (the data scaler) are stored to eliminate the need for time-consuming re-training and re-fitting processes. 
Training the model on the large UNSW-NB15 dataset is computationally intensive. By saving these files after they are created, we can ensure consistent results and a faster workflow.\n\n        How are they used?\n        These files are loaded directly into the notebook with joblib, the same library the notebook uses to save them. The scaler.pkl is loaded first to transform raw input data to the same scale the model was trained on. Then, the isolation_forest_model.pkl is loaded to make predictions on the preprocessed data, allowing for quick and efficient anomaly detection without repeating the initial setup steps.\n\n    requirements.txt: Lists all the necessary Python libraries for this project.\n\nCI/CD Workflow\n\nThis repository uses a GitHub Actions workflow defined in .github/workflows/python-ci.yml to ensure code quality. The workflow automatically:\n\n    Installs project dependencies.\n\n    Downloads the UNSW-NB15 dataset from a private Kaggle repository using encrypted repository secrets for secure authentication.\n\n    Executes the Jupyter notebook to confirm all cells run without errors.\n\nThe workflow is triggered on every push and pull request to the main branch and runs on a daily schedule to monitor for any regressions.\n\nJupyter Notebook Walkthrough\n\nThe unsw_nb15_anomaly_detection.ipynb notebook provides a complete, end-to-end data science pipeline for network anomaly detection. 
Here's a brief breakdown of what each section of the notebook does:\n\n    Data Loading: It first loads the four UNSW-NB15_*.csv files and merges them into a single pandas DataFrame, providing an initial look at the dataset's raw structure and size.\n\n    Preprocessing: This section prepares the data for modeling by handling categorical features with LabelEncoder and filling any missing values.\n\n    Feature Scaling: The preprocessed data is then scaled using StandardScaler, a crucial step for many machine learning algorithms to ensure all features are on a comparable scale.\n\n    Model Training: An Isolation Forest model is trained on the prepared data. This unsupervised learning algorithm is well suited to anomaly detection, as it works by isolating observations that are distinct from the majority.\n\n    Prediction and Analysis: After training, the notebook uses the model to predict anomalies and adds these predictions to the DataFrame. It then visualizes the results to show the number of detected anomalies.\n\n    Saving the Model: Finally, the notebook uses joblib to save the trained Isolation Forest model and the fitted StandardScaler to .pkl files. This step allows you to reuse the trained model without having to retrain it.\n\nHow to Use\n\nTo run this project locally, follow these steps:\n\nClone the repository:\n\n    git clone https://github.com/githubfoam/unsw-nb15-anomaly-detection.git\n\nNavigate to the project directory:\n\n    cd unsw-nb15-anomaly-detection\n\nCreate a virtual environment (recommended):\n\n    python -m venv venv\n    source venv/bin/activate\n\nInstall the required dependencies:\n\n    pip install -r requirements.txt\n\nLaunch Jupyter and open the notebook to get started. You can run the cells sequentially to reproduce the entire anomaly detection workflow.\n\n    jupyter notebook\n\nLicense\n\nThis project is licensed under the GNU General Public License v3.0 (GPL-3.0). See the LICENSE file for details.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgithubfoam%2Funsw-nb15-anomaly-detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgithubfoam%2Funsw-nb15-anomaly-detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgithubfoam%2Funsw-nb15-anomaly-detection/lists"}