[![arXiv](https://img.shields.io/badge/arXiv-Preprint-b31b1b.svg)](https://arxiv.org/abs/2509.16625) [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

# Self-Supervised Learning of Graph Representations for Network Intrusion Detection

This repository provides the official code and pretrained models for our paper, accepted at NeurIPS 2025.

<details>
<summary>Abstract</summary>
Detecting intrusions in network traffic is a challenging task, particularly under limited supervision and constantly evolving attack patterns. While recent works have leveraged graph neural networks for network intrusion detection, they often decouple representation learning from anomaly detection, limiting the utility of the embeddings for identifying attacks. We propose GraphIDS, a self-supervised intrusion detection model that unifies these two stages by learning local graph representations of normal communication patterns through a masked autoencoder. An inductive graph neural network embeds each flow with its local topological context to capture typical network behavior, while a Transformer-based encoder-decoder reconstructs these embeddings, implicitly learning global co-occurrence patterns via self-attention without requiring explicit positional information. During inference, flows with unusually high reconstruction errors are flagged as potential intrusions. This end-to-end framework ensures that embeddings are directly optimized for the downstream task, facilitating the recognition of malicious traffic. On diverse NetFlow benchmarks, GraphIDS achieves up to 99.98% PR-AUC and 99.61% macro F1-score, outperforming baselines by 5-25 percentage points.
</details>

<p align="center">
  <img src="figures/graph_repr.png" alt="Graph representation learning process" width="60%">
</p>

<p><em>Note: The main branch uses DGL. A PyTorch Geometric (PyG) implementation is available on the <a href="https://github.com/lorenzo9uerra/GraphIDS/tree/PyG">PyG branch</a>.</em></p>

## Requirements

To install the requirements, run:

```bash
conda env create -f environment.yml
```

Then activate the newly created conda environment and set the environment variables below. `CUBLAS_WORKSPACE_CONFIG` is needed for deterministic cuBLAS behavior in PyTorch (and therefore for reproducible runs), while `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` helps reduce CUDA memory fragmentation:

```bash
conda activate dgl
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
export CUBLAS_WORKSPACE_CONFIG=:4096:8
```

We use Weights & Biases (W&B) for experiment tracking. For review purposes, W&B runs in offline mode by default: no login is required and all logs are stored locally. To enable online mode, pass the `--wandb` argument.

The datasets can be downloaded from the NIDS datasets website: https://staff.itee.uq.edu.au/marius/NIDS_datasets/

After downloading each dataset zip file, extract it with the following command (`-j` discards the folder structure inside the archive, so all files are placed directly in `<dataset_name>/`):

```bash
unzip -d <dataset_name> -j <filename>.zip
```

For example, for the NF-UNSW-NB15-v3 dataset:

```bash
unzip -d NF-UNSW-NB15-v3 -j f7546561558c07c5_NFV3DATA-A11964_A11964.zip
```

NOTE: The dataset authors recently renamed the files of the NF-CSE-CIC-IDS2018-v2 and NF-CSE-CIC-IDS2018-v3 datasets to NF-CICIDS2018-v2 and NF-CICIDS2018-v3.\
To keep naming consistent with the literature, the code expects both the dataset directory and the dataset CSV file to be named after one of the four datasets considered: `NF-UNSW-NB15-v2`, `NF-UNSW-NB15-v3`, `NF-CSE-CIC-IDS2018-v2`, `NF-CSE-CIC-IDS2018-v3`.

## Training

To train GraphIDS, run:

```bash
python3 main.py --data_dir <data_dir> --config configs/<dataset_name>.yaml
```

`<data_dir>` should point to the directory containing all the datasets.
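Because the code looks for `<data_dir>/<dataset_name>/<dataset_name>.csv` and only recognizes the four dataset names listed above, downloads that use the newer `NF-CICIDS2018-*` names need to be renamed before training. The following Python sketch shows one way to do this. `prepare_dataset_dir` and the name mapping are hypothetical helpers, not part of the repository, and the sketch assumes the unzipped CSV is named after its directory (adjust it if your download differs).

```python
from pathlib import Path
from typing import Union

# The four dataset names the code expects (as listed in this README).
EXPECTED_NAMES = (
    "NF-UNSW-NB15-v2",
    "NF-UNSW-NB15-v3",
    "NF-CSE-CIC-IDS2018-v2",
    "NF-CSE-CIC-IDS2018-v3",
)

# Newer download names -> names expected by the code.
RENAMED = {
    "NF-CICIDS2018-v2": "NF-CSE-CIC-IDS2018-v2",
    "NF-CICIDS2018-v3": "NF-CSE-CIC-IDS2018-v3",
}


def prepare_dataset_dir(data_dir: Union[str, Path], name: str) -> Path:
    """Hypothetical helper: move <data_dir>/<name>/<name>.csv to
    <data_dir>/<dataset_name>/<dataset_name>.csv and return the CSV path.

    Assumes the CSV inside an unzipped directory is named after that directory.
    """
    data_dir = Path(data_dir)
    target = RENAMED.get(name, name)
    if target not in EXPECTED_NAMES:
        raise ValueError(f"unsupported dataset name: {name!r}")

    src_dir, dst_dir = data_dir / name, data_dir / target
    if src_dir != dst_dir and src_dir.is_dir():
        src_dir.rename(dst_dir)  # rename the dataset directory first

    src_csv, dst_csv = dst_dir / f"{name}.csv", dst_dir / f"{target}.csv"
    if src_csv != dst_csv and src_csv.is_file():
        src_csv.rename(dst_csv)  # then the CSV file inside it

    if not dst_csv.is_file():
        raise FileNotFoundError(f"expected dataset CSV not found: {dst_csv}")
    return dst_csv
```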
The code expects the directory structure found in the zip files, i.e., each CSV file must be located at `<data_dir>/<dataset_name>/<dataset_name>.csv`. For example, given the following directory structure:

```tree
data/
└── NF-UNSW-NB15-v3
    ├── FurtherInformation.txt
    ├── NF-UNSW-NB15-v3.csv
    ├── NetFlow_v3_Features.csv
    ├── bag-info.txt
    ├── bagit.txt
    ├── manifest-sha1.txt
    └── tagmanifest-sha1.txt
configs/
└── NF-UNSW-NB15-v3.yaml
```

you should run:

```bash
python3 main.py --data_dir data/ --config configs/NF-UNSW-NB15-v3.yaml
```

To change the training parameters, either edit the configuration file in the `configs/` directory or pass the parameters as command-line arguments. To list all available arguments, run:

```bash
python3 main.py --help
```

## Evaluation

The training command in the Training section also evaluates the model once training is complete. To evaluate a saved checkpoint without retraining, run:

```bash
python3 main.py --data_dir <data_dir> --config configs/<dataset_name>.yaml --checkpoint checkpoints/GraphIDS_<dataset_name>_<seed>.ckpt --test
```

## Pretrained Models

Because the models are lightweight, the pretrained checkpoints are included directly in this repository under the `pretrained/` directory. To evaluate a pretrained model, run:

```bash
python3 main.py --data_dir <data_dir> --config configs/<dataset_name>.yaml --checkpoint pretrained/GraphIDS_<dataset_name>.ckpt --test
```

Since model predictions depend on feature normalization, the code automatically loads the fitted MinMaxScaler from the `scalers/` directory.
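To see why the fitted scalers must be reused, consider min-max scaling, which rescales each feature with the training minimum and maximum as (x - min) / (max - min). The pure-Python sketch below uses a hypothetical `SimpleMinMaxScaler` as a stand-in for the `MinMaxScaler` objects stored in `scalers/`; it is not the repository's code. As in scikit-learn, features with zero range are divided by 1 instead of 0. Refitting on test data produces different model inputs for the same flows.

```python
from typing import List, Sequence


class SimpleMinMaxScaler:
    """Hypothetical per-feature min-max scaler mapping training data to [0, 1]."""

    def __init__(self, data_min: List[float], data_max: List[float]) -> None:
        self.data_min = data_min
        self.data_max = data_max

    @classmethod
    def fit(cls, rows: Sequence[Sequence[float]]) -> "SimpleMinMaxScaler":
        # Column-wise minimum and maximum, computed on training rows only.
        columns = list(zip(*rows))
        return cls([min(c) for c in columns], [max(c) for c in columns])

    def transform(self, rows: Sequence[Sequence[float]]) -> List[List[float]]:
        out = []
        for row in rows:
            scaled = []
            for x, lo, hi in zip(row, self.data_min, self.data_max):
                span = (hi - lo) or 1.0  # zero-range features: divide by 1
                scaled.append((x - lo) / span)
            out.append(scaled)
        return out


# Fit once on training flows, then reuse the same scaler for test flows.
train = [[0.0, 100.0], [10.0, 100.0]]
test = [[5.0, 100.0], [20.0, 101.0]]
scaler = SimpleMinMaxScaler.fit(train)
test_scaled = scaler.transform(test)  # test values may fall outside [0, 1]
# Refitting on the test set silently changes the inputs the model sees.
refit_scaled = SimpleMinMaxScaler.fit(test).transform(test)
```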
The provided scalers were fitted on the training data and must be reused unchanged to correctly evaluate the pretrained models.

## Results

GraphIDS achieves the following performance on the four datasets:

### [NF-UNSW-NB15-v3](https://rdm.uq.edu.au/files/abd2f5d8-e268-4ff0-84fb-f2f7b3ca3e8f)

| Model name         |  Macro F1-score  |  Macro PR-AUC  |
| ------------------ | ---------------- | -------------- |
| GraphIDS           |      99.61%      |      99.98%    |

### [NF-CSE-CIC-IDS2018-v3](https://rdm.uq.edu.au/files/4ac221b1-6bd6-42b1-bdf7-03f4fc7efb22)

| Model name         |  Macro F1-score  |  Macro PR-AUC  |
| ------------------ | ---------------- | -------------- |
| GraphIDS           |      94.47%      |      88.19%    |

### [NF-UNSW-NB15-v2](https://rdm.uq.edu.au/files/8c6e2a00-ef9c-11ed-827d-e762de186848)

| Model name         |  Macro F1-score  |  Macro PR-AUC  |
| ------------------ | ---------------- | -------------- |
| GraphIDS           |      92.64%      |      81.16%    |

### [NF-CSE-CIC-IDS2018-v2](https://rdm.uq.edu.au/files/ce5161d0-ef9c-11ed-827d-e762de186848)

| Model name         |  Macro F1-score  |  Macro PR-AUC  |
| ------------------ | ---------------- | -------------- |
| GraphIDS           |      94.31%      |      92.01%    |

Results are averaged over multiple random seeds.
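As a rough illustration of how these metrics relate to the model's output: flows with unusually high reconstruction errors are flagged as intrusions, so a threshold turns errors into predictions for the macro F1-score, while PR-AUC can be computed directly from the errors. The pure-Python sketch below assumes binary labels (0 = benign, 1 = attack), a given threshold, "macro" as the unweighted mean over both classes, and PR-AUC computed as average precision, with negated errors used as benign-class scores. These are illustrative assumptions; the repository's evaluation code and threshold selection may differ.

```python
from typing import List, Sequence


def flag_intrusions(errors: Sequence[float], threshold: float) -> List[int]:
    """Label a flow as an attack (1) when its reconstruction error exceeds the threshold."""
    return [int(e > threshold) for e in errors]


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1-scores for classes 0 and 1."""
    scores = []
    for cls in (0, 1):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / 2


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (positives have label 1)."""
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(pairs):
        # Consume all samples tied at this score before evaluating the threshold.
        score = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == score:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


def macro_pr_auc(y_true: Sequence[int], errors: Sequence[float]) -> float:
    """Mean AP of the attack class (score = error) and benign class (score = -error)."""
    ap_attack = average_precision(y_true, errors)
    ap_benign = average_precision([1 - t for t in y_true], [-e for e in errors])
    return (ap_attack + ap_benign) / 2
```

Sorting makes `average_precision` O(n log n); for datasets with millions of flows, scikit-learn's `average_precision_score` and `f1_score(..., average="macro")` are the usual vectorized alternatives.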
## Citation

If you find this work useful, please consider citing our paper:

```bibtex
@misc{guerra2025selfsupervisedlearninggraphrepresentations,
      title={Self-Supervised Learning of Graph Representations for Network Intrusion Detection},
      author={Lorenzo Guerra and Thomas Chapuis and Guillaume Duc and Pavlo Mozharovskyi and Van-Tam Nguyen},
      year={2025},
      eprint={2509.16625},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2509.16625},
}
```

## License

All original components of this repository are licensed under the [Apache License 2.0](./LICENSE). Third-party components are used in compliance with their respective licenses.

### Third-Party Code

- `baselines/Anomal-E/`: Contains modified components from the [Anomal-E repository](https://github.com/waimorris/Anomal-E/), which is licensed under the Apache License 2.0.

- `baselines/SAFE/`: Because the original SAFE repository has no formal license, we do **not redistribute its code**. Instead, this directory contains instructions and tooling to apply our changes externally, and users must manually download the original SAFE code from its source (https://github.com/ElvinLit/SAFE/). We have received explicit permission from the SAFE authors to use their code for research purposes.