{"id":48289438,"url":"https://github.com/dreizehnutters/pcapae","last_synced_at":"2026-04-04T23:01:37.809Z","repository":{"id":113548618,"uuid":"405400865","full_name":"dreizehnutters/pcapAE","owner":"dreizehnutters","description":"convGRU based autoencoder for unsupervised \u0026 spatial-temporal anomaly detection in computer network (PCAP) traffic.","archived":false,"fork":false,"pushed_at":"2024-02-16T16:31:17.000Z","size":28854,"stargazers_count":11,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-02-16T17:49:44.417Z","etag":null,"topics":["anomaly-detection","autoencoder","feature-learning","intrusion-detection","machine-learning","network","pcap","representation-learning"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2205.08953","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dreizehnutters.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2021-09-11T14:30:09.000Z","updated_at":"2024-02-16T16:24:52.000Z","dependencies_parsed_at":null,"dependency_job_id":"c70315e6-66c8-4c7d-93e6-de960269aa14","html_url":"https://github.com/dreizehnutters/pcapAE","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dreizehnutters/pcapAE","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dreizehnutters%2FpcapAE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dreizehnutters%2FpcapAE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dreize
hnutters%2FpcapAE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dreizehnutters%2FpcapAE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dreizehnutters","download_url":"https://codeload.github.com/dreizehnutters/pcapAE/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dreizehnutters%2FpcapAE/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31418287,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T20:09:54.854Z","status":"ssl_error","status_checked_at":"2026-04-04T20:09:44.350Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","autoencoder","feature-learning","intrusion-detection","machine-learning","network","pcap","representation-learning"],"created_at":"2026-04-04T23:01:31.414Z","updated_at":"2026-04-04T23:01:37.795Z","avatar_url":"https://github.com/dreizehnutters.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cpre\u003e\n  ______  ______  ______  ______     ______  ______ \n /\\  __ \\/\\  ___\\/\\  __ \\/\\  __ \\   /\\  __ \\/\\  ___\\ \n \\ \\  __/\\ \\ \\___\\ \\  __ \\ \\  __/   \\ \\  __ \\ \\  _\\_\n  \\ \\_\\   \\ \\_____\\ \\_\\ \\_\\ \\_\\      \\ \\_\\ \\_\\ \\_____\\ \n   \\/_/    \\/_____/\\/_/\\/_/\\/_/       
\\/_/\\/_/\\/_____/\n\u003c/pre\u003e\n\n\n__*Representation Learning for Content-Sensitive Anomaly Detection in Industrial Networks*__\n\nUsing a convGRU-based autoencoder, this thesis proposes a framework to learn spatial-temporal aspects of raw network traffic in an unsupervised and protocol-agnostic manner. The learned representations are used to measure the effect on the results of a subsequent anomaly detection and are compared to the application without the extracted features. The evaluation showed that the anomaly detection could not be effectively enhanced when applied to compressed traffic fragments in the context of network intrusion detection. Yet, the trained autoencoder successfully generates a compressed representation (code) of the network traffic, which holds spatial and temporal information. Based on the model's residual loss, the autoencoder is also capable of detecting anomalies by itself. Lastly, an approach for a kind of model interpretability (LRP) was investigated in order to identify relevant areas within the raw input data, which is used to enrich alerts generated by an anomaly detection method.\n\n_Master thesis submitted on 13.11.2021_\n\n\n[paper -\u003e https://arxiv.org/abs/2205.08953](https://arxiv.org/abs/2205.08953)\n---\n\n![AE](tex/slides/assets/exp.png)\n\n---\n* [__\u003cem\u003eMilestones\u003c/em\u003e__](#milestones)\n* [__\u003cem\u003eFile Structure\u003c/em\u003e__](#file-structure)\n* [__\u003cem\u003eInstall\u003c/em\u003e__](#install)\n\t* [_tshark_](#tshark2620)\n\t* [_python_](#python373)\n\t* [_torch_](#torch171)\n\t* [_Verify Installation_](#verify-installation)\n* [__\u003cem\u003eUsage\u003c/em\u003e__](#usage)\n    * [_Verify Framework_](#test-framework)\n    * [_Jupyter Notebook Demo_](#jupyter-notebook-demo)\n    * [_pcap2ds.py_](#pcap2dspy)\n    * [_main.py_](#mainpy)\n* [__\u003cem\u003eHyperparameters\u003c/em\u003e__](#hyperparameters)\n* [__\u003cem\u003eTested 
Hardware\u003c/em\u003e__](#tested-hardware)\n---\n\n## *Milestones*\n* **tex**\n\t* [x] finished paper\n* **talk**\n\t* [x] initial presentation\n\t* [x] TUC presentation\n\t* [x] thesis defense\n* **src**\n\t* [x] pcap -\u003e dataset\n\t* [x] dataloader\n\t* [x] pytorch convGRU AE\n\t* [x] anomaly detection\n\t* [x] evaluation\n\t* [x] visualization\n\t* [x] demo notebooks\n\t* [x] experiment helper script\n\t* [x] LRP\n* **experiments**\n\t* [x] baseline \n\t* [x] SWAT\n\t* [x] VOERDE\n---\n\n\n## *File Structure*\n\u003cpre\u003e\nthesis\n\t├── \u003ca href=\"src/\"\u003esrc\u003c/a\u003e\n\t│   ├── \u003ca href=\"src/lib/\"\u003elib\u003c/a\u003e                          \u003cins\u003e\u003ci\u003epcapAE Framework\u003c/i\u003e\u003c/ins\u003e\n\t│   │   ├── \u003ca href=\"src/lib/CLI.py\"\u003eCLI.py\u003c/a\u003e\u003cb\u003e...................handle argument passing\u003c/b\u003e\n\t│   │   ├── \u003ca href=\"src/lib/H5Dataset.py\"\u003eH5Dataset.py\u003c/a\u003e\u003cb\u003e.............data loading\u003c/b\u003e\n\t│   │   ├── \u003ca href=\"src/lib/ConvRNN.py\"\u003eConvRNN.py\u003c/a\u003e\u003cb\u003e...............convolutional recurrent cell implementation\u003c/b\u003e\n\t│   │   ├── \u003ca href=\"src/lib/decoder.py\"\u003edecoder.py\u003c/a\u003e\u003cb\u003e...............decoder logic\u003c/b\u003e\n\t│   │   ├── \u003ca href=\"src/lib/encoder.py\"\u003eencoder.py\u003c/a\u003e\u003cb\u003e...............encoder logic\u003c/b\u003e\n\t│   │   ├── \u003ca href=\"src/lib/model.py\"\u003emodel.py\u003c/a\u003e\u003cb\u003e.................network orchestration\u003c/b\u003e\n\t│   │   ├── \u003ca href=\"src/lib/pcapAE.py\"\u003epcapAE.py\u003c/a\u003e\u003cb\u003e................framework interface\u003c/b\u003e\n\t│   │   ├── \u003ca href=\"src/lib/earlystopping.py\"\u003eearlystopping.py\u003c/a\u003e\u003cb\u003e.........training heuristic\u003c/b\u003e\n\t│   │   └── \u003ca 
href=\"src/lib/utils.py\"\u003eutils.py\u003c/a\u003e\u003cb\u003e.................helper functions\u003c/b\u003e\n\t│   └── \u003ca href=\"requirements.txt\"\u003erequirements.txt\u003c/a\u003e\n\t│   │\n\t│   ├── \u003ca href=\"src/ad/\"\u003ead\u003c/a\u003e                           \u003cins\u003e\u003ci\u003escikit-learn AD Framework\u003c/i\u003e\u003c/ins\u003e\n\t│   │\t├── \u003ca href=\"src/ad/blueprints\"\u003eAD model blueprints\u003c/a\u003e\u003cb\u003e\u003c/b\u003e\n\t│   │\t├── \u003ca href=\"src/ad/ad.py\"\u003eAD.py\u003c/a\u003e\u003cb\u003e.. .................AD wrappers\u003c/b\u003e\n\t│   │\t└── \u003ca href=\"src/ad/utils.py\"\u003eutils.py\u003c/a\u003e\u003cb\u003e.................helper functions\u003c/b\u003e\n\t│   │\n\t│   ├── \u003ca href=\"src/main.py\"\u003emain.py\u003c/a\u003e\u003cb\u003e......................framework interaction\u003c/b\u003e\n\t│   ├── \u003ca href=\"src/pcap2ds.py\"\u003epcap2ds.py\u003c/a\u003e\u003cb\u003e...................convert packet captures to \u003ci\u003edatasets\u003c/i\u003e\u003c/b\u003e\n\t│   └── \u003ca href=\"src/test/test_install.py\"\u003etest_install.py\u003c/a\u003e\u003cb\u003e..............rudimentary installation test\u003c/b\u003e\n\t│\n\t├── \u003ca href=\"exp/\"\u003eexp\u003c/a\u003e                              \u003cins\u003e\u003ci\u003eExperiments\u003c/i\u003e\u003c/ins\u003e\n\t│   ├── \u003ca href=\"exp/data.ods\"\u003edata.ods\u003c/a\u003e\u003cb\u003e.....................results\u003c/b\u003e\n\t│   ├── \u003ca href=\"exp/dim_redu.py\"\u003edim_redu.py\u003c/a\u003e\u003cb\u003e..................experiment script\u003c/b\u003e\n\t│   └── \u003ca href=\"exp/exp_wrapper.sh\"\u003eexp_wrapper.sh\u003c/a\u003e\u003cb\u003e...............execute experiments\u003c/b\u003e\n\t│\n\t├── \u003ca href=\"notebooks/\"\u003etest\u003c/a\u003e             \t             \u003cins\u003e\u003ci\u003eJupyter Notebooks\u003c/i\u003e\u003c/ins\u003e\n\t│   ├── \u003ca 
href=\"notebooks/demo.ipynb\"\u003edemo.ipynb\u003c/a\u003e\u003cb\u003e...................Jupyter Notebook demo\u003c/b\u003e\n\t│   ├── \u003ca href=\"notebooks/visu_tool.ipynb\"\u003evisu_tool.ipynb\u003c/a\u003e\u003cb\u003e..............PCAP analysis\u003c/b\u003e\n\t│   └── \u003ca href=\"notebooks/LRP.ipynb\"\u003eLRP.ipynb\u003c/a\u003e\u003cb\u003e....................LRP Heatmap generation\u003c/b\u003e\n\t│\n\t├── \u003ca href=\"tex/\"\u003etex\u003c/a\u003e                              \u003cins\u003e\u003ci\u003eWriting\u003c/i\u003e\u003c/ins\u003e\n\t│   ├── \u003ca href=\"tex/slides/\"\u003eslides\u003c/a\u003e\n\t│   │\t├── \u003ca href=\"tex/slides/init.pdf\"\u003einit.pdf\u003c/a\u003e\u003cb\u003e..................init presentation\u003c/b\u003e\n\t│   │\t├── \u003ca href=\"tex/slides/mid.pdf\"\u003emid.pdf\u003c/a\u003e\u003cb\u003e...................mid presentation\u003c/b\u003e\n\t│   │\t└── \u003ca href=\"tex/slides/end.pdf\"\u003eend.pdf\u003c/a\u003e\u003cb\u003e...................defense presentation\u003c/b\u003e\n\t│   ├── \u003ca href=\"tex/main.pdf\"\u003emain.pdf\u003c/a\u003e\u003cb\u003e......................thesis paper\u003c/b\u003e\n\t│   └── \u003ca href=\"tex/papers.bib\"\u003epapers.bib\u003c/a\u003e\u003cb\u003e....................sources\u003c/b\u003e\n\t│\n\t├── \u003ca href=\"LICENSE\"\u003eLICENSE\u003c/a\u003e\n\t└── \u003ca href=\"README.md\"\u003eREADME.md\u003c/a\u003e\n\n\u003c/pre\u003e\n---\n\n\n## *Install*\n- #### tshark\u003e=2.6.20\n```bash\napt-get install tshark capinfos\n```\n\n- #### 3.9\u003epython\u003e=3.7.3\n```bash\napt-get install python3 pip3\npip install -r requirements.txt\n```\n\n- #### torch\u003e=1.7.1\n\tGPU=\u003e10.2\n\t```bash\n\tpip install torch torchvision\n\t```\n\tCPU ONLY\n\t```bash\n\tpip install torch==1.7.1+cpu torchvision==0.8.2+cpu torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html\n\t```\n\n- #### *Verify 
Installation*\n```bash\ncd src/test/\n./test_install.sh\n```\n---\n\n## *Usage*\n\n### [Test Framework](src/test/framework_test.sh)\n```bash\n./test_pcapAE.sh [--CUDA]\n```\n\n### [Jupyter Notebook Demo](notebooks/demo.ipynb)\n---\n\n### [__pcap2ds.py__](src/pcap2ds.py)\n\n__[+] generate h5 data set from a set of PCAPs__\n\n\n#### Arguments\n\n|short|long|default|help|\n| :---: | :---: | :---: | :---: |\n|`-h`|`--help`||show this help message and exit|\n|`-p`|`--pcap`|`None`|__path__ to pcap or list of pcap paths to process|\n|`-o`|`--out`|`None`|__path__ to output dir|\n|`-m`|`--modus`|`None`|preprocessing __mode__: [byte, packet, flow]|\n|`-g`|`--ground`|`None`|__path__ to optional evaluation packet level ground truth .csv|\n|`-n`|`--name`|`None`|data set optional __name__|\n|`-t`|`--threads`|`1`|__number__ of threads; __-1__ to use maximum|\n||`--chunk`|`1024`|square number __fragment__ size|\n||`--oneD`||process fragments in __one__ dimension|\n||`--force`||__force__ to delete output dir path|\n\n#### Usage\n```bash\npython3 pcap2ds.py -p \u003csome.pcap\u003e -o \u003cout_dir\u003e --chunk 1024 --modus byte [-g \u003cground_truth.csv\u003e]\n```\n---\n\n### [__main.py__](src/main.py)\n\n__[+] pcapAE API wrapper__\n\ntrain an autoencoder with a given h5 data set\n\n\n#### Arguments\n\n|short|long|default|help|options|\n| :---: | :---: | :---: | :---: | :---: |\n|`-h`|`--help`||show this help message and exit||\n|`-t`|`--train`||__path__ to dataset to learn||\n|`-v`|`--vali`||__path__ to dataset to validate||\n|`-f`|`--fit`||__path__ to data set to fit AD||\n|`-p`|`--predict`||__path__ to data to make predictions on||\n|`-m`|`--model`||__path__ to model to retrain or evaluate||\n|`-b`|`--batch_size`|`128`|__number__ of samples per pass|[2,32,512,1024]|\n|`-lr`|`--learn_rate`|`0.001`|starting learning __rate__|(0, 1]|\n|`-fi`|`--finput`|`1`|__number__ input frames|[1,3,5]|\n|`-o`|`--optim`|`adamW`|gradient descent __strategy__ | [adamW, adam, 
sgd]|\n|`-c`|`--clipping`|`10.0`|gradient clip __value__ | [0,10]|\n||`--fraction`|`1`|__fraction__ of data to process | (0, 1]|\n|`-w`|`--workers`|`0`|__number__ of data loader worker threads | [0, 8]|\n||`--loss`|`MSE`|loss __criterion__ | [MSE]|\n||`--scheduler`|`cycle`|learn rate __scheduler__ | [step ; cycle ; plateau]|\n||`--cell`|`GRU`|network cell __type__ | [GRU ; LSTM]|\n||`--epochs`|`144`|__number__ of epochs||\n||`--seed`|`1994`|__seed__ to fix randomness||\n||`--noTensorboard`||do not start tensorboard||\n||`--cuda`||enable GPU support||\n||`--verbose`||verbose output||\n||`--cache`||cache dataset to GPU||\n||`--retrain`||retrain given model||\n||`--name`||experiment __name__ prefix||\n||`--AD`||use AD framework||\n||`--grid_search`||use AD grid search||\n\n#### Usage\n```bash\n# pcapAE training\npython3 main.py --train \u003cTRAIN_SET_PATH\u003e --vali \u003cVALI_SET_PATH\u003e [--cuda]\n\n# pcapAE data compression (pcap -\u003e _codes_)\npython3 main.py --model \u003cPCAPAE_MODEL\u003e --fit \u003cFIT_SET_PATH\u003e --predict \u003cPREDICT_SET_PATH\u003e [--cuda]\n\n# shallow ML anomaly detection training\npython3 main.py --AD --model *.yaml --fit \u003cREDU_FIT_SET_PATH\u003e [--predict \u003cREDU_PREDICT_SET_PATH\u003e] [--grid_search]\n\n# test trained AD on new data\npython3 main.py --model \u003cAD_MODEL_PATH\u003e --predict \u003cREDU_SET_PATH\u003e\n\n# naive baseline\npython3 main.py --baseline pcapAE --model \u003cPCAPAE_MODEL\u003e --predict \u003cPREDICT_SET_PATH\u003e\n\n# raw baseline\npython3 main.py --baseline noDL --AD --model ../test/blueprints/base_if.yaml --fit \u003cFIT_SET_PATH\u003e --vali \u003cVALI_SET_PATH\u003e --predict \u003cPREDICT_SET_PATH\u003e\n\n```\n---\n\n\n\n## *Hyperparameters*\n* __Data__\n\t+ dataset = ```[VOERDE, SWaT] ```\n\t+ preprocessing = ```[byte, packet, flow] ```\n\t+ fragment size = ```[16**2, 32**2] ```\n\t+ sequence length = ```[1, 3, 5] ```\n* __Representation Learning__\n\t+ 
[optimizer](https://pytorch.org/docs/stable/optim.html) = ```[adamW, adam, SGD] ```\n\t+ [scheduler](https://pytorch.org/docs/stable/optim.html#torch.optim.lr_scheduler.OneCycleLR) = ```[step, cycle, plateau] ```\n\t+ [learn rate](https://miro.medium.com/max/2470/1*An4tZEyQAYgPAZl396JzWg.png) = ```[.1, .001] ```\n\t+ [cell](https://miro.medium.com/max/3032/1*yBXV9o5q7L_CvY7quJt3WQ.png) = ```[GRU, LSTM] ```\n\t+ [loss function](https://en.wikipedia.org/wiki/Loss_function) = ```[MSE, BCE] ```\n\t+ [batch size](https://research.nvidia.com/sites/default/files/publications/adabatch_logo_medium.png) = ```[2, 512, 1024] ```\n\t+ [max epochs](https://miro.medium.com/max/700/1*GXftMdKjyaLYuAIn-nB4zA.png) = ```[144] ```\n---\n\n\n## *Tested Hardware*\n* *Debian 10* | __Intel i5-6200U__ ~200 CUDA Cores\n* *Ubuntu 18.04* | __AMD EPYC 7552__ ~1500 CUDA Cores\n* *CentOS 7.9* | __GTX 1080__ 2560 CUDA Cores\n* *Ubuntu 20.04* | __RTX 3090__ 10496 CUDA Cores\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdreizehnutters%2Fpcapae","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdreizehnutters%2Fpcapae","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdreizehnutters%2Fpcapae/lists"}