# nnDPI
This is the original implementation of the paper [nnDPI: A Novel Deep Packet Inspection Technique Using Word Embedding, Convolutional and Recurrent Neural Networks](https://ieeexplore.ieee.org/document/9257912).

School of Information Technology and Computer Science, Informatics Science Center, Nile University

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3946787.svg)](https://doi.org/10.5281/zenodo.3946787)

## What is nnDPI?
A high-performance deep packet inspection (DPI) AI model: a neural network with an embedding layer, 1D convolution layers and a bidirectional gated recurrent unit (GRU).

### How was it trained?
- nnDPI was trained on the VPN-nonVPN dataset (ISCXVPN2016), which consists of labeled network traffic, including full packets in pcap or pcapng format.
- As the dataset is imbalanced, the classes were weighted and all samples were used (no downsampling).
- The model was trained with TensorFlow Keras.
- The model has only 796,418 trainable parameters.
- Adam was used as the optimizer, with an initial learning rate of 0.001.
- The learning rate was reduced during training whenever the model stopped improving.
- The batch size was set to 3072, which is what fit into the memory of two RTX 2080 GPUs: each GPU handles 1536 samples per batch, and the two GPUs' results are then added together.
- Training lasted 70 epochs (with early stopping).

### Dataset (ISCXVPN2016)
- Dataset available at: https://www.unb.ca/cic/datasets/vpn.html
- Uncompressed size of the pcaps: ~26 GB
- After filtering out irrelevant packets, we had ~19.5 million packets for training.
- Stratified train/test/validation splits were taken: 80%, 10% and 10%, respectively.

### What traffic does the dataset contain?
- Web Browsing: Firefox and Chrome
- Email: SMTPS, POP3S and IMAPS
- Chat: ICQ, AIM, Skype, Facebook and Hangouts
- Streaming: Vimeo and YouTube
- File Transfer: Skype, FTPS and SFTP using FileZilla and an external service
- VoIP: Facebook, Skype and Hangouts voice calls (1h duration)
- P2P: uTorrent and Transmission (BitTorrent)

### Which architecture was used?
nnDPI uses a mix of neural network layers, including:
- Word Embedding
- 1D Convolutions
- Batch Normalization
- Max Pooling
- RNN (Bidirectional GRU)
- Dense

### How to preprocess packets?
```
python nndpi_preprocessing.py --n_jobs=1 --pcap_dir=./CompletePCAPs/ --processed_pcap_dir=./ProcessedPackets/ --max_len=1500 --one_df=False
```

- set `pcap_dir` to the location of the original pcap files; leave the default if you put the files in the `CompletePCAPs` dir.
- set `processed_pcap_dir` to the location where the newly preprocessed dataframes should be saved; leave the default to keep them in the `ProcessedPackets` dir.
- set `n_jobs` to the number of parallel workers (-1 to run workers on all cores, or 1 to run as a sequential process).
- set `max_len` to the length, in bytes, of each preprocessed packet.
- set `one_df` to True if you want to save all the processed packets into a single dataframe (be careful: this is expensive in RAM).

We have included a small sample of the dataset in this repo; to get the full dataset, download it from the link above.
We have also included the processed packets for this sample under the `ProcessedPackets` dir for review.

### How to train the model?
```
python nndpi_train.py --multi_gpu=True --batch_size=3072 --max_len=1500
```
- set `multi_gpu` to True if you want to parallelize training on a multi-GPU system.
- set `batch_size` to the training batch size.
- set `max_len` to the packet length specified during preprocessing.

Training takes time and resources. This model was trained on a multi-GPU system with 64 GB of RAM, so there was no need for generators: all the data fit into memory.
If you have limited RAM, you might need to either use a generator or take a sample of the data.

### How to use the model?
The model is available in the `./Model` dir in `.h5` format; you can load it with TF Keras.

With older TF versions, you might have problems loading the model directly because it was created with a multi-GPU strategy. If so, first create the model architecture with the `create_model` function, then call `model.load_weights`.

### DISCLAIMER
While we do our best to detect network traffic types from a single captured packet, we cannot guarantee that our software is error-free or 100% accurate in traffic detection.
Please respect the privacy of users and make sure you have proper authorization to listen to, capture and inspect network traffic.

## CITATION
If you use this code for a paper, please cite:
```
@INPROCEEDINGS{9257912,
  author={Bahaa, Mahmoud and Aboulmagd, Ayman and Adel, Khaled and Fawzy, Hesham and Abdelbaki, Nashwa},
  booktitle={2020 2nd Novel Intelligent and Leading Emerging Sciences Conference (NILES)},
  title={nnDPI: A Novel Deep Packet Inspection Technique Using Word Embedding, Convolutional and Recurrent Neural Networks},
  year={2020},
  volume={},
  number={},
  pages={165-170},
  doi={10.1109/NILES50944.2020.9257912}}
```
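The "How was it trained?" section says the classes were weighted to compensate for the imbalanced dataset, but it does not give the formula. A common choice, sketched below purely as an assumption, is inverse-frequency ("balanced") weighting, where each class gets `n_samples / (n_classes * count)`. This is the same rule scikit-learn's `compute_class_weight("balanced", ...)` applies. The helper name `balanced_class_weights` is hypothetical and not part of the nnDPI code.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def balanced_class_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Weight each class inversely to its frequency (hypothetical helper).

    weight_c = n_samples / (n_classes * count_c), so rare classes get
    weights above 1 and frequent classes get weights below 1.
    """
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


if __name__ == "__main__":
    y = [0] * 90 + [1] * 10  # toy, heavily imbalanced labels
    print(balanced_class_weights(y))
    # {0: 0.5555555555555556, 1: 5.0}
```

With this rule every class contributes the same total weight to the loss. A dictionary keyed by integer class index is the shape Keras accepts for the `class_weight` argument of `model.fit`.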
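The "How was it trained?" section also notes that the learning rate was reduced when training stopped improving, and that training used early stopping. In TensorFlow Keras this is usually configured with the `ReduceLROnPlateau` and `EarlyStopping` callbacks; the README does not list their settings. The plain-Python sketch below mirrors only the core logic of those two callbacks. The `factor`, `patience`, `stop_patience` and `min_lr` values are hypothetical, and features of the real callbacks such as `min_delta` and `cooldown` are left out.

```python
from dataclasses import dataclass, field


@dataclass
class PlateauSchedule:
    """Simplified reduce-on-plateau plus early-stopping logic.

    Hyperparameters are illustrative assumptions, not nnDPI's settings.
    """
    lr: float = 1e-3         # initial learning rate, as stated in the README
    factor: float = 0.5      # hypothetical: multiply lr by this on a plateau
    patience: int = 3        # hypothetical: stale epochs before reducing lr
    stop_patience: int = 10  # hypothetical: stale epochs before stopping
    min_lr: float = 1e-6     # hypothetical: lower bound for lr
    best: float = field(default=float("inf"), init=False)
    wait: int = field(default=0, init=False)   # stale epochs since last reduction
    stale: int = field(default=0, init=False)  # stale epochs since last improvement

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
            self.stale = 0
        else:
            self.wait += 1
            self.stale += 1
            if self.wait >= self.patience:
                # Plateau reached: shrink the learning rate and restart the count.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.stale >= self.stop_patience
```

Calling `step(val_loss)` once per epoch updates `lr` and returns `True` once the validation loss has not improved for `stop_patience` consecutive epochs.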
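The "Dataset (ISCXVPN2016)" section describes stratified 80%/10%/10% train/test/validation splits. Stratification means each class is split by the same proportions, so even rare traffic classes appear in every split. The sketch below applies this to sample indices in plain Python; the function name `stratified_split` and the seeded shuffling are assumptions, not taken from the nnDPI code.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable],
    train_frac: float = 0.8,
    test_frac: float = 0.1,
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Split sample indices into train/test/validation sets, class by class.

    Every class is shuffled and cut with the same fractions, so each split
    keeps roughly the original class proportions. Whatever is left after
    the train and test portions goes to validation (10% with the defaults).
    """
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train: List[int] = []
    test: List[int] = []
    val: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_frac)
        n_test = round(len(indices) * test_frac)
        train.extend(indices[:n_train])
        test.extend(indices[n_train:n_train + n_test])
        val.extend(indices[n_train + n_test:])
    return train, test, val


if __name__ == "__main__":
    y = [0] * 90 + [1] * 10  # toy, imbalanced labels
    train, test, val = stratified_split(y)
    print(len(train), len(test), len(val))  # 80 10 10
```

With scikit-learn installed, a similar result can be obtained with two calls to `train_test_split(..., stratify=...)`: one to hold out 20% of the data, and one to divide that 20% evenly into test and validation sets.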
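The "How to preprocess packets?" section produces fixed-length inputs of `max_len` bytes for the word-embedding layer. This README does not describe the internals of `nndpi_preprocessing.py`, so the following is only a minimal sketch of the idea. It assumes raw packet bytes are used directly as tokens: each byte is shifted by one so that token 0 can be reserved for padding, and packets are truncated or right-padded to `max_len`. The function `packet_to_sequence` and the `PAD_TOKEN` value are hypothetical.

```python
from typing import List

PAD_TOKEN = 0  # hypothetical: token 0 is reserved for padding


def packet_to_sequence(packet: bytes, max_len: int = 1500) -> List[int]:
    """Encode raw packet bytes as a fixed-length list of integer tokens.

    Each byte value b (0-255) becomes token b + 1, so it never collides
    with PAD_TOKEN. Packets longer than max_len are truncated; shorter
    ones are right-padded with PAD_TOKEN.
    """
    tokens = [b + 1 for b in packet[:max_len]]  # iterating bytes yields ints
    tokens.extend([PAD_TOKEN] * (max_len - len(tokens)))
    return tokens


if __name__ == "__main__":
    # A toy 4-byte "packet" encoded to length 6.
    print(packet_to_sequence(b"\x00\x01\xfe\xff", max_len=6))
    # [1, 2, 255, 256, 0, 0]
```

Under this assumed encoding the vocabulary has 257 tokens (256 byte values plus padding), which would be the `input_dim` of a Keras `Embedding` layer.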