{"id":20101024,"url":"https://github.com/fkie-cad/ipal_datasets","last_synced_at":"2025-05-06T06:32:55.616Z","repository":{"id":88149066,"uuid":"423843160","full_name":"fkie-cad/ipal_datasets","owner":"fkie-cad","description":"Industrial datasets - datasets for evaluating industrial intrusion detection systems on IPAL.","archived":false,"fork":false,"pushed_at":"2025-04-22T13:55:06.000Z","size":5652,"stargazers_count":43,"open_issues_count":0,"forks_count":4,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-22T14:49:50.668Z","etag":null,"topics":["datasets","electra","elegant","hai","ids","iec-104","ipal","lemay","modbus","s7","swat","wadi"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fkie-cad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-11-02T12:54:53.000Z","updated_at":"2025-04-22T13:54:20.000Z","dependencies_parsed_at":"2023-11-08T13:27:18.241Z","dependency_job_id":"7e7a77cb-47e2-4f95-8580-44b136fc2f4c","html_url":"https://github.com/fkie-cad/ipal_datasets","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fkie-cad%2Fipal_datasets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fkie-cad%2Fipal_datasets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fkie-cad%2Fipal_datasets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/fkie-cad%2Fipal_datasets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fkie-cad","download_url":"https://codeload.github.com/fkie-cad/ipal_datasets/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252633796,"owners_count":21779922,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datasets","electra","elegant","hai","ids","iec-104","ipal","lemay","modbus","s7","swat","wadi"],"created_at":"2024-11-13T17:22:57.566Z","updated_at":"2025-05-06T06:32:55.586Z","avatar_url":"https://github.com/fkie-cad.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# IPAL - Datasets\n\n\u003cimg src=\"./misc/Logo.png\" alt=\"Logo\" width=\"100\" height=\"auto\" align=\"right\"\u003e\n\nThis repository is part of [IPAL](https://github.com/fkie-cad/ipal)  - an Industrial Protocol Abstraction Layer. IPAL aims to establish an abstract representation of industrial network traffic for subsequent unified and protocol-independent industrial intrusion detection. IPAL consists of a [transcriber](https://github.com/fkie-cad/ipal_transcriber) to automatically translate industrial traffic into the IPAL representation, an [IDS Framework](https://github.com/fkie-cad/ipal_ids_framework) implementing various industrial intrusion detection systems (IIDSs), and a collection of evaluation [datasets](https://github.com/fkie-cad/ipal_datasets). 
For details about IPAL, please refer to our publications listed below.\n\nThis repository contains a collection of datasets for evaluating industrial IDS. To that end, it provides scripts to convert (transcribe) existing datasets into the IPAL format. It does \u003cu\u003enot\u003c/u\u003e contain the raw datasets or the datasets transcribed into IPAL. **We merely use placeholders which can be replaced after obtaining the original datasets from the respective publishers** (see the links in the table below).\n\n| Dataset                 | Type                | Notes                                                        | Link                                                         |\n| ----------------------- | ------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |\n| BATADAL                   | State | Dataset from the BATtle of the Attack Detection ALgorithms against a Water Distribution System | [BATADAL](http://www.batadal.net/data.html) |\n| ELEGANT                 | Packet (Modbus)     | The ELEGANT dataset consists of a MiTM and a DoS part. So far, we consider only the MiTM part and not the DoS part. | [IEEE Dataport](https://ieee-dataport.org/open-access/denial-service-and-man-middle-attacks-programmable-logic-controllers) |\n| Electra                 | Packet (Modbus, S7) | Not all IPAL features are present, e.g., crc or length are missing. Also, the request data/address fields are not always correct. We skip a few duplicated packets. | [Website](http://perception.inf.um.es/electra/)             |\n| Energy Dataset          | Packet (IEC-104) | A short PCAP of the WATTSON simulator from Fraunhofer FKIE. We use the manipulateTraces tool from the DTMC IDS paper to add attacks to the WATTSON PCAP. 
| [Paper](https://dl.acm.org/doi/pdf/10.1145/3372297.3420016), [manipulateTraces](https://github.com/jjchromik/manipulateTraces), [DTMC Paper](https://doi.org/10.1007/978-3-319-74947-1_4) |\n| GeekLounge              | Packet (S7)         | The dataset does not contain any attacks. We added attacks according to the description of a paper. This results in 6 datasets with 3 attack types each on requests and responses of S7 packets. | [Website](https://www.netresec.com/?page=PCAP4SICS), [Paper](https://doi.org/10.1007/978-3-319-99843-5_5) |\n| HAI                     | State               | The dataset contains three training and five test files. Train and test are not in linear time order and have overlapping time regions. | [Github](https://github.com/icsdataset/hai/tree/master/hai-21.03) |\n| IEC61850SecurityDataset | Packet (GOOSE)      |                                                              | [Github](https://github.com/smartgridadsc/IEC61850SecurityDataset) |\n| Lemay                   | Packet (Modbus)     | Most attacks are not performed with Modbus and use different protocols not relevant for the transcriber. | [Paper](https://www.usenix.org/conference/cset16/workshop-program/presentation/lemay) [Github](https://github.com/antoine-lemay/Modbus_dataset) |\n| MorrisDS1               | State     | There exist different versions of the dataset (binary, ternary, or multiclass labels). We use the multiclass dataset. | [Website](https://sites.google.com/a/uah.edu/tommy-morris-uah/ics-data-sets) |\n| MorrisDS4               | Packet (Modbus)     | There are minor differences between the Raw and Arff dataset. These differences affect only the attack packets. Default: Use the Arff dataset. 
| [Website](https://sites.google.com/a/uah.edu/tommy-morris-uah/ics-data-sets) |\n| PowerDuck               | Packet (GOOSE)     |  | [Paper](https://doi.org/10.1145/3546096.3546102) |\n| QUT\\_DNP3              | Packet (DNP3, GOOSE)         |  | [Git](https://github.com/qut-infosec/2017QUT_DNP3) [Thesis](https://eprints.qut.edu.au/121760/1/Nicholas_Rodofile_Thesis.pdf) |\n| QUT\\_S7\\_Myers            | Packet (S7)         |  TODO: Check Rules | [Dataset](https://cloudstor.aarnet.edu.au/plus/index.php/s/9qFfeVmfX7K5IDH) [Paper](https://research-repository.griffith.edu.au/bitstream/handle/10072/385711/FOO229943.pdf?sequence=1) |\n| QUT\\_S7comm            | Packet (S7)        |  | [Dataset](https://github.com/qut-infosec/2017QUT_S7comm) [Paper](https://link.springer.com/chapter/10.1007/978-3-319-59870-3_30) |\n| Sherlock (v1)          | State (and IEC-104) | Contains three differently sized scenarios of power grids. | [Website](https://sherlock.wattson.it/) [Paper](https://www.comsys.rwth-aachen.de/fileadmin/papers/2025/2025-wagner-sherlock.pdf) |\n| SWaT                    | State               | The attack dataset has an 81s gap, which we fill with the previous state. The first 1800s are often skipped in literature. Version 0 of SWaT has a slightly different start of the training data. | [iTrust](https://itrust.sutd.edu.sg/itrust-labs-home/itrust-labs_swat/) |\n| TEP-PASAD               | State               | The dataset consists of 5 different scenarios. Each scenario has its own training and test part combined in one single file. | [Github](https://github.com/mikeliturbe/pasad/tree/master/data) |\n| WADI                    | State               | WADI has a large gap in the training data of ~73h. Note: we use the row number as index for the timestamp since WADI has a challenging time notation. 
| [iTrust](https://itrust.sutd.edu.sg/itrust-labs-home/itrust-labs_wadi/) |\n| WDT | Packet \u0026 State (Modbus) |  | [Paper](https://doi.org/10.1109/ACCESS.2021.3109465) |\n\n###### Publications\n\n- Konrad Wolsing, Eric Wagner, Antoine Saillard, and Martin Henze. 2022. IPAL: Breaking up Silos of Protocol-dependent and Domain-specific Industrial Intrusion Detection Systems. In 25th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2022), October 26–28, 2022, Limassol, Cyprus. ACM, New York, NY, USA, 17 pages. [https://doi.org/10.1145/3545948.3545968](https://doi.org/10.1145/3545948.3545968)\n- Wolsing, Konrad, Eric Wagner, and Martin Henze. \"Poster: Facilitating Protocol-independent Industrial Intrusion Detection Systems.\" *Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security*. 2020. [https://doi.org/10.1145/3372297.3420019](https://doi.org/10.1145/3372297.3420019)\n\n## Getting started\n\nIf you are new to IPAL and want to learn about the general idea or try out our tutorials, please refer to IPAL's main repository: [https://github.com/fkie-cad/ipal](https://github.com/fkie-cad/ipal).\n\n###### Prerequisites\n\nTranscribing the datasets requires the `ipal-transcriber` and `tshark` to be installed (see [IPAL - Transcriber](https://github.com/fkie-cad/ipal_transcriber) and https://tshark.dev/setup/install/).\n\nOn certain operating systems, running all available scripts might require additional dependencies. 
\nEnsure that the following commands are available:\n- [pv](https://www.ivarch.com/programs/pv.shtml)\n- `gzip` and `gunzip` from the [gzip](https://www.gnu.org/software/gzip/) project or an alternative implementation with similar features\n- `bash` from the [Bash](https://www.gnu.org/software/bash/) project\n\n##### Install\n\n- After cloning the repository, initialise Git's submodules with `git submodule init` and `git submodule update`.\n\n- To transcribe a dataset into IPAL, one needs to obtain a copy of the original dataset, e.g., from the source listed in the table above. This dataset needs to be placed under `[dataset-name]/raw/`.\n- Use the `transcribe.sh` or `transcribe.py` scripts to convert the dataset into IPAL. The dataset will be exported to `[dataset-name]/ipal`.\n\n## Development\n\n##### Tooling\n\nThe set of tools used for development, code formatting, style checking, and testing can be installed with the following command:\n\n```bash\npython3 -m pip install -r requirements-dev.txt\n```\n\nAll tools can be executed manually with the following commands and report errors if encountered:\n\n```bash\nblack .\nflake8\npython3 -m pytest\n```\n\nA `black` and `flake8` check of modified files before any commit can also be enforced using Git's pre-commit hook functionality:\n\n```bash\npre-commit install\n```\n\nMore information on the `black` and `flake8` setup can be found at https://ljvmiranda921.github.io/notebook/2018/06/21/precommits-using-black-and-flake8/\n\n## Contributors\n\n- Konrad Wolsing (Fraunhofer FKIE \u0026 RWTH Aachen University)\n- Sven Zemanek (Fraunhofer FKIE)\n- Dominik Kus (RWTH Aachen University)\n\n## License\n\nMIT License. See LICENSE for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffkie-cad%2Fipal_datasets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffkie-cad%2Fipal_datasets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffkie-cad%2Fipal_datasets/lists"}