{"id":23359906,"url":"https://github.com/ait-aecid/kyoushi-environment","last_synced_at":"2026-01-20T20:34:03.835Z","repository":{"id":41957618,"uuid":"438956136","full_name":"ait-aecid/kyoushi-environment","owner":"ait-aecid","description":"Scripts to deploy virtual testbed for log data analysis and anomaly detection.","archived":false,"fork":false,"pushed_at":"2023-11-21T15:04:40.000Z","size":8757,"stargazers_count":23,"open_issues_count":5,"forks_count":5,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-07T20:54:25.114Z","etag":null,"topics":["anomaly-detection","cyber-attacks","data-mining","hids","ids","kyoushi","log-data","logs","monitoring","nids","security","simulation"],"latest_commit_sha":null,"homepage":"","language":"Jinja","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ait-aecid.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-16T10:54:24.000Z","updated_at":"2025-03-14T02:30:30.000Z","dependencies_parsed_at":"2024-12-21T11:12:29.728Z","dependency_job_id":"5be3b5f1-1e28-49e1-8e54-c14f273c6158","html_url":"https://github.com/ait-aecid/kyoushi-environment","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ait-aecid/kyoushi-environment","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ait-aecid%2Fkyoushi-environment","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ait-aecid%2Fkyoushi-environment/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/ait-aecid%2Fkyoushi-environment/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ait-aecid%2Fkyoushi-environment/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ait-aecid","download_url":"https://codeload.github.com/ait-aecid/kyoushi-environment/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ait-aecid%2Fkyoushi-environment/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28612993,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-20T18:56:40.769Z","status":"ssl_error","status_checked_at":"2026-01-20T18:54:26.653Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","cyber-attacks","data-mining","hids","ids","kyoushi","log-data","logs","monitoring","nids","security","simulation"],"created_at":"2024-12-21T11:12:23.438Z","updated_at":"2026-01-20T20:34:03.816Z","avatar_url":"https://github.com/ait-aecid.png","language":"Jinja","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Kyoushi Testbed Environment\nThis tool makes it possible to generate labeled log datasets in simulation testbeds for security evaluations, e.g., IDSs, alert aggregation, or federated learning. 
\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://media.githubusercontent.com/media/ait-aecid/kyoushi-environment/main/img/kyoushi_logo.png\" width=25% height=25%\u003e\u003c/p\u003e\n\nThe testbed simulates an enterprise IT network, involving mail servers, file share, firewall, intranet, DMZ, DNS, VPN, etc. Log data is collected from many sources, including network traffic, apache access logs, DNS logs, syslog, authentication logs, audit logs, suricata logs, exim/mail logs, monitoring logs, etc. The Kyoushi testbed was used to generate the following publicly available log datasets:\n\n * [AIT-LDSv1](https://zenodo.org/record/4264796)\n * [AIT-LDSv2](https://zenodo.org/record/5789064)\n * [Kyoushi LDS](https://zenodo.org/record/5779411)\n \n# Overview\n \nThe Kyoushi Testbed comprises a network with three zones: Intranet, DMZ, and Internet. Ubuntu VMs that simulate employees are located in all zones, where remote employees access the Intranet through a VPN connection. Employees utilize the Horde Mail platform, access the WordPress platform, share files, browse the web, and access the servers via SSH, while external users only send and respond to mails. The following figure shows an overview of the network.\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://media.githubusercontent.com/media/ait-aecid/kyoushi-environment/main/img/kyoushi_network.png\" width=50% height=50%\u003e\u003c/p\u003e\n \nSeveral attacks are launched against the network from an attacker host. Thereby, the attacker gets access to the infrastructure through stolen VPN credentials. 
The following attacks are implemented:\n\n * Scans (nmap, WPScan, dirb)\n * Webshell upload (CVE-2020-24186)\n * Password cracking (John the Ripper)\n * Privilege escalation\n * Remote command execution\n * Data exfiltration (DNSteal)\n \n## Getting Started\n\nThis is the main repository for the Kyoushi Testbed Environment that contains all models of the testbed infrastructure; it relies on several other repositories that are responsible for generating testbeds from the models (kyoushi-generator), running user and attacker simulations (kyoushi-simulation), collecting and labeling log data (kyoushi-dataset), etc. The following figure provides a rough overview of how all involved repositories work together.\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://media.githubusercontent.com/media/ait-aecid/kyoushi-environment/main/img/kyoushi_repos.png\" width=50% height=50%\u003e\u003c/p\u003e\n\nThe following instructions cover the whole procedure to create a testbed and collect log data from scratch. *Please note*: The Kyoushi Testbed Environment is designed for deployment on cloud infrastructure and will require at least 30 VCPUs, 800 GB of disk space, and 60 GB of RAM. This getting-started relies on OpenStack, Ansible, and Terragrunt, and assumes that the user is experienced with infrastructure/software provisioning. We tested the getting-started in a local OpenStack infrastructure as well as an OVH cloud platform. Note that for OVH deployment, it is necessary to use an account with maximum privileges, deploy the kyoushi environment on a GRA9 project (because GRA9 has some required beta features), and add the project to vracks. 
In the following, we use a local installation as the default and outline the changes necessary for public cloud deployment.\n\nFor the instructions stated in this getting-started, we assume that the following packages are installed in the correct versions:\n\n```\nPython 3.8.5\nPoetry 1.1.7\nTerraform v1.1.7\nterragrunt version v0.31.3\nansible [core 2.11.5]\npacker 1.8.0\n```\n\n### Generating a Testbed from the Models\n\nFirst, create and switch into a directory named kyoushi and check out the kyoushi-environment (this repository):\n\n```bash\nuser@ubuntu:~$ mkdir kyoushi\nuser@ubuntu:~$ cd kyoushi\nuser@ubuntu:~/kyoushi$ git clone https://github.com/ait-aecid/kyoushi-environment.git\nCloning into 'kyoushi-environment'...\n```\n\nThe kyoushi-environment contains all models of the testbed infrastructure. These models make it possible to generate many different testbeds that vary in size and configuration. Testbed parameters that are subject to change include IP addresses of hosts, the number of simulated users, as well as their names and behavior profiles. Most of these parameters are set in the `context.yml.j2` file. Here you can specify the number of user hosts (default: 2 internal, 2 remote, and 2 external users), the number of external mail servers (default: 1), and the times when attacks are carried out. 
For now, set the `kyoushi_attacker_start` and `dnsteal_endtime` variables to some point in time in the near future, e.g., the following day.\n\n```bash\nuser@ubuntu:~/kyoushi$ cat /home/user/kyoushi/kyoushi-environment/model/context.yml.j2\n{% set employees_internal_count = gen.random.randint(2, 2) %}\n{% set employees_remote_count = gen.random.randint(2, 2) %}\n{% set ext_mail_users_count = gen.random.randint(2, 2) %}\n{% set mail_servers_external_count = gen.random.randint(1, 1) %}\nkyoushi_attacker_start: 2021-10-04T{{ (gen.random.randint(9, 14) | string()).zfill(2) }}:{{ (gen.random.randint(0, 59) | string()).zfill(2) }}\nkyoushi_attacker_escalate_start: +P00DT{{ (gen.random.randint(3, 4) | string()).zfill(2) }}H{{ (gen.random.randint(0, 59) | string()).zfill(2) }}M\ndnsteal_endtime: 2021-10-02T{{ (gen.random.randint(9, 18) | string()).zfill(2) }}:{{ (gen.random.randint(0, 59) | string()).zfill(2) }}\n```\n\nThe [kyoushi-generator](https://github.com/ait-aecid/kyoushi-generator) transforms the infrastructure models from the kyoushi-environment into setup scripts that are ready for deployment. Clone the kyoushi-generator as follows and install it using poetry:\n\n```bash\nuser@ubuntu:~/kyoushi$ git clone https://github.com/ait-aecid/kyoushi-generator.git\nCloning into 'kyoushi-generator'...\nuser@ubuntu:~/kyoushi$ cd kyoushi-generator/\nuser@ubuntu:~/kyoushi/kyoushi-generator$ poetry install\nCreating virtualenv cr-kyoushi-generator-PMpTKTKv-py3.8 in /home/user/.cache/pypoetry/virtualenvs\n```\n\nNow you are ready to run the kyoushi-generator. For this, you need to specify the source directory containing the models as well as the destination directory where you want to save the instantiated testbed. 
Use the following command to save the testbed in the directory called env:\n\n```bash\nuser@ubuntu:~/kyoushi/kyoushi-generator$ poetry run cr-kyoushi-generator apply /home/user/kyoushi/kyoushi-environment/ /home/user/kyoushi/env\nCreated TSM in /home/user/kyoushi/env\nYou can now change to the directory and push TSM to a new GIT repository.\n```\n\nBut what exactly happened there? Let's have a look at an example to understand the transformation from the testbed-independent models (TIM) to the testbed-specific models (TSM). Have a look at the DNS configuration of our testbed. The configuration file `dns.yml.j2` in the kyoushi-environment is a jinja2 template that does not specify several properties, such as the name of the domain or the number of mail servers. On the other hand, the dns.yml in the newly generated `env` directory contains specific values for all these variables. For example, in the following, the network is named mccoy. Note that these variables are randomly selected and therefore change every time you run the kyoushi-generator.\n\n```bash\nuser@ubuntu:~/kyoushi/kyoushi-generator$ cat /home/user/kyoushi/kyoushi-environment/provisioning/ansible/group_vars/all/dns.yml.j2\ndomains:\n  \\var{context.network_name}:\n    id: \\var{context.network_name}\n    server: inet-dns\n    domain: \\var{context.network_domain}\n    ...\nuser@ubuntu:~/kyoushi/kyoushi-generator$ cat /home/user/kyoushi/env/provisioning/ansible/group_vars/all/dns.yml\ndomains:\n  mccoy:\n    id: mccoy\n    server: inet-dns\n    domain: mccoy.com\n    ...\n```\n\nFor more information on the kyoushi-generator, check out the [documentation](https://ait-aecid.github.io/kyoushi-generator/).\n\n### Testbed Deployment\n\nFrom the OpenStack platform where you plan to deploy the testbed you first need to download an RC file that contains all the necessary environment variables. Note that depending on your cloud provider, this step may be different. 
Save the file locally and source it as follows.\n\n```bash\nuser@ubuntu:~/kyoushi$ source /home/user/openrc.sh\n```\n\nThe Kyoushi testbed is designed for deployment with Consul, so make sure that Consul is available in your infrastructure and that you have a Consul HTTP token with write access for the keystore. Two main settings need to be configured. First, create an environment variable `CONSUL_HTTP_TOKEN` and set it to that token. Second, open the file `/home/user/kyoushi/env/provisioning/terragrunt/terragrunt.hcl` and set the `path` and `address` parameters of the Consul configuration to fit your infrastructure. If you use a public cloud infrastructure such as OVH, you will also have to set the environment variable `TF_VAR_parallelism=1`.\n\nThen, create a key pair and add your key in the `terragrunt.hcl` file. You likely also need to update the `path` parameter in that file. Then apply the changes:\n\n```bash\nuser@ubuntu:~/kyoushi$ cd /home/user/kyoushi/env/provisioning/terragrunt/keys/\nuser@ubuntu:~/kyoushi/env/provisioning/terragrunt/keys$ terragrunt apply\nInitializing modules...\n...\nTerraform has been successfully initialized!\n```\n\nOnce this step is completed, check in your cloud provider that the key was actually uploaded. Next, download the Ubuntu image `bionic-server-cloudimg-amd64.img` from the [Ubuntu cloud images repository](https://cloud-images.ubuntu.com/) and upload it to your cloud provider with the name `kyoushi-ubuntu-bionic` and the format `Raw`. Then create the following flavors:\n\n| Name | VCPUs | Disk space | RAM |\n| ---- | ----- | ---------- | --- |\n| m1.small | 1 | 20 GB      | 2 GB |\n| aecid.d1.small | 1 | 50 GB      | 2 GB |\n| m1.medium | 2 | 40 GB     | 4 GB |\n| m1.xlarge | 8 | 160 GB     | 16 GB |\n\nThen go to the bootstrap directory and configure the settings to fit your virtualization provider if necessary, e.g., the router (default: `kyoushi-router`). 
If you are using a public cloud, you have to make the following changes:\n* Comment out `host_ext_address_index` and `floating_ip_pool` in `inputs` in `bootstrap/terragrunt.hcl`\n* Add `provider_net_uuid` and set it to the Ext-Net ID from the cloud provider in `inputs` in `bootstrap/terragrunt.hcl`\n* Add `access = false` in `inputs.networks.local` in `bootstrap/terragrunt.hcl`\n* Add `access = true` in `inputs.networks.dmz` in `bootstrap/terragrunt.hcl`\n* Set the source version of module `vmnets` to `git@github.com:ait-cs-IaaS/terraform-openstack-vmnets.git?ref=v1.5.6` in `bootstrap/module/main.tf`\n* Comment out `host_ext_address_index` and `ext_dns` in module `vmnets` in `bootstrap/module/main.tf`\n* Set `router_create = true` in module `vmnets` in `bootstrap/module/main.tf`\n* Set `provider_net_uuid = var.provider_net_uuid` in module `vmnets` in `bootstrap/module/main.tf`\n* Add `access = bool` to the variable networks object in `bootstrap/module/variables.tf`\n* Add the following variable  in `bootstrap/module/variables.tf`\n```bash\nvariable \"provider_net_uuid\" {\n  type        = string\n  description = \"UUID of the provider net\"\n}\n```\n\nThese changes are not necessary when using a local OpenStack platform. Then use terragrunt to deploy the infrastructure as follows.\n\n```bash\nuser@ubuntu:~/kyoushi/env/provisioning/terragrunt/keys$ cd ../bootstrap/\nuser@ubuntu:~/kyoushi/env/provisioning/terragrunt/bootstrap$ terragrunt apply\nInitializing modules...\n```\n\nNow change to the packer directory to create the images for employee hosts and the share. 
Again, for local cloud instances, no changes are necessary; however, in case that public cloud infrastructures such as OVH are used, it is necessary to carry out the following changes:\n* Set `base_image` to `Ubuntu 18.04` in `employee_image/default.json` and `share_image/default.json` or use any other appropriate image name that is available in the cloud infrastructure\n* Set `network` to the Ext-Net ID where the floating IP pool is provided in `employee_image/default.json` and `share_image/default.json`\n* Comment out `floating_ip_network` in `employee_image/source.pkr.hcl` and `share_image/source.pkr.hcl`\n\nThen you are ready to generate the image for the employee hosts. For this, install the required packages and then run packer as shown in the following.\n\n```bash\nuser@ubuntu:~/kyoushi/env/provisioning/terragrunt/bootstrap$ cd ../../packer/employee_image/playbook/\nuser@ubuntu:~/kyoushi/env/provisioning/packer/employee_image/playbook$ ansible-galaxy install -r requirements.yaml\n...\nuser@ubuntu:~/kyoushi/env/provisioning/packer/employee_image/playbook$ cd ..\nuser@ubuntu:~/kyoushi/env/provisioning/packer/employee_image$ packer build -var-file=default.json .\n...\nerror writing '/tmp/whitespace': No space left on device\n...\nBuild 'openstack.builder' finished after 24 minutes 39 seconds.\n```\n\nNote that some errors stating `No space left on device` may appear during image generation. These errors do not indicate any problems for successfully generating working images; so just wait until the process stops on its own. 
Now you also need to repeat these commands for creating the share image:\n\n```bash\nuser@ubuntu:~/kyoushi/env/provisioning/packer/employee_image$ cd ../share_image/playbook/\nuser@ubuntu:~/kyoushi/env/provisioning/packer/share_image/playbook$ ansible-galaxy install -r requirements.yaml\nuser@ubuntu:~/kyoushi/env/provisioning/packer/share_image/playbook$ cd ..\nuser@ubuntu:~/kyoushi/env/provisioning/packer/share_image$ packer build -var-file=default.json .\n```\n\nOnce this step is complete, make sure that the images are successfully uploaded and available in the cloud infrastructure. Just as before, some changes are necessary if a public cloud infrastructure is used:\n* Comment out `host_address_index` in `hosts/module/01-management.tf`\n* Comment out the output `mgmthost_floating_ip` in `hosts/module/outputs.tf`\n* Set `floating_ip_pool = \"Ext-Net\"` in `hosts/terragrunt.hcl`\n* Set `image` to `Ubuntu 18.04` and set `mail_image` and `ext_mail_image` to `Debian 9` in `hosts/terragrunt.hcl` or use any other appropriate image names that are available in the cloud infrastructure\n* Set `employee_image` and `ext_user_image` to the name of the employee image and set `share_image` to the name of the share image that were generated using packer in the previous step, in `hosts/terragrunt.hcl`\n\nWhile these changes are not necessary on local cloud instances, you still need to make sure that the `terragrunt.hcl` file of the hosts directory fits your infrastructure. Then, apply the changes:\n\n```bash\nuser@ubuntu:~/kyoushi/env/provisioning/packer/share_image$ cd ../../terragrunt/hosts/\nuser@ubuntu:~/kyoushi/env/provisioning/terragrunt/hosts$ terragrunt apply\nInitializing modules...\n```\n\nOnce all virtual machines are up and running, you are ready to set up all services. 
For this, you need to install all requirements listed in the `requirements.txt` and `requirements.yml` files.\n\n```bash\nuser@ubuntu:~/kyoushi/env/provisioning/terragrunt$ cd ../ansible/\nuser@ubuntu:~/kyoushi/env/provisioning/ansible$ source activate\nuser@ubuntu:~/kyoushi/env/provisioning/ansible$ pip3 install -r requirements.txt\nuser@ubuntu:~/kyoushi/env/provisioning/ansible$ ansible-galaxy install -r requirements.yml\nStarting galaxy role install process\n...\n```\n\nAfter installing all requirements, you can run all playbooks that are required for the testbed. The script `run_all.sh` runs all playbooks one after another, so you can just run the script. If one of the playbooks fails, the script is interrupted. After fixing the error, you may comment out all playbooks that have already completed successfully to save time. In total, running all playbooks may take several hours.\n\n```bash\nuser@ubuntu:~/kyoushi/env/provisioning/ansible$ chmod +x run_all.sh\nuser@ubuntu:~/kyoushi/env/provisioning/ansible$ ./run_all.sh\nPLAY [Fact gathering pre dns server configuration] \nTASK [Gathering Facts] \nok: [ext_user_1]\n...\n```\n\n### Starting the Simulation\n\nAfter all virtual machines are successfully deployed and configured, the simulation is ready to be started. First, start the employee simulations that carry out normal (benign) activities such as sending mails or sharing files. Run the following playbook to start simulations for internal employees (Intranet zone), remote employees (connect through VPN), and external users (Internet zone). 
\n\n```bash\nuser@ubuntu:~/kyoushi/env/provisioning/ansible$ ansible-playbook playbooks/run/simulation/main.yml\nPLAY [Start employee simulations] **************************************************************************************\n\nTASK [Clean SM log] ****************************************************************************************************\nskipping: [internal_employee_1]\n...\n```\n\nTo verify that the simulations launched successfully, it is recommended to log into one of the user machines and check the status of the simulation. To access any machine in the testbed, it is necessary to use the management host (mgmthost) as a proxy, e.g., `ssh -J ait@\u003cmgmthost_ip\u003e ait@\u003cemployee_ip\u003e`. Note that the user `ait` is available on all machines. The user simulation runs as a service; its current status can be retrieved with the command `service ait.beta_user status`. However, it may be more interesting to actually see what the user is currently doing. For this, check out the `sm.log` as shown in the following; the simulation log file generates new lines when new states are reached or actions are executed.\n\n```bash\nait@internal-employee-1:~$ sudo -i\nroot@internal-employee-1:~# tail /var/log/kyoushi/ait.beta_user/sm.log\n{\"current_state\": \"selecting_activity\", \"level\": \"info\", \"message\": \"Opened horde\", \"run\": \"e38ce588-4a7f-409d-b419-e30ede64bf79\", \"target\": \"horde_login_check\", \"timestamp\": 1646304147.4995081, \"transition\": \"horde_go_to_horde\", \"transition_id\": \"38b772e5-24d6-44e1-83ee-8d052deee5ef\"}\n{\"current_state\": \"selecting_activity\", \"level\": \"info\", \"message\": \"Moved to new state\", \"new_state\": \"horde_login_check\", \"run\": \"e38ce588-4a7f-409d-b419-e30ede64bf79\", \"target\": \"horde_login_check\", \"timestamp\": 1646304147.5001175, \"transition\": \"horde_go_to_horde\", \"transition_id\": \"38b772e5-24d6-44e1-83ee-8d052deee5ef\"}\n{\"current_state\": \"horde_login_check\", 
\"level\": \"info\", \"message\": \"Executing transition horde_login_check -\u003e name='horde_logged_in_no' -\u003e target=horde_login_page\", \"run\": \"e38ce588-4a7f-409d-b419-e30ede64bf79\", \"target\": \"horde_login_page\", \"timestamp\": 1646304147.5199068, \"transition\": \"horde_logged_in_no\", \"transition_id\": \"fbdf9abb-d56b-4b7c-8e5a-5700dc1110ed\"}\n{\"current_state\": \"horde_login_check\", \"level\": \"info\", \"message\": \"Moved to new state\", \"new_state\": \"horde_login_page\", \"run\": \"e38ce588-4a7f-409d-b419-e30ede64bf79\", \"target\": \"horde_login_page\", \"timestamp\": 1646304147.5202732, \"transition\": \"horde_logged_in_no\", \"transition_id\": \"fbdf9abb-d56b-4b7c-8e5a-5700dc1110ed\"}\n{\"current_state\": \"horde_login_page\", \"level\": \"info\", \"message\": \"Executing transition horde_login_page -\u003e name='horde_login' -\u003e target=horde_selecting_menu\", \"run\": \"e38ce588-4a7f-409d-b419-e30ede64bf79\", \"target\": \"horde_selecting_menu\", \"timestamp\": 1646304147.5212908, \"transition\": \"horde_login\", \"transition_id\": \"9ff90793-ff61-4597-a517-aa33578eb0ee\"}\n{\"current_state\": \"horde_login_page\", \"level\": \"info\", \"message\": \"Trying valid login\", \"password\": \"X0xVbHXfwZta\", \"run\": \"e38ce588-4a7f-409d-b419-e30ede64bf79\", \"target\": \"horde_selecting_menu\", \"timestamp\": 1646304153.3251817, \"transition\": \"horde_login\", \"transition_id\": \"9ff90793-ff61-4597-a517-aa33578eb0ee\", \"username\": \"renee.barnes\"}\n{\"current_state\": \"horde_login_page\", \"level\": \"info\", \"message\": \"Logged in\", \"password\": \"X0xVbHXfwZta\", \"run\": \"e38ce588-4a7f-409d-b419-e30ede64bf79\", \"target\": \"horde_selecting_menu\", \"timestamp\": 1646304154.4820826, \"transition\": \"horde_login\", \"transition_id\": \"9ff90793-ff61-4597-a517-aa33578eb0ee\", \"username\": \"renee.barnes\"}\n{\"current_state\": \"horde_login_page\", \"level\": \"info\", \"message\": \"Moved to new state\", \"new_state\": 
\"horde_selecting_menu\", \"run\": \"e38ce588-4a7f-409d-b419-e30ede64bf79\", \"target\": \"horde_selecting_menu\", \"timestamp\": 1646304177.647844, \"transition\": \"horde_login\", \"transition_id\": \"9ff90793-ff61-4597-a517-aa33578eb0ee\"}\n{\"current_state\": \"horde_selecting_menu\", \"level\": \"info\", \"message\": \"Executing transition horde_selecting_menu -\u003e name='horde_nav_preferences' -\u003e target=horde_preferences_page\", \"run\": \"e38ce588-4a7f-409d-b419-e30ede64bf79\", \"target\": \"horde_preferences_page\", \"timestamp\": 1646304177.651773, \"transition\": \"horde_nav_preferences\", \"transition_id\": \"7b73236c-f4c4-4e1a-bb0b-1c7b311d6d33\"}\n{\"current_state\": \"horde_selecting_menu\", \"level\": \"info\", \"message\": \"Navigating to Global Preferences\", \"run\": \"e38ce588-4a7f-409d-b419-e30ede64bf79\", \"target\": \"horde_preferences_page\", \"timestamp\": 1646304177.685902, \"transition\": \"horde_nav_preferences\", \"transition_id\": \"7b73236c-f4c4-4e1a-bb0b-1c7b311d6d33\"}\n```\n\nNext, it is necessary to start the attacker simulation. Note that the attacker carries out a sequence of attacks, including scans, remote command execution, password cracking, etc. For more information on the attacks, please refer to the publications stated at the bottom of this page. 
Run the playbook that starts the simulation as follows.\n\n```bash\nuser@ubuntu:~/kyoushi/env/provisioning/ansible$ ansible-playbook playbooks/run/attacker_takeover/main.yml\nPLAY [Start employee simulations] **************************************************************************************\n\nTASK [Clean SM log] ****************************************************************************************************\nskipping: [attacker_0]\n\nTASK [Start attacker SM] ***********************************************************************************************\nchanged: [attacker_0]\n\nPLAY RECAP *************************************************************************************************************\nattacker_0                 : ok=1    changed=1    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0\n```\n\nAs for the normal user simulations, the attacker simulation runs as a service. To retrieve its status, use the command `service ait.aecid.attacker.wpdiscuz status`. Again, it is worth checking out the state machine logs. As seen in the following sample, the attacker state machine will be created but wait until the datetime `kyoushi_attacker_start` that was configured earlier is reached. Only then is the attack chain launched.\n\n```bash\nroot@attacker-0:~# tail /var/log/kyoushi/ait.aecid.attacker.wpdiscuz/sm.log\n{\"level\": \"info\", \"message\": \"Created state machine\", \"run\": \"e5a66416-8fbf-4773-b4f4-3173b1b6b4ff\", \"seed\": 672088864, \"timestamp\": 1646296446.5048175}\n```\n\nThe `run_all.sh` script that configured all servers also launched the data exfiltration attack, which is assumed to be already ongoing from the beginning of the simulation and stops at some point in time (if it does not start, make sure that the end time `dnsteal_endtime` is set to a future date). Check out the logs of this attack case to ensure that the exfiltration script is actually transferring the data. 
The logs should appear similar to the following sample:\n\n```bash\nroot@attacker-0:~# tail /var/log/dnsteal.log\n{\"data_length\": 180, \"event\": \"Received data\", \"file\": \"b'2010_invoices.xlsx'\", \"level\": \"info\", \"timestamp\": 1646298257.592706}\n{\"data\": \"b'3x6-.2-.s0sYjwCEihbeKKNdbIOdYlZo6A7EeRg3GTklJq5XPo9bAlWYdiD9Dh8tkuMAj-.1vpJnNwUmtnTNZXSPAF7sPBeqN0nvmS9D4Z79cVp7mO3H*ZSxEQYAIPDASkBw-.2010_invoices.xlsx.lt-22.kelly-livingston.com.'\", \"event\": \"Received data text on server\", \"ip\": \"192.168.87.64\", \"level\": \"debug\", \"port\": 53, \"timestamp\": 1646298257.5934248}\n...\n```\n\nFor more information on the kyoushi-simulation, check out the [documentation](https://ait-aecid.github.io/kyoushi-simulation/).\n\n### Log Data Collection\n\nOnce the simulation is completed (i.e., the attacker simulation has successfully carried out all attacks and the service stopped), it is possible to collect all logs from the testbed. Since copying all logs generates a high amount of traffic and thus unnecessarily bloats the size of the resulting log data set, it is recommended to stop suricata before proceeding. This is accomplished with the following command.\n\n```bash\nuser@ubuntu:~/kyoushi/env/provisioning/ansible$ ansible-playbook playbooks/run/gather/stop_suricata.yml\n```\n\nThen, use the following command to copy all logs to your local system. The playbook will additionally copy relevant facts from the servers, e.g., IP addresses and configurations. When running the playbook, it is necessary to enter the name of the output directory. 
In the following, the name `out` is used.\n\n```bash\nuser@ubuntu:~/kyoushi/env/provisioning/ansible$ ansible-playbook playbooks/run/gather/main.yml\n[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details\nEnter the gather directory path: out\n\nPLAY [all,!mgmthost] ***************************************************************************************************\n\nTASK [Gathering Facts] *************************************************************************************************\nok: [internal_employee_1]\n...\n```\n\nIn case that some of the files fail to be transferred, this usually means that these files are not available (e.g., error files that are not created when no errors occur). The `out` folder will contain directories for all hosts. Inside `out/\u003chost_name\u003e/logs` are the collected log data:\n\n```bash\nuser@ubuntu:~/kyoushi/env/provisioning/ansible$ ls playbooks/run/gather/out/\nattacker_0\ncloud_share\next_user_0\next_user_1\nharrisbarnett_mail\ninet-dns\ninet-firewall\ninternal_employee_0\ninternal_employee_1\ninternal_share\nintranet_server\nmail\nmonitoring\nremote_employee_0\nremote_employee_1\nvpn\nwebserver\nuser@ubuntu:~/kyoushi/env/provisioning/ansible$ ls playbooks/run/gather/out/intranet_server/logs/\napache2\naudit\nauth.log\njournal\nsuricata\nsyslog\n```\n\nMoreover, the script extracted server configurations and facts in `out/\u003chost_name\u003e/configs/` and `out/\u003chost_name\u003e/facts.json`. If you just want to use the log data as is and all you need are the attack times (available in `out/attacker_0/logs/ait.aecid.attacker.wpdiscuz/sm.log`), then you are done at this point. 
If you want to apply labeling rules to mark single events according to their corresponding attack step, continue to the next section.\n\n### Log Data Labeling\n\nLabeling of log data is accomplished by processing the data in a pipeline that trims the logs according to the simulation time, parses them with logstash, stores them in an elasticsearch database, and queries the log events corresponding to attacker activities with predefined rules. Accordingly, the machine where labeling takes place should have at least 16 GB RAM and the following dependencies installed:\n\n```\nelasticsearch 7.16.2\nlogstash 7.16.2\nTShark (Wireshark) 3.4.8\n```\n\nFurthermore, `http.max_content_length: 400mb` needs to be set in `/etc/elasticsearch/elasticsearch.yml`.\n\nThe main repository for log data labeling is the [kyoushi-dataset](https://github.com/ait-aecid/kyoushi-dataset). Clone and install the repository as follows.\n\n```bash\nuser@ubuntu:~/kyoushi$ git clone https://github.com/ait-aecid/kyoushi-dataset.git\nCloning into 'kyoushi-dataset'...\nuser@ubuntu:~/kyoushi$ cd kyoushi-dataset/\nuser@ubuntu:~/kyoushi/kyoushi-dataset$ poetry install\nCreating virtualenv kyoushi-dataset-L9Pkzr_M-py3.8 in /home/user/.cache/pypoetry/virtualenvs\n(kyoushi-dataset-L9Pkzr_M-py3.8) user@ubuntu:~/kyoushi/kyoushi-dataset$ cd ..\n```\n\nNow create a new directory where the log data should be processed. However, do not copy the log data directly from the `out` directory; instead, use the following `prepare` command from the kyoushi-dataset to prepare the logs for further processing. 
Make sure that `-g` points to the gather directory containing the logs, `-p` points to the processing directory of the kyoushi-environment containing the labeling templates, and the attack execution is within `--start` and `--end` (logs outside of this interval are trimmed).\n\n```bash\n(kyoushi-dataset-L9Pkzr_M-py3.8) user@ubuntu:~/kyoushi$ mkdir processed\n(kyoushi-dataset-L9Pkzr_M-py3.8) user@ubuntu:~/kyoushi$ cd processed\n(kyoushi-dataset-L9Pkzr_M-py3.8) user@ubuntu:~/kyoushi/processed$ cr-kyoushi-dataset prepare -g /home/user/kyoushi/env/provisioning/ansible/playbooks/run/gather/out -p /home/user/kyoushi/kyoushi-environment/datasets/scenario/processing/ --start 2022-03-03T00:00:00 --end 2022-03-04T00:00:00 --name processed\nCreating dataset directory structure ...\nCreating dataset config file ..\nCopying gathered logs and facts into the dataset ...\nCopying the processing configuration into the dataset ...\nDataset initialized in: /home/user/kyoushi/processed\n```\n\nBefore going to the next step, make sure that the elasticsearch database is empty and no legacy files from previous runs exist (this should not apply when kyoushi-dataset is executed for the first time). If such legacy files exist, the kyoushi-generator will get stuck in the following step without error messages or timeout. Therefore, make sure to run the following commands to clear the database and delete existing sincedb files.\n\n```bash\n(kyoushi-dataset-L9Pkzr_M-py3.8) user@ubuntu:~/kyoushi/processed$ curl -XDELETE localhost:9200/_all\n(kyoushi-dataset-L9Pkzr_M-py3.8) user@ubuntu:~/kyoushi/processed$ sudo service elasticsearch restart\n(kyoushi-dataset-L9Pkzr_M-py3.8) user@ubuntu:~/kyoushi/processed$ rm processing/logstash/data/plugins/inputs/file/.sincedb_*\n```\n\nNext, run the `process` command to parse the logs and store them in the elasticsearch database. Depending on the size of your dataset, this step may take several hours. 
Be aware that several warnings may occur that can be ignored.\n\n```bash\n(kyoushi-dataset-L9Pkzr_M-py3.8) user@ubuntu:~/kyoushi/processed$ cr-kyoushi-dataset process\nRunning pre-processors ...\nExecuting - Ensure processing config directory exists ...\nExecuting - Prepare server list ...\nExecuting - Prepare server facts ...\nExecuting - Ensure attacker config directory exists ...\nExecuting - Extract attacker information ...\nExecuting - Decompress all GZIP logs ...\nExecuting - Convert attacker pcap to elasticsearch json ...\nExecuting - Convert attacker pcap to elasticsearch json ...\nExecuting - Add filebeat index mapping ...\nExecuting - Add auditbeat index mapping ...\nExecuting - Add metricsbeat index mapping ...\nExecuting - Add pcap index mapping ...\nExecuting - Add openvpn index mapping ...\nExecuting - Add auditd ingest pipeline to elasticsearch ...\nExecuting - Setup logstash pipeline ...\nParsing log files ...\n```\n\nOnce this step is complete, all parsed logs are in the database and can therefore be queried with the elastic query language. In addition, all rules have been rendered from the templates into the `rules` directory. Run the following command to execute the query rules and obtain the labels.\n\n```bash\n(kyoushi-dataset-L9Pkzr_M-py3.8) user@ubuntu:~/kyoushi/processed$ cr-kyoushi-dataset label\nApplying rule attacker.escalate.sudo.command ...\nRule attacker.escalate.sudo.command applied labels: ['escalated_command', 'escalated_sudo_command', 'escalate'] to 3 lines.\n...\nStart writing /home/user/kyoushi/processed/gather/intranet_server/logs/auth.log\n```\n\nAgain, this step may take several hours to complete. Note that this process adds the labels to the log events in the database, i.e., you can see the labels assigned to the events if you query the database manually. 
Accordingly, running `cr-kyoushi-dataset label` again assigns no labels unless you first clear the database and rerun `cr-kyoushi-dataset process`; this prevents the same label from being assigned multiple times to the same event. Once the process is finished, you can find the labels in the `labels` directory. For example, have a look at the labels of the `auth.log` file:\n\n```bash\n(kyoushi-dataset-L9Pkzr_M-py3.8) user@ubuntu:~/kyoushi/processed$ cat labels/intranet_server/logs/auth.log \n{\"line\": 145, \"labels\": [\"attacker_change_user\", \"escalate\"], \"rules\": {\"attacker_change_user\": [\"attacker.escalate.su.login\"], \"escalate\": [\"attacker.escalate.su.login\"]}}\n{\"line\": 146, \"labels\": [\"attacker_change_user\", \"escalate\", \"escalated_command\", \"escalated_sudo_command\"], \"rules\": {\"attacker_change_user\": [\"attacker.escalate.su.login\", \"attacker.escalate.systemd.newsession.after\"], \"escalate\": [\"attacker.escalate.su.login\", \"attacker.escalate.systemd.newsession.after\", \"attacker.escalate.sudo.command\"], \"escalated_command\": [\"attacker.escalate.sudo.command\"], \"escalated_sudo_command\": [\"attacker.escalate.sudo.command\"]}}\n{\"line\": 147, \"labels\": [\"attacker_change_user\", \"escalate\"], \"rules\": {\"attacker_change_user\": [\"attacker.escalate.su.login\"], \"escalate\": [\"attacker.escalate.su.login\"]}}\n{\"line\": 148, \"labels\": [\"attacker_change_user\", \"escalate\"], \"rules\": {\"attacker_change_user\": [\"attacker.escalate.systemd.newsession.after\"], \"escalate\": [\"attacker.escalate.systemd.newsession.after\"]}}\n{\"line\": 149, \"labels\": [\"escalated_command\", \"escalated_sudo_command\", \"escalate\"], \"rules\": {\"escalated_command\": [\"attacker.escalate.sudo.command\"], \"escalated_sudo_command\": [\"attacker.escalate.sudo.command\"], \"escalate\": [\"attacker.escalate.sudo.command\"]}}\n{\"line\": 150, \"labels\": [\"escalated_command\", \"escalated_sudo_command\", \"escalate\", 
\"escalated_sudo_session\"], \"rules\": {\"escalated_command\": [\"attacker.escalate.sudo.command\", \"attacker.escalate.sudo.open\"], \"escalated_sudo_command\": [\"attacker.escalate.sudo.command\", \"attacker.escalate.sudo.open\"], \"escalate\": [\"attacker.escalate.sudo.command\", \"attacker.escalate.sudo.open\"], \"escalated_sudo_session\": [\"attacker.escalate.sudo.open\"]}}\n{\"line\": 151, \"labels\": [\"escalated_command\", \"escalated_sudo_command\", \"escalated_sudo_session\", \"escalate\"], \"rules\": {\"escalated_command\": [\"attacker.escalate.sudo.open\"], \"escalated_sudo_command\": [\"attacker.escalate.sudo.open\"], \"escalated_sudo_session\": [\"attacker.escalate.sudo.open\"], \"escalate\": [\"attacker.escalate.sudo.open\"]}}\n{\"line\": 152, \"labels\": [\"escalated_command\", \"escalated_sudo_command\", \"escalated_sudo_session\", \"escalate\"], \"rules\": {\"escalated_command\": [\"attacker.escalate.sudo.open\"], \"escalated_sudo_command\": [\"attacker.escalate.sudo.open\"], \"escalated_sudo_session\": [\"attacker.escalate.sudo.open\"], \"escalate\": [\"attacker.escalate.sudo.open\"]}}\n```\n\nEach entry in the label file corresponds to a specific line in the respective log file, which is referenced by the line number in the field `line`. The field `labels` lists the labels assigned to that line, and the field `rules` maps each label to the rules that matched the line and assigned it. 
For example, consider the following lines of the `auth.log` file that are marked with the aforementioned labels:\n\n```bash\n(kyoushi-dataset-L9Pkzr_M-py3.8) user@ubuntu:~/kyoushi/processed$ sed -n '145,152p' gather/intranet_server/logs/auth.log\nMar  3 11:37:40 intranet-server su[27950]: Successful su for bguerrero by www-data\nMar  3 11:37:40 intranet-server su[27950]: + /dev/pts/1 www-data:bguerrero\nMar  3 11:37:40 intranet-server su[27950]: pam_unix(su:session): session opened for user bguerrero by (uid=33)\nMar  3 11:37:40 intranet-server systemd-logind[957]: New session c1 of user bguerrero.\nMar  3 11:37:58 intranet-server sudo:    bguerrero : TTY=pts/1 ; PWD=/var/www/intranet.williams.mccoy.com/wp-content/uploads/2022/03 ; USER=root ; COMMAND=list\nMar  3 11:38:06 intranet-server sudo:    bguerrero : TTY=pts/1 ; PWD=/var/www/intranet.williams.mccoy.com/wp-content/uploads/2022/03 ; USER=root ; COMMAND=/bin/cat /etc/shadow\nMar  3 11:38:06 intranet-server sudo: pam_unix(sudo:session): session opened for user root by (uid=0)\nMar  3 11:38:06 intranet-server sudo: pam_unix(sudo:session): session closed for user root\n```\n\nFor more information on kyoushi-dataset, check out the [documentation](https://ait-aecid.github.io/kyoushi-dataset/).\n\n## Publications\n\nIf you use the Kyoushi Testbed Environment or any of the generated datasets, please cite the following publications: \n\n* Landauer M., Skopik F., Frank M., Hotwagner W., Wurzenberger M., Rauber A. (2023): [Maintainable Log Datasets for Evaluation of Intrusion Detection Systems.](https://ieeexplore.ieee.org/abstract/document/9866880) IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3466-3482. \\[[PDF](https://arxiv.org/pdf/2203.08580.pdf)\\]\n* Landauer M., Skopik F., Wurzenberger M., Hotwagner W., Rauber A. 
(2021): [Have It Your Way: Generating Customized Log Data Sets with a Model-driven Simulation Testbed.](https://ieeexplore.ieee.org/document/9262078) IEEE Transactions on Reliability, Vol.70, Issue 1, pp. 402-415. IEEE. \\[[PDF](https://www.skopik.at/ait/2020_trel.pdf)\\]\n* Landauer M., Frank M., Skopik F., Hotwagner W., Wurzenberger M., Rauber A. (2022): [A Framework for Automatic Labeling of Log Datasets from Model-driven Testbeds for HIDS Evaluation.](https://dl.acm.org/doi/abs/10.1145/3510547.3517924) Proceedings of the Workshop on Secure and Trustworthy Cyber-Physical Systems, pp. 77-86. ACM. \\[[PDF](https://www.skopik.at/ait/2022_satcps.pdf)\\]\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fait-aecid%2Fkyoushi-environment","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fait-aecid%2Fkyoushi-environment","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fait-aecid%2Fkyoushi-environment/lists"}