{"id":13598996,"url":"https://github.com/splunk/attack_data","last_synced_at":"2025-10-11T09:56:36.463Z","repository":{"id":38311392,"uuid":"281495346","full_name":"splunk/attack_data","owner":"splunk","description":"A repository of curated datasets from various attacks","archived":false,"fork":false,"pushed_at":"2025-10-05T18:40:14.000Z","size":67297,"stargazers_count":679,"open_issues_count":2,"forks_count":121,"subscribers_count":36,"default_branch":"master","last_synced_at":"2025-10-05T20:39:18.290Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/splunk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2020-07-21T20:16:23.000Z","updated_at":"2025-10-04T14:03:46.000Z","dependencies_parsed_at":"2023-10-13T09:03:05.636Z","dependency_job_id":"79f67ee6-d462-4ee0-a374-157a74835080","html_url":"https://github.com/splunk/attack_data","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/splunk/attack_data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/splunk%2Fattack_data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/splunk%2Fattack_data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/splunk%2Fattack_data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/splunk%2Fa
ttack_data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/splunk","download_url":"https://codeload.github.com/splunk/attack_data/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/splunk%2Fattack_data/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279006749,"owners_count":26084185,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T17:00:58.808Z","updated_at":"2025-10-11T09:56:31.455Z","avatar_url":"https://github.com/splunk.png","language":"Python","readme":"![](environments/static/attack-data-logo.png)\n\nA repository of curated datasets from various attacks to:\n\n* Easily develop detections without having to build an environment from scratch or simulate an attack.\n* Test detections, specifically [Splunk's Security Content](https://github.com/splunk/security-content)\n* [Replay](#replay-datasets-) into streaming pipelines for validating your detections in your production SIEM\n\n# Installation\nNotes:\n* These steps are intended to be run on your actual Splunk host/server (not remotely)\n\nGitHub LFS is used in this project. 
For Mac users, git-lfs can be installed with Homebrew (for other operating systems, see the installation instructions [here](https://github.com/git-lfs/git-lfs/wiki/Installation)):\n\n````\nbrew install git-lfs\n````\n\nThen you need to install it for your user account. We recommend using the --skip-smudge parameter, which prevents all Git LFS files from being downloaded during git clone:\n\n````\ngit lfs install --skip-smudge\n````\n\nDownload the repository with this command:\n\n````\ngit clone https://github.com/splunk/attack_data\n````\n\nFetch all or selected attack datasets:\n\n````\n# This pulls all data - warning: \u003e9 GB of data\ngit lfs pull\n\n# This pulls one dataset directory\ngit lfs pull --include=datasets/attack_techniques/T1003.001/atomic_red_team/\n\n# Or pull just one log like this\ngit lfs pull --include=datasets/attack_techniques/T1003.001/atomic_red_team/windows-sysmon.log\n\n````\n\n\n# Anatomy of a Dataset 🧬\n### Datasets\nDatasets are defined by a common YML structure. The structure has the following fields:\n\n| field | description |\n|---|---|\n| id | UUID of the dataset |\n| author | name of the author |\n| date | last modified date |\n| dataset | array of URLs where the hosted version of the dataset is located |\n| description | description of the dataset, as detailed as possible |\n| environment | markdown filename of the environment description (see below) |\n| technique | array of MITRE ATT\u0026CK techniques associated with the dataset |\n| references | array of URLs that reference the dataset |\n| sourcetypes | array of sourcetypes contained in the dataset |\n\n\nFor example:\n\n```\nid: 405d5889-16c7-42e3-8865-1485d7a5b2b6\nauthor: Patrick Bareiss\ndate: '2020-10-08'\ndescription: 'Atomic Test Results: Successful Execution of test T1003.001-1 Windows\n  Credential Editor Successful Execution of test T1003.001-2 Dump LSASS.exe Memory\n  using ProcDump Return value unclear for test T1003.001-3 Dump LSASS.exe Memory using\n  comsvcs.dll Successful Execution 
of test T1003.001-4 Dump LSASS.exe Memory using\n  direct system calls and API unhooking Return value unclear for test T1003.001-6\n  Offline Credential Theft With Mimikatz Return value unclear for test T1003.001-7\n  LSASS read with pypykatz '\nenvironment: attack_range\ntechnique:\n- T1003.001\ndataset:\n- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-powershell.log\n- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-security.log\n- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-sysmon.log\n- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-system.log\nreferences:\n- https://attack.mitre.org/techniques/T1003/001/\n- https://github.com/redcanaryco/atomic-red-team/blob/master/atomics/T1003.001/T1003.001.md\n- https://github.com/splunk/security-content/blob/develop/tests/T1003_001.yml\nsourcetypes:\n- XmlWinEventLog:Microsoft-Windows-Sysmon/Operational\n- WinEventLog:Microsoft-Windows-PowerShell/Operational\n- WinEventLog:System\n- WinEventLog:Security\n```\n\n\n### Environments\n\nEnvironments are a description of where the dataset was collected. At this moment there are no specific restrictions, although we do have a simple [template](https://github.com/splunk/attack_data/blob/master/environments/TEMPLATE.md) that a user can start with. The most common environment will be the [attack_range](https://github.com/splunk/attack_data/blob/master/environments/attack_range.md), since this is the tool used to generate attack datasets automatically.\n\n# Replay Datasets 📼\nMost datasets generated will be raw log files. 
There are two simple ways to ingest them.\n\n### Into Splunk\n\n\n##### using replay.py\nPrerequisites: clone the repository, create a virtual environment, and install the Python dependencies:\n\n```\ngit clone git@github.com:splunk/attack_data.git\ncd attack_data\npip install virtualenv\nvirtualenv venv\nsource venv/bin/activate\npip install -r bin/requirements.txt\n```\n\n0. Download a dataset\n1. Configure [`bin/replay.yml`](/bin/replay.yml)\n2. Run `python bin/replay.py -c bin/replay.yml`\n\n\n##### using UI\n\n0. Download a dataset\n1. In Splunk Enterprise, Add Data -\u003e Files \u0026 Directories -\u003e select the dataset\n2. Set the sourcetype as specified in the YML file\n3. Explore your data\n\nSee a quick demo 📺 of this process [here](https://www.youtube.com/watch?v=41NAG0zGg40).\n\n### Into DSP\n\nThe simplest way to send datasets into DSP is to use the [scloud](https://docs.splunk.com/Documentation/DSP/1.1.0/Admin/AuthenticatewithSCloud) command-line tool.\n\n1. Download the dataset\n2. Ingest the dataset into DSP via the scloud command `cat attack_data.json | scloud ingest post-events --format JSON`\n3. Build a pipeline that reads from the firehose and you should see the events.\n\n# Contribute Datasets 🥰\n\n1. Generate a dataset\n2. Under the corresponding MITRE technique ID folder, create a folder named after the tool the dataset comes from, for example: `atomic_red_team`\n3. 
Make a PR with a \u003ctool_name_yaml\u003e.yml file under the newly created folder, and upload the dataset into the same folder.\n\nSee [T1003.003](datasets/attack_techniques/T1003.003/atomic_red_team/) for a complete example.\n\nNote: the simplest way to generate a dataset to contribute is to launch your simulations in the attack_range, or attack the machines manually, and when done dump the data using the [dump function](https://github.com/splunk/attack_range#dump-log-data-from-attack-range).\n\nSee a quick demo 📺 of the process to dump a dataset [here](https://www.youtube.com/watch?v=CnD0BtjCILs).\n\nTo contribute a dataset, simply create a PR on this repository; for general instructions on creating a PR, [see this guide](https://gist.github.com/Chaser324/ce0505fbed06b947d962).\n\n# Automatically generated Datasets ⚙️\n\nThis project takes advantage of automation to generate datasets using the attack_range. You can find details about this service in the [attack_data_service sub-project folder](https://github.com/splunk/attack_data/tree/master/attack_data_service).\n\n## Authors\n* [Patrick Bareiß](https://twitter.com/bareiss_patrick)\n* [Jose Hernandez](https://twitter.com/d1vious)\n\n\n## License\n\nCopyright 2023 Splunk Inc.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\nhttp://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n","funding_links":[],"categories":["Python","Uncategorized","Synopsis"],"sub_categories":["Uncategorized","Table of 
Contents"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsplunk%2Fattack_data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsplunk%2Fattack_data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsplunk%2Fattack_data/lists"}