{"id":14970843,"url":"https://github.com/ashton-sidhu/sysmon-extract","last_synced_at":"2025-05-06T20:50:47.260Z","repository":{"id":57473005,"uuid":"265702368","full_name":"Ashton-Sidhu/sysmon-extract","owner":"Ashton-Sidhu","description":"Extract logs based off events from sysmon. Comes as a package, cli and ui.","archived":false,"fork":false,"pushed_at":"2020-05-22T17:53:13.000Z","size":44146,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-04-16T01:49:21.345Z","etag":null,"topics":["data-science","dataengineering","infosec","spark","streamlit","sysmon","threat-intelligence","threathunting"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Ashton-Sidhu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-20T22:49:34.000Z","updated_at":"2020-06-30T12:53:58.000Z","dependencies_parsed_at":"2022-09-26T17:40:46.326Z","dependency_job_id":null,"html_url":"https://github.com/Ashton-Sidhu/sysmon-extract","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ashton-Sidhu%2Fsysmon-extract","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ashton-Sidhu%2Fsysmon-extract/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ashton-Sidhu%2Fsysmon-extract/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ashton-Sidhu%2Fsysmon-extract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/owners/Ashton-Sidhu","download_url":"https://codeload.github.com/Ashton-Sidhu/sysmon-extract/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252769140,"owners_count":21801373,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","dataengineering","infosec","spark","streamlit","sysmon","threat-intelligence","threathunting"],"created_at":"2024-09-24T13:44:13.800Z","updated_at":"2025-05-06T20:50:47.215Z","avatar_url":"https://github.com/Ashton-Sidhu.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sysmon Extract\n\nSysmon Extract is a library to extract events from the sysmon log type based off the event id. They can be extracted as a file (any big data format) with support for HDFS or in memory as a Spark or Pandas DataFrame. 
As a note, this library works best with Spark, as it uses Spark for the ETL process.\n\nIt ships as a Python package, a CLI and a UI.\n\n\u003c!-- START doctoc generated TOC please keep comment here to allow auto update --\u003e\n\u003c!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --\u003e\n## Table of Contents\n\n- [Usage](#usage)\n  * [Command Line Interface](#command-line)\n  * [UI](#ui)\n  * [Package](#package)\n- [Installation](#installation)\n- [Acknowledgments](#acknowledgments)\n- [Feedback](#feedback)\n\n## Usage\n\n### Command Line\n\n```\nUsage: sysxtract [OPTIONS]\n\nOptions:\n\n  -i, --input-file PATH\n  -h, --header\n  -e, --event TEXT\n  -lc, --log-column TEXT\n  -ec, --event-column TEXT       [default: ]\n  -a, --additional-columns TEXT\n  -o, --output-file TEXT         [default: /home/sidhu/sysmon-extract/sysmon-output.csv]\n  -s, --single-file\n  -m, --master TEXT              [default: local]\n  -ui, --start-ui\n  --help                         Show this message and exit.\n```\n\n`sysxtract -i /media/sidhu/Seagate/empire_apt3_2019-05-14223117.json -e 1 -e 2 -lc log_name -ec event_data -s -a host.name -o /home/sidhu/output.json`\n\nLet's break it down.\n\n*Input file:* -i /media/sidhu/Seagate/empire_apt3_2019-05-14223117.json\n\n*Sysmon events to extract:* -e 1 -e 2\n\n*Column in the dataset that describes the log source (Sysmon, Microsoft Security, Microsoft Audit, etc.):* -lc log_name\n\n*Column in the dataset that contains the nested sysmon data (often event_data):* -ec event_data\n\n*Output as a single file:* -s\n\n*Additional columns to extract:* -a host.name\n\n*Output file name:* -o /home/sidhu/output.json\n\n### UI\n\n`sysxtract -ui`\n\n![Alt Text](docs/media/ui.gif)\n\n### Package\n\nUsing the example above:\n\n```python\nfrom sysxtract import extract\n\n# Extract to a file\nextract(\n    \"/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json\",\n    [1, 2],\n    log_column=\"log_name\",\n    event_column=\"event_data\",\n 
   additional_columns=\"host.name\",\n    single_file=True,\n    output_file=\"/home/sidhu/output.json\"\n)\n\n# Extract to a file using an existing Spark cluster\nextract(\n    \"/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json\",\n    [1, 2],\n    log_column=\"log_name\",\n    event_column=\"event_data\",\n    additional_columns=\"host.name\",\n    single_file=True,\n    output_file=\"/home/sidhu/output.json\",\n    master=\"spark://HOST:PORT\" # mesos://HOST:PORT for yarn/mesos cluster\n)\n\n# Extract to a file using an existing spark session\nextract(\n    \"/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json\",\n    [1, 2],\n    log_column=\"log_name\",\n    event_column=\"event_data\",\n    additional_columns=\"host.name\",\n    single_file=True,\n    output_file=\"/home/sidhu/output.json\",\n    spark_sess=spark, # spark session variable, usually named spark\n)\n\n# Extract to a Spark DataFrame\n# NOTE: Must provide an existing Spark Session\nextract(\n    \"/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json\",\n    [1, 2],\n    log_column=\"log_name\",\n    event_column=\"event_data\",\n    additional_columns=\"host.name\",\n    single_file=True,\n    spark_sess=spark, # spark session variable, usually named spark\n    as_spark_frame=True\n)\n\n# Extract to a Pandas DataFrame\ndf = extract(\n    \"/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json\",\n    [1, 2],\n    log_column=\"log_name\",\n    event_column=\"event_data\",\n    additional_columns=\"host.name\",\n    single_file=True,\n    as_pandas_frame=True\n)\n\n# Extract using SparkDf as input\n# NOTE: Must provide an existing Spark Session\ndf = extract(\n    spark_df,\n    [1, 2],\n    log_column=\"log_name\",\n    event_column=\"event_data\",\n    additional_columns=\"host.name\",\n    single_file=True,\n    as_pandas_frame=True\n)\n\n# Extract using PandasDf as input\n# NOTE: To use a Pandas DataFrame as input and a Spark DataFrame as output, a Spark Session must be provided.\ndf 
= extract(\n    pandas_df,\n    [1, 2],\n    log_column=\"log_name\",\n    event_column=\"event_data\",\n    additional_columns=\"host.name\",\n    single_file=True,\n    as_pandas_frame=True\n)\n```\n\n## Installation\n\n`pip install sysxtract`\n\nSince this library leverages Spark, specifically PySpark, you need to install PySpark manually. This lets you match the PySpark version to the cluster you connect to:\n\n`pip install pyspark==$VERSION`\n\nIf you're going to use Spark locally:\n\n`pip install pyspark`\n\n## Acknowledgments\n\nCommunity credits go to:\n\n[@hunters-forge](https://github.com/hunters-forge) for their [openhunt library](https://github.com/hunters-forge/openhunt) and schema documentation with OSSEM.\n\n## Feedback\n\nI appreciate any feedback. If you have feature requests or issues, open an issue with the appropriate tag, or send me an email at sidhuashton@gmail.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashton-sidhu%2Fsysmon-extract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fashton-sidhu%2Fsysmon-extract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashton-sidhu%2Fsysmon-extract/lists"}