Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wapiti08/utlparser
Toward Unified Temporal Causal Graph Construction with Semantic Log Parser
https://github.com/wapiti08/utlparser
Last synced: 4 days ago
JSON representation
Toward Unified Temporal Causal Graph Construction with Semantic Log Parser
- Host: GitHub
- URL: https://github.com/wapiti08/utlparser
- Owner: Wapiti08
- License: gpl-3.0
- Created: 2023-04-15T15:09:27.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-21T18:06:42.000Z (8 months ago)
- Last Synced: 2024-05-22T16:37:29.810Z (8 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 48.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# UTLParser
![Python](https://img.shields.io/badge/Python3-3.10-brightgreen.svg)
![License](https://img.shields.io/badge/license-MIT3.0-green.svg)
![Testing Environment](https://img.shields.io/badge/macOS-14.2.1-golden.svg)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13918585.svg)](https://doi.org/10.5281/zenodo.13918585)---
Unified Semantic Log Parsing and Causal Graph Construction for Attack Attribution
## Features
- correlate data from multiple sources (network traffic, system/applications/service logs, process execution status)
- automatically recognize log format, and calculate depth and similarity threshold
- extract the entities (obj, sub, action) with depedency relationships from events (both structured and unstructured logs)
- provenance graph construction from multi-source logs
- measure the delay for log fusion
- interfaces for optimized temporal graph query and graph community detection## Structure
- core:
- entity_reco: custom entity extraction from unifited output- graph_create: the module block to build causal graphs
- graph_label: labelling temporal graph
- logparse: multiple log parsers
- pattern: the rule to build unifited output and graph
- eval: benchmark testing
- eval_data: the code to generate evaluation data
- src: the running main interface
- unit_test: the unit testing for core modules
- utils: util functions to support processing
- config: the config file including regexes, defined poi, etc
## Running
- preprepration
```
# avoid python version conflict --- pyenv
brew install pyenv-virtualenv
brew install pyenv
pyenv install 3.10
pyenv global 3.10
pyenv virtualenv 3.10 UTLParser
# activate the environment
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
pyenv local UTLParser
pyenv activate UTLParser
pip3 install -r requirements.txt
# download large language library
python -m spacy download en_core_web_lg
```- how to use
```
# single log source processing
python3 main.py -a dns -i /xxx/UTLParser/unit_test/data/dns.log# multiple log sources processing --- fused graph
python3 main.py -f True -al 'dns,error,access,audit'# temporal graph query
python3 main.py -al 'dns,error,access,audit' -t "2022-Jan-15 10:17:01.246000"# assign labels to fused graphs
python3 main.py -l True```
- custom running
- add poi and iocs for custom logs inside config.py
- repeat above steps## Output Format
- IOCs:
Timestamp, Src_IP, Dst_IP, Proto or Application, Domain, PacketSize, ParaPair (tuple)
## Explaination of Dataset
- AIT (fox) --- pure unstructured logs:
- used for intrusion detection systems, federated learning, alert aggregation
- include logs from all hosts, apache, error, authentication, DNS/VPN, audit, network traffic, syslog, system monitoring logs
- ground truth labels for events
- details:
- host log: gather/ host name / logs
- labels directory: labelling information
- rules directory: how the labels are assigned- launched attacks:
- Scans
- Webshell upload --- apache
- password cracking
- privilege escalation --- dnsmasq, apache, audit (internal_server), system.cpu
- remote command execution --- dnsmasq,apache, audit (internal_server), system.cpu
- data exfiltration --- dnsmasq, audit (internal_share),- Sysdig Process:
```
# follow the format like: evt.num, evt.time, evt.cpu, proc.name, thread.tid, evt.dir, evt.type, evt.args
- 123 23:40:09.105899621 3 httpd (28599) > switch next=0 pgft_maj=3 pgft_min=619 vm_size=442720 vm_rss=668 vm_swap=7004
```- IoT23 (structured logs) --- network traffic:
- label information
- attack (part of APT):
indictors that there was some type of attack from the infected device to another host
- C & C (part of APT):
the infected device was connnected to a CC server
- DDoS:
ddos attack is being executed by the infected device
- FileDownload (part of APT):
a file is being downloaded to the infected device
- HeartBeat (periodic similar connections)
packets sent on this connection are used to keep a track on the infected host
- Mirai (botnet)
similar patterns
- Okiru (botnet)
same parameters
- PortScan (part of APT)
- Torii (botnet)
same parameters- related field and its number
- id.resp_h (5) ----> C & C
- id.resp_p (6) ----> Malware, HeartBeat, Port Scan
- conn_state (12) ----> Port Scan- choosen fields to extract features
- ts? -- time series --- dynamic beyasian network
- id.orig_h, id.orig_p, id.resp_h, id.resp_p
- resp_bytes ---- filedownload
- conn_state ---- port scan
- feature analysis? --- other features## Next Plan
- Build Temporal Graph Neural Networks
- reduce the graph size to some extent: suitable for low-memory cost training
- capable of process heterogeneous graph attributes
- capable of capture the changes between temporal graphs
- capable of measuring normal and abnormal behaviour in unsupervised way