https://github.com/subconsciouscompute/syscall-ids
Intrusion detection pipeline leveraging statistical syscall sequence modeling techniques
host-based-intrusion-detection-system machine-learning statistical-inference
- Host: GitHub
- URL: https://github.com/subconsciouscompute/syscall-ids
- Owner: SubconsciousCompute
- License: mit
- Created: 2024-06-07T22:07:27.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-30T23:12:11.000Z (almost 2 years ago)
- Last Synced: 2025-07-30T01:59:32.469Z (11 months ago)
- Topics: host-based-intrusion-detection-system, machine-learning, statistical-inference
- Language: Jupyter Notebook
- Homepage:
- Size: 24.1 MB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
## Syscall-IDS
Syscall-IDS is a host-based intrusion detection system (HIDS) that detects anomalies in system call traces. It combines statistical and machine learning techniques to distinguish normal (clean) behavior from potentially malicious (infected) behavior.
The pipeline currently runs offline (post hoc), so it serves as a practical bound on accuracy and as a guide for future research.
View the pipeline [here](https://github.com/Vismay-dev/SysCall-IDS/blob/main/notebooks/pipeline.ipynb).
### 🌟 Key Developments
| Technique/Feature | Description |
|-------------------------------------|-----------------------------------------------------------------------------------|
| Feature Engineering | Conversion of syscall info into high-dimensional feature vectors. |
| Probabilistic Syscall Subclustering | Gaussian mixture models for granular syscall behavior understanding. |
| Temporal Dependency Modeling | Markov chains capture transitions between syscall states as a function of time. |
| Buffer Overflow Detection | Gaussian interval of string argument lengths to catch overflow attempts. |
| Pathname Similarity Analysis | Self-organizing maps to visualize and detect anomalies in syscall pathnames. |
| DoS Attack Detection | Markov chain edge frequency analysis per-trace for DoS detection. |
| Segmentation | Suffix-tree based longest repeating substring is used as a segmentation sequence. |
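
The temporal dependency row in the table can be illustrated with a minimal sketch. It assumes a plain first-order Markov chain over syscall names, estimated from clean traces, and scores a new trace by the fraction of transitions never seen in training. The function names (`fit_transitions`, `unseen_transition_rate`) and the toy traces are hypothetical and are not taken from the notebook.

```python
from collections import Counter, defaultdict


def fit_transitions(traces):
    """Estimate first-order transition probabilities from clean traces."""
    counts = defaultdict(Counter)
    for trace in traces:
        # Count each (current syscall, next syscall) edge.
        for cur, nxt in zip(trace, trace[1:]):
            counts[cur][nxt] += 1
    # Normalize each row of counts into probabilities.
    return {
        cur: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for cur, row in counts.items()
    }


def unseen_transition_rate(trace, probs):
    """Fraction of transitions in `trace` that never occurred in training."""
    pairs = list(zip(trace, trace[1:]))
    if not pairs:
        return 0.0
    unseen = sum(
        1 for cur, nxt in pairs if probs.get(cur, {}).get(nxt, 0.0) == 0.0
    )
    return unseen / len(pairs)


# Toy traces for illustration only.
clean = [["open", "read", "close"], ["open", "read", "read", "close"]]
model = fit_transitions(clean)
suspicious_score = unseen_transition_rate(["open", "write", "close"], model)
```

A trace with a high unseen-transition rate would be a candidate anomaly. The threshold, and any time-dependent weighting of transitions, are left open here.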
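
The buffer overflow row can be sketched as a simple interval check. This is an assumption-laden illustration: it fits a Gaussian to string argument lengths observed in clean traces and flags lengths outside `mean ± k * stdev`. The width `k = 3.0`, the function names, and the sample lengths are all hypothetical; the notebook may choose the interval differently.

```python
from statistics import mean, stdev


def fit_length_interval(lengths, k=3.0):
    """Return (low, high) bounds at k sample standard deviations from the mean."""
    mu = mean(lengths)
    sigma = stdev(lengths)  # Requires at least two observations.
    return mu - k * sigma, mu + k * sigma


def is_length_anomalous(length, interval):
    """True when an argument length falls outside the fitted interval."""
    low, high = interval
    return not (low <= length <= high)


# Hypothetical string-argument lengths from clean traces.
clean_lengths = [12, 15, 14, 13, 16, 15, 14]
interval = fit_length_interval(clean_lengths)

overflow_like = is_length_anomalous(4096, interval)
typical = is_length_anomalous(14, interval)
```

An unusually long string argument, such as an oversized path or buffer payload, lands far outside the interval and is flagged, while lengths close to the training mean pass.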
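
The segmentation row refers to a suffix-tree search for the longest repeating substring. The sketch below swaps in a simpler technique, sorting all suffixes and comparing neighbors, which finds the same run but with worse asymptotic cost than a suffix tree. It is meant only to show what the segmentation sequence looks like on a toy trace; `longest_repeated_run` is a hypothetical name.

```python
def longest_repeated_run(seq):
    """Return the longest contiguous run that occurs at least twice in `seq`.

    Occurrences may overlap. Uses sorted suffixes rather than a suffix tree,
    so it is only suitable for short traces.
    """
    seq = tuple(seq)
    suffixes = sorted(seq[i:] for i in range(len(seq)))
    best = ()
    # Any repeated run is a common prefix of two suffixes, and the longest
    # such prefix always appears between neighbors in sorted order.
    for a, b in zip(suffixes, suffixes[1:]):
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        if n > len(best):
            best = a[:n]
    return best


# Toy trace for illustration only.
trace = ["open", "read", "write", "close", "open", "read", "write", "exit"]
segment = longest_repeated_run(trace)
```

On this toy trace the repeated run is `open, read, write`, which could then mark the boundaries at which a longer trace is split into segments.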
### 📊 Results
Below are the confusion matrices showing the performance of the HIDS pipeline on the Twindroid dataset:
a) **Average-Case Confusion Matrix:**

b) **Best-Case Confusion Matrix:**

### 🎓 References:
- [Liao et al. "Anomaly Detection of System Call Sequence Based on Dynamic Features and Relaxed-SVM"](https://typeset.io/papers/anomaly-detection-of-system-call-sequence-based-on-dynamic-1oukdqgy)
- [Shamim et al. "Efficient Approach for Anomaly Detection in IoT Using System Calls"](https://www.mdpi.com/1424-8220/23/2/652)
- [Frossi et al. "Selecting and Improving System Call Models for Anomaly Detection"](https://maggi.cc/publication/frossi_hybridsyscalls_2009/frossi_hybridsyscalls_2009.pdf)
- [Android Dataset](https://ieeexplore.ieee.org/document/9796248)
### 🙏 Acknowledgments:
- [Cosma Shalizi's Notes on Markov Chains and Prediction Processes](http://bactra.org/notebooks/prediction-process.html)
- [Columbia CS Dept's Intrusion Detection Pipeline](http://ids.cs.columbia.edu/sites/default/files/smt-syscall-discex01.pdf)
## 📝 License
This project is licensed under the [MIT License](https://opensource.org/licenses/MIT).