https://github.com/qcri/azerg
Artifacts for our paper: From Text to Actionable Intelligence: Automating STIX Entity and Relationship Extraction
- Host: GitHub
- URL: https://github.com/qcri/azerg
- Owner: qcri
- Created: 2025-08-13T01:09:45.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2025-09-07T09:39:26.000Z (27 days ago)
- Last Synced: 2025-09-07T11:29:23.593Z (27 days ago)
- Topics: mitre-attack, stix, stix2, threat-intelligence
- Language: Python
- Size: 16.6 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
# AZERG: Automating STIX Entity and Relationship Extraction
**AZERG** is a framework for automatically extracting Structured Threat Information Expression (STIX) entities and relationships from unstructured cyber threat intelligence reports. This tool uses fine-tuned language models to assist security analysts in generating STIX-compatible data, streamlining the threat intelligence lifecycle.
The project is detailed in our paper: [From Text to Actionable Intelligence: Automating STIX Entity and Relationship Extraction](https://arxiv.org/abs/2507.16576).
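To make "entities and relationships" concrete, the following is a minimal sketch of what a STIX 2.1 style bundle looks like when built from plain Python dictionaries. It is illustrative only and is not AZERG's output format: the helper names (`stix_id`, `make_entity`, `make_relationship`) and the object names `ExampleGroup` and `ExampleTool` are hypothetical, and the objects carry only a handful of common properties without being validated against the specification.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_id(object_type):
    """Return an identifier in the STIX form '<type>--<UUID>'."""
    return f"{object_type}--{uuid.uuid4()}"


def make_entity(object_type, name, timestamp):
    """Build a small STIX-style domain object (an 'entity')."""
    return {
        "type": object_type,
        "spec_version": "2.1",
        "id": stix_id(object_type),
        "created": timestamp,
        "modified": timestamp,
        "name": name,
    }


def make_relationship(source, relationship_type, target, timestamp):
    """Link two entities with a STIX-style relationship object."""
    return {
        "type": "relationship",
        "spec_version": "2.1",
        "id": stix_id("relationship"),
        "created": timestamp,
        "modified": timestamp,
        "relationship_type": relationship_type,
        "source_ref": source["id"],
        "target_ref": target["id"],
    }


# UTC timestamp with millisecond precision and a trailing 'Z'.
timestamp = (
    datetime.now(timezone.utc)
    .isoformat(timespec="milliseconds")
    .replace("+00:00", "Z")
)

# Hypothetical entities, as might be mentioned in a threat report.
actor = make_entity("threat-actor", "ExampleGroup", timestamp)
tool = make_entity("tool", "ExampleTool", timestamp)
uses = make_relationship(actor, "uses", tool, timestamp)

bundle = {"type": "bundle", "id": stix_id("bundle"), "objects": [actor, tool, uses]}
print(json.dumps(bundle, indent=2))
```

A sentence such as "ExampleGroup uses ExampleTool" would map to two entities and one `uses` relationship in this representation.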
The models and datasets are available on Hugging Face:
- **Hugging Face Collection:** [QCRI/azerg](https://huggingface.co/collections/QCRI/azerg-687264a76236a362e833d8eb)
- **Dataset:** [QCRI/AZERG-Dataset](https://huggingface.co/datasets/QCRI/AZERG-Dataset)
- **Models:**
  - [QCRI/AZERG-MixTask-Mistral](https://huggingface.co/QCRI/AZERG-MixTask-Mistral)
  - [QCRI/AZERG-T1-Mistral](https://huggingface.co/QCRI/AZERG-T1-Mistral)
  - [QCRI/AZERG-T2-Mistral](https://huggingface.co/QCRI/AZERG-T2-Mistral)
  - [QCRI/AZERG-T3-Mistral](https://huggingface.co/QCRI/AZERG-T3-Mistral)
  - [QCRI/AZERG-T4-Mistral](https://huggingface.co/QCRI/AZERG-T4-Mistral)
## Quickstart
### 1. Download Datasets
First, download the necessary datasets for running inference and evaluation.
```bash
python download_dataset.py
```
### 2. Install Dependencies
Install the required Python libraries from `requirements.txt`.
```bash
pip install -r requirements.txt
```
### 3. Run Inference
Execute the `run_inference.py` script to generate predictions from a model. The script saves results in the `./results/{TASK}` directory.
**Usage**:
```bash
python run_inference.py --task <TASK> --dataset <DATASET> --model_name <MODEL_NAME> --api_key <API_KEY>
```
- `<TASK>`: `T1`, `T2`, `T3`, or `T4`.
- `<DATASET>`: `azerg` or `annoctr`.
- `<MODEL_NAME>`: The model to use for inference (e.g., `QCRI/AZERG-MixTask-Mistral` or `gpt-4o`).
- `<API_KEY>`: Your OpenAI-compatible API key.
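When scripting runs, it can help to validate the arguments before invoking the script. The sketch below assembles the argument list for `run_inference.py` from the flags and allowed values listed above. The helper names `build_inference_command` and `results_dir` are hypothetical and not part of the repository, and treating `--api_key` as optional is an assumption.

```python
import shlex

# Allowed values as documented in this README.
VALID_TASKS = ("T1", "T2", "T3", "T4")
VALID_DATASETS = ("azerg", "annoctr")


def build_inference_command(task, dataset, model_name, api_key=None):
    """Return the argument list for run_inference.py (hypothetical helper)."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    if dataset not in VALID_DATASETS:
        raise ValueError(f"dataset must be one of {VALID_DATASETS}, got {dataset!r}")
    cmd = [
        "python", "run_inference.py",
        "--task", task,
        "--dataset", dataset,
        "--model_name", model_name,
    ]
    # Assumption: the key may be omitted, e.g. for an endpoint that needs none.
    if api_key:
        cmd += ["--api_key", api_key]
    return cmd


def results_dir(task):
    """Directory where the README says results are saved for a task."""
    return f"./results/{task}"


cmd = build_inference_command("T1", "azerg", "QCRI/AZERG-MixTask-Mistral")
print(shlex.join(cmd))
print("results expected under:", results_dir("T1"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Keeping the command as a list avoids shell quoting issues with the key.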
**Example**:
```bash
python run_inference.py --task T1 --dataset azerg --model_name QCRI/AZERG-MixTask-Mistral
```
### 4. Run Evaluation
Use the `evaluate_results.py` script to calculate performance metrics from the generated result files. The script appends a summary to `results.csv`.
**Usage**:
```bash
python evaluate_results.py --task <TASK> --dataset <DATASET>
```
- `<TASK>`: `T1`, `T2`, `T3`, or `T4`.
- `<DATASET>`: `azerg` or `annoctr`.
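To evaluate several configurations in one go, the task and dataset values above can be combined into a grid of commands. The sketch below only builds and prints the command lines. The function name `evaluation_commands` is hypothetical, and whether result files exist for every task and dataset pair depends on which inference runs were completed beforehand.

```python
import itertools
import shlex

# Values documented in this README.
TASKS = ("T1", "T2", "T3", "T4")
DATASETS = ("azerg", "annoctr")


def evaluation_commands(tasks=TASKS, datasets=DATASETS):
    """Return one evaluate_results.py argument list per (task, dataset) pair."""
    return [
        ["python", "evaluate_results.py", "--task", task, "--dataset", dataset]
        for task, dataset in itertools.product(tasks, datasets)
    ]


commands = evaluation_commands()
for cmd in commands:
    # Print only; to execute, use subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Since each evaluation appends its summary to `results.csv`, running the grid sequentially should leave one summary per configuration in that file.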
**Example**:
```bash
python evaluate_results.py --task T1 --dataset azerg
```
## Citation (to appear in RAID 2025)
If you use AZERG in your research, please cite our paper:
```bibtex
@article{lekssays2025azerg,
  title={From Text to Actionable Intelligence: Automating STIX Entity and Relationship Extraction},
  author={Lekssays, Ahmed and Sencar, Husrev Taha and Yu, Ting},
  journal={arXiv preprint arXiv:2507.16576},
  year={2025}
}
```
## Issues
Please report any bugs or feature requests by opening an issue on our GitHub repository: [https://github.com/QCRI/azerg/issues](https://github.com/QCRI/azerg/issues).