https://github.com/cybermetric/CyberMetric?tab=readme-ov-file

CyberMetric dataset

Host: GitHub
URL: https://github.com/cybermetric/CyberMetric?tab=readme-ov-file
Owner: cybermetric
Created: 2024-02-12T09:17:03.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-01-01T20:19:22.000Z (10 months ago)
Last Synced: 2025-01-01T21:25:03.626Z (10 months ago)
Language: Python
Size: 2.73 MB
Stars: 59
Watchers: 2
Forks: 12
Open Issues: 1
Metadata Files:
- Readme: README.md


# CyberMetric Dataset

# Description

The **CyberMetric Dataset** is a benchmark of 10,000 questions designed to evaluate the cybersecurity knowledge of Large Language Models (LLMs). The questions were generated using different LLMs and verified by human experts in the cybersecurity field to ensure their relevance and accuracy. They are compiled from various sources, including standards, certifications, research papers, books, and other publications in the field. We provide the dataset in four sizes (small, medium, big, and large) comprising 80, 500, 2,000, and 10,000 questions, respectively. The smallest version is tailored for comparisons between LLMs and humans: the CyberMetric-80 dataset was tested with 30 human participants, enabling an effective comparison between human and machine intelligence.

# Cite

The CyberMetric paper **"CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge"** has been accepted for publication in the 2024 IEEE International Conference on Cyber Security and Resilience (IEEE CSR 2024).

IEEE Xplore link: https://ieeexplore.ieee.org/document/10679494

Cite the paper:

```bibtex
@INPROCEEDINGS{10679494,
  author={Tihanyi, Norbert and Ferrag, Mohamed Amine and Jain, Ridhi and Bisztray, Tamas and Debbah, Merouane},
  booktitle={2024 IEEE International Conference on Cyber Security and Resilience (CSR)},
  title={CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge},
  year={2024},
  volume={},
  number={},
  pages={296-302},
  keywords={Accuracy;Reverse engineering;Benchmark testing;NIST Standards;Risk management;Problem-solving;Computer security},
  doi={10.1109/CSR61664.2024.10679494}}
```

# Architecture

The CyberMetric dataset was created by applying different language models using Retrieval-Augmented Generation (RAG), with human validation included in the process. The AI-driven generation framework is illustrated in the following figure.



# Prompt 

Most of the evaluated models were instruction-fine-tuned, and the following prompts were used to obtain the correct answers in XML format.

```python
def make_messages(question, answers):
    """
    Formats a single question+answers into a list of message dictionaries for the pipeline.
    """
    # Render the options as "A) ..., B) ..., C) ..., D) ..."
    options_str = ', '.join([f"{key}) {value}" for key, value in answers.items()])
    instructions = (
        "You are a helpful AI assistant.\n"
        "Instructions:\n"
        "a. Carefully read the question.\n"
        "b. Choose the correct answer (A, B, C, or D) only.\n"
        "c. Do NOT include any explanation or additional text in the response.\n"
        "d. Always return the answer in this XML format: 'answer'. "
        "For example, if the correct answer is D, then return D.\n\n"
    )

    # System message carries the instructions; user message carries the question
    messages = [
        {"role": "system", "content": instructions},
        {"role": "user", "content": f"#Question: {question}\nOptions: {options_str}"}
    ]
    return messages
```
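
A prompt like this yields a short reply that still has to be mapped back to one of the four options. The following is a minimal sketch of that step, not the code used in the evaluation. The helper name `extract_choice` is hypothetical, and the `<answer>` tag in the usage example is only a placeholder for whatever XML tag the model returns.

```python
import re
from typing import Optional


def extract_choice(reply: str) -> Optional[str]:
    """Return 'A', 'B', 'C' or 'D' from a model reply, or None if unclear.

    Hypothetical helper: assumes the reply is a single letter, optionally
    wrapped in XML-style tags and optionally followed by ')' or '.'.
    """
    # Drop anything that looks like an XML tag, then normalize case/whitespace
    text = re.sub(r"<[^>]+>", "", reply).strip().upper()
    # Accept only a lone option letter; anything longer is treated as unparseable
    match = re.fullmatch(r"([ABCD])[).]?", text)
    return match.group(1) if match else None


print(extract_choice("<answer>D</answer>"))   # placeholder tag name
print(extract_choice(" b "))
print(extract_choice("The answer is probably C"))
```

Rejecting replies that contain extra text, rather than guessing a letter from them, keeps instruction-following failures visible in the score.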

# LLM Leaderboard on CyberMetric Dataset

We have assessed and compared state-of-the-art LLMs using the CyberMetric dataset. The most recent evaluation was conducted on December 27th, 2024.



    



# Usage

We provide a compact Python script, `CyberMetric_evaluator.py`, that shows how to use the dataset with OpenAI GPT. Insert your API key in the script by setting `API_KEY=""`, then run the evaluator.

Here's an example output generated by the script using the CyberMetric-80 dataset:

![output](https://github.com/cybermetric/CyberMetric/assets/159767263/30cdb8c6-b7c7-40e9-b086-48b79c275172)
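
For readers who want to score a different model, the core of such an evaluation is a loop that asks each question and counts matches. The sketch below is not taken from `CyberMetric_evaluator.py`. It assumes each item is a dictionary with `question`, `answers` and `solution` keys, and it takes a hypothetical `ask_model` callable in place of a real API client. The two sample questions are illustrative and do not come from the dataset.

```python
from typing import Callable, Dict, List


def evaluate(questions: List[dict],
             ask_model: Callable[[str, Dict[str, str]], str]) -> float:
    """Return the fraction of questions that ask_model answers correctly.

    Assumed item layout (not confirmed by this README):
    {"question": str, "answers": {"A": ..., "B": ..., ...}, "solution": "A"}
    """
    if not questions:
        return 0.0
    correct = 0
    for item in questions:
        # ask_model is a hypothetical callable returning a single option letter
        predicted = ask_model(item["question"], item["answers"]).strip().upper()
        if predicted == item["solution"].strip().upper():
            correct += 1
    return correct / len(questions)


# Illustrative items only; these are not questions from the dataset
sample = [
    {"question": "Which port does HTTPS use by default?",
     "answers": {"A": "21", "B": "80", "C": "443", "D": "8080"},
     "solution": "C"},
    {"question": "What does the 'C' in the CIA triad stand for?",
     "answers": {"A": "Confidentiality", "B": "Control",
                 "C": "Compliance", "D": "Cryptography"},
     "solution": "A"},
]


def always_c(question: str, answers: Dict[str, str]) -> str:
    """Stand-in model that always picks option C."""
    return "C"


accuracy = evaluate(sample, always_c)
print(f"Accuracy: {accuracy:.0%}")
```

Passing the model as a callable keeps the scoring logic independent of any particular provider, so the same loop can wrap an API client or a local pipeline.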
