Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CyberMetric dataset
https://github.com/cybermetric/CyberMetric?tab=readme-ov-file
- Host: GitHub
- URL: https://github.com/cybermetric/CyberMetric?tab=readme-ov-file
- Owner: cybermetric
- Created: 2024-02-12T09:17:03.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-06-06T11:39:04.000Z (6 months ago)
- Last Synced: 2024-06-06T12:58:49.531Z (6 months ago)
- Language: Python
- Size: 4.72 MB
- Stars: 26
- Watchers: 1
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-foundation-model-leaderboards - CyberMetric
README
# CyberMetric Dataset
# Description
The **CyberMetric Dataset** is a benchmark of 10,000 questions designed to evaluate the cybersecurity knowledge of Large Language Models (LLMs). The questions were generated with different LLMs and verified by human experts in the cybersecurity field to ensure relevance and accuracy. The dataset is compiled from a variety of sources, including standards, certifications, research papers, books, and other publications in the cybersecurity field. We provide the dataset in four sizes (small, medium, big, and large), comprising 80, 500, 2,000, and 10,000 questions, respectively. The smallest version is tailored for comparisons between different LLMs and humans: CyberMetric-80 has been tested with 30 human participants, enabling an effective comparison between human and machine intelligence.
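Each size is distributed as a JSON file of multiple-choice questions. As a rough illustration, loading one record might look like the following sketch; the file name and schema here are assumptions for illustration, so check the repository for the actual release files:

```python
import json

# Hypothetical file name and schema; verify against the actual
# release files in the repository before relying on this layout.
with open("CyberMetric-80-v1.json", encoding="utf-8") as f:
    data = json.load(f)

# Assumed layout: a top-level "questions" list where each entry holds
# the question text, lettered answer options, and the correct letter.
first = data["questions"][0]
print(first["question"])
for letter, text in first["answers"].items():
    print(f"  {letter}) {text}")
print("correct answer:", first["solution"])
```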
# Cite
The CyberMetric paper **"CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge"** has been accepted for publication in the 2024 IEEE International Conference on Cyber Security and Resilience (IEEE CSR 2024).
IEEE Xplore link: https://ieeexplore.ieee.org/document/10679494
Cite the paper:
```bibtex
@INPROCEEDINGS{10679494,
author={Tihanyi, Norbert and Ferrag, Mohamed Amine and Jain, Ridhi and Bisztray, Tamas and Debbah, Merouane},
booktitle={2024 IEEE International Conference on Cyber Security and Resilience (CSR)},
title={CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge},
year={2024},
volume={},
number={},
pages={296-302},
keywords={Accuracy;Reverse engineering;Benchmark testing;NIST Standards;Risk management;Problem-solving;Computer security},
doi={10.1109/CSR61664.2024.10679494}}
```
# Architecture
The CyberMetric dataset was created by applying different language models within a Retrieval-Augmented Generation (RAG) pipeline, with human validation included in the process. The AI-driven generation framework is illustrated in the figure in the repository README.
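The framework retrieves passages from trusted sources, prompts an LLM to turn them into questions, and routes the output to human experts for validation. A minimal sketch of the generation step under those assumptions (the prompt wording, model choice, and function names below are illustrative, not the authors' code):

```python
from openai import OpenAI  # assumes the openai Python package is installed

client = OpenAI(api_key="YOUR_API_KEY")

def generate_question(passage: str) -> str:
    """Ask an LLM to turn a retrieved passage into one multiple-choice question."""
    prompt = (
        "From the following cybersecurity text, write one multiple-choice "
        "question with options A-D and indicate the correct letter.\n\n" + passage
    )
    resp = client.chat.completions.create(
        model="gpt-4",  # model choice is an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# In the real framework, every generated question is then reviewed by
# human experts before it is admitted into the dataset.
```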
# LLM Leaderboard on CyberMetric Dataset
We have evaluated and compared 25 state-of-the-art LLMs on the CyberMetric dataset.
# Usage
We provide a compact Python script, `CyberMetric_evaluator.py`, that shows how to use the dataset with OpenAI GPT models. Simply insert your API key in the script by setting `API_KEY=""`, and then run the evaluator.
Here's an example output generated by the script using the CyberMetric-80 dataset:
![output](https://github.com/cybermetric/CyberMetric/assets/159767263/30cdb8c6-b7c7-40e9-b086-48b79c275172)
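For reference, a minimal evaluator along these lines might look like the sketch below; the dataset file name, record schema, model choice, and answer-parsing logic are assumptions for illustration, and the repository's `CyberMetric_evaluator.py` remains the authoritative version:

```python
import json
from openai import OpenAI  # assumes the openai Python package is installed

API_KEY = ""  # insert your OpenAI API key here
client = OpenAI(api_key=API_KEY)

def ask(question: str, answers: dict[str, str]) -> str:
    """Send one multiple-choice question to the model and return its letter choice."""
    options = "\n".join(f"{k}) {v}" for k, v in answers.items())
    prompt = (
        f"{question}\n{options}\n"
        "Answer with only the letter of the correct option."
    )
    resp = client.chat.completions.create(
        model="gpt-4",  # model choice is an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()[0].upper()

# Assumed schema, as in the description above:
# {"questions": [{"question": ..., "answers": {"A": ...}, "solution": "A"}]}
with open("CyberMetric-80-v1.json", encoding="utf-8") as f:
    questions = json.load(f)["questions"]

correct = sum(ask(q["question"], q["answers"]) == q["solution"] for q in questions)
print(f"Accuracy: {correct}/{len(questions)} = {correct / len(questions):.1%}")
```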