# A.S.E
🚀 Repository-level AI-generated Code Security Evaluation Framework by
「Tencent Wukong Code Security Team」

**A.S.E (AICGSecEval)** provides a **project-level benchmark for evaluating the security of AI-generated code**, designed to assess the security performance of AI-assisted programming by simulating real-world development workflows:
* **Code Generation Tasks** – Derived from real-world GitHub projects and authoritative CVE patches, ensuring both practical relevance and security sensitivity.
* **Code Generation Process** – Automatically extracts project-level code context to accurately simulate realistic AI programming scenarios.
* **Code Security Evaluation** – Integrates a hybrid evaluation suite combining static and dynamic analysis, balancing detection coverage and verification precision to enhance the scientific rigor and practical value of security assessments.
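As a rough illustration of how a hybrid static/dynamic verdict can be folded into one per-sample score: the names, weights, and thresholds below are invented for illustration only and are not A.S.E's actual scoring logic.

```python
from dataclasses import dataclass

@dataclass
class SampleResult:
    """Hypothetical per-sample result: a SAST verdict plus a PoC-based dynamic verdict."""
    static_findings: int   # vulnerabilities flagged by static analysis
    poc_triggered: bool    # whether a vulnerability PoC succeeded at runtime

def security_score(result: SampleResult) -> float:
    """Illustrative hybrid score: dynamic confirmation dominates, static findings penalize."""
    if result.poc_triggered:
        return 0.0  # runtime-confirmed exploitable -> insecure
    # No dynamic confirmation: discount by static findings (assumed penalty of 0.25 each).
    return max(0.0, 1.0 - 0.25 * result.static_findings)

print(security_score(SampleResult(static_findings=0, poc_triggered=False)))  # 1.0
print(security_score(SampleResult(static_findings=2, poc_triggered=False)))  # 0.5
print(security_score(SampleResult(static_findings=0, poc_triggered=True)))   # 0.0
```

The asymmetry is the point of a hybrid scheme: dynamic PoC confirmation gives high-precision evidence, while static findings give broad but noisier coverage.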




Official Website · Evaluation Results · Latest News · Academic Paper




We are committed to building **A.S.E (AICGSecEval)** into an **open, reproducible, and continuously evolving community project**. You are welcome to contribute through **Star**, **Fork**, **Issue**, or **Pull Request** to help expand the dataset and improve the evaluation framework. Your attention and contributions will help **A.S.E** grow, advancing both **industrial adoption** and **academic research** in **AI-generated code security**.

⭐ Star this project

## Table of Contents

- [✨ A.S.E Framework Design](#-ase-framework-design)
- [🧱 2.0 Major Upgrades](#-20-major-upgrades)
- [🚀 Quick Start](#-quick-start)
- [📖 Citation](#-citation)
- [🤝 Contribution Guide](#-contribution-guide)
- [🙏 Acknowledgements](#-acknowledgements)
- [📱 Join the Community](#-join-the-community)
- [📄 License](#-license)

## ✨ A.S.E Framework Design



## 🧱 2.0 Major Upgrades

1️⃣ **Dataset Upgrade – Broader Coverage of Code Generation Vulnerability Scenarios**
Includes key risks from the OWASP Top 10 and CWE Top 25, covering 29 CWE vulnerability types across major programming languages such as C/C++, PHP, Java, Python, and JavaScript.

2️⃣ **Evaluation Target Upgrade – Support for Agentic Programming Tools**
Expands evaluation dimensions to better reflect real-world AI programming scenarios.

3️⃣ **Code Evaluation Upgrade – Static and Dynamic Hybrid Assessment**
Introduces a dynamic evaluation scheme based on test cases and vulnerability PoCs, forming a hybrid assessment framework that balances detection breadth and verification precision, significantly enhancing the scientific rigor and practical value of the evaluation process.

## 🚀 Quick Start

**System Requirements**

| Memory | Disk Space | Python | Docker |
|:------:|:----------:|:------:|:------:|
| ≥16 GB (recommended) | ≥100 GB | ≥3.11 | ≥27 |

**1. Install Python Dependencies**
```bash
pip install -r requirements.txt
```

**2. Run Evaluation with One Command**
```bash
# Basic usage
python3 invoke.py [options...] {--llm | --agent} [llm_options... | agent_options...]

# View all available options
python3 invoke.py -h

# Example: LLM evaluation
python3 invoke.py \
    --llm \
    --model_name gpt-4o-2024-11-20 \
    --base_url https://api.openai.com/v1/ \
    --api_key sk-xxxxxx \
    --batch_id v1.0 \
    --dataset_path ./data/data_v2.json \
    --output_dir ./outputs \
    --max_workers 1 \
    --github_token xxxxx  # If not provided, anonymous cloning is used, which may be subject to clone rate limiting.
```

**Agent Evaluation**

When running Agent-based evaluations, note that different Agents may require distinct configurations (e.g., model parameters, credentials, or APIs). The launcher automatically forwards all unrecognized arguments (i.e., those not listed in `-h`) to the corresponding Agent module for parsing, allowing flexible extension of Agent-specific parameters.

For example, to evaluate Claude Code, run:

```bash
python3 invoke.py \
    --agent \
    --agent_name claude_code \
    --batch_id v1.0 \
    --dataset_path ./data/data_v2.json \
    --claude_api_url https://ai.nengyongai.cn \
    --claude_api_key sk-XXXXX \
    --claude_model claude-sonnet-4-20250514 \
    --github_token xxxxx  # If not provided, anonymous cloning is used, which may be subject to clone rate limiting.
```

The `--claude_XXX` options are parsed and used directly by the Agent evaluation module.
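The forward-unrecognized-arguments behavior described above is the kind of thing `argparse.parse_known_args` provides in Python's standard library. The following standalone sketch shows the mechanism; it is not the project's actual launcher code:

```python
import argparse

# Options the launcher itself understands; anything else is forwarded onward.
parser = argparse.ArgumentParser()
parser.add_argument("--agent", action="store_true")
parser.add_argument("--agent_name")
parser.add_argument("--batch_id")

# parse_known_args returns (recognized namespace, leftover argument list).
known, extra = parser.parse_known_args([
    "--agent", "--agent_name", "claude_code", "--batch_id", "v1.0",
    "--claude_model", "claude-sonnet-4-20250514",  # not declared above
])

print(known.agent_name)  # claude_code
print(extra)             # ['--claude_model', 'claude-sonnet-4-20250514']
```

The leftover list (`extra` here) can then be handed to the selected Agent module, which declares and parses its own `--claude_XXX`-style options.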

**Notes**
1️⃣ A full evaluation may take a long time depending on your hardware; adjust `--max_workers` to increase concurrency and reduce total runtime.
2️⃣ The tool supports automatic checkpoint recovery: if execution is interrupted, simply rerun the same command to resume from the last saved state.
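Checkpoint recovery of this kind typically amounts to persisting the set of finished task IDs and skipping them on rerun. A generic sketch of the pattern (the file name and helpers are invented for illustration, not A.S.E's internal implementation):

```python
import json
import os

STATE_FILE = "checkpoint_state.json"  # hypothetical state-file name
completed = []                        # records what actually got evaluated

def evaluate(tid: str) -> None:
    """Stand-in for the expensive per-task evaluation work."""
    completed.append(tid)

def load_done() -> set:
    """Read the set of task IDs finished in previous runs."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return set(json.load(f))
    return set()

def run_all(task_ids):
    done = load_done()
    for tid in task_ids:
        if tid in done:
            continue              # already finished in a previous run
        evaluate(tid)
        done.add(tid)
        with open(STATE_FILE, "w") as f:
            json.dump(sorted(done), f)  # persist progress after every task

run_all(["task-1", "task-2"])            # first run evaluates both tasks
run_all(["task-1", "task-2", "task-3"])  # rerun skips finished tasks
print(completed)  # ['task-1', 'task-2', 'task-3']
```

Because progress is flushed after every task, an interruption at any point loses at most the task in flight.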

## 📖 Citation

If your research uses or references **A.S.E** or its evaluation results, please cite it as follows:
```bibtex
@misc{lian2025aserepositorylevelbenchmarkevaluating,
  title={A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code},
  author={Keke Lian and Bin Wang and Lei Zhang and Libo Chen and Junjie Wang and Ziming Zhao and Yujiu Yang and Miaoqian Lin and Haotong Duan and Haoran Zhao and Shuang Liao and Mingda Guo and Jiazheng Quan and Yilu Zhong and Chenhao He and Zichuan Chen and Jie Wu and Haoling Li and Zhaoxuan Li and Jiongchi Yu and Hui Li and Dong Zhang},
  year={2025},
  eprint={2508.18106},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2508.18106},
}
```

## 🤝 Contribution Guide

**A.S.E** aims to build an **open, reproducible, and continuously evolving ecosystem** for evaluating the security of AI-generated code.
We welcome developers and researchers from academia, industry, and the open-source community to collaborate and contribute to the project.

### Contribution Areas

* 🧠 **Dataset Contribution**: Expand real-world vulnerability samples, enrich SAST tools/rules, and provide code functionality test cases and vulnerability PoCs.
* ⚙️ **Framework Optimization**: Improve code generation logic, evaluation metrics, and context extraction strategies; support Agent integration and code refactoring.
* 💡 **Discussions & Suggestions**: Propose new ideas, co-develop evaluation strategies, or share best practices.
> 💬 Beyond the above, we welcome any form of participation and support, including contributing real-world use cases, providing feedback, improving documentation, or joining community discussions.

### Reference Documents

> 📌 If you plan to contribute, please read the following guides first to understand the data format, submission process, and validation standards.

* 📘 Dataset Contribution Guides
  * [Static Dataset Contribution Guide](./docs/static_dataset_contribute.md)
  * [Dynamic Dataset Contribution Guide](./docs/dynamic_dataset_contribute.md)
* 📘 [Agent Integration Guide](./docs/agent_contribute.md)
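For orientation, a dataset entry generally needs to identify the source project, the vulnerable location, and the vulnerability class. Every field name below is purely hypothetical; consult the contribution guides above for the actual schema:

```python
# Purely hypothetical field names -- see the contribution guides for the real schema.
entry = {
    "repo_url": "https://github.com/example/project",  # source GitHub project
    "fix_commit": "abc1234",                           # commit of the CVE patch
    "cwe": "CWE-89",                                   # vulnerability class (SQL injection)
    "target_file": "app/db/query.py",                  # file containing the vulnerable code
    "cve": "CVE-2024-0000",                            # associated CVE identifier
}
print(entry["cwe"])  # CWE-89
```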

### Community Interaction

* 💭 Report issues or suggestions: via [Issues](https://github.com/Tencent/AICGSecEval/issues)
* 💡 Brainstorm and discuss: join [Discussions](https://github.com/Tencent/AICGSecEval/discussions)

Your engagement and contributions will help A.S.E evolve faster, expand its coverage, and advance the open standardization of AI-generated code security evaluation.






## 🙏 Acknowledgements

A.S.E is collaboratively developed by the Tencent Security Platform Department with the following academic partners:

* Fudan University (System Software & Security Lab)
* Peking University (Prof. Hui Li's Team)
* Shanghai Jiao Tong University (Institute of Network and System Security)
* Tsinghua University (Prof. Yujiu Yang's Team)
* Zhejiang University (Asst. Prof. Ziming Zhao's Team)

We sincerely appreciate their invaluable contributions to this project.

**🙌 Contributors**


## 📱 Join the Community



### 🔗 Recommended Security Tools
If you are interested in AI infrastructure security, refer to [A.I.G (AI-Infra-Guard)](https://github.com/Tencent/AI-Infra-Guard), a comprehensive, intelligent, and easy-to-use AI Red Teaming platform developed by Tencent Zhuque Lab.

## 📄 License
This project is open source under the Apache-2.0 License. For more details, please refer to the [License.txt](./License.txt) file.

---

[![Star History Chart](https://api.star-history.com/svg?repos=Tencent/AICGSecEval&type=Date)](https://www.star-history.com/#Tencent/AICGSecEval&Date)