https://github.com/hpai-bsc/turtle
A Unified Evaluation of LLMs for RTL Generation 🐢 (MLCAD 2025)
- Host: GitHub
- URL: https://github.com/hpai-bsc/turtle
- Owner: HPAI-BSC
- License: apache-2.0
- Created: 2025-03-19T11:00:34.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-07-09T15:18:34.000Z (9 months ago)
- Last Synced: 2025-07-10T14:14:32.176Z (9 months ago)
- Topics: evaluation-framework, rtl
- Language: Python
- Homepage: https://huggingface.co/spaces/HPAI-BSC/TuRTLe-Leaderboard
- Size: 482 KB
- Stars: 19
- Watchers: 3
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
TuRTLe is a framework to systematically assess LLMs across key RTL generation tasks. It integrates multiple existing benchmarks and automates the evaluation process, enabling a comprehensive assessment of LLM performance in syntax correctness, functional correctness, synthesis, PPA (power, performance, and area) optimization, and exact line completion.
This work extends the functionality and flexibility of [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness) by using open-source EDA tools to run Specification-to-RTL and RTL Code Completion benchmarks. It is also inspired by [vllm-code-harness](https://github.com/iNeil77/vllm-code-harness) to enable efficient inference with vLLM.
Benchmarks implemented so far are:
- [VerilogEval v2.0](https://github.com/NVlabs/verilog-eval): Specification-to-RTL and Module Completion
- [RTLLM v1.1 and v2.0](https://github.com/hkust-zhiyao/RTLLM): Specification-to-RTL
- [VGen](https://github.com/shailja-thakur/VGen): Module Completion
- [RTL-Repo](https://github.com/AUCOHL/RTL-Repo): Single Line Completion
Open-source EDA tools integrated:
- [Icarus Verilog](https://github.com/steveicarus/iverilog): syntax and functionality
- [Verilator](https://www.veripool.org/verilator/): syntax and functionality
- [Yosys](https://github.com/YosysHQ/yosys): synthesis
- [OpenROAD](https://github.com/The-OpenROAD-Project/OpenROAD): PPA
- [OpenLane](https://github.com/The-OpenROAD-Project/OpenLane): to integrate Yosys and OpenROAD
For more details about our work, refer to our [ArXiv paper](https://arxiv.org/abs/2504.01986). The following diagram shows the high-level structure of the framework:

## News
- **[2025-07-03]** TuRTLe now supports Verilator as a simulator to check for Syntax and Functionality
- **[2025-06-12]** We added support for multi-node inference with Ray, along with configurations for larger models
- **[2025-05-19]** The project’s source code is now publicly released. We’d love to hear your feedback, so give it a try!
- **[2025-03-31]** Our paper *"TuRTLe: A Unified Evaluation of LLMs for RTL Generation"* is now available on [ArXiv](https://arxiv.org/abs/2504.01986)!
- **[2025-03-20]** The leaderboard is now live! Check it out on our [Hugging Face Space](https://huggingface.co/spaces/HPAI-BSC/TuRTLe-Leaderboard)
## Road Map
- **[In progress]** Release repo compatible with local execution
## Leaderboard 🥇
Check the [TuRTLe Leaderboard](https://huggingface.co/spaces/HPAI-BSC/TuRTLe-Leaderboard) to see the best-performing open-source models for each task.

## Usage
> [!WARNING]
> **Dependencies Notice**
> **vLLM** currently supports up to **Python 3.12**. Ensure that your Python version does not exceed this limit to avoid compatibility issues.
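
A quick way to verify the interpreter before installing anything is a small version check. The sketch below is only an illustration: the helper name `python_version_ok` and the constant `MAX_SUPPORTED` are hypothetical and not part of TuRTLe, and the limit simply mirrors the 3.12 bound stated in the notice above.

```python
import sys

# Highest (major, minor) version taken from the notice above.
# Hypothetical constant: adjust it if vLLM's supported range changes.
MAX_SUPPORTED = (3, 12)


def python_version_ok(version=None, max_supported=MAX_SUPPORTED):
    """Return True if the (major, minor) version does not exceed the limit."""
    if version is None:
        version = sys.version_info
    # Compare only major and minor, so patch releases such as 3.12.4 pass.
    return tuple(version[:2]) <= tuple(max_supported)
```

Calling `python_version_ok()` with no arguments checks the running interpreter; passing a tuple such as `(3, 13)` lets you test another version without switching environments.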
### HPC Environment Requirements
Most modes must be executed in HPC environments. For this reason, TuRTLe currently relies on **Slurm** and **Singularity** for its execution.
### Installation
1. **Clone the repository**:
```bash
git clone --recursive https://github.com/HPAI-BSC/TuRTLe.git
```
2. **(Optional) Create and activate a virtual environment**:
```bash
python3 -m venv venv
source venv/bin/activate
```
3. **Install Python dependencies**:
```bash
pip install -r requirements.txt
```
On non-Linux devices the above command will raise:
```
AssertionError: vLLM only supports Linux platform (including WSL).
```
In this case, vLLM has to be installed from source (see their [installation page](https://docs.vllm.ai/en/stable/getting_started/installation.html) for details).
4. **Install bigcode-evaluation-harness as a local editable package**:
```bash
cd TuRTLe/bigcode-evaluation-harness/
pip install -e .
```
5. **Install EDA Tools (not required for single line completion benchmarks)**
To install **OpenLane**, follow the instructions provided in the [OpenLane Installation Guide](https://openlane2.readthedocs.io/en/latest/getting_started/installation_overview.html).
To install **Icarus Verilog** on Windows, see the [Icarus Verilog Windows download page](https://bleyer.org/icarus/). On Debian-based Linux distributions, run:
```bash
sudo apt-get update
sudo apt-get install iverilog
```
Finally, we recommend using Singularity for containerization in HPC environments. TuRTLe can dynamically create and submit Slurm job scripts. To enable this, include the following settings in your benchmark configuration file:
- **singularity_image**: path to your Singularity image.
- For each model, specify a **slurm_config** from `turtle/configs/slurm.yml` with the Slurm directives to run the benchmark.
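To give an idea of what a dynamically created job script could look like, here is a minimal sketch that turns a set of Slurm directives and a Singularity image path into a batch script. It is not TuRTLe's actual generator: the function `render_slurm_script`, the dictionary layout of the directives, and the example values are all assumptions made for illustration. Only the `#SBATCH` directive syntax and `singularity exec` are standard.

```python
import shlex


def render_slurm_script(slurm_config, singularity_image, command):
    """Render a batch script that runs `command` inside a Singularity image.

    slurm_config: dict mapping sbatch option names to values
                  (hypothetical layout, not TuRTLe's real schema).
    singularity_image: path to the image file.
    command: list of argv parts to execute inside the container.
    """
    lines = ["#!/bin/bash"]
    # One "#SBATCH --option=value" line per directive.
    for option, value in slurm_config.items():
        lines.append(f"#SBATCH --{option}={value}")
    lines.append("")
    # Quote every part so paths with spaces survive the shell.
    inner = " ".join(shlex.quote(part) for part in command)
    lines.append(f"singularity exec {shlex.quote(singularity_image)} {inner}")
    return "\n".join(lines) + "\n"


# Example values are placeholders, not recommended settings.
script = render_slurm_script(
    {"job-name": "turtle-rtlrepo", "nodes": 1, "time": "02:00:00"},
    "/path/to/image.sif",
    ["python", "turtle/run.py", "--benchmark", "rtlrepo"],
)
print(script)
```

A script rendered this way could be written to a file and submitted with `sbatch`. The real directives for each model live in `turtle/configs/slurm.yml`.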
### Quick Demo
Coming soon.
### Running the Project
To execute the project, use the `turtle/run.py` script with the appropriate arguments. Below are the details of the available parameters:
```bash
python turtle/run.py [--benchmark ] [--model ] [--run_all]
```
If the configuration file includes both `singularity_image` and `slurm_config`, TuRTLe will automatically generate and execute a Slurm script to run the benchmark using the specified Singularity image.
#### Core Parameters
- `--benchmark`: Name of the .yml file in `turtle/configs/` with the configurations of the benchmark to run (e.g., `rtlrepo`, `rtllm_v2.0`, `verilog_eval_cc`, `verilog_eval_rtl`, `verigen`).
- `--model`: Specify a particular model to run. If not provided, all models in the configuration file will be executed.
- `--run_all`: Use this flag to run all benchmarks against all models.
#### Additional Parameters
TuRTLe uses a dual-image setup: one image for inference and another that includes the EDA tools (e.g., Icarus Verilog, Verilator, Yosys, OpenLane). Because of this, you can control each phase of the pipeline separately:
- `--generation_only`: Use this flag to only perform inference.
- `--evaluation_only`: Use this flag to only perform evaluation. The generations are loaded automatically from the `metric_output_path` variable in the YAML configuration.
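The flags above can be summarized as a small `argparse` sketch. This is a hypothetical reconstruction based only on the parameter descriptions in this README; the real parser in `turtle/run.py` may differ. The helper `phases_to_run` and the choice to reject both phase flags together are assumptions, not documented behavior.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented above."""
    parser = argparse.ArgumentParser(prog="turtle/run.py")
    parser.add_argument("--benchmark", help="name of the .yml file in turtle/configs/")
    parser.add_argument("--model", help="run only this model from the configuration")
    parser.add_argument("--run_all", action="store_true",
                        help="run all benchmarks against all models")
    parser.add_argument("--generation_only", action="store_true",
                        help="only perform inference")
    parser.add_argument("--evaluation_only", action="store_true",
                        help="only perform evaluation")
    return parser


def phases_to_run(args):
    """Map the two phase flags to the list of pipeline phases to execute."""
    # Assumption: passing both flags is treated as a mistake.
    if args.generation_only and args.evaluation_only:
        raise ValueError("use at most one of --generation_only / --evaluation_only")
    if args.generation_only:
        return ["generation"]
    if args.evaluation_only:
        return ["evaluation"]
    return ["generation", "evaluation"]


args = build_parser().parse_args(["--benchmark", "rtlrepo", "--generation_only"])
print(phases_to_run(args))
```

Splitting the phases this way matches the dual-image setup: generation can run in the inference image, and evaluation can run later in the image that contains the EDA tools.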
#### Examples
1. Run all models specified in the configuration file for the RTL-Repo benchmark:
```bash
python turtle/run.py --benchmark rtlrepo
```
2. Test Qwen2.5-32B against the benchmark VerilogEval Code Completion:
```bash
python turtle/run.py --benchmark verilog_eval_cc --model Qwen2.5-32B
```
3. Run all benchmarks against all models:
```bash
python turtle/run.py --run_all
```
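If you want to script several runs, for instance one model against a handful of benchmarks, the commands above can be assembled programmatically. The following is a minimal sketch: `build_command` and `run_sweep` are hypothetical helpers, not part of TuRTLe, and they only use the `--benchmark` and `--model` flags described above. The sweep defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess


def build_command(benchmark, model=None):
    """Build the argv list for one turtle/run.py invocation."""
    cmd = ["python", "turtle/run.py", "--benchmark", benchmark]
    if model is not None:
        cmd += ["--model", model]
    return cmd


def run_sweep(benchmarks, model=None, dry_run=True):
    """Run (or just print) one command per benchmark and return the commands."""
    commands = [build_command(name, model) for name in benchmarks]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


# Dry run: prints the two commands without launching anything.
commands = run_sweep(["rtlrepo", "verilog_eval_cc"], model="Qwen2.5-32B")
```

Set `dry_run=False` only from the repository root, so that the relative path `turtle/run.py` resolves.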
### Add your benchmark
The process for implementing a benchmark is very similar to the one described in the [bigcode-evaluation-harness guide](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/docs/guide.md). Follow these steps:
1. Copy the `turtle/tasks/template/new_task.py` into `turtle/tasks/` and rename it to the name of your benchmark `.py`.
2. Complete all the TODO comments in the template file.
3. Define a configuration file named `turtle/configs/.yml` and list the models you want to evaluate along with their required parameters.
4. Update the `_load_new_modules()` and `_create_extended_registry()` methods within `turtle/src/utils/task_updater.py`.
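To make the steps above more concrete, here is a rough sketch of what a completed task could look like. It assumes the template follows the task interface from the bigcode-evaluation-harness guide (`get_dataset`, `get_prompt`, `get_reference`, `postprocess_generation`, `process_results`); the actual TODOs in `new_task.py` may differ. The `Task` class below is only a stand-in so the example runs on its own, and `MyBenchmark`, its two in-memory problems, and the exact-match metric are hypothetical.

```python
class Task:
    """Stand-in for the harness base class so this sketch is self-contained."""

    def __init__(self, stop_words=None, requires_execution=True):
        self.stop_words = stop_words or []
        self.requires_execution = requires_execution


class MyBenchmark(Task):
    """Hypothetical benchmark with two tiny in-memory problems."""

    def __init__(self):
        super().__init__(stop_words=["endmodule"], requires_execution=False)
        self.dataset = [
            {"prompt": "module and2(input a, input b, output y);",
             "reference": "assign y = a & b;"},
            {"prompt": "module or2(input a, input b, output y);",
             "reference": "assign y = a | b;"},
        ]

    def get_dataset(self):
        return self.dataset

    def get_prompt(self, doc):
        return doc["prompt"]

    def get_reference(self, doc):
        return doc["reference"]

    def postprocess_generation(self, generation, idx):
        """Strip the prompt and cut the completion at the first stop word."""
        prompt = self.get_prompt(self.dataset[idx])
        completion = generation[len(prompt):] if generation.startswith(prompt) else generation
        for stop in self.stop_words:
            completion = completion.split(stop)[0]
        return completion.strip()

    def process_results(self, generations, references):
        """Exact match between the first candidate of each problem and its reference."""
        correct = sum(cands[0] == ref for cands, ref in zip(generations, references))
        return {"exact_match": correct / len(references)}
```

A real benchmark would load its problems from a dataset and, for functional checks, call an EDA tool during `process_results` instead of comparing strings.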
## Citation
```
@misc{garciagasulla2025turtleunifiedevaluationllms,
title={TuRTLe: A Unified Evaluation of LLMs for RTL Generation},
author={Dario Garcia-Gasulla and Gokcen Kestor and Emanuele Parisi and Miquel Albert\'i-Binimelis and Cristian Gutierrez and Razine Moundir Ghorab and Orlando Montenegro and Bernat Homs and Miquel Moreto},
year={2025},
eprint={2504.01986},
archivePrefix={arXiv},
primaryClass={cs.AR},
url={https://arxiv.org/abs/2504.01986},
}
```
## How to contribute 🤝
Any contribution is more than welcome! If you've found a bug or have an idea for an improvement, don't hesitate to [open a new issue](https://github.com/HPAI-BSC/TuRTLe/issues) using our issue forms. We also encourage pull requests that add new benchmarks for any task relevant to chip design.
## Contact
If you have any questions or feedback, feel free to email us at hpai@bsc.es. You can also support the project by following or starring the repository.
---
**Made with ❤️ by [HPAI](https://hpai.bsc.es/) at the [Barcelona Supercomputing Center (BSC)](https://www.bsc.es/)**