https://github.com/LiberCoders/FeatureBench
[ICLR 2026] Official Implementation of "FeatureBench: Benchmarking Agentic Coding for Complex Feature Development"
- Host: GitHub
- URL: https://github.com/LiberCoders/FeatureBench
- Owner: LiberCoders
- License: MIT
- Created: 2026-01-20T09:35:46.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-03-03T04:07:15.000Z (20 days ago)
- Last Synced: 2026-03-03T08:31:56.636Z (19 days ago)
- Topics: agentic-coding, benchmark, large-language-model, software-engineering
- Language: Python
- Homepage: https://LiberCoders.github.io/FeatureBench/
- Size: 12.5 MB
- Stars: 25
- Watchers: 0
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-agent-experience - FeatureBench - ICLR 2026 benchmark of 200 complex feature-development tasks across 24 OSS repos; top agents achieve only 11% vs 74% on SWE-bench, exposing a gap in real-world feature implementation. (Tools / Benchmarking & Testing)
README
---
FeatureBench is a test-driven data generation and evaluation pipeline for feature-level coding benchmarks.
It provides a unified CLI to run inference, evaluation, and dataset generation.
## 📰 News
🎁 **2026.02.06**: We now support one-click inference for mainstream agent frameworks, including **OpenHands, Claude Code, Codex, Gemini CLI, and mini-swe-agent**. All supported agent frameworks can be found [here](featurebench/infer/agents/). We have also open-sourced the FeatureBench **data pipeline**.
## 🚀 Quickstart
**Prerequisites:**
- [uv](https://docs.astral.sh/uv/getting-started/installation/) for Python environment management
- [docker](https://docs.docker.com/engine/install/) for reproducible builds and evaluation
```bash
# Option 1: install from PyPI
pip install featurebench
# (or, in a uv-managed project: uv add featurebench)
# Option 2: install from source
git clone https://github.com/LiberCoders/FeatureBench.git
cd FeatureBench
uv sync
source .venv/bin/activate
```
**Configure:**
```bash
cp config_example.toml config.toml
```
See [docs/config.md](docs/config.md) for a comprehensive reference (harness, infer, data pipeline) with examples.
**Optional: pre-pull images to reduce network variance:**
```bash
fb pull --mode lite # lite split image list (13 images)
fb pull --mode full # full split image list (24 images)
fb pull --mode /path/to/images.txt # one image name per line
# full list: featurebench/resources/constants/full_images.txt
# lite list: featurebench/resources/constants/lite_images.txt
```
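The custom image list accepted by `fb pull --mode /path/to/images.txt` is described above as one image name per line. As a rough illustration of that file format only (not of how `fb pull` works internally), the sketch below parses such a listing and prints the equivalent `docker pull` commands. The helper names `read_image_list` and `docker_pull_commands` are hypothetical, the sample image names are placeholders, and skipping blank lines and `#` comments is an assumption of this sketch rather than documented behavior.

```python
import shlex


def read_image_list(text: str) -> list[str]:
    """Return image names from a one-name-per-line listing."""
    images = []
    for line in text.splitlines():
        name = line.strip()
        # Assumption of this sketch: blank lines and '#' comments are ignored.
        if not name or name.startswith("#"):
            continue
        images.append(name)
    return images


def docker_pull_commands(images: list[str]) -> list[str]:
    """Build one `docker pull` command string per image name."""
    return [f"docker pull {shlex.quote(name)}" for name in images]


if __name__ == "__main__":
    # Placeholder image names, not entries from the real lite/full lists.
    sample = "example/image-a:latest\n\nexample/image-b:latest\n"
    for command in docker_pull_commands(read_image_list(sample)):
        print(command)
```

Printing the commands instead of running them keeps the sketch side-effect free; in practice `fb pull` is the supported way to fetch the images.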
**Run inference:**
```bash
fb infer \
--config-path config.toml \
--agent mini_swe_agent \
--model openai/qwen3-coder-480b-a35b-instruct \
--split lite
```
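When scripting several runs, it can help to assemble the `fb infer` invocation programmatically. The following is a minimal sketch that uses only the flags shown in the command above (`--config-path`, `--agent`, `--model`, `--split`); the helper name `build_infer_command` is hypothetical and not part of FeatureBench.

```python
import shlex


def build_infer_command(config_path: str, agent: str, model: str, split: str) -> list[str]:
    """Assemble the argument list for an `fb infer` run."""
    return [
        "fb", "infer",
        "--config-path", config_path,
        "--agent", agent,
        "--model", model,
        "--split", split,
    ]


if __name__ == "__main__":
    command = build_infer_command(
        config_path="config.toml",
        agent="mini_swe_agent",
        model="openai/qwen3-coder-480b-a35b-instruct",
        split="lite",
    )
    # Print the shell-safe command line rather than executing it.
    print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` would launch the run, assuming `fb` is on the `PATH` of the active environment.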
**Run evaluation:**
```bash
fb eval \
-p runs//output.jsonl \
--split lite
# use -p gold to verify the gold patches
```
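Before passing an `output.jsonl` file to `fb eval`, a quick sanity check of the file can catch empty or truncated runs. The sketch below assumes only that the file is in JSON Lines format (one JSON value per line), as the extension suggests; it makes no assumption about FeatureBench's record schema and simply reports how many records exist and which top-level keys they carry. The helper name `summarize_jsonl` is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path: str) -> tuple[int, Counter]:
    """Count records and top-level keys in a JSON Lines file."""
    records = 0
    key_counts: Counter = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            value = json.loads(line)  # raises ValueError on a malformed line
            records += 1
            if isinstance(value, dict):
                key_counts.update(value.keys())
    return records, key_counts
```

Calling `summarize_jsonl` on the inference output returns the record count and a `Counter` of key names; a key that appears in fewer records than the total may indicate incomplete entries.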
## 🧭 CLI Overview
`fb` provides three core commands:
- `fb infer` runs `featurebench.infer.run_infer` (docs: [docs/infer_cli_arg.md](docs/infer_cli_arg.md))
- `fb eval` runs `featurebench.harness.run_evaluation` (docs: [docs/harness_cli_arg.md](docs/harness_cli_arg.md))
- `fb data` runs `featurebench.pipeline` (docs: [docs/pipeline.md](docs/pipeline.md))
## ✍️ Citation
If you find FeatureBench useful, please cite it as:
```bibtex
@article{zhou2026featurebench,
title={FeatureBench: Benchmarking Agentic Coding for Complex Feature Development},
author={Zhou, Qixing and Zhang, Jiacheng and Wang, Haiyang and Hao, Rui and Wang, Jiahe and Han, Minghao and Yang, Yuxue and Wu, Shuzhe and Pan, Feiyang and Fan, Lue and others},
journal={arXiv preprint arXiv:2602.10975},
year={2026}
}
```
## 📧 Contact
If you have any questions, feel free to contact [qixingzhou1125@gmail.com](mailto:qixingzhou1125@gmail.com) or [zjcheng2022@gmail.com](mailto:zjcheng2022@gmail.com).