{"id":16269400,"url":"https://github.com/YJiangcm/FollowBench","last_synced_at":"2025-10-25T08:30:29.652Z","repository":{"id":205970721,"uuid":"712191330","full_name":"YJiangcm/FollowBench","owner":"YJiangcm","description":"Code for \"FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models (ACL 2024)\"","archived":false,"fork":false,"pushed_at":"2024-12-10T09:04:11.000Z","size":4015,"stargazers_count":90,"open_issues_count":6,"forks_count":12,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-12-10T09:38:23.810Z","etag":null,"topics":["benchmark","constraints","instruction-following","large-language-models","multi-level"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2310.20410","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/YJiangcm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-31T01:21:40.000Z","updated_at":"2024-12-10T09:04:14.000Z","dependencies_parsed_at":"2023-12-27T03:21:55.762Z","dependency_job_id":"1dd9d7b6-e464-4b4a-8262-588c4fd74112","html_url":"https://github.com/YJiangcm/FollowBench","commit_stats":null,"previous_names":["yjiangcm/followbench"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YJiangcm%2FFollowBench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YJiangcm%2FFollowBench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YJiangcm%2FFollowBench/releases","ma
nifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YJiangcm%2FFollowBench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/YJiangcm","download_url":"https://codeload.github.com/YJiangcm/FollowBench/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238105049,"owners_count":19417246,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","constraints","instruction-following","large-language-models","multi-level"],"created_at":"2024-10-10T18:07:58.035Z","updated_at":"2025-10-25T08:30:29.647Z","avatar_url":"https://github.com/YJiangcm.png","language":"Python","funding_links":[],"categories":["Uncategorized"],"sub_categories":["Uncategorized"],"readme":"![](figures/logo.png)\n\n\n[![Github](https://img.shields.io/static/v1?logo=github\u0026style=flat\u0026color=pink\u0026label=github\u0026message=YJiangcm/FollowBench)](https://github.com/YJiangcm/FollowBench)\n[![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97-huggingface-yellow)](https://huggingface.co/datasets/YuxinJiang/FollowBench)\n\n\n\n# FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models (ACL 2024)\n\nWe introduce **FollowBench**, a Multi-level Fine-grained Constraints Following Benchmark designed to **systematically** and **precisely** evaluate the instruction-following capability of LLMs.\n- **FollowBench** covers five types of _fine-grained constraints_: Content, Situation, Style, Format, and Example. 
\n- To estimate constraint following precisely across difficulty levels, we introduce a _Multi-level_ mechanism that incrementally adds a single constraint to the initial instruction at each increased level. \n- To evaluate whether LLMs' outputs have satisfied every individual constraint, we propose to prompt strong LLMs with _constraint-evolution paths_ to handle challenging open-ended instructions.\n- By evaluating **13** popular closed-source and open-source LLMs on FollowBench, we highlight the weaknesses of LLMs in instruction following and point towards potential avenues for future work.\n\n\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003cimg src=\"figures/overview.png\" width=\"1200\"/\u003e\n    \u003cbr\u003e\n\u003c/p\u003e\n\n## 🔥 Updates\n* 2025/04/22: FollowBench supports [vllm](https://github.com/vllm-project/vllm) for faster inference.\n* 2024/05/16: We are delighted that FollowBench has been accepted to the ACL 2024 main conference!\n* 2024/01/11: We have uploaded the English and Chinese versions of FollowBench to [Hugging Face](https://huggingface.co/datasets/YuxinJiang/FollowBench).\n* 2023/12/20: We evaluated Qwen-Chat-72B/14B/7B on FollowBench; see the [Leaderboard](#leaderboard).\n* 2023/12/15: We released a Chinese version of FollowBench; see [data_zh/](data_zh/).\n* 2023/11/14: We released the second version of our [paper](https://arxiv.org/abs/2310.20410). Check it out!\n* 2023/11/10: We released the data and code of FollowBench.\n* 2023/10/31: We released the first version of our [paper](https://arxiv.org/abs/2310.20410v1). 
Check it out!\n\n\n## 🔍 Table of Contents\n  - [🖥️ Leaderboard](#leaderboard)\n  - [📄 Data of FollowBench](#data-of-followbench)\n  - [⚙️ How to Evaluate on FollowBench](#how-to-evaluate-on-followbench)\n  - [📝 Citation](#citation)\n\n\n\u003ca name=\"leaderboard\"\u003e\u003c/a\u003e\n## 🖥️ Leaderboard\n\n### Metrics\n* **Hard Satisfaction Rate (HSR):** the average rate at which all constraints of individual instructions are fully satisfied\n* **Soft Satisfaction Rate (SSR):** the average satisfaction rate of individual constraints across all instructions\n* **Consistent Satisfaction Levels (CSL):** how many consecutive levels a model can satisfy, beginning from level 1\n\n\n### Level-categorized Results\n#### English\n\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003cimg src=\"figures/Level.png\" width=\"800\"/\u003e\n    \u003cbr\u003e\n\u003c/p\u003e\n\n#### Chinese\n\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003cimg src=\"figures/Level_zh.png\" width=\"800\"/\u003e\n    \u003cbr\u003e\n\u003c/p\u003e\n\n### Constraint-categorized Results\n#### English\n\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003cimg src=\"figures/Category.png\" width=\"500\"/\u003e\n    \u003cbr\u003e\n\u003c/p\u003e\n\n#### Chinese\n\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003cimg src=\"figures/Category_zh.png\" width=\"500\"/\u003e\n    \u003cbr\u003e\n\u003c/p\u003e\n\n\u003ca name=\"data-of-followbench\"\u003e\u003c/a\u003e\n## 📄 Data of FollowBench\nThe data of FollowBench can be found in [data/](data/).\n\nWe also provide a **Chinese version** of FollowBench in [data_zh/](data_zh/).\n\n\n\n\u003ca name=\"how-to-evaluate-on-followbench\"\u003e\u003c/a\u003e\n## ⚙️ How to Evaluate on FollowBench\n\n#### Install Dependencies\n\n```\nconda create -n followbench python=3.10\nconda activate followbench\nconda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia\npip install -r 
requirements.txt\n```\n\n#### Model Inference\n```bash\ncd FollowBench/\npython code/model_inference_vllm.py --model-path \u003cmodel_name_or_path\u003e\n```\n\n#### LLM-based Evaluation\n```bash\ncd FollowBench/\npython code/llm_eval.py --model_path \u003cmodel_name_or_path\u003e --api_key \u003cyour_own_gpt4_api_key\u003e\n```\n\n#### Merge Evaluation and Save Results \nNext, we conduct **rule-based evaluation** and merge the **rule-based evaluation** results and **LLM-based evaluation** results using the following script:\n```bash\ncd FollowBench/\npython code/eval.py --model_paths \u003ca_list_of_evaluated_models\u003e\n```\nThe final results will be saved in the folder named ```evaluation_result```.\n\n\n\n\u003ca name=\"citation\"\u003e\u003c/a\u003e\n## 📝 Citation\nPlease cite our paper if you use the data or code in this repo.\n```\n@inproceedings{jiang-etal-2024-followbench,\n    title = \"{F}ollow{B}ench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models\",\n    author = \"Jiang, Yuxin  and\n      Wang, Yufei  and\n      Zeng, Xingshan  and\n      Zhong, Wanjun  and\n      Li, Liangyou  and\n      Mi, Fei  and\n      Shang, Lifeng  and\n      Jiang, Xin  and\n      Liu, Qun  and\n      Wang, Wei\",\n    editor = \"Ku, Lun-Wei  and\n      Martins, Andre  and\n      Srikumar, Vivek\",\n    booktitle = \"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)\",\n    month = aug,\n    year = \"2024\",\n    address = \"Bangkok, Thailand\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2024.acl-long.257\",\n    pages = 
\"4667--4688\",\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYJiangcm%2FFollowBench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FYJiangcm%2FFollowBench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYJiangcm%2FFollowBench/lists"}