https://github.com/thinkwee/awesome-llm-if
An Awesome List for LLM Instruction Following
List: awesome-llm-if
- Host: GitHub
- URL: https://github.com/thinkwee/awesome-llm-if
- Owner: thinkwee
- Created: 2023-05-25T03:43:38.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-21T05:55:19.000Z (7 months ago)
- Last Synced: 2025-03-30T00:01:41.616Z (about 2 months ago)
- Homepage:
- Size: 190 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- ultimate-awesome - awesome-llm-if - An Awesome List to LLM Instruction Following. (Other Lists / Julia Lists)
README
Excellent **instruction-following (IF)** capabilities are the foundation for building complex LLM-based applications such as [Tool Usage](https://github.com/thunlp/ToolLearningPapers) or [Multi-Agent Systems](https://thinkwee.top/multiagent_ebook/). This repository aims to provide a comprehensive list of papers, repositories, and other resources on improving, evaluating, benchmarking, and theoretically analyzing instruction-following capabilities, in order to advance research in this field.
The repository is still under active construction, and we welcome everyone to contribute!
# Method
- [Do LLMs “Know” Internally When They Follow Instructions?](https://arxiv.org/pdf/2410.14516)
- Cambridge, Apple
- In submission to ICLR 2025
- [Self-Play with Execution Feedback: Improving Instruction-Following Capabilities of Large Language Models](https://arxiv.org/pdf/2406.13542)
- Alibaba
- [AutoIF](https://github.com/QwenLM/AutoIF) 
- [LESS: Selecting Influential Data for Targeted Instruction Tuning](https://arxiv.org/pdf/2402.04333)
- Princeton University, University of Washington
- ICML 2024
- [LESS](https://github.com/princeton-nlp/less) 
- [WizardLM: Empowering Large Language Models to Follow Complex Instructions](https://arxiv.org/pdf/2304.12244)
- Microsoft, Peking University
- ICLR 2024
- [WizardLM](https://github.com/nlpxucan/WizardLM) 
- [Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models](https://arxiv.org/pdf/2402.11532)
- University of Minnesota, Amazon AGI, Grammarly
- [Instruction Pre-Training: Language Models are Supervised Multitask Learners](https://arxiv.org/pdf/2406.14491)
- Microsoft Research, Tsinghua University
- [LMOps](https://github.com/microsoft/LMOps)

# Evaluation
- [Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators](https://arxiv.org/pdf/2404.04475)
- Stanford University, Independent Researcher
- [alpaca_eval](https://github.com/tatsu-lab/alpaca_eval) 
- [INFOBENCH: Evaluating Instruction Following Ability in Large Language Models](https://arxiv.org/pdf/2401.03601)
- Tencent AI Lab, Seattle; University of Central Florida; Emory University; University of Georgia; Shanghai Jiao Tong University
- [InfoBench](https://github.com/qinyiwei/InfoBench) 
- [STRUC-BENCH: Are Large Language Models Good at Generating Complex Structured Tabular Data?](https://aclanthology.org/2024.naacl-short.2.pdf)
- Yale University, Zhejiang University, New York University
- NAACL 2024
- [Struc-Bench](https://github.com/gersteinlab/Struc-Bench) 
- [FOFO: A Benchmark to Evaluate LLMs’ Format-Following Capability](https://arxiv.org/pdf/2402.18667)
- Salesforce Research, University of Illinois at Chicago, Pennsylvania State University
- [FoFo](https://github.com/SalesforceAIResearch/FoFo) 
- [AlignBench: Benchmarking Chinese Alignment of Large Language Models](https://arxiv.org/pdf/2311.18743)
- Tsinghua University, Zhipu AI, Renmin University of China, Sichuan University, Lehigh University
- [AlignBench](https://github.com/THUDM/AlignBench) 
- [Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena](https://proceedings.neurips.cc/paper_files/paper/2023/file/91f18a1287b398d378ef22505bf41832-Paper-Datasets_and_Benchmarks.pdf)
- UC Berkeley, UC San Diego, Carnegie Mellon University, Stanford, MBZUAI
- NeurIPS 2023
- [llm_judge](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) 
- [Benchmarking Complex Instruction-Following with Multiple Constraints Composition](https://arxiv.org/pdf/2407.03978)
- Tsinghua, Zhipu, China University of Geosciences, Central China Normal University
- [ComplexBench](https://github.com/thu-coai/ComplexBench) 
- [Evaluating Large Language Models at Evaluating Instruction Following](https://arxiv.org/pdf/2310.07641)
- Tsinghua, Princeton, UIUC
- ICLR 2024
- [LLMBar](https://github.com/princeton-nlp/LLMBar)
- [Instruction-Following Evaluation for Large Language Models](https://arxiv.org/pdf/2311.07911)
- Google, Yale
- [instruction_following_eval](https://github.com/google-research/google-research/tree/master/instruction_following_eval) (see the verifiable-constraint sketch at the end of this section)
- [FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models](https://arxiv.org/pdf/2311.09829)
- Lenovo, TJU
- [Can Large Language Models Understand Real-World Complex Instructions?](https://arxiv.org/pdf/2309.09150)
- Fudan, ECNU
- AAAI 2024
- [CELLO](https://github.com/Abbey4799/CELLO) 
- [FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models](https://arxiv.org/pdf/2310.20410)
- HKUST, Huawei
- ACL 2024
- [FollowBench](https://github.com/YJiangcm/FollowBench) 
- [Evaluating Large Language Models on Controlled Generation Tasks](https://arxiv.org/pdf/2310.14542)
- USC, UC, ETH, Amazon, DeepMind
- [llm-controlgen](https://github.com/sunjiao123sun/llm-controlgen) 
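The benchmarks above differ in how they score instruction following: IFEval-style suites (the Google/Yale `instruction_following_eval` entry) check verifiable constraints programmatically, while MT-Bench, AlignBench, and LLMBar rely on an LLM judge. As a rough illustration of the verifiable-constraint idea, here is a minimal Python sketch; the constraint names, thresholds, example prompt, and scoring are assumptions for illustration and are not taken from any of the benchmarks listed.

```python
import re


def check_min_words(response: str, min_words: int) -> bool:
    """Constraint: the response must contain at least `min_words` words."""
    return len(response.split()) >= min_words


def check_keyword(response: str, keyword: str, min_times: int = 1) -> bool:
    """Constraint: `keyword` must appear at least `min_times` times (case-insensitive)."""
    return len(re.findall(re.escape(keyword), response, flags=re.IGNORECASE)) >= min_times


def check_no_commas(response: str) -> bool:
    """Constraint: the response must not contain any commas."""
    return "," not in response


# One prompt paired with the programmatic checks its instructions imply.
# The prompt and thresholds are illustrative only.
example = {
    "prompt": "Describe instruction tuning in at least 40 words, "
              "mention the word 'dataset', and avoid commas.",
    "checks": [
        lambda r: check_min_words(r, 40),
        lambda r: check_keyword(r, "dataset"),
        check_no_commas,
    ],
}


def constraint_accuracy(response: str, checks) -> float:
    """Fraction of constraints satisfied (1.0 = instruction fully followed)."""
    results = [check(response) for check in checks]
    return sum(results) / len(results)


if __name__ == "__main__":
    model_output = "Instruction tuning adapts a base model with a dataset of prompt-response pairs ..."
    print(f"constraint accuracy: {constraint_accuracy(model_output, example['checks']):.2f}")
```

Judge-based benchmarks in the list replace such rule checks with model-graded rubrics, which is the approach taken by MT-Bench's `llm_judge` and AlignBench.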