https://github.com/thinkwee/awesome-llm-if
An Awesome List for LLM Instruction Following
List: awesome-llm-if
- Host: GitHub
- URL: https://github.com/thinkwee/awesome-llm-if
- Owner: thinkwee
- Created: 2023-05-25T03:43:38.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-21T05:55:19.000Z (7 months ago)
- Last Synced: 2025-03-30T00:01:41.616Z (about 2 months ago)
- Homepage:
- Size: 190 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- ultimate-awesome - awesome-llm-if - An Awesome List to LLM Instruction Following. (Other Lists / Julia Lists)
README
Excellent **instruction-following (IF)** capabilities are the foundation for building complex LLM-based applications such as [Tool Usage](https://github.com/thunlp/ToolLearningPapers) or [Multi-Agent Systems](https://thinkwee.top/multiagent_ebook/). This repository aims to provide a comprehensive list of papers, repositories, and other resources on improving, evaluating, benchmarking, and theoretically analyzing instruction-following capabilities, in order to advance research in this field.
The repository is still under active construction, and we welcome everyone to contribute!
# Method
- [Do LLMs “Know” Internally When They Follow Instructions?](https://arxiv.org/pdf/2410.14516)
- Cambridge, Apple
- In submission to ICLR 2025
- [Self-Play with Execution Feedback: Improving Instruction-Following Capabilities of Large Language Models](https://arxiv.org/pdf/2406.13542)
- Alibaba
- [AutoIF](https://github.com/QwenLM/AutoIF) 
- [LESS: Selecting Influential Data for Targeted Instruction Tuning](https://arxiv.org/pdf/2402.04333)
- Princeton University, University of Washington
- ICML 2024
- [LESS](https://github.com/princeton-nlp/less) 
- [WizardLM: Empowering Large Language Models to Follow Complex Instructions](https://arxiv.org/pdf/2304.12244)
- Microsoft, Peking University
- ICLR 2024
- [WizardLM](https://github.com/nlpxucan/WizardLM) 
- [Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models](https://arxiv.org/pdf/2402.11532)
- University of Minnesota, Amazon AGI, Grammarly
- [Instruction Pre-Training: Language Models are Supervised Multitask Learners](https://arxiv.org/pdf/2406.14491)
- Microsoft Research, Tsinghua University
- [LMOps](https://github.com/microsoft/LMOps)

# Evaluation
- [Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators](https://arxiv.org/pdf/2404.04475)
- Stanford University, Independent Researcher
- [alpaca_eval](https://github.com/tatsu-lab/alpaca_eval) 
- [INFOBENCH: Evaluating Instruction Following Ability in Large Language Models](https://arxiv.org/pdf/2401.03601)
- Tencent AI Lab, Seattle; University of Central Florida; Emory University; University of Georgia; Shanghai Jiao Tong University
- [InfoBench](https://github.com/qinyiwei/InfoBench) 
- [STRUC-BENCH: Are Large Language Models Good at Generating Complex Structured Tabular Data?](https://aclanthology.org/2024.naacl-short.2.pdf)
- Yale University, Zhejiang University, New York University
- NAACL 2024
- [Struc-Bench](https://github.com/gersteinlab/Struc-Bench) 
- [FOFO: A Benchmark to Evaluate LLMs’ Format-Following Capability](https://arxiv.org/pdf/2402.18667)
- Salesforce Research, University of Illinois at Chicago, Pennsylvania State University
- [FoFo](https://github.com/SalesforceAIResearch/FoFo) 
- [AlignBench: Benchmarking Chinese Alignment of Large Language Models](https://arxiv.org/pdf/2311.18743)
- Tsinghua University, Zhipu AI, Renmin University of China, Sichuan University, Lehigh University
- [AlignBench](https://github.com/THUDM/AlignBench) 
- [Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena](https://proceedings.neurips.cc/paper_files/paper/2023/file/91f18a1287b398d378ef22505bf41832-Paper-Datasets_and_Benchmarks.pdf)
- UC Berkeley, UC San Diego, Carnegie Mellon University, Stanford, MBZUAI
- NeurIPS 2023
- [llm_judge](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) 
- [Benchmarking Complex Instruction-Following with Multiple Constraints Composition](https://arxiv.org/pdf/2407.03978)
- Tsinghua, Zhipu, China University of Geosciences, Central China Normal University
- [ComplexBench](https://github.com/thu-coai/ComplexBench) 
- [Evaluating Large Language Models at Evaluating Instruction Following](https://arxiv.org/pdf/2310.07641)
- Tsinghua, Princeton, UIUC
- ICLR 2024
- [LLMBar](https://github.com/princeton-nlp/LLMBar)
- [Instruction-Following Evaluation for Large Language Models](https://arxiv.org/pdf/2311.07911)
- Google, Yale
- [instruction_following_eval](https://github.com/google-research/google-research/tree/master/instruction_following_eval) (see the verifiable-constraint sketch at the end of this section)
- [FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models](https://arxiv.org/pdf/2311.09829)
- Lenovo, TJU
- [Can Large Language Models Understand Real-World Complex Instructions?](https://arxiv.org/pdf/2309.09150)
- Fudan, ECNU
- AAAI 2024
- [CELLO](https://github.com/Abbey4799/CELLO) 
- [FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models](https://arxiv.org/pdf/2310.20410)
- HKUST, Huawei
- ACL 2024
- [FollowBench](https://github.com/YJiangcm/FollowBench) 
- [Evaluating Large Language Models on Controlled Generation Tasks](https://arxiv.org/pdf/2310.14542)
- USC, UC, ETH, Amazon, DeepMind
- [llm-controlgen](https://github.com/sunjiao123sun/llm-controlgen) 
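The benchmarks above differ in how they score instruction following: IFEval-style suites (the Google/Yale `instruction_following_eval` entry) check verifiable constraints programmatically, while MT-Bench, AlignBench, and LLMBar rely on an LLM judge. As a rough illustration of the verifiable-constraint idea, here is a minimal Python sketch; the constraint names, thresholds, example prompt, and scoring are assumptions for illustration and are not taken from any of the benchmarks listed.

```python
import re


def check_min_words(response: str, min_words: int) -> bool:
    """Constraint: the response must contain at least `min_words` words."""
    return len(response.split()) >= min_words


def check_keyword(response: str, keyword: str, min_times: int = 1) -> bool:
    """Constraint: `keyword` must appear at least `min_times` times (case-insensitive)."""
    return len(re.findall(re.escape(keyword), response, flags=re.IGNORECASE)) >= min_times


def check_no_commas(response: str) -> bool:
    """Constraint: the response must not contain any commas."""
    return "," not in response


# One prompt paired with the programmatic checks its instructions imply.
# The prompt and thresholds are illustrative only.
example = {
    "prompt": "Describe instruction tuning in at least 40 words, "
              "mention the word 'dataset', and avoid commas.",
    "checks": [
        lambda r: check_min_words(r, 40),
        lambda r: check_keyword(r, "dataset"),
        check_no_commas,
    ],
}


def constraint_accuracy(response: str, checks) -> float:
    """Fraction of constraints satisfied (1.0 = instruction fully followed)."""
    results = [check(response) for check in checks]
    return sum(results) / len(results)


if __name__ == "__main__":
    model_output = "Instruction tuning adapts a base model with a dataset of prompt-response pairs ..."
    print(f"constraint accuracy: {constraint_accuracy(model_output, example['checks']):.2f}")
```

Judge-based benchmarks in the list replace such rule checks with model-graded rubrics, which is the approach taken by MT-Bench's `llm_judge` and AlignBench.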