https://github.com/tigerchen52/awesome_role_of_small_models
a curated list of the role of small models in the LLM era
List: awesome_role_of_small_models
- Host: GitHub
- URL: https://github.com/tigerchen52/awesome_role_of_small_models
- Owner: tigerchen52
- License: mit
- Created: 2024-07-07T21:47:48.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2024-09-23T10:16:16.000Z (7 months ago)
- Last Synced: 2024-10-14T21:01:34.635Z (6 months ago)
- Language: Python
- Homepage:
- Size: 314 KB
- Stars: 45
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome_ai_agents - Awesome_Role_Of_Small_Models - a curated list of the role of small models in the LLM era (Building / LLM Models)
README
# The Role of Small Models
[arXiv](https://arxiv.org/abs/2409.06857)
This work is ongoing, and we welcome any comments or suggestions.
Please feel free to reach out if you find we have overlooked any relevant papers.
**What is the Role of Small Models in the LLM Era: A Survey**

Lihu Chen¹, Gaël Varoquaux²

¹ Imperial College London, UK
² Soda, Inria Saclay, France
## Content List
- [Collaboration](#collaboration)
- [SMs Enhance LLMs](#sms-enhance-llms)
- [Data Curation](#data-curation)
- [Curating pre-training data](#curating-pre-training-data)
- [Curating Instruction-tuning Data](#curating-instruction-tuning-data)
- [Weak-to-Strong Paradigm](#weak-to-strong-paradigm)
- [Efficient Inference](#efficient-inference)
- [Ensembling different-size models to reduce inference costs](#ensembling-different-size-models-to-reduce-inference-costs)
- [Speculative Decoding](#speculative-decoding)
- [Evaluating LLMs](#evaluating-llms)
- [Domain Adaptation](#domain-adaptation)
- [Using domain-specific SMs to generate knowledge for LLMs at reasoning time](#using-domain-specific-sms-to-generate-knowledge-for-llms-at-reasoning-time)
- [Using domain-specific SMs to adjust token probability of LLMs at decoding time](#using-domain-specific-sms-to-adjust-token-probability-of-llms-at-decoding-time)
- [Retrieval Augmented Generation](#retrieval-augmented-generation)
- [Prompt-based Reasoning](#prompt-based-reasoning)
- [Deficiency Repair](#deficiency-repair)
- [Developing SM plugins to repair deficiencies](#developing-sm-plugins-to-repair-deficiencies)
- [Contrasting LLMs and SMs for better generations](#contrasting-llms-and-sms-for-better-generations)
- [LLMs Enhance SMs](#llms-enhance-sms)
- [Knowledge Distillation](#knowledge-distillation)
- [Black-box Distillation](#black-box-distillation)
- [White-box distillation](#white-box-distillation)
- [Data Synthesis](#data-synthesis)
- [Data Augmentation](#data-augmentation)
- [Training Data Generation](#training-data-generation)
- [Competition](#competition)
- [Computation-constrained Environment](#computation-constrained-environment)
- [Task-specific Environment](#task-specific-environment)
- [Interpretability-required Environment](#interpretability-required-environment)

## SMs Enhance LLMs

### Data Curation

#### Curating pre-training data

| Title | Topic |
| --- | --- |
| Data selection for language models via importance resampling | Data Selection |
| When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale | Data Selection |
| CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data | Data Selection |
| QuRating: Selecting High-Quality Data for Training Language Models | Data Selection |
| DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining | Data Reweighting |
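The data-selection entries above share a recipe: score candidate documents with a small, cheap model and keep only the best. Below is a minimal sketch of perplexity-based filtering in that spirit; `small_lm_logprob` is a hypothetical stand-in for any small reference LM, and the threshold is purely illustrative.

```python
# A minimal sketch of perplexity-based pre-training data selection with a
# small reference model. Documents the small LM finds very surprising are
# often boilerplate or noise and get dropped.
import math

def small_lm_logprob(text: str) -> list[float]:
    """Hypothetical: per-token log-probs from a small reference LM.
    Placeholder heuristic: pretend non-alphabetic tokens are surprising."""
    return [-1.0 if tok.isalpha() else -8.0 for tok in text.split()]

def perplexity(text: str) -> float:
    lp = small_lm_logprob(text) or [0.0]
    return math.exp(-sum(lp) / len(lp))

def select_documents(corpus: list[str], max_ppl: float = 200.0) -> list[str]:
    # Keep documents whose perplexity under the small model stays low.
    return [doc for doc in corpus if perplexity(doc) <= max_ppl]

if __name__ == "__main__":
    docs = ["a clean paragraph of text", "zxq!! 0x00 ###"]
    print(select_documents(docs))  # the noisy document is filtered out
```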
#### Curating Instruction-tuning Data
| Title | Topic |
| --- | --- |
| MoDS: Model-oriented Data Selection for Instruction Tuning | Data Selection |
| LESS: Selecting Influential Data for Targeted Instruction Tuning | Data Selection |
| What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning | Data Selection |
### Weak-to-Strong Paradigm

#### Using weaker (smaller) models to align stronger (larger) models

| Title | Topic |
| --- | --- |
| Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision | Weak-to-Strong |
| Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models | Weak-to-Strong |
| Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts | Weak-to-Strong |
| Improving Weak-to-Strong Generalization with Reliability-Aware Alignment | Weak-to-Strong |
| Aligner: Efficient Alignment by Learning to Correct | Weak-to-Strong |
| Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models | Weak-to-Strong |
| Theoretical Analysis of Weak-to-Strong Generalization | Weak-to-Strong |
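The weak-to-strong papers above study whether a strong model trained only on a weaker model's labels can surpass its supervisor. A toy sketch of that experimental setup, using scikit-learn classifiers as loose stand-ins for the weak supervisor and strong student:

```python
# Toy weak-to-strong setup: train a small "weak supervisor" on ground truth,
# then train a larger "strong student" ONLY on the weak model's labels and
# evaluate both on held-out ground truth.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_student, X_test, y_student, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

weak = LogisticRegression().fit(X_weak, y_weak)  # weak supervisor
weak_labels = weak.predict(X_student)            # imperfect supervision

strong = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
strong.fit(X_student, weak_labels)               # learns from weak labels only

print("weak supervisor acc:", weak.score(X_test, y_test))
print("strong student acc: ", strong.score(X_test, y_test))
# Weak-to-strong generalization asks when the student exceeds its supervisor.
```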
### Efficient Inference

#### Ensembling different-size models to reduce inference costs

| Title | Topic |
| --- | --- |
| Efficient Edge Inference by Selective Query | Model Cascading |
| FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance | Model Cascading |
| Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance | Model Cascading |
| AutoMix: Automatically Mixing Language Models | Model Cascading |
| FrugalML: How to use ML Prediction APIs more accurately and cheaply | Model Cascading |
| Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems | Model Cascading |
| Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models | Model Routing |
| Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models | Model Routing |
| OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking | Model Routing |
| RouteLLM: Learning to Route LLMs with Preference Data | Model Routing |
| Fly-Swat or Cannon? Cost-Effective Language Model Choice via Meta-Modeling | Model Routing |
| Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing | Model Routing |
| LLM-BLENDER: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion | Model Ensembling |
| RouterBench: A Benchmark for Multi-LLM Routing System | Model Routing |
| Large Language Model Routing with Benchmark Datasets | Model Routing |
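Model cascading, the most common pattern above, answers easy queries with the small model and escalates only the hard ones. A minimal sketch, assuming hypothetical `small_model` and `large_model` endpoints and an illustrative confidence threshold:

```python
# A two-stage cascade: answer with the small model when it is confident,
# escalate to the large (expensive) model otherwise.

def small_model(query: str) -> tuple[str, float]:
    """Hypothetical small model: returns (answer, confidence in [0, 1])."""
    return "small-model answer", 0.62

def large_model(query: str) -> str:
    """Hypothetical large model."""
    return "large-model answer"

def cascade(query: str, threshold: float = 0.8) -> str:
    answer, confidence = small_model(query)
    if confidence >= threshold:
        return answer              # cheap path
    return large_model(query)      # escalate only the hard queries

print(cascade("What is the capital of France?"))
```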
#### Speculative Decoding

| Title | Topic |
| --- | --- |
| Fast Inference from Transformers via Speculative Decoding | Speculative Decoding |
| Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding | Speculative Decoding |
| Accelerating Large Language Model Decoding with Speculative Sampling | Speculative Decoding |
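In speculative decoding, a small draft model proposes several tokens that the large target model then verifies. The sketch below is a simplified greedy-verification variant that only illustrates the control flow; the papers above use rejection sampling over the two distributions, and the target scores all draft positions in a single forward pass. Both model stubs are hypothetical.

```python
# Simplified speculative decoding: draft a block with the small model, keep
# the longest prefix the large model agrees with, then correct one token.

def draft_next(ctx: list[str]) -> str:
    return "the"  # hypothetical small drafter

def target_next(ctx: list[str]) -> str:
    return "the" if len(ctx) % 3 else "a"  # hypothetical large verifier

def speculative_step(ctx: list[str], k: int = 4) -> list[str]:
    draft = []
    for _ in range(k):                       # 1) draft k tokens cheaply
        draft.append(draft_next(ctx + draft))
    accepted = []
    for tok in draft:                        # 2) verify against the target
        if target_next(ctx + accepted) == tok:
            accepted.append(tok)             # keep the agreeing prefix
        else:
            accepted.append(target_next(ctx + accepted))  # 3) correct, stop
            break
    return ctx + accepted

print(speculative_step(["once", "upon"]))
```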
### Evaluating LLMs

#### Using SMs to evaluate LLMs' generations

| Title | Topic |
| --- | --- |
| BERTScore: Evaluating Text Generation with BERT | General Evaluation |
| BARTScore: Evaluating Generated Text as Text Generation | General Evaluation |
| Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation | Uncertainty |
| SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models | Uncertainty |
| ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models | Performance Prediction |
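BERTScore, listed above, matches contextual token embeddings between candidate and reference by cosine similarity. A from-scratch sketch of the greedy-matching idea, with `embed` as a hypothetical stand-in for a small encoder such as BERT (the real metric adds IDF weighting and baseline rescaling):

```python
# BERTScore-style evaluation: each token greedily matches its most similar
# counterpart; precision/recall are the mean best similarities.
import numpy as np

def embed(tokens: list[str]) -> np.ndarray:
    """Hypothetical: one unit-norm contextual embedding per token."""
    rng = np.random.default_rng(abs(hash(tuple(tokens))) % 2**32)
    vecs = rng.normal(size=(len(tokens), 8))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def bertscore_f1(candidate: list[str], reference: list[str]) -> float:
    sim = embed(candidate) @ embed(reference).T   # cosine similarity matrix
    recall = sim.max(axis=0).mean()               # each ref token's best match
    precision = sim.max(axis=1).mean()            # each cand token's best match
    return 2 * precision * recall / (precision + recall)

print(bertscore_f1("the cat sat".split(), "a cat was sitting".split()))
```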
### Domain Adaptation

#### Using domain-specific SMs to adjust token probability of LLMs at decoding time

| Title | Topic |
| --- | --- |
| CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models | White-box Domain Adaptation |
| Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning | White-box Domain Adaptation |
| Tuning Language Models by Proxy | White-box Domain Adaptation |
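Proxy tuning ("Tuning Language Models by Proxy") steers the large model by adding, at each decoding step, the logit offset between a tuned and an untuned small model. A minimal sketch over a toy vocabulary, with all three logit functions as hypothetical stubs:

```python
# Proxy tuning at decoding time: logits_large + (logits_expert - logits_anti).
import numpy as np

VOCAB = ["yes", "no", "maybe", "unsure"]

def large_logits(ctx):  return np.array([1.0, 1.2, 0.3, 0.1])   # base LLM
def expert_logits(ctx): return np.array([2.0, 0.2, 0.4, 0.1])   # tuned SM
def anti_logits(ctx):   return np.array([0.8, 1.0, 0.5, 0.2])   # untuned SM

def proxy_tuned_next(ctx: str) -> str:
    logits = large_logits(ctx) + (expert_logits(ctx) - anti_logits(ctx))
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                       # softmax over shifted logits
    return VOCAB[int(np.argmax(probs))]

print(proxy_tuned_next("Should we deploy on Friday?"))
```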
#### Using domain-specific SMs to generate knowledge for LLMs at reasoning time

| Title | Topic |
| --- | --- |
| Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models | Black-box Domain Adaptation |
| BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models | Black-box Domain Adaptation |
### Retrieval Augmented Generation
#### Using SMs to retrieve knowledge for enhancing generations

| Title | Topic |
| --- | --- |
| Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | Documents |
| KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases | Knowledge Bases |
| End-to-End Table Question Answering via Retrieval-Augmented Generation | Tables |
| DocPrompting: Generating Code by Retrieving the Docs | Code |
| Toolformer: Language Models Can Teach Themselves to Use Tools | Tools |
| Retrieval-Augmented Multimodal Language Modeling | Images |
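The common thread above is retrieve-then-generate: a small retriever selects external knowledge that is prepended to the LLM prompt. A minimal sketch, using word overlap as a stand-in for a learned dense retriever and a hypothetical `call_llm` function:

```python
# Minimal RAG: score documents with a cheap retriever, put the top hit in
# the prompt, and let the LLM answer grounded in that context.

DOCS = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "Python is a programming language created by Guido van Rossum.",
]

def retrieve(query: str, docs: list[str]) -> str:
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; returns the prompt for illustration."""
    return prompt

def rag_answer(query: str) -> str:
    context = retrieve(query, DOCS)
    return call_llm(f"Context: {context}\nQuestion: {query}\nAnswer:")

print(rag_answer("When was the Eiffel Tower completed?"))
```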
### Prompt-based Reasoning

#### Using SMs to augment prompts for LLMs

| Title | Topic |
| --- | --- |
| UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation | Retrieving Prompts |
| Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning | Decomposing Complex Problems |
| Small Models are Valuable Plug-ins for Large Language Models | Generating Pseudo Labels |
| Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought | Generating Pseudo Labels |
| CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation | Generating Feedback |
| Small Language Models Improve Giants by Rewriting Their Outputs | Generating Feedback |
### Deficiency Repair

#### Developing SM plugins to repair deficiencies

| Title | Topic |
| --- | --- |
| Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector | Hallucinations |
| Reconfidencing LLMs from the Grouping Loss Perspective | Hallucinations |
| Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost | Out-Of-Vocabulary Words |
#### Contrasting LLMs and SMs for better generations

| Title | Topic |
| --- | --- |
| Contrastive Decoding: Open-ended Text Generation as Optimization | Reducing Repeated Texts |
| Alleviating Hallucinations of Large Language Models through Induced Hallucinations | Mitigating Hallucinations |
| Contrastive Decoding Improves Reasoning in Large Language Models | Augmenting Reasoning Capabilities |
| CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following | Safeguarding Privacy |
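Contrastive decoding scores each candidate token by the gap between the expert (large) and amateur (small) log-probabilities, restricted to tokens the expert itself finds plausible. A minimal sketch over a toy vocabulary with hypothetical distributions:

```python
# Contrastive decoding: maximize log p_expert - log p_amateur under the
# expert's plausibility constraint (the alpha-mask from the original paper).
import numpy as np

VOCAB = ["good", "good,", "bad", "the"]
p_expert  = np.array([0.50, 0.30, 0.05, 0.15])  # large model
p_amateur = np.array([0.30, 0.45, 0.05, 0.20])  # small model (loves repeats)

def contrastive_next(alpha: float = 0.1) -> str:
    mask = p_expert >= alpha * p_expert.max()        # plausibility constraint
    scores = np.where(mask, np.log(p_expert) - np.log(p_amateur), -np.inf)
    return VOCAB[int(np.argmax(scores))]

print(contrastive_next())  # the expert's edge over the amateur wins
```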
## LLMs Enhance SMs

### Knowledge Distillation

#### Black-box Distillation

| Title | Topic |
| --- | --- |
| Explanations from Large Language Models Make Small Reasoners Better | Chain-Of-Thought Distillation |
| Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes | Chain-Of-Thought Distillation |
| Distilling Reasoning Capabilities into Smaller Language Models | Chain-Of-Thought Distillation |
| Teaching Small Language Models to Reason | Chain-Of-Thought Distillation |
| Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step | Chain-Of-Thought Distillation |
| Specializing Smaller Language Models towards Multi-Step Reasoning | Chain-Of-Thought Distillation |
| TinyLLM: Learning a Small Student from Multiple Large Language Models | Chain-Of-Thought Distillation |
| Lion: Adversarial Distillation of Proprietary Large Language Models | Instruction Following Distillation |
| Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning | Instruction Following Distillation |
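Chain-of-thought distillation, the dominant topic above, prompts a large teacher for rationales and fine-tunes a small student on the resulting (question, rationale, answer) data. A minimal data-construction sketch, with `teacher_generate` as a hypothetical teacher API and the fine-tuning step itself omitted:

```python
# Black-box CoT distillation: collect teacher rationales, format them as
# supervised targets so the student learns to produce reasoning + answer.

def teacher_generate(prompt: str) -> str:
    """Hypothetical teacher LLM returning 'rationale ### answer'."""
    return "2 apples plus 3 apples makes 5 apples. ### 5"

def build_distillation_set(questions: list[str]) -> list[dict]:
    examples = []
    for q in questions:
        out = teacher_generate(f"{q}\nLet's think step by step.")
        rationale, _, answer = out.partition("###")
        examples.append({
            "input": q,
            # Student is trained to emit the reasoning AND the answer.
            "target": f"{rationale.strip()} The answer is {answer.strip()}.",
        })
    return examples

print(build_distillation_set(["If I have 2 apples and buy 3 more, how many?"]))
```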
#### White-box Distillation

| Title | Topic |
| --- | --- |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Logits |
| ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers | Intermediate Features |
| Less is More: Task-aware Layer-wise Distillation for Language Model Compression | Intermediate Features |
| MiniLLM: Knowledge Distillation of Large Language Models | Intermediate Features |
| LLM-QAT: Data-Free Quantization Aware Training for Large Language Models | Intermediate Features |
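White-box distillation exploits access to the teacher's internals; the simplest signal is the output logits, as in DistilBERT. A minimal sketch of the temperature-softened KL objective on toy logits:

```python
# Logit distillation: the student minimizes KL(teacher || student) over
# temperature-softened distributions. Toy numpy logits stand in for real
# forward passes; in practice the loss is backpropagated into the student.
import numpy as np

def softmax(z, T=1.0):
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    p = softmax(teacher_logits, T)     # soft targets from the teacher
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))  # KL * T^2

teacher = np.array([3.0, 1.0, 0.2])
student = np.array([2.5, 1.2, 0.1])
print(distillation_loss(teacher, student))
```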
### Data Synthesis

#### Data Augmentation

| Title | Topic |
| --- | --- |
| Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing | Text Paraphrase |
| Paraphrasing with Large Language Models | Text Paraphrase |
| Query Rewriting for Retrieval-Augmented Large Language Models | Query Rewriting |
| LLM vs Small Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model | Specific Tasks |
| Data Augmentation for Intent Classification with Off-the-shelf Large Language Models | Specific Tasks |
| Weakly Supervised Data Augmentation Through Prompting for Dialogue Understanding | Specific Tasks |
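A typical augmentation loop asks a large model to paraphrase each labeled example while preserving the label, enlarging the training set for a small task model. A minimal sketch with a hypothetical `call_llm` stub:

```python
# LLM-based data augmentation: n paraphrases per example, label preserved.

def call_llm(prompt: str) -> str:
    """Hypothetical paraphrasing LLM (placeholder echoes the sentence)."""
    return prompt.rsplit(":", 1)[-1].strip()

def augment(dataset: list[tuple[str, str]], n: int = 2) -> list[tuple[str, str]]:
    out = list(dataset)
    for text, label in dataset:
        for _ in range(n):
            out.append((call_llm(f"Paraphrase this sentence: {text}"), label))
    return out

print(augment([("I loved this movie", "positive")]))
```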
#### Training Data Generation

| Title | Topic |
| --- | --- |
| Want To Reduce Labeling Cost? GPT-3 Can Help | Label Annotation |
| Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning | Label Annotation |
| ZeroGen: Efficient Zero-shot Learning via Dataset Generation | Dataset Generation |
| Generating Training Data with Language Models: Towards Zero-Shot Language Understanding | Dataset Generation |
| Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions | Dataset Generation |
| Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations | Dataset Generation |
| Does Synthetic Data Generation of LLMs Help Clinical Text Mining? | Dataset Generation |
| Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction | Dataset Generation |
| ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection | Dataset Generation |
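ZeroGen-style generation flips annotation around: the label conditions the large model, which synthesizes matching inputs for training a small model zero-shot. A minimal sketch with a hypothetical `call_llm` stub:

```python
# Label-conditioned dataset synthesis: generate (x, y) pairs from an LLM,
# then train a small task model on the synthetic set.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM; returns a canned review per label for illustration."""
    return "An absolute delight." if "positive" in prompt else "A tedious mess."

def generate_dataset(labels: list[str], per_label: int = 2) -> list[tuple[str, str]]:
    data = []
    for label in labels:
        for _ in range(per_label):
            text = call_llm(f"Write a {label} movie review:")
            data.append((text, label))  # synthetic (input, label) pair
    return data

print(generate_dataset(["positive", "negative"]))
```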
## Competition

### Computation-constrained Environment

### Task-specific Environment

### Interpretability-required Environment

## Citation
```
@misc{chen2024rolesmallmodelsllm,
title={What is the Role of Small Models in the LLM Era: A Survey},
author={Lihu Chen and Gaël Varoquaux},
year={2024},
eprint={2409.06857},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.06857},
}
```