Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tigerchen52/awesome_role_of_small_models
a curated list of the role of small models in the LLM era
List: awesome_role_of_small_models
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/tigerchen52/awesome_role_of_small_models
- Owner: tigerchen52
- License: mit
- Created: 2024-07-07T21:47:48.000Z (4 months ago)
- Default Branch: master
- Last Pushed: 2024-09-18T10:01:28.000Z (about 2 months ago)
- Last Synced: 2024-09-19T13:03:57.149Z (about 2 months ago)
- Language: Python
- Homepage:
- Size: 484 KB
- Stars: 26
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# The Role of Small Models
[![Awesome](https://awesome.re/badge.svg)]()
[![PDF](https://img.shields.io/badge/PDF-2409.06857-green)](https://arxiv.org/abs/2409.06857)
![GitHub License](https://img.shields.io/github/license/tigerchen52/role_of_small_models)
![](https://img.shields.io/badge/PRs-Welcome-red)

This work is ongoing, and we welcome any comments or suggestions.
Please feel free to reach out if you find we have overlooked any relevant papers.
What is the Role of Small Models in the LLM Era: A Survey

Lihu Chen¹, Gaël Varoquaux²

¹ Imperial College London, UK
² Soda, Inria Saclay, France
## Content List
- [Collaboration](#collaboration)
  - [SMs Enhance LLMs](#sms-enhance-llms)
    - [Data Curation](#data-curation)
      - [Curating pre-training data](#curating-pre-training-data)
      - [Curating Instruction-tuning Data](#curating-instruction-tuning-data)
    - [Weak-to-Strong Paradigm](#weak-to-strong-paradigm)
    - [Efficient Inference](#efficient-inference)
      - [Ensembling different-size models to reduce inference costs](#ensembling-different-size-models-to-reduce-inference-costs)
      - [Speculative Decoding](#speculative-decoding)
    - [Evaluating LLMs](#evaluating-llms)
    - [Domain Adaptation](#domain-adaptation)
      - [Using domain-specific SMs to generate knowledge for LLMs at reasoning time](#using-domain-specific-sms-to-generate-knowledge-for-llms-at-reasoning-time)
      - [Using domain-specific SMs to adjust token probability of LLMs at decoding time](#using-domain-specific-sms-to-adjust-token-probability-of-llms-at-decoding-time)
    - [Retrieval Augmented Generation](#retrieval-augmented-generation)
    - [Prompt-based Reasoning](#prompt-based-reasoning)
    - [Deficiency Repair](#deficiency-repair)
      - [Developing SM plugins to repair deficiencies](#developing-sm-plugins-to-repair-deficiencies)
      - [Contrasting LLMs and SMs for better generations](#contrasting-llms-and-sms-for-better-generations)
  - [LLMs Enhance SMs](#llms-enhance-sms)
    - [Knowledge Distillation](#knowledge-distillation)
      - [Black-box Distillation](#black-box-distillation)
      - [White-box distillation](#white-box-distillation)
    - [Data Synthesis](#data-synthesis)
      - [Data Augmentation](#data-augmentation)
      - [Training Data Generation](#training-data-generation)
- [Competition](#competition)
  - [Computation-constrained Environment](#computation-constrained-environment)
  - [Task-specific Environment](#task-specific-environment)
  - [Interpretability-required Environment](#interpretability-required-environment)

#### Curating pre-training data
| Title | Topic | Venue | Code |
|---|---|---|---|
| Data selection for language models via importance resampling | Data Selection | | |
| When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale | Data Selection | | |
| DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining | Data Reweighting | | |
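The selection and pruning methods above typically rely on a small proxy model to score or weight candidate pre-training data. As a rough illustration only, a minimal filtering loop might look like the sketch below, where `proxy_perplexity` is a hypothetical scoring callable rather than the method of any listed paper.

```python
# Minimal sketch of proxy-model data filtering (illustrative, not from a specific paper).
from typing import Callable, List

def filter_pretraining_docs(docs: List[str],
                            proxy_perplexity: Callable[[str], float],
                            keep_fraction: float = 0.5) -> List[str]:
    """Score each candidate document with a cheap small model and keep the most fluent fraction."""
    ranked = sorted(docs, key=proxy_perplexity)        # lower perplexity = more fluent under the proxy
    return ranked[: int(len(ranked) * keep_fraction)]  # keep only the best-scoring fraction
```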
#### Curating Instruction-tuning Data
| Title | Topic | Venue | Code |
|---|---|---|---|
| MoDS: Model-oriented Data Selection for Instruction Tuning | Data Selection | | |
| LESS: Selecting Influential Data for Targeted Instruction Tuning | Data Selection | | |
#### Using weaker (smaller) models to align stronger (larger) models
| Title | Topic | Venue | Code |
|---|---|---|---|
| Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision | Weak-to-Strong | | |
| Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models | Weak-to-Strong | | |
| Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts | Weak-to-Strong | | |
| Improving Weak-to-Strong Generalization with Reliability-Aware Alignment | Weak-to-Strong | | |
| Aligner: Efficient Alignment by Learning to Correct | Weak-to-Strong | | |
| Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models | Weak-to-Strong | | |
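In the weak-to-strong paradigm, a small aligned model supervises the fine-tuning of a larger one. The sketch below is only a schematic of that loop; `weak_label` and `finetune_strong` are assumed placeholder callables, not the recipe of any paper above.

```python
# Weak-to-strong supervision sketch (placeholder model interfaces).
from typing import Any, Callable, List, Tuple

def weak_to_strong(unlabeled_inputs: List[str],
                   weak_label: Callable[[str], int],
                   finetune_strong: Callable[[List[Tuple[str, int]]], Any]) -> Any:
    """Label data with a small, already-aligned model, then fine-tune the strong model on those labels."""
    weakly_labeled = [(x, weak_label(x)) for x in unlabeled_inputs]
    # The hope is that the strong model generalizes beyond the noisy weak labels.
    return finetune_strong(weakly_labeled)
```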
#### Ensembling different-size models to reduce inference costs
| Title | Topic | Venue | Code |
|---|---|---|---|
| Efficient Edge Inference by Selective Query | Model Cascading | | |
| FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance | Model Cascading | | |
| Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance | Model Cascading | | |
| AutoMix: Automatically Mixing Language Models | Model Cascading | | |
| Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models | Model Routing | | |
| Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models | Model Routing | | |
| OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking | Model Routing | | |
| RouteLLM: Learning to Route LLMs with Preference Data | Model Routing | | |
| Fly-Swat or Cannon? Cost-Effective Language Model Choice via Meta-Modeling | Model Routing | | |
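The cascading entries above share a simple control flow: let the small model answer first and escalate to the LLM only when its confidence is low. Below is a minimal sketch under that assumption; the model callables and the confidence score are placeholders.

```python
# Minimal model-cascading sketch (hypothetical model interfaces).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Answer:
    text: str
    confidence: float  # e.g. max softmax probability or a calibrated score

def cascade(query: str,
            small_model: Callable[[str], Answer],
            large_model: Callable[[str], Answer],
            threshold: float = 0.8) -> Answer:
    first = small_model(query)
    if first.confidence >= threshold:
        return first           # cheap path: accept the small model's answer
    return large_model(query)  # expensive path: fall back to the LLM
```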
#### Speculative Decoding

| Title | Topic | Venue | Code |
|---|---|---|---|
| Fast Inference from Transformers via Speculative Decoding | Speculative Decoding | | |
| Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding | Speculative Decoding | | |
| Accelerating Large Language Model Decoding with Speculative Sampling | Speculative Decoding | | |
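In speculative decoding, a small draft model proposes a few tokens and the large target model verifies them in one pass. The sketch below is a simplified greedy variant (not the rejection-sampling scheme of the papers above); `draft_propose` and `target_greedy` are assumed helper functions.

```python
# Simplified greedy speculative decoding step (illustrative helpers, not a real library API).
from typing import Callable, List

def speculative_step(prefix: List[int],
                     draft_propose: Callable[[List[int], int], List[int]],  # small model: propose k tokens
                     target_greedy: Callable[[List[int]], List[int]],       # large model: greedy token at each of the k+1 positions after `prefix`
                     k: int = 4) -> List[int]:
    draft = draft_propose(prefix, k)
    verified = target_greedy(prefix + draft)   # verified[i] = target's greedy choice given prefix + draft[:i]
    accepted: List[int] = []
    for proposed, checked in zip(draft, verified):
        if proposed != checked:
            accepted.append(checked)           # first disagreement: take the target's token and stop
            break
        accepted.append(proposed)              # agreement: keep the drafted token for free
    else:
        accepted.append(verified[len(draft)])  # every draft token accepted: also keep the target's bonus token
    return prefix + accepted
```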
#### Using SMs to evaluate LLMs' generations
| Title | Topic | Venue | Code |
|---|---|---|---|
| BERTScore: Evaluating Text Generation with BERT | General Evaluation | | |
| BARTScore: Evaluating Generated Text as Text Generation | General Evaluation | | |
| Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation | Uncertainty | | |
| SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models | Uncertainty | | |
| ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models | Performance Prediction | | |
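BERTScore-style evaluators compare a candidate and a reference using token embeddings from a small encoder. Below is a much-simplified sketch of the underlying greedy cosine matching, assuming the contextual token embeddings have already been computed.

```python
# Much-simplified BERTScore-style matching sketch (embeddings are plain arrays here).
import numpy as np

def greedy_match_f1(cand_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """cand_emb, ref_emb: (num_tokens, dim) contextual token embeddings from a small encoder."""
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = c @ r.T                        # pairwise cosine similarities
    precision = sim.max(axis=1).mean()   # each candidate token matched to its best reference token
    recall = sim.max(axis=0).mean()      # each reference token matched to its best candidate token
    return 2 * precision * recall / (precision + recall)
```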
#### Using domain-specific SMs to adjust token probability of LLMs at decoding time
| Title | Topic | Venue | Code |
|---|---|---|---|
| CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models | White-box Domain Adaptation | | |
| Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning | White-box Domain Adaptation | | |
| Tuning Language Models by Proxy | White-box Domain Adaptation | | |
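The shared idea in these methods is to shift the large model's next-token logits by the difference between a domain-tuned small expert and its untuned counterpart. A minimal sketch of that adjustment follows; the array shapes and the scaling factor `alpha` are illustrative assumptions.

```python
# Proxy-tuning-style logit adjustment sketch (illustrative shapes and names).
import numpy as np

def adjusted_distribution(llm_logits: np.ndarray,
                          expert_logits: np.ndarray,
                          anti_expert_logits: np.ndarray,
                          alpha: float = 1.0) -> np.ndarray:
    """All inputs are next-token logits over the same vocabulary."""
    shifted = llm_logits + alpha * (expert_logits - anti_expert_logits)
    exp = np.exp(shifted - shifted.max())  # softmax with the usual max-shift for numerical stability
    return exp / exp.sum()
```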
#### Using domain-specific SMs to generate knowledge for LLMs at reasoning time
| Title | Topic | Venue | Code |
|---|---|---|---|
| Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models | Black-box Domain Adaptation | | |
| BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models | Black-box Domain Adaptation | | |
### Retrieval Augmented Generation
#### Using SMs to retrieve knowledge for enhancing generations:
| Title | Topic | Venue | Code |
|---|---|---|---|
| Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | Documents | | |
| KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases | Knowledge Bases | | |
| End-to-End Table Question Answering via Retrieval-Augmented Generation | Tables | | |
| DocPrompting: Generating Code by Retrieving the Docs | Codes | | |
| Toolformer: Language Models Can Teach Themselves to Use Tools | Tools | | |
| Retrieval-Augmented Multimodal Language Modeling | Images | | |
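A retrieval-augmented pipeline pairs a small retriever with a large generator. The sketch below is a generic retrieve-then-prompt loop, not any specific system from the table; `embed` and `generate` are caller-supplied placeholders.

```python
# Minimal retrieve-then-prompt sketch (retriever and LLM are placeholders).
from typing import Callable, List
import numpy as np

def rag_answer(question: str,
               documents: List[str],
               embed: Callable[[str], np.ndarray],  # small encoder used as the retriever
               generate: Callable[[str], str],      # large language model
               k: int = 3) -> str:
    doc_vecs = np.stack([embed(d) for d in documents])
    doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True) + 1e-9
    q = embed(question)
    q /= np.linalg.norm(q) + 1e-9
    top = np.argsort(doc_vecs @ q)[::-1][:k]        # indices of the k most similar passages
    context = "\n\n".join(documents[i] for i in top)
    prompt = f"Answer the question using the context below.\n\n{context}\n\nQuestion: {question}"
    return generate(prompt)
```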
#### Using SMs to augment prompts for LLMs
| Title | Topic | Venue | Code |
|---|---|---|---|
| UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation | Retrieving Prompts | | |
| Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning | Decomposing Complex Problems | | |
| Small Models are Valuable Plug-ins for Large Language Models | Generating Pseudo Labels | | |
| Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought | Generating Pseudo Labels | | |
| CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation | Generating Feedback | | |
| Small Language Models Improve Giants by Rewriting Their Outputs | Generating Feedback | | |
#### Developing SM plugins to repair deficiencies:
| Title | Topic | Venue | Code |
|---|---|---|---|
| Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector | Hallucinations | | |
| Reconfidencing LLMs from the Grouping Loss Perspective | Hallucinations | | |
| Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost | Out-Of-Vocabulary Words | | |
#### Contrasting LLMs and SMs for better generations:
| Title | Topic | Venue | Code |
|---|---|---|---|
| Contrastive Decoding: Open-ended Text Generation as Optimization | Reducing Repeated Texts | | |
| Alleviating Hallucinations of Large Language Models through Induced Hallucinations | Mitigating Hallucinations | | |
| Contrastive Decoding Improves Reasoning in Large Language Models | Augmenting Reasoning Capabilities | | |
| CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following | Safeguarding Privacy | | |
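Contrastive decoding scores candidate tokens by how much more the large "expert" model prefers them than the small "amateur" model, restricted to tokens the expert itself finds plausible. Below is a simplified sketch of that scoring rule; the log-probability arrays and the plausibility threshold `alpha` are illustrative.

```python
# Contrastive-decoding scoring sketch (simplified rule, not a full decoder).
import numpy as np

def contrastive_scores(expert_logprobs: np.ndarray,
                       amateur_logprobs: np.ndarray,
                       alpha: float = 0.1) -> np.ndarray:
    """Expert/amateur log-probabilities over the same next-token vocabulary."""
    plausible = expert_logprobs >= np.log(alpha) + expert_logprobs.max()  # expert's plausibility cutoff
    return np.where(plausible,
                    expert_logprobs - amateur_logprobs,  # reward tokens the expert prefers more than the amateur
                    -np.inf)                             # mask tokens the expert already finds unlikely

# Usage: next_token = int(np.argmax(contrastive_scores(expert_lp, amateur_lp)))
```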
#### Black-box Distillation

| Title | Topic | Venue | Code |
|---|---|---|---|
| Explanations from Large Language Models Make Small Reasoners Better | Chain-Of-Thought Distillation | | |
| Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes | Chain-Of-Thought Distillation | | |
| Distilling Reasoning Capabilities into Smaller Language Models | Chain-Of-Thought Distillation | | |
| Teaching Small Language Models to Reason | Chain-Of-Thought Distillation | | |
| Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step | Chain-Of-Thought Distillation | | |
| Specializing Smaller Language Models towards Multi-Step Reasoning | Chain-Of-Thought Distillation | | |
| TinyLLM: Learning a Small Student from Multiple Large Language Models | Chain-Of-Thought Distillation | | |
| Lion: Adversarial Distillation of Proprietary Large Language Models | Instruction Following Distillation | | |
| Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning | Instruction Following Distillation | | |
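Chain-of-thought distillation generally starts by collecting rationales from a teacher LLM and turning them into fine-tuning targets for the small student. Below is a schematic of that data-building step; the `teacher` callable and the prompt/record format are placeholders, not the pipeline of any specific paper above.

```python
# Sketch of assembling chain-of-thought distillation data (placeholder teacher call and format).
from typing import Callable, Dict, List

def build_cot_dataset(questions: List[str],
                      teacher: Callable[[str], str]) -> List[Dict[str, str]]:
    examples = []
    for q in questions:
        # The teacher LLM writes a step-by-step rationale ending in a final answer.
        rationale = teacher(f"Answer step by step, then give the final answer.\n\nQ: {q}")
        examples.append({"prompt": q, "target": rationale})  # the small student is fine-tuned to imitate this
    return examples
```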
#### White-box Distillation

| Title | Topic | Venue | Code |
|---|---|---|---|
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Logits | | |
| ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers | Intermediate Features | | |
| Less is More: Task-aware Layer-wise Distillation for Language Model Compression | Intermediate Features | | |
| MiniLLM: Knowledge Distillation of Large Language Models | Intermediate Features | | |
| LLM-QAT: Data-Free Quantization Aware Training for Large Language Models | Intermediate Features | | |
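White-box distillation typically matches the student's output distribution to the teacher's. The PyTorch sketch below shows the standard temperature-scaled KL loss on logits; it is the generic recipe, not the exact objective of any paper above.

```python
# Minimal logit-distillation loss sketch in PyTorch (standard KD recipe).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student next-token distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2
```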
#### Data Augmentation

| Title | Topic | Venue | Code |
|---|---|---|---|
| Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing | Text Paraphrase | | |
| Paraphrasing with Large Language Models | Text Paraphrase | | |
| Query Rewriting for Retrieval-Augmented Large Language Models | Query Rewriting | | |
| LLM vs Small Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model | Specific Tasks | | |
| Data Augmentation for Intent Classification with Off-the-shelf Large Language Models | Specific Tasks | | |
| Weakly Supervised Data Augmentation Through Prompting for Dialogue Understanding | Specific Tasks | | |
#### Training Data Generation:
| Title | Topic | Venue | Code |
|---|---|---|---|
| Want To Reduce Labeling Cost? GPT-3 Can Help | Label Annotation | | |
| Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning | Label Annotation | | |
| ZeroGen: Efficient Zero-shot Learning via Dataset Generation | Dataset Generation | | |
| Generating Training Data with Language Models: Towards Zero-Shot Language Understanding | Dataset Generation | | |
| Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions | Dataset Generation | | |
| Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations | Dataset Generation | | |
| Does Synthetic Data Generation of LLMs Help Clinical Text Mining? | Dataset Generation | | |
| Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction | Dataset Generation | | |
| ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection | Dataset Generation | | |
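A common pattern in this line of work: the LLM writes synthetic labeled examples, and a small model is then trained on them. The sketch below assumes a placeholder `generate` call that returns one synthetic example per list item, and uses scikit-learn for the small classifier.

```python
# Sketch of LLM-driven training-data generation for a small classifier (LLM call is a placeholder).
from typing import Callable, List
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def synthesize_and_train(labels: List[str],
                         generate: Callable[[str], List[str]],  # placeholder LLM call returning one example per item
                         per_label: int = 50):
    texts, targets = [], []
    for label in labels:
        prompt = f"Write {per_label} short, varied examples of a '{label}' message, one per line."
        for example in generate(prompt):     # the LLM produces synthetic examples for this label
            texts.append(example)
            targets.append(label)
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(texts, targets)                  # small model trained purely on synthetic data
    return clf
```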
## Computation-constrained Environment
## Task-specific Environment
## Interpretability-required Environment

## Citation
```
@misc{chen2024rolesmallmodelsllm,
title={What is the Role of Small Models in the LLM Era: A Survey},
author={Lihu Chen and Gaël Varoquaux},
year={2024},
eprint={2409.06857},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.06857},
}
```