https://github.com/tigerchen52/awesome_role_of_small_models
a curated list of the role of small models in the LLM era
List: awesome_role_of_small_models
- Host: GitHub
- URL: https://github.com/tigerchen52/awesome_role_of_small_models
- Owner: tigerchen52
- License: mit
- Created: 2024-07-07T21:47:48.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2024-09-23T10:16:16.000Z (7 months ago)
- Last Synced: 2024-10-14T21:01:34.635Z (6 months ago)
- Language: Python
- Homepage:
- Size: 314 KB
- Stars: 45
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome_ai_agents - Awesome_Role_Of_Small_Models - a curated list of the role of small models in the LLM era (Building / LLM Models)
README
# The Role of Small Models
[arXiv](https://arxiv.org/abs/2409.06857)
This work is ongoing, and we welcome any comments or suggestions.
Please feel free to reach out if you find we have overlooked any relevant papers.
**What is the Role of Small Models in the LLM Era: A Survey**

Lihu Chen¹, Gaël Varoquaux²

¹ Imperial College London, UK
² Soda, Inria Saclay, France
## Content List
- [Collaboration](#collaboration)
- [SMs Enhance LLMs](#sms-enhance-llms)
- [Data Curation](#data-curation)
- [Curating pre-training data](#curating-pre-training-data)
- [Curating Instruction-tuning Data](#curating-instruction-tuning-data)
- [Weak-to-Strong Paradigm](#weak-to-strong-paradigm)
- [Efficient Inference](#efficient-inference)
- [Ensembling different-size models to reduce inference costs](#ensembling-different-size-models-to-reduce-inference-costs)
- [Speculative Decoding](#speculative-decoding)
- [Evaluating LLMs](#evaluating-llms)
- [Domain Adaptation](#domain-adaptation)
- [Using domain-specific SMs to generate knowledge for LLMs at reasoning time](#using-domain-specific-sms-to-generate-knowledge-for-llms-at-reasoning-time)
- [Using domain-specific SMs to adjust token probability of LLMs at decoding time](#using-domain-specific-sms-to-adjust-token-probability-of-llms-at-decoding-time)
- [Retrieval Augmented Generation](#retrieval-augmented-generation)
- [Prompt-based Reasoning](#prompt-based-reasoning)
- [Deficiency Repair](#deficiency-repair)
- [Developing SM plugins to repair deficiencies](#developing-sm-plugins-to-repair-deficiencies)
- [Contrasting LLMs and SMs for better generations](#contrasting-llms-and-sms-for-better-generations)
- [LLMs Enhance SMs](#llms-enhance-sms)
- [Knowledge Distillation](#knowledge-distillation)
- [Black-box Distillation](#black-box-distillation)
- [White-box distillation](#white-box-distillation)
- [Data Synthesis](#data-synthesis)
- [Data Augmentation](#data-augmentation)
- [Training Data Generation](#training-data-generation)
- [Competition](#competition)
- [Computation-constrained Environment](#computation-constrained-environment)
- [Task-specific Environment](#task-specific-environment)
- [Interpretability-required Environment](#interpretability-required-environment)

## SMs Enhance LLMs

### Data Curation

#### Curating pre-training data

| Title | Topic |
| --- | --- |
| Data selection for language models via importance resampling | Data Selection |
| When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale | Data Selection |
| CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data | Data Selection |
| QuRating: Selecting High-Quality Data for Training Language Models | Data Selection |
| DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining | Data Reweighting |
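The data-selection entries above share a recipe: score candidate documents with a small, cheap model and keep only the best. Below is a minimal sketch of perplexity-based filtering in that spirit; `small_lm_logprob` is a hypothetical stand-in for any small reference LM, and the threshold is purely illustrative.

```python
# A minimal sketch of perplexity-based pre-training data selection with a
# small reference model. Documents the small LM finds very surprising are
# often boilerplate or noise and get dropped.
import math

def small_lm_logprob(text: str) -> list[float]:
    """Hypothetical: per-token log-probs from a small reference LM.
    Placeholder heuristic: pretend non-alphabetic tokens are surprising."""
    return [-1.0 if tok.isalpha() else -8.0 for tok in text.split()]

def perplexity(text: str) -> float:
    lp = small_lm_logprob(text) or [0.0]
    return math.exp(-sum(lp) / len(lp))

def select_documents(corpus: list[str], max_ppl: float = 200.0) -> list[str]:
    # Keep documents whose perplexity under the small model stays low.
    return [doc for doc in corpus if perplexity(doc) <= max_ppl]

if __name__ == "__main__":
    docs = ["a clean paragraph of text", "zxq!! 0x00 ###"]
    print(select_documents(docs))  # the noisy document is filtered out
```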
#### Curating Instruction-tuning Data
| Title | Topic |
| --- | --- |
| MoDS: Model-oriented Data Selection for Instruction Tuning | Data Selection |
| LESS: Selecting Influential Data for Targeted Instruction Tuning | Data Selection |
| What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning | Data Selection |
### Weak-to-Strong Paradigm

#### Using weaker (smaller) models to align stronger (larger) models

| Title | Topic |
| --- | --- |
| Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision | Weak-to-Strong |
| Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models | Weak-to-Strong |
| Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts | Weak-to-Strong |
| Improving Weak-to-Strong Generalization with Reliability-Aware Alignment | Weak-to-Strong |
| Aligner: Efficient Alignment by Learning to Correct | Weak-to-Strong |
| Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models | Weak-to-Strong |
| Theoretical Analysis of Weak-to-Strong Generalization | Weak-to-Strong |
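The weak-to-strong papers above study whether a strong model trained only on a weaker model's labels can surpass its supervisor. A toy sketch of that experimental setup, using scikit-learn classifiers as loose stand-ins for the weak supervisor and strong student:

```python
# Toy weak-to-strong setup: train a small "weak supervisor" on ground truth,
# then train a larger "strong student" ONLY on the weak model's labels and
# evaluate both on held-out ground truth.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_student, X_test, y_student, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

weak = LogisticRegression().fit(X_weak, y_weak)  # weak supervisor
weak_labels = weak.predict(X_student)            # imperfect supervision

strong = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
strong.fit(X_student, weak_labels)               # learns from weak labels only

print("weak supervisor acc:", weak.score(X_test, y_test))
print("strong student acc: ", strong.score(X_test, y_test))
# Weak-to-strong generalization asks when the student exceeds its supervisor.
```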
### Efficient Inference

#### Ensembling different-size models to reduce inference costs

| Title | Topic |
| --- | --- |
| Efficient Edge Inference by Selective Query | Model Cascading |
| FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance | Model Cascading |
| Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance | Model Cascading |
| AutoMix: Automatically Mixing Language Models | Model Cascading |
| FrugalML: How to use ML Prediction APIs more accurately and cheaply | Model Cascading |
| Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems | Model Cascading |
| Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models | Model Routing |
| Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models | Model Routing |
| OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking | Model Routing |
| RouteLLM: Learning to Route LLMs with Preference Data | Model Routing |
| Fly-Swat or Cannon? Cost-Effective Language Model Choice via Meta-Modeling | Model Routing |
| Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing | Model Routing |
| LLM-BLENDER: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion | Model Ensembling |
| RouterBench: A Benchmark for Multi-LLM Routing System | Model Routing |
| Large Language Model Routing with Benchmark Datasets | Model Routing |
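Model cascading, the most common pattern above, answers easy queries with the small model and escalates only the hard ones. A minimal sketch, assuming hypothetical `small_model` and `large_model` endpoints and an illustrative confidence threshold:

```python
# A two-stage cascade: answer with the small model when it is confident,
# escalate to the large (expensive) model otherwise.

def small_model(query: str) -> tuple[str, float]:
    """Hypothetical small model: returns (answer, confidence in [0, 1])."""
    return "small-model answer", 0.62

def large_model(query: str) -> str:
    """Hypothetical large model."""
    return "large-model answer"

def cascade(query: str, threshold: float = 0.8) -> str:
    answer, confidence = small_model(query)
    if confidence >= threshold:
        return answer              # cheap path
    return large_model(query)      # escalate only the hard queries

print(cascade("What is the capital of France?"))
```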
#### Speculative Decoding

| Title | Topic |
| --- | --- |
| Fast Inference from Transformers via Speculative Decoding | Speculative Decoding |
| Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding | Speculative Decoding |
| Accelerating Large Language Model Decoding with Speculative Sampling | Speculative Decoding |
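In speculative decoding, a small draft model proposes several tokens that the large target model then verifies. The sketch below is a simplified greedy-verification variant that only illustrates the control flow; the papers above use rejection sampling over the two distributions, and the target scores all draft positions in a single forward pass. Both model stubs are hypothetical.

```python
# Simplified speculative decoding: draft a block with the small model, keep
# the longest prefix the large model agrees with, then correct one token.

def draft_next(ctx: list[str]) -> str:
    return "the"  # hypothetical small drafter

def target_next(ctx: list[str]) -> str:
    return "the" if len(ctx) % 3 else "a"  # hypothetical large verifier

def speculative_step(ctx: list[str], k: int = 4) -> list[str]:
    draft = []
    for _ in range(k):                       # 1) draft k tokens cheaply
        draft.append(draft_next(ctx + draft))
    accepted = []
    for tok in draft:                        # 2) verify against the target
        if target_next(ctx + accepted) == tok:
            accepted.append(tok)             # keep the agreeing prefix
        else:
            accepted.append(target_next(ctx + accepted))  # 3) correct, stop
            break
    return ctx + accepted

print(speculative_step(["once", "upon"]))
```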
### Evaluating LLMs

#### Using SMs to evaluate LLMs' generations

| Title | Topic |
| --- | --- |
| BERTScore: Evaluating Text Generation with BERT | General Evaluation |
| BARTScore: Evaluating Generated Text as Text Generation | General Evaluation |
| Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation | Uncertainty |
| SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models | Uncertainty |
| ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models | Performance Prediction |
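BERTScore, listed above, matches contextual token embeddings between candidate and reference by cosine similarity. A from-scratch sketch of the greedy-matching idea, with `embed` as a hypothetical stand-in for a small encoder such as BERT (the real metric adds IDF weighting and baseline rescaling):

```python
# BERTScore-style evaluation: each token greedily matches its most similar
# counterpart; precision/recall are the mean best similarities.
import numpy as np

def embed(tokens: list[str]) -> np.ndarray:
    """Hypothetical: one unit-norm contextual embedding per token."""
    rng = np.random.default_rng(abs(hash(tuple(tokens))) % 2**32)
    vecs = rng.normal(size=(len(tokens), 8))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def bertscore_f1(candidate: list[str], reference: list[str]) -> float:
    sim = embed(candidate) @ embed(reference).T   # cosine similarity matrix
    recall = sim.max(axis=0).mean()               # each ref token's best match
    precision = sim.max(axis=1).mean()            # each cand token's best match
    return 2 * precision * recall / (precision + recall)

print(bertscore_f1("the cat sat".split(), "a cat was sitting".split()))
```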
### Domain Adaptation

#### Using domain-specific SMs to adjust token probability of LLMs at decoding time

| Title | Topic |
| --- | --- |
| CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models | White-box Domain Adaptation |
| Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning | White-box Domain Adaptation |
| Tuning Language Models by Proxy | White-box Domain Adaptation |
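Proxy tuning ("Tuning Language Models by Proxy") steers the large model by adding, at each decoding step, the logit offset between a tuned and an untuned small model. A minimal sketch over a toy vocabulary, with all three logit functions as hypothetical stubs:

```python
# Proxy tuning at decoding time: logits_large + (logits_expert - logits_anti).
import numpy as np

VOCAB = ["yes", "no", "maybe", "unsure"]

def large_logits(ctx):  return np.array([1.0, 1.2, 0.3, 0.1])   # base LLM
def expert_logits(ctx): return np.array([2.0, 0.2, 0.4, 0.1])   # tuned SM
def anti_logits(ctx):   return np.array([0.8, 1.0, 0.5, 0.2])   # untuned SM

def proxy_tuned_next(ctx: str) -> str:
    logits = large_logits(ctx) + (expert_logits(ctx) - anti_logits(ctx))
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                       # softmax over shifted logits
    return VOCAB[int(np.argmax(probs))]

print(proxy_tuned_next("Should we deploy on Friday?"))
```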
#### Using domain-specific SMs to generate knowledge for LLMs at reasoning time

| Title | Topic |
| --- | --- |
| Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models | Black-box Domain Adaptation |
| BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models | Black-box Domain Adaptation |
### Retrieval Augmented Generation
#### Using SMs to retrieve knowledge for enhancing generations

| Title | Topic |
| --- | --- |
| Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | Documents |
| KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases | Knowledge Bases |
| End-to-End Table Question Answering via Retrieval-Augmented Generation | Tables |
| DocPrompting: Generating Code by Retrieving the Docs | Code |
| Toolformer: Language Models Can Teach Themselves to Use Tools | Tools |
| Retrieval-Augmented Multimodal Language Modeling | Images |
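The common thread above is retrieve-then-generate: a small retriever selects external knowledge that is prepended to the LLM prompt. A minimal sketch, using word overlap as a stand-in for a learned dense retriever and a hypothetical `call_llm` function:

```python
# Minimal RAG: score documents with a cheap retriever, put the top hit in
# the prompt, and let the LLM answer grounded in that context.

DOCS = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "Python is a programming language created by Guido van Rossum.",
]

def retrieve(query: str, docs: list[str]) -> str:
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; returns the prompt for illustration."""
    return prompt

def rag_answer(query: str) -> str:
    context = retrieve(query, DOCS)
    return call_llm(f"Context: {context}\nQuestion: {query}\nAnswer:")

print(rag_answer("When was the Eiffel Tower completed?"))
```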
### Prompt-based Reasoning

#### Using SMs to augment prompts for LLMs

| Title | Topic |
| --- | --- |
| UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation | Retrieving Prompts |
| Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning | Decomposing Complex Problems |
| Small Models are Valuable Plug-ins for Large Language Models | Generating Pseudo Labels |
| Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought | Generating Pseudo Labels |
| CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation | Generating Feedback |
| Small Language Models Improve Giants by Rewriting Their Outputs | Generating Feedback |
### Deficiency Repair

#### Developing SM plugins to repair deficiencies

| Title | Topic |
| --- | --- |
| Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector | Hallucinations |
| Reconfidencing LLMs from the Grouping Loss Perspective | Hallucinations |
| Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost | Out-Of-Vocabulary Words |
#### Contrasting LLMs and SMs for better generations

| Title | Topic |
| --- | --- |
| Contrastive Decoding: Open-ended Text Generation as Optimization | Reducing Repeated Texts |
| Alleviating Hallucinations of Large Language Models through Induced Hallucinations | Mitigating Hallucinations |
| Contrastive Decoding Improves Reasoning in Large Language Models | Augmenting Reasoning Capabilities |
| CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following | Safeguarding Privacy |
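Contrastive decoding scores each candidate token by the gap between the expert (large) and amateur (small) log-probabilities, restricted to tokens the expert itself finds plausible. A minimal sketch over a toy vocabulary with hypothetical distributions:

```python
# Contrastive decoding: maximize log p_expert - log p_amateur under the
# expert's plausibility constraint (the alpha-mask from the original paper).
import numpy as np

VOCAB = ["good", "good,", "bad", "the"]
p_expert  = np.array([0.50, 0.30, 0.05, 0.15])  # large model
p_amateur = np.array([0.30, 0.45, 0.05, 0.20])  # small model (loves repeats)

def contrastive_next(alpha: float = 0.1) -> str:
    mask = p_expert >= alpha * p_expert.max()        # plausibility constraint
    scores = np.where(mask, np.log(p_expert) - np.log(p_amateur), -np.inf)
    return VOCAB[int(np.argmax(scores))]

print(contrastive_next())  # the expert's edge over the amateur wins
```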
## LLMs Enhance SMs

### Knowledge Distillation

#### Black-box Distillation

| Title | Topic |
| --- | --- |
| Explanations from Large Language Models Make Small Reasoners Better | Chain-Of-Thought Distillation |
| Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes | Chain-Of-Thought Distillation |
| Distilling Reasoning Capabilities into Smaller Language Models | Chain-Of-Thought Distillation |
| Teaching Small Language Models to Reason | Chain-Of-Thought Distillation |
| Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step | Chain-Of-Thought Distillation |
| Specializing Smaller Language Models towards Multi-Step Reasoning | Chain-Of-Thought Distillation |
| TinyLLM: Learning a Small Student from Multiple Large Language Models | Chain-Of-Thought Distillation |
| Lion: Adversarial Distillation of Proprietary Large Language Models | Instruction Following Distillation |
| Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning | Instruction Following Distillation |
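Chain-of-thought distillation, the dominant topic above, prompts a large teacher for rationales and fine-tunes a small student on the resulting (question, rationale, answer) data. A minimal data-construction sketch, with `teacher_generate` as a hypothetical teacher API and the fine-tuning step itself omitted:

```python
# Black-box CoT distillation: collect teacher rationales, format them as
# supervised targets so the student learns to produce reasoning + answer.

def teacher_generate(prompt: str) -> str:
    """Hypothetical teacher LLM returning 'rationale ### answer'."""
    return "2 apples plus 3 apples makes 5 apples. ### 5"

def build_distillation_set(questions: list[str]) -> list[dict]:
    examples = []
    for q in questions:
        out = teacher_generate(f"{q}\nLet's think step by step.")
        rationale, _, answer = out.partition("###")
        examples.append({
            "input": q,
            # Student is trained to emit the reasoning AND the answer.
            "target": f"{rationale.strip()} The answer is {answer.strip()}.",
        })
    return examples

print(build_distillation_set(["If I have 2 apples and buy 3 more, how many?"]))
```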
#### White-box Distillation

| Title | Topic |
| --- | --- |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Logits |
| ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers | Intermediate Features |
| Less is More: Task-aware Layer-wise Distillation for Language Model Compression | Intermediate Features |
| MiniLLM: Knowledge Distillation of Large Language Models | Intermediate Features |
| LLM-QAT: Data-Free Quantization Aware Training for Large Language Models | Intermediate Features |
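White-box distillation exploits access to the teacher's internals; the simplest signal is the output logits, as in DistilBERT. A minimal sketch of the temperature-softened KL objective on toy logits:

```python
# Logit distillation: the student minimizes KL(teacher || student) over
# temperature-softened distributions. Toy numpy logits stand in for real
# forward passes; in practice the loss is backpropagated into the student.
import numpy as np

def softmax(z, T=1.0):
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    p = softmax(teacher_logits, T)     # soft targets from the teacher
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))  # KL * T^2

teacher = np.array([3.0, 1.0, 0.2])
student = np.array([2.5, 1.2, 0.1])
print(distillation_loss(teacher, student))
```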
### Data Synthesis

#### Data Augmentation

| Title | Topic |
| --- | --- |
| Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing | Text Paraphrase |
| Paraphrasing with Large Language Models | Text Paraphrase |
| Query Rewriting for Retrieval-Augmented Large Language Models | Query Rewriting |
| LLM vs Small Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model | Specific Tasks |
| Data Augmentation for Intent Classification with Off-the-shelf Large Language Models | Specific Tasks |
| Weakly Supervised Data Augmentation Through Prompting for Dialogue Understanding | Specific Tasks |
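A typical augmentation loop asks a large model to paraphrase each labeled example while preserving the label, enlarging the training set for a small task model. A minimal sketch with a hypothetical `call_llm` stub:

```python
# LLM-based data augmentation: n paraphrases per example, label preserved.

def call_llm(prompt: str) -> str:
    """Hypothetical paraphrasing LLM (placeholder echoes the sentence)."""
    return prompt.rsplit(":", 1)[-1].strip()

def augment(dataset: list[tuple[str, str]], n: int = 2) -> list[tuple[str, str]]:
    out = list(dataset)
    for text, label in dataset:
        for _ in range(n):
            out.append((call_llm(f"Paraphrase this sentence: {text}"), label))
    return out

print(augment([("I loved this movie", "positive")]))
```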
#### Training Data Generation

| Title | Topic |
| --- | --- |
| Want To Reduce Labeling Cost? GPT-3 Can Help | Label Annotation |
| Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning | Label Annotation |
| ZeroGen: Efficient Zero-shot Learning via Dataset Generation | Dataset Generation |
| Generating Training Data with Language Models: Towards Zero-Shot Language Understanding | Dataset Generation |
| Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions | Dataset Generation |
| Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations | Dataset Generation |
| Does Synthetic Data Generation of LLMs Help Clinical Text Mining? | Dataset Generation |
| Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction | Dataset Generation |
| ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection | Dataset Generation |
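ZeroGen-style generation flips annotation around: the label conditions the large model, which synthesizes matching inputs for training a small model zero-shot. A minimal sketch with a hypothetical `call_llm` stub:

```python
# Label-conditioned dataset synthesis: generate (x, y) pairs from an LLM,
# then train a small task model on the synthetic set.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM; returns a canned review per label for illustration."""
    return "An absolute delight." if "positive" in prompt else "A tedious mess."

def generate_dataset(labels: list[str], per_label: int = 2) -> list[tuple[str, str]]:
    data = []
    for label in labels:
        for _ in range(per_label):
            text = call_llm(f"Write a {label} movie review:")
            data.append((text, label))  # synthetic (input, label) pair
    return data

print(generate_dataset(["positive", "negative"]))
```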
## Competition

### Computation-constrained Environment

### Task-specific Environment

### Interpretability-required Environment

## Citation
```
@misc{chen2024rolesmallmodelsllm,
title={What is the Role of Small Models in the LLM Era: A Survey},
author={Lihu Chen and Gaël Varoquaux},
year={2024},
eprint={2409.06857},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.06857},
}
```