# Awesome Parametric Knowledge in LLMs

[![LICENSE](https://img.shields.io/github/license/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs)](https://github.com/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs/blob/main/LICENSE)
![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)
[![commit](https://img.shields.io/github/last-commit/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs?color=blue)](https://github.com/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs/commits/main)
[![PR](https://img.shields.io/badge/PRs-Welcome-red)](https://github.com/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs/pulls)
[![GitHub Repo stars](https://img.shields.io/github/stars/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs)](https://github.com/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs)


This repo collects papers about parametric knowledge in LLMs, organized into two main categories: parametric knowledge detection and parametric knowledge application!👻

We believe that parametric knowledge in LLMs is still a largely unexplored area, and we hope this repository provides you with some valuable insights!😶‍🌫️

# Parametric Knowledge Detection
## Knowledge in Transformer-based Models: Analysis🧠
### 2025
1. **[Decoding specialised feature neurons in LLMs with the final projection layer](http://arxiv.org/abs/2501.02688)**

[Logit Lens, Analysis of Query Neurons]
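
For readers new to logit-lens style analysis, here is a minimal sketch (our illustration, not the paper's code; the checkpoint, prompt, and layer index are arbitrary choices) that projects an intermediate hidden state through the final layer norm and unembedding of a GPT-2-style model:

```python
# Minimal logit-lens sketch (illustrative): check which tokens a middle layer
# is already "predicting" by projecting its residual stream through the
# final layer norm and the unembedding matrix.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any GPT-2-style checkpoint works for this sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

layer = 6  # hidden_states[0] is the embedding output, [1..n] are block outputs
h = out.hidden_states[layer][0, -1]                 # last-token residual stream
logits = model.lm_head(model.transformer.ln_f(h))   # the "logit lens" projection
print(tok.convert_ids_to_tokens(logits.topk(5).indices.tolist()))
```
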
### 2024
1. **[What does the knowledge neuron thesis have to do with knowledge?](https://arxiv.org/abs/2405.02421)**

*Jingcheng Niu, Andrew Liu, Zining Zhu, Gerald Penn.* ICLR'24(Spotlight)

2. **[Knowledge Mechanisms in Large Language Models: A Survey and Perspective](https://arxiv.org/abs/2407.15017)**

*Mengru Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, Peng Wang, Xiang Chen, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang.* EMNLP'24 Findings

3. **[Disentangling Memory and Reasoning Ability in Large Language Models](https://arxiv.org/abs/2411.13504v2)** [![github repo stars](https://img.shields.io/github/stars/MingyuJ666/Disentangling-Memory-and-Reasoning)](https://github.com/MingyuJ666/Disentangling-Memory-and-Reasoning)

*Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang.* preprint'24

4. **[Linguistic collapse: Neural collapse in (large) language models](https://arxiv.org/abs/2405.17767)**[![github repo stars](https://img.shields.io/github/stars/rhubarbwu/linguistic-collapse)]( https://github.com/rhubarbwu/linguistic-collapse)

*Robert Wu, Vardan Papyan.* NIPS'24

5. **[Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models](https://arxiv.org/abs/2410.08414)**[![github repo stars](https://img.shields.io/github/stars/sitaocheng/Knowledge_Interplay)](https://github.com/sitaocheng/Knowledge_Interplay)

*Sitao Cheng, Liangming Pan, Xunjian Yin, Xinyi Wang, William Yang Wang.* Preprint'24

6. **[Evaluating the External and Parametric Knowledge Fusion of Large Language Models](https://arxiv.org/abs/2405.19010)**

*Hao Zhang, Yuyang Zhang, Xiaoguang Li, Wenxuan Shi, Haonan Xu, Huanshuo Liu, Yasheng Wang, Lifeng Shang, Qun Liu, Yong Liu, Ruiming Tang.* Preprint'24

7. **[Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts](https://arxiv.org/abs/2305.13300)**[![github repo stars](https://img.shields.io/github/stars/OSU-NLP-Group/LLM-Knowledge-Conflict)](https://github.com/OSU-NLP-Group/LLM-Knowledge-Conflict)

*Jian Xie, Kai Zhang, Jiangjie Chen, Renze Lou, Yu Su.* ICLR'24 Spotlight

8. **[Knowledge entropy decay during language model pretraining hinders new knowledge acquisition](https://arxiv.org/abs/2410.01380)**

*Jiyeon Kim, Hyunji Lee, Hyowon Cho, Joel Jang, Hyeonbin Hwang, Seungpil Won, Youbin Ahn, Dohaeng Lee, Minjoon Seo.* Preprint'24

9. **[When Context Leads but Parametric Memory Follows in Large Language Models](https://arxiv.org/abs/2409.08435)**[![github repo stars](https://img.shields.io/github/stars/PortNLP/WikiAtomic)](https://github.com/PortNLP/WikiAtomic)

*Yufei Tao, Adam Hiatt, Erik Haake, Antonie J. Jetter, Ameeta Agrawal.* EMNLP'24

10. **[Neuron-level knowledge attribution in large language models](https://arxiv.org/abs/2312.12141)**[![github repo stars](https://img.shields.io/github/stars/zepingyu0512/neuron-attribution)](https://github.com/zepingyu0512/neuron-attribution)

*Zeping Yu, Sophia Ananiadou.* EMNLP'24

11. **[Dissecting recall of factual associations in auto-regressive language models](http://arxiv.org/abs/2304.14767)**[[code](https://github.com/google-research/google-research/tree/master/dissecting_factual_predictions)]

*Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson.* EMNLP'23

### 2021
1. **[Transformer Feed-Forward Layers Are Key-Value Memories](https://arxiv.org/abs/2012.14913)**

*Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy.* EMNLP'21
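
The key-value memory view above can be poked at directly: under that view, in GPT-2 the columns of `mlp.c_fc.weight` act as keys matched against the residual stream and the rows of `mlp.c_proj.weight` act as values written back, so projecting a value vector through the unembedding shows which tokens a neuron promotes. A small sketch (illustrative only; the layer and neuron indices are arbitrary):

```python
# Toy inspection of the FFN "key-value memory" view: read one neuron's key and
# value vectors from GPT-2 and project the value through the unembedding.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

layer, neuron = 10, 42                      # arbitrary choices for illustration
mlp = model.transformer.h[layer].mlp
key = mlp.c_fc.weight[:, neuron]            # (d_model,) pattern this neuron matches
value = mlp.c_proj.weight[neuron]           # (d_model,) update written when it fires

with torch.no_grad():
    vocab_logits = model.lm_head(value)     # project the value vector to the vocabulary
print("tokens promoted by this value vector:",
      tok.convert_ids_to_tokens(vocab_logits.topk(10).indices.tolist()))
```
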
## Knowledge in Transformer-based Models: Causal Tracing🦾
### 2024
1. **[Does knowledge localization hold true? Surprising differences between entity and relation perspectives in language models](https://arxiv.org/pdf/2409.00617)**

*Yifan Wei, Xiaoyan Yu, Yixuan Weng, Huanhuan Ma, Yuanzhe Zhang, Jun Zhao, Kang Liu.* CIKM'24

### 2022
1. **[Locating and Editing Factual Associations in GPT](https://arxiv.org/abs/2202.05262)**

*Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov.* NIPS'22
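
A heavily simplified sketch of the causal-tracing recipe from ROME (illustrative only: the paper corrupts subject embeddings with tuned noise and sweeps every layer and position, while this restores a single, hand-picked layer/position pair; the prompt, token positions, and noise scale are assumptions):

```python
# Simplified causal-tracing sketch: (1) clean run caches hidden states,
# (2) corrupted run adds noise to the subject token embeddings, (3) patched run
# restores one clean hidden state and measures how much of the answer returns.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "The Eiffel Tower is located in the city of"
subject_positions = [1, 2, 3]        # assumed positions of the subject tokens
answer_id = tok(" Paris")["input_ids"][0]
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    clean = model(**inputs, output_hidden_states=True)
clean_hidden = clean.hidden_states   # tuple: embeddings + one entry per block

def corrupt_embeddings(module, args, output):
    output = output.clone()
    output[:, subject_positions] += 0.5 * torch.randn_like(output[:, subject_positions])
    return output

def make_patch(layer, position):
    def patch(module, args, output):
        hidden = output[0].clone()
        hidden[:, position] = clean_hidden[layer + 1][:, position]  # restore clean state
        return (hidden,) + output[1:]
    return patch

layer, position = 8, subject_positions[-1]   # restore one (layer, position) pair
h1 = model.transformer.wte.register_forward_hook(corrupt_embeddings)
h2 = model.transformer.h[layer].register_forward_hook(make_patch(layer, position))
with torch.no_grad():
    patched = model(**inputs)
h1.remove(); h2.remove()

p = torch.softmax(patched.logits[0, -1], dim=-1)[answer_id].item()
print(f"P(' Paris') with layer {layer} restored at the subject token: {p:.3f}")
```
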
## Knowledge in Transformer-based Models: Gradient Attribution👀
### 2024

1. **[Identifying query-relevant neurons in large language models for long-form texts](https://arxiv.org/abs/2406.10868)**

*Lihu Chen, Adam Dejl, Francesca Toni.* Preprint'24

2. **[Revealing the parametric knowledge of language models: A unified framework for attribution methods](https://arxiv.org/abs/2404.18655)**

*Haeun Yu, Pepa Atanasova, Isabelle Augenstein.* ACL'24

3. **[Does Large Language Model Contain Task-Specific Neurons?](https://aclanthology.org/2024.emnlp-main.403/)**

*Ran Song, Shizhu He, Shuting Jiang, Yantuan Xian, Shengxiang Gao, Kang Liu, and Zhengtao Yu.* EMNLP'24

4. **[Journey to the center of the knowledge neurons: Discoveries of language-independent knowledge neurons and degenerate knowledge neurons](http://arxiv.org/abs/2308.13198)**[![github repo stars](https://img.shields.io/github/stars/heng840/AMIG)](https://github.com/heng840/AMIG)

*Yuheng Chen, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao.* AAAI'24
### 2022
1. **[Knowledge Neurons in Pretrained Transformers](https://arxiv.org/abs/2104.08696)**[![github repo stars](https://img.shields.io/github/stars/Hunter-DDM/knowledge-neurons)](https://github.com/Hunter-DDM/knowledge-neurons)

*Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, Furu Wei.* ACL'22
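
Dai et al. attribute facts to FFN neurons with integrated gradients; the sketch below uses the cheaper gradient x activation proxy on the first FFN projection purely to illustrate the idea (checkpoint, prompt, and layer are arbitrary choices, and this is not the paper's exact method):

```python
# Simplified knowledge-attribution sketch: score each FFN neuron at one layer
# by (activation * gradient of the answer logit w.r.t. that activation).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "The capital of France is"
answer_id = tok(" Paris")["input_ids"][0]
inputs = tok(prompt, return_tensors="pt")

layer = 9
acts = {}

def save_activation(module, args, output):
    output.retain_grad()      # keep the gradient of the FFN inner pre-activation
    acts["ffn"] = output

hook = model.transformer.h[layer].mlp.c_fc.register_forward_hook(save_activation)
logits = model(**inputs).logits
hook.remove()

logits[0, -1, answer_id].backward()       # gradient of the correct-answer logit
a = acts["ffn"][0, -1]                    # inner activations at the last position
attribution = (a * acts["ffn"].grad[0, -1]).detach()
print("top candidate knowledge neurons:", attribution.topk(5).indices.tolist())
```
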

## Knowledge in Transformer-based Models: Activation🫀
### 2024
1. **[Separating tongue from thought: Activation patching reveals language-agnostic concept representations in transformers](https://arxiv.org/abs/2411.08745)** [![github repo stars](https://img.shields.io/github/stars/Butanium/llm-lang-agnostic)](https://github.com/Butanium/llm-lang-agnostic)

*Clément Dumas, Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West.* ICLR'24 Spotlight

2. **[From yes-men to truth-tellers: Addressing sycophancy in large language models with pinpoint tuning](https://arxiv.org/pdf/2409.01658v2)**

*Wei Chen, Zhen Huang, Liang Xie, Binbin Lin, Houqiang Li, Le Lu, Xinmei Tian, Deng Cai, Yonggang Zhang, Wenxiao Wang, Xu Shen, Jieping Ye.* ICML'24

3. **[Language-specific neurons: The key to multilingual capabilities in large language models.](https://arxiv.org/abs/2402.16438)**

*Tianyi Tang, Wenyang Luo, Haoyang Huang, Dongdong Zhang, Xiaolei Wang, Xin Zhao, Furu Wei, Ji-Rong Wen.* ACL'24

4. **[Multi-property Steering of Large Language Models with Dynamic Activation Composition](https://arxiv.org/abs/2406.17563)** [![github repo stars](https://img.shields.io/github/stars/DanielSc4/Dynamic-Activation-Composition)](https://github.com/DanielSc4/Dynamic-Activation-Composition)

*Daniel Scalena, Gabriele Sarti, Malvina Nissim.* ACL'24 BlackboxNLP Workshop

5. **[Exploring the benefit of activation sparsity in pre-training](http://arxiv.org/abs/2410.03440)**[![github repo stars](https://img.shields.io/github/stars/thunlp/moefication)](https://github.com/thunlp/moefication)

[MoE, Activation Sparsity, Activation Pattern, Inference Speedup]
*Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou.* ICML'24

### 2023
1. **[Activation Addition: Steering Language Models Without Optimization](https://arxiv.org/abs/2308.10248v4)**

*Alexander Matt Turner, Lisa Thiergart, Gavin Leech, David Udell, Juan J. Vazquez, Ulisse Mini, Monte MacDiarmid.* Preprint'23

2. **[Deja vu: Contextual sparsity for efficient LLMs at inference time](http://arxiv.org/abs/2310.17157)**

[Sparsity, Inference Speedup]
*Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, Beidi Chen.* ICML'23

# Parametric Knowledge Application
## Knowledge Editing 🧑‍⚕️
### 2024
1. **[A Comprehensive Study of Knowledge Editing for Large Language Models](https://arxiv.org/abs/2401.01286)**

*Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni, Siyuan Cheng, Ziwen Xu, Xin Xu, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, Huajun Chen.* Preprint'24

2. **[FAME: Towards Factual Multi-Task Model Editing](https://arxiv.org/abs/2410.10859)**[![GitHub Repo stars](https://img.shields.io/github/stars/BITHLP/FAME)](https://github.com/BITHLP/FAME)
*Li Zeng, Yingyu Shan, Zeming Liu, Jiashu Yao, Yuhang Guo.* EMNLP'24

3. **[To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models](https://arxiv.org/abs/2407.01920)**[![github repo stars](https://img.shields.io/github/stars/zjunlp/KnowUnDo)](https://github.com/zjunlp/KnowUnDo)

*Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang.* EMNLP'24 findings

4. **[Understanding the Collapse of LLMs in Model Editing](https://arxiv.org/abs/2406.11263)**

*Wanli Yang, Fei Sun, Jiajun Tan, Xinyu Ma, Du Su, Dawei Yin, Huawei Shen.* EMNLP'24 findings

5. **[Is it possible to edit large language models robustly?](https://arxiv.org/pdf/2402.05827)**[![github repo stars](https://img.shields.io/github/stars/xbmxb/edit_analysis)](https://github.com/xbmxb/edit_analysis)

*Xinbei Ma, Tianjie Ju, Jiyang Qiu, Zhuosheng Zhang, Hai Zhao, Lifeng Liu, Yulong Wang.* Preprint'24

6. **[Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering](https://arxiv.org/pdf/2403.19631)**[![github repo stars](https://img.shields.io/github/stars/sycny/RAE)](https://github.com/sycny/RAE)

*Yucheng Shi, Qiaoyu Tan, Xuansheng Wu, Shaochen Zhong, Kaixiong Zhou, Ninghao Liu.* CIKM'24

7. **[Latent paraphrasing: Perturbation on layers improves knowledge injection in language models](https://arxiv.org/abs/2411.00686)**

*Minki Kang, Sung Ju Hwang, Gibbeum Lee, Jaewoong Cho.* NIPS'24

8. **[Learning to edit: Aligning LLMs with knowledge editing](https://arxiv.org/abs/2402.11905)**[![github repo stars](https://img.shields.io/github/stars/YJiangcm/LTE)](https://github.com/YJiangcm/LTE)

*Yuxin Jiang, Yufei Wang, Chuhan Wu, Wanjun Zhong, Xingshan Zeng, Jiahui Gao, Liangyou Li, Xin Jiang, Lifeng Shang, Ruiming Tang, Qun Liu, Wei Wang.* ACL'24

9. **[Inspecting and Editing Knowledge Representations in Language Models](https://arxiv.org/abs/2304.00740)**[![github repo stars](https://img.shields.io/github/stars/evandez/REMEDI)](https://github.com/evandez/REMEDI)

*Evan Hernandez, Belinda Z. Li, Jacob Andreas.* COLM'24

10. **[Forgetting before learning: Utilizing parametric arithmetic for knowledge updating in large language models](https://arxiv.org/abs/2311.08011)**

*Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang.* ACL'24

11. **[Ethos: Rectifying language models in orthogonal parameter space](http://arxiv.org/abs/2403.08994)**

[Toxic/Bias Unlearning, SVD, Analysis of Parametric Knowledge, Task Vector]

NAACL'24 findings

### 2023
1. **[Editing Large Language Models: Problems, Methods, and Opportunities](https://arxiv.org/abs/2305.13172)**

*Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, Ningyu Zhang.* EMNLP'23
### 2022
1. **[Locating and Editing Factual Associations in GPT](https://arxiv.org/abs/2202.05262)**

*Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov.* NIPS'22
2. **[Memory-Based Model Editing at Scale](https://arxiv.org/abs/2206.06520)**

*Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, Chelsea Finn.* ICLR'22
### 2021
1. **[Editing Factual Knowledge in Language Models](https://arxiv.org/abs/2104.08164)**

*Nicola De Cao, Wilker Aziz, Ivan Titov.* EMNLP'21
### 2020
1. **[Editable neural networks.](https://arxiv.org/abs/2004.00345)**

*Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitriy Pyrkin, Sergei Popov, Artem Babenko.* ICLR'20
## Knowledge Transfer🧚‍♀️
### 2024
1. **[Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective](https://arxiv.org/abs/2310.11451)**

*Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He.* ICLR'24

2. **[Initializing models with larger ones](https://arxiv.org/abs/2311.18823)**[![github repo stars](https://img.shields.io/github/stars/OscarXZQ/weight-selection)](https://github.com/OscarXZQ/weight-selection)

*Zhiqiu Xu, Yanjie Chen, Kirill Vishniakov, Yida Yin, Zhiqiang Shen, Trevor Darrell, Lingjie Liu, Zhuang Liu.* ICLR'24 **Spotlight**

3. **[Cross-model Control: Improving Multiple Large Language Models in One-time Training](https://www.arxiv.org/abs/2410.17599)**[![github repo stars](https://img.shields.io/github/stars/wujwyi/CMC)](https://github.com/wujwyi/CMC)

*Jiayi Wu, Hao Sun, Hengyi Cai, Lixin Su, Shuaiqiang Wang, Dawei Yin, Xiang Li, Ming Gao.* NIPS'24

4. **[Knowledge fusion of large language models](https://arxiv.org/abs/2401.10491)**[![github repo stars](https://img.shields.io/github/stars/fanqiwan/FuseLLM)](https://github.com/fanqiwan/FuseLLM)

*Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi.* ICLR'24

5. **[Tuning language models by proxy](https://arxiv.org/abs/2401.08565)**[![github repo stars](https://img.shields.io/github/stars/alisawuffles/proxy-tuning)](https://github.com/alisawuffles/proxy-tuning)

*Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith.* COLM'24

6. **[Chat vector: A simple approach to equip LLMs with instruction following and model alignment in new languages](http://arxiv.org/abs/2310.04799)**[![github repo stars](https://img.shields.io/github/stars/aqweteddy/ChatVector)](https://github.com/aqweteddy/ChatVector)

[Task Vector, Parametric Knowledge, Knowledge Transfer]

ACL'24

7. **[FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models](https://arxiv.org/abs/2406.02224)**

[Federated Learning, Knowledge Transfer, Heterogeneous Token Alignment]

Coling'25

8. **[Function vectors in large language models](http://arxiv.org/abs/2310.15213)**

[Function Vector, Causal Mediation, Mechanism Interpretation]

ICLR'24

9. **[Refine large language model fine-tuning via instruction vector](http://arxiv.org/abs/2406.12227)**

[Catastrophic Forgetting, Function Vector, Causal Mediation]

Preprint'24

10. **[KlF: Knowledge localization and fusion for language model continual learning](http://arxiv.org/abs/2408.05200)**

[Catastrophic Forgetting, Continual Learning, Sensitivity-based Localization]

ACL'24

11. **[Language models are super mario: Absorbing abilities from homologous models as a free lunch](http://arxiv.org/abs/2311.03099)**

[Knowledge Transfer, Model Merging, Efficient Skill] ICML'24

12. **[Beyond task vectors: Selective task arithmetic based on importance metrics](http://arxiv.org/abs/2411.16139)**

[Task Vector, Sensitivity-based Importance Score, Model Merging] Preprint'24

13. **[Determine-then-ensemble: Necessity of top-k union for large language model ensembling](http://arxiv.org/abs/2410.03777)**

[Model Ensemble, Probability-Level, Analysis] ICLR'25 Spotlight
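
Several of the entries above (e.g. proxy tuning and probability-level ensembling) transfer knowledge purely at the logit level. A minimal sketch of the proxy-tuning arithmetic (illustrative; the checkpoints are placeholders that share one vocabulary, and here the "expert" is untuned, so the offset is zero and the example merely stays runnable):

```python
# Sketch of logit-level knowledge transfer in the spirit of proxy tuning:
# steer a large base model with the logit offset between a small tuned
# "expert" and its untuned counterpart.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # all models must share a vocabulary
base = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()
expert = AutoModelForCausalLM.from_pretrained("gpt2").eval()       # placeholder for a tuned small model
anti_expert = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # its untuned base

@torch.no_grad()
def proxy_tuned_next_token(prompt: str) -> str:
    ids = tok(prompt, return_tensors="pt")
    combined = (base(**ids).logits[0, -1]
                + expert(**ids).logits[0, -1]
                - anti_expert(**ids).logits[0, -1])   # base + (expert - anti-expert)
    return tok.decode(combined.argmax().item())

print(proxy_tuned_next_token("Q: What is the capital of France?\nA:"))
```
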

### 2023
1. **[Mutual enhancement of large and small language models with cross-silo knowledge transfer](https://arxiv.org/abs/2312.05842)**

*Yongheng Deng, Ziqing Qiao, Ju Ren, Yang Liu, Yaoxue Zhang.* Preprint'23

2. **[Learning to grow pretrained models for efficient transformer training](https://arxiv.org/abs/2303.00980)**[![github repo stars](https://img.shields.io/github/stars/VITA-Group/LiGO)](https://github.com/VITA-Group/LiGO)

*Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, Philip Greengard, Leonid Karlinsky, Rogerio Feris, David D. Cox, Zhangyang Wang, Yoon Kim.* ICLR'23

3. **[Retrieval-based knowledge transfer: An effective approach for extreme large language model compression](https://arxiv.org/abs/2310.15594)**

*Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Ran Lucien Wang, Rui Yan.* EMNLP'23 Findings

4. **[Editing models with task arithmetic](http://arxiv.org/abs/2212.04089)**[![github repo stars](https://img.shields.io/github/stars/mlfoundations/task_vectors)](https://github.com/mlfoundations/task_vectors)

[Task Vector, Parametric Knowledge, Knowledge Transfer, Multi-task Learning]

ICLR'23

5. **[Task-Specific Skill Localization in Fine-tuned Language Models](http://arxiv.org/abs/2302.06600)**

[Knowledge Transfer, Model Graft, Skill Parameter Localization]

ICML'23

6. **[Composing parameter-efficient modules with arithmetic operations](http://arxiv.org/abs/2306.14870)**

[PEFT, Task Vector, Model Merge]

NIPS'23

7. **[Dataless knowledge fusion by merging weights of language models](http://arxiv.org/abs/2212.09849)**

[Model Merge]

ICLR'23
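
Many of the entries above build on task-vector arithmetic: a task vector is the difference between fine-tuned and pretrained weights, and scaled sums of task vectors can be added back onto the base model (dataless merging). A toy sketch (illustrative; the "fine-tuned" checkpoints are placeholders for real fine-tuned models and the scaling coefficient is arbitrary):

```python
# Sketch of task-vector arithmetic / dataless weight merging:
# tau = theta_finetuned - theta_pretrained, merged = theta_pretrained + lambda * sum(tau).
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
# Placeholders: in practice these are different fine-tuned checkpoints of `base`.
finetuned_a = AutoModelForCausalLM.from_pretrained("gpt2")
finetuned_b = AutoModelForCausalLM.from_pretrained("gpt2")

def task_vector(finetuned, pretrained):
    """Difference of weights, restricted to floating-point tensors."""
    sd_ft, sd_pre = finetuned.state_dict(), pretrained.state_dict()
    return {k: sd_ft[k] - v for k, v in sd_pre.items() if v.is_floating_point()}

tau_a = task_vector(finetuned_a, base)
tau_b = task_vector(finetuned_b, base)

lam = 0.5                                  # scaling coefficient, tuned on held-out data
merged = dict(base.state_dict())
for k in tau_a:
    merged[k] = merged[k] + lam * (tau_a[k] + tau_b[k])
base.load_state_dict(merged)               # base now carries both "skills"
```
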
### 2021
1. **[Weight distillation: Transferring the knowledge in neural network parameters](https://arxiv.org/abs/2009.09152)**[![github repo stars](https://img.shields.io/github/stars/Lollipop321/weight-distillation)](https://github.com/Lollipop321/weight-distillation)

*Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu.* ACL'21

## Activation Steering
### 2024
1. **[Multi-property Steering of Large Language Models with Dynamic Activation Composition](https://arxiv.org/abs/2406.17563)** [![github repo stars](https://img.shields.io/github/stars/DanielSc4/Dynamic-Activation-Composition)](https://github.com/DanielSc4/Dynamic-Activation-Composition)

*Daniel Scalena, Gabriele Sarti, Malvina Nissim.* ACL'24 BlackboxNLP Workshop

2. **[Word embeddings are steers for language models](http://arxiv.org/abs/2305.12798)**

[Word Embedding Steering, Generation Control] ACL'24

### 2023
1. **[Activation Addition: Steering Language Models Without Optimization](https://arxiv.org/abs/2308.10248v4)**

*Alexander Matt Turner, Lisa Thiergart, Gavin Leech, David Udell, Juan J. Vazquez, Ulisse Mini, Monte MacDiarmid.* Preprint'23
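
A rough sketch of the activation-addition idea (illustrative, not the paper's exact procedure: ActAdd injects the contrast-pair difference at specific prompt positions, whereas this adds a single last-token difference at every position; the layer, contrast prompts, and scale are arbitrary):

```python
# Activation steering sketch: build a steering vector from a contrast pair of
# prompts and add it to the residual stream at one layer during generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
layer, scale = 6, 4.0                      # arbitrary choices for the sketch

def layer_output(prompt):
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids, output_hidden_states=True).hidden_states
    return hs[layer + 1][0, -1]            # last-token residual after block `layer`

steer = scale * (layer_output("Love") - layer_output("Hate"))

def add_steering(module, args, output):
    hidden = output[0] + steer             # broadcast over batch and positions
    return (hidden,) + output[1:]

hook = model.transformer.h[layer].register_forward_hook(add_steering)
ids = tok("I think that you are", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=20, do_sample=False,
                     pad_token_id=tok.eos_token_id)
hook.remove()
print(tok.decode(out[0]))
```
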
## Knowledge Distillation
### 2024
1. **[PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning](https://arxiv.org/abs/2402.12842)**[![github repo stars](https://img.shields.io/github/stars/gmkim-ai/PromptKD)](https://github.com/gmkim-ai/PromptKD)(Note: not parametric)

*Gyeongman Kim, Doohyuk Jang, Eunho Yang.* EMNLP'24 findings

2. **[From Instance Training to Instruction Learning: Task Adapters Generation from Instructions](https://arxiv.org/abs/2406.12382)**[![github repo stars](https://img.shields.io/github/stars/Xnhyacinth/TAGI)](https://github.com/Xnhyacinth/TAGI/)

*Huanxuan Liao, Yao Xu, Shizhu He, Yuanzhe Zhang, Yanchao Hao, Shengping Liu, Kang Liu, Jun Zhao.* NIPS'24

3. **[When babies teach babies: Can student knowledge sharing outperform teacher-guided distillation on small datasets?](https://arxiv.org/abs/2411.16487v1)**

*Srikrishna Iyer.* EMNLP'24 CoNLL Workshop
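
For reference, the classic white-box distillation objective behind many of these works is a temperature-softened KL divergence between teacher and student next-token distributions; a minimal sketch (not tied to any specific paper above):

```python
# Minimal knowledge-distillation loss: soften both distributions with a
# temperature, minimize KL(student || teacher targets), rescale by T^2.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """student_logits, teacher_logits: (batch, seq_len, vocab_size)."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# toy usage with random logits standing in for real model outputs
student = torch.randn(2, 8, 50257, requires_grad=True)
teacher = torch.randn(2, 8, 50257)
loss = distillation_loss(student, teacher)
loss.backward()
```
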

## Parametric Quantization

### 2024

1. **[OneBit: Towards extremely low-bit large language models](https://arxiv.org/abs/2402.11295)** [![github repo stars](https://img.shields.io/github/stars/xuyuzhuang11/OneBit)]( https://github.com/xuyuzhuang11/OneBit)

*Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che.* NIPS'24

### 2023

1. **[The cost of compression: Investigating the impact of compression on parametric knowledge in language models](https://arxiv.org/abs/2312.00960)** [![github repo stars](https://img.shields.io/github/stars/NamburiSrinath/LLMCompression)](https://github.com/NamburiSrinath/LLMCompression)

*Satya Sai Srinath Namburi, Makesh Sreedhar, Srinath Srinivasan, Frederic Sala.* EMNLP'23 findings
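
As a baseline for the compression results above, here is a toy round-to-nearest symmetric int8 weight quantizer (illustrative only; the papers in this section study far more aggressive settings, e.g. 1-bit weights):

```python
# Toy per-tensor symmetric int8 weight quantization and its reconstruction error.
import torch

def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0                       # per-tensor symmetric scale
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)                             # stand-in for a weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("mean abs quantization error:", (w - w_hat).abs().mean().item())
```
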

## Knowledge Injection
### 2024
1. **[Awakening augmented generation: Learning to awaken internal knowledge of large language models for question answering](http://arxiv.org/abs/2403.15268)**[![github repo stars](https://img.shields.io/github/stars/Xnhyacinth/IAG)](https://github.com/Xnhyacinth/IAG)

[HyperNet, RAG, Context Compression]

*Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Kang Liu, Shengping Liu, Jun Zhao.* AAAI'25

2. **[Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass](https://arxiv.org/abs/2411.05877)**

[Hypernetwork, Temporal Knowledge, Context Compression] ICLR'25
### 2023
1. **[Memory injections: Correcting multi-hop reasoning failures during inference in transformer-based language models](http://arxiv.org/abs/2309.05605)**[![github repo stars](https://img.shields.io/github/stars/msakarvadia/memory_injections)](https://github.com/msakarvadia/memory_injections)

*Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster.* Oral Presentation at BlackboxNLP Workshop at EMNLP'23

2. **[Decouple knowledge from parameters for plug-and-play language modeling](http://arxiv.org/abs/2305.11564)**[![github repo stars](https://img.shields.io/github/stars/Hannibal046/PlugLM)](https://github.com/Hannibal046/PlugLM)

*Xin Cheng, Yankai Lin, Xiuying Chen, Dongyan Zhao, Rui Yan.* ACL'23 findings

3. **[In-Parameter Knowledge Injection: Integrating Temporary Contextual Information into Model Parameters](https://openreview.net/forum?id=sl4hOq9wm9)**

submitted to ICLR'25
### 2022
1. **[Kformer: Knowledge injection in transformer feed-forward layers](http://arxiv.org/abs/2201.05742)**[![github repo stars](https://img.shields.io/github/stars/zjunlp/Kformer)](https://github.com/zjunlp/Kformer)

*Yunzhi Yao, Shaohan Huang, Li Dong, Furu Wei, Huajun Chen, Ningyu Zhang.* NLPCC'22
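
A toy sketch of FFN-level knowledge injection in the spirit of Kformer (illustrative; real implementations encode retrieved knowledge text with the model itself, whereas the knowledge keys/values here are random placeholders): extra key/value vectors are appended to the feed-forward memory, so the layer output becomes the usual FFN term plus a knowledge term.

```python
# Toy FFN with injected knowledge slots: act(x K_ffn^T) V_ffn + act(x K_know^T) V_know.
import torch
import torch.nn as nn

class KnowledgeFFN(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_knowledge: int):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)
        # injected knowledge: one key and one value vector per knowledge entry
        self.k_know = nn.Parameter(torch.randn(n_knowledge, d_model) * 0.02)
        self.v_know = nn.Parameter(torch.randn(n_knowledge, d_model) * 0.02)
        self.act = nn.GELU()

    def forward(self, x):                      # x: (batch, seq, d_model)
        ffn = self.w_out(self.act(self.w_in(x)))
        gate = self.act(x @ self.k_know.t())   # (batch, seq, n_knowledge)
        return ffn + gate @ self.v_know        # knowledge values weighted back in

ffn = KnowledgeFFN(d_model=768, d_ff=3072, n_knowledge=16)
print(ffn(torch.randn(2, 5, 768)).shape)       # torch.Size([2, 5, 768])
```
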

## Parameter-Efficient Fine-tuning (PEFT)
### 2024
1. **[KaSA: Knowledge-aware singular-value adaptation of large language models](http://arxiv.org/abs/2412.06071)**[![github repo stars](https://img.shields.io/github/stars/juyongjiang/KaSA)](https://github.com/juyongjiang/KaSA)

[Knowledge-aware LoRA, SVD]

*Fan Wang, Juyong Jiang, Chansung Park, Sunghun Kim, Jing Tang.* Preprint'24

2. **[CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning](https://arxiv.org/abs/2406.05223)**[![github repo stars](https://img.shields.io/github/stars/iboing/CorDA)](https://github.com/iboing/CorDA)

[Knowledge-aware LoRA, SVD]

*Yibo Yang, Xiaojie Li, Zhongzhu Zhou, Shuaiwen Leon Song, Jianlong Wu, Liqiang Nie, Bernard Ghanem.* NIPS'24

3. **[DoRA: Weight-Decomposed Low-Rank Adaptation](https://arxiv.org/abs/2402.09353)**[![github repo stars](https://img.shields.io/github/stars/NVlabs/DoRA)](https://github.com/NVlabs/DoRA)

[Weight-Decomposed LoRA, SVD, Analysis of FT and LoRA]
*Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen.* ICML'24 Oral

4. **[Low-rank adaptation with task-relevant feature enhancement for fine-tuning language models](http://arxiv.org/abs/2412.09827)**

[Task-aware LoRA, Hidden Representation Enhancement] AAAI'25 CoLoRAI Workshop

5. **[Train small, infer large: Memory-efficient LoRA training for large language models](http://arxiv.org/abs/2502.13533)**

[Memory-efficient LoRA Training, Pruning Methods, High Memory Efficiency]
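
Most entries in this section are LoRA variants; for context, a minimal LoRA linear layer (illustrative; rank, scaling, and initialization follow common defaults rather than any one paper above):

```python
# Minimal LoRA layer: freeze the base weight W and learn a low-rank update
# scaled by alpha / r, so the forward pass is W x + (alpha/r) * B A x.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)           # frozen pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # starts as no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_a.t() @ self.lora_b.t())

layer = LoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(2, 5, 768)).shape)                # torch.Size([2, 5, 768])
```
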
## Continual Learning
### 2024
1. **[Learn more, but bother less: Parameter efficient continual learning](https://neurips.cc/virtual/2024/poster/94599)**

[Continual Learning, Parameter Efficient, Knowledge Transfer] NIPS'24

2. **[What will my model forget? Forecasting forgotten examples in language model refinement](http://arxiv.org/abs/2402.01865)**

[Catastrophic Forgetting, Forecasting Forgetting, Analysis] ICML'24 Spotlight
## RAG
### 2024
1. **[xRAG: Extreme context compression for retrieval-augmented generation with one token](http://arxiv.org/abs/2405.13792)**

[Context Compression, RAG, Multimodal Fusion] NIPS'24

2. **[Parametric retrieval augmented generation](http://arxiv.org/abs/2501.15915)**

[Parametric RAG, Document Parameterization, Offline Method]

3. **[RAGTruth: A hallucination corpus for developing trustworthy retrieval-augmented language models](http://arxiv.org/abs/2401.00396)**

[RAG, Hallucination, Benchmark] ACL'24

## Long Context Extension
### 2024

1. **[LongEmbed: Extending embedding models for long context retrieval](http://arxiv.org/abs/2404.12096)**

[Long Context, Embedding Model, Benchmark] EMNLP'24

2. **[LLM maybe LongLM: Self-extend LLM context window without tuning](http://arxiv.org/abs/2401.01325)**

[Long Context Extension, Plug-and-Play Method] ICML'24 Spotlight

3. **[Two stones hit one bird: Bilevel positional encoding for better length extrapolation](http://arxiv.org/abs/2401.16421)**

[Long Context Extension, Absolute PE + Relative PE, Plug-and-Play but Training-based Method] ICML'24
### 2023
1. **[YaRN: Efficient context window extension of large language models](http://arxiv.org/abs/2309.00071)**

[Long Context Extension, Variation of RoPE] ICLR'24
### 2022
1. **[Train short, test long: Attention with linear biases enables input length extrapolation](http://arxiv.org/abs/2108.12409)**

[Alibi, Long Context Extrapolate, Training-based Method] ICLR'22

### 2021
1. **[RoFormer: Enhanced Transformer with Rotary Position Embedding.](https://arxiv.org/abs/2104.09864)**

[Rotary Position Embedding, Classic]
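
A minimal sketch of the rotary position embedding itself (illustrative; uses the interleaved-pair formulation and an arbitrary base): each even/odd pair of query/key dimensions is rotated by a position-dependent angle, so attention scores end up depending only on relative positions.

```python
# Minimal RoPE: rotate each pair of dimensions by a frequency that decays
# across the head dimension, as a function of the token position.
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """x: (seq_len, n_heads, head_dim) with even head_dim."""
    seq_len, _, head_dim = x.shape
    inv_freq = 1.0 / base ** (torch.arange(0, head_dim, 2).float() / head_dim)
    angles = torch.arange(seq_len).float()[:, None] * inv_freq[None, :]   # (seq, dim/2)
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]         # broadcast over heads
    x1, x2 = x[..., 0::2], x[..., 1::2]            # pair up even/odd dimensions
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(16, 12, 64)        # (seq_len, n_heads, head_dim)
print(apply_rope(q).shape)         # torch.Size([16, 12, 64])
```
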

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=Trae1ounG/Awesome-parametric-Knowledge-in-LLMs&type=Date)](https://star-history.com/#Trae1ounG/Awesome-parametric-Knowledge-in-LLMs&Date)