awesome-parametric-knowledge-in-llms
Must-read papers and blogs about parametric knowledge mechanisms in LLMs.
https://github.com/trae1oung/awesome-parametric-knowledge-in-llms
Knowledge in Transformer-based Model——Analysis🧠
2024
- Knowledge entropy decay during language model pretraining hinders new knowledge acquisition
- When Context Leads but Parametric Memory Follows in Large Language Models
- Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models
- Evaluating the External and Parametric Knowledge Fusion of Large Language Models
- Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts [code](https://github.com/OSU-NLP-Group/LLM-Knowledge-Conflict)
- What does the knowledge neuron thesis have to do with knowledge?
- Neuron-level knowledge attribution in large language models [code](https://github.com/zepingyu0512/neuron-attribution)
- Dissecting recall of factual associations in auto-regressive language models [code](https://github.com/google-research/google-research/tree/master/dissecting_factual_predictions)
- Knowledge Mechanisms in Large Language Models: A Survey and Perspective
- Disentangling Memory and Reasoning Ability in Large Language Models [code](https://github.com/MingyuJ666/Disentangling-Memory-and-Reasoning)
- Linguistic collapse: Neural collapse in (large) language models [code](https://github.com/rhubarbwu/linguistic-collapse)
Knowledge in Transformer-based Model——Gradient Attribution👀
2024
- Does Large Language Model Contain Task-Specific Neurons?
- Identifying query-relevant neurons in large language models for long-form texts
- Journey to the center of the knowledge neurons: Discoveries of language-independent knowledge neurons and degenerate knowledge neurons
- Revealing the parametric knowledge of language models: A unified framework for attribution methods
2022
- Knowledge Neurons in Pretrained Transformers [code](https://github.com/Hunter-DDM/knowledge-neurons)
Knowledge Editing 🧑‍⚕️
2024
- Forgetting before learning: Utilizing parametric arithmetic for knowledge updating in large language models
- A Comprehensive Study of Knowledge Editing for Large Language Models
- FAME: Towards Factual Multi-Task Model Editing
- To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
- Understanding the Collapse of LLMs in Model Editing
- Is it possible to edit large language models robustly?
- Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering
- Latent paraphrasing: Perturbation on layers improves knowledge injection in language models
- Learning to edit: Aligning LLMs with knowledge editing
- Inspecting and Editing Knowledge Representations in Language Models
Knowledge Transfer🧚‍♀️
2024
- Initializing models with larger ones [code](https://github.com/OscarXZQ/weight-selection)
- Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
- Cross-model Control: Improving Multiple Large Language Models in One-time Training
- Knowledge fusion of large language models
- Tuning language models by proxy [code](https://github.com/alisawuffles/proxy-tuning)
2023
- Learning to grow pretrained models for efficient transformer training [code](https://github.com/VITA-Group/LiGO)
- Mutual enhancement of large and small language models with cross-silo knowledge transfer
- Retrieval-based knowledge transfer: An effective approach for extreme large language model compression
2021
- Weight distillation: Transferring the knowledge in neural network parameters [code](https://github.com/Lollipop321/weight-distillation)
Knowledge Distillation
2024
- PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning [code](https://github.com/gmkim-ai/PromptKD) (Note: not parametric)
- From Instance Training to Instruction Learning: Task Adapters Generation from Instructions
- When babies teach babies: Can student knowledge sharing outperform teacher-guided distillation on small datasets?
Parametric Quantization
Knowledge in Transformer-based Model——Activation🫀
2024
- Language-specific neurons: The key to multilingual capabilities in large language models.
- Separating tongue from thought: Activation patching reveals language-agnostic concept representations in transformers [code](https://github.com/Butanium/llm-lang-agnostic)
- From yes-men to truth-tellers: Addressing sycophancy in large language models with pinpoint tuning
Knowledge in Transformer-based Model🧠
2022
- Knowledge Neurons in Pretrained Transformers [code](https://github.com/Hunter-DDM/knowledge-neurons)
Different Type Neurons in LLMs👀
Knowledge Injection
Star History