{"id":20435486,"url":"https://github.com/jacksonchen1998/llama-paper-list","last_synced_at":"2025-04-12T21:34:21.866Z","repository":{"id":177471265,"uuid":"660445007","full_name":"jacksonchen1998/LLaMA-Paper-List","owner":"jacksonchen1998","description":"Collection of papers using LLaMA as backbone model","archived":false,"fork":false,"pushed_at":"2025-04-06T06:13:07.000Z","size":60,"stargazers_count":38,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-08T08:46:18.928Z","etag":null,"topics":["deep-learning","language-model","llama","opensource","paper","transformer"],"latest_commit_sha":null,"homepage":"https://github.com/meta-llama/llama-models","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jacksonchen1998.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-30T03:03:28.000Z","updated_at":"2025-04-07T03:24:57.000Z","dependencies_parsed_at":null,"dependency_job_id":"a7988687-a46f-4e52-b1fb-c4bc3a67640b","html_url":"https://github.com/jacksonchen1998/LLaMA-Paper-List","commit_stats":{"total_commits":15,"total_committers":4,"mean_commits":3.75,"dds":0.5333333333333333,"last_synced_commit":"ac6057a5414946f7b5804d85db7a8f6e2208f6a0"},"previous_names":["jacksonchen1998/llama-paper-list"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonchen1998%2FLLaMA-Paper-List","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonchen1998%2FLLaMA-
Paper-List/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonchen1998%2FLLaMA-Paper-List/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonchen1998%2FLLaMA-Paper-List/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jacksonchen1998","download_url":"https://codeload.github.com/jacksonchen1998/LLaMA-Paper-List/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248637036,"owners_count":21137529,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","language-model","llama","opensource","paper","transformer"],"created_at":"2024-11-15T08:34:45.097Z","updated_at":"2025-04-12T21:34:21.859Z","avatar_url":"https://github.com/jacksonchen1998.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# LLaMA-Paper-List\n\nCollection of papers using LLaMA as backbone model.\n\n## Contributors\n\n\u003ca href=\"https://github.com/jacksonchen1998/LLaMA-Paper-List/graphs/contributors\"\u003e\n  \u003cimg src=\"http://contributors.nn.ci/api?repo=jacksonchen1998/LLaMA-Paper-List\" /\u003e\n\u003c/a\u003e\n\n## Table of Contents\n\n- [Official LLaMA blog post](#official-llama-blog-post)\n- [Original LLaMA paper](#original-llama-paper)\n- [Related theory with LLaMA](#related-theory-with-llama)\n- [LLaMA with parameter efficiency](#llama-with-parameter-efficiency)\n- [Fine-tune LLaMA on downstream tasks](#fine-tune-llama-on-downstream-tasks)\n- [LLaMA combined with 
multi-modal](#llama-combined-with-multi-modal)\n- [LLaMA with retrieval](#llama-with-retrieval)\n- [LLaMA using reinforcement learning](#llama-using-reinforcement-learning)\n- [Quantitative analysis of LLaMA](#quantitative-analysis-of-llama)\n- [Prompting LLaMA](#prompting-llama)\n\n## Blog posts\n\n### Official LLaMA blog post\n\n- [The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation](https://ai.meta.com/blog/llama-4-multimodal-intelligence/?utm_source=llama-home-behemoth\u0026utm_medium=llama-referral\u0026utm_campaign=llama-utm\u0026utm_offering=llama-behemoth-preview\u0026utm_product=llama)\n\n## Papers\n\n### Original LLaMA paper\n\n- **LLaMA: Open and Efficient Foundation Language Models.** arxiv 2023. [paper](https://arxiv.org/abs/2302.13971). [code](https://github.com/facebookresearch/llama/tree/llama_v1)\u003cbr /\u003e\n*Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample*\n- **Llama 2: Open Foundation and Fine-Tuned Chat Models.** Meta AI 2023. [paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/). [code](https://github.com/facebookresearch/llama/tree/main) \u003cbr /\u003e\n*Hugo Touvron, Louis Martin, Kevin Stone et al.*\n- **The Llama 3 Herd of Models.** arxiv 2024. [paper](https://arxiv.org/abs/2407.21783). [code](https://github.com/meta-llama/llama3) \u003cbr /\u003e\n*Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey et al.*\n\n### Related theory with LLaMA\n\n- **Large Language Models Are Zero-Shot Time Series Forecasters.** NeurIPS 2023. [paper](https://proceedings.neurips.cc/paper_files/paper/2023/hash/3eb7ca52e8207697361b2c0fb3926511-Abstract-Conference.html). 
[code](https://github.com/ngruver/llmtime) \u003cbr /\u003e\n*Nate Gruver, Marc Finzi, Shikai Qiu, Andrew Gordon Wilson*\n- **Training Compute-Optimal Large Language Models.** NeurIPS 2022. [paper](https://arxiv.org/abs/2203.15556).\u003cbr /\u003e\n*Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre*\n- **Root Mean Square Layer Normalization.** NeurIPS 2019. [paper](https://arxiv.org/abs/1910.07467). [code](https://github.com/bzhangGo/rmsnorm) \u003cbr /\u003e\n*Biao Zhang, Rico Sennrich*\n- **GLU Variants Improve Transformer.** arxiv 2020. [paper](https://arxiv.org/abs/2002.05202). [code](https://github.com/Rishit-dagli/GLU) \u003cbr /\u003e\n*Noam Shazeer*\n- **RoFormer: Enhanced Transformer with Rotary Position Embedding.** arxiv 2021. [paper](https://arxiv.org/abs/2104.09864). [code](https://github.com/ZhuiyiTechnology/roformer) \u003cbr /\u003e\n*Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu*\n- **Decoupled Weight Decay Regularization.** ICLR 2019. [paper](https://arxiv.org/abs/1711.05101). [code](https://github.com/loshchil/AdamW-and-SGDW) \u003cbr /\u003e\n*Ilya Loshchilov, Frank Hutter*\n- **Self-attention Does Not Need $O(n^2)$ Memory.** arxiv 2021. [paper](https://arxiv.org/abs/2112.05682). [code](https://github.com/lucidrains/memory-efficient-attention-pytorch) \u003cbr /\u003e\n*Markus N. Rabe and Charles Staats*\n- **FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness.** arxiv 2022. [paper](https://arxiv.org/abs/2205.14135). [code](https://github.com/HazyResearch/flash-attention) \u003cbr /\u003e\n*Tri Dao, Daniel Y. 
Fu, Stefano Ermon, Atri Rudra, Christopher Ré*\n- **Reducing Activation Recomputation in Large Transformer Models.** arxiv 2022. [paper](https://arxiv.org/abs/2205.05198). \u003cbr /\u003e\n*Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro*\n\n### LLaMA with parameter efficiency\n\n- **LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.** arxiv 2023. [paper](https://arxiv.org/abs/2303.16199). [code](https://github.com/ZrrSkywalker/LLaMA-Adapter)\u003cbr /\u003e\n*Zhang, Renrui and Han, Jiaming and Zhou, Aojun and Hu, Xiangfei and Yan, Shilin and Lu, Pan and Li, Hongsheng and Gao, Peng and Qiao, Yu*\n- **LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.** arxiv 2023. [paper](https://arxiv.org/abs/2304.15010). [code](https://github.com/ZrrSkywalker/LLaMA-Adapter)\u003cbr /\u003e\n*Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao*\n- **LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models.** arxiv 2023. [paper](https://arxiv.org/abs/2304.01933).\u003cbr /\u003e\n*Zhiqiang Hu, Yihuai Lan, Lei Wang, Wanyu Xu, Ee-Peng Lim, Roy Ka-Wei Lee, Lidong Bing, Xing Xu, Soujanya Poria*\n- **A Simple and Effective Pruning Approach for Large Language Models.** arxiv 2023. [paper](https://arxiv.org/abs/2306.11695v1). [code](https://github.com/locuslab/wanda) \u003cbr /\u003e\n*Mingjie Sun, Zhuang Liu, Anna Bair, J. Zico Kolter*\n- **LLM-Pruner: On the Structural Pruning of Large Language Models.** arxiv 2023. [paper](https://arxiv.org/abs/2305.11627v2). [code](https://github.com/horseee/llm-pruner) \u003cbr /\u003e\n*Xinyin Ma, Gongfan Fang, Xinchao Wang*\n\n### Fine-tune LLaMA on downstream tasks\n\n- **Graph of Thoughts: Solving Elaborate Problems with Large Language Models.** AAAI 2024. 
[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29720). [code](https://github.com/spcl/graph-of-thoughts). \u003cbr /\u003e\n*Maciej Besta, Nils Blach, Ales Kubicek et al.*\n- **How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources.** NeurIPS 2023. [paper](https://arxiv.org/abs/2306.04751). [code](https://github.com/allenai/open-instruct) \u003cbr /\u003e\n*Yizhong Wang, Hamish Ivison, Pradeep Dasigi et al.*\n- **Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision.** NeurIPS 2023. [paper](https://arxiv.org/abs/2305.03047). [code](https://github.com/IBM/Dromedary) \u003cbr /\u003e\n*Zhiqing Sun, Yikang Shen, Qinhong Zhou et al.*\n- **ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge.** arxiv 2023. [paper](https://arxiv.org/abs/2303.14070).\u003cbr /\u003e\n*Yunxiang Li, Zihan Li, Kai Zhang, Ruilong Dan, Steve Jiang, You Zhang*\n- **Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca.** arxiv 2023. [paper](https://arxiv.org/abs/2304.08177). [code](https://github.com/ymcui/Chinese-LLaMA-Alpaca)\u003cbr /\u003e\n*Yiming Cui, Ziqing Yang, Xin Yao*\n- **PMC-LLaMA: Further Finetuning LLaMA on Medical Papers.** arxiv 2023. [paper](https://arxiv.org/abs/2304.14454).\u003cbr /\u003e\n*Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie*\n- **Dr. LLaMA: Improving Small Language Models on PubMedQA via Generative Data Augmentation.** arxiv 2023. [paper](https://arxiv.org/abs/2305.07804).\u003cbr /\u003e\n*Zhen Guo, Peiqi Wang, Yanwei Wang, Shangdi Yu*\n- **Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks.** arxiv 2023. [paper](https://arxiv.org/abs/2305.14201).\u003cbr /\u003e\n*Tiedong Liu, Bryan Kian Hsiang Low*\n- **WizardLM: Empowering Large Language Models to Follow Complex Instructions.** arxiv 2023. [paper](https://arxiv.org/abs/2304.12244v2). 
[code](https://github.com/nlpxucan/wizardlm) \u003cbr /\u003e\n*Can Xu, Qingfeng Sun, Kai Zheng, Xiubo Geng, Pu Zhao, Jiazhan Feng, Chongyang Tao, Daxin Jiang*\n- **Enhancing Chat Language Models by Scaling High-quality Instructional Conversations.** arxiv 2023. [paper](https://arxiv.org/abs/2305.14233v1). [code](https://github.com/thunlp/ultrachat) \u003cbr /\u003e\n*Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Zhi Zheng, Shengding Hu, Zhiyuan Liu, Maosong Sun, BoWen Zhou*\n- **LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction.** arxiv 2023. [paper](https://arxiv.org/abs/2304.08460v1). [code](https://github.com/akoksal/longform) \u003cbr /\u003e\n*Abdullatif Köksal, Timo Schick, Anna Korhonen, Hinrich Schütze*\n- **In-Context Learning User Simulators for Task-Oriented Dialog Systems.** arxiv 2023. [paper](https://arxiv.org/abs/2306.00774v1). [code](https://github.com/telepathylabsai/prompt-based-user-simulator) \u003cbr /\u003e\n*Silvia Terragni, Modestas Filipavicius, Nghia Khau, Bruna Guedes, André Manso, Roland Mathis*\n- **NetGPT: A Native-AI Network Architecture Beyond Provisioning Personalized Generative Services.** arxiv 2023. [paper](https://arxiv.org/pdf/2307.06148.pdf). \u003cbr /\u003e\n*Yuxuan Chen, Rongpeng Li, Zhifeng Zhao, Chenghui Peng, Jianjun Wu, Ekram Hossain, Honggang Zhang*\n- **On decoder-only architecture for speech-to-text and large language model integration.** arxiv 2023. [paper](https://arxiv.org/pdf/2307.03917.pdf). \u003cbr /\u003e\n*Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu*\n\n### LLaMA combined with multi-modal\n\n- **MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI.** CVPR 2024. [paper](https://openaccess.thecvf.com/content/CVPR2024/html/Yue_MMMU_A_Massive_Multi-discipline_Multimodal_Understanding_and_Reasoning_Benchmark_for_CVPR_2024_paper.html). 
[code](https://github.com/MMMU-Benchmark/MMMU) \u003cbr /\u003e\n*Xiang Yue, Yuansheng Ni, Kai Zhang et al.*\n\n### LLaMA with retrieval\n\n- **Polyglot or Not? Measuring Multilingual Encyclopedic Knowledge Retrieval from Foundation Language Models.** arxiv 2023. [paper](https://arxiv.org/abs/2305.13675). [code](https://github.com/daniel-furman/polyglot-or-not) \u003cbr /\u003e\n*Tim Schott, Daniel Furman, Shreshta Bhat*\n- **ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models.** arxiv 2023. [paper](https://arxiv.org/abs/2305.18323v1). [code](https://github.com/billxbf/rewoo) \u003cbr /\u003e\n*Binfeng Xu, Zhiyuan Peng, Bowen Lei, Subhabrata Mukherjee, Yuchen Liu, Dongkuan Xu*\n- **Landmark Attention: Random-Access Infinite Context Length for Transformers.** arxiv 2023. [paper](https://arxiv.org/abs/2305.16300v1). [code](https://github.com/epfml/landmark-attention) \u003cbr /\u003e\n*Amirkeivan Mohtashami, Martin Jaggi*\n\n### LLaMA using reinforcement learning\n\n- **LIMA: Less Is More for Alignment.** arxiv 2023. [paper](https://arxiv.org/abs/2305.11206v1). [code](https://github.com/h2oai/h2o-llmstudio) \u003cbr /\u003e\n*Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy*\n- **RRHF: Rank Responses to Align Language Models with Human Feedback without tears.** arxiv 2023. [paper](https://arxiv.org/abs/2304.05302v2). [code](https://github.com/ganjinzero/rrhf) \u003cbr /\u003e\n*Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang, Fei Huang*\n\n### Quantitative analysis of LLaMA\n\n- **SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression.** arxiv 2023. [paper](https://arxiv.org/abs/2306.03078v1). 
[code](https://github.com/vahe1994/spqr) \u003cbr /\u003e\n*Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, Dan Alistarh*\n- **SqueezeLLM: Dense-and-Sparse Quantization.** arxiv 2023. [paper](https://arxiv.org/abs/2306.07629v1). [code](https://github.com/squeezeailab/squeezellm) \u003cbr /\u003e\n*Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer*\n\n### Prompting LLaMA\n\n- **Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition.** arxiv 2023. [paper](https://arxiv.org/abs/2306.16007).\u003cbr /\u003e\n*Yuang Li, Yu Wu, Jinyu Li, Shujie Liu*\n\n## How to contribute\n\nContributions are welcome! Please refer to [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacksonchen1998%2Fllama-paper-list","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjacksonchen1998%2Fllama-paper-list","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacksonchen1998%2Fllama-paper-list/lists"}