awesome-llama-resources
A curated collection of Llama 2 resources
https://github.com/MIBlue119/awesome-llama-resources
Models
Demo
Porting
- Karpathy's Llama2.c
- web-llm - Bringing large language models and chat to web browsers, accelerated with WebGPU
- pyllama
- Hugging Face released Swift Transformers, a [swift chat app](https://github.com/huggingface/swift-chat) and [exporters](https://github.com/huggingface/exporters) for exporting models to Core ML, to help run LLMs on Apple devices
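Ports like llama2.c implement the whole inference loop, including the sampling step. As an illustrative sketch (not code from any repository above; the deterministic `rand` argument stands in for a random draw), temperature scaling plus nucleus (top-p) sampling looks like:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p):
    """Keep the smallest prefix of tokens (by descending probability) whose
    cumulative mass reaches p, then renormalize. Returns (index, prob) pairs."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for idx, pr in ranked:
        kept.append((idx, pr))
        cum += pr
        if cum >= p:
            break
    total = sum(pr for _, pr in kept)
    return [(idx, pr / total) for idx, pr in kept]

def sample_next_token(logits, temperature=1.0, top_p=0.9, rand=0.5):
    """Temperature <= 0 means greedy decoding; otherwise `rand` in [0, 1)
    (normally drawn from an RNG) selects from the truncated distribution."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax([x / temperature for x in logits])
    kept = top_p_filter(probs, top_p)
    cum = 0.0
    for idx, pr in kept:
        cum += pr
        if rand < cum:
            return idx
    return kept[-1][0]
```

Lower temperatures sharpen the distribution toward the argmax; `top_p=1.0` disables truncation entirely.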
Tutorial
Models for specific usage / Fine-tuned models
- Chinese-Llama-2-7b
- Chinese-LLaMA-Alpaca
- ToolLLaMA
- Llama2-Code-Interpreter
- Llama2-Medical-Chatbot
- Finetune LLaMA 7B with Traditional Chinese instruction datasets
- Taiwan-LLaMa
- Finetuning LLaMA + Text-to-SQL - Fine-tune LLaMA 2 7B on a Text-to-SQL dataset
- Hugging Face trending models for Llama 2
- Finetuned on code with QLoRA
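Most of the fine-tunes above use LoRA/QLoRA adapters rather than full fine-tuning. A back-of-the-envelope sketch of why (function names are illustrative; 4096 is Llama 2 7B's hidden size):

```python
def lora_trainable_params(d_in, d_out, rank):
    """LoRA replaces a full d_out x d_in weight update with two low-rank
    factors B (d_out x r) and A (r x d_in): trainable = r * (d_in + d_out)."""
    return rank * (d_in + d_out)

def full_trainable_params(d_in, d_out):
    """Full fine-tuning updates every entry of the weight matrix."""
    return d_in * d_out

# One 4096x4096 attention projection in Llama 2 7B:
full = full_trainable_params(4096, 4096)      # 16,777,216 params
lora = lora_trainable_params(4096, 4096, 8)   # 65,536 params at rank 8
ratio = full / lora                           # 256x fewer trainable params
```

This is why rank-8 adapters (plus 4-bit base weights, for QLoRA) fit consumer GPUs.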
Multimodal LLM
- LLaSM: Large Language and Speech Model
- LLaVA - Large Language-and-Vision Assistant
- Chinese-LLaVA
Toolkits
- TogetherAI
- LLaMA2-Accessory - An open-source toolkit for LLM development
- LLaMA-Adapter - Fine-tuning LLaMA to follow instructions within 1 hour and 1.2M parameters
- text-generation-webui - A Gradio web UI for large language models like LLaMA, llama.cpp, GPT-J, OPT, and GALACTICA
- text-generation-inference
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving - An open-source compiler and distributed system for low-latency, high-performance LLM serving
Optimization (Latency/Size)
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
- AutoGPTQ - An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm
- Optimizing LLM latency
- Series Quantized LLama2 Model from The Bloke with GPTQ/GGML
- TheBloke/llama-2-7B-Guanaco-QLoRA-GPTQ
- OpenAssistant-Llama2-13B-Orca-8K-3319-GPTQ
- Together AI's Medusa to accelerate decoding
- NVIDIA TensorRT-LLM Supercharges Large Language Model Inference on NVIDIA H100 GPUs - TensorRT-LLM is an open-source library that accelerates and optimizes inference performance of the latest LLMs on NVIDIA Tensor Core GPUs
- 2023-11-30: The PyTorch team on accelerating LLM inference with native PyTorch tools
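For context on the quantization entries: GPTQ improves on plain round-to-nearest (RTN) quantization by compensating rounding error with second-order information. A minimal sketch of the RTN baseline it improves on (illustrative only, not the GPTQ algorithm itself):

```python
def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization: map floats to signed
    integers in [-(2**(bits-1)-1), 2**(bits-1)-1] with one scale per row."""
    qmax = 2 ** (bits - 1) - 1                  # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [v * scale for v in q]

w = [0.21, -0.07, 0.35, -0.28, 0.01]
q, scale = quantize_rtn(w, bits=4)
w_hat = dequantize(q, scale)
# Per-weight reconstruction error is bounded by scale / 2.
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

GPTQ (and the GGML/GPTQ checkpoints from TheBloke above) keeps roughly this storage format but chooses the integer codes column by column to minimize layer output error rather than per-weight error.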
Optimization(Reasoning)
Other Resources
- LLaMA-Efficient-Tuning - An easy-to-use fine-tuning framework using PEFT (PT+SFT+RLHF with QLoRA) (LLaMA-2, BLOOM, Falcon, Baichuan)
- awesome-llm-and-aigc
- LLaMA 2 - Every Resource you need
- Finetune Falcon-7B on Your GPU with TRL and QLoRA - Fine-tune Falcon-7B on your consumer GPU
- A Definitive Guide to QLoRA: Fine-tuning Falcon-7b with PEFT
- Amazon sagemaker generativeai: Fine-tune Falcon-40B with QLoRA
- Llama with FlashAttention2 - >40.3GiB
- Anti-hype LLM reading list
- Llama 2 の情報まとめ - A roundup of Llama 2 information (in Japanese)
Some theory
- LLMSurvey
- Stanford CS324 - Large Language Models
- Why we should train smaller LLMs on more tokens
- Open challenges in LLM research
- Challenges and Applications of Large Language Models
- Why You (Probably) Don't Need to Fine-tune an LLM - Use few-shot prompting / Retrieval-Augmented Generation (RAG) instead
- CS221: Artificial Intelligence: Principles and Techniques
Finetune Method/ Scripts
- Finetune with PEFT
- Finetune together.ai 32k context window model
- Llama-2-7B-32K-Instruct — and fine-tuning for Llama-2 models with Together API
- Finetune with QLoRA on a 13B model
- HuggingFace SFT training script
- PyTorch Lightning's script to fine-tune Llama 2 on a custom dataset
- Instruction-tune Llama 2
- Finetune LLaMA2 7-70B on Amazon SageMaker
- Finetune LLaMA 2 with QLoRA on Colab
- Fine-tune Llama 2 with DPO by huggingface
- Fine-tune Llama 2 for specific usage like SQL generation / functional representation ([Ray fine-tuning template with DeepSpeed](https://github.com/ray-project/ray/tree/master/doc/source/templates/04_finetuning_llms_with_deepspeed))
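The DPO recipe above minimizes a simple preference loss. A per-pair sketch of that objective (following the DPO paper's formulation; this is illustrative, not the Hugging Face TRL implementation):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - ref margin)).
    Each logp is the summed token log-likelihood of a full response under
    the policy model or the frozen reference model."""
    policy_margin = logp_chosen - logp_rejected
    ref_margin = ref_logp_chosen - ref_logp_rejected
    logits = beta * (policy_margin - ref_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))
```

The loss drops below log 2 exactly when the policy prefers the chosen response over the rejected one by more than the reference model does, so no separate reward model is needed.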
Prompt
Use
- Run Llama 2 on your own Mac using LLM and Homebrew
- Deploy Llama 2 7B/13B/70B models on AWS SageMaker with [text-generation-inference](https://github.com/huggingface/text-generation-inference), Hugging Face's Rust, Python and gRPC server for text generation, used in production at Hugging Face to power Hugging Chat, the Inference API and Inference Endpoints
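When picking an instance for 7B/13B/70B deployments, a first sanity check is the raw weight footprint. A rough sizing helper (a rule-of-thumb sketch; the names are mine, and it ignores KV cache and activation memory):

```python
def weight_memory_gib(n_params_billion, bits_per_param):
    """Approximate GPU memory for model weights alone (excludes KV cache,
    activations, and framework overhead): params * bits / 8 bytes, in GiB."""
    bytes_total = n_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / (1024 ** 3)

# Rough weight footprints for the Llama 2 family:
footprints = {s: (round(weight_memory_gib(s, 16), 1),  # fp16
                  round(weight_memory_gib(s, 4), 1))   # 4-bit quantized
              for s in (7, 13, 70)}
# e.g. 7B fp16 ~= 13.0 GiB; 70B fp16 ~= 130.4 GiB (multi-GPU territory)
```

This is why 70B serving typically means multiple GPUs or aggressive quantization, while 7B at 4 bits fits comfortably on a single consumer card.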
Move on to production
- Patterns for Building LLM-based Systems & Products
- Finetuning an LLM: RLHF and alternatives
- GitHub: A developer's guide to prompt engineering and LLMs
- The Rise and Potential of Large Language Model Based Agents: A Survey - LLM-Agent-Paper-List
Evaluation
Calculation
Some basics