An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with deepspeed

A curated list of projects in awesome lists tagged with deepspeed.

https://github.com/internlm/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

codellama cuda-kernels deepspeed fastertransformer internlm llama llama2 llama3 llm llm-inference turbomind

Last synced: 04 Feb 2026

https://github.com/InternLM/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

codellama cuda-kernels deepspeed fastertransformer internlm llama llama2 llama3 llm llm-inference turbomind

Last synced: 20 Mar 2025

https://github.com/alibaba/Megatron-LLaMA

Best practice for training LLaMA models in Megatron-LM

deepspeed distributed-training llama llm megatron-lm pretraining pytorch

Last synced: 27 Mar 2025

https://github.com/antgroup/glake

GLake: optimizing GPU memory management and IO transmission.

deepspeed gpu llm memory onnx pytorch

Last synced: 08 Apr 2025

https://github.com/shm007g/LLaMA-Cult-and-More

Large Language Models for All, 🦙 Cult and More. Stay in touch!

alpaca chatgpt deepspeed ggml gpt gpt4 gptq llama llm loralib pytorch tensorflow transformers vicuna

Last synced: 17 Mar 2025

https://github.com/lambdalabsml/distributed-training-guide

Best practices & guides on how to write distributed PyTorch training code

cluster cuda deepspeed distributed-training fsdp gpu gpu-cluster kuberentes lambdalabs mpi nccl pytorch sharding slurm

Last synced: 16 May 2025

https://github.com/OpenMOSS/CoLLiE

Collaborative Training of Large Language Models in an Efficient Way

deep-learning deepspeed nlp pytorch

Last synced: 09 May 2025

https://github.com/openmoss/collie

Collaborative Training of Large Language Models in an Efficient Way

deep-learning deepspeed nlp pytorch

Last synced: 05 Apr 2025

https://github.com/LambdaLabsML/distributed-training-guide

Best practices & guides on how to write distributed PyTorch training code

cluster cuda deepspeed distributed-training fsdp gpu gpu-cluster kuberentes lambdalabs mpi nccl pytorch sharding slurm

Last synced: 08 Mar 2025

https://github.com/Coobiw/MPP-LLaVA

Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.

deepspeed fine-tuning mllm model-parallel multimodal-large-language-models pipeline-parallelism pretraining qwen video-language-model video-large-language-models

Last synced: 27 Feb 2025

https://github.com/sunzeyeah/RLHF

Implementation of Chinese ChatGPT

chatgpt deep-learning deepspeed glm nlp pangu pytorch

Last synced: 29 Mar 2025

https://github.com/deepschneider/gpt-neo-fine-tuning-example

Fine-Tune EleutherAI GPT-Neo And GPT-J-6B To Generate Netflix Movie Descriptions Using Hugging Face And DeepSpeed

deepspeed deepspeed-library fine-tuning gpt-3 gpt-j gpt-j-6b gpt-neo gpt-neo-fine-tuning gpt-neo-hugging-face gpt-neo-text-generation gpt-neo-xl gptj text-generation

Last synced: 07 Apr 2025

https://github.com/jackaduma/chatglm-lora-rlhf-pytorch

A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM.

chatglm chatglm-6b chatgpt deepspeed finetune gpt llama llm lora peft ppo pytorch reward-models rlhf

Last synced: 27 Apr 2025

https://github.com/coincheung/gdgpt

Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP.

baichuan2-7b bloom chatglm3-6b deepspeed flash-attention full-finetune llama2 llm mixtral-8x7b model-parallization nlp pipeline pytorch

Last synced: 07 May 2025

https://github.com/xyjigsaw/llm-pretrain-sft

Scripts for LLM pre-training and fine-tuning (with or without LoRA, DeepSpeed)

baichuan2 deepspeed large-language-models llama lora mistral

Last synced: 14 Apr 2025

https://github.com/jackaduma/alpaca-lora-rlhf-pytorch

A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca.

alpaca chatgpt deepspeed finetune gpt llama llm lora peft ppo pytorch reward-models rlhf

Last synced: 16 Jun 2025

https://github.com/pszemraj/ai-msgbot

Training & Implementation of chatbots leveraging GPT-like architecture with the aitextgen package to enable dynamic conversations.

ai aitextgen chat-application chatbot deep-learning deepspeed deployment gpt-2 gpt-j gpt-j-6b gradio huggingface huggingface-transformers natural-language-processing nlp nlp-parsing telegram telegram-bot text-generation transformers

Last synced: 14 Oct 2025

https://github.com/opencsgs/llm-inference

llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.

deepspeed llama-cpp llm-inference ray transformer vllm

Last synced: 12 Apr 2025

https://github.com/beomi/transformers-language-modeling

Train 🤗transformers with DeepSpeed: ZeRO-2, ZeRO-3

bert deepspeed language-model transformers

Last synced: 03 Aug 2025

https://github.com/wangclnlp/deepspeed-chat-extension

This repo contains some extensions of deepspeed-chat for fine-tuning LLMs (SFT+RLHF).

deepspeed llama llm rlhf sft

Last synced: 26 Apr 2025

https://github.com/ztjhz/minilm

Small Model Is All You Need - NTU SC4001 Neural Network & Deep Learning Project

bert deep-learning deepspeed gpt2 llama llm neural-network nlp ntu roberta sc4001 wandb

Last synced: 30 Jul 2025

https://github.com/dyedd/deepspeed-diffusers

🚀 Native Training of Diffusers with DeepSpeed

deepspeed diffusers diffusion model

Last synced: 07 Apr 2026

https://github.com/janelu9/easyllm

Run large language models easily.

deepseek deepspeed fine-tuning llama megatron-lm npu pretrain qwen qwen-vl

Last synced: 16 May 2026

https://github.com/janelu9/EasyLLM

Run large language models easily.

deepspeed fine-tuning llama3 llm megatron-lm pretrain qwen2-vl vlm

Last synced: 25 Aug 2025

https://github.com/yinpu/llm-trainhub

A hub of modular projects for flexible training of large language models, with multi-GPU support.

deepspeed huggingface llm-training qwen2 transformers

Last synced: 14 Feb 2026

https://github.com/abhilash1910/framework-optimization

Framework, Model & Kernel Optimizations for Distributed Deep Learning - Data Hack Summit

codegen ddp deepspeed fsdp inductor pipelineparallel pytorch tensorparallel triton

Last synced: 19 May 2026

https://github.com/followb1ind1y/medical-llm-fine-tuning

Fine-tunes LLaMA-3-8B on PubMedQA with QLoRA, optimized via DeepSpeed and vLLM for efficient, low-latency medical QA. Deployable via Docker for scalable clinical inference.

deepspeed fine-tuning llama3 llm qlora vllm

Last synced: 18 Apr 2026

https://github.com/bhimrazy/deepnvme-experiments

A collection of experiments demonstrating simple file reads and writes involving CPU/GPU tensors using DeepNVMe.

deepnvme deepspeed

Last synced: 11 Feb 2026

https://github.com/slinusc/deepspeed-mii-container

Launch your own high-performance DeepSpeed-MII server for seamless local LLM deployment. This repository provides a Dockerized solution to serve Hugging Face models (e.g., Mistral-7B) with an OpenAI-compatible API, enabling GPU-accelerated, low-latency inference out of the box.

container deepspeed docker engine inference llm mii

Last synced: 28 Apr 2026

https://github.com/timelesshc/codellama-nl2sql

Fine-tune codellama-7b-hf for natural-language-to-SQL translation

codellama deepspeed fine-tuning lora nl2sql python text2sql

Last synced: 30 May 2026

https://github.com/alinababer/implementation-of-the-longnet-scaling-transformers-to-1-000-000-000-tokens

Implementation of the paper "LongNet: Scaling Transformers to 1,000,000,000 Tokens" by Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, and Furu Wei.

ai deepspeed generative-ai llms longnet ml transformers

Last synced: 03 Apr 2025

https://github.com/hrolive/large-language-models-on-supercomputers

Comprehensive exploration of LLMs, including cutting-edge techniques and tools such as parameter-efficient fine-tuning (PEFT), quantization, the Zero Redundancy Optimizer (ZeRO), fully sharded data parallelism (FSDP), DeepSpeed, and Hugging Face Accelerate.

deepspeed evaluation-metrics fsdp high-performance-computing hpc huggingface huggingface-transformers jupyter llm llm-inference llm-training monitoring peft python quantization slurm tokenization transformer unsloth

Last synced: 18 May 2026