Projects in Awesome Lists tagged with fastertransformer
A curated list of projects in awesome lists tagged with fastertransformer.
https://github.com/internlm/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
codellama cuda-kernels deepspeed fastertransformer internlm llama llama2 llama3 llm llm-inference turbomind
Last synced: 06 May 2025
https://github.com/curt-park/serving-codegen-gptj-triton
Serving Example of CodeGen-350M-Mono-GPTJ on Triton Inference Server with Docker and Kubernetes
codegen docker fastertransformer huggingface-transformers kubernetes pytorch triton-inference-server
Last synced: 13 Apr 2025
https://github.com/clam004/triton-ft-api
Tutorial on how to deploy a scalable autoregressive causal language model transformer using NVIDIA Triton Inference Server
fastapi fastertransformer gpt huggingface nvidia nvidia-docker nvidia-gpu
Last synced: 15 Jan 2025
https://github.com/rajeshthallam/fastertransformer-converter
A code sample for serving large language models (LLMs) on a Google Kubernetes Engine (GKE) cluster with GPUs, running NVIDIA Triton Inference Server with the FasterTransformer backend.
fastertransformer gke googlecloudplatform inference large-scale-machine-learning llm triton-inference-server
Last synced: 23 Mar 2025