An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with fastertransformer

A curated list of projects in awesome lists tagged with fastertransformer.

https://github.com/internlm/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

codellama cuda-kernels deepspeed fastertransformer internlm llama llama2 llama3 llm llm-inference turbomind

Last synced: 06 May 2025

https://github.com/InternLM/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

codellama cuda-kernels deepspeed fastertransformer internlm llama llama2 llama3 llm llm-inference turbomind

Last synced: 20 Mar 2025

https://github.com/curt-park/serving-codegen-gptj-triton

Serving Example of CodeGen-350M-Mono-GPTJ on Triton Inference Server with Docker and Kubernetes

codegen docker fastertransformer huggingface-transformers kubernetes pytorch triton-inference-server

Last synced: 13 Apr 2025

https://github.com/clam004/triton-ft-api

Tutorial on how to deploy a scalable autoregressive causal language model transformer using NVIDIA Triton server

fastapi fastertransformer gpt huggingface nvidia nvidia-docker nvidia-gpu

Last synced: 15 Jan 2025

https://github.com/rajeshthallam/fastertransformer-converter

This repository is a code sample for serving Large Language Models (LLMs) on a Google Kubernetes Engine (GKE) cluster with GPUs, running NVIDIA Triton Inference Server with the FasterTransformer backend.

fastertransformer gke googlecloudplatform inference large-scale-machine-learning llm triton-inference-server

Last synced: 23 Mar 2025