{"id":15017869,"url":"https://github.com/basetenlabs/truss","last_synced_at":"2026-05-08T23:28:19.986Z","repository":{"id":43017528,"uuid":"511003180","full_name":"basetenlabs/truss","owner":"basetenlabs","description":"The simplest way to serve AI/ML models in production","archived":false,"fork":false,"pushed_at":"2026-04-20T19:04:54.000Z","size":47181,"stargazers_count":1143,"open_issues_count":70,"forks_count":100,"subscribers_count":19,"default_branch":"main","last_synced_at":"2026-04-20T19:10:36.414Z","etag":null,"topics":["artificial-intelligence","easy-to-use","falcon","inference-api","inference-server","machine-learning","model-serving","open-source","packaging","stable-diffusion","whisper","wizardlm"],"latest_commit_sha":null,"homepage":"https://truss.baseten.co","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/basetenlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2022-07-06T05:39:37.000Z","updated_at":"2026-04-20T17:04:28.000Z","dependencies_parsed_at":"2025-12-24T08:09:55.676Z","dependency_job_id":null,"html_url":"https://github.com/basetenlabs/truss","commit_stats":{"total_commits":994,"total_committers":40,"mean_commits":24.85,"dds":0.8792756539235412,"last_synced_commit":"039c75076964e9c90ab7bf9125e509f4c227bc00"},"previous_names":[],"tags_count":722,"template":false,"template_full_name":null,"purl":"pkg:github/basetenlabs/truss","repository_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/basetenlabs%2Ftruss","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basetenlabs%2Ftruss/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basetenlabs%2Ftruss/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basetenlabs%2Ftruss/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/basetenlabs","download_url":"https://codeload.github.com/basetenlabs/truss/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basetenlabs%2Ftruss/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32362783,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-27T20:07:02.737Z","status":"online","status_checked_at":"2026-04-28T02:00:07.250Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","easy-to-use","falcon","inference-api","inference-server","machine-learning","model-serving","open-source","packaging","stable-diffusion","whisper","wizardlm"],"created_at":"2024-09-24T19:51:06.342Z","updated_at":"2026-04-28T02:03:42.822Z","avatar_url":"https://github.com/basetenlabs.png","language":"Python","funding_links":[],"categories":["*Ops for AI"],"sub_categories":["Model Serving \u0026 Inference"],"readme":"# Truss\n\n**The simplest way to serve AI/ML models in 
production**\n\n[![PyPI version](https://badge.fury.io/py/truss.svg)](https://badge.fury.io/py/truss)\n[![Python versions](https://img.shields.io/pypi/pyversions/truss.svg)](https://pypi.org/project/truss/)\n[![ci_status](https://github.com/basetenlabs/truss/actions/workflows/release.yml/badge.svg)](https://github.com/basetenlabs/truss/actions/workflows/release.yml)\n\nTruss is the CLI for deploying and serving ML models on Baseten. Package your model's serving logic in Python, launch training jobs, and deploy to production—Truss handles containerization, dependency management, and GPU configuration.\n\nTruss lets you serve models with the [Baseten Inference Stack](https://www.baseten.co/resources/guide/the-baseten-inference-stack/) as well as deploy models from any open-source framework: vLLM, SGLang, TensorRT-LLM, `transformers`, `diffusers`, PyTorch, TensorFlow, and more.\n\n**[Get started](https://docs.baseten.co/examples/deploy-your-first-model)** | [100+ examples](https://github.com/basetenlabs/truss-examples/) | [Documentation](https://docs.baseten.co)\n\n# Why Truss?\n\n* **Write once, run anywhere:** Package model code, weights, and dependencies with a model server that behaves the same in development and production.\n* **Fast developer loop:** Iterate with live reload, skip Docker and Kubernetes configuration, and use a batteries-included serving environment.\n* **Support for all Python frameworks:** From `transformers` and `diffusers` to PyTorch and TensorFlow to vLLM, SGLang, and TensorRT-LLM, Truss supports models created and served with any framework.\n* **Production-ready:** Built-in support for GPUs, secrets, caching, and autoscaling when deployed to [Baseten](https://baseten.co) or your own infrastructure.\n\n# Installation\n\nInstall Truss with:\n\n```\npip install --upgrade truss\n```\n\n# Quickstart\n\nDeploying a model to Baseten via Truss turns a Hugging Face model into a production-ready API endpoint. 
You write a `config.yaml` that specifies the model, the hardware, and the engine, then `uvx truss push` builds a TensorRT-optimized container and deploys it. No Python code, no Dockerfile, no container management.\n\nThis guide walks through deploying [Qwen 2.5 3B Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), a small but capable LLM, from a config file to a production API. You'll set up Truss, write a config, deploy to Baseten, call the model's OpenAI-compatible endpoint, and promote to production.\n\n## Set up your environment\n\nBefore you begin:\n\n- [Sign up](https://app.baseten.co/signup) or [sign in](https://app.baseten.co/login) to Baseten.\n- Install [uv](https://docs.astral.sh/uv/), a fast Python package manager. This guide uses `uvx` to run [Truss](https://pypi.org/project/truss/) commands without a separate install step.\n\n### Authenticate with Baseten\n\nGenerate an API key from [Settings \u003e API keys](https://app.baseten.co/settings/account/api_keys), then log in:\n\n```sh\nuvx truss login\n```\n\nPaste your API key when prompted:\n\n```output\n💻 Let's add a Baseten remote!\n🤫 Quietly paste your API_KEY:\n```\n\nYou can skip the interactive prompt by setting `BASETEN_API_KEY` as an environment variable:\n```bash\nexport BASETEN_API_KEY=\"paste-your-api-key-here\"\n```\n\n## Create a Truss project\n\nScaffold a new project:\n\n```sh\nuvx truss init qwen-2.5-3b \u0026\u0026 cd qwen-2.5-3b\n```\n\nWhen prompted, name the model `Qwen 2.5 3B`.\n\n```output\n? 📦 Name this model: Qwen 2.5 3B\nTruss Qwen 2.5 3B was created in ~/qwen-2.5-3b\n```\n\nThis creates a directory with a `config.yaml`, a `model/` directory, and supporting files. For engine-based deployments like this one, you only need `config.yaml`. 
The `model/` directory is for [custom Python code](/examples/customize-a-model) when you need custom preprocessing, postprocessing, or unsupported model architectures.\n\n## Write the config\n\nReplace the contents of `config.yaml` with:\n\n```yaml config.yaml\nmodel_name: Qwen-2.5-3B\nresources:\n  accelerator: L4\n  use_gpu: true\ntrt_llm:\n  build:\n    base_model: decoder\n    checkpoint_repository:\n      source: HF\n      repo: \"Qwen/Qwen2.5-3B-Instruct\"\n    max_seq_len: 8192\n    quantization_type: fp8\n    tensor_parallel_count: 1\n```\n\nThat's the entire deployment specification.\n\n- `model_name` identifies the model in your Baseten dashboard.\n- `resources` selects an L4 GPU (24 GB VRAM), which is plenty for a 3B parameter model.\n- `trt_llm` tells Baseten to use [Engine-Builder-LLM](/engines/engine-builder-llm/overview), which compiles the model with TensorRT-LLM for optimized inference.\n- `checkpoint_repository` points to the model weights on Hugging Face. Qwen 2.5 3B Instruct is ungated, so no access token is needed.\n- `quantization_type: fp8` compresses weights to 8-bit floating point, cutting memory usage roughly in half with negligible quality loss.\n- `max_seq_len: 8192` sets the maximum context length for requests.\n\n---\n\n## Deploy\n\nPush the model to Baseten:\n\nWe'll start by deploying in development mode so we can iterate quickly:\n\n```sh\nuvx truss push --watch\n```\n\nYou should see:\n\n```output\n✨ Model Qwen 2.5 3B was successfully pushed ✨\n\n🪵  View logs for your deployment at https://app.baseten.co/models/abc1d2ef/logs/xyz123\n👀 Watching for changes to truss...\n```\n\nThe logs URL contains your model ID, the string after `/models/` (e.g., `abc1d2ef`). You'll need this to call the model's API. You can also find it in your [Baseten dashboard](https://app.baseten.co/models/).\n\nBaseten now downloads the model weights from Hugging Face, compiles them with TensorRT-LLM, and deploys the resulting container to an L4 GPU. 
You can watch progress in the logs linked above.\n\n## Call the model\n\nEngine-based deployments serve an OpenAI-compatible API. Once the deployment shows \"Active\" in the dashboard, call it using the OpenAI SDK or cURL. Replace `{model_id}` with your model ID from the deployment output.\n\nInstall the OpenAI SDK if you don't have it:\n\n```sh\nuv pip install openai\n```\n\nCreate a chat completion:\n\n```python\nimport os\nfrom openai import OpenAI\n\nclient = OpenAI(\n    api_key=os.environ[\"BASETEN_API_KEY\"],\n    base_url=\"https://model-{model_id}.api.baseten.co/environments/development/sync/v1\",\n)\n\nresponse = client.chat.completions.create(\n    model=\"Qwen-2.5-3B\",\n    messages=[\n        {\"role\": \"user\", \"content\": \"What is machine learning?\"}\n    ],\n)\n\nprint(response.choices[0].message.content)\n```\n\nYou should see a response like:\n\n```output\nMachine learning is a branch of artificial intelligence where systems learn\npatterns from data to make predictions or decisions without being explicitly\nprogrammed for each task...\n```\n\nAny code that works with the OpenAI SDK works with your deployment. Just point the `base_url` at your model's endpoint.\n\n## Iterate with live reload\n\nWhen you change your `config.yaml` and want to test quickly, use live reload:\n\n```sh\nuvx truss watch\n```\n\nYou should see:\n\n```output\n🪵  View logs for your deployment at https://app.baseten.co/models/\u003cmodel_id\u003e/logs/\u003cdeployment_id\u003e\n🚰 Attempting to sync truss with remote\nNo changes observed, skipping patching.\n👀 Watching for changes to truss...\n```\n\nWhen you save changes, Truss automatically syncs them with the deployed model. This saves time by patching without a full rebuild.\n\nIf you stopped the watch session, you can re-attach with:\n\n```sh\nuvx truss watch\n```\n\n## Promote to production\n\nPromoting the development deployment creates a production deployment with its own endpoint. 
The API URL changes from `/environments/development/` to `/environments/production/`:\n\n```python\nclient = OpenAI(\n    api_key=os.environ[\"BASETEN_API_KEY\"],\n    base_url=\"https://model-{model_id}.api.baseten.co/environments/production/sync/v1\",\n)\n```\n\nYour model ID is the string after `/models/` in the logs URL from `uvx truss push`. You can also find it in your [Baseten dashboard](https://app.baseten.co/models/).\n\n# IDE support\n\nTruss ships a [JSON schema](truss/config.schema.json) for `config.yaml`. Projects created with `truss init` include a schema reference automatically, giving you autocompletion, hover docs, and validation in any editor that supports the [YAML language server](https://github.com/redhat-developer/yaml-language-server) (VS Code, JetBrains, Neovim, and others).\n\nTo add schema support to an existing `config.yaml`, add this comment as the first line:\n\n```yaml\n# yaml-language-server: $schema=https://raw.githubusercontent.com/basetenlabs/truss/main/truss/config.schema.json\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbasetenlabs%2Ftruss","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbasetenlabs%2Ftruss","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbasetenlabs%2Ftruss/lists"}