https://github.com/tensorzero/tensorzero
TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.
- Host: GitHub
- URL: https://github.com/tensorzero/tensorzero
- Owner: tensorzero
- License: apache-2.0
- Created: 2024-07-16T21:00:53.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-04-11T22:31:45.000Z (3 days ago)
- Last Synced: 2025-04-11T23:32:15.891Z (3 days ago)
- Topics: ai, ai-engineering, anthropic, artificial-intelligence, deep-learning, genai, generative-ai, gpt, large-language-models, llama, llm, llmops, llms, machine-learning, ml, ml-engineering, mlops, openai, python, rust
- Language: Rust
- Homepage: https://tensorzero.com
- Size: 36.8 MB
- Stars: 3,489
- Watchers: 36
- Forks: 224
- Open Issues: 154
- Metadata Files:
  - Readme: README.md
  - Contributing: CONTRIBUTING.md
  - License: LICENSE
  - Citation: CITATION.cff
Awesome Lists containing this project
- fucking-awesome-for-beginners - TensorZero - TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models. (Rust)
- trackawesomelist - TensorZero (⭐793) - TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models. (Recently Updated / Dec 04, 2024)
- awesome-for-beginners - TensorZero - TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models. (Rust)
- awesome-rust - TensorZero - data & learning flywheel for LLMs that unifies inference, observability, optimization, and experimentation (Applications / MLOps)
- awesome-LLM-resourses - TensorZero
- fucking-awesome-rust - TensorZero - data & learning flywheel for LLMs that unifies inference, observability, optimization, and experimentation (Applications / MLOps)
- awesome-ChatGPT-repositories - tensorzero - TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models. (Langchain)
- StarryDivineSky - tensorzero/tensorzero
README
# TensorZero
**TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.**
1. Integrate our model gateway
2. Send metrics or feedback
3. Optimize prompts, models, and inference strategies
4. Watch your LLMs improve over time

It provides a **data & learning flywheel for LLMs** by unifying:
- [x] **Inference:** one API for all LLMs, with <1ms P99 overhead
- [x] **Observability:** inference & feedback → your database
- [x] **Optimization:** from prompts to fine-tuning and RL
- [x] **Evaluations:** compare prompts, models, inference strategies
- [x] **Experimentation:** built-in A/B testing, routing, fallbacks

---
Website · Docs · Slack · Discord

Quick Start (5min) · Comprehensive Tutorial · Deployment Guide · API Reference · Configuration Reference

---
**What is TensorZero?**
TensorZero is an open-source framework for building production-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluations, and experimentation.
**How is TensorZero different from other LLM frameworks?**
1. TensorZero enables you to optimize complex LLM applications based on production metrics and human feedback.
2. TensorZero supports the needs of industrial-scale LLM applications: low latency, high throughput, type safety, self-hosted, GitOps, customizability, etc.
3. TensorZero unifies the entire LLMOps stack, creating compounding benefits. For example, LLM evaluations can be used for fine-tuning models alongside AI judges.
**Can I use TensorZero with ___?**
Yes. Every major programming language is supported. You can use TensorZero with our Python client, any OpenAI SDK, or our HTTP API.
**Is TensorZero production-ready?**
Yes. Here's a case study: Automating Code Changelogs at a Large Bank with LLMs
**How much does TensorZero cost?**
Nothing. TensorZero is 100% self-hosted and open-source. There are no paid features.
**Who is building TensorZero?**
Our technical team includes a former Rust compiler maintainer, machine learning researchers (Stanford, CMU, Oxford, Columbia) with thousands of citations, and the chief product officer of a decacorn startup. We're backed by the same investors as leading open-source projects (e.g. ClickHouse, CockroachDB) and AI labs (e.g. OpenAI, Anthropic).
**How do I get started?**
You can adopt TensorZero incrementally. Our Quick Start goes from a vanilla OpenAI wrapper to a production-ready LLM application with observability and fine-tuning in just 5 minutes.
---
## Features
### 🌐 LLM Gateway
> **Integrate with TensorZero once and access every major LLM provider.**
The TensorZero Gateway natively supports:
- Anthropic
- AWS Bedrock
- Azure OpenAI Service
- DeepSeek
- Fireworks
- GCP Vertex AI Anthropic
- GCP Vertex AI Gemini
- Google AI Studio (Gemini API)
- Hyperbolic
- Mistral
- OpenAI
- Together
- vLLM
- xAI
**Need something else?**
Your provider is most likely supported because TensorZero integrates with any OpenAI-compatible API (e.g. Ollama).
The TensorZero Gateway supports advanced features like:
- Retries & Fallbacks
- Inference-Time Optimizations
- Prompt Templates & Schemas
- Experimentation (A/B Testing)
- Configuration-as-Code (GitOps)
- Batch Inference
- Multimodal Inference (VLMs)
- Inference Caching
- Metrics & Feedback
- Multi-Step LLM Workflows (Episodes), sketched below
- & a lot more...
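
For instance, multi-step workflows are linked through an `episode_id` that the gateway returns on the first inference and accepts on subsequent ones. Here is a minimal sketch using the Python client; the prompts and workflow are hypothetical, so treat this as an illustration rather than a canonical example:

```python
from tensorzero import TensorZeroGateway

# Minimal sketch of a multi-step workflow ("episode"), assuming a gateway
# running at localhost:3000. The prompts are illustrative.
with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    # The first inference starts a new episode.
    first = client.inference(
        model_name="openai::gpt-4o-mini",
        input={"messages": [{"role": "user", "content": "Draft a short intro email."}]},
    )

    # Pass the returned episode_id to link follow-up steps to the same episode,
    # so downstream feedback can target the workflow as a whole.
    followup = client.inference(
        model_name="openai::gpt-4o-mini",
        episode_id=first.episode_id,
        input={"messages": [{"role": "user", "content": "Now make it more formal."}]},
    )
```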
The TensorZero Gateway is written in Rust 🦀 with performance in mind (<1ms p99 latency overhead @ 10k QPS).
See Benchmarks.
You can run inference using the TensorZero client (recommended), the OpenAI client, or the HTTP API.
**Usage: Python — TensorZero Client (Recommended)**
You can access any provider using the TensorZero Python client.
1. `pip install tensorzero`
2. Optional: Set up the TensorZero configuration.
3. Run inference:
```python
from tensorzero import TensorZeroGateway  # or AsyncTensorZeroGateway

with TensorZeroGateway.build_embedded(clickhouse_url="...", config_file="...") as client:
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        # Try other providers easily: "anthropic::claude-3-7-sonnet-20250219"
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "Write a haiku about artificial intelligence.",
                }
            ]
        },
    )
```
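
The returned response includes the generated content plus identifiers (such as `inference_id`) that you can reference later, e.g. when sending feedback. A quick illustrative sketch (verify the exact response shape against the client docs):

```python
# The chat response carries content blocks plus identifiers you can reference
# later (e.g. to attach feedback to this specific inference).
print(response.content)       # generated content blocks
print(response.inference_id)  # ID of this inference
```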
See **[Quick Start](https://www.tensorzero.com/docs/quickstart)** for more information.
**Usage: Python — OpenAI Client**
You can access any provider using the OpenAI Python client with TensorZero.
1. `pip install tensorzero`
2. Optional: Set up the TensorZero configuration.
3. Run inference:
```python
from openai import OpenAI
from tensorzero import patch_openai_client

client = OpenAI()

patch_openai_client(
    client,
    clickhouse_url="http://chuser:chpassword@localhost:8123/tensorzero",
    config_file="config/tensorzero.toml",
    async_setup=False,
)

response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-4o-mini",
    # Try other providers easily: "tensorzero::model_name::anthropic::claude-3-7-sonnet-20250219"
    messages=[
        {
            "role": "user",
            "content": "Write a haiku about artificial intelligence.",
        }
    ],
)
```
See **[Quick Start](https://www.tensorzero.com/docs/quickstart)** for more information.
**Usage: JavaScript / TypeScript (Node) — OpenAI Client**
You can access any provider using the OpenAI Node client with TensorZero.
1. Deploy `tensorzero/gateway` using Docker.
**[Detailed instructions →](https://www.tensorzero.com/docs/gateway/deployment)**
2. Set up the TensorZero configuration.
3. Run inference:
```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3000/openai/v1",
});

const response = await client.chat.completions.create({
  model: "tensorzero::model_name::openai::gpt-4o-mini",
  // Try other providers easily: "tensorzero::model_name::anthropic::claude-3-7-sonnet-20250219"
  messages: [
    {
      role: "user",
      content: "Write a haiku about artificial intelligence.",
    },
  ],
});
```
See **[Quick Start](https://www.tensorzero.com/docs/quickstart)** for more information.
**Usage: Other Languages & Platforms — HTTP API**
TensorZero supports virtually any programming language or platform via its HTTP API.
1. Deploy `tensorzero/gateway` using Docker.
**[Detailed instructions →](https://www.tensorzero.com/docs/gateway/deployment)**
2. Optional: Set up the TensorZero configuration.
3. Run inference:
```bash
curl -X POST "http://localhost:3000/inference" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "openai::gpt-4o-mini",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "Write a haiku about artificial intelligence."
        }
      ]
    }
  }'
```
See **[Quick Start](https://www.tensorzero.com/docs/quickstart)** for more information.
### 📈 LLM Optimization
> **Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies — using the UI or programmatically.**
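
The core mechanism is the feedback API: you attach a metric value to a specific inference (or episode) by ID, and TensorZero stores it alongside the inference for later optimization. A minimal sketch using the Python client; the metric `thumbs_up` is hypothetical and would need to be declared in your TensorZero configuration:

```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        input={
            "messages": [
                {"role": "user", "content": "Write a haiku about artificial intelligence."}
            ]
        },
    )

    # Attach feedback to this specific inference. `thumbs_up` is a hypothetical
    # boolean metric that would be declared in your TensorZero configuration.
    client.feedback(
        metric_name="thumbs_up",
        inference_id=response.inference_id,
        value=True,
    )
```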
#### Model Optimization
Optimize closed-source and open-source models using supervised fine-tuning (SFT) and preference fine-tuning (DPO).
- Supervised Fine-tuning — UI
- Preference Fine-tuning (DPO) — Jupyter Notebook
#### Inference-Time Optimization
Boost performance by dynamically updating your prompts with relevant examples, combining responses from multiple inferences, and more (see the sketch after the list below).
- Best-of-N Sampling
- Mixture-of-N Sampling
- Dynamic In-Context Learning (DICL)

_More coming soon..._
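
Inference-time strategies like these are configured as variants of a TensorZero function, and you can pin a specific variant at request time to try one out. A minimal sketch, assuming a hypothetical function `extract_data` with a hypothetical variant `best_of_n` defined in your configuration:

```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    # Pin a specific variant to exercise an inference-time strategy explicitly.
    # `extract_data` and `best_of_n` are hypothetical names from your config;
    # omit variant_name to let the gateway sample a variant for you.
    response = client.inference(
        function_name="extract_data",
        variant_name="best_of_n",
        input={"messages": [{"role": "user", "content": "..."}]},
    )
```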
#### Prompt Optimization
Optimize your prompts programmatically using research-driven optimization techniques.
- MIPROv2
- DSPy Integration
TensorZero comes with several optimization recipes, but you can also easily create your own.
This example shows how to optimize a TensorZero function using an arbitrary tool — here, DSPy, a popular library for automated prompt engineering.
_More coming soon..._
### 🔍 LLM Observability
> **Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time — all using the open-source TensorZero UI.**
- Observability » Inference
- Observability » Function
### 📊 LLM Evaluations
> **Compare prompts, models, and inference strategies using TensorZero Evaluations — with support for heuristics and LLM judges.**
**Evaluation » UI**

**Evaluation » CLI**
```bash
docker compose run --rm evaluations \
  --evaluation-name extract_data \
  --dataset-name hard_test_cases \
  --variant-name gpt_4o \
  --concurrency 5
```

```
Run ID: 01961de9-c8a4-7c60-ab8d-15491a9708e4
Number of datapoints: 100
██████████████████████████████████████ 100/100
exact_match: 0.83 ± 0.03
semantic_match: 0.98 ± 0.01
item_count: 7.15 ± 0.39
```
## Demo
> **Watch LLMs get better at data extraction in real-time with TensorZero!**
>
> **[Dynamic in-context learning (DICL)](https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations#dynamic-in-context-learning-dicl)** is a powerful inference-time optimization available out of the box with TensorZero.
> It enhances LLM performance by automatically incorporating relevant historical examples into the prompt, without the need for model fine-tuning.
https://github.com/user-attachments/assets/4df1022e-886e-48c2-8f79-6af3cdad79cb
## LLM Engineering with TensorZero
1. The **[TensorZero Gateway](https://www.tensorzero.com/docs/gateway/)** is a high-performance model gateway written in Rust 🦀 that provides a unified API interface for all major LLM providers, allowing for seamless cross-platform integration and fallbacks.
2. It handles structured schema-based inference with <1ms P99 latency overhead (see **[Benchmarks](https://www.tensorzero.com/docs/gateway/benchmarks)**) and built-in observability, experimentation, and **[inference-time optimizations](https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations)**.
3. It also collects downstream metrics and feedback associated with these inferences, with first-class support for multi-step LLM systems.
4. Everything is stored in a ClickHouse data warehouse that you control for real-time, scalable, and developer-friendly analytics (see the query sketch after this list).
5. Over time, **[TensorZero Recipes](https://www.tensorzero.com/docs/recipes)** leverage this structured dataset to optimize your prompts and models: run pre-built recipes for common workflows like fine-tuning, or create your own with complete flexibility using any language and platform.
6. Finally, the gateway's experimentation features and GitOps orchestration enable you to iterate and deploy with confidence, be it a single LLM or thousands of LLMs.
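
Because the data lives in your own ClickHouse instance, you can analyze it directly with any ClickHouse client. A minimal sketch using the `clickhouse-connect` Python package; the table and column names (`ChatInference`, `BooleanMetricFeedback`, `target_id`) are assumptions based on TensorZero's data model, so verify them against your deployed schema:

```python
import clickhouse_connect  # pip install clickhouse-connect

client = clickhouse_connect.get_client(
    host="localhost",
    port=8123,
    username="chuser",
    password="chpassword",
    database="tensorzero",
)

# Estimate a per-variant success rate by joining inferences with boolean
# feedback. Table and column names are assumptions based on TensorZero's
# data model; verify them against your deployed schema.
rows = client.query(
    """
    SELECT i.variant_name, avg(f.value) AS success_rate, count() AS n
    FROM ChatInference AS i
    INNER JOIN BooleanMetricFeedback AS f ON f.target_id = i.id
    GROUP BY i.variant_name
    ORDER BY success_rate DESC
    """
).result_rows
print(rows)
```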
Our goal is to help engineers build, manage, and optimize the next generation of LLM applications: systems that learn from real-world experience.
Read more about our **[Vision & Roadmap](https://www.tensorzero.com/docs/vision-roadmap/)**.
## Get Started
**Start building today.**
The **[Quick Start](https://www.tensorzero.com/docs/quickstart)** shows how easy it is to set up an LLM application with TensorZero.
If you want to dive deeper, the **[Tutorial](https://www.tensorzero.com/docs/gateway/tutorial)** teaches how to build a simple chatbot, an email copilot, a weather RAG system, and a structured data extraction pipeline.
**Questions?**
Ask us on **[Slack](https://www.tensorzero.com/slack)** or **[Discord](https://www.tensorzero.com/discord)**.
**Using TensorZero at work?**
Email us at **[[email protected]](mailto:[email protected])** to set up a Slack or Teams channel with your team (free).
**Work with us.**
We're **[hiring in NYC](https://www.tensorzero.com/jobs)**.
We'd also welcome **[open-source contributions](https://github.com/tensorzero/tensorzero/blob/main/CONTRIBUTING.md)**!
## Examples
We are working on a series of **complete runnable examples** illustrating TensorZero's data & learning flywheel.
> **[Optimizing Data Extraction (NER) with TensorZero](https://github.com/tensorzero/tensorzero/tree/main/examples/data-extraction-ner)**
>
> This example shows how to use TensorZero to optimize a data extraction pipeline.
> We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL).
> In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task — at a fraction of the cost and latency — using a small amount of training data.
> **[Agentic RAG — Multi-Hop Question Answering with LLMs](https://github.com/tensorzero/tensorzero/tree/main/examples/rag-retrieval-augmented-generation/simple-agentic-rag/)**
>
> This example shows how to build a multi-hop retrieval agent using TensorZero.
> The agent iteratively searches Wikipedia to gather information, and decides when it has enough context to answer a complex question.
> **[Writing Haikus to Satisfy a Judge with Hidden Preferences](https://github.com/tensorzero/tensorzero/tree/main/examples/haiku-hidden-preferences)**
>
> This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste.
> You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants.
> You'll see progress by fine-tuning the LLM multiple times.
> **[Improving LLM Chess Ability with Best-of-N Sampling](https://github.com/tensorzero/tensorzero/tree/main/examples/chess-puzzles-best-of-n-sampling/)**
>
> This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.
> **[Improving Math Reasoning with a Custom Recipe for Automated Prompt Engineering (DSPy)](https://github.com/tensorzero/tensorzero/tree/main/examples/gsm8k-custom-recipe-dspy)**
>
> TensorZero provides a number of pre-built optimization recipes covering common LLM engineering workflows.
> But you can also easily create your own recipes and workflows!
> This example shows how to optimize a TensorZero function using an arbitrary tool — here, DSPy.
_& many more on the way!_