# TensorZero

**TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.**

1. Integrate our model gateway
2. Send metrics or feedback
3. Optimize prompts, models, and inference strategies
4. Watch your LLMs improve over time

It provides a **data & learning flywheel for LLMs** by unifying:

- [x] **Inference:** one API for all LLMs, with <1ms P99 overhead
- [x] **Observability:** inference & feedback → your database
- [x] **Optimization:** from prompts to fine-tuning and RL
- [x] **Experimentation:** built-in A/B testing, routing, fallbacks


Website · Docs · Twitter · Slack · Discord

Quick Start (5min) · Comprehensive Tutorial · Deployment Guide · API Reference · Configuration Reference

## Features

### 🌐 LLM Gateway

> **Integrate with TensorZero once and access every major LLM provider.**



The TensorZero Gateway natively supports every major model provider.

Need something else?
Your provider is most likely supported because TensorZero integrates with any OpenAI-compatible API (e.g. Ollama).

The TensorZero Gateway also supports advanced features like built-in A/B testing, routing, and fallbacks.

The TensorZero Gateway is written in Rust 🦀 with performance in mind (<1ms P99 latency overhead @ 10k QPS).
See **[Benchmarks](https://www.tensorzero.com/docs/gateway/benchmarks)**.



You can run inference using the TensorZero client (recommended), the OpenAI client, or the HTTP API.




#### Usage: TensorZero Python Client (Recommended)

You can access any provider using the TensorZero Python client.

1. Deploy `tensorzero/gateway` using Docker.
**[Detailed instructions →](https://www.tensorzero.com/docs/gateway/deployment)**
2. Optional: Set up the TensorZero configuration.
3. Run inference:

```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway(...) as client:
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "Write a haiku about artificial intelligence.",
                }
            ]
        },
    )
```

See **[Quick Start](https://www.tensorzero.com/docs/quickstart)** for more information.

#### Usage: OpenAI Python Client

You can access any provider using the OpenAI Python client with TensorZero.

1. Deploy `tensorzero/gateway` using Docker.
**[Detailed instructions →](https://www.tensorzero.com/docs/gateway/deployment)**
2. Set up the TensorZero configuration.
3. Run inference:

```python
from openai import OpenAI

with OpenAI(base_url="http://localhost:3000/openai/v1") as client:
    response = client.chat.completions.create(
        model="tensorzero::your_function_name",  # defined in configuration (step 2)
        messages=[
            {
                "role": "user",
                "content": "Write a haiku about artificial intelligence.",
            }
        ],
    )
```

See **[Quick Start](https://www.tensorzero.com/docs/quickstart)** for more information.

#### Usage: Other Languages & Platforms (HTTP)

TensorZero supports virtually any programming language or platform via its HTTP API.

1. Deploy `tensorzero/gateway` using Docker.
**[Detailed instructions →](https://www.tensorzero.com/docs/gateway/deployment)**
2. Optional: Set up the TensorZero configuration.
3. Run inference:

```bash
curl -X POST "http://localhost:3000/inference" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "openai::gpt-4o-mini",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "Write a haiku about artificial intelligence."
        }
      ]
    }
  }'
```
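
Because this is a plain HTTP endpoint, the same request works from any language. For instance, a minimal Python sketch using `requests` (assuming the gateway is listening on localhost:3000, as above):

```python
import requests

# Same request as the curl example above, sent from Python.
response = requests.post(
    "http://localhost:3000/inference",
    json={
        "model_name": "openai::gpt-4o-mini",
        "input": {
            "messages": [
                {
                    "role": "user",
                    "content": "Write a haiku about artificial intelligence.",
                }
            ]
        },
    },
)
response.raise_for_status()
print(response.json())
```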

See **[Quick Start](https://www.tensorzero.com/docs/quickstart)** for more information.


### 📈 LLM Optimization

> **Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies — using the UI or programmatically.**
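
For example, recording a boolean outcome against a specific inference with the Python client might look like this (a minimal sketch; `task_success` is a hypothetical metric you would define in your configuration):

```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway(...) as client:
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        input={
            "messages": [
                {"role": "user", "content": "Write a haiku about artificial intelligence."}
            ]
        },
    )

    # Later, report how this inference performed downstream.
    # `task_success` is a hypothetical boolean metric defined in your configuration.
    client.feedback(
        metric_name="task_success",
        inference_id=response.inference_id,
        value=True,
    )
```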

#### Model Optimization

Optimize closed-source and open-source models using supervised fine-tuning (SFT) and preference fine-tuning (DPO).



- Supervised Fine-tuning — UI
- Preference Fine-tuning (DPO) — Jupyter Notebook




#### Inference-Time Optimization

Boost performance by dynamically updating your prompts with relevant examples, combining responses from multiple inferences, and more.



- Best-of-N Sampling
- Mixture-of-N Sampling
- Dynamic In-Context Learning (DICL)

_More coming soon..._
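
Conceptually, best-of-N sampling generates several candidate responses and keeps the one a judge scores highest. Below is an illustrative client-side sketch, not TensorZero's built-in implementation (which runs in the gateway); `judge_score` is a hypothetical helper:

```python
from tensorzero import TensorZeroGateway

def judge_score(text: str) -> float:
    """Hypothetical judge: in practice, rate candidates with another LLM call."""
    return float(len(set(text.split())))  # placeholder: lexical diversity

with TensorZeroGateway(...) as client:
    prompt = {
        "messages": [
            {"role": "user", "content": "Write a haiku about artificial intelligence."}
        ]
    }

    # Generate N candidates, then keep the one the judge scores highest.
    candidates = [
        client.inference(model_name="openai::gpt-4o-mini", input=prompt)
        for _ in range(5)
    ]
    # Assumes a chat response whose first content block is text.
    best = max(candidates, key=lambda r: judge_score(r.content[0].text))
```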

#### Prompt Optimization

Optimize your prompts programmatically using research-driven optimization techniques.

Today we provide a sample **[integration with DSPy](https://github.com/tensorzero/tensorzero/tree/main/examples/gsm8k-custom-recipe-dspy)**.
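
At its core, a prompt-optimization recipe is a search loop over candidate prompts scored against an evaluation set. A schematic sketch (illustrative only; `run_eval` and the candidate templates are hypothetical):

```python
# Schematic only: prompt optimization as a search loop over candidates.
def run_eval(prompt_template: str) -> float:
    """Hypothetical scorer: in a real recipe, run the template over an eval set."""
    # Placeholder: prefer shorter templates, just so this sketch runs end to end.
    return -len(prompt_template)

# Candidate templates might come from DSPy, another optimizer, or manual drafts.
candidate_templates = [
    "Solve the math problem step by step: {question}",
    "You are a careful tutor. Reason through: {question}",
]
best_template = max(candidate_templates, key=run_eval)
```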

_More coming soon..._


### 🔍 LLM Observability

> **Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time — all using the open-source TensorZero UI.**
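
Because inferences and feedback land in a ClickHouse database you control, you can also query them directly. A minimal sketch using `clickhouse-connect` (the table and column names are assumptions based on the default schema; check your deployment):

```python
import clickhouse_connect

# Connect to the ClickHouse instance backing TensorZero (adjust host/credentials).
client = clickhouse_connect.get_client(host="localhost")

# Example: inference volume per function and variant.
# Table/column names are assumptions — check your schema.
result = client.query(
    """
    SELECT function_name, variant_name, count() AS inferences
    FROM ChatInference
    GROUP BY function_name, variant_name
    ORDER BY inferences DESC
    """
)
print(result.result_rows)
```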



- Observability » Inference
- Observability » Function




## Demo

> **Watch LLMs get better at data extraction in real-time with TensorZero!**
>
> **[Dynamic in-context learning (DICL)](https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations#dynamic-in-context-learning-dicl)** is a powerful inference-time optimization available out of the box with TensorZero.
> It enhances LLM performance by automatically incorporating relevant historical examples into the prompt, without the need for model fine-tuning.
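
Conceptually, DICL embeds the incoming query, retrieves the most similar historical examples, and prepends them to the prompt as demonstrations. An illustrative sketch, not TensorZero's internal implementation (`nearest_examples` is a hypothetical retrieval helper):

```python
def nearest_examples(query: str, k: int = 3) -> list[dict]:
    """Hypothetical retrieval: return the k most similar historical
    (input, output) pairs, e.g. by embedding `query` and searching a vector index."""
    return []  # placeholder

def dicl_messages(user_query: str) -> list[dict]:
    # Prepend relevant historical examples as few-shot demonstrations.
    demos = []
    for ex in nearest_examples(user_query):
        demos.append({"role": "user", "content": ex["input"]})
        demos.append({"role": "assistant", "content": ex["output"]})
    return demos + [{"role": "user", "content": user_query}]
```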

https://github.com/user-attachments/assets/4df1022e-886e-48c2-8f79-6af3cdad79cb

## LLM Engineering with TensorZero








_Figure: the TensorZero Flywheel_




1. The **[TensorZero Gateway](https://www.tensorzero.com/docs/gateway/)** is a high-performance model gateway written in Rust 🦀 that provides a unified API interface for all major LLM providers, allowing for seamless cross-platform integration and fallbacks.
2. It handles structured schema-based inference with <1ms P99 latency overhead (see **[Benchmarks](https://www.tensorzero.com/docs/gateway/benchmarks)**) and built-in observability, experimentation, and **[inference-time optimizations](https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations)**.
3. It also collects downstream metrics and feedback associated with these inferences, with first-class support for multi-step LLM systems.
4. Everything is stored in a ClickHouse data warehouse that you control for real-time, scalable, and developer-friendly analytics.
5. Over time, **[TensorZero Recipes](https://www.tensorzero.com/docs/recipes)** leverage this structured dataset to optimize your prompts and models: run pre-built recipes for common workflows like fine-tuning, or create your own with complete flexibility using any language and platform (see the sketch after this list).
6. Finally, the gateway's experimentation features and GitOps orchestration enable you to iterate and deploy with confidence, be it a single LLM or thousands of LLMs.
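
For instance, a recipe like those in step 5 might begin by pulling curated training data out of ClickHouse, such as inferences that received positive boolean feedback (a conceptual sketch; table and column names are assumptions, so adapt them to your schema):

```python
import clickhouse_connect

# Connect to the ClickHouse instance backing TensorZero (adjust host/credentials).
client = clickhouse_connect.get_client(host="localhost")

# Conceptual query: inferences that earned positive boolean feedback become
# candidate training examples. Table/column names are assumptions.
rows = client.query(
    """
    SELECT i.input, i.output
    FROM ChatInference AS i
    INNER JOIN BooleanMetricFeedback AS f ON f.target_id = i.id
    WHERE f.value = true
    """
).result_rows

print(f"Curated {len(rows)} training examples")
```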

Our goal is to help engineers build, manage, and optimize the next generation of LLM applications: systems that learn from real-world experience.
Read more about our **[Vision & Roadmap](https://www.tensorzero.com/docs/vision-roadmap/)**.

## Get Started

**Start building today.**
The **[Quick Start](https://www.tensorzero.com/docs/quickstart)** shows how easy it is to set up an LLM application with TensorZero.
If you want to dive deeper, the **[Tutorial](https://www.tensorzero.com/docs/gateway/tutorial)** teaches you how to build a simple chatbot, an email copilot, a weather RAG system, and a structured data extraction pipeline.

**Questions?**
Ask us on **[Slack](https://www.tensorzero.com/slack)** or **[Discord](https://www.tensorzero.com/discord)**.

**Using TensorZero at work?**
Email us at **[[email protected]](mailto:[email protected])** to set up a Slack or Teams channel with your team (free).

**Work with us.**
We're **[hiring in NYC](https://www.tensorzero.com/jobs)**.
We'd also welcome **[open-source contributions](https://github.com/tensorzero/tensorzero/blob/main/CONTRIBUTING.md)**!

## Examples

We are working on a series of **complete runnable examples** illustrating TensorZero's data & learning flywheel.

> **[Optimizing Data Extraction (NER) with TensorZero](https://github.com/tensorzero/tensorzero/tree/main/examples/data-extraction-ner)**
>
> This example shows how to use TensorZero to optimize a data extraction pipeline.
> We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL).
> In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task — at a fraction of the cost and latency — using a small amount of training data.

> **[Writing Haikus to Satisfy a Judge with Hidden Preferences](https://github.com/tensorzero/tensorzero/tree/main/examples/haiku-hidden-preferences)**
>
> This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste.
> You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants.
> You'll watch the model improve across multiple rounds of fine-tuning.

> **[Improving LLM Chess Ability with Best-of-N Sampling](https://github.com/tensorzero/tensorzero/tree/main/examples/chess-puzzles-best-of-n-sampling/)**
>
> This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.

> **[Improving Math Reasoning with a Custom Recipe for Automated Prompt Engineering (DSPy)](https://github.com/tensorzero/tensorzero/tree/main/examples/gsm8k-custom-recipe-dspy)**
>
> TensorZero provides a number of pre-built optimization recipes covering common LLM engineering workflows.
> But you can also easily create your own recipes and workflows!
> This example shows how to optimize a TensorZero function using an arbitrary tool — here, DSPy.

_& many more on the way!_