Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tensorzero/tensorzero
TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.
- Host: GitHub
- URL: https://github.com/tensorzero/tensorzero
- Owner: tensorzero
- License: apache-2.0
- Created: 2024-07-16T21:00:53.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-01-31T03:42:20.000Z (12 days ago)
- Last Synced: 2025-01-31T08:03:52.600Z (11 days ago)
- Topics: ai, ai-engineering, anthropic, artificial-intelligence, deep-learning, genai, generative-ai, gpt, large-language-models, llama, llm, llmops, llms, machine-learning, ml, ml-engineering, mlops, openai, python, rust
- Language: Rust
- Homepage: https://tensorzero.com
- Size: 19.7 MB
- Stars: 2,219
- Watchers: 26
- Forks: 123
- Open Issues: 91
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Citation: CITATION.cff
Awesome Lists containing this project
- fucking-awesome-for-beginners - TensorZero: TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models. (Rust)
- trackawesomelist - TensorZero (⭐793): TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models. (Recently Updated / [Dec 04, 2024](/content/2024/12/04/README.md))
- awesome-for-beginners - TensorZero: TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models. (Rust)
- awesome-rust - TensorZero: data & learning flywheel for LLMs that unifies inference, observability, optimization, and experimentation ![TensorZero Build Status](https://img.shields.io/github/check-runs/tensorzero/tensorzero/main) (Applications / MLOps)
- awesome-LLM-resourses - TensorZero
- fucking-awesome-rust - TensorZero: data & learning flywheel for LLMs that unifies inference, observability, optimization, and experimentation ![TensorZero Build Status](https://img.shields.io/github/check-runs/tensorzero/tensorzero/main) (Applications / MLOps)
- awesome-ChatGPT-repositories - tensorzero: TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models. (Langchain)
- StarryDivineSky - tensorzero/tensorzero
README
# TensorZero
**TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.**
1. Integrate our model gateway
2. Send metrics or feedback
3. Optimize prompts, models, and inference strategies
4. Watch your LLMs improve over time (steps 1 and 2 are sketched in code after the list below)

It provides a **data & learning flywheel for LLMs** by unifying:
- [x] **Inference:** one API for all LLMs, with <1ms P99 overhead
- [x] **Observability:** inference & feedback → your database
- [x] **Optimization:** from prompts to fine-tuning and RL
- [x] **Experimentation:** built-in A/B testing, routing, fallbacks
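As a minimal sketch of steps 1 and 2, assuming the gateway is deployed locally on port 3000 and a boolean metric named `task_success` is defined in your TensorZero configuration (both names here are illustrative):

```python
from tensorzero import TensorZeroGateway

# Assumption: the gateway runs at http://localhost:3000 (see the Deployment Guide).
with TensorZeroGateway("http://localhost:3000") as client:
    # Step 1: run inference through the gateway.
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        input={"messages": [{"role": "user", "content": "Summarize this ticket..."}]},
    )
    # Step 2: attach feedback to that inference once you know the outcome.
    # Assumption: `task_success` is a boolean metric declared in tensorzero.toml.
    client.feedback(
        metric_name="task_success",
        inference_id=response.inference_id,
        value=True,
    )
```

Steps 3 and 4 then happen offline: optimization recipes consume the stored inference and feedback data to produce better prompts, models, and variants.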
Website · Docs · Slack · Discord

Quick Start (5min) · Comprehensive Tutorial · Deployment Guide · API Reference · Configuration Reference

## Features
### 🌐 LLM Gateway
> **Integrate with TensorZero once and access every major LLM provider.**
The TensorZero Gateway natively supports:
- Anthropic
- AWS Bedrock
- Azure OpenAI Service
- DeepSeek
- Fireworks
- GCP Vertex AI Anthropic
- GCP Vertex AI Gemini
- Google AI Studio (Gemini API)
- Hyperbolic
- Mistral
- OpenAI
- Together
- vLLM
- xAI
Need something else?
Your provider is most likely supported because TensorZero integrates with any OpenAI-compatible API (e.g. Ollama).
The TensorZero Gateway supports advanced features like:
- Retries & Fallbacks
- Inference-Time Optimizations
- Prompt Templates & Schemas
- Experimentation (A/B Testing)
- Configuration-as-Code (GitOps)
- Batch Inference
- Metrics & Feedback
- Multi-Step LLM Workflows (Episodes) (see the sketch below)
- & a lot more...
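For example, episodes tie together the inferences of a multi-step workflow. A minimal sketch, assuming a locally deployed gateway (the prompts and workflow are illustrative):

```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway("http://localhost:3000") as client:
    # First step: no episode_id is provided, so the gateway starts a new episode.
    draft = client.inference(
        model_name="openai::gpt-4o-mini",
        input={"messages": [{"role": "user", "content": "Draft a reply to the email below..."}]},
    )
    # Later steps reuse the same episode_id, linking the inferences (and any
    # feedback sent with episode_id) into one multi-step workflow.
    critique = client.inference(
        model_name="openai::gpt-4o-mini",
        episode_id=draft.episode_id,
        input={"messages": [{"role": "user", "content": "Critique the draft reply below..."}]},
    )
```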
The TensorZero Gateway is written in Rust 🦀 with performance in mind (<1ms p99 latency overhead @ 10k QPS).
See Benchmarks.
You can run inference using the TensorZero client (recommended), the OpenAI client, or the HTTP API.
#### Usage: TensorZero Python Client (Recommended)
You can access any provider using the TensorZero Python client.
1. Deploy `tensorzero/gateway` using Docker.
**[Detailed instructions →](https://www.tensorzero.com/docs/gateway/deployment)**
2. Optional: Set up the TensorZero configuration.
3. Run inference:
```python
from tensorzero import TensorZeroGateway

# The gateway URL below assumes the Docker deployment from step 1.
with TensorZeroGateway("http://localhost:3000") as client:
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "Write a haiku about artificial intelligence.",
                }
            ]
        },
    )
```
See **[Quick Start](https://www.tensorzero.com/docs/quickstart)** for more information.
#### Usage: OpenAI Python Client
You can access any provider using the OpenAI Python client with TensorZero.
1. Deploy `tensorzero/gateway` using Docker.
**[Detailed instructions →](https://www.tensorzero.com/docs/gateway/deployment)**
2. Set up the TensorZero configuration.
3. Run inference:
```python
from openai import OpenAI

with OpenAI(base_url="http://localhost:3000/openai/v1") as client:
    response = client.chat.completions.create(
        model="tensorzero::your_function_name",  # defined in configuration (step 2)
        messages=[
            {
                "role": "user",
                "content": "Write a haiku about artificial intelligence.",
            }
        ],
    )
```
See **[Quick Start](https://www.tensorzero.com/docs/quickstart)** for more information.
#### Usage: Other Languages & Platforms (HTTP)
TensorZero supports virtually any programming language or platform via its HTTP API.
1. Deploy `tensorzero/gateway` using Docker.
**[Detailed instructions →](https://www.tensorzero.com/docs/gateway/deployment)**
2. Optional: Set up the TensorZero configuration.
3. Run inference:
```bash
curl -X POST "http://localhost:3000/inference" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "openai::gpt-4o-mini",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "Write a haiku about artificial intelligence."
        }
      ]
    }
  }'
```
See **[Quick Start](https://www.tensorzero.com/docs/quickstart)** for more information.
### 📈 LLM Optimization
> **Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies — using the UI or programmatically.**
#### Model Optimization
Optimize closed-source and open-source models using supervised fine-tuning (SFT) and preference fine-tuning (DPO).
- Supervised Fine-tuning — UI
- Preference Fine-tuning (DPO) — Jupyter Notebook
#### Inference-Time Optimization
Boost performance by dynamically updating your prompts with relevant examples, combining responses from multiple inferences, and more.
- Best-of-N Sampling
- Mixture-of-N Sampling
- Dynamic In-Context Learning (DICL)

_More coming soon..._
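These optimizations are configured as variants of a TensorZero function. During evaluation you can pin a specific variant with the client's `variant_name` parameter; in the sketch below, the function and variant names (`extract_entities`, `best_of_n`) are hypothetical placeholders for entries in your configuration:

```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway("http://localhost:3000") as client:
    response = client.inference(
        function_name="extract_entities",  # hypothetical function from your config
        variant_name="best_of_n",          # hypothetical best-of-N variant to evaluate
        input={"messages": [{"role": "user", "content": "Extract the entities from..."}]},
    )
```

In production you would typically omit `variant_name` and let the gateway sample among variants (A/B testing).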
#### Prompt Optimization
Optimize your prompts programmatically using research-driven optimization techniques.
Today we provide a sample **[integration with DSPy](https://github.com/tensorzero/tensorzero/tree/main/examples/gsm8k-custom-recipe-dspy)**.
_More coming soon..._
### 🔍 LLM Observability
> **Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time — all using the open-source TensorZero UI.**
- Observability » Inference
- Observability » Function
## Demo
> **Watch LLMs get better at data extraction in real-time with TensorZero!**
>
> **[Dynamic in-context learning (DICL)](https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations#dynamic-in-context-learning-dicl)** is a powerful inference-time optimization available out of the box with TensorZero.
> It enhances LLM performance by automatically incorporating relevant historical examples into the prompt, without the need for model fine-tuning.
https://github.com/user-attachments/assets/4df1022e-886e-48c2-8f79-6af3cdad79cb
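To make the idea concrete, here is a toy, self-contained illustration of the DICL pattern, not TensorZero's implementation (which retrieves examples from your inference history): find the most similar past examples and prepend them to the prompt as demonstrations. The string-similarity retrieval below is a stand-in for embedding-based search.

```python
from difflib import SequenceMatcher

# Toy "history" of past inputs and known-good outputs (illustrative data).
history = [
    {"input": "Extract the company: 'Acme Corp. reported record profits.'", "output": "Acme Corp."},
    {"input": "Extract the company: 'Shares of Globex rose 5% today.'", "output": "Globex"},
]

def top_k_similar(query: str, k: int = 1) -> list[dict]:
    # Stand-in for embedding search: rank history by string similarity.
    return sorted(
        history,
        key=lambda ex: SequenceMatcher(None, query, ex["input"]).ratio(),
        reverse=True,
    )[:k]

query = "Extract the company: 'Initech filed its quarterly report.'"
messages = []
for ex in top_k_similar(query):
    # Each retrieved example becomes a demonstration turn in the prompt.
    messages.append({"role": "user", "content": ex["input"]})
    messages.append({"role": "assistant", "content": ex["output"]})
messages.append({"role": "user", "content": query})
```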
## LLM Engineering with TensorZero
1. The **[TensorZero Gateway](https://www.tensorzero.com/docs/gateway/)** is a high-performance model gateway written in Rust 🦀 that provides a unified API interface for all major LLM providers, allowing for seamless cross-platform integration and fallbacks.
2. It handles structured schema-based inference with <1ms P99 latency overhead (see **[Benchmarks](https://www.tensorzero.com/docs/gateway/benchmarks)**) and built-in observability, experimentation, and **[inference-time optimizations](https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations)**.
3. It also collects downstream metrics and feedback associated with these inferences, with first-class support for multi-step LLM systems.
4. Everything is stored in a ClickHouse data warehouse that you control for real-time, scalable, and developer-friendly analytics (see the query sketch after this list).
5. Over time, **[TensorZero Recipes](https://www.tensorzero.com/docs/recipes)** leverage this structured dataset to optimize your prompts and models: run pre-built recipes for common workflows like fine-tuning, or create your own with complete flexibility using any language and platform.
6. Finally, the gateway's experimentation features and GitOps orchestration enable you to iterate and deploy with confidence, be it a single LLM or thousands of LLMs.
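Because the data lives in your own ClickHouse instance, you can analyze it directly. A sketch using the `clickhouse-connect` library; the host, database name, and the `ChatInference` table and column names are assumptions based on a default deployment, so verify them against your setup:

```python
import clickhouse_connect  # pip install clickhouse-connect

# Assumptions: ClickHouse on localhost:8123 and a database named "tensorzero"
# containing the gateway's ChatInference table. Adjust for your deployment.
client = clickhouse_connect.get_client(host="localhost", port=8123, database="tensorzero")

result = client.query(
    "SELECT function_name, count() AS inferences "
    "FROM ChatInference GROUP BY function_name ORDER BY inferences DESC"
)
for function_name, inferences in result.result_rows:
    print(f"{function_name}: {inferences}")
```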
Our goal is to help engineers build, manage, and optimize the next generation of LLM applications: systems that learn from real-world experience.
Read more about our **[Vision & Roadmap](https://www.tensorzero.com/docs/vision-roadmap/)**.
## Get Started
**Start building today.**
The **[Quick Start](https://www.tensorzero.com/docs/quickstart)** shows how easy it is to set up an LLM application with TensorZero.
If you want to dive deeper, the **[Tutorial](https://www.tensorzero.com/docs/gateway/tutorial)** teaches how to build a simple chatbot, an email copilot, a weather RAG system, and a structured data extraction pipeline.
**Questions?**
Ask us on **[Slack](https://www.tensorzero.com/slack)** or **[Discord](https://www.tensorzero.com/discord)**.
**Using TensorZero at work?**
Email us at **[[email protected]](mailto:[email protected])** to set up a Slack or Teams channel with your team (free).
**Work with us.**
We're **[hiring in NYC](https://www.tensorzero.com/jobs)**.
We'd also welcome **[open-source contributions](https://github.com/tensorzero/tensorzero/blob/main/CONTRIBUTING.md)**!
## Examples
We are working on a series of **complete runnable examples** illustrating TensorZero's data & learning flywheel.
> **[Optimizing Data Extraction (NER) with TensorZero](https://github.com/tensorzero/tensorzero/tree/main/examples/data-extraction-ner)**
>
> This example shows how to use TensorZero to optimize a data extraction pipeline.
> We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL).
> In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task — at a fraction of the cost and latency — using a small amount of training data.
> **[Writing Haikus to Satisfy a Judge with Hidden Preferences](https://github.com/tensorzero/tensorzero/tree/main/examples/haiku-hidden-preferences)**
>
> This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste.
> You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants.
> You'll see progress by fine-tuning the LLM multiple times.
> **[Improving LLM Chess Ability with Best-of-N Sampling](https://github.com/tensorzero/tensorzero/tree/main/examples/chess-puzzles-best-of-n-sampling/)**
>
> This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.
> **[Improving Math Reasoning with a Custom Recipe for Automated Prompt Engineering (DSPy)](https://github.com/tensorzero/tensorzero/tree/main/examples/gsm8k-custom-recipe-dspy)**
>
> TensorZero provides a number of pre-built optimization recipes covering common LLM engineering workflows.
> But you can also easily create your own recipes and workflows!
> This example shows how to optimize a TensorZero function using an arbitrary tool — here, DSPy.
_& many more on the way!_