https://github.com/cleanlab/tlm

Score the trustworthiness of outputs from any LLM in real-time

ai-agents ai-safety confidence-estimation data-extraction data-labeling error-detection evals evaluation guardrails hallucination hallucination-detection human-in-the-loop-ai llm llm-as-a-judge llm-evaluation rag structured-outputs trustworthy-ai uncertainty-quantification verifiers

# Trustworthy Language Model (TLM)

The [Trustworthy Language Model](https://cleanlab.ai/blog/trustworthy-language-model/) scores the **trustworthiness** of outputs from *any* LLM in *real-time*.

Automatically detect hallucinated/incorrect responses in: Q&A (RAG), Chatbots, Agents, Structured Outputs, Data Extraction, Tool Calling, Classification/Tagging, Data Labeling, and other LLM applications.

Use TLM to:
- Guardrail AI mistakes before they are served to users
- Escalate untrustworthy AI responses to humans
- Discover incorrect LLM (or human) generated outputs in datasets/logs
- Boost AI accuracy
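
The first three uses above all reduce to acting on a per-response score. The following is a minimal sketch of that pattern, not TLM's actual API: `score_fn`, `guardrail`, `least_trustworthy`, the 0.7 threshold, and the assumption that scores lie in [0, 1] with higher meaning more trustworthy are all hypothetical and chosen for illustration. See the notebooks for the real interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical scorer signature: (prompt, response) -> score.
# Assumption: scores lie in [0, 1] and higher means more trustworthy.
ScoreFn = Callable[[str, str], float]


@dataclass
class Decision:
    response: str
    score: float
    action: str  # "serve" or "escalate"


def guardrail(prompt: str, response: str, score_fn: ScoreFn,
              threshold: float = 0.7) -> Decision:
    """Serve the response if its score clears the threshold, else escalate.

    The 0.7 default is an arbitrary illustrative value, not a recommendation.
    """
    score = score_fn(prompt, response)
    action = "serve" if score >= threshold else "escalate"
    return Decision(response=response, score=score, action=action)


def least_trustworthy(logs: Iterable[Tuple[str, str]], score_fn: ScoreFn,
                      k: int = 10) -> List[Tuple[float, str, str]]:
    """Return the k lowest-scoring (score, prompt, response) rows for review."""
    scored = [(score_fn(p, r), p, r) for p, r in logs]
    return sorted(scored, key=lambda row: row[0])[:k]


def toy_score(prompt: str, response: str) -> float:
    """Toy stand-in scorer so the sketch runs; replace with a real scorer."""
    return 0.2 if "not sure" in response.lower() else 0.9


if __name__ == "__main__":
    print(guardrail("Capital of France?", "Paris", toy_score))
    print(guardrail("Capital of France?", "Not sure, maybe Lyon", toy_score))
```

Passing the scorer in as a callable keeps the routing logic independent of how scores are produced, so the threshold can be tuned against your own precision/recall requirements.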

Powered by *uncertainty estimation* techniques, TLM **works out of the box**, and does **not** require:
- data preparation/labeling work
- custom model training/serving infrastructure

Learn more and see precision/recall benchmarks with frontier models (from OpenAI, Anthropic, Google, etc.):

[Blog](https://cleanlab.ai/blog/), [Research Paper](https://aclanthology.org/2024.acl-long.283/)

## Usage

See [notebooks](notebooks) for Jupyter notebooks with example usage.