
https://github.com/cgnorthcutt/reliablity_framework_for_rag

Demo showing how the Trustworthy Language Model (TLM) adds reliability to LLM outputs and improves RAG, agents, and data enrichment workflows. It can be used to improve fine-tuning of LLMs, accuracy of LLM outputs, and smart routing for RAG and agents.

chatgpt data-cleaning data-curation data-observability data-quality llms observability rag


# Demo of TLM: The Reliability Solution for RAG, LLMs, and Data Enrichment

## The main file to look at in this repo is [tlm_demo_new.ipynb](https://github.com/cgnorthcutt/reliablity_framework_for_rag/blob/main/tlm_demo_new.ipynb)

**News!** I added a new data enrichment and LLM reliability [demo](https://github.com/cgnorthcutt/reliablity_framework_for_rag/blob/main/tlm_demo_new.ipynb). Details:
* The demo shows how the Trustworthy Language Model (TLM) adds reliability scores to LLM outputs, covering 4 use cases across 4 verticals.
* Expect typos and imperfections. For better results and more details, visit [https://help.cleanlab.ai](https://help.cleanlab.ai/tutorials/tlm/)

---

Hacked this together in a couple hours. Shows how Cleanlab TLM can be used to improve fine-tuning of LLMs, accuracy of LLM outputs, and smart routing for RAG and agents.
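As a rough illustration of the routing idea, a pipeline can return an answer directly when its trustworthiness score is high and escalate it otherwise. The sketch below is not the demo's code: `route_answer`, the 0.8 cutoff, and the shape of the scored answer (a dict with `response` and `trustworthiness_score` keys) are assumptions made for illustration.

```python
from typing import Dict, Union

def route_answer(scored: Dict[str, Union[str, float]], threshold: float = 0.8) -> str:
    """Decide what to do with one scored LLM answer.

    `scored` is assumed to hold the answer text under "response" and a
    confidence in [0, 1] under "trustworthiness_score" (hypothetical shape).
    """
    score = float(scored["trustworthiness_score"])
    if score >= threshold:
        return "auto"      # confident enough to return to the user as-is
    return "escalate"      # send to a fallback model, retrieval retry, or human

# Hypothetical scored answers, not real TLM output.
print(route_answer({"response": "Yes", "trustworthiness_score": 0.93}))
print(route_answer({"response": "No", "trustworthiness_score": 0.41}))
```

The cutoff is a tuning knob: raising it sends more answers to review in exchange for fewer unreviewed mistakes.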


Dataset used for this example: [here](https://huggingface.co/datasets/nguha/legalbench/viewer/international_citizenship_questions/test?row=2).

## Base OpenAI LLM versus Cleanlab TLM Performance on the public test set

Note that these results were run with the fastest version of the TLM (`quality_preset="low"`) for speed reasons (it's a hackathon demo). For improved results, use `quality_preset="best"`.

* Base accuracy (OpenAI GPT-3.5): ~65%
* TLM accuracy: 65.5%

* TLM accuracy (TLM confidence > 0.3): 66.2%
* TLM accuracy (TLM confidence > 0.5): 69.9%
* TLM accuracy (TLM confidence > 0.8): 74.0%

* Base (OpenAI GPT-3.5) accuracy (TLM confidence < 0.5): 55.1%
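The thresholded numbers above are accuracies computed only over the answers whose TLM confidence clears a cutoff. Here is a minimal sketch of that calculation, assuming each evaluated sample is reduced to a `(confidence, is_correct)` pair. The `accuracy_above` helper and the toy data are hypothetical and are not taken from the notebook.

```python
from typing import List, Optional, Tuple

# Hypothetical toy data: (TLM confidence score, whether the answer was correct).
samples: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, True), (0.85, True), (0.70, False),
    (0.60, True), (0.45, False), (0.40, True), (0.20, False),
]

def accuracy_above(samples: List[Tuple[float, bool]], threshold: float) -> Optional[float]:
    """Accuracy over samples whose confidence is strictly above `threshold`.

    Returns None when no sample clears the threshold.
    """
    kept = [correct for conf, correct in samples if conf > threshold]
    if not kept:
        return None
    return sum(kept) / len(kept)

for t in (0.0, 0.3, 0.5, 0.8):
    n_kept = sum(1 for conf, _ in samples if conf > t)
    print(f"confidence > {t}: accuracy={accuracy_above(samples, t)} on {n_kept} samples")
```

A higher cutoff keeps fewer answers, so it helps to report coverage (how many samples remain) next to each accuracy figure.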

If an expert reviews and corrects the 100 samples with the lowest TLM confidence scores:

* the resulting accuracy will be: 79%
* compared to the original base accuracy: 65%
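The review scenario above can be simulated by sorting answers by confidence, treating the `k` lowest-scoring ones as fixed by the expert, and recomputing accuracy. Here is a minimal sketch, assuming an expert correction always yields a right answer. `accuracy_after_review` and the toy data are hypothetical, not the notebook's code.

```python
from typing import List, Tuple

# Hypothetical toy data: (TLM confidence score, whether the answer was correct).
samples: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, True), (0.85, True), (0.70, False),
    (0.60, True), (0.45, False), (0.40, True), (0.20, False),
]

def accuracy_after_review(samples: List[Tuple[float, bool]], k: int) -> float:
    """Accuracy after an expert corrects the k lowest-confidence samples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(samples, key=lambda s: s[0])     # lowest confidence first
    reviewed = [True] * min(k, len(ranked))          # assumed correct after review
    untouched = [correct for _, correct in ranked[k:]]
    outcomes = reviewed + untouched
    return sum(outcomes) / len(outcomes)

print(accuracy_after_review(samples, 0))  # baseline, no review
print(accuracy_after_review(samples, 3))  # expert fixes the 3 least-confident answers
```

The gain depends on how many of the reviewed answers were actually wrong, which is why ranking by confidence beats reviewing a random subset of the same size.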

## The TLM (Trustworthy Language Model) is available in Cleanlab Studio

* How to use the TLM: https://help.cleanlab.ai/tutorials/tlm/

There's also a (reduced functionality) demo version available here running on free servers: https://cleanlab.ai/tlm