https://github.com/cgnorthcutt/reliablity_framework_for_rag

Demo showing how the Trustworthy Language Model (TLM) adds reliability to LLM outputs and improves RAG, agent, and data enrichment workflows. TLM can be used to improve fine-tuning of LLMs, accuracy of LLM outputs, and smart routing for RAG and agents.


Host: GitHub
URL: https://github.com/cgnorthcutt/reliablity_framework_for_rag
Owner: cgnorthcutt
License: agpl-3.0
Created: 2024-01-14T05:34:13.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-04-07T21:13:00.000Z (about 2 years ago)
Last Synced: 2024-04-07T22:23:56.444Z (about 2 years ago)
Topics: chatgpt, data-cleaning, data-curation, data-observability, data-quality, llms, observability, rag
Language: Jupyter Notebook
Homepage: https://help.cleanlab.ai/tutorials/tlm/
Size: 18.4 MB
Stars: 2
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Demo of TLM: The Reliability Solution for RAG, LLMs, and Data Enrichment

## The main file to look at in this repo is the [tlm_demo_new.ipynb](https://github.com/cgnorthcutt/reliablity_framework_for_rag/blob/main/tlm_demo_new.ipynb)

**News!** I added a new data enrichment and LLM reliability [demo](https://github.com/cgnorthcutt/reliablity_framework_for_rag/blob/main/tlm_demo_new.ipynb). Details:
* Demo showing how the Trustworthy Language Model adds reliability scores to LLM outputs, solving 4 use cases for 4 verticals.
* Expect typos and imperfections. For better results and more details, visit [https://help.cleanlab.ai](https://help.cleanlab.ai/tutorials/tlm/)

---

Hacked this together in a couple of hours. It shows how Cleanlab TLM can be used to improve fine-tuning of LLMs, accuracy of LLM outputs, and smart routing for RAG and agents.

Dataset used for this example: [here](https://huggingface.co/datasets/nguha/legalbench/viewer/international_citizenship_questions/test?row=2).

## Base OpenAI LLM versus Cleanlab TLM Performance on the public test set

Note: these results were run with the fastest version of the TLM (`quality_preset="low"`) for speed (it's a hackathon demo). For improved results, use `quality_preset="best"`.

* Base Acc (OpenAI GPT-3.5): ~65%
* TLM Acc: 65.5%

* TLM Acc (TLM Confidence > 0.3): 66.2%
* TLM Acc (TLM Confidence > 0.5): 69.9%
* TLM Acc (TLM Confidence > 0.8): 74.0%

* Base (OpenAI GPT-3.5) Acc (TLM Confidence < 0.5): 55.1%
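
The thresholded accuracies above can be computed along these lines. This is a minimal sketch, not the notebook's code: the helper `accuracy_above` is a hypothetical name, and the `correct` and `confidence` values are made-up toy data, not the demo's results. It also reports coverage (the fraction of answers kept), since raising the threshold trades coverage for accuracy.

```python
from typing import Sequence, Tuple


def accuracy_above(
    correct: Sequence[bool],
    confidence: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (accuracy, coverage) over answers with confidence > threshold.

    Hypothetical helper for illustration; not part of any Cleanlab API.
    """
    kept = [c for c, s in zip(correct, confidence) if s > threshold]
    if not kept:
        return float("nan"), 0.0
    return sum(kept) / len(kept), len(kept) / len(correct)


# Toy values for illustration only (NOT the demo's data).
correct = [True, False, True, True, False, True, False, True]
confidence = [0.9, 0.2, 0.85, 0.6, 0.4, 0.95, 0.55, 0.7]

for t in (0.3, 0.5, 0.8):
    acc, cov = accuracy_above(correct, confidence, t)
    print(f"confidence > {t}: accuracy={acc:.2f}, coverage={cov:.2f}")
```

With real data, `correct` would come from comparing each answer to the test-set label and `confidence` from the TLM score returned with that answer.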

If an expert reviews/corrects the 100 samples with lowest TLM confidence score:

* the resulting accuracy will be: 79%
* compared to the original base acc: 65%
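
The expert-review estimate can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's implementation: `accuracy_after_review` is a hypothetical helper, the expert is assumed to always produce the correct answer, and the values are made-up toy data.

```python
from typing import Sequence


def accuracy_after_review(
    correct: Sequence[bool],
    confidence: Sequence[float],
    budget: int,
) -> float:
    """Accuracy if the `budget` lowest-confidence answers are expert-corrected.

    Assumes the expert is always right (an idealization for illustration).
    """
    # Indices sorted from least to most confident.
    order = sorted(range(len(correct)), key=lambda i: confidence[i])
    reviewed = set(order[:budget])
    # Reviewed answers become correct; the rest keep their original status.
    fixed = [True if i in reviewed else c for i, c in enumerate(correct)]
    return sum(fixed) / len(fixed)


# Toy values for illustration only (NOT the demo's data).
correct = [True, False, True, True, False, True, False, True]
confidence = [0.9, 0.2, 0.85, 0.6, 0.4, 0.95, 0.55, 0.7]

for budget in (0, 2, 3):
    acc = accuracy_after_review(correct, confidence, budget)
    print(f"reviewed={budget}: accuracy={acc:.3f}")
```

The gain per reviewed sample depends on how well low confidence predicts errors: the more wrong answers sit at the bottom of the ranking, the more each review improves accuracy.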

## The TLM (Trustworthy Language Model) is available in Cleanlab Studio

* How to use the TLM: https://help.cleanlab.ai/tutorials/tlm/

There's also a (reduced functionality) demo version available here running on free servers: https://cleanlab.ai/tlm
