https://github.com/cgnorthcutt/reliablity_framework_for_rag
Demo showing how the Trustworthy Language Model adds reliability to LLM outputs and improves RAG, agents, and data enrichment workflows. It can be used to improve fine-tuning of LLMs, accuracy of LLM outputs, and smart routing for RAG and agents.
chatgpt data-cleaning data-curation data-observability data-quality llms observability rag
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/cgnorthcutt/reliablity_framework_for_rag
- Owner: cgnorthcutt
- License: agpl-3.0
- Created: 2024-01-14T05:34:13.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-07T21:13:00.000Z (about 2 years ago)
- Last Synced: 2024-04-07T22:23:56.444Z (about 2 years ago)
- Topics: chatgpt, data-cleaning, data-curation, data-observability, data-quality, llms, observability, rag
- Language: Jupyter Notebook
- Homepage: https://help.cleanlab.ai/tutorials/tlm/
- Size: 18.4 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Demo of TLM: The Reliability Solution for RAG, LLMs, and Data Enrichment
## The main file to look at in this repo is the [tlm_demo_new.ipynb](https://github.com/cgnorthcutt/reliablity_framework_for_rag/blob/main/tlm_demo_new.ipynb)
**News!** I added a new data enrichment and LLM reliability [demo](https://github.com/cgnorthcutt/reliablity_framework_for_rag/blob/main/tlm_demo_new.ipynb). Details:
* Demo showing how the Trustworthy Language Model adds reliability scores to LLM outputs, solving 4 use cases for 4 verticals.
* Expect typos and imperfections. For better results and more details, visit [https://help.cleanlab.ai](https://help.cleanlab.ai/tutorials/tlm/)
---
Hacked this together in a couple hours. Shows how Cleanlab TLM can be used to improve fine-tuning of LLMs, accuracy of LLM outputs, and smart routing for RAG and agents.
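
To make the "smart routing" idea concrete, here is a minimal sketch, not the code from the notebook. It assumes you already have an answer and a confidence score in [0, 1] for it (for example, from TLM). The function name `route_by_confidence`, the `fallback` callable, and the 0.5 default threshold are all hypothetical choices for illustration.

```python
from typing import Callable, Tuple


def route_by_confidence(
    answer: str,
    confidence: float,
    fallback: Callable[[], str],
    threshold: float = 0.5,  # hypothetical cutoff, tune on your own data
) -> Tuple[str, str]:
    """Return (final_answer, route).

    Keeps the LLM answer when its confidence meets the threshold;
    otherwise calls `fallback` (e.g. a stronger model or a human queue).
    """
    if confidence >= threshold:
        return answer, "llm"
    return fallback(), "fallback"


# High-confidence answer is kept; low-confidence answer is rerouted.
kept = route_by_confidence("Yes", 0.91, fallback=lambda: "NEEDS_HUMAN_REVIEW")
rerouted = route_by_confidence("Yes", 0.20, fallback=lambda: "NEEDS_HUMAN_REVIEW")
print(kept, rerouted)
```

The fallback is passed as a callable so the expensive path (a second model call, a ticket to a reviewer) only runs when the confidence is too low.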

Dataset used for this example: [here](https://huggingface.co/datasets/nguha/legalbench/viewer/international_citizenship_questions/test?row=2).
## Base OpenAI LLM versus Cleanlab TLM Performance on the public test set
Note these results were run with the fastest version of the TLM (`quality_preset="low"`) for speed reasons (it's a hackathon demo). For improved results, use `quality_preset="best"`.
* Base Acc (OpenAI GPT-3.5): ~65%
* TLM Acc: 65.5%
* TLM Acc (TLM Confidence > 0.3): 66.2%
* TLM Acc (TLM Confidence > 0.5): 69.9%
* TLM Acc (TLM Confidence > 0.8): 74.0%
* Base (OpenAI GPT-3.5) Acc (TLM Confidence < 0.5): 55.1%
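
The thresholded numbers above are accuracies computed only over answers whose confidence exceeds a cutoff. A rough sketch of that calculation follows; it runs on made-up toy records, not the LegalBench results, and the record layout (`predicted`, `label`, `confidence`) and the helper name `accuracy_above` are assumptions for illustration.

```python
def accuracy_above(records, threshold):
    """Accuracy over records with confidence > threshold.

    Returns (accuracy, n_kept); accuracy is None if nothing passes the cutoff.
    """
    kept = [r for r in records if r["confidence"] > threshold]
    if not kept:
        return None, 0
    correct = sum(r["predicted"] == r["label"] for r in kept)
    return correct / len(kept), len(kept)


# Toy data only, invented for this example.
records = [
    {"predicted": "Yes", "label": "Yes", "confidence": 0.95},
    {"predicted": "No", "label": "Yes", "confidence": 0.40},
    {"predicted": "No", "label": "No", "confidence": 0.85},
    {"predicted": "Yes", "label": "No", "confidence": 0.20},
    {"predicted": "Yes", "label": "Yes", "confidence": 0.60},
]

for t in (0.0, 0.3, 0.5, 0.8):
    acc, n = accuracy_above(records, t)
    print(f"confidence > {t}: accuracy={acc}, answered={n}/{len(records)}")
```

Reporting `n_kept` alongside accuracy matters: a higher cutoff usually raises accuracy but answers fewer questions, so the two should be read together.
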
If an expert reviews/corrects the 100 samples with lowest TLM confidence score:
* the resulting accuracy will be: 79%
* compared to the original base acc: 65%
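
One way to estimate the effect of expert review is to sort by confidence, treat the lowest-scoring samples as corrected, and recompute accuracy. The sketch below does this on invented toy data; it assumes the expert always produces the right answer, and the helper name `accuracy_after_review` and the record layout are hypothetical.

```python
def accuracy_after_review(records, n_review):
    """Accuracy if the n_review lowest-confidence records are fixed by an expert.

    Assumption: every reviewed record ends up correct.
    """
    if not records:
        raise ValueError("records must be non-empty")
    ranked = sorted(records, key=lambda r: r["confidence"])
    untouched = ranked[n_review:]
    n_reviewed = len(ranked) - len(untouched)
    correct = n_reviewed + sum(r["predicted"] == r["label"] for r in untouched)
    return correct / len(records)


# Toy data only, invented for this example.
records = [
    {"predicted": "Yes", "label": "Yes", "confidence": 0.95},
    {"predicted": "No", "label": "Yes", "confidence": 0.40},
    {"predicted": "No", "label": "No", "confidence": 0.85},
    {"predicted": "Yes", "label": "No", "confidence": 0.20},
    {"predicted": "Yes", "label": "Yes", "confidence": 0.60},
]

print(accuracy_after_review(records, 0))  # no review
print(accuracy_after_review(records, 2))  # review the 2 least-confident answers
```

The better the confidence score ranks wrong answers at the bottom, the more each reviewed sample is worth, which is why a small review budget can move accuracy noticeably.
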
## The TLM (Trustworthy Language Model) is available in Cleanlab Studio
* How to use the TLM: https://help.cleanlab.ai/tutorials/tlm/
There's also a (reduced functionality) demo version available here running on free servers: https://cleanlab.ai/tlm