{"id":26896946,"url":"https://github.com/KRLabsOrg/LettuceDetect","last_synced_at":"2025-04-01T04:02:30.235Z","repository":{"id":276029764,"uuid":"927920548","full_name":"KRLabsOrg/LettuceDetect","owner":"KRLabsOrg","description":"LettuceDetect is a hallucination detection framework for RAG applications.","archived":false,"fork":false,"pushed_at":"2025-03-24T13:27:35.000Z","size":2326,"stargazers_count":194,"open_issues_count":6,"forks_count":17,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-26T12:37:47.711Z","etag":null,"topics":["bert","hallucination-detection","hallucination-evaluation","information-extraction","nlp","python","pytorch","token-classification"],"latest_commit_sha":null,"homepage":"http://krlabs.eu/LettuceDetect/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KRLabsOrg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-05T19:00:49.000Z","updated_at":"2025-03-26T04:28:11.000Z","dependencies_parsed_at":null,"dependency_job_id":"9fa99ef0-14f8-4236-9282-1dea364b610f","html_url":"https://github.com/KRLabsOrg/LettuceDetect","commit_stats":null,"previous_names":["krlabsorg/lettucedetect"],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KRLabsOrg%2FLettuceDetect","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KRLabsOrg%2FLettuceDetect/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KRLabsOrg%2FLettuceDetect/releases","manifests_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KRLabsOrg%2FLettuceDetect/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KRLabsOrg","download_url":"https://codeload.github.com/KRLabsOrg/LettuceDetect/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246580468,"owners_count":20800111,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","hallucination-detection","hallucination-evaluation","information-extraction","nlp","python","pytorch","token-classification"],"created_at":"2025-04-01T04:02:24.650Z","updated_at":"2025-04-01T04:02:30.227Z","avatar_url":"https://github.com/KRLabsOrg.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话","Python"],"sub_categories":["大语言对话模型及数据"],"readme":"# LettuceDetect 🥬🔍\n\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/KRLabsOrg/LettuceDetect/blob/main/assets/lettuce_detective.png?raw=true\" alt=\"LettuceDetect Logo\" width=\"400\"/\u003e\n  \u003cbr\u003e\u003cem\u003eBecause even AI needs a reality check! 🥬\u003c/em\u003e\n\u003c/p\u003e\n\nLettuceDetect is a lightweight and efficient tool for detecting hallucinations in Retrieval-Augmented Generation (RAG) systems. It identifies unsupported parts of an answer by comparing it to the provided context. 
The tool is trained and evaluated on the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) dataset and leverages [ModernBERT](https://github.com/AnswerDotAI/ModernBERT) for long-context processing, making it well suited to tasks that require long context windows.\n\nOur models are inspired by the [Luna](https://aclanthology.org/2025.coling-industry.34/) paper, which presents an encoder-based model that uses a similar token-level approach.\n\n[![PyPI](https://img.shields.io/pypi/v/lettucedetect)](https://pypi.org/project/lettucedetect/)\n[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Hugging Face](https://img.shields.io/badge/🤗-Models-yellow.svg)](https://huggingface.co/KRLabsOrg)\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Ubca5aMaBGdHtJ1rpqj3Ke9SLEr-PaDn?usp=sharing)\n[![arXiv](https://img.shields.io/badge/arXiv-2502.17125-b31b1b.svg)](https://arxiv.org/abs/2502.17125)\n\n## Highlights\n\n- LettuceDetect addresses two critical limitations of existing hallucination detection models:\n  - Context window constraints of traditional encoder-based methods\n  - Computational inefficiency of LLM-based approaches\n- Our models currently **outperform** all other encoder-based and prompt-based models on the RAGTruth dataset and are significantly faster and smaller\n- They achieve a higher score than some fine-tuned LLMs, e.g. 
LLAMA-2-13B presented in [RAGTruth](https://aclanthology.org/2024.acl-long.585/), coming up just short of the LLM fine-tuned in the [RAG-HAT paper](https://aclanthology.org/2024.emnlp-industry.113.pdf)\n- We release the code, the model and the tool under the **MIT license**\n\n## Get going\n\n### Features\n\n- ✨ **Token-level precision**: detect exact hallucinated spans\n- 🚀 **Optimized for inference**: smaller model size and faster inference\n- 🧠 **4K context window** via ModernBERT\n- ⚖️ **MIT-licensed** models \u0026 code\n- 🤖 **HF Integration**: one-line model loading\n- 📦 **Easy-to-use Python API**: install from pip and integrate into your RAG system with a few lines of code\n\n### Installation\n\nInstall from the repository:\n```bash\npip install -e .\n```\n\nFrom pip:\n```bash\npip install lettucedetect\n```\n\n### Quick Start\n\nCheck out our models published on Hugging Face:\n- lettucedetect-base: https://huggingface.co/KRLabsOrg/lettucedect-base-modernbert-en-v1\n- lettucedetect-large: https://huggingface.co/KRLabsOrg/lettucedect-large-modernbert-en-v1\n\nYou can get started right away with just a few lines of code.\n\n```python\nfrom lettucedetect.models.inference import HallucinationDetector\n\n# For a transformer-based approach:\ndetector = HallucinationDetector(\n    method=\"transformer\", model_path=\"KRLabsOrg/lettucedect-base-modernbert-en-v1\"\n)\n\ncontexts = [\"France is a country in Europe. The capital of France is Paris. The population of France is 67 million.\",]\nquestion = \"What is the capital of France? What is the population of France?\"\nanswer = \"The capital of France is Paris. 
The population of France is 69 million.\"\n\n# Get span-level predictions indicating which parts of the answer are considered hallucinated.\npredictions = detector.predict(context=contexts, question=question, answer=answer, output_format=\"spans\")\nprint(\"Predictions:\", predictions)\n\n# Predictions: [{'start': 31, 'end': 71, 'confidence': 0.9944414496421814, 'text': ' The population of France is 69 million.'}]\n```\n\n## Performance\n\n**Example-level results**\n\nWe evaluate our model on the test set of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) dataset. Our large model, **lettucedetect-large-v1**, achieves an overall F1 score of 79.22%, outperforming prompt-based methods like GPT-4 (63.4%) and encoder-based models like [Luna](https://aclanthology.org/2025.coling-industry.34.pdf) (65.4%). It also surpasses fine-tuned LLAMA-2-13B (78.7%) (presented in [RAGTruth](https://aclanthology.org/2024.acl-long.585/)) and is competitive with the SOTA fine-tuned LLAMA-3-8B (83.9%) (presented in the [RAG-HAT paper](https://aclanthology.org/2024.emnlp-industry.113.pdf)). Overall, **lettucedetect-large-v1** and **lettucedetect-base-v1** perform strongly while remaining efficient at inference time.\n\nThe example-level results are shown in the table below.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/KRLabsOrg/LettuceDetect/blob/main/assets/example_level_lettucedetect.png?raw=true\" alt=\"Example-level Results\" width=\"800\"/\u003e\n\u003c/p\u003e\n\n**Span-level results**\n\nAt the span level, our model achieves the best scores across all data types, significantly outperforming previous models. The results can be seen in the table below. 
Note that we do not compare against models such as [RAG-HAT](https://aclanthology.org/2024.emnlp-industry.113.pdf) here, since they report no span-level evaluation.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/KRLabsOrg/LettuceDetect/blob/main/assets/span_level_lettucedetect.png?raw=true\" alt=\"Span-level Results\" width=\"800\"/\u003e\n\u003c/p\u003e\n\n\n## How does it work?\n\nLettuceDetect is a token-classification model: given the context, the question and the answer, it predicts for each answer token whether that token is hallucinated.\n\n```mermaid\nflowchart LR\n    subgraph Inputs\n        Context[\"**Context**: France is a country in Europe. Population is 67 million.\"]\n        Question[\"**Question**: What is the capital? What is the population?\"]\n        Answer[\"**Answer**: The capital of France is Paris. The population is 69 million.\"]\n    end\n\n    Model[\"**LettuceDetect**: Token Classification\"]\n    Tokens[\"**Token Probabilities**: \u003cbr\u003e ... \u003cbr\u003e The [0.01] \u003cbr\u003e population [0.02] \u003cbr\u003e is [0.01] \u003cbr\u003e 69 [0.95] \u003cbr\u003e million [0.95]\"]\n\n    Context --\u003e Model\n    Question --\u003e Model\n    Answer --\u003e Model\n    Model --\u003e Tokens\n\n```\n\n### Training a Model\n\nFirst download the RAGTruth dataset from [here](https://github.com/ParticleMedia/RAGTruth/tree/main/dataset) and put it under the `data/ragtruth` directory. 
Then run:\n\n```bash\npython lettucedetect/preprocess/preprocess_ragtruth.py --input_dir data/ragtruth --output_dir data/ragtruth\n```\n\nThis creates a `data/ragtruth/ragtruth_data.json` file containing the processed data.\n\nThen you can train the model with the following command.\n\n```bash\npython scripts/train.py \\\n    --ragtruth-path data/ragtruth/ragtruth_data.json \\\n    --model-name answerdotai/ModernBERT-base \\\n    --output-dir output/hallucination_detector \\\n    --batch-size 4 \\\n    --epochs 6 \\\n    --learning-rate 1e-5\n```\n\nWe trained our models for 6 epochs with a batch size of 8 on a single A100 GPU.\n\n### Evaluation\n\nYou can evaluate the models on each level (example, token and span) and each data type.\n\n```bash\npython scripts/evaluate.py \\\n    --model_path output/hallucination_detector \\\n    --data_path data/ragtruth/ragtruth_data.json \\\n    --evaluation_type example_level\n```\n\n### Model Output Format\n\nThe model can output predictions in two formats:\n\n#### Span Format\n```python\n[{\n    'text': str,        # The hallucinated text\n    'start': int,       # Start position in answer\n    'end': int,         # End position in answer\n    'confidence': float # Model's confidence (0-1)\n}]\n```\n\n#### Token Format\n```python\n[{\n    'token': str,       # The token\n    'pred': int,        # 0: supported, 1: hallucinated\n    'prob': float       # Model's confidence (0-1)\n}]\n```\n\n## Streamlit Demo\n\nCheck out the Streamlit demo to see the model in action.\n\nInstall streamlit:\n\n```bash\npip install streamlit\n```\n\nRun the demo:\n\n```bash\nstreamlit run demo/streamlit_demo.py\n```\n\n## Use the Web API\n\nLettuceDetect comes with its own web API and Python client library. 
To use it, make sure to install the package with the optional API dependencies:\n\n```bash\npip install -e .[api]\n```\n\nor\n\n```bash\npip install lettucedetect[api]\n```\n\nStart the API server with the `scripts/start_api.py` script:\n\n```bash\npython scripts/start_api.py dev  # use \"prod\" for production environments\n```\n\nUsage:\n\n```bash\nusage: start_api.py [-h] [--model MODEL] [--method {transformer}] {prod,dev}\n\nStart lettucedetect Web API.\n\npositional arguments:\n  {prod,dev}            Choose \"dev\" for development or \"prod\" for production\n                        environments. The serve script uses \"fastapi dev\" for \"dev\" or\n                        \"fastapi run\" for \"prod\" to start the web server. Additionally\n                        when choosing the \"dev\" mode, python modules can be directly\n                        imported from the repositroy without installing the package.\n\noptions:\n  -h, --help            show this help message and exit\n  --model MODEL         Path or huggingface URL to the model. The default value is\n                        \"KRLabsOrg/lettucedect-base-modernbert-en-v1\".\n  --method {transformer}\n                        Hallucination detection method. The default value is\n                        \"transformer\".\n```\n\nExample using the Python client library:\n\n```python\nfrom lettucedetect_api.client import LettuceClient\n\ncontexts = [\n    \"France is a country in Europe. \"\n    \"The capital of France is Paris. \"\n    \"The population of France is 67 million.\",\n]\nquestion = \"What is the capital of France? What is the population of France?\"\nanswer = \"The capital of France is Paris. 
The population of France is 69 million.\"\n\nclient = LettuceClient(\"http://127.0.0.1:8000\")\nresponse = client.detect_spans(contexts, question, answer)\nprint(response.predictions)\n\n# [SpanDetectionItem(start=31, end=71, text=' The population of France is 69 million.', hallucination_score=0.989198625087738)]\n```\n\nSee `demo/detection_api.ipynb` for more examples.\nFor async support use the `LettuceClientAsync` class instead.\n\n## License\n\nMIT License - see LICENSE file for details.\n\n## Citation\n\nPlease cite the following paper if you use LettuceDetect in your work:\n\n```bibtex\n@misc{Kovacs:2025,\n      title={LettuceDetect: A Hallucination Detection Framework for RAG Applications}, \n      author={Ádám Kovács and Gábor Recski},\n      year={2025},\n      eprint={2502.17125},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https://arxiv.org/abs/2502.17125}, \n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FKRLabsOrg%2FLettuceDetect","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FKRLabsOrg%2FLettuceDetect","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FKRLabsOrg%2FLettuceDetect/lists"}