{"id":25856551,"url":"https://github.com/krlabsorg/ragfactchecker","last_synced_at":"2026-06-14T07:32:07.191Z","repository":{"id":274075530,"uuid":"921129374","full_name":"KRLabsOrg/RAGFactChecker","owner":"KRLabsOrg","description":null,"archived":false,"fork":false,"pushed_at":"2025-08-30T12:40:53.000Z","size":150,"stargazers_count":7,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-03-18T12:59:40.863Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KRLabsOrg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-01-23T11:48:02.000Z","updated_at":"2026-01-15T21:33:43.000Z","dependencies_parsed_at":"2025-07-26T22:23:49.385Z","dependency_job_id":"2a334faa-90e7-4b32-9570-33da1e4f7dc4","html_url":"https://github.com/KRLabsOrg/RAGFactChecker","commit_stats":null,"previous_names":["krlabsorg/ragfactchecker"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/KRLabsOrg/RAGFactChecker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KRLabsOrg%2FRAGFactChecker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KRLabsOrg%2FRAGFactChecker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KRLabsOrg%2FRAGFactChecker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KRLabsOrg%2FRAGFactChecker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/owners/KRLabsOrg","download_url":"https://codeload.github.com/KRLabsOrg/RAGFactChecker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KRLabsOrg%2FRAGFactChecker/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34313515,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-14T02:00:07.365Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-01T18:18:59.461Z","updated_at":"2026-06-14T07:32:07.179Z","avatar_url":"https://github.com/KRLabsOrg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Rag Fact Checking System\n\nA Python library for validating the factual accuracy of Large Language Model (LLM) responses against their source documents in Retrieval-Augmented Generation (RAG) systems. 
The raw text inputs are converted into triplets (subject, predicate, object) that represent the sentences, and these triplets are then fact-checked against the reference documents.\n\nThe library provides sentence-level fact-checking granularity and can pinpoint the exact incorrect triplet in an LLM response.\n\nAt a high level, it works as follows:\n- Generate the triplets from the LLM's answer\n- Generate the triplets from the reference documents\n- Compare the triplets from the LLM answer with the triplets from the reference documents\n- If a triplet from the LLM answer is not present in the reference documents, it is flagged as a hallucination\n\n\n**NB**: This fact-checking system was built and validated on the https://huggingface.co/datasets/rag-datasets/rag-mini-bioasq dataset, specifically on the Thyroid topic. There is no guarantee that it will work on other datasets; it may need further adjustments before being used with them.\n\n## Main Features\n\nThe library consists of several separate components:\n\n- **Triplet Generator**: Extracts factual relationships from the input text in the form of triplets, which consist of a subject, predicate, and object. This is done using LLMs.\n- **Fact Checker**: Compares these triplets with those from a reference text to determine if they match (true/false).\n- **Hallucination Generator**: An LLM-based component that generates hallucinated triplets from the reference documents. 
The purpose is to synthetically generate a dataset of hallucinated triplets.\n- **Answer-Based Hallucination Generator**: Generates controlled hallucinations by taking correct answers and injecting specific types of errors (factual, temporal, numerical, relational, contextual, omission) with configurable intensity levels.\n- **Batch Processing**: All components support concurrent processing of multiple items using threading for improved performance.\n- **Async Support**: Full asynchronous support with semaphore-based concurrency limiting for non-blocking operations.\n\n## Installation\n\nThe package is pip-installable. To install it, run:\n\n```bash\npip install git+https://github.com/KRLabsOrg/RAGFactChecker.git@main\n```\n\n## Usage\n\nThe sample below shows how to import and use the `LLMTripletValidator` class to run fact checking. The inputs are:\n- `input_text`: The answer from the LLM\n- `reference_text`: The reference documents that were fed to the LLM by the RAG system\n\n```python\nimport os\nfrom rag_fact_checker import LLMTripletValidator\napi_key = os.getenv(\"OPENAI_API_KEY\", \"your_openai_api_key\")\n\ntriplet_validator = LLMTripletValidator(\n    input_config={\"triplet_generator\": \"llm_n_shot\", \"fact_checker\": \"llm_n_shot\"},\n    openai_api_key=api_key\n)\n\nresults = triplet_validator.validate_llm_triplets(\n    input_text=\"The sky is green\",\n    reference_text=[\"The sky is blue and the grass is green\"]\n)\n```\n\nOutput:\n```\nDirectTextMatchOutput(\n    input_triplets=[['The sky', 'is', 'green']],\n    reference_triplets=[['The sky', 'is', 'blue'], ['The grass', 'is', 'green']],\n    fact_check_prediction_binary={0: False} # the input triplet at index 0 is incorrect\n)\n```\n\n\n## Additional usages\n\nBesides fact checking, the library can also be used for information extraction: given some text, it can extract the triplets from it 
(all using LLMs).\n\n```python\nimport os\nfrom rag_fact_checker import LLMTripletValidator\napi_key = os.getenv(\"OPENAI_API_KEY\", \"your_openai_api_key\")\n\ntriplet_validator = LLMTripletValidator(\n    input_config={\"triplet_generator\": \"llm_n_shot\", \"fact_checker\": \"llm_n_shot\"},\n    openai_api_key=api_key\n)\n\nresults = triplet_validator.triplet_generation(\n    input_text=\"The sky is green and the Eiffel Tower is blue. The Eiffel Tower is in Paris.\"\n)\n```\n\nAnother use case is generating synthetic hallucinated data.\n\n```python\nimport os\nfrom rag_fact_checker import LLMTripletValidator\napi_key = os.getenv(\"OPENAI_API_KEY\", \"your_openai_api_key\")\n\ntriplet_validator = LLMTripletValidator(\n    input_config={\"triplet_generator\": \"llm_n_shot\", \"fact_checker\": \"llm_n_shot\"},\n    openai_api_key=api_key\n)\n\nresults = triplet_validator.generate_hlcntn_data(\n    question=\"Which genes does thyroid hormone receptor beta1 regulate in the liver?\",\n    reference_text=[\"The carbohydrate response element-binding protein (ChREBP) and sterol response element-binding protein (SREBP)-1c, regulated by liver X receptors (LXRs), play central roles in hepatic lipogenesis. Because LXRs and thyroid hormone receptors (TRs) influence each other's transcriptional activity, researchers investigated whether TRs control ChREBP expression. 
They found that thyroid hormone (T3) and TR-beta1 upregulate ChREBP by binding direct repeat-4 elements (LXRE1/2), thereby fine-tuning hepatic lipid metabolism.\"]\n)\n```\n\n### Answer-Based Hallucination Generation\n\nTo generate controlled hallucinations from correct answers:\n\n```python\nimport logging\n\nfrom rag_fact_checker.model.hallucination_data_generator import AnswerBasedHallucinationDataGenerator\nfrom rag_fact_checker.data import Config, ErrorType\n\nconfig = Config()\nlogger = logging.getLogger(__name__)\ngenerator = AnswerBasedHallucinationDataGenerator(config, logger)\n\n# Generate a hallucination with a specific error type and intensity\nresult = generator.generate_hallucination(\n    correct_answer=\"Paris is the capital of France and has a population of 2.1 million.\",\n    error_type=ErrorType.FACTUAL,\n    intensity=0.7\n)\nprint(result.hallucinated_answer)  # Answer with injected factual errors\n```\n\n### Batch Processing\n\nProcess multiple items concurrently:\n\n```python\n# triplet_generator and fact_checker are assumed to be already-initialized component instances\n\n# Batch triplet generation\ninput_texts = [\"Text 1\", \"Text 2\", \"Text 3\"]\nbatch_result = triplet_generator.forward_batch(input_texts)\n\n# Batch fact checking\nanswer_triplets_batch = [[[[\"Subject\", \"predicate\", \"object\"]], ...]]\nreference_triplets_batch = [[[[\"Ref\", \"predicate\", \"object\"]], ...]]\nfact_check_result = fact_checker.forward_batch(answer_triplets_batch, reference_triplets_batch)\n\n# Async batch processing\nimport asyncio\n\nasync def process_async():\n    result = await triplet_generator.forward_batch_async(input_texts)\n    return result\n\nasync_result = asyncio.run(process_async())\n```\n\n\n## Configuration\n\nWhen creating the `LLMTripletValidator`, you can pass a dict to `input_config` to override the default settings.\nFor example, to customize model names, the logging level, etc.:\n\n```python\ncustom_config = {\n    \"model\": {\n        \"triplet_generator\": {\n            \"model_name\": \"llm_n_shot\"\n        },\n        \"fact_checker\": {\n            \"model_name\": \"llm\"\n        
}\n    },\n    \"logger_level\": \"DEBUG\"\n}\n```\n\n### Available Models\n- **Triplet generator**: \"llm\", \"llm_n_shot\" \n- **Fact checker**: \"llm\", \"llm_split\", \"llm_n_shot\", \"llm_n_shot_split\"\n- **Hallucination generator**: \"llm\", \"llm_n_shot\", \"answer_based\"\n- **Logger level**: \"DEBUG\", \"INFO\", \"WARNING\", \"ERROR\", \"CRITICAL\", \"NOTSET\"\n\n### Batch Processing Configuration\n```python\nbatch_config = {\n    \"simple_batch_config\": {\n        \"max_workers\": 5,          # Number of concurrent threads\n        \"max_concurrent\": 10,      # Max concurrent async operations\n        \"timeout\": 30.0           # Timeout per operation in seconds\n    }\n}\n```\n\n### Error Types for Answer-Based Hallucination\n- **FACTUAL**: Incorrect facts and information\n- **TEMPORAL**: Wrong dates, times, or temporal relationships  \n- **NUMERICAL**: Incorrect numbers, quantities, measurements\n- **RELATIONAL**: Wrong relationships between entities\n- **CONTEXTUAL**: Information out of context or misapplied\n- **OMISSION**: Missing critical information\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrlabsorg%2Fragfactchecker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkrlabsorg%2Fragfactchecker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrlabsorg%2Fragfactchecker/lists"}