{"id":50668345,"url":"https://github.com/humankernel/rag-eval","last_synced_at":"2026-06-08T08:09:16.952Z","repository":{"id":288852648,"uuid":"953619798","full_name":"humankernel/rag-eval","owner":"humankernel","description":"Create synthetic datasets for RAG evaluation","archived":false,"fork":false,"pushed_at":"2025-11-03T23:23:03.000Z","size":1742,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-04T01:16:56.189Z","etag":null,"topics":["gradio","rag-metrics","ragas-evaluation","vllm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/humankernel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-03-23T19:12:38.000Z","updated_at":"2025-11-03T23:23:39.000Z","dependencies_parsed_at":"2025-04-20T01:26:20.420Z","dependency_job_id":"27b86e76-c52f-4ac2-b8e2-abdd420498da","html_url":"https://github.com/humankernel/rag-eval","commit_stats":null,"previous_names":["humankernel/rag-eval"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/humankernel/rag-eval","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/humankernel%2Frag-eval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/humankernel%2Frag-eval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/humankernel%2Frag-eval/releases","manifests_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/humankernel%2Frag-eval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/humankernel","download_url":"https://codeload.github.com/humankernel/rag-eval/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/humankernel%2Frag-eval/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34053650,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-08T02:00:07.615Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gradio","rag-metrics","ragas-evaluation","vllm"],"created_at":"2026-06-08T08:07:44.944Z","updated_at":"2026-06-08T08:09:16.937Z","avatar_url":"https://github.com/humankernel.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🧠 WikiQA Dataset Creator\n\n![screenshot](paper/ui.png)\n\n**WikiQA** is a tool for generating **synthetic question–answer datasets** using **Wikipedia** and **Large Language Models (LLMs)**.\nIt was developed to support the evaluation of **Retrieval-Augmented Generation (RAG)** systems, particularly [this RAG evaluator](https://github.com/humankernel/rag-revamped).\n\n## 📚 Selected Wikipedia Topics\n\n### 🧮 Mathematics\n\n* [Prime Numbers](https://en.wikipedia.org/wiki/Prime_number)\n* [Linear 
Algebra](https://en.wikipedia.org/wiki/Linear_algebra)\n* [Calculus](https://en.wikipedia.org/wiki/Calculus)\n* [Probability](https://en.wikipedia.org/wiki/Probability)\n\n### 💻 Computer Science\n\n* [Algorithm](https://en.wikipedia.org/wiki/Algorithm)\n* [Data Structure](https://en.wikipedia.org/wiki/Data_structure)\n* [Artificial Intelligence](https://en.wikipedia.org/wiki/Artificial_intelligence)\n* [Computer Programming](https://en.wikipedia.org/wiki/Computer_programming)\n\n### 🧬 Biology\n\n* [Cell (biology)](https://en.wikipedia.org/wiki/Cell_%28biology%29)\n* [Genetics](https://en.wikipedia.org/wiki/Genetics)\n* [Evolution](https://en.wikipedia.org/wiki/Evolution)\n* [Ecology](https://en.wikipedia.org/wiki/Ecology)\n\n### ⚛️ Physics\n\n* [Classical Mechanics](https://en.wikipedia.org/wiki/Classical_mechanics)\n* [Electromagnetism](https://en.wikipedia.org/wiki/Electromagnetism)\n* [Quantum Mechanics](https://en.wikipedia.org/wiki/Quantum_mechanics)\n* [Thermodynamics](https://en.wikipedia.org/wiki/Thermodynamics)\n\n### 🌍 General Topics\n\n* [Batman](https://en.wikipedia.org/wiki/Batman)\n* [Dachshund](https://en.wikipedia.org/wiki/Dachshund)\n* [Conspiracy Theory](https://en.wikipedia.org/wiki/Conspiracy_theory)\n* [Religion](https://en.wikipedia.org/wiki/Religion)\n\n## 🧩 Question Types\n\nEach dataset entry belongs to one of several **cognitive and reasoning categories**, enabling targeted evaluation of RAG models:\n\n1. ✅ **Factual** – objective, verifiable facts.\n2. 🔗 **Multi-Hop** – multi-step reasoning or combined facts.\n3. 🧠 **Semantic** – interpretation and meaning of concepts.\n4. ⚙️ **Logical Reasoning** – applying formal rules or laws.\n5. 💡 **Creative Thinking** – open-ended or hypothetical reasoning.\n6. 📏 **Problem-Solving** – applying formulas or methods to compute results.\n7. 
⚖️ **Ethical \u0026 Philosophical** – moral or conceptual reflection on science.\n\nEach question type is designed to stress different aspects of retrieval and generation in RAG systems.\n\n## 📊 Evaluation Metrics\n\nAlthough WikiQA only generates datasets, it is designed around **RAG evaluation metrics** (see [Key Metrics and Evaluation Methods for RAG](https://www.youtube.com/watch?v=cRz0BWkuwHg)).\n\n### 🔍 Retrieval Metrics\n\n| Metric                                           | Measures                    | Description                                                  |\n| ------------------------------------------------ | --------------------------- | ------------------------------------------------------------ |\n| **Precision**                                    | Relevance of retrieved docs | Fraction of retrieved documents that are relevant            |\n| **Recall**                                       | Coverage of relevant docs   | Fraction of relevant documents that were retrieved           |\n| **Hit Rate**                                     | Top-k success               | % of queries retrieving ≥1 relevant doc in top-k             |\n| **MRR (Mean Reciprocal Rank)**                   | Rank of first relevant doc  | Mean over queries of 1 / rank of the first relevant doc      |\n| **NDCG (Normalized Discounted Cumulative Gain)** | Ranking quality             | Rank-discounted relevance, normalized by the ideal ranking   |\n| **MAP (Mean Average Precision)**                 | Overall retrieval accuracy  | Mean over queries of precision averaged at each relevant hit |\n\n### ✍️ Generation Metrics\n\n| Metric                 | Measures                              | Example                                              |\n| ---------------------- | ------------------------------------- | ---------------------------------------------------- |\n| **Faithfulness**       | Factual consistency with context      | “Einstein was born in Germany on March 14, 1879.”    |\n| **Answer Relevance**   | How well the answer fits the question | Q: “Capital of France?” → A: “Paris”                 |\n| **Answer Correctness** | Alignment with ground truth           | Matches true reference answer accurately             |\n\n## ⚙️ Use Cases\n\nThis tool can be used to:\n\n* Build **synthetic QA datasets** for RAG benchmark testing.\n* Evaluate the **retrieval** and **generation** quality of LLM-based systems.\n* Train or fine-tune **retrieval models** on domain-specific scientific content.\n\n## 🧠 Related Projects\n\n* 🔗 **RAG Evaluator:** [humankernel/rag-revamped](https://github.com/humankernel/rag-revamped)\n* 🧾 **Undergraduate Thesis:** [humankernel/thesis](https://humankernel.github.io/thesis/main.pdf)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhumankernel%2Frag-eval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhumankernel%2Frag-eval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhumankernel%2Frag-eval/lists"}