{"id":18429088,"url":"https://github.com/lancedb/ragged","last_synced_at":"2025-07-30T15:04:21.055Z","repository":{"id":239899898,"uuid":"799605968","full_name":"lancedb/ragged","owner":"lancedb","description":null,"archived":false,"fork":false,"pushed_at":"2024-10-14T10:38:38.000Z","size":70,"stargazers_count":19,"open_issues_count":0,"forks_count":3,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-07T17:41:40.580Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lancedb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-12T16:31:15.000Z","updated_at":"2025-03-27T06:07:54.000Z","dependencies_parsed_at":"2024-05-20T10:47:36.292Z","dependency_job_id":"8be0c7bf-c391-4b07-a6b3-f5e8cf33a600","html_url":"https://github.com/lancedb/ragged","commit_stats":null,"previous_names":["lancedb/ragged"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lancedb/ragged","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Fragged","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Fragged/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Fragged/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Fragged/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lancedb","download_url":"https://codeload.github.com/lancedb/ragged/tar.gz/refs/heads/ma
in","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Fragged/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267889175,"owners_count":24161144,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-30T02:00:09.044Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T05:15:47.238Z","updated_at":"2025-07-30T15:04:21.005Z","avatar_url":"https://github.com/lancedb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Ragged\n\nSimple utilities for piece-wise evaluation of LLM based chat and retrieval systems\n\n### Setup\nBuild from source\n```\npip install -e .\n```\n\n## GUI quickstart \n### VectorDB retrieval eval\n```\nragged --quickstart vectordb\n```\n\u003cdetails open\u003e\n  \u003csummary\u003eDemo\u003c/summary\u003e\n  \u003cimg src=\"https://github.com/lancedb/ragged/assets/15766192/ab1313ef-04f5-461e-8429-28a6d0bdc13c\" width=550 height=600 /\u003e\n\n\u003c/details\u003e\n\n### Dataset Quality eval [Coming soon]\n\n### End-to-End RAG eval [Coming soon]\n\n\n## API Usage\n### VectorDB retrieval eval\n```python\nfrom ragged.dataset import LlamaIndexDataset\nfrom ragged.metrics.retriever import HitRate\nfrom ragged.search_utils import QueryType\nfrom lancedb.rerankers import CrossEncoderReranker\n\n# 1. 
Select dataset\n# Automatically download the dataset from llama-hub or pass existing path=\"/path/to/dataset\"\ndataset = LlamaIndexDataset(\"Uber10KDataset2021\")\n\n# 2. Select eval metrics\nhit_rate = HitRate(\n            dataset,\n            embedding_registry_id=\"sentence-transformers\",\n            embed_model_kwarg={\"name\":\"BAAI/bge-small-en-v1.5\"},\n            reranker=CrossEncoderReranker(),\n            )\n\n# 3. Evaluate on desired query types\n\n#print(hit_rate.evaluate(top_k=5, query_type=QueryType.VECTOR)) # Evaluate vector search\nprint(hit_rate.evaluate(top_k=5, query_type=\"all\")) # Evaluate all possible query types\n```\n### Evaluate across various query types and Rerankers\n```python\nfrom ragged.dataset import CSVDataset, SquadDataset\nfrom ragged.rag import llamaIndexRAG\nfrom ragged.metrics.retriever.hit_rate import HitRate\nfrom lancedb.rerankers import LinearCombinationReranker\nfrom ragged.search_utils import QueryType\nimport wandb\n\ndataset = SquadDataset()\nreranker = LinearCombinationReranker()\nhit_rate = HitRate(dataset, embedding_registry_id=\"sentence-transformers\", embed_model_kwarg={\"name\": \"tuned_model_4\", \"device\": \"cuda\"})\n\nquery_types = [QueryType.VECTOR]\nuse_existing_table = False\nfor query_type in query_types:\n    run = wandb.init(project=\"ragged_bench\", name=f\"Base_4\")\n    hr = hit_rate.evaluate(5, query_type=query_type, use_existing_table=use_existing_table)\n    run.log({f\"{query_type}\": hr.model_dump()[f\"{query_type}\"]})\n    use_existing_table = True\n\nwandb.finish()\n```\n\n### Generate a custom semantic search dataset\nMost popular toy datasets are not semantically challenging enough to evaluate the performance of LLM-based retrieval systems. Most of them work well with simple BM25-based retrieval systems. 
To generate a custom dataset that is semantically challenging, you can use the following code snippet.\nNOTE: `directory` can contain PDFs, txt files, or any other file format that the Llama-index directory reader can handle.\n```python\nfrom ragged.dataset.gen.gen_retrieval_data import gen_query_context_dataset\nfrom ragged.dataset.gen.llm_calls import OpenAIInferenceClient\n\nclient = OpenAIInferenceClient()\ndf = gen_query_context_dataset(directory=\"data/source_files\", inference_client=client)\n\nprint(df.head())\n# save the dataframe\ndf.to_csv(\"data.csv\")\n```\n\nNow, you can evaluate this dataset using the `ragged --quickstart vectordb` GUI or via the API:\n```python\nfrom ragged.dataset.csv import CSVDataset\nfrom ragged.metrics.retriever import HitRate\nfrom lancedb.rerankers import CohereReranker\n\ndata = CSVDataset(path=\"data.csv\")\nreranker = CohereReranker()\n\nhit_rate = HitRate(data, reranker=reranker, embedding_registry_id=\"openai\", embed_model_kwarg={\"model\":\"text-embedding-3-small\"})\nres = hit_rate.evaluate(top_k=5, query_type=\"all\")\nprint(res)\n```\n\n### Dataset Quality eval [Coming soon]\n\n### End-to-End RAG eval [Coming soon]\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancedb%2Fragged","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flancedb%2Fragged","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancedb%2Fragged/lists"}