{"id":26720863,"url":"https://github.com/kaos599/betterrag","last_synced_at":"2025-03-27T19:34:43.246Z","repository":{"id":284600573,"uuid":"949962703","full_name":"Kaos599/BetterRAG","owner":"Kaos599","description":"BetterRAG: Powerful RAG evaluation toolkit for LLMs. Measure, analyze, and optimize how your AI processes text chunks with precision metrics. Perfect for RAG systems, document processing, and embedding quality assessment.","archived":false,"fork":false,"pushed_at":"2025-03-26T17:29:01.000Z","size":107,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-26T18:33:00.083Z","etag":null,"topics":["chunking-optimization","embeddings","embeddings-extraction","embeddings-optimization","evaluation","evaluation-framework","optimization","rag","rag-application","rag-evaluation","rag-optimization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Kaos599.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-17T12:17:38.000Z","updated_at":"2025-03-26T17:29:05.000Z","dependencies_parsed_at":"2025-03-26T18:44:53.407Z","dependency_job_id":null,"html_url":"https://github.com/Kaos599/BetterRAG","commit_stats":null,"previous_names":["kaos599/betterrag"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kaos599%2FBetterRAG","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kaos599%2FBetterRAG/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kaos599%2FBetterRAG/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kaos599%2FBetterRAG/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Kaos599","download_url":"https://codeload.github.com/Kaos599/BetterRAG/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245911510,"owners_count":20692602,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chunking-optimization","embeddings","embeddings-extraction","embeddings-optimization","evaluation","evaluation-framework","optimization","rag","rag-application","rag-evaluation","rag-optimization"],"created_at":"2025-03-27T19:34:42.708Z","updated_at":"2025-03-27T19:34:43.229Z","avatar_url":"https://github.com/Kaos599.png","language":"Python","readme":"\u003c!-- BetterRAG Logo Banner --\u003e\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/user-attachments/assets/2d68ab1c-2962-4429-ad87-f91f00a08160\" alt=\"BetterRAG Logo\" width=\"700px\"\u003e\n  \u003ch1\u003eBetterRAG\u003c/h1\u003e\n  \u003cp\u003e\u003cstrong\u003e🚀 Supercharge your RAG pipeline with optimized text chunking\u003c/strong\u003e\u003c/p\u003e\n  \n  [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)\n  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n  [![Contributions Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)\n  [![MongoDB](https://img.shields.io/badge/MongoDB-4EA94B?logo=mongodb\u0026logoColor=white)](https://www.mongodb.com/)\n  [![Dashboard](https://img.shields.io/badge/Dash-Interactive-blue?logo=plotly\u0026logoColor=white)](https://dash.plotly.com/)\n\u003c/div\u003e\n\n## ✨ Overview\n \n**BetterRAG** helps you find the optimal text chunking strategy for your Retrieval-Augmented Generation pipeline through rigorous, data-driven evaluation. Stop guessing which chunking method works best—measure it!\n\n\u003cdiv align=\"center\"\u003e\n  \u003ctable\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"center\"\u003e📊 \u003cb\u003eCompare Strategies\u003c/b\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e⚙️ \u003cb\u003eZero-Code Configuration\u003c/b\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e📈 \u003cb\u003eInteractive Dashboard\u003c/b\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/table\u003e\n\u003c/div\u003e\n\n## 🔎 Why BetterRAG?\n\nText chunking can make or break your RAG system's performance. Different strategies yield dramatically different results, but the optimal approach depends on your specific documents and use case. BetterRAG provides:\n\n- **Quantitative comparison** between chunking strategies\n- **Visualized metrics** to understand performance differences\n- **Clear recommendations** based on real data\n- **No coding required** to evaluate and improve your pipeline\n\n## 🛠️ Features\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd width=\"50%\"\u003e\n      \u003ch3\u003e🧩 Multiple Chunking Strategies\u003c/h3\u003e\n      \u003cul\u003e\n        \u003cli\u003e\u003cb\u003eFixed-size chunking\u003c/b\u003e: Simple token-based splitting\u003c/li\u003e\n        \u003cli\u003e\u003cb\u003eRecursive chunking\u003c/b\u003e: Follows document hierarchy\u003c/li\u003e\n        \u003cli\u003e\u003cb\u003eSemantic chunking\u003c/b\u003e: Preserves meaning and context\u003c/li\u003e\n      \u003c/ul\u003e\n    \u003c/td\u003e\n    \u003ctd width=\"50%\"\u003e\n      \u003ch3\u003e🤖 LLM Integration\u003c/h3\u003e\n      \u003cul\u003e\n        \u003cli\u003eAzure OpenAI compatibility\u003c/li\u003e\n        \u003cli\u003eGoogle Gemini support\u003c/li\u003e\n        \u003cli\u003eExtensible for other models\u003c/li\u003e\n      \u003c/ul\u003e\n    \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\n      \u003ch3\u003e📊 Comprehensive Metrics\u003c/h3\u003e\n      \u003cul\u003e\n        \u003cli\u003eContext precision\u003c/li\u003e\n        \u003cli\u003eToken efficiency\u003c/li\u003e\n        \u003cli\u003eAnswer relevance\u003c/li\u003e\n        \u003cli\u003eLatency measurement\u003c/li\u003e\n      \u003c/ul\u003e\n    \u003c/td\u003e\n    \u003ctd\u003e\n      \u003ch3\u003e💾 Persistent Storage\u003c/h3\u003e\n      \u003cul\u003e\n        \u003cli\u003eMongoDB integration\u003c/li\u003e\n        \u003cli\u003eReuse embeddings across evaluations\u003c/li\u003e\n        \u003cli\u003eCache results for faster iteration\u003c/li\u003e\n      \u003c/ul\u003e\n    \u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n## 🚀 Quick Start\n\n### Prerequisites\n\n- Python 3.8+\n- MongoDB (local or remote)\n- API keys for Azure OpenAI and/or Google Gemini\n\n### Installation in 3 Steps\n\n```bash\n# 1. Clone the repository\ngit clone https://github.com/yourusername/betterrag.git\ncd betterrag\n\n# 2. Install dependencies\npip install -r requirements.txt\n\n# 3. Set up your configuration\ncp config.template.yaml config.yaml\n# Edit config.yaml with your API keys and preferences\n```\n\n### Running Your First Evaluation\n\n```bash\n# Add your documents to data/documents/\n\n# Run the evaluation\npython -m app.main\n\n# View the interactive dashboard\n# Default: http://127.0.0.1:8050/\n```\n\n## 📊 Sample Results\n\nBetterRAG provides clear visual comparisons between chunking strategies:\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://via.placeholder.com/800x400?text=Chunking+Strategy+Comparison+Chart\" alt=\"Comparison Chart\" width=\"80%\"\u003e\n\u003c/div\u003e\n\nBased on comprehensive metrics, BetterRAG will recommend the most effective chunking approach for your specific documents and queries.\n\n## ⚙️ Configuration Options\n\nBetterRAG uses a single YAML configuration file for all settings:\n\n```yaml\n# Chunking strategies to evaluate\nchunking:\n  fixed_size:\n    enabled: true\n    chunk_size: 500\n    chunk_overlap: 50\n  \n  recursive:\n    enabled: true\n    chunk_size: 1000\n    separators: [\"\\n\\n\", \"\\n\", \" \", \"\"]\n  \n  semantic:\n    enabled: true\n    model: \"all-MiniLM-L6-v2\"\n\n# API credentials (or use environment variables)\napi:\n  azure_openai:\n    api_key: ${AZURE_OPENAI_API_KEY}\n    endpoint: ${AZURE_OPENAI_ENDPOINT}\n```\n\nSee [config_setup.md](config_setup.md) for detailed configuration instructions.\n\n## 🔧 Advanced Usage\n\n```bash\n# Run dashboard only (using previously processed data)\npython -m app.main --dashboard-only\n\n# Reset database before processing\npython -m app.main --reset-db\n\n# Use custom config file\npython -m app.main --config my_custom_config.yaml\n```\n\n## 🛠️ Extending BetterRAG\n\n### Adding a New Chunking Strategy\n\n1. Create a new chunker implementation in `app/chunkers/`\n2. Register it in `app/chunkers/__init__.py`\n3. Add configuration parameters in `config.yaml`\n\n### Custom Metrics\n\nExtend the `ChunkingEvaluator` class in `app/evaluation/metrics.py` to add new metrics.\n\n## 🤝 Contributing\n\nContributions are welcome! Feel free to:\n\n- Report bugs and issues\n- Suggest new features or enhancements\n- Add support for additional LLM providers\n- Implement new chunking strategies\n\n## 📜 License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n\n---\n\n\u003cdiv align=\"center\"\u003e\n  \u003cp\u003eBuilt with ❤️ for the RAG community\u003c/p\u003e\n  \u003cp\u003e\n    \u003ca href=\"https://github.com/Kaos599/betterrag/issues\"\u003eReport Bug\u003c/a\u003e\n    ·\n    \u003ca href=\"https://github.com/Kaos599/betterrag/issues\"\u003eRequest Feature\u003c/a\u003e\n  \u003c/p\u003e\n\u003c/div\u003e\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkaos599%2Fbetterrag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkaos599%2Fbetterrag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkaos599%2Fbetterrag/lists"}