{"id":29744214,"url":"https://github.com/sosanzma/rag-techniques-handbook","last_synced_at":"2026-05-20T07:31:11.629Z","repository":{"id":305652255,"uuid":"890327545","full_name":"sosanzma/rag-techniques-handbook","owner":"sosanzma","description":"A comprehensive guide to advanced RAG techniques including reranking, deep memory, and vector store optimization. Includes practical implementations and best practices using LlamaIndex, Langchain, etc..","archived":false,"fork":false,"pushed_at":"2025-07-21T08:57:30.000Z","size":75,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-21T10:31:50.950Z","etag":null,"topics":["deep-memory","llama-index","machine-learning","optimization","rag","reranking","vector-store"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sosanzma.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-11-18T11:37:07.000Z","updated_at":"2025-07-21T08:57:35.000Z","dependencies_parsed_at":"2025-07-21T10:31:54.278Z","dependency_job_id":"0505bb03-11e5-4a46-b7e6-718ad2ed80c1","html_url":"https://github.com/sosanzma/rag-techniques-handbook","commit_stats":null,"previous_names":["sosanzma/rag-techniques-handbook"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/sosanzma/rag-techniques-handbook","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sosanzma%2Frag-techniques-handbook","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sosanzma%2Frag-techniques-handbook/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sosanzma%2Frag-techniques-handbook/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sosanzma%2Frag-techniques-handbook/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sosanzma","download_url":"https://codeload.github.com/sosanzma/rag-techniques-handbook/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sosanzma%2Frag-techniques-handbook/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267118314,"owners_count":24038804,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-26T02:00:08.937Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-memory","llama-index","machine-learning","optimization","rag","reranking","vector-store"],"created_at":"2025-07-26T04:39:24.123Z","updated_at":"2026-05-20T07:31:11.621Z","avatar_url":"https://github.com/sosanzma.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RAG Techniques Handbook\n\n\u003e **🎯 Status**: Production-Ready | **📚 Complete Techniques**: 6 | **🔗 Integrated Docs**: 100%\n\n## Overview\n\nThis repository provides a comprehensive collection of **advanced RAG (Retrieval Augmented Generation) techniques** with production-ready implementations and seamlessly integrated documentation. Each technique includes both theoretical explanations and working Python code that you can immediately use in your projects.\n\n**Key Features:**\n- ✅ **Production-Ready Code**: All implementations are tested and optimized\n- 📖 **Integrated Documentation**: Theory and code seamlessly connected\n- 🚀 **Quick Start Guides**: Get running in 3 minutes with any technique\n- 🔧 **Configurable**: Extensive configuration options for customization\n- 📊 **Evaluation Framework**: Built-in metrics and performance assessment\n- 🤖 **Multiple LLM Support**: OpenAI, Cohere, and more\n\n## Currently Implemented Techniques\n\n### ✅ RAG Metrics \u0026 Evaluation\n**Complete evaluation framework with LlamaIndex and RAGAS integration**\n- **Features**: Retrieval metrics (MRR, Hit Rate), Generation quality (Faithfulness, Relevancy), RAGAS comprehensive metrics\n- **Classes**: `RAGEvaluationFramework`, `EvaluationConfig`\n- **Documentation**: [RAG Metrics \u0026 Evaluation Guide](docs/rag_metrics_evaluation_guide.md)\n- **Implementation**: [rag_metrics_evaluation_module.py](src/rag_metrics_evaluation_module.py)\n\n### ✅ Vector Store Index Implementation\n**Optimized vector store operations with DeepLake integration**\n- **Features**: Deep Memory optimization, Performance evaluation, Cohere reranking, Async operations\n- **Classes**: `VectorStoreManager`, `VectorStoreConfig`\n- **Documentation**: [Vector Store Implementation Guide](docs/vector_store_index_implementation_guide.md)\n- **Implementation**: [vector_store_index_implementation.py](src/vector_store_index_implementation.py)\n\n### ✅ Advanced Reranking Systems\n**Multiple reranking strategies with ensemble capabilities**\n- **Features**: Cohere API, SentenceTransformer, LLM-based, Ensemble reranking, Performance caching\n- **Classes**: `RerankerManager`, `RerankerConfig`, `EnsembleReranker`\n- **Documentation**: [Reranking Systems Guide](docs/reranking_rag_systems.md)\n- **Implementation**: [reranking_rag_systems.py](src/reranking_rag_systems.py)\n\n### ✅ Advanced RAG Techniques\n**LlamaIndex-based advanced query processing and retrieval**\n- **Features**: Sub-question decomposition, Query transformation, Hierarchical retrieval, Streaming support\n- **Classes**: `AdvancedRAGEngine`\n- **Documentation**: [Advanced RAG Techniques Guide](docs/Advanced_RAg_techniques_LLamaIndex.md)\n- **Implementation**: [Advanced_RAG_tenchniques_LLamaIndex.py](src/Advanced_RAG_tenchniques_LLamaIndex.py)\n\n### ✅ RAG Agent System\n**Multi-source agent-based RAG with tool integration**\n- **Features**: Multi-document querying, Tool integration, OpenAI agents, DeepLake support\n- **Classes**: `RAGAgent`\n- **Documentation**: [RAG Agent Guide](docs/LlamaIndex_rag_agent.md)\n- **Implementation**: [LlamaIndex_rag_agent.py](src/LlamaIndex_rag_agent.py)\n\n### ✅ LangSmith Integration\n**Complete monitoring and evaluation with LangSmith**\n- **Features**: Tracing, Prompt management, Evaluation pipelines, Chain monitoring\n- **Classes**: `LangSmithManager`\n- **Documentation**: [LangSmith Integration Guide](docs/Langsmith.md)\n- **Implementation**: [Langsmith.py](src/Langsmith.py)\n\n## Quick Start\n\n### Installation\n\n```bash\n# Clone the repository\ngit clone https://github.com/sosanzma/rag-techniques-handbook.git\ncd rag-techniques-handbook\n\n# Install dependencies\npip install llama-index==0.9.14.post3 deeplake==3.8.12 openai==1.3.8 cohere==4.37 ragas==0.0.22 pandas\n\n# Set up environment variables\nexport OPENAI_API_KEY=\"your_openai_key\"\nexport ACTIVELOOP_TOKEN=\"your_deeplake_token\"\nexport COHERE_API_KEY=\"your_cohere_key\"\n```\n\n### 3-Minute Example: Complete RAG Evaluation\n\n```python\nfrom src.rag_metrics_evaluation_module import RAGEvaluationFramework, EvaluationConfig\nimport asyncio\n\n# Configure and run comprehensive evaluation\nconfig = EvaluationConfig(llm_model=\"gpt-3.5-turbo\", enable_ragas=True)\nevaluator = RAGEvaluationFramework(config)\n\n# Evaluate your documents\nresults = asyncio.run(\n    evaluator.comprehensive_evaluation(documents_path=\"your/docs/path\")\n)\n\nprint(f\"📊 Retrieval MRR: {results['retrieval_metrics']['mrr']:.3f}\")\nprint(f\"✅ Faithfulness: {results['generation_metrics']['faithfulness']:.3f}\")\nprint(f\"🎯 Relevancy: {results['generation_metrics']['relevancy']:.3f}\")\n```\n\n## Project Structure\n\n```\nrag-techniques-handbook/\n├── 📁 docs/                           # Comprehensive guides (theory + implementation)\n│   ├── 📋 rag_metrics_evaluation_guide.md\n│   ├── 📋 vector_store_index_implementation_guide.md\n│   ├── 📋 reranking_rag_systems.md\n│   ├── 📋 Advanced_RAg_techniques_LLamaIndex.md\n│   ├── 📋 LlamaIndex_rag_agent.md\n│   └── 📋 Langsmith.md\n├── 🐍 src/                            # Production-ready implementations\n│   ├── 🔬 rag_metrics_evaluation_module.py\n│   ├── 🗄️ vector_store_index_implementation.py\n│   ├── 🔄 reranking_rag_systems.py\n│   ├── 🚀 Advanced_RAG_tenchniques_LLamaIndex.py\n│   ├── 🤖 LlamaIndex_rag_agent.py\n│   └── 📈 Langsmith.py\n├── 📓 src/Module_04_RAG_Metrics\u0026Evaluation.ipynb  # Jupyter notebook example\n├── 📖 CLAUDE.md                       # Development documentation\n└── 🗺️ README.md                       # This file\n```\n\n\n\n## Prerequisites\n\n- **Python**: 3.8+\n- **API Keys**: OpenAI (required), Cohere (optional), ActiveLoop (for DeepLake)\n- **Memory**: 8GB+ RAM recommended for large document processing\n- **Storage**: Vector databases may require significant disk space\n\n## Contributing\n\nWe welcome contributions! The repository is designed for easy extension:\n\n### Current Areas for Contribution\n- Additional evaluation metrics\n- New vector store integrations\n- Performance optimizations\n- Documentation improvements\n- Example notebooks\n\n### How to Contribute\n1. Fork the repository\n2. Create a feature branch\n3. Follow the existing patterns (see any `src/` file for reference)\n4. Update both implementation and documentation\n5. Submit a pull request\n\n## Resources\n\n### Documentation\n- **[CLAUDE.md](CLAUDE.md)**: Complete technical documentation and development guide\n- **[Planned Features](PLANNED_FEATURES.md)**: Upcoming techniques and improvements\n\n### External Resources\n- [LlamaIndex Documentation](https://docs.llamaindex.ai/)\n- [RAGAS Documentation](https://docs.ragas.io/)\n- [DeepLake Documentation](https://docs.deeplake.ai/)\n- [OpenAI API Documentation](https://platform.openai.com/docs)\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n\n---\n\n\u003e **🚀 Ready to get started?** Pick any technique from the list above and follow its Quick Start guide. Each implementation is production-ready and includes comprehensive documentation to get you running in minutes!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsosanzma%2Frag-techniques-handbook","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsosanzma%2Frag-techniques-handbook","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsosanzma%2Frag-techniques-handbook/lists"}