{"id":30437327,"url":"https://github.com/kenosis01/tinyrag","last_synced_at":"2025-08-28T08:04:30.369Z","repository":{"id":311027561,"uuid":"1042202662","full_name":"Kenosis01/TinyRag","owner":"Kenosis01","description":"TinyRag is a minimal Python library for retrieval-augmented generation. It offers easy document ingestion, automatic text extraction, embedding generation, and retrieval with vector stores. Designed for quick setup and flexible provider configuration, TinyRag enables fast, contextual responses from language models.","archived":false,"fork":false,"pushed_at":"2025-08-23T16:53:14.000Z","size":163,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-24T11:29:32.388Z","etag":null,"topics":["aichatbot","chatbot","chatgpt","llm","localllm","python","rag","rag-chatbot"],"latest_commit_sha":null,"homepage":"https://tinyrag.netlify.app","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Kenosis01.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-21T16:34:58.000Z","updated_at":"2025-08-23T16:53:18.000Z","dependencies_parsed_at":"2025-08-21T19:07:26.184Z","dependency_job_id":"f76897b4-cd5c-4319-bacd-e8211f6f686f","html_url":"https://github.com/Kenosis01/TinyRag","commit_stats":null,"previous_names":["kenosis01/tinyrag"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Kenosis01/TinyRag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kenosis01%2FTinyRag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kenosis01%2FTinyRag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kenosis01%2FTinyRag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kenosis01%2FTinyRag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Kenosis01","download_url":"https://codeload.github.com/Kenosis01/TinyRag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kenosis01%2FTinyRag/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272301786,"owners_count":24910060,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-27T02:00:09.397Z","response_time":76,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aichatbot","chatbot","chatgpt","llm","localllm","python","rag","rag-chatbot"],"created_at":"2025-08-23T03:18:11.946Z","updated_at":"2025-08-28T08:04:30.362Z","avatar_url":"https://github.com/Kenosis01.png","language":"Python","funding_links":["https://buymeacoffee.com/kenosis"],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"logo.jpg\" alt=\"Tinyrag Logo\" width=\"200\"/\u003e\n\u003c/p\u003e\n\n\n# TinyRag 🚀\n\n[![PyPI version](https://badge.fury.io/py/tinyrag.svg)](https://badge.fury.io/py/tinyrag)\n[![Python 3.7+](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Documentation](https://img.shields.io/badge/docs-available-brightgreen.svg)](https://tinyrag-docs.netlify.app/docs)\n[![PyPI Downloads](https://static.pepy.tech/badge/tinyrag)](https://pepy.tech/projects/tinyrag)\n\n\n\nA **lightweight, powerful Python library** for **Retrieval-Augmented Generation (RAG)** that works locally without API keys. Features advanced codebase indexing, multiple document formats, and flexible vector storage backends.\n\n\u003e **🎯 Perfect for developers who need RAG capabilities without complexity or mandatory cloud dependencies.**\n\n## 🌟 Key Features\n\n### 🚀 **Works Locally - No API Keys Required**\n- **🧠 Local Embeddings**: Uses all-MiniLM-L6-v2 by default\n- **🔍 Direct Search**: Query documents without LLM costs\n- **⚡ Zero Setup**: Works immediately after installation\n\n### 📚 **Advanced Document Processing** \n- **📄 Multi-Format**: PDF, DOCX, CSV, TXT, and raw text\n- **💻 Code Intelligence**: Function-level indexing for 7+ programming languages\n- **🧵 Multithreading**: Parallel processing for faster indexing\n- **📊 Chunking Strategies**: Smart text segmentation\n\n### 🗄️ **Flexible Storage Options**\n- **🔌 Multiple Backends**: Memory, Pickle, Faiss, ChromaDB\n- **💾 Persistence**: Automatic or manual data saving\n- **⚡ Performance**: Choose speed vs. memory trade-offs\n- **🔧 Configuration**: Customizable for any use case\n\n### 💬 **Optional AI Integration**\n- **🤖 Custom System Prompts**: Tailor AI behavior for your domain\n- **🔗 Provider Support**: OpenAI, Azure, Anthropic, local models\n- **💰 Cost Control**: Use only when needed\n- **🎯 RAG-Powered Chat**: Contextual AI responses\n\n## 🚀 Quick Start\n\n\u003e **💡 New to TinyRag?** Check out our comprehensive [📖 Documentation](https://tinyrag-docs.netlify.app/docs) with step-by-step guides!\n\n### Installation\n\n```bash\n# Basic installation\npip install tinyrag\n\n# With all optional dependencies\npip install tinyrag[all]\n\n# Specific vector stores\npip install tinyrag[faiss]    # High performance\npip install tinyrag[chroma]   # Persistent storage\npip install tinyrag[docs]     # Document processing\n```\n\n### Usage Examples\n\n### 🏃‍♂️ 30-Second Example (No API Key Required)\n\n```python\nfrom tinyrag import TinyRag\n\n# 1. Create TinyRag instance\nrag = TinyRag()\n\n# 2. Add your content  \nrag.add_documents([\n    \"TinyRag makes RAG simple and powerful.\",\n    \"docs/user_guide.pdf\",\n    \"research_papers/\"\n])\n\n# 3. Search your content\nresults = rag.query(\"How does TinyRag work?\", k=3)\nfor text, score in results:\n    print(f\"Score: {score:.2f} - {text[:100]}...\")\n```\n\n**Output:**\n```\nScore: 0.89 - TinyRag makes RAG simple and powerful.\nScore: 0.76 - TinyRag is a lightweight Python library for...\nScore: 0.72 - The system processes documents using semantic...\n```\n\n### 🤖 AI-Powered Chat (Optional)\n\n```python\nfrom tinyrag import Provider, TinyRag\n\n# Set up AI provider\nprovider = Provider(\n    api_key=\"sk-your-openai-key\",\n    model=\"gpt-4\"\n)\n\n# Create smart assistant\nrag = TinyRag(\n    provider=provider,\n    system_prompt=\"You are a helpful technical assistant.\"\n)\n\n# Add knowledge base\nrag.add_documents([\"technical_docs/\", \"api_guides/\"])\nrag.add_codebase(\"src/\")  # Index your codebase\n\n# Get intelligent answers\nresponse = rag.chat(\"How do I implement user authentication?\")\nprint(response)\n# AI response based on your specific docs and code!\n```\n\n## 📖 Complete Documentation\n\n**📚 [Full Documentation](docs/README.md)** - Comprehensive guides from beginner to expert\n\n### 🚀 **Getting Started**\n- [**Quick Start**](docs/01-quick-start.md) - 5-minute introduction\n- [**Installation**](docs/02-installation.md) - Complete setup guide  \n- [**Basic Usage**](docs/03-basic-usage.md) - Core features without AI\n\n### 🔧 **Core Features**\n- [**Document Processing**](docs/04-document-processing.md) - PDF, DOCX, CSV, TXT\n- [**Codebase Indexing**](docs/05-codebase-indexing.md) - Function-level code search\n- [**Vector Stores**](docs/06-vector-stores.md) - Choose the right storage\n- [**Search \u0026 Query**](docs/07-search-query.md) - Similarity search techniques\n\n### 🤖 **AI Integration**\n- [**System Prompts**](docs/08-system-prompts.md) - Customize AI behavior\n- [**Chat Functionality**](docs/09-chat-functionality.md) - Build conversations\n- [**Provider Configuration**](docs/10-provider-config.md) - AI model setup\n\n---\n\n## 🔧 Core API Reference\n\n### Provider Class\n\n```python\nfrom tinyrag import Provider\n\n# 🆓 No API key needed - works locally\nprovider = Provider(embedding_model=\"default\")\n\n# 🤖 With AI capabilities\nprovider = Provider(\n    api_key=\"sk-your-key\",\n    model=\"gpt-4\",                           # GPT-4, GPT-3.5, local models\n    embedding_model=\"text-embedding-ada-002\", # or \"default\" for local\n    base_url=\"https://api.openai.com/v1\"     # OpenAI, Azure, custom\n)\n```\n\n### TinyRag Class\n\n```python\nfrom tinyrag import TinyRag\n\n# 🎛️ Choose your vector store\nrag = TinyRag(\n    provider=provider,               # Optional: for AI chat\n    vector_store=\"faiss\",           # memory, pickle, faiss, chromadb\n    chunk_size=500,                 # Text chunk size\n    max_workers=4,                  # Parallel processing\n    system_prompt=\"Custom prompt\"   # AI behavior\n)\n```\n\n### 🗄️ Vector Store Comparison\n\n| Store | Performance | Persistence | Memory | Dependencies | Best For |\n|-------|-------------|-------------|---------|--------------|----------|\n| **Memory** | ⚡ Fast | ❌ None | 📈 High | ✅ None | Development, testing |\n| **Pickle** | 🐌 Fair | 💾 Manual | 📊 Medium | ✅ Minimal | Simple projects |\n| **Faiss** | 🚀 Excellent | 💾 Manual | 📉 Low | 📦 faiss-cpu | Large datasets, speed |\n| **ChromaDB** | ⚡ Good | 🔄 Auto | 📊 Medium | 📦 chromadb | Production, features |\n\n\u003e **💡 Recommendation:** Start with `memory` for development, use `faiss` for production performance.\n\n## 🔧 Essential Methods\n\n```python\n# 📄 Document Management\nrag.add_documents([\"file.pdf\", \"text\"])   # Add any documents\nrag.add_codebase(\"src/\")                   # Index code functions\nrag.clear_documents()                      # Reset everything\n\n# 🔍 Search \u0026 Query (No AI needed)\nresults = rag.query(\"search term\", k=5)   # Find similar content\ncode = rag.query(\"auth function\")          # Search code too\n\n# 🤖 AI Chat (Optional)\nresponse = rag.chat(\"Explain this code\")   # Get AI answers\nrag.set_system_prompt(\"Be helpful\")        # Customize AI\n\n# 💾 Persistence\nrag.save_vector_store(\"my_data.pkl\")       # Save your work\nrag.load_vector_store(\"my_data.pkl\")       # Load it back\n```\n\n\u003e **📖 [Complete API Reference](docs/18-api-reference.md)** - Full method documentation\n\n## 💻 Code Intelligence\n\nTinyRag indexes your codebase at the **function level** for intelligent code search:\n\n### 🌐 Supported Languages\n\n| Language | Extensions | Detection |\n|----------|------------|----------|\n| **Python** | `.py` | `def function_name` |\n| **JavaScript** | `.js`, `.ts` | `function name()`, `const name =` |\n| **Java** | `.java` | `public/private type name()` |\n| **C/C++** | `.c`, `.cpp`, `.h` | `return_type function_name()` |\n| **Go** | `.go` | `func functionName()` |\n| **Rust** | `.rs` | `fn function_name()` |\n| **PHP** | `.php` | `function functionName()` |\n\n### 🔍 Code Search Examples\n\n```python\n# Index your entire project\nrag.add_codebase(\"my_app/\")\n\n# Find authentication code\nauth_code = rag.query(\"user authentication login\")\n\n# Database functions\ndb_code = rag.query(\"database query SELECT\")\n\n# API endpoints\napi_code = rag.query(\"REST API endpoint\")\n\n# Get AI explanations (with API key)\nresponse = rag.chat(\"How does user authentication work?\")\n# AI analyzes your actual code and explains it!\n```\n\n\u003e **💡 [Learn More](docs/05-codebase-indexing.md)** - Advanced code search techniques\n\n\n## ⚙️ Configuration Examples\n\n### 🚀 Performance Optimized\n```python\n# Large datasets, maximum speed\nrag = TinyRag(\n    vector_store=\"faiss\",\n    chunk_size=800,\n    max_workers=8  # Parallel processing\n)\n```\n\n### 💾 Production Setup\n```python\n# Persistent, multi-user ready\nrag = TinyRag(\n    provider=provider,\n    vector_store=\"chromadb\",\n    vector_store_config={\n        \"collection_name\": \"company_docs\",\n        \"persist_directory\": \"/data/vectors/\"\n    }\n)\n```\n\n### 🤖 Custom AI Assistant\n```python\n# Domain-specific AI behavior\nrag = TinyRag(\n    provider=provider,\n    system_prompt=\"\"\"You are a senior software engineer.\n    Provide detailed technical explanations with code examples.\"\"\"\n)\n```\n\n\u003e **🔧 [Full Configuration Guide](docs/12-configuration.md)** - All options explained\n\n## 📦 Installation\n\n### 🎯 Choose Your Setup\n\n```bash\n# 🚀 Quick start (works immediately)\npip install tinyrag\n\n# ⚡ High performance (recommended)\npip install tinyrag[faiss]\n\n# 📄 Document processing (PDF, DOCX)\npip install tinyrag[docs]\n\n# 🗄️ Production database\npip install tinyrag[chroma]\n\n# 🎁 Everything included\npip install tinyrag[all]\n```\n\n### 🔧 What Each Option Includes\n\n| Option | Includes | Use Case |\n|--------|----------|----------|\n| **Base** | Memory store, local embeddings | Development, testing |\n| **[faiss]** | + High-performance search | Large datasets |\n| **[docs]** | + PDF/DOCX processing | Document analysis |\n| **[chroma]** | + Persistent database | Production apps |\n| **[all]** | + Everything | Full features |\n\n\u003e **💡 [Installation Guide](docs/02-installation.md)** - Detailed setup instructions\n\n## 🎯 Real-World Use Cases\n\n### 🏢 **Business Applications**\n- **📋 Customer Support**: Query company docs and policies\n- **📚 Knowledge Management**: Searchable internal documentation\n- **🔍 Research Tools**: Semantic search through research papers\n- **📊 Report Analysis**: Find insights across business reports\n\n### 👨‍💻 **Developer Tools**\n- **🔧 Code Documentation**: Auto-generate code explanations\n- **🔍 Legacy Code Explorer**: Understand large codebases\n- **📖 API Assistant**: Query technical documentation\n- **🧪 Testing Helper**: Find relevant test patterns\n\n### 🎓 **Educational \u0026 Research**\n- **📚 Study Assistant**: Query textbooks and notes\n- **📝 Writing Helper**: Research paper analysis\n- **🧠 Learning Companion**: Personalized explanations\n- **📊 Data Analysis**: Explore datasets semantically\n\n\u003e **💡 [See Complete Examples](docs/15-examples.md)** - Production-ready applications\n\n---\n\n## 🛠️ Contributing\n\nWe welcome contributions! Here's how to get started:\n\n```bash\n# 1. Fork and clone\ngit clone https://github.com/Kenosis01/TinyRag.git\ncd TinyRag\n\n# 2. Install development dependencies  \npip install -e \".[all,dev]\"\n\n# 3. Run tests\npython -m pytest\n\n# 4. Make your changes and submit a PR!\n```\n\n### 📋 **Development Setup**\n- **Python 3.7+** required\n- **Core dependencies**: sentence-transformers, requests, numpy\n- **Optional**: faiss-cpu, chromadb, PyPDF2, python-docx\n\n\u003e **🔧 [Development Guide](CONTRIBUTING.md)** - Detailed contributor guidelines\n\n## 🤝 Community \u0026 Support\n\n### 📞 **Get Help**\n- **📖 [Complete Documentation](docs/README.md)** - Comprehensive guides\n- **🐛 [GitHub Issues](https://github.com/Kenosis01/TinyRag/issues)** - Bug reports \u0026 feature requests\n- **💬 [Discussions](https://github.com/Kenosis01/TinyRag/discussions)** - Community Q\u0026A\n- **📋 [FAQ](docs/19-faq.md)** - Common questions answered\n\n### 🎉 **Show Your Support**\n- ⭐ **Star this repo** if TinyRag helps you!\n- 🐦 **Share on Twitter** - spread the word\n- ☕ **[Buy me a coffee](https://buymeacoffee.com/kenosis)** - support development\n- 🤝 **Contribute** - help make TinyRag better\n\n---\n\n## 📄 License\n\nMIT License - see [LICENSE](LICENSE) for details.\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n**🚀 TinyRag - Making RAG Simple, Powerful, and Accessible! 🚀**\n\n*Build intelligent search and Q\u0026A systems in minutes, not hours*\n\n[![GitHub stars](https://img.shields.io/github/stars/Kenosis01/TinyRag?style=social)](https://github.com/Kenosis01/TinyRag)\n[![PyPI downloads](https://img.shields.io/pypi/dm/tinyrag)](https://pypi.org/project/tinyrag/)\n[![GitHub last commit](https://img.shields.io/github/last-commit/Kenosis01/TinyRag)](https://github.com/Kenosis01/TinyRag)\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkenosis01%2Ftinyrag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkenosis01%2Ftinyrag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkenosis01%2Ftinyrag/lists"}