{"id":50500842,"url":"https://github.com/jonathanfavorite/ragamuffin","last_synced_at":"2026-06-02T11:03:48.449Z","repository":{"id":301781854,"uuid":"1009341892","full_name":"jonathanfavorite/RAGamuffin","owner":"jonathanfavorite","description":"A lightweight, cross-platform .NET library for building RAG (Retrieval-Augmented Generation) pipelines with local embedding models and SQLite vector storage. Perfect for developers who need privacy-focused, offline-capable document search and AI-powered question answering without external API dependencies.","archived":false,"fork":false,"pushed_at":"2025-07-08T01:25:12.000Z","size":7079,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-07-08T03:06:55.418Z","etag":null,"topics":["ai","chunking","document-processing","dotnet","embedding-models","fluent-api","local-ai","metadata","ml","nlp","offline-ai","onnx","pdf-processing","privacy-focused","rag","retrieval-augmented-generation","semantic-search","sqlite","vector-database","vector-search"],"latest_commit_sha":null,"homepage":"https://www.nuget.org/packages/RAGamuffin","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jonathanfavorite.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-27T01:22:35.000Z","updated_at":"2025-07-08T01:25:16.000Z","dependencies_parsed_at":"2025-06-28T20:32:44.780Z","dependency_job_id":null,"html_url":"https://github.com/jonathanfavorite/RAGamuffin","commit_stats":null,"previous_names":["jon
athanfavorite/ragamuffin"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jonathanfavorite/RAGamuffin","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathanfavorite%2FRAGamuffin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathanfavorite%2FRAGamuffin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathanfavorite%2FRAGamuffin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathanfavorite%2FRAGamuffin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jonathanfavorite","download_url":"https://codeload.github.com/jonathanfavorite/RAGamuffin/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathanfavorite%2FRAGamuffin/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33818584,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-02T02:00:07.132Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","chunking","document-processing","dotnet","embedding-models","fluent-api","local-ai","metadata","ml","nlp","offline-ai","onnx","pdf-processing","privacy-focused","rag","retrieval-augmented-generation","semantic-search","sqlite","vector-database","vector-search"],"created_
at":"2026-06-02T11:03:47.313Z","updated_at":"2026-06-02T11:03:48.441Z","avatar_url":"https://github.com/jonathanfavorite.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"![RAGamuffin Banner](https://raw.githubusercontent.com/jonathanfavorite/RAGamuffin/master/assets/banner.jpg)\n\n[![NuGet Version](https://img.shields.io/nuget/v/RAGamuffin?style=for-the-badge\u0026color=brightgreen)](https://www.nuget.org/packages/RAGamuffin)  [![Build Status](https://img.shields.io/github/actions/workflow/status/jonathanfavorite/RAGamuffin/build.yml?style=for-the-badge)](https://github.com/jonathanfavorite/RAGamuffin/actions)  [![MIT License](https://img.shields.io/github/license/jonathanfavorite/RAGamuffin?style=for-the-badge\u0026color=blue)](LICENSE)\n\n\nA lightweight, cross-platform .NET library for building RAG (Retrieval-Augmented Generation) pipelines with local embedding models and SQLite vector storage.\n\n## 🚀 Features\n\n- **Local Embedding Models**: Use ONNX models for offline, privacy-focused embeddings\n- **SQLite Vector Storage**: Lightweight, file-based vector database with no external dependencies\n- **Multi-Format Support**: Process PDFs and text files with intelligent chunking\n- **Flexible Training Strategies**: Retrain from scratch, incremental updates, or add-only modes\n- **Real-time Ingestion**: Stream text content directly into your vector store\n- **Metadata Preservation**: Maintain document context and metadata throughout the pipeline\n- **Cross-Platform**: Works on Windows, macOS, and Linux with .NET 8.0+\n\n## 🎯 Quick Start\n\n### Installation\n\n```bash\ndotnet add package RAGamuffin\n```\n\n### Basic Usage\n\n```csharp\nusing RAGamuffin.Builders;\nusing RAGamuffin.Core;\nusing RAGamuffin.Embedding;\nusing RAGamuffin.Enums;\n\n// 1. Set up your embedding model (download from HuggingFace)\nvar embedder = new OnnxEmbedder(\"path/to/model.onnx\", \"path/to/tokenizer.json\");\n\n// 2. 
Configure your vector database\nvar vectorDb = new SqliteDatabaseModel(\"documents.db\", \"my_collection\");\n\n// 3. Build and train your pipeline\nvar pipeline = new IngestionTrainingBuilder()\n    .WithEmbeddingModel(embedder)\n    .WithVectorDatabase(vectorDb)\n    .WithTrainingStrategy(TrainingStrategy.RetrainFromScratch)\n    .WithTrainingFiles(new[] { \"document.pdf\" })\n    .Build();\n\nvar ingestedItems = await pipeline.Train();\n\n// 4. Search your documents\nstring[] results = await pipeline.SearchAndReturnTexts(\"What is the company policy?\", 5);\n```\n\n### Real-time Text Ingestion\n\n```csharp\n// Stream text content directly into your vector store\nvar textItems = new[]\n{\n    new TextItem(\"Meeting notes from Q1\", \"Q1 was successful with 15% growth...\"),\n    new TextItem(\"Product roadmap\", \"Next quarter we'll launch feature X...\")\n};\n\nvar (ingestedItems, model) = await pipeline.TrainWithText(textItems);\n```\n\n### Search Existing Vector Store\n\n```csharp\n// Search without retraining\nvar vectorStore = new SqliteVectorStoreProvider(\"documents.db\", \"my_collection\");\nvar searchResults = await vectorStore.SearchAsync(\"your query\", embedder, 5);\n\n// Get metadata\nvar metadata = await vectorStore.GetAllDocumentsMetadataAsync();\n```\n\n## 📚 Examples\n\nCheck out the comprehensive examples in the `Examples/` directory:\n\n- **[TrainAndSearch](Examples/RAGamuffin.Examples.TrainAndSearch/)**: Complete RAG pipeline with training and search\n- **[SearchExistingVectorStore](Examples/RAGamuffin.Examples.SearchExistingVectorStore/)**: Query existing vector stores with metadata\n- **[IncrementalTraining](Examples/RAGamuffin.Examples.IncrementalTraining/)**: Add new documents to existing collections\n- **[RealTimeIngestion](Examples/RAGamuffin.Examples.RealTimeIngestion/)**: Stream text content in real-time\n- **[MetadataRetrieval](Examples/RAGamuffin.Examples.MetadataRetrieval/)**: Work with document metadata and statistics\n\n## 🔧 
Configuration\n\n### Embedding Models\n\nRAGamuffin supports ONNX models for cross-platform compatibility. Recommended starter model:\n\n- **Model**: `all-mpnet-base-v2` from HuggingFace\n- **Download**: [Model](https://huggingface.co/sentence-transformers/all-mpnet-base-v2/blob/main/onnx/model.onnx) | [Tokenizer](https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/tokenizer.json)\n\n### Training Strategies\n\n- **RetrainFromScratch**: Drop all existing data and retrain\n- **IncrementalAdd**: Add new documents (skip if exists)\n- **IncrementalUpdate**: Add new documents and update existing ones\n- **ProcessOnly**: Only process documents, no vector operations\n\n### Chunking Options\n\n```csharp\n// PDF processing options\n.WithPdfOptions(new PdfHybridParagraphIngestionOptions\n{\n    MinSize = 0,        // Minimum chunk size\n    MaxSize = 800,      // Maximum chunk size\n    Overlap = 400,      // Overlap between chunks\n    UseMetadata = true  // Include document metadata\n})\n\n// Text processing options\n.WithTextOptions(new TextHybridParagraphIngestionOptions\n{\n    MinSize = 500,      // Minimum chunk size\n    MaxSize = 800,      // Maximum chunk size\n    Overlap = 400,      // Overlap between chunks\n    UseMetadata = true  // Include document metadata\n})\n```\n\n## 🏗️ Architecture\n\nRAGamuffin is built with a modular architecture:\n\n- **Abstractions**: Clean interfaces for embedding, ingestion, and vector storage\n- **Core**: Main pipeline logic and data models\n- **Embedding**: ONNX-based embedding providers\n- **Ingestion**: PDF and text processing engines\n- **VectorStores**: SQLite vector database implementation\n- **Builders**: Fluent API for pipeline configuration\n\n## 🤝 Contributing\n\nContributions are welcome! 
Please feel free to submit a Pull Request.\n\n## 📄 License\n\nThis project is licensed under the Apache License 2.0 - see the [LICENSE.txt](LICENSE.txt) file for details.\n\n## 🔗 Related Projects\n\n- **[InstructSharp](https://github.com/jonathanfavorite/InstructSharp)**: LLM client library for .NET\n- **[PdfPig](https://github.com/UglyToad/PdfPig)**: PDF processing library\n- **[Microsoft.ML.OnnxRuntime](https://github.com/microsoft/onnxruntime)**: ONNX model inference\n\n---\n\n**RAGamuffin** - Making RAG pipelines simple and accessible for .NET developers.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonathanfavorite%2Fragamuffin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjonathanfavorite%2Fragamuffin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonathanfavorite%2Fragamuffin/lists"}