{"id":36454695,"url":"https://github.com/modfin/ragnar","last_synced_at":"2026-01-11T23:01:50.839Z","repository":{"id":316437738,"uuid":"1063365859","full_name":"modfin/ragnar","owner":"modfin","description":"Ragnar helps handling file conversion, chunking, and embedding. Can be used as source for a RAG system.","archived":false,"fork":false,"pushed_at":"2025-11-20T02:40:19.000Z","size":201,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-11-27T10:27:12.491Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/modfin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-24T14:23:23.000Z","updated_at":"2025-11-11T13:00:23.000Z","dependencies_parsed_at":"2025-10-24T15:21:40.458Z","dependency_job_id":null,"html_url":"https://github.com/modfin/ragnar","commit_stats":null,"previous_names":["modfin/ragnar"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/modfin/ragnar","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modfin%2Fragnar","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modfin%2Fragnar/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modfin%2Fragnar/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modfin%2Fragnar/manifests","owner_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/modfin","download_url":"https://codeload.github.com/modfin/ragnar/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modfin%2Fragnar/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28326166,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-11T22:11:01.104Z","status":"ssl_error","status_checked_at":"2026-01-11T22:10:58.990Z","response_time":60,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-11T23:01:50.767Z","updated_at":"2026-01-11T23:01:50.821Z","avatar_url":"https://github.com/modfin.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Ragnar\n*README generated by AI*\n\n\nA comprehensive **Retrieval-Augmented Generation (RAG)** service built in Go that enables intelligent document storage, processing, and vector search capabilities. 
Ragnar provides a complete solution for ingesting various document formats, converting them to markdown, chunking content, generating embeddings, and performing semantic search.\n\n## 🚀 Features\n\n- **Multi-format Document Support**: Supports PDF, DOCX, ODT, HTML, JSON, plain text, and markdown files\n- **Intelligent Document Processing**: Automatically converts documents to markdown using Pandoc and pdftotext\n- **Document Chunking**: Smart text chunking with configurable strategies\n- **Vector Embeddings**: Generate embeddings using various AI models via Bellman AI platform\n- **Semantic Search**: Perform vector-based similarity search across document chunks\n- **Access Control**: Token-based authentication with fine-grained permissions\n- **Storage Flexibility**: Uses MinIO-compatible object storage for scalable file storage\n- **Processing Pipeline**: Asynchronous document processing with status tracking\n- **RESTful API**: Complete REST API with OpenAPI/Swagger documentation\n\n## 🏗️ Architecture\n\nRagnar consists of several key components:\n\n- **Web API**: RESTful HTTP server with comprehensive endpoints\n- **Document Parser**: Converts various file formats to markdown\n- **Chunker**: Splits documents into manageable chunks for processing\n- **AI Integration**: Embedding generation via Bellman AI platform\n- **Storage Layer**: MinIO/S3-compatible object storage\n- **Database**: PostgreSQL for metadata and chunk storage\n- **Processing Pipeline**: Background job processing for document workflows\n\n## 🛠️ Installation \u0026 Setup\n\n### Using Docker Compose (Recommended for Development)\n\n1. **Clone the repository**:\n```bash\ngit clone https://github.com/modfin/ragnar.git\ncd ragnar\n```\n\n2. **Start all services**:\n```bash\ndocker-compose up -d\n```\n\nThis will start:\n- Ragnar API server on port `7100`\n- PostgreSQL database on port `6789`\n- MinIO object storage on port `9000` (console on `9191`)\n\n3. 
**Access the API**:\n- API: http://localhost:7100\n- MinIO Console: http://localhost:9191 (admin/admin)\n\n## 📚 Go Client Library\n\nRagnar provides a comprehensive Go client library for easy integration.\n\n### Installation\n\n```bash\ngo get github.com/modfin/ragnar\n```\n\n### Basic Usage\n\n```go\npackage main\n\nimport (\n    \"context\"\n    \"fmt\"\n\n    \"github.com/modfin/ragnar\"\n)\n\nfunc main() {\n    // Initialize the client\n    client := ragnar.NewClient(ragnar.ClientConfig{\n        BaseURL:   \"http://localhost:7100\",\n        AccessKey: \"your-access-key\",\n    })\n\n    ctx := context.Background()\n\n    // Create a new tub (document collection)\n    tub, err := client.CreateTub(ctx, ragnar.Tub{\n        TubName: \"my-documents\",\n    })\n    if err != nil {\n        panic(err)\n    }\n    fmt.Printf(\"Created tub: %s\\n\", tub.TubId)\n}\n```\n\n## 🔧 Client API Examples\n\n### 1. Managing Tubs (Document Collections)\n\n```go\n// List all available tubs\ntubs, err := client.GetTubs(ctx)\nif err != nil {\n    log.Fatal(err)\n}\n\n// Get specific tub\ntub, err := client.GetTub(ctx, \"my-documents\")\nif err != nil {\n    log.Fatal(err)\n}\n\n// Update tub with required headers\nupdatedTub := tub.WithRequiredDocumentHeaders(\"project-id\", \"department\")\nresult, err := client.UpdateTub(ctx, updatedTub)\nif err != nil {\n    log.Fatal(err)\n}\n\n// Delete tub\ndeletedTub, err := client.DeleteTub(ctx, \"my-documents\")\nif err != nil {\n    log.Fatal(err)\n}\n```\n\n### 2. 
Document Operations\n\n#### Upload a Simple Document\n\n```go\n// Upload a text document\ncontent := strings.NewReader(\"This is my document content\")\nheaders := map[string]string{\n    \"Content-Type\":      \"text/plain\",\n    \"x-ragnar-project-id\": \"proj-123\", // Custom header\n}\n\ndoc, err := client.CreateTubDocument(ctx, \"my-documents\", content, headers)\nif err != nil {\n    log.Fatal(err)\n}\nfmt.Printf(\"Document uploaded: %s\\n\", doc.DocumentId)\n```\n\n#### Upload with Custom Markdown and Chunks\n\n```go\n// Upload with pre-processed markdown and chunks\nfileContent := strings.NewReader(\"\u003ch1\u003eTitle\u003c/h1\u003e\u003cp\u003eContent\u003c/p\u003e\")\nmarkdownContent := strings.NewReader(\"# Title\\n\\nContent\")\nchunks := []ragnar.Chunk{\n    {ChunkId: 0, Content: \"Title\"},\n    {ChunkId: 1, Content: \"Content\"},\n}\n\ndoc, err := client.CreateTubDocumentWithOptionals(\n    ctx, \"my-documents\", fileContent, markdownContent, chunks, headers)\nif err != nil {\n    log.Fatal(err)\n}\n```\n\n#### Download Documents\n\n```go\n// Download original document\nreader, err := client.DownloadTubDocument(ctx, \"my-documents\", doc.DocumentId)\nif err != nil {\n    log.Fatal(err)\n}\ndefer reader.Close()\n\n// Download converted markdown\nmdReader, err := client.DownloadTubDocumentMarkdown(ctx, \"my-documents\", doc.DocumentId)\nif err != nil {\n    log.Fatal(err)\n}\ndefer mdReader.Close()\n```\n\n#### Update Documents\n\n```go\n// Update existing document\nnewContent := strings.NewReader(\"Updated content\")\nupdatedHeaders := map[string]string{\n    \"Content-Type\": \"text/plain\",\n    \"x-ragnar-filename\": \"updated.txt\",\n}\n\nupdatedDoc, err := client.UpdateTubDocument(ctx, \"my-documents\",\n    doc.DocumentId, newContent, updatedHeaders)\nif err != nil {\n    log.Fatal(err)\n}\n```\n\n### 3. 
Document Processing Status\n\n```go\n// Check processing status\nstatus, err := client.GetTubDocumentStatus(ctx, \"my-documents\", doc.DocumentId)\nif err != nil {\n    log.Fatal(err)\n}\nfmt.Printf(\"Status: %s\\n\", status.Status) // \"pending\", \"processing\", \"completed\", \"failed\"\n\n// Wait for completion (example helper function; takes the context explicitly)\nfunc waitForCompletion(ctx context.Context, client ragnar.Client, tub, docId string) error {\n    for {\n        status, err := client.GetTubDocumentStatus(ctx, tub, docId)\n        if err != nil {\n            return err\n        }\n\n        switch status.Status {\n        case \"completed\":\n            return nil\n        case \"failed\":\n            return fmt.Errorf(\"document processing failed\")\n        default:\n            time.Sleep(5 * time.Second)\n        }\n    }\n}\n```\n\n### 4. Working with Document Chunks\n\n```go\n// Get all chunks from a document\nchunks, err := client.GetTubDocumentChunks(ctx, \"my-documents\", doc.DocumentId, 50, 0)\nif err != nil {\n    log.Fatal(err)\n}\n\nfor i, chunk := range chunks {\n    preview := chunk.Content\n    if len(preview) \u003e 100 {\n        preview = preview[:100] // First 100 bytes; avoids a panic on shorter chunks\n    }\n    fmt.Printf(\"Chunk %d: %s\\n\", i, preview)\n}\n\n// Get specific chunk by index\nchunk, err := client.GetTubDocumentChunk(ctx, \"my-documents\", doc.DocumentId, 0)\nif err != nil {\n    log.Fatal(err)\n}\nfmt.Printf(\"First chunk: %s\\n\", chunk.Content)\n```\n\n### 5. Vector Search\n\n```go\n// Perform semantic search across document chunks\nquery := \"What is the project timeline?\"\nsearchResults, err := client.SearchTubDocumentChunks(\n    ctx, \"my-documents\", query, nil, 10, 0)\nif err != nil {\n    log.Fatal(err)\n}\n\nfmt.Printf(\"Found %d matching chunks:\\n\", len(searchResults))\nfor i, chunk := range searchResults {\n    fmt.Printf(\"%d. 
%s (Doc: %s)\\n\", i+1, chunk.Content, chunk.DocumentId)\n}\n\n// Search with document filtering\nfilter := map[string]any{\n    \"project-id\": []string{\"proj-123\", \"proj-456\"},\n    \"department\": \"engineering\",\n}\n\nfilteredResults, err := client.SearchTubDocumentChunks(\n    ctx, \"my-documents\", query, filter, 5, 0)\nif err != nil {\n    log.Fatal(err)\n}\n```\n\n### 6. Listing, Filtering, and Sorting Documents\n\n```go\n// List all documents in a tub\ndocs, err := client.GetTubDocuments(ctx, \"my-documents\", nil, nil, 20, 0)\nif err != nil {\n    log.Fatal(err)\n}\n\n// Filter documents by headers\nfilter := ragnar.NewDocumentFilter().\n    WithEqual(\"project-id\", \"proj-123\").\n    WithIn(\"content-type\", []string{\"application/pdf\", \"text/plain\"})\n\nfilteredDocs, err := client.GetTubDocuments(ctx, \"my-documents\", filter, nil, 10, 0)\nif err != nil {\n    log.Fatal(err)\n}\n\n// Sort documents by created_at (descending) and then by a header field\nsort := ragnar.NewDocumentSort().\n    WithCreatedAt(ragnar.SortDesc).\n    WithFieldAsc(\"priority\", ragnar.ValueTypeInteger)\n\nsortedDocs, err := client.GetTubDocuments(ctx, \"my-documents\", nil, sort, 10, 0)\nif err != nil {\n    log.Fatal(err)\n}\n\n// Combine filtering and sorting\nfilteredAndSorted, err := client.GetTubDocuments(ctx, \"my-documents\", filter, sort, 10, 0)\nif err != nil {\n    log.Fatal(err)\n}\n\n// Get specific document metadata\ndoc, err := client.GetTubDocument(ctx, \"my-documents\", documentId)\nif err != nil {\n    log.Fatal(err)\n}\nfmt.Printf(\"Document: %s, Created: %v\\n\", doc.DocumentId, doc.CreatedAt)\n```\n\n### 7. 
Error Handling\n\n```go\n// The client returns descriptive HTTP errors\ndoc, err := client.GetTubDocument(ctx, \"nonexistent\", \"doc-id\")\nif err != nil {\n    if strings.Contains(err.Error(), \"HTTP 404\") {\n        fmt.Println(\"Document not found\")\n    } else if strings.Contains(err.Error(), \"HTTP 401\") {\n        fmt.Println(\"Unauthorized - check your access key\")\n    } else {\n        fmt.Printf(\"Other error: %v\\n\", err)\n    }\n}\n```\n\n## 🔑 Authentication\n\nRagnar uses Bearer token authentication. Access tokens control permissions:\n\n```go\n// Access tokens have specific permissions\ntype AccessToken struct {\n    AccessKeyId     string\n    TokenName       string\n    AllowCreateTubs bool  // Can create new tubs\n    AllowReadTubs   bool  // Can list tubs\n    CreatedAt       time.Time\n    UpdatedAt       time.Time\n}\n```\n\n## 🗂️ Document Headers\n\nDocuments can include custom headers for metadata and filtering:\n\n```go\nheaders := map[string]string{\n    \"Content-Type\":         \"application/pdf\",\n    \"x-ragnar-filename\":    \"report.pdf\",\n    \"x-ragnar-project-id\":  \"proj-123\",\n    \"x-ragnar-department\":  \"research\",\n    \"x-ragnar-author\":      \"john.doe\",\n    \"x-ragnar-version\":     \"1.0\",\n}\n```\n\nHeaders prefixed with `x-ragnar-` are automatically stored and can be used for filtering.\n\n## 🔧 Configuration\n\n### Environment Variables\n\n```bash\n# Database\nRAGNAR_DB_URI=\"postgres://user:pass@localhost/ragnar?sslmode=disable\"\n\n# Storage (MinIO/S3)\nRAGNAR_S3_ENDPOINT=\"localhost:9000\"\nRAGNAR_S3_BUCKET=\"ragnar-documents\"\nRAGNAR_S3_ACCESS_KEY=\"access-key\"\nRAGNAR_S3_SECRET_KEY=\"secret-key\"\n\n# AI/Bellman Integration\nRAGNAR_BELLMAN_URI=\"https://bellman.example.com\"\nRAGNAR_BELLMAN_NAME=\"ragnar-instance\"\nRAGNAR_BELLMAN_KEY=\"bellman-api-key\"\n\n# Server\nRAGNAR_HTTP_PORT=8080\nRAGNAR_PRODUCTION=false\n```\n\n## 📊 API Endpoints\n\nThe service provides a complete REST API:\n\n- `GET 
/tubs` - List tubs\n- `POST /tubs` - Create tub\n- `GET /tubs/{tub}` - Get tub info\n- `PUT /tubs/{tub}` - Update tub\n- `DELETE /tubs/{tub}` - Delete tub\n- `GET /tubs/{tub}/documents` - List documents\n- `POST /tubs/{tub}/documents` - Upload document\n- `GET /tubs/{tub}/documents/{id}` - Get document\n- `PUT /tubs/{tub}/documents/{id}` - Update document\n- `DELETE /tubs/{tub}/documents/{id}` - Delete document\n- `GET /tubs/{tub}/documents/{id}/download` - Download original\n- `GET /tubs/{tub}/documents/{id}/download/markdown` - Download markdown\n- `GET /tubs/{tub}/documents/{id}/status` - Processing status\n- `GET /tubs/{tub}/documents/{id}/chunks` - Get chunks\n- `GET /search/xnn/{tub}` - Vector search\n\nOpenAPI documentation available at `/.well-known/openapi.json`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodfin%2Fragnar","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmodfin%2Fragnar","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodfin%2Fragnar/lists"}