{"id":26856484,"url":"https://github.com/dontizi/rlama","last_synced_at":"2026-01-14T20:39:06.612Z","repository":{"id":281273775,"uuid":"943027712","full_name":"DonTizi/rlama","owner":"DonTizi","description":"A powerful document AI question-answering tool that connects to your local Ollama models. Create, manage, and interact with RAG systems for all your document needs.","archived":false,"fork":false,"pushed_at":"2025-03-23T04:50:47.000Z","size":50388,"stargazers_count":867,"open_issues_count":16,"forks_count":56,"subscribers_count":15,"default_branch":"main","last_synced_at":"2025-03-23T05:24:47.572Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://rlama.dev/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DonTizi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-05T04:03:37.000Z","updated_at":"2025-03-23T03:11:25.000Z","dependencies_parsed_at":"2025-03-08T01:23:08.895Z","dependency_job_id":"ec2f8dff-3b92-4aad-8a74-a2a0489a7f28","html_url":"https://github.com/DonTizi/rlama","commit_stats":null,"previous_names":["dontizi/rlama"],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DonTizi%2Frlama","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DonTizi%2Frlama/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DonTizi%2Frlama/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DonTizi%2Frlama/manifests","owner_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DonTizi","download_url":"https://codeload.github.com/DonTizi/rlama/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246395595,"owners_count":20770243,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-31T00:02:24.057Z","updated_at":"2026-01-14T20:39:06.600Z","avatar_url":"https://github.com/DonTizi.png","language":"Go","funding_links":[],"categories":["A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"readme":"\u003c!-- Social Links Navigation Bar --\u003e\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://x.com/LeDonTizi\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Twitter-1DA1F2?style=for-the-badge\u0026logo=twitter\u0026logoColor=white\" alt=\"Twitter\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://discord.gg/tP5JB9DR\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Discord-5865F2?style=for-the-badge\u0026logo=discord\u0026logoColor=white\" alt=\"Discord\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://www.youtube.com/@Dontizi\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/YouTube-FF0000?style=for-the-badge\u0026logo=youtube\u0026logoColor=white\" alt=\"YouTube\"\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n# RLAMA - User Guide\n\n\u003e **⚠️ Project Temporarily Paused**  \n\u003e This project is currently on pause due to my work and university commitments that take up a lot of my time. 
I am not able to actively maintain this project at the moment. Development will resume when my situation allows it.\n\nRLAMA is a powerful AI-driven question-answering tool for your documents, seamlessly integrating with your local Ollama models. It enables you to create, manage, and interact with Retrieval-Augmented Generation (RAG) systems tailored to your documentation needs.\n\n\n[![RLAMA Demonstration](https://img.youtube.com/vi/EIsQnBqeQxQ/0.jpg)](https://www.youtube.com/watch?v=EIsQnBqeQxQ)\n\n## Table of Contents\n- [Vision \u0026 Roadmap](#vision--roadmap)\n- [Installation](#installation)\n- [Available Commands](#available-commands)\n  - [rag - Create a RAG system](#rag---create-a-rag-system)\n  - [crawl-rag - Create a RAG system from a website](#crawl-rag---create-a-rag-system-from-a-website)\n  - [wizard - Create a RAG system with interactive setup](#wizard---create-a-rag-system-with-interactive-setup)\n  - [watch - Set up directory watching for a RAG system](#watch---set-up-directory-watching-for-a-rag-system)\n  - [watch-off - Disable directory watching for a RAG system](#watch-off---disable-directory-watching-for-a-rag-system)\n  - [check-watched - Check a RAG's watched directory for new files](#check-watched---check-a-rags-watched-directory-for-new-files)\n  - [web-watch - Set up website monitoring for a RAG system](#web-watch---set-up-website-monitoring-for-a-rag-system)\n  - [web-watch-off - Disable website monitoring for a RAG system](#web-watch-off---disable-website-monitoring-for-a-rag-system)\n  - [check-web-watched - Check a RAG's monitored website for updates](#check-web-watched---check-a-rags-monitored-website-for-updates)\n  - [run - Use a RAG system](#run---use-a-rag-system)\n  - [api - Start API server](#api---start-api-server)\n  - [list - List RAG systems](#list---list-rag-systems)\n  - [delete - Delete a RAG system](#delete---delete-a-rag-system)\n  - [list-docs - List documents in a RAG](#list-docs---list-documents-in-a-rag)\n  - 
[list-chunks - Inspect document chunks](#list-chunks---inspect-document-chunks)\n  - [view-chunk - View chunk details](#view-chunk---view-chunk-details)\n  - [add-docs - Add documents to RAG](#add-docs---add-documents-to-rag)\n  - [crawl-add-docs - Add website content to RAG](#crawl-add-docs---add-website-content-to-rag)\n  - [update-model - Change LLM model](#update-model---change-llm-model)\n  - [update - Update RLAMA](#update---update-rlama)\n  - [version - Display version](#version---display-version)\n  - [hf-browse - Browse GGUF models on Hugging Face](#hf-browse---browse-gguf-models-on-hugging-face)\n  - [run-hf - Run a Hugging Face GGUF model](#run-hf---run-a-hugging-face-gguf-model)\n- [Uninstallation](#uninstallation)\n- [Supported Document Formats](#supported-document-formats)\n- [Troubleshooting](#troubleshooting)\n- [Using OpenAI Models](#using-openai-models)\n\n## Vision \u0026 Roadmap\nRLAMA aims to become the definitive tool for creating local RAG systems that work seamlessly for everyone—from individual developers to large enterprises. 
Here's our strategic roadmap:\n\n### Completed Features ✅\n- ✅ **Basic RAG System Creation**: CLI tool for creating and managing RAG systems\n- ✅ **Document Processing**: Support for multiple document formats (.txt, .md, .pdf, etc.)\n- ✅ **Document Chunking**: Advanced semantic chunking with multiple strategies (fixed, semantic, hierarchical, hybrid)\n- ✅ **Vector Storage**: Local storage of document embeddings\n- ✅ **Context Retrieval**: Basic semantic search with configurable context size\n- ✅ **Ollama Integration**: Seamless connection to Ollama models\n- ✅ **Cross-Platform Support**: Works on Linux, macOS, and Windows\n- ✅ **Easy Installation**: One-line installation script\n- ✅ **API Server**: HTTP endpoints for integrating RAG capabilities in other applications\n- ✅ **Web Crawling**: Create RAGs directly from websites\n- ✅ **Guided RAG Setup Wizard**: Interactive interface for easy RAG creation\n- ✅ **Hugging Face Integration**: Access to 45,000+ GGUF models from Hugging Face Hub\n\n### Small LLM Optimization (Q2 2025)\n- [ ] **Prompt Compression**: Smart context summarization for limited context windows\n- ✅ **Adaptive Chunking**: Dynamic content segmentation based on semantic boundaries and document structure\n- ✅ **Minimal Context Retrieval**: Intelligent filtering to eliminate redundant content\n- [ ] **Parameter Optimization**: Fine-tuned settings for different model sizes\n\n### Advanced Embedding Pipeline (Q2-Q3 2025)\n- [ ] **Multi-Model Embedding Support**: Integration with various embedding models\n- [ ] **Hybrid Retrieval Techniques**: Combining sparse and dense retrievers for better accuracy\n- [ ] **Embedding Evaluation Tools**: Built-in metrics to measure retrieval quality\n- [ ] **Automated Embedding Cache**: Smart caching to reduce computation for similar queries\n\n### User Experience Enhancements (Q3 2025)\n- [ ] **Lightweight Web Interface**: Simple browser-based UI for the existing CLI backend\n- [ ] **Knowledge Graph Visualization**: 
Interactive exploration of document connections\n- [ ] **Domain-Specific Templates**: Pre-configured settings for different domains\n\n### Enterprise Features (Q4 2025)\n- [ ] **Multi-User Access Control**: Role-based permissions for team environments\n- [ ] **Integration with Enterprise Systems**: Connectors for SharePoint, Confluence, Google Workspace\n- [ ] **Knowledge Quality Monitoring**: Detection of outdated or contradictory information\n- [ ] **System Integration API**: Webhooks and APIs for embedding RLAMA in existing workflows\n- [ ] **AI Agent Creation Framework**: Simplified system for building custom AI agents with RAG capabilities\n\n### Next-Gen Retrieval Innovations (Q1 2026)\n- [ ] **Multi-Step Retrieval**: Using the LLM to refine search queries for complex questions\n- [ ] **Cross-Modal Retrieval**: Support for image content understanding and retrieval\n- [ ] **Feedback-Based Optimization**: Learning from user interactions to improve retrieval\n- [ ] **Knowledge Graphs \u0026 Symbolic Reasoning**: Combining vector search with structured knowledge\n\nRLAMA's core philosophy remains unchanged: to provide a simple, powerful, local RAG solution that respects privacy, minimizes resource requirements, and works seamlessly across platforms.\n\n## Installation\n\n### Prerequisites\n- [Ollama](https://ollama.ai/) installed and running\n\n### Installation from terminal\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/dontizi/rlama/main/install.sh | sh\n```\n\n## Tech Stack\n\nRLAMA is built with:\n\n- **Core Language**: Go (chosen for performance, cross-platform compatibility, and single binary distribution)\n- **CLI Framework**: Cobra (for command-line interface structure)\n- **LLM Integration**: Ollama API (for embeddings and completions)\n- **Storage**: Local filesystem-based storage (JSON files for simplicity and portability)\n- **Vector Search**: Custom implementation of cosine similarity for embedding retrieval\n\n## Architecture\n\nRLAMA 
follows a clean architecture pattern with clear separation of concerns:\n\n```\nrlama/\n├── cmd/                  # CLI commands (using Cobra)\n│   ├── root.go           # Base command\n│   ├── rag.go            # Create RAG systems\n│   ├── run.go            # Query RAG systems\n│   └── ...\n├── internal/\n│   ├── client/           # External API clients\n│   │   └── ollama_client.go # Ollama API integration\n│   ├── domain/           # Core domain models\n│   │   ├── rag.go        # RAG system entity\n│   │   └── document.go   # Document entity\n│   ├── repository/       # Data persistence\n│   │   └── rag_repository.go # Handles saving/loading RAGs\n│   └── service/          # Business logic\n│       ├── rag_service.go      # RAG operations\n│       ├── document_loader.go  # Document processing\n│       └── embedding_service.go # Vector embeddings\n└── pkg/                  # Shared utilities\n    └── vector/           # Vector operations\n```\n\n## Data Flow\n\n1. **Document Processing**: Documents are loaded from the file system, parsed based on their type, and converted to plain text.\n2. **Embedding Generation**: Document text is sent to Ollama to generate vector embeddings.\n3. **Storage**: The RAG system (documents + embeddings) is stored in the user's home directory (~/.rlama).\n4. **Query Process**: When a user asks a question, it's converted to an embedding, compared against stored document embeddings, and relevant content is retrieved.\n5. 
**Response Generation**: Retrieved content and the question are sent to Ollama to generate a contextually-informed response.\n\n## Visual Representation\n\n```\n┌─────────────┐     ┌─────────────┐     ┌─────────────┐\n│  Documents  │────\u003e│  Document   │────\u003e│  Embedding  │\n│  (Input)    │     │  Processing │     │  Generation │\n└─────────────┘     └─────────────┘     └─────────────┘\n                                              │\n                                              ▼\n┌─────────────┐     ┌─────────────┐     ┌─────────────┐\n│   Query     │────\u003e│  Vector     │\u003c────│ Vector Store│\n│  Response   │     │  Search     │     │ (RAG System)│\n└─────────────┘     └─────────────┘     └─────────────┘\n       ▲                   │\n       │                   ▼\n┌─────────────┐     ┌─────────────┐\n│   Ollama    │\u003c────│   Context   │\n│    LLM      │     │  Building   │\n└─────────────┘     └─────────────┘\n```\n\nRLAMA is designed to be lightweight and portable, focusing on providing RAG capabilities with minimal dependencies. 
The entire system runs locally, with the only external dependency being Ollama for LLM capabilities.\n\n## Available Commands\n\nYou can get help on all commands by using:\n\n```bash\nrlama --help\n```\n\n### Global Flags\n\nThese flags can be used with any command:\n\n```bash\n--host string       Ollama host (default: localhost)\n--port string       Ollama port (default: 11434)\n--num-thread int    Number of threads for Ollama to use (default: 0, use Ollama default)\n```\n\n**Performance Optimization:**\n- Set `--num-thread` to your CPU core count (for example `--num-thread 16`) to potentially improve processing speed\n- Ollama may use only about half of the available cores by default\n- Raising the thread count to your full core count may noticeably speed up text generation and embedding, depending on your hardware\n\n**Usage Examples:**\n```bash\n# Use 16 threads for better performance\nrlama --num-thread 16 run my-docs\n\n# Create a RAG with optimized thread usage\nrlama --num-thread 16 rag llama3 documentation ./docs\n\n# Run with custom host and thread settings\nrlama --host 192.168.1.100 --port 11434 --num-thread 16 run my-rag\n```\n\n### Custom Data Directory\n\nRLAMA stores data in `~/.rlama` by default. To use a different location:\n\n1. **Command-line flag** (highest priority):\n   ```bash\n   # Use with any command\n   rlama --data-dir /path/to/custom/directory run my-rag\n   ```\n\n2. 
**Environment variable**:\n   ```bash\n   # Set the environment variable\n   export RLAMA_DATA_DIR=/path/to/custom/directory\n   rlama run my-rag\n   ```\n\nThe precedence order is: command-line flag \u003e environment variable \u003e default location.\n\n### rag - Create a RAG system\n\nCreates a new RAG system by indexing all documents in the specified folder.\n\n```bash\nrlama rag [model] [rag-name] [folder-path]\n```\n\n**Parameters:**\n- `model`: Name of the Ollama model to use (e.g., llama3, mistral, gemma) or a Hugging Face model using the format `hf.co/username/repository[:quantization]`.\n- `rag-name`: Unique name to identify your RAG system.\n- `folder-path`: Path to the folder containing your documents.\n\n**Example:**\n\n```bash\n# Using a standard Ollama model\nrlama rag llama3 documentation ./docs\n\n# Using a Hugging Face model\nrlama rag hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF my-rag ./docs\n\n# Using a Hugging Face model with specific quantization\nrlama rag hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF:Q5_K_M my-rag ./docs\n```\n\n### crawl-rag - Create a RAG system from a website\n\nCreates a new RAG system by crawling a website and indexing its content.\n\n```bash\nrlama crawl-rag [model] [rag-name] [website-url]\n```\n\n**Parameters:**\n- `model`: Name of the Ollama model to use (e.g., llama3, mistral, gemma).\n- `rag-name`: Unique name to identify your RAG system.\n- `website-url`: URL of the website to crawl and index.\n\n**Options:**\n- `--max-depth`: Maximum crawl depth (default: 2)\n- `--concurrency`: Number of concurrent crawlers (default: 5)\n- `--exclude-path`: Paths to exclude from crawling (comma-separated)\n- `--chunk-size`: Character count per chunk (default: 1000)\n- `--chunk-overlap`: Overlap between chunks in characters (default: 200)\n- `--chunking-strategy`: Chunking strategy to use (options: \"fixed\", \"semantic\", \"hybrid\", \"hierarchical\", default: \"hybrid\")\n\n#### Chunking Strategies\n\nRLAMA offers 
multiple advanced chunking strategies to optimize document retrieval:\n\n- **Fixed**: Traditional chunking with fixed size and overlap, respecting sentence boundaries when possible.\n- **Semantic**: Intelligently splits documents based on semantic boundaries like headings, paragraphs, and natural topic shifts.\n- **Hybrid**: Automatically selects the best strategy based on document type and content (markdown, HTML, code, or plain text).\n- **Hierarchical**: For very long documents, creates a two-level chunking structure with major sections and sub-chunks.\n\nThe system automatically adapts to different document types:\n- Markdown documents: Split by headers and sections\n- HTML documents: Split by semantic HTML elements\n- Code documents: Split by functions, classes, and logical blocks\n- Plain text: Split by paragraphs with contextual overlap\n\n**Example:**\n\n```bash\n# Create a new RAG from a documentation website\nrlama crawl-rag llama3 docs-rag https://docs.example.com\n\n# Customize crawling behavior\nrlama crawl-rag llama3 blog-rag https://blog.example.com --max-depth=3 --exclude-path=/archive,/tags\n\n# Create a RAG with semantic chunking\nrlama rag llama3 documentation ./docs --chunking-strategy=semantic\n\n# Use hierarchical chunking for large documents\nrlama rag llama3 book-rag ./books --chunking-strategy=hierarchical\n```\n\n### wizard - Create a RAG system with interactive setup\n\nProvides an interactive step-by-step wizard for creating a new RAG system.\n\n```bash\nrlama wizard\n```\n\nThe wizard guides you through:\n- Naming your RAG\n- Choosing an Ollama model\n- Selecting document sources (local folder or website)\n- Configuring chunking parameters\n- Setting up file filtering\n\n**Example:**\n\n```bash\nrlama wizard\n# Follow the prompts to create your customized RAG\n```\n\n### watch - Set up directory watching for a RAG system\n\nConfigure a RAG system to automatically watch a directory for new files and add them to the RAG.\n\n```bash\nrlama 
watch [rag-name] [directory-path] [interval]\n```\n\n**Parameters:**\n- `rag-name`: Name of the RAG system to watch.\n- `directory-path`: Path to the directory to watch for new files.\n- `interval`: Time in minutes to check for new files (use 0 to check only when the RAG is used).\n\n**Example:**\n\n```bash\n# Set up directory watching to check every 60 minutes\nrlama watch my-docs ./watched-folder 60\n\n# Set up directory watching to only check when the RAG is used\nrlama watch my-docs ./watched-folder 0\n\n# Customize what files to watch\nrlama watch my-docs ./watched-folder 30 --exclude-dir=node_modules,tmp --process-ext=.md,.txt\n```\n\n### watch-off - Disable directory watching for a RAG system\n\nDisable automatic directory watching for a RAG system.\n\n```bash\nrlama watch-off [rag-name]\n```\n\n**Parameters:**\n- `rag-name`: Name of the RAG system to disable watching.\n\n**Example:**\n\n```bash\nrlama watch-off my-docs\n```\n\n### check-watched - Check a RAG's watched directory for new files\n\nManually check a RAG's watched directory for new files and add them to the RAG.\n\n```bash\nrlama check-watched [rag-name]\n```\n\n**Parameters:**\n- `rag-name`: Name of the RAG system to check.\n\n**Example:**\n\n```bash\nrlama check-watched my-docs\n```\n\n### web-watch - Set up website monitoring for a RAG system\n\nConfigure a RAG system to automatically monitor a website for updates and add new content to the RAG.\n\n```bash\nrlama web-watch [rag-name] [website-url] [interval]\n```\n\n**Parameters:**\n- `rag-name`: Name of the RAG system to monitor.\n- `website-url`: URL of the website to monitor.\n- `interval`: Time in minutes between checks (use 0 to check only when the RAG is used).\n\n**Example:**\n\n```bash\n# Set up website monitoring to check every 60 minutes\nrlama web-watch my-docs https://example.com 60\n\n# Set up website monitoring to only check when the RAG is used\nrlama web-watch my-docs https://example.com 0\n\n# Customize what content to 
monitor\nrlama web-watch my-docs https://example.com 30 --exclude-path=/archive,/tags\n```\n\n### web-watch-off - Disable website monitoring for a RAG system\n\nDisable automatic website monitoring for a RAG system.\n\n```bash\nrlama web-watch-off [rag-name]\n```\n\n**Parameters:**\n- `rag-name`: Name of the RAG system for which to disable monitoring.\n\n**Example:**\n\n```bash\nrlama web-watch-off my-docs\n```\n\n### check-web-watched - Check a RAG's monitored website for updates\n\nManually check a RAG's monitored website for new updates and add them to the RAG.\n\n```bash\nrlama check-web-watched [rag-name]\n```\n\n**Parameters:**\n- `rag-name`: Name of the RAG system to check.\n\n**Example:**\n\n```bash\nrlama check-web-watched my-docs\n```\n\n### run - Use a RAG system\n\nStarts an interactive question-answering session with an existing RAG system.\n\n```bash\nrlama run [rag-name]\n```\n\n**Parameters:**\n- `rag-name`: Name of the RAG system to use.\n- `--context-size`: (Optional) Number of context chunks to retrieve (default: 20)\n\n**Example:**\n\n```bash\nrlama run documentation\n\u003e How do I install the project?\n\u003e What are the main features?\n\u003e exit\n```\n\n**Context Size Tips:**\n- Smaller values (5-15) for faster responses with key information\n- Medium values (20-40) for balanced performance\n- Larger values (50+) for complex questions needing broad context\n- Consider your model's context window limits\n\n```bash\nrlama run documentation --context-size=50  # Use 50 context chunks\n```\n\n### api - Start API server\n\nStarts an HTTP API server that exposes RLAMA's functionality through RESTful endpoints.\n\n```bash\nrlama api [--port PORT]\n```\n\n**Parameters:**\n- `--port`: (Optional) Port number to run the API server on (default: 11249)\n\n**Example:**\n\n```bash\nrlama api --port 8080\n```\n\n**Available Endpoints:**\n\n1. 
**Query a RAG system** - `POST /rag`\n   ```bash\n   curl -X POST http://localhost:11249/rag \\\n     -H \"Content-Type: application/json\" \\\n     -d '{\n       \"rag_name\": \"documentation\",\n       \"prompt\": \"How do I install the project?\",\n       \"context_size\": 20\n     }'\n   ```\n\n   Request fields:\n   - `rag_name` (required): Name of the RAG system to query\n   - `prompt` (required): Question or prompt to send to the RAG\n   - `context_size` (optional): Number of chunks to include in context\n   - `model` (optional): Override the model used by the RAG\n\n2. **Check server health** - `GET /health`\n   ```bash\n   curl http://localhost:11249/health\n   ```\n\n**Integration Example:**\n```javascript\n// Node.js 18+ example: uses the built-in fetch; save as an ES module (e.g. client.mjs) so top-level await works\nconst response = await fetch('http://localhost:11249/rag', {\n  method: 'POST',\n  headers: { 'Content-Type': 'application/json' },\n  body: JSON.stringify({\n    rag_name: 'my-docs',\n    prompt: 'Summarize the key features'\n  })\n});\n// Fail loudly on HTTP errors instead of trying to parse an error body\nif (!response.ok) {\n  throw new Error(`RLAMA API request failed with status ${response.status}`);\n}\nconst data = await response.json();\nconsole.log(data.response);\n```\n\n### list - List RAG systems\n\nDisplays a list of all available RAG systems.\n\n```bash\nrlama list\n```\n\n### delete - Delete a RAG system\n\nPermanently deletes a RAG system and all its indexed documents.\n\n```bash\nrlama delete [rag-name] [--force/-f]\n```\n\n**Parameters:**\n- `rag-name`: Name of the RAG system to delete.\n- `--force` or `-f`: (Optional) Delete without asking for confirmation.\n\n**Example:**\n\n```bash\nrlama delete old-project\n```\n\nTo delete without confirmation:\n\n```bash\nrlama delete old-project --force\n```\n\n### list-docs - List documents in a RAG\n\nDisplays all documents in a RAG system with metadata.\n\n```bash\nrlama list-docs [rag-name]\n```\n\n**Parameters:**\n- `rag-name`: Name of the RAG system\n\n**Example:**\n\n```bash\nrlama list-docs documentation\n```\n\n### list-chunks - Inspect document chunks\n\nList and filter document chunks in a RAG system with various 
options:\n\n```bash\n# Basic chunk listing\nrlama list-chunks [rag-name]\n\n# With content preview (shows first 100 characters)\nrlama list-chunks [rag-name] --show-content\n\n# Filter by document name/ID substring\nrlama list-chunks [rag-name] --document=readme\n\n# Combine options\nrlama list-chunks [rag-name] --document=api --show-content\n```\n\n**Options:**\n- `--show-content`: Display chunk content preview\n- `--document`: Filter by document name/ID substring\n\n**Output columns:**\n- Chunk ID (use with view-chunk command)\n- Document Source\n- Chunk Position (e.g., \"2/5\" for second of five chunks)\n- Content Preview (if enabled)\n- Created Date\n\n### view-chunk - View chunk details\n\nDisplay detailed information about a specific chunk.\n\n```bash\nrlama view-chunk [rag-name] [chunk-id]\n```\n\n**Parameters:**\n- `rag-name`: Name of the RAG system\n- `chunk-id`: Chunk identifier from list-chunks\n\n**Example:**\n\n```bash\nrlama view-chunk documentation doc123_chunk_0\n```\n\n### add-docs - Add documents to RAG\n\nAdd new documents to an existing RAG system.\n\n```bash\nrlama add-docs [rag-name] [folder-path] [flags]\n```\n\n**Parameters:**\n- `rag-name`: Name of the RAG system\n- `folder-path`: Path to documents folder\n\n**Example:**\n\n```bash\nrlama add-docs documentation ./new-docs --exclude-ext=.tmp\n```\n\n### crawl-add-docs - Add website content to RAG\n\nAdd content from a website to an existing RAG system.\n\n```bash\nrlama crawl-add-docs [rag-name] [website-url]\n```\n\n**Parameters:**\n- `rag-name`: Name of the RAG system\n- `website-url`: URL of the website to crawl and add to the RAG\n\n**Options:**\n- `--max-depth`: Maximum crawl depth (default: 2)\n- `--concurrency`: Number of concurrent crawlers (default: 5)\n- `--exclude-path`: Paths to exclude from crawling (comma-separated)\n- `--chunk-size`: Character count per chunk (default: 1000)\n- `--chunk-overlap`: Overlap between chunks in characters (default: 200)\n\n**Example:**\n\n```bash\n# 
Add blog content to an existing RAG\nrlama crawl-add-docs my-docs https://blog.example.com\n\n# Customize crawling behavior\nrlama crawl-add-docs knowledge-base https://docs.example.com --max-depth=1 --exclude-path=/api\n```\n\n### update-model - Change LLM model\n\nUpdate the LLM model used by a RAG system.\n\n```bash\nrlama update-model [rag-name] [new-model]\n```\n\n**Parameters:**\n- `rag-name`: Name of the RAG system\n- `new-model`: New Ollama model name\n\n**Example:**\n\n```bash\nrlama update-model documentation deepseek-r1:7b-instruct\n```\n\n### update - Update RLAMA\n\nChecks if a new version of RLAMA is available and installs it.\n\n```bash\nrlama update [--force/-f]\n```\n\n**Options:**\n- `--force` or `-f`: (Optional) Update without asking for confirmation.\n\n### version - Display version\n\nDisplays the current version of RLAMA.\n\n```bash\nrlama --version\n```\n\nor\n\n```bash\nrlama -v\n```\n\n### hf-browse - Browse GGUF models on Hugging Face\n\nSearch and browse GGUF models available on Hugging Face.\n\n```bash\nrlama hf-browse [search-term] [flags]\n```\n\n**Parameters:**\n- `search-term`: (Optional) Term to search for (e.g., \"llama3\", \"mistral\")\n\n**Flags:**\n- `--open`: Open the search results in your default web browser\n- `--quant`: Specify quantization type to suggest (e.g., Q4_K_M, Q5_K_M)\n- `--limit`: Limit number of results (default: 10)\n\n**Examples:**\n\n```bash\n# Search for GGUF models and show command-line help\nrlama hf-browse \"llama 3\"\n\n# Open browser with search results\nrlama hf-browse mistral --open\n\n# Search with specific quantization suggestion\nrlama hf-browse phi --quant Q4_K_M\n```\n\n### run-hf - Run a Hugging Face GGUF model\n\nRun a Hugging Face GGUF model directly using Ollama. 
This is useful for testing models before creating a RAG system with them.\n\n```bash\nrlama run-hf [huggingface-model] [flags]\n```\n\n**Parameters:**\n- `huggingface-model`: Hugging Face model path in the format `username/repository`\n\n**Flags:**\n- `--quant`: Quantization to use (e.g., Q4_K_M, Q5_K_M)\n\n**Examples:**\n\n```bash\n# Try a model in chat mode\nrlama run-hf bartowski/Llama-3.2-1B-Instruct-GGUF\n\n# Specify quantization\nrlama run-hf mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF --quant Q5_K_M\n```\n\n## Uninstallation\n\nTo uninstall RLAMA:\n\n### Removing the binary\n\nIf you installed via `go install`:\n\n```bash\nrlama uninstall\n```\n\n### Removing data\n\nRLAMA stores its data in `~/.rlama`. To remove it:\n\n```bash\nrm -rf ~/.rlama\n```\n\n## Supported Document Formats\n\nRLAMA supports many file formats:\n\n- **Text**: `.txt`, `.md`, `.html`, `.json`, `.csv`, `.yaml`, `.yml`, `.xml`, `.org`\n- **Code**: `.go`, `.py`, `.js`, `.java`, `.c`, `.cpp`, `.cxx`, `.h`, `.rb`, `.php`, `.rs`, `.swift`, `.kt`, `.ts`, `.tsx`, `.f`, `.F`, `.F90`, `.el`, `.svelte`\n- **Documents**: `.pdf`, `.docx`, `.doc`, `.rtf`, `.odt`, `.pptx`, `.ppt`, `.xlsx`, `.xls`, `.epub`\n\nInstalling dependencies via `install_deps.sh` is recommended to improve support for certain formats.\n\n## Troubleshooting\n\n### Ollama is not accessible\n\nIf you encounter connection errors to Ollama:\n1. Check that Ollama is running.\n2. By default, Ollama must be accessible at `http://localhost:11434` or the host and port specified by the OLLAMA_HOST environment variable.\n3. If your Ollama instance is running on a different host or port, use the `--host` and `--port` flags:\n   ```bash\n   rlama --host 192.168.1.100 --port 8000 list\n   rlama --host my-ollama-server --port 11434 run my-rag\n   ```\n4. Check Ollama logs for potential errors.\n\n### Text extraction issues\n\nIf you encounter problems with certain formats:\n1. Install dependencies via `./scripts/install_deps.sh`.\n2. 
Verify that your system has the required tools (`pdftotext`, `tesseract`, etc.).\n\n### The RAG doesn't find relevant information\n\nIf the answers are not relevant:\n1. Check that the documents are properly indexed with `rlama list-docs [rag-name]`.\n2. Make sure the document content was extracted correctly, for example by inspecting chunks with `rlama list-chunks [rag-name] --show-content`.\n3. Try rephrasing your question more precisely.\n4. Consider adjusting the chunking parameters (`--chunk-size`, `--chunk-overlap`, `--chunking-strategy`) when creating the RAG.\n\n### Other issues\n\nFor any other issues, please open an issue on the [GitHub repository](https://github.com/dontizi/rlama/issues) providing:\n1. The exact command used.\n2. The complete output of the command.\n3. Your operating system and architecture.\n4. The RLAMA version (`rlama --version`).\n\n### Configuring Ollama Connection\n\nRLAMA provides multiple ways to connect to your Ollama instance:\n\n1. **Command-line flags** (highest priority):\n   ```bash\n   rlama --host 192.168.1.100 --port 8080 run my-rag\n   ```\n\n2. **Environment variable**:\n   ```bash\n   # Format: \"host:port\" or just \"host\"\n   export OLLAMA_HOST=remote-server:8080\n   rlama run my-rag\n   ```\n\n3. 
**Default values** (used if no other method is specified):\n   - Host: `localhost`\n   - Port: `11434`\n\nThe precedence order is: command-line flags \u003e environment variable \u003e default values.\n\n## Advanced Usage\n\n### Context Size Management\n\n```bash\n# Quick answers with minimal context\nrlama run my-docs --context-size=10\n\n# Deep analysis with a larger context\nrlama run my-docs --context-size=50\n\n# Balance between speed and depth\nrlama run my-docs --context-size=30\n```\n\n### RAG Creation with Filtering\n```bash\nrlama rag llama3 my-project ./code \\\n  --exclude-dir=node_modules,dist \\\n  --process-ext=.go,.ts \\\n  --exclude-ext=.spec.ts\n```\n\n### Chunk Inspection\n```bash\n# List chunks with content preview\nrlama list-chunks my-project --show-content\n\n# Filter chunks from a specific document\nrlama list-chunks my-project --document=architecture\n```\n\n## Help System\n\nGet full command help:\n```bash\nrlama --help\n```\n\nCommand-specific help:\n```bash\nrlama rag --help\nrlama list-chunks --help\nrlama update-model --help\n```\n\nAll commands support the global `--host` and `--port` flags for custom Ollama connections.\n\n## Hugging Face Integration\n\nRLAMA now supports using GGUF models directly from Hugging Face through Ollama's native integration:\n\n### Browsing Hugging Face Models\n\n```bash\n# Search for GGUF models on Hugging Face\nrlama hf-browse \"llama 3\"\n\n# Open browser with search results\nrlama hf-browse mistral --open\n```\n\n### Testing a Model\n\nBefore creating a RAG, you can test a Hugging Face model directly:\n\n```bash\n# Try a model in chat mode\nrlama run-hf bartowski/Llama-3.2-1B-Instruct-GGUF\n\n# Specify quantization\nrlama run-hf mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF --quant Q5_K_M\n```\n\n### Creating a RAG with Hugging Face Models\n\nUse Hugging Face models when creating RAG 
systems:\n\n```bash\n# Create a RAG with a Hugging Face model\nrlama rag hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF my-rag ./docs\n\n# Use a specific quantization\nrlama rag hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF:Q5_K_M my-rag ./docs\n```\n\n## Using OpenAI Models\n\nRLAMA offers two ways to supply an OpenAI API key:\n\n### Option 1: Default API Keys (Automatic Usage)\n\nSet a default OpenAI API key in the web interface or through the `OPENAI_API_KEY` environment variable. RLAMA then uses this key automatically for every command that calls an OpenAI model, with no profile required.\n\n**Via Web Interface:**\n1. Navigate to **Settings → Default API Keys**\n2. Enter your OpenAI API key (starts with `sk-`)\n3. Click **Save Default API Keys**\n\n**Via Environment Variable:**\n```bash\nexport OPENAI_API_KEY=\"your-api-key\"\n```\n\n**Usage with default keys:**\n```bash\n# These commands will automatically use your default OpenAI API key\nrlama rag o3-mini my-rag ./documents\nrlama rag gpt-4o another-rag ./docs\nrlama update-model my-rag gpt-4o\nrlama run my-rag\n```\n\n### Option 2: Named Profiles (Specific Usage)\n\nCreate named profiles for different OpenAI accounts or organizations. 
Use these when you need to switch between different API keys.\n\n**Create profiles:**\n```bash\n# Create profiles for different accounts\nrlama profile add work-account openai \"sk-work-api-key\"\nrlama profile add personal-account openai \"sk-personal-api-key\"\n```\n\n**Usage with named profiles:**\n```bash\n# Specify profile with --profile flag\nrlama rag o3-mini work-rag ./documents --profile work-account\nrlama rag gpt-4o personal-rag ./docs --profile personal-account\nrlama update-model my-rag gpt-4o --profile work-account\n```\n\n### Available OpenAI Models (2025)\n\n#### Reasoning Models (o-series)\n| Model | Input Price | Output Price | Context | Description |\n|-------|------------|-------------|---------|-------------|\n| **o3-mini** ⭐ | $1.10/1M | $4.40/1M | 200K | Latest reasoning model, 93% cheaper than o1 |\n| o1-pro | $150.00/1M | $600.00/1M | 200K | Most powerful reasoning model (Enterprise) |\n| o1 | $15.00/1M | $60.00/1M | 200K | Advanced reasoning model |\n\n#### GPT-4 Series\n| Model | Input Price | Output Price | Context | Description |\n|-------|------------|-------------|---------|-------------|\n| **GPT-4.5** 🆕 | $75.00/1M | $150.00/1M | 128K | Natural conversation, emotional intelligence |\n| **GPT-4.1** 🆕 | $2.00/1M | $8.00/1M | 1M | Latest GPT-4 with 1M context window |\n| **GPT-4.1-nano** 🆕 | $0.10/1M | $0.40/1M | 1M | Lightweight version of GPT-4.1 |\n| **GPT-4o** 🔥 | $2.50/1M | $10.00/1M | 128K | Multimodal with images and audio support |\n| **GPT-4o mini** 💰 | $0.15/1M | $0.60/1M | 128K | Efficient version of GPT-4o |\n\n#### GPT-3.5 Series\n| Model | Input Price | Output Price | Context | Description |\n|-------|------------|-------------|---------|-------------|\n| GPT-3.5 Turbo | $0.50/1M | $1.50/1M | 16K | Fast and economical model |\n\n**Legend:** ⭐ = Recommended, 🆕 = New (2025), 🔥 = Popular, 💰 = Budget-friendly\n\n**Cost Optimization Tips:**\n- Take advantage of OpenAI's automatic prompt caching, which discounts repeated input prefixes by 50% or more, depending on the model\n- 
Choose appropriate context window sizes\n- Test multiple models for your specific use case\n- Consider o3-mini for reasoning tasks at reduced cost\n\nNote: Only inference (answer generation) uses the OpenAI API; document embeddings are still computed with Ollama.\n\n## Managing API Profiles\n\n### Using Default Keys (Recommended for Most Users)\n\nFor most users, setting up a default API key is the simplest approach:\n\n**Via Web Interface:**\n1. Open the RLAMA web interface\n2. Go to **Settings → Default API Keys**\n3. Enter your OpenAI API key\n4. Save the configuration\n\n**Commands will automatically use your default key:**\n```bash\n# No --profile needed: the default key is used automatically\nrlama rag o3-mini my-rag ./documents\nrlama update-model my-rag gpt-4o\nrlama run my-rag\n```\n\n### Using Named Profiles (Advanced Users)\n\nFor users managing multiple OpenAI accounts or organizations:\n\n#### Creating Named Profiles\n\n**Via CLI:**\n```bash\n# Create profiles for different accounts\nrlama profile add work-openai openai \"sk-work-key...\"\nrlama profile add personal-openai openai \"sk-personal-key...\"\n```\n\n**Via Web Interface:**\n1. Navigate to **Settings → Named Profiles**\n2. Click **\"New Profile\"**\n3. 
Fill in the profile details:\n   - **Name**: Unique identifier (e.g., `work-account`, `personal-account`)\n   - **Provider**: OpenAI (automatically selected)\n   - **API Key**: Your OpenAI API key (starts with `sk-`)\n   - **Description**: Optional description for the profile\n\n#### Managing Profiles\n\n```bash\n# List all profiles\nrlama profile list\n\n# Delete a profile\nrlama profile delete old-profile\n```\n\n#### Using Named Profiles\n\n```bash\n# Specify profile with --profile flag\nrlama rag gpt-4o work-rag ./documents --profile work-openai\nrlama rag o3-mini personal-rag ./documents --profile personal-openai\n\n# Update models with specific profiles\nrlama update-model work-rag gpt-4o --profile work-openai\nrlama update-model personal-rag o3-mini --profile personal-openai\n```\n\n### Web Interface Features\n\nThe RLAMA web interface provides:\n- **Real-time validation** of the API key format\n- **Secure storage** with masked key display\n- **Integration examples** showing the exact CLI commands\n- **Model pricing table** with 2025 rates\n- **Usage guidance** for both default keys and named profiles\n\n### Benefits of Each Approach\n\n**Default API Keys:**\n- ✅ Simple setup: configure once, use everywhere\n- ✅ No need to remember profile names\n- ✅ Used automatically by all commands\n- ✅ Ideal if you use a single OpenAI account\n\n**Named Profiles:**\n- ✅ Management of multiple API keys\n- ✅ Project-specific configurations\n- ✅ Environment separation (dev/staging/prod)\n- ✅ Organization account switching\n- ✅ Audit trail with usage tracking\n\n### Example Workflows\n\n#### Simple Workflow (Default Keys)\n\n```bash\n# 1. Set a default API key in the web interface (one-time setup)\n# 2. Use RLAMA commands directly; no profiles needed\nrlama rag o3-mini my-docs ./docs\nrlama run my-docs  # Uses default key automatically\n```\n\n#### Advanced Workflow (Named Profiles)\n\n```bash\n# 1. 
Create profiles for different environments\nrlama profile add dev-openai openai \"sk-dev-key...\"\nrlama profile add prod-openai openai \"sk-prod-key...\"\n\n# 2. Create RAGs with specific profiles\nrlama rag o3-mini dev-docs ./dev-docs --profile dev-openai\nrlama rag gpt-4o prod-docs ./prod-docs --profile prod-openai\n\n# 3. Use RAGs with their associated profiles\nrlama run dev-docs   # Uses the profile associated with dev-docs\nrlama run prod-docs  # Uses the profile associated with prod-docs\n```\n\nDefault keys keep single-account setups simple, while named profiles handle multiple accounts, organizations, and environments.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdontizi%2Frlama","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdontizi%2Frlama","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdontizi%2Frlama/lists"}