{"id":31794426,"url":"https://github.com/kanutocd/prescient","last_synced_at":"2026-04-19T07:33:39.071Z","repository":{"id":308291554,"uuid":"1032326846","full_name":"kanutocd/prescient","owner":"kanutocd","description":"Unified interface for AI providers including Ollama (local), Anthropic Claude, OpenAI GPT, and HuggingFace models","archived":false,"fork":false,"pushed_at":"2025-08-05T18:45:37.000Z","size":449,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-10T19:45:45.046Z","etag":null,"topics":["ai","anthropic-api","huggingface","llm","ollama","openai-api","retrieval-augmented-generation","rubygem"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kanutocd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-05T06:36:21.000Z","updated_at":"2025-08-11T04:43:18.000Z","dependencies_parsed_at":"2025-08-05T07:18:39.710Z","dependency_job_id":null,"html_url":"https://github.com/kanutocd/prescient","commit_stats":null,"previous_names":["kanutocd/prescient"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/kanutocd/prescient","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kanutocd%2Fprescient","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kanutocd%2Fprescient/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kanutocd%2Fprescient/releases","manifests_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kanutocd%2Fprescient/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kanutocd","download_url":"https://codeload.github.com/kanutocd/prescient/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kanutocd%2Fprescient/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31998936,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T20:23:30.271Z","status":"online","status_checked_at":"2026-04-19T02:00:07.110Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","anthropic-api","huggingface","llm","ollama","openai-api","retrieval-augmented-generation","rubygem"],"created_at":"2025-10-10T19:45:35.861Z","updated_at":"2026-04-19T07:33:39.063Z","avatar_url":"https://github.com/kanutocd.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Prescient\n\nPrescient provides a unified interface for AI providers including Ollama (local), Anthropic Claude, OpenAI GPT, and HuggingFace models. 
Built for prescient applications that need AI predictions with provider switching, error handling, and fallback mechanisms.\n\n## Features\n\n- **Unified Interface**: Single API for multiple AI providers\n- **Local and Cloud Support**: Ollama for local/private deployments, cloud APIs for scale\n- **Embedding Generation**: Vector embeddings for semantic search and AI applications\n- **Text Completion**: Chat completions with context support\n- **Error Handling**: Robust error handling with automatic retries\n- **Health Monitoring**: Built-in health checks for all providers\n- **Flexible Configuration**: Environment variable and programmatic configuration\n\n## Supported Providers\n\n### Ollama (Local)\n\n- **Models**: Any Ollama-compatible model (llama3.1, nomic-embed-text, etc.)\n- **Capabilities**: Embeddings, Text Generation, Model Management\n- **Use Case**: Privacy-focused, local deployments\n\n### Anthropic Claude\n\n- **Models**: Claude 3 (Haiku, Sonnet, Opus)\n- **Capabilities**: Text Generation only (no embeddings)\n- **Use Case**: High-quality conversational AI\n\n### OpenAI\n\n- **Models**: GPT-3.5, GPT-4, text-embedding-3-small/large\n- **Capabilities**: Embeddings, Text Generation\n- **Use Case**: Proven performance, wide model selection\n\n### HuggingFace\n\n- **Models**: sentence-transformers, open-source chat models\n- **Capabilities**: Embeddings, Text Generation\n- **Use Case**: Open-source models, research\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'prescient'\n```\n\nAnd then execute:\n\n```bash\nbundle install\n```\n\nOr install it yourself as:\n\n```bash\ngem install prescient\n```\n\n## Configuration\n\n### Environment Variables\n\n```bash\n# Ollama (Local)\nOLLAMA_URL=http://localhost:11434\nOLLAMA_EMBEDDING_MODEL=nomic-embed-text\nOLLAMA_CHAT_MODEL=llama3.1:8b\n\n# Anthropic\nANTHROPIC_API_KEY=your_api_key\nANTHROPIC_MODEL=claude-3-haiku-20240307\n\n# 
OpenAI\nOPENAI_API_KEY=your_api_key\nOPENAI_EMBEDDING_MODEL=text-embedding-3-small\nOPENAI_CHAT_MODEL=gpt-3.5-turbo\n\n# HuggingFace\nHUGGINGFACE_API_KEY=your_api_key\nHUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2\nHUGGINGFACE_CHAT_MODEL=microsoft/DialoGPT-medium\n```\n\n### Programmatic Configuration\n\n```ruby\nrequire 'prescient'\n\n# Configure providers\nPrescient.configure do |config|\n  config.default_provider = :ollama\n  config.timeout = 60\n  config.retry_attempts = 3\n  config.retry_delay = 1.0\n\n  # Add custom Ollama configuration\n  config.add_provider(:ollama, Prescient::Ollama::Provider,\n    url: 'http://localhost:11434',\n    embedding_model: 'nomic-embed-text',\n    chat_model: 'llama3.1:8b'\n  )\n\n  # Add Anthropic\n  config.add_provider(:anthropic, Prescient::Anthropic::Provider,\n    api_key: ENV['ANTHROPIC_API_KEY'],\n    model: 'claude-3-haiku-20240307'\n  )\n\n  # Add OpenAI\n  config.add_provider(:openai, Prescient::OpenAI::Provider,\n    api_key: ENV['OPENAI_API_KEY'],\n    embedding_model: 'text-embedding-3-small',\n    chat_model: 'gpt-3.5-turbo'\n  )\nend\n```\n\n### Provider Fallback Configuration\n\nPrescient supports automatic fallback to backup providers when the primary provider fails. 
This ensures high availability for your AI applications.\n\n```ruby\nPrescient.configure do |config|\n  # Configure primary provider\n  config.add_provider(:primary, Prescient::Provider::OpenAI,\n    api_key: ENV['OPENAI_API_KEY'],\n    embedding_model: 'text-embedding-3-small',\n    chat_model: 'gpt-3.5-turbo'\n  )\n  \n  # Configure backup providers\n  config.add_provider(:backup1, Prescient::Provider::Anthropic,\n    api_key: ENV['ANTHROPIC_API_KEY'],\n    model: 'claude-3-haiku-20240307'\n  )\n  \n  config.add_provider(:backup2, Prescient::Provider::Ollama,\n    url: 'http://localhost:11434',\n    embedding_model: 'nomic-embed-text',\n    chat_model: 'llama3.1:8b'\n  )\n  \n  # Configure fallback order\n  config.fallback_providers = [:backup1, :backup2]\nend\n\n# Client with fallback enabled (default)\nclient = Prescient::Client.new(:primary, enable_fallback: true)\n\n# Client without fallback\nclient_no_fallback = Prescient::Client.new(:primary, enable_fallback: false)\n\n# Convenience methods also support fallback\nresponse = Prescient.generate_response(\"Hello\", provider: :primary, enable_fallback: true)\n```\n\n**Fallback Behavior:**\n- When a provider fails with a persistent error, Prescient automatically tries the next available provider\n- Only available (healthy) providers are tried during fallback\n- If no fallback providers are configured, all available providers are tried as fallbacks\n- Transient errors (rate limits, timeouts) still use retry logic before fallback\n- The fallback process preserves all method arguments and options\n\n## Usage\n\n### Quick Start\n\n```ruby\nrequire 'prescient'\n\n# Use default provider (Ollama)\nclient = Prescient.client\n\n# Generate embeddings\nembedding = client.generate_embedding(\"Your text here\")\n# =\u003e [0.1, 0.2, 0.3, ...] 
(768-dimensional vector)\n\n# Generate text responses\nresponse = client.generate_response(\"What is Ruby?\")\nputs response[:response]\n# =\u003e \"Ruby is a dynamic, open-source programming language...\"\n\n# Health check\nhealth = client.health_check\nputs health[:status] # =\u003e \"healthy\"\n```\n\n### Provider-Specific Usage\n\n```ruby\n# Use specific provider\nopenai_client = Prescient.client(:openai)\nanthropic_client = Prescient.client(:anthropic)\n\n# Direct method calls\nembedding = Prescient.generate_embedding(\"text\", provider: :openai)\nresponse = Prescient.generate_response(\"prompt\", provider: :anthropic)\n```\n\n### Context-Aware Generation\n\n```ruby\n# Generate embeddings for document chunks\ndocuments = [\"Document 1 content\", \"Document 2 content\"]\nembeddings = documents.map { |doc| Prescient.generate_embedding(doc) }\n\n# Later, find relevant context and generate response\nquery = \"What is mentioned about Ruby?\"\ncontext_items = find_relevant_documents(query, embeddings) # Your similarity search\n\nresponse = Prescient.generate_response(query, context_items,\n  max_tokens: 1000,\n  temperature: 0.7\n)\n\nputs response[:response]\nputs \"Model: \" + response[:model]\nputs \"Provider: \" + response[:provider]\n```\n\n### Error Handling\n\n```ruby\nbegin\n  response = client.generate_response(\"Your prompt\")\nrescue Prescient::ConnectionError =\u003e e\n  puts \"Connection failed: #{e.message}\"\nrescue Prescient::RateLimitError =\u003e e\n  puts \"Rate limited: #{e.message}\"\nrescue Prescient::AuthenticationError =\u003e e\n  puts \"Auth failed: #{e.message}\"\nrescue Prescient::Error =\u003e e\n  puts \"General error: #{e.message}\"\nend\n```\n\n### Health Monitoring\n\n```ruby\n# Check all providers\n[:ollama, :anthropic, :openai, :huggingface].each do |provider|\n  health = Prescient.health_check(provider: provider)\n  puts \"#{provider}: #{health[:status]}\"\n  puts \"Ready: #{health[:ready]}\" if health[:ready]\nend\n```\n\n## 
Custom Prompt Templates\n\nPrescient allows you to customize the AI assistant's behavior through configurable prompt templates:\n\n```ruby\nPrescient.configure do |config|\n  config.add_provider(:customer_service, Prescient::Provider::OpenAI,\n    api_key: ENV['OPENAI_API_KEY'],\n    embedding_model: 'text-embedding-3-small',\n    chat_model: 'gpt-3.5-turbo',\n    prompt_templates: {\n      system_prompt: 'You are a friendly customer service representative.',\n      no_context_template: \u003c\u003c~TEMPLATE.strip,\n        %{system_prompt}\n\n        Customer Question: %{query}\n\n        Please provide a helpful response.\n      TEMPLATE\n      with_context_template: \u003c\u003c~TEMPLATE.strip\n        %{system_prompt} Use the company info below to help answer.\n\n        Company Information:\n        %{context}\n\n        Customer Question: %{query}\n\n        Respond based on our company policies above.\n      TEMPLATE\n    }\n  )\nend\n\nclient = Prescient.client(:customer_service)\nresponse = client.generate_response(\"What's your return policy?\")\n```\n\n### Template Placeholders\n\n- `%{system_prompt}` - The system/role instruction\n- `%{query}` - The user's question\n- `%{context}` - Formatted context items (when provided)\n\n### Template Types\n\n- **system_prompt** - Defines the AI's role and behavior\n- **no_context_template** - Used when no context items are provided\n- **with_context_template** - Used when context items are provided\n\n### Examples by Use Case\n\n#### Technical Documentation\n\n```ruby\nprompt_templates: {\n  system_prompt: 'You are a technical documentation assistant. Provide detailed explanations with code examples.',\n  # ... templates\n}\n```\n\n#### Creative Writing\n\n```ruby\nprompt_templates: {\n  system_prompt: 'You are a creative writing assistant. Be imaginative and inspiring.',\n  # ... 
templates\n}\n```\n\nSee `examples/custom_prompts.rb` for complete examples.\n\n## Custom Context Configurations\n\nDefine how different data types should be formatted and which fields to use for embeddings:\n\n```ruby\nPrescient.configure do |config|\n  config.add_provider(:ecommerce, Prescient::Provider::OpenAI,\n    api_key: ENV['OPENAI_API_KEY'],\n    context_configs: {\n      'product' =\u003e {\n        fields: %w[name description price category brand],\n        format: '%{name} by %{brand}: %{description} - $%{price} (%{category})',\n        embedding_fields: %w[name description category brand]\n      },\n      'review' =\u003e {\n        fields: %w[product_name rating review_text reviewer_name],\n        format: '%{product_name} - %{rating}/5 stars: \"%{review_text}\"',\n        embedding_fields: %w[product_name review_text]\n      }\n    }\n  )\nend\n\n# Context items with explicit type\nproducts = [\n  {\n    'type' =\u003e 'product',\n    'name' =\u003e 'UltraBook Pro',\n    'description' =\u003e 'High-performance laptop',\n    'price' =\u003e '1299.99',\n    'category' =\u003e 'Laptops',\n    'brand' =\u003e 'TechCorp'\n  }\n]\n\nclient = Prescient.client(:ecommerce)\nresponse = client.generate_response(\"I need a laptop for work\", products)\n```\n\n### Context Configuration Options\n\n- **fields** - Array of field names available for this context type\n- **format** - Template string for displaying context items\n- **embedding_fields** - Specific fields to use when generating embeddings\n\n### Automatic Context Detection\n\nThe system automatically detects context types based on your configured field patterns:\n\n1. **Explicit Type Fields**: Uses `type`, `context_type`, or `model_type` field values\n2. **Field Matching**: Matches items to configured contexts based on field overlap (≥50% match required)\n3. 
**Default Fallback**: Uses generic formatting when no context configuration matches\n\nThe system has NO hardcoded context types - it's entirely driven by your configuration!\n\n### Without Context Configuration\n\nThe system works perfectly without any context configuration - it will:\n\n- Use intelligent fallback formatting for any hash structure\n- Extract text fields for embeddings while excluding common metadata (id, timestamps, etc.)\n- Provide consistent behavior across different data types\n\n```ruby\n# No context_configs needed - works with any data!\nclient = Prescient.client(:default)\nresponse = client.generate_response(\"Analyze this\", [\n  { 'title' =\u003e 'Issue', 'content' =\u003e 'Server down', 'created_at' =\u003e '2024-01-01' },\n  { 'name' =\u003e 'Alert', 'message' =\u003e 'High CPU usage', 'timestamp' =\u003e 1234567 }\n])\n```\n\nSee `examples/custom_contexts.rb` for complete examples.\n\n## Vector Database Integration (pgvector)\n\nPrescient integrates seamlessly with PostgreSQL's pgvector extension for storing and searching embeddings:\n\n### Setup with Docker\n\nThe included `docker-compose.yml` provides a complete setup with PostgreSQL + pgvector:\n\n```bash\n# Start PostgreSQL with pgvector\ndocker-compose up -d postgres\n\n# The database will automatically:\n# - Install pgvector extension\n# - Create tables for documents and embeddings\n# - Set up optimized vector indexes\n# - Insert sample data for testing\n```\n\n### Database Schema\n\nThe setup creates these key tables:\n\n- **`documents`** - Store original content and metadata\n- **`document_embeddings`** - Store vector embeddings for documents\n- **`document_chunks`** - Break large documents into searchable chunks\n- **`chunk_embeddings`** - Store embeddings for document chunks\n- **`search_queries`** - Track search queries and performance\n- **`query_results`** - Store search results for analysis\n\n### Vector Search Example\n\n```ruby\nrequire 'prescient'\nrequire 'pg'\n\n# 
Connect to database\ndb = PG.connect(\n  host: 'localhost',\n  port: 5432,\n  dbname: 'prescient_development',\n  user: 'prescient',\n  password: 'prescient_password'\n)\n\n# Generate embedding for a document\nclient = Prescient.client(:ollama)\ntext = \"Ruby is a dynamic programming language\"\nembedding = client.generate_embedding(text)\n\n# Store embedding in database (doc_id is the id of an existing row in documents)\nvector_str = \"[#{embedding.join(',')}]\"\ndb.exec_params(\n  \"INSERT INTO document_embeddings (document_id, embedding_provider, embedding_model, embedding_dimensions, embedding, embedding_text) VALUES ($1, $2, $3, $4, $5, $6)\",\n  [doc_id, 'ollama', 'nomic-embed-text', 768, vector_str, text]\n)\n\n# Perform similarity search\nquery_text = \"What is Ruby programming?\"\nquery_embedding = client.generate_embedding(query_text)\nquery_vector = \"[#{query_embedding.join(',')}]\"\n\nresults = db.exec_params(\n  \"SELECT d.title, d.content, de.embedding \u003c=\u003e $1::vector AS distance\n   FROM documents d\n   JOIN document_embeddings de ON d.id = de.document_id\n   ORDER BY de.embedding \u003c=\u003e $1::vector\n   LIMIT 5\",\n  [query_vector]\n)\n```\n\n### Distance Functions\n\npgvector supports three distance functions:\n\n- **Cosine Distance** (`\u003c=\u003e`): Best for normalized embeddings\n- **L2 Distance** (`\u003c-\u003e`): Euclidean distance, good general purpose\n- **Inner Product** (`\u003c#\u003e`): Dot product, useful for specific cases\n\n```sql\n-- Cosine distance (most common)\nORDER BY embedding \u003c=\u003e query_vector\n\n-- L2 distance\nORDER BY embedding \u003c-\u003e query_vector\n\n-- Inner product\nORDER BY embedding \u003c#\u003e query_vector\n```\n\n### Vector Indexes\n\nThe setup automatically creates HNSW indexes for fast similarity search:\n\n```sql\n-- Example index for cosine distance\nCREATE INDEX idx_embeddings_cosine\nON document_embeddings\nUSING hnsw (embedding vector_cosine_ops)\nWITH (m = 16, ef_construction = 64);\n```\n\n### Advanced Search with 
Filters\n\nCombine vector similarity with metadata filtering:\n\n```ruby\n# Search with tag filtering\nresults = db.exec_params(\n  \"SELECT d.title, de.embedding \u003c=\u003e $1::vector as distance\n   FROM documents d\n   JOIN document_embeddings de ON d.id = de.document_id\n   WHERE d.metadata-\u003e'tags' ? 'programming'\n   ORDER BY de.embedding \u003c=\u003e $1::vector\n   LIMIT 5\",\n  [query_vector]\n)\n\n# Search with difficulty and tag filters\nresults = db.exec_params(\n  \"SELECT d.title, de.embedding \u003c=\u003e $1::vector as distance\n   FROM documents d\n   JOIN document_embeddings de ON d.id = de.document_id\n   WHERE d.metadata-\u003e\u003e'difficulty' = 'beginner'\n     AND d.metadata-\u003e'tags' ?| $2::text[]\n   ORDER BY de.embedding \u003c=\u003e $1::vector\n   LIMIT 5\",\n  [query_vector, ['ruby', 'programming']]\n)\n```\n\n### Performance Optimization\n\n#### Index Configuration\n\nFor large datasets, tune HNSW parameters:\n\n```sql\n-- High accuracy (slower build, more memory)\nWITH (m = 32, ef_construction = 128)\n\n-- Fast build (lower accuracy, less memory)\nWITH (m = 8, ef_construction = 32)\n\n-- Balanced (recommended default)\nWITH (m = 16, ef_construction = 64)\n```\n\n#### Query Performance\n\n```sql\n-- Set ef_search for query-time accuracy/speed tradeoff\nSET hnsw.ef_search = 100;  -- Higher = more accurate, slower\n\n-- Use EXPLAIN ANALYZE to optimize queries\nEXPLAIN ANALYZE\nSELECT * FROM document_embeddings\nORDER BY embedding \u003c=\u003e '[0.1,0.2,...]'::vector\nLIMIT 10;\n```\n\n#### Chunking Strategy\n\nFor large documents, use chunking for better search granularity:\n\n```ruby\ndef chunk_document(text, chunk_size: 500, overlap: 50)\n  chunks = []\n  start = 0\n\n  while start \u003c text.length\n    end_pos = [start + chunk_size, text.length].min\n    chunk = text[start...end_pos]\n    chunks \u003c\u003c chunk\n    start += chunk_size - overlap\n  end\n\n  chunks\nend\n\n# Generate embeddings for each chunk\nchunks = 
chunk_document(document.content)\nchunks.each_with_index do |chunk, index|\n  embedding = client.generate_embedding(chunk)\n  # Store chunk and embedding...\nend\n```\n\n### Example Usage\n\nRun the complete vector search example:\n\n```bash\n# Start services\ndocker-compose up -d postgres ollama\n\n# Run example\nDB_HOST=localhost ruby examples/vector_search.rb\n```\n\nThe example demonstrates:\n\n- Document embedding generation and storage\n- Similarity search with different distance functions\n- Metadata filtering and advanced queries\n- Performance comparison between approaches\n\n## Advanced Usage\n\n### Custom Provider Implementation\n\n```ruby\nclass MyCustomProvider \u003c Prescient::BaseProvider\n  def generate_embedding(text, **options)\n    # Your implementation\n  end\n\n  def generate_response(prompt, context_items = [], **options)\n    # Your implementation\n  end\n\n  def health_check\n    # Your implementation\n  end\n\n  protected\n\n  def validate_configuration!\n    # Validate required options\n  end\nend\n\n# Register your provider\nPrescient.configure do |config|\n  config.add_provider(:mycustom, MyCustomProvider,\n    api_key: 'your_key',\n    model: 'your_model'\n  )\nend\n```\n\n### Provider Information\n\n```ruby\nclient = Prescient.client(:ollama)\ninfo = client.provider_info\n\nputs info[:name]      # =\u003e :ollama\nputs info[:class]     # =\u003e \"Prescient::Ollama::Provider\"\nputs info[:available] # =\u003e true\nputs info[:options]   # =\u003e { ... 
} (excluding sensitive data)\n```\n\n## Provider-Specific Features\n\n### Ollama\n\n- Model management: `pull_model`, `list_models`\n- Local deployment support\n- No API costs\n\n### Anthropic\n\n- High-quality responses\n- No embedding support (use with OpenAI/HuggingFace for embeddings)\n\n### OpenAI\n\n- Multiple embedding model sizes\n- Latest GPT models\n- Reliable performance\n\n### HuggingFace\n\n- Open-source models\n- Research-friendly\n- Free tier available\n\n## Docker Setup (Recommended for Ollama)\n\nThe easiest way to get started with Prescient and Ollama is using Docker Compose:\n\n### Hardware Requirements\n\nBefore starting, ensure your system meets the minimum requirements for running Ollama:\n\n#### **Minimum Requirements:**\n\n- **CPU**: 4+ cores (x86_64 or ARM64)\n- **RAM**: 8GB+ (16GB recommended)\n- **Storage**: 10GB+ free space for models\n- **OS**: Linux, macOS, or Windows with Docker\n\n#### **Model-Specific Requirements:**\n\n| Model              | RAM Required | Storage | Notes                             |\n| ------------------ | ------------ | ------- | --------------------------------- |\n| `nomic-embed-text` | 1GB          | 274MB   | Embedding model                   |\n| `llama3.1:8b`      | 8GB          | 4.7GB   | Chat model (8B parameters)        |\n| `llama3.1:70b`     | 64GB+        | 40GB    | Large chat model (70B parameters) |\n| `codellama:7b`     | 8GB          | 3.8GB   | Code generation model             |\n\n#### **Performance Recommendations:**\n\n- **SSD Storage**: Significantly faster model loading\n- **GPU (Optional)**: NVIDIA GPU with 8GB+ VRAM for acceleration\n- **Network**: Stable internet for initial model downloads\n- **Docker**: 4GB+ memory limit configured\n\n#### **GPU Acceleration (Optional):**\n\n- **NVIDIA GPU**: RTX 3060+ with 8GB+ VRAM recommended\n- **CUDA**: Version 11.8+ required\n- **Docker**: NVIDIA Container Toolkit installed\n- **Performance**: 3-10x faster inference with compatible 
models\n\n\u003e **💡 Tip**: Start with smaller models like `llama3.1:8b` and upgrade based on your hardware capabilities and performance needs.\n\n### Quick Start with Docker\n\n1. **Start Ollama service:**\n\n   ```bash\n   docker-compose up -d ollama\n   ```\n\n2. **Pull required models:**\n\n   ```bash\n   # Automatic setup\n   docker-compose up ollama-init\n\n   # Or manual setup\n   ./scripts/setup-ollama-models.sh\n   ```\n\n3. **Run examples:**\n\n   ```bash\n   # Set environment variable\n   export OLLAMA_URL=http://localhost:11434\n\n   # Run examples\n   ruby examples/custom_contexts.rb\n   ```\n\n### Docker Compose Services\n\nThe included `docker-compose.yml` provides:\n\n- **ollama**: Ollama AI service with persistent model storage\n- **ollama-init**: Automatically pulls required models on startup\n- **redis**: Optional caching layer for embeddings\n- **prescient-app**: Example Ruby application container\n\n### Configuration Options\n\n```yaml\n# docker-compose.yml environment variables\nservices:\n  ollama:\n    ports:\n      - \"11434:11434\" # Ollama API port\n    volumes:\n      - ollama_data:/root/.ollama # Persist models\n    environment:\n      - OLLAMA_HOST=0.0.0.0\n      - OLLAMA_ORIGINS=*\n```\n\n### GPU Support (Optional)\n\nFor GPU acceleration, uncomment the GPU configuration in `docker-compose.yml`:\n\n```yaml\nservices:\n  ollama:\n    deploy:\n      resources:\n        reservations:\n          devices:\n            - driver: nvidia\n              count: 1\n              capabilities: [gpu]\n```\n\n### Environment Variables\n\n```bash\n# Ollama Configuration\nOLLAMA_URL=http://localhost:11434\nOLLAMA_EMBEDDING_MODEL=nomic-embed-text\nOLLAMA_CHAT_MODEL=llama3.1:8b\n\n# Optional: Other AI providers\nOPENAI_API_KEY=your_key_here\nANTHROPIC_API_KEY=your_key_here\nHUGGINGFACE_API_KEY=your_key_here\n```\n\n### Model Management\n\n```bash\n# Check available models\ncurl http://localhost:11434/api/tags\n\n# Pull a specific model\ncurl -X POST 
http://localhost:11434/api/pull \\\n  -H \"Content-Type: application/json\" \\\n  -d '{ \"name\": \"llama3.1:8b\"}'\n\n# Health check\ncurl http://localhost:11434/api/version\n```\n\n### Production Deployment\n\nFor production use:\n\n1. Use specific image tags instead of `latest`\n2. Configure proper resource limits\n3. Set up monitoring and logging\n4. Use secrets management for API keys\n5. Configure backups for model data\n\n### Troubleshooting\n\n#### **Common Issues:**\n\n**Out of Memory Errors:**\n\n```bash\n# Check available memory\nfree -h\n\n# Increase Docker memory limit (Docker Desktop)\n# Settings \u003e Resources \u003e Memory: 8GB+\n\n# Use smaller models if hardware limited\nOLLAMA_CHAT_MODEL=llama3.2:3b ruby examples/custom_contexts.rb\n```\n\n**Slow Model Loading:**\n\n```bash\n# Check disk I/O\niostat -x 1\n\n# Move Docker data to SSD if on HDD\n# Docker Desktop: Settings \u003e Resources \u003e Disk image location\n```\n\n**Model Download Failures:**\n\n```bash\n# Check disk space\ndf -h\n\n# Manually pull models with retry\ndocker exec prescient-ollama ollama pull llama3.1:8b\n```\n\n**GPU Not Detected:**\n\n```bash\n# Check NVIDIA Docker runtime\ndocker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi\n\n# Install NVIDIA Container Toolkit if missing\n# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html\n```\n\n#### **Performance Monitoring:**\n\n```bash\n# Monitor resource usage\ndocker stats prescient-ollama\n\n# Check Ollama logs\ndocker logs prescient-ollama\n\n# Test API response time\ntime curl -X POST http://localhost:11434/api/generate \\\n  -H \"Content-Type: application/json\" \\\n  -d '{ \"model\": \"llama3.1:8b\", \"prompt\": \"Hello\", \"stream\": false}'\n```\n\n## Testing\n\nRun the test suite with RSpec:\n\n```bash\nbundle exec rspec\n```\n\n## Development\n\nAfter checking out the repo, run:\n\n```bash\nbundle install\n```\n\nTo install this gem onto your local 
machine:\n\n```bash\nbundle exec rake install\n```\n\n## Contributing\n\n1. Fork it\n2. Create your feature branch (`git checkout -b my-new-feature`)\n3. Commit your changes (`git commit -am 'Add some feature'`)\n4. Push to the branch (`git push origin my-new-feature`)\n5. Create a new Pull Request\n\n## License\n\nThe gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).\n\n## Roadmap\n\n### Version 0.2.0 (Planned)\n\n- **MariaDB Vector Support**: Integration with MariaDB using external vector databases\n- **Hybrid Database Architecture**: Support for MariaDB + Milvus/Qdrant combinations\n- **Vector Database Adapters**: Pluggable adapters for different vector storage backends\n- **Enhanced Chunking Strategies**: Smart document splitting with multiple algorithms\n- **Search Result Ranking**: Advanced scoring and re-ranking capabilities\n\n### Version 0.3.0 (Future)\n\n- **Streaming Responses**: Real-time response streaming for chat applications\n- **Multi-Model Ensembles**: Combine responses from multiple AI providers\n- **Advanced Analytics**: Search performance insights and usage analytics\n- **Cloud Provider Integration**: Direct support for Pinecone, Weaviate, etc.\n\n## Changelog\n\n### Version 0.1.0\n\n- Initial release\n- Support for Ollama, Anthropic, OpenAI, and HuggingFace\n- Unified interface for embeddings and text generation\n- Comprehensive error handling and retry logic\n- Health monitoring capabilities\n- PostgreSQL pgvector integration with complete Docker setup\n- Vector similarity search with multiple distance functions\n- Document chunking and metadata filtering\n- Performance optimization guides and troubleshooting\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkanutocd%2Fprescient","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkanutocd%2Fprescient","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkanutocd%2Fprescient/lists"}