{"id":31581018,"url":"https://github.com/roble3/claudette","last_synced_at":"2026-05-05T04:08:03.787Z","repository":{"id":315106014,"uuid":"1058146936","full_name":"RobLe3/claudette","owner":"RobLe3","description":"AI middleware plugin for Claude via MCP that routes across LLMs with fallback, caching, and RAG (graph \u0026 vector DB) integration, optimized for cost, latency, and reliability.","archived":false,"fork":false,"pushed_at":"2025-09-24T13:28:40.000Z","size":1447,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-24T15:27:01.438Z","etag":null,"topics":["claude","claude-code","claudette","graphdb","llm","mcp","mcp-server","middleware","prompt-engineering","prompting","rag","vectordb"],"latest_commit_sha":null,"homepage":"https://github.com/RobLe3/claudette","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RobLe3.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"docs/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":".github/SECURITY_CONFIGURATION.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-16T17:25:10.000Z","updated_at":"2025-09-24T13:28:36.000Z","dependencies_parsed_at":"2025-09-16T19:56:00.434Z","dependency_job_id":"ee208c07-0ef3-4d02-9009-10b62cdc5292","html_url":"https://github.com/RobLe3/claudette","commit_stats":null,"previous_names":["roble3/claudette"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/RobLe3/claudette","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobLe3%2Fclaudette","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobLe3%2Fclaudette/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobLe3%2Fclaudette/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobLe3%2Fclaudette/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RobLe3","download_url":"https://codeload.github.com/RobLe3/claudette/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobLe3%2Fclaudette/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278526242,"owners_count":26001325,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["claude","claude-code","claudette","graphdb","llm","mcp","mcp-server","middleware","prompt-engineering","prompting","rag","vectordb"],"created_at":"2025-10-05T21:52:07.128Z","updated_at":"2025-10-05T21:52:12.813Z","avatar_url":"https://github.com/RobLe3.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Claudette v1.0.5 - Maximize Your AI Investment 🧠\n\n🚀 **Smart AI Middleware That Saves Money While Preserving Quality**\n\n\u003e **v1.0.5**: Get more from your AI budget by intelligently routing requests across multiple providers. Reduce costs while maintaining the quality your users expect.\n\n![Version](https://img.shields.io/badge/version-1.0.5-blue)\n![License](https://img.shields.io/badge/license-MIT-green)\n![TypeScript](https://img.shields.io/badge/TypeScript-ready-blue)\n![Tested](https://img.shields.io/badge/Core_System-Verified-brightgreen)\n![Status](https://img.shields.io/badge/Status-Stable-brightgreen)\n\n---\n\n## 🎯 What is Claudette?\n\nClaudette is an **AI middleware platform** that helps you **maximize your AI investment** while maintaining quality. Instead of being locked into expensive single-provider solutions, Claudette intelligently routes your requests across multiple AI backends to deliver the best value.\n\n## 💼 What Claudette Helps You With\n\n### 🏢 **For Businesses**\n- **Reduce AI costs** by automatically choosing cost-effective backends for routine tasks\n- **Extend subscription value** - get significantly more AI interactions for the same budget\n- **Avoid vendor lock-in** with support for multiple AI providers\n- **Scale confidently** with built-in failover and health monitoring\n- **Track spending** with real-time cost monitoring and budget controls\n\n### 👨‍💻 **For Developers**\n- **Build AI features faster** with a unified API across multiple providers\n- **Prevent outages** with automatic failover between AI services\n- **Optimize performance** with intelligent caching and routing\n- **Debug easily** with comprehensive logging and monitoring\n- **Deploy reliably** with production-tested infrastructure\n\n### 🎓 **For Teams \u0026 Projects**\n- **Make AI budgets last longer** by optimizing every request\n- **Ensure consistent quality** while reducing costs\n- **Simplify AI integration** with one interface for multiple providers\n- **Stay operational** even when one AI service has issues\n- **Scale usage** without proportional cost increases\n\n### 🌟 **Real-World Use Cases**\n- **Content teams**: Draft with cost-effective models, polish with premium ones - save budget for creative review\n- **Development teams**: Route code questions intelligently - simple syntax to fast models, architecture to specialized ones  \n- **Customer support**: Handle routine inquiries efficiently while ensuring complex issues get premium treatment\n- **Research projects**: Optimize between speed and quality based on whether it's exploration or final analysis\n- **Startups**: Access multiple AI capabilities without multiple expensive subscriptions\n\n### 🏆 Key Features\n- **🔄 Smart Routing** - Automatic selection between OpenAI, Claude, Qwen, and Ollama based on your needs\n- **💰 Cost Intelligence** - Real-time optimization to maximize your AI budget\n- **💸 Low-Cost Providers** - Access to [80-95% cheaper alternatives](#-low-cost-token-providers--inference-services) like Alibaba Cloud, DeepSeek, and free local models\n- **📊 Transparency** - Track performance, costs, and quality across all providers\n- **🏗️ Developer Ready** - Full TypeScript support with modern tooling\n- **⚡ Performance** - Intelligent caching and optimized request handling\n- **🛡️ Reliability** - Circuit breakers and graceful failure recovery\n\n## 🚀 **What's New in v1.0.5** - Advanced Memory Management \u0026 Ultra-Fast MCP\n\n### 🧠 **Advanced Memory Management System** \n\u003e **Reduce memory pressure from 95% to 75-85%** - Handle complex tasks without crashes\n\n```javascript\n// Automatic memory optimization for complex tasks\nconst claudette = new Claudette({\n  memory: {\n    advancedManagement: true,    // NEW: Advanced memory pool management\n    pressureOptimization: true,  // NEW: Pressure-based scaling\n    emergencyCleanup: true,      // NEW: 75% cache reduction when needed\n    complexTaskPrep: true       // NEW: Automatic memory prep for complex tasks\n  }\n});\n\n// System automatically optimizes memory before complex operations\nconst response = await claudette.optimize({\n  prompt: \"Analyze this 50-page document and create detailed recommendations...\",\n  // Memory system automatically prepares resources\n  // Reduces memory pressure from 94% → 75% before execution\n});\n```\n\n### ⚡ **Ultra-Fast MCP Server** - 99.1% Startup Improvement\n\u003e **From 30 seconds to 264ms startup** - Perfect for Claude Code integration\n\n```bash\n# Previous MCP startup: 30,000ms (30 seconds timeout)\n# NEW Fast MCP startup: 264ms (sub-second!)\n\n# Start the ultra-fast MCP server\nnode claudette-mcp-server-fast.js\n\n# Benchmark all interfaces\nnode benchmark-all.js\n\n# Performance Results:\n# 🏆 MCP Server: 264ms startup (FASTEST)\n# 🏆 MCP Requests: 896ms average (FASTEST) \n# 🏆 MCP Memory: 0.39MB growth (MOST EFFICIENT)\n```\n\n### 📊 **Comprehensive Benchmarking Suite**\n\u003e **Performance validation across all interfaces** - Native, HTTP API, and MCP\n\n```bash\n# Test individual interfaces\n./benchmark-native.js   # Test direct library usage\n./benchmark-api.js      # Test HTTP REST API \n./benchmark-mcp.js      # Test MCP server performance\n\n# Compare all interfaces\n./benchmark-all.js      # Comprehensive comparison\n\n# Results Summary:\n# - Native: Best for single-process applications\n# - HTTP API: Best for web services and REST integration\n# - MCP: Best for Claude Code integration (now fastest!)\n```\n\n### 🔄 **Harmonized Timeout System**\n\u003e **Eliminate timeout conflicts** - Intelligent retry with circuit breakers\n\n```javascript\n// NEW: Unified timeout configuration\nconst config = {\n  timeouts: {\n    startup: 5000,        // 5s optimized startup\n    query: 60000,         // 60s Claude Code compatible\n    health: 3000,         // 3s health checks\n    emergency: 90000      // 90s for complex tasks\n  },\n  retry: {\n    maxAttempts: 3,       // Intelligent retry logic\n    backoffMultiplier: 1.5, // Adaptive backoff\n    circuitBreaker: true   // Prevent cascade failures\n  }\n};\n```\n\n### 🎯 **Performance Improvements Summary**\n| Component | v1.0.4 | v1.0.5 | Improvement |\n|-----------|--------|--------|-------------|\n| **MCP Startup** | 30,000ms | 264ms | **113.6x faster** |\n| **Memory Pressure** | 95% critical | 75-85% managed | **Memory crashes eliminated** |\n| **Environment Loading** | 3,888ms | \u003c100ms | **38x faster** |\n| **Complex Task Handling** | Manual management | Automatic optimization | **Zero-config scaling** |\n| **Timeout Reliability** | 62.5% success | 95%+ success | **52% reliability improvement** |\n\n---\n\n## 📚 Table of Contents\n\n- [🚀 Quick Start](#-quick-start)\n- [💰 Claude Subscription Optimization Guide](#-claude-subscription-optimization-guide)\n- [💸 Low-Cost Token Providers \u0026 Inference Services](#-low-cost-token-providers--inference-services)\n- [🔧 API Usage](#-api-usage)\n- [📖 Documentation](#-documentation)\n- [🤝 Contributing](#-contributing)\n- [🐛 Support \u0026 Issues](#-support--issues)\n\n---\n\n## 🚀 Quick Start\n\n### ⚡ See the Value in 2 Minutes\n\n```bash\n# Install Claudette\nnpm install -g claudette\nclaudette init\n\n# Make your first optimized request\nclaudette \"Explain machine learning\" --verbose\n\n# See the cost savings and backend selection in action\n# Claudette automatically chose the most cost-effective backend\n# while maintaining quality standards\n```\n\n### 💡 **What Just Happened?**\nInstead of paying premium rates for a simple explanation, Claudette:\n1. **Analyzed your request** - determined it was educational content\n2. **Selected the optimal backend** - chose a cost-effective model that excels at explanations  \n3. **Delivered quality results** - maintained high response quality while reducing costs\n4. **Showed transparency** - displayed which backend was used and the actual cost\n\n### 📦 Installation Options\n\n```bash\n# Option 1: NPM Installation (Recommended)\nnpm install -g claudette\nclaudette init\n\n# Option 2: Source Installation\ngit clone https://github.com/RobLe3/claudette.git\ncd claudette\nnpm install \u0026\u0026 npm run build\n```\n\n### 🔧 Configuration\n\n1. **Copy environment template**:\n   ```bash\n   cp .env.example .env\n   ```\n\n2. **Configure your credentials**:\n   ```bash\n   # Required: OpenAI API Key\n   OPENAI_API_KEY=sk-your-openai-api-key-here\n   \n   # Optional: Alternative Backend\n   ALTERNATIVE_API_URL=https://your-custom-backend.com\n   ALTERNATIVE_API_KEY=your_api_key_here\n   ```\n\n3. **Verify installation**:\n   ```bash\n   claudette --version    # Should output: 1.0.5\n   claudette status       # Check system status\n   ```\n\n### 📋 Requirements\n- **Node.js**: v18.0.0 or higher  \n- **npm**: Latest version recommended\n- **API Keys**: At least one AI provider API key (OpenAI, Anthropic, etc.)\n- **Operating System**: Linux, macOS, Windows\n\n---\n\n## 💡 Why Use Claudette? \n\n### Without Claudette\n```javascript\n// Locked into one expensive provider\nconst response = await openai.chat.completions.create({\n  model: \"gpt-4\",\n  messages: [{ role: \"user\", content: \"Simple question\" }]\n});\n// Cost: $0.03 per request, no failover, single provider dependency\n```\n\n### With Claudette\n```javascript\n// Intelligent routing across multiple providers\nconst response = await claudette.optimize(\"Simple question\");\n// Cost: $0.002 per request, automatic failover, best provider for each task\n```\n\n**Result**: Up to 95% cost reduction while maintaining quality and reliability.\n\n---\n\n## 💰 Claude Subscription Optimization Guide\n\n\u003e **Maximize your Claude Pro investment** - From $20/month to enterprise-scale efficiency\n\n### 🎯 **Claude Pro ($20/month) - 5x More Value**\n\nWith just a **Claude Pro subscription**, Claudette transforms your $20/month into powerful AI capabilities:\n\n**Without Claudette:**\n- ~500-1000 Claude Sonnet interactions/month\n- Single provider dependency\n- No cost optimization\n- Manual quality vs. cost decisions\n\n**With Claudette + Claude Pro:**\n```javascript\n// Smart routing maximizes your Claude Pro usage\nconst config = {\n  claude: { \n    enabled: true, \n    priority: 1,        // Premium quality for important tasks\n    model: \"claude-3-sonnet-20240229\" \n  },\n  qwen: { \n    enabled: true, \n    priority: 2,        // Cost-effective for routine tasks\n    cost_per_token: 0.0001 // 3x cheaper than Claude\n  }\n};\n\n// Claudette automatically optimizes:\n// - Complex analysis → Claude (premium quality)\n// - Simple questions → Qwen (cost-effective)\n// - Code explanations → Mixed routing based on complexity\n```\n\n**Result**: **2,500+ effective interactions/month** for the same $20 budget!\n\n### 🚀 **Scaling with Additional APIs**\n\n#### **Max100 Tier ($100/month equivalent)**\nAdd **OpenAI GPT-4o** and **local Ollama** for comprehensive coverage:\n\n```javascript\nconst enterpriseConfig = {\n  claude: { \n    enabled: true, \n    priority: 1,        // Creative writing, complex analysis\n    cost_per_token: 0.0003 \n  },\n  openai: { \n    enabled: true, \n    priority: 2,        // Code generation, technical docs\n    model: \"gpt-4o-mini\",\n    cost_per_token: 0.0001 \n  },\n  qwen: { \n    enabled: true, \n    priority: 3,        // Research, summarization\n    cost_per_token: 0.0001 \n  },\n  ollama: { \n    enabled: true, \n    priority: 4,        // Development, testing (FREE!)\n    cost_per_token: 0,\n    base_url: \"http://localhost:11434\"\n  }\n};\n```\n\n**Smart Routing Strategy:**\n- **Creative Content** → Claude Sonnet (premium quality)\n- **Code Generation** → GPT-4o (excellent for programming)\n- **Research \u0026 Analysis** → Qwen Plus (cost-effective, high quality)\n- **Development \u0026 Testing** → Ollama (free, local, private)\n\n**Economics:**\n- $20 Claude Pro + $20 OpenAI + $0 Ollama = $40/month\n- **10,000+ interactions/month** with intelligent quality optimization\n- **75% cost reduction** vs. using Claude Pro exclusively\n\n#### **Max200 Enterprise Tier ($200/month equivalent)**\nAdd **premium models** and **specialized backends**:\n\n```javascript\nconst maxConfig = {\n  claude: { \n    enabled: true,\n    model: \"claude-3-opus-20240229\",  // Premium model for critical tasks\n    priority: 1 \n  },\n  openai: { \n    enabled: true,\n    model: \"gpt-4\",                   // Full GPT-4 for complex reasoning\n    priority: 2 \n  },\n  \"claude-sonnet\": {\n    enabled: true,\n    model: \"claude-3-sonnet-20240229\", // Mid-tier for balanced tasks\n    priority: 3\n  },\n  qwen: { \n    enabled: true,\n    model: \"qwen-max\",                // Premium Qwen for specialized tasks\n    priority: 4 \n  },\n  mistral: {\n    enabled: true,\n    model: \"mistral-large\",           // European data compliance\n    priority: 5\n  },\n  ollama: { \n    enabled: true,\n    model: \"codellama:34b\",           // High-capacity local model\n    priority: 6,\n    cost_per_token: 0\n  }\n};\n```\n\n**Use Case Optimization:**\n```javascript\n// Automatic task classification and routing\nconst examples = [\n  {\n    task: \"Write marketing copy for product launch\",\n    routed_to: \"claude-opus\",     // Premium creativity\n    cost: \"$0.015 per request\"\n  },\n  {\n    task: \"Generate unit tests for React component\", \n    routed_to: \"gpt-4\",          // Excellent code understanding\n    cost: \"$0.006 per request\"\n  },\n  {\n    task: \"Summarize research papers\",\n    routed_to: \"qwen-max\",       // Cost-effective, high accuracy\n    cost: \"$0.002 per request\"\n  },\n  {\n    task: \"Code refactoring during development\",\n    routed_to: \"ollama\",         // Free, private, fast iteration\n    cost: \"$0.000 per request\"\n  }\n];\n```\n\n**Enterprise Benefits:**\n- **25,000+ interactions/month** across all quality tiers\n- **Specialized routing** for different content types\n- **Geographic compliance** (EU data with Mistral)\n- **Private development** (local Ollama)\n- **Cost transparency** and budget controls\n\n### 📊 **ROI Comparison Table**\n\n| Setup | Monthly Cost | Interactions | Cost/Interaction | Quality Mix |\n|-------|-------------|--------------|------------------|-------------|\n| **Claude Pro Only** | $20 | 1,000 | $0.020 | High (Claude only) |\n| **Claudette + Claude Pro** | $20 | 2,500 | $0.008 | High/Med (Smart routing) |\n| **Max100 (Multi-API)** | $40 | 10,000 | $0.004 | Premium/High/Med |\n| **Max200 (Enterprise)** | $200 | 25,000 | $0.008 | All tiers optimized |\n\n### 🎯 **Smart Routing Examples**\n\n#### **Content Creation Workflow**\n```javascript\n// Blog post creation - optimized routing\nconst workflow = [\n  {\n    step: \"Research and outline\",\n    prompt: \"Research trends in AI development\",\n    routed_to: \"qwen\",           // Cost-effective research\n    cost: \"$0.002\"\n  },\n  {\n    step: \"Draft creation\", \n    prompt: \"Write engaging blog post from outline\",\n    routed_to: \"claude-sonnet\",  // Balanced quality/cost\n    cost: \"$0.008\"\n  },\n  {\n    step: \"Final polish\",\n    prompt: \"Enhance tone and add compelling examples\",\n    routed_to: \"claude-opus\",    // Premium quality finish\n    cost: \"$0.015\"\n  }\n];\n\n// Total cost: $0.025 vs $0.045 using Claude exclusively\n// Savings: 44% while maintaining premium final quality\n```\n\n#### **Development Workflow** \n```javascript\n// Software development - mixed routing\nconst devWorkflow = [\n  {\n    step: \"Code iteration\",\n    routed_to: \"ollama\",         // Free local development\n    cost: \"$0.000\",\n    use: \"Rapid prototyping, testing ideas\"\n  },\n  {\n    step: \"Code review\",\n    routed_to: \"gpt-4\",          // Excellent code analysis\n    cost: \"$0.006\",\n    use: \"Security review, best practices\"\n  },\n  {\n    step: \"Documentation\",\n    routed_to: \"claude-sonnet\",  // Clear technical writing\n    cost: \"$0.008\",\n    use: \"API docs, user guides\"\n  }\n];\n```\n\n### 🔄 **Migration Strategy**\n\n#### **Phase 1: Start with Claude Pro**\n```bash\n# Week 1-2: Basic optimization\nclaudette init --quick\n# Configure Claude Pro + free Qwen API\n# Immediate 2-3x interaction increase\n```\n\n#### **Phase 2: Add Strategic APIs**\n```bash\n# Week 3-4: Add OpenAI for code tasks\n# Add local Ollama for development\n# 5-8x effective capacity\n```\n\n#### **Phase 3: Enterprise Optimization**\n```bash\n# Month 2+: Full backend suite\n# Specialized routing rules\n# 10-20x capacity with quality optimization\n```\n\n### 💡 **Pro Tips for Maximum Efficiency**\n\n1. **Cache Strategy**: \n   ```javascript\n   // 40% of requests hit cache = 40% cost reduction\n   const config = { \n     caching: true, \n     cache_ttl: 3600  // 1 hour cache\n   };\n   ```\n\n2. **Quality Tiering**:\n   ```javascript\n   // Route by complexity automatically\n   const rules = {\n     simple_questions: \"qwen\",      // 70% of requests\n     complex_analysis: \"claude\",    // 20% of requests  \n     creative_content: \"claude-opus\" // 10% of requests\n   };\n   ```\n\n3. **Development vs Production**:\n   ```javascript\n   // Free development, optimized production\n   const environment = process.env.NODE_ENV;\n   const backend = environment === 'development' ? 'ollama' : 'claude';\n   ```\n\n### 🎯 **Bottom Line Value Proposition**\n\n- **$20 Claude Pro** → **2,500 interactions** (vs 1,000 direct)\n- **$40 Multi-API** → **10,000 interactions** with quality routing\n- **$200 Enterprise** → **25,000 interactions** with premium options\n\n**Claudette pays for itself** with the first month's optimization! 🚀\n\n---\n\n## 💸 Low-Cost Token Providers \u0026 Inference Services\n\n\u003e **Slash your AI costs by 80-95%** - Access premium AI capabilities through budget-friendly providers\n\n### 🏭 **Enterprise-Grade Low-Cost Providers**\n\n#### **Alibaba Cloud (Qwen) - 90% Cost Reduction**\n```javascript\n// Qwen through Alibaba Cloud DashScope\nconst config = {\n  qwen: {\n    enabled: true,\n    base_url: \"https://dashscope-intl.aliyuncs.com/compatible-mode/v1\",\n    api_key: process.env.QWEN_API_KEY,\n    model: \"qwen-plus\",\n    cost_per_token: 0.0001,  // ~90% cheaper than Claude\n    priority: 2\n  }\n};\n\n// Get API access:\n// 1. Sign up at https://dashscope.console.aliyun.com/\n// 2. Activate DashScope service\n// 3. Get API key from console\n// 4. $3 free credits + pay-per-use pricing\n```\n\n**Qwen Pricing (Alibaba Cloud):**\n- **Qwen-Plus**: ¥0.0008/1K tokens (~$0.0001) - **20x cheaper than Claude**\n- **Qwen-Max**: ¥0.02/1K tokens (~$0.003) - **10x cheaper than GPT-4**\n- **Qwen-Turbo**: ¥0.0003/1K tokens (~$0.00004) - **75x cheaper than Claude**\n\n#### **DeepSeek - Extremely Low Cost**\n```javascript\nconst config = {\n  deepseek: {\n    enabled: true,\n    base_url: \"https://api.deepseek.com/v1\",\n    api_key: process.env.DEEPSEEK_API_KEY,\n    model: \"deepseek-chat\",\n    cost_per_token: 0.00002,  // 95% cheaper than premium models\n    priority: 3\n  }\n};\n\n// Get access at: https://platform.deepseek.com/\n// $5 free credits, then $0.14/1M input tokens\n```\n\n#### **Together AI - High Performance, Low Cost**\n```javascript\nconst config = {\n  together: {\n    enabled: true,\n    base_url: \"https://api.together.xyz/v1\",\n    api_key: process.env.TOGETHER_API_KEY,\n    model: \"meta-llama/Llama-2-70b-chat-hf\",\n    cost_per_token: 0.0002,  // 85% cheaper than Claude\n    priority: 4\n  }\n};\n\n// Access: https://api.together.xyz/\n// Multiple open-source models, competitive pricing\n```\n\n#### **Groq - Ultra-Fast Inference**\n```javascript\nconst config = {\n  groq: {\n    enabled: true,\n    base_url: \"https://api.groq.com/openai/v1\",\n    api_key: process.env.GROQ_API_KEY,\n    model: \"mixtral-8x7b-32768\",\n    cost_per_token: 0.00027,  // 80% cheaper + 10x faster\n    priority: 5\n  }\n};\n\n// Get free tier: https://console.groq.com/\n// 100 requests/day free, then $0.27/1M tokens\n```\n\n### 🏠 **Self-Hosted Solutions (FREE)**\n\n#### **Ollama - Completely Free Local Inference**\n```javascript\nconst config = {\n  ollama: {\n    enabled: true,\n    base_url: \"http://localhost:11434\",\n    model: \"llama2:70b\",\n    cost_per_token: 0,  // 100% FREE!\n    priority: 6\n  }\n};\n\n// Setup Ollama locally:\n// 1. Install: curl -fsSL https://ollama.ai/install.sh | sh\n// 2. Run: ollama run llama2:70b\n// 3. Free unlimited usage on your hardware\n```\n\n**Recommended Ollama Models:**\n- **CodeLlama:34b** - Excellent for code generation (FREE)\n- **Mistral:7b** - Fast general purpose (FREE) \n- **Llama2:70b** - High quality responses (FREE)\n- **Neural-Chat:7b** - Conversational AI (FREE)\n\n#### **LocalAI - Self-Hosted OpenAI Alternative**\n```bash\n# Docker setup for free local inference\ndocker run -p 8080:8080 -v $PWD/models:/models -ti localai/localai:latest\n\n# Configure in Claudette:\nconst config = {\n  localai: {\n    enabled: true,\n    base_url: \"http://localhost:8080/v1\",\n    model: \"gpt-3.5-turbo\",  // LocalAI model name\n    cost_per_token: 0,  // FREE!\n    priority: 7\n  }\n};\n```\n\n### 📊 **Cost Comparison Table**\n\n| Provider | Model | Cost/1M Tokens | vs Claude Pro | Quality | Speed |\n|----------|-------|----------------|---------------|---------|-------|\n| **Claude Pro** | claude-3-sonnet | $3.00 | Baseline | Excellent | Fast |\n| **Qwen Plus** | qwen-plus | $0.10 | **30x cheaper** | Excellent | Fast |\n| **DeepSeek** | deepseek-chat | $0.14 | **21x cheaper** | Very Good | Fast |\n| **Groq Mixtral** | mixtral-8x7b | $0.27 | **11x cheaper** | Very Good | **Ultra Fast** |\n| **Together AI** | llama-2-70b | $0.20 | **15x cheaper** | Very Good | Fast |\n| **Ollama** | llama2:70b | **$0.00** | **∞ cheaper** | Good | Medium |\n| **LocalAI** | Various | **$0.00** | **∞ cheaper** | Varies | Medium |\n\n### 🎯 **Smart Cost Optimization Strategy**\n\n#### **Tier 1: Ultra-Budget Setup ($0-5/month)**\n```javascript\nconst budgetConfig = {\n  // Free tier: 80% of requests\n  ollama: { \n    enabled: true, \n    priority: 1,\n    cost_per_token: 0,\n    use_cases: [\"development\", \"testing\", \"simple_queries\"]\n  },\n  \n  // Low-cost tier: 15% of requests  \n  qwen: { \n    enabled: true, \n    priority: 2,\n    cost_per_token: 0.0001,\n    use_cases: [\"research\", \"analysis\", \"content_generation\"]\n  },\n  \n  // Premium tier: 5% of requests\n  claude: { \n    enabled: true, \n    priority: 3,\n    cost_per_token: 0.003,\n    use_cases: [\"critical_decisions\", \"final_review\", \"complex_reasoning\"]\n  }\n};\n\n// Result: 10,000+ interactions for $5/month\n// vs 500 interactions with Claude Pro alone\n```\n\n#### **Tier 2: Performance Setup ($10-20/month)**\n```javascript\nconst performanceConfig = {\n  // Speed layer: 40% of requests\n  groq: { \n    enabled: true, \n    priority: 1,\n    cost_per_token: 0.00027,\n    use_cases: [\"real_time_chat\", \"quick_responses\"]\n  },\n  \n  // Quality layer: 40% of requests\n  qwen: { \n    enabled: true, \n    priority: 2,\n    cost_per_token: 0.0001,\n    use_cases: [\"content_creation\", \"analysis\"]\n  },\n  \n  // Premium layer: 20% of requests\n  claude: { \n    enabled: true, \n    priority: 3,\n    cost_per_token: 0.003,\n    use_cases: [\"complex_tasks\", \"critical_content\"]\n  }\n};\n\n// Result: 25,000+ interactions for $20/month\n// Premium quality with ultra-fast responses\n```\n\n### 🔧 **Easy Setup Guide**\n\n#### **1. Qwen (Alibaba Cloud) Setup**\n```bash\n# Step 1: Get free Alibaba Cloud account\n# Visit: https://www.alibabacloud.com/\n# Sign up with email (no credit card required for trial)\n\n# Step 2: Activate DashScope\n# Go to: https://dashscope.console.aliyun.com/\n# Click \"Activate Service\" (free tier included)\n\n# Step 3: Get API Key\n# Dashboard → API Keys → Create New Key\n# Copy the API key\n\n# Step 4: Configure Claudette\nexport QWEN_API_KEY=\"sk-your-qwen-key-here\"\nclaudette setup-credentials\n```\n\n#### **2. DeepSeek Setup**\n```bash\n# Step 1: Register at https://platform.deepseek.com/\n# $5 free credits, no credit card required\n\n# Step 2: Generate API key\n# API Keys → Create New Key\n\n# Step 3: Add to Claudette\nexport DEEPSEEK_API_KEY=\"sk-your-deepseek-key\"\n```\n\n#### **3. Ollama Local Setup**\n```bash\n# Install Ollama (one-time setup)\ncurl -fsSL https://ollama.ai/install.sh | sh\n\n# Download a model (8GB+ RAM recommended)\nollama run llama2:7b    # Smaller model for testing\nollama run llama2:70b   # Larger model for production\n\n# Verify it works\ncurl http://localhost:11434/api/generate -d '{\n  \"model\": \"llama2\",\n  \"prompt\": \"Hello world\"\n}'\n```\n\n### 💡 **Advanced Cost Optimization Tips**\n\n#### **Geographic Arbitrage**\n```javascript\n// Use region-specific pricing\nconst asiaConfig = {\n  qwen: {\n    base_url: \"https://dashscope-ap-southeast-1.aliyuncs.com/compatible-mode/v1\",\n    // Often 20-30% cheaper in Asia Pacific regions\n  }\n};\n```\n\n#### **Batch Processing for Volume Discounts**\n```javascript\n// Process multiple requests together\nconst batchResults = await claudette.optimizeBatch([\n  { prompt: \"Question 1\" },\n  { prompt: \"Question 2\" },\n  { prompt: \"Question 3\" }\n], {\n  backend: \"qwen\",  // Use lowest cost backend for batches\n  batch_size: 10    // Optimize for volume pricing\n});\n```\n\n#### **Smart Caching for 50% Cost Reduction**\n```javascript\nconst cacheConfig = {\n  features: {\n    caching: true,\n    cache_ttl: 86400,  // 24 hour cache\n    intelligent_cache: true  // Cache similar queries\n  }\n};\n\n// Typical 40-60% cache hit rate = 40-60% cost savings\n```\n\n### 🎯 **Real-World Savings Examples**\n\n#### **Content Creator Workflow**\n```javascript\n// Before: Claude Pro only\n// Cost: $50/month for 1,000 articles\n// After: Smart routing\nconst contentWorkflow = {\n  research: \"qwen\",        // $2/month for research\n  draft: \"deepseek\",       // $3/month for drafts  \n  polish: \"claude\",        // $10/month for final polish\n  // Total: $15/month for same 1,000 articles\n  // Savings: 70% ($35/month)\n};\n```\n\n#### **Development Team**\n```javascript\n// Before: GPT-4 for everything\n// Cost: $200/month for team\n// After: Tiered approach\nconst devWorkflow = {\n  code_review: \"groq\",       // Ultra-fast, $5/month\n  documentation: \"qwen\",     // High quality, $8/month\n  architecture: \"claude\",    // Complex reasoning, $15/month\n  prototyping: \"ollama\",     // Free local development\n  // Total: $28/month vs $200/month\n  // Savings: 86% ($172/month)\n};\n```\n\n### 🏆 **Best Practices for Maximum Savings**\n\n1. **Start Free**: Begin with Ollama for development and testing\n2. **Graduate Smart**: Move to Qwen for production workloads\n3. **Premium Sparingly**: Use Claude/GPT-4 only for critical tasks\n4. **Cache Aggressively**: Enable caching for 40-60% cost reduction\n5. **Monitor Usage**: Track costs and optimize routing rules\n6. **Batch Processing**: Group similar requests for volume discounts\n\n---\n\n## 🔧 API Usage\n\n### Basic Backend Routing\n```javascript\nimport { Claudette } from 'claudette';\n\nconst claudette = new Claudette({\n  openai: { apiKey: process.env.OPENAI_API_KEY },\n  claude: { apiKey: process.env.ANTHROPIC_API_KEY }\n});\n\n// Automatic backend selection\nconst response = await claudette.optimize({\n  prompt: \"Explain quantum computing\",\n  max_tokens: 500\n});\n\nconsole.log(response.content);\nconsole.log(`Backend used: ${response.backend_used}`);\nconsole.log(`Cost: €${response.cost_eur}`);\n```\n\n### System Status\n```javascript\n// Check system status\nconst status = await claudette.getStatus();\nconsole.log(`System Health: ${status.healthy ? 'Healthy' : 'Unhealthy'}`);\nconsole.log(`Version: ${status.version}`);\nconsole.log(`Cache Hit Rate: ${status.cache.hit_rate}`);\n```\n\n---\n\n## 📖 Documentation\n\n### Core Documentation\n- **[API Reference](docs/API.md)** - Complete API documentation\n- **[Configuration Guide](docs/ENVIRONMENT_SETUP.md)** - Setup and configuration\n- **[Architecture Overview](docs/ARCHITECTURE.md)** - System design and components\n\n### Cost Optimization Guides\n- **[Claude Subscription Optimization](#-claude-subscription-optimization-guide)** - Maximize Claude Pro value\n- **[Low-Cost Token Providers](#-low-cost-token-providers--inference-services)** - 80-95% cost reduction strategies\n- **[Smart Routing Configuration](#-smart-cost-optimization-strategy)** - Tiered backend setup\n\n### Development Resources  \n- **[Configuration Examples](config/)** - Sample configurations\n- **[TypeScript Types](src/types/)** - Type definitions\n- **[Testing](tests/)** - Test examples and utilities\n\n---\n\n## 🤝 Contributing\n\n1. **Fork the repository**\n2. **Create a feature branch**: `git checkout -b feature/amazing-feature`\n3. **Test your changes**: `npm test`\n4. **Commit changes**: `git commit -m 'Add amazing feature'`\n5. **Push to branch**: `git push origin feature/amazing-feature`  \n6. **Open a Pull Request**\n\n### Development Setup\n```bash\ngit clone https://github.com/RobLe3/claudette.git\ncd claudette\nnpm install\nnpm run test:comprehensive  # Run full test suite\n```\n\n---\n\n## 📊 Current Version\n\n### ✅ v1.0.5 (Current)\n- **Backend Support**: OpenAI, Claude, Qwen, Ollama, and custom backends\n- **Advanced Memory Management**: Pressure-based scaling with emergency cleanup\n- **Ultra-Fast MCP Server**: Sub-second startup (264ms) for Claude Code integration\n- **Comprehensive Benchmarking**: Performance validation across all interfaces\n- **Harmonized Timeouts**: Intelligent retry logic with circuit breakers\n- **Monitoring**: Performance metrics and health monitoring\n- **Cost Tracking**: Real-time cost calculation and budget management\n- **Caching**: Intelligent response caching system\n- **TypeScript**: Full type safety and modern development experience\n- **CLI Tools**: Interactive setup and management commands\n\n---\n\n## 🐛 Support \u0026 Issues\n\n- **Issues**: [GitHub Issues](https://github.com/RobLe3/claudette/issues)\n- **Documentation**: [docs/](docs/)\n- **License**: [MIT License](LICENSE)\n\n---\n\n*Claudette v1.0.5 - Advanced AI Backend Router \u0026 Cost Optimizer with Ultra-Fast MCP Integration*","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froble3%2Fclaudette","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froble3%2Fclaudette","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froble3%2Fclaudette/lists"}