{"id":50400039,"url":"https://github.com/vcal-project/ai-firewall","last_synced_at":"2026-06-13T01:01:22.480Z","repository":{"id":344572785,"uuid":"1180194161","full_name":"vcal-project/ai-firewall","owner":"vcal-project","description":"OpenAI-compatible LLM gateway that reduces API costs using Redis exact cache and Qdrant semantic cache.","archived":false,"fork":false,"pushed_at":"2026-06-10T10:17:29.000Z","size":1242,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-10T11:20:50.237Z","etag":null,"topics":["ai-cost-optimization","ai-gateway","ai-infrastructure","llm","openai","qdrant","redis","rust","semantic-cache","vector-search"],"latest_commit_sha":null,"homepage":"https://vcal-project.com","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vcal-project.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-12T19:51:50.000Z","updated_at":"2026-06-10T10:17:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"8f621537-dc0f-4483-b1e4-afb1d243b9c6","html_url":"https://github.com/vcal-project/ai-firewall","commit_stats":null,"previous_names":["vcal-project/ai-firewall"],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/vcal-project/ai-firewall","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vcal-project%2Fai-firewall","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vcal-project%2Fai-firewall/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vcal-project%2Fai-firewall/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vcal-project%2Fai-firewall/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vcal-project","download_url":"https://codeload.github.com/vcal-project/ai-firewall/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vcal-project%2Fai-firewall/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34268189,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-cost-optimization","ai-gateway","ai-infrastructure","llm","openai","qdrant","redis","rust","semantic-cache","vector-search"],"created_at":"2026-05-30T23:00:28.750Z","updated_at":"2026-06-13T01:01:22.472Z","avatar_url":"https://github.com/vcal-project.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# AI Cost Firewall\n\n![Rust](https://img.shields.io/badge/Rust-stable-orange)\n![License](https://img.shields.io/github/license/vcal-project/ai-firewall)\n![Docker](https://img.shields.io/badge/docker-ready-blue)\n![Status](https://img.shields.io/badge/status-pilot--ready-blue)\n\n## Pilot-ready OpenAI-compatible gateway for LLM caching, cost control, and observability\n\nAI Cost Firewall is a lightweight OpenAI-compatible API gateway that reduces LLM API cost and latency through two cache layers:\n\n* exact cache using Redis\n* semantic cache using Qdrant\n\nOnly cache misses are forwarded to the upstream LLM endpoint.\n\nv0.2.0 is the first pilot-ready milestone of AI Cost Firewall. It consolidates the v0.1.x work into a stable OpenAI-compatible gateway model for caching, cost visibility, and operational diagnostics.\n\nAI Cost Firewall is developed and maintained by VCAL Labs, Inc.\n\n---\n\n# Why AI Cost Firewall?\n\nLLM applications frequently generate repeated or semantically similar prompts.\n\nWithout caching, every request results in:\n\n- repeated upstream API calls\n- additional token usage\n- higher cost\n- avoidable latency\n\nAI Cost Firewall introduces a two-layer cache:\n\n1. Exact cache (Redis)\n2. Semantic cache (Qdrant)\n\nThe firewall behaves similarly to “nginx for LLM APIs”:\n\n- applications call AI Cost Firewall\n- the firewall evaluates exact and semantic cache reuse\n- only cache misses reach the upstream provider\n\nSupported OpenAI-compatible providers include:\n\n- OpenAI\n- Ollama\n- LM Studio\n- vLLM\n- LiteLLM\n- OpenRouter\n\n---\n\n# v0.2.1 Release Focus\n\nAI Cost Firewall v0.2.1 builds on the v0.2.0 pilot-ready baseline with additional gateway controls, clearer fail-open behavior, and improved deployment diagnostics.\n\nThis release focuses on:\n\n* configurable exact cache enable/disable behavior\n* explicit Redis/exact-cache fail-open behavior\n* separate upstream and embedding timeout controls\n* request body and prompt-size protection\n* independent exact and semantic cache store controls\n* per-request cache bypass using `X-AIF-Cache-Bypass`\n* metrics endpoint access-control configuration\n* configurable readiness dependency behavior for Redis, Qdrant, and upstream providers\n* improved Grafana Overview and Diagnostics dashboards\n* cache-bypass visibility in Prometheus and Grafana\n* cleaner Docker runtime image for release testing\n* continued support for OpenAI-compatible chat and embedding APIs\n\nv0.2.1 is an operational hardening release. It keeps the v0.2.0 architecture stable while making the gateway easier to test, debug, and deploy in pilot and production-like environments.\n\n---\n\n# Included Dashboards\n\nAI Cost Firewall v0.2.1 includes Grafana dashboards for cost visibility, cache effectiveness, and operational diagnostics.\n\nThe dashboards are included in the Docker deployment files and are automatically provisioned by Grafana when using the provided Docker Compose setup.\n\n## Cost Savings Overview\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"assets/grafana/ai-firewall-overview-021.png\"\u003e\n    \u003cimg src=\"assets/grafana/ai-firewall-overview-021.png\" alt=\"AI Cost Firewall Grafana Dashboard\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cem\u003e30-minute cold-cache demo run with local simulated OpenAI-compatible upstream.\u003c/em\u003e\n\u003c/p\u003e\n\nThe Overview dashboard shows the high-level cost and cache impact of AI Cost Firewall.\n\nIt demonstrates:\n\n- total request volume\n- estimated chat-completion cost\n- gross savings from cache reuse\n- embedding overhead\n- net savings after embedding cost\n- net savings percentage\n- cache hit rate\n- exact and semantic cache activity\n- cache bypass request rate\n- per-model spend and savings\n- savings by cache type\n\nThis dashboard is intended for quick validation, demos, and cost-savings reviews.\n\n---\n\n## Semantic Diagnostics\n\n[![AI Cost Firewall Grafana Dashboard](assets/grafana/ai-firewall-diagnostics-021.png)](assets/grafana/ai-firewall-diagnostics-021.png)\n\u003cp align=\"center\"\u003e\n  \u003cem\u003eSemantic diagnostics from the same cold-cache demo run, including readiness, threshold behavior, lookup latency, and cache activity.\u003c/em\u003e\n\u003c/p\u003e\n\nThe Diagnostics dashboard provides a deeper operational view of semantic-cache behavior and runtime health.\n\nIt demonstrates:\n\n- readiness state\n- semantic lookup volume\n- semantic threshold pass/fail behavior\n- semantic candidate evaluation\n- expired semantic entries skipped during lookup\n- semantic lookup latency\n- upstream and embedding latency\n- embedding overhead by operation\n- gross vs net semantic savings\n- exact vs semantic savings\n- semantic cache misses vs threshold passes\n- semantic store health\n- runtime and provider pressure signals\n- provider error classes\n\nThis dashboard is intended for troubleshooting, tuning semantic similarity thresholds, validating fail-open behavior, and understanding runtime cache behavior during pilots.\n\n---\n\n# Deployment Patterns\n\nAI Cost Firewall includes ready-to-run deployment examples under:\n\n```text\ndeploy/examples/\n```\n\nAvailable patterns:\n\n| Pattern | Description |\n|---|---|\n| `openai-cloud/` | Fastest cloud evaluation path |\n| `local-ollama/` | Fully local OpenAI-compatible deployment |\n| `hybrid-openai-local-embeddings/` | OpenAI chat + local embeddings |\n| `openrouter/` | OpenRouter upstream with OpenAI embeddings |\n| `local-full-stack/` | Fully local stack with dashboards |\n\nEach example includes:\n\n- `docker-compose.yml`\n- minimal configuration\n- example requests\n- expected behavior\n- expected metrics\n- optional observability overlays\n\n---\n\n# Architecture Overview\n\n[![AI Cost Firewall Architecture Diagram](assets/architecture/ai-cost-firewall-diagram.png)](assets/architecture/ai-cost-firewall-diagram.png)\n\nClient applications send requests to AI Cost Firewall instead of directly to the LLM provider.\n\nThe firewall:\n\n1. validates requests\n2. checks exact cache\n3. checks semantic cache\n4. forwards only cache misses upstream\n5. exposes metrics and operational diagnostics\n\nFull architecture documentation:\n\n```text\ndocs/architecture.md\n```\n\n---\n\n# Quick Start (Docker)\n\n## Prerequisites\n\nInstall:\n\n- Docker\n- Docker Compose\n\nVerify installation:\n\n```bash\ndocker --version\ndocker compose version\n```\n\n---\n\n## Clone the repository\n\n```bash\ngit clone https://github.com/vcal-project/ai-firewall.git\ncd ai-firewall\n```\n\nCopy the example configuration:\n\n```bash\ncp configs/ai-firewall.conf.example configs/ai-firewall.conf\n```\n\nEdit the configuration and add your API key:\n\n```bash\nnano configs/ai-firewall.conf\n```\n\n---\n\n## Start the stack\n\nThe default deployment starts:\n\n- AI Cost Firewall\n- Redis\n- Qdrant\n- Prometheus\n- Grafana\n\n```bash\ndocker compose pull\ndocker compose up -d\n```\n\n---\n\n## Validate the deployment\n\n```bash\ncurl http://localhost:8080/healthz\ncurl http://localhost:8080/readyz\ncurl http://localhost:8080/version\n```\n\nExpected:\n\n```text\nOK\nready\n```\n\nThe `/version` endpoint returns release metadata, including the AI Cost Firewall version, release title, and OpenAI-compatible compatibility model.\n\n---\n\n## Example Request\n\n```bash\ncurl http://localhost:8080/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"gpt-4o-mini-2024-07-18\",\n    \"messages\": [\n      {\"role\": \"user\", \"content\": \"Explain Redis briefly.\"}\n    ]\n  }'\n```\n\nRun the same request twice.\n\n- The first request should go upstream.\n- The second request should be served from cache.\n\n---\n\n# Operational Features\n\nAI Cost Firewall includes operational safeguards and observability features designed for real deployments.\n\n## Runtime Features\n\n- readiness and liveness endpoints\n- graceful shutdown with request draining\n- startup dependency validation\n- nginx-style configuration reload (SIGHUP)\n- structured Prometheus metrics\n- semantic cache lifecycle control\n- upstream timeout tracking\n- request size protection\n- runtime diagnostics\n- configurable semantic cache fail-open behavior\n\n---\n\n## Health Endpoints\n\n| Endpoint | Purpose |\n|---|---|\n| `/healthz` | Process liveness |\n| `/readyz` | Ready to serve traffic |\n\n---\n\n## Configuration Validation\n\nValidate configuration statically before startup:\n\n```bash\ndocker compose run --rm firewall \\\n  --config /configs/ai-firewall.conf \\\n  --test-config\n```\n\nExpected output:\n\n```text\nconfiguration OK\n```\n\n---\n\n## Semantic Cache Fail-Open Behavior\n\nWhen `semantic_cache_fail_open` is enabled, runtime semantic cache lookup or embedding failures skip semantic cache and continue to the upstream LLM endpoint.\n\nThis setting applies to runtime semantic cache behavior. It does not bypass startup dependency validation when semantic cache is enabled. If semantic cache is enabled, Qdrant must be reachable during startup and the configured vector size must match the collection.\n\n---\n\n## Print Loaded Configuration\n\n```bash\ndocker compose run --rm firewall \\\n  --config /configs/ai-firewall.conf \\\n  --print-config\n```\n\nSecrets are automatically masked.\n\n---\n\n# OpenAI-Compatible Providers\n\nAI Cost Firewall supports practical OpenAI-compatible deployments while keeping a simple flat configuration model.\n\nThe current model is:\n\n```text\nupstream_provider openai_compatible;\nembedding_provider openai_compatible;\n```\n\nThis means AI Cost Firewall expects OpenAI-style chat and embedding APIs. It does not yet provide provider-specific configuration blocks or native provider-specific request transformations.\n\nCommon OpenAI-compatible deployment patterns include:\n\n| Runtime or Gateway | Usage Pattern                               |\n| ------------------ | ------------------------------------------- |\n| OpenAI             | Cloud OpenAI-compatible chat and embeddings |\n| Ollama             | Local OpenAI-compatible model endpoint      |\n| LM Studio          | Local OpenAI-compatible model endpoint      |\n| vLLM               | Self-hosted OpenAI-compatible serving       |\n| LiteLLM            | Gateway in front of multiple providers      |\n| OpenRouter         | OpenAI-compatible hosted gateway            |\n\nExample configuration:\n\n```text\nupstream_provider openai_compatible;\nupstream_base_url https://api.openai.com;\nupstream_api_key sk-your-key;\n\nembedding_provider openai_compatible;\nembedding_base_url https://api.openai.com;\nembedding_api_key sk-your-key;\n```\n\nThe upstream provider and embedding provider may use different OpenAI-compatible base URLs.\n\nImportant limitations:\n\n* AI Cost Firewall does not claim universal compatibility with every OpenAI-like API.\n* Native Anthropic, Gemini, Mistral, and Cohere APIs are not directly supported in v0.2.0.\n* Mistral, Anthropic, Gemini, or other providers may be used only when exposed through an OpenAI-compatible layer such as LiteLLM, OpenRouter, or another compatible gateway.\n* Provider-specific config blocks, fallback chains, native provider transformations, and provider-specific pricing catalogs are intentionally postponed until after v0.2.0.\n\nSee:\n\n```text\nconfigs/examples/\ndeploy/examples/\ndocs/provider-compatibility.md\n```\n\n---\n\n# Metrics Overview\n\nMetrics are exposed at:\n\n```text\nhttp://localhost:8080/metrics\n```\n\nExample metrics:\n\n```text\naif_requests_total\naif_cache_exact_hits\naif_cache_semantic_hits\naif_model_cost_micro_usd_total\naif_gross_saved_micro_usd_total\naif_net_saved_micro_usd_total\naif_embedding_overhead_micro_usd_total\n```\n\nAI Cost Firewall reports:\n\n- gross chat-completion savings\n- embedding overhead\n- net savings after embedding cost\n- cache hit ratios\n- semantic cache diagnostics\n- per-model traffic and cost metrics\n\n---\n\n# Configuration\n\nAI Cost Firewall uses a simple nginx-style configuration format.\n\nMinimal example:\n\n```text\nlisten_addr 0.0.0.0:8080;\n\nredis_url redis://redis:6379;\n\nupstream_provider openai_compatible;\nupstream_base_url https://api.openai.com;\nupstream_api_key sk-your-key;\n\nsemantic_cache_enabled true;\n```\n\nFull documentation:\n\n- `docs/config-reference.md`\n- `docs/provider-compatibility.md`\n- `docs/quickstart.md`\n\n---\n\n## Benchmarks\n\nAI Cost Firewall v0.2.0 has been benchmarked with a local simulated OpenAI-compatible upstream provider to isolate gateway behavior, Redis/Qdrant integration, cache effectiveness, and Prometheus metrics without external API cost or provider rate-limit noise.\n\nIn a 30-minute cache-effectiveness benchmark, AI Cost Firewall sustained 30 RPS with 0% request failures, p95 latency of 9.03 ms, and a 98.86% aggregate cache-hit rate.\n\nIn a single-VM high-load benchmark, AI Cost Firewall sustained approximately 500 RPS for 5 minutes with 0% HTTP failures. Higher RPS values caused instability in the single-VM test environment, so this should be treated as a local benchmark observation, not a universal capacity limit.\n\nSee [BENCHMARKS.md](BENCHMARKS.md) for benchmark methodology, environment, limitations, and detailed results.\n\n---\n\n# Troubleshooting\n\nSee:\n\n- `docs/troubleshooting.md`\n- `docs/provider-compatibility.md`\n- `docs/operation.md`\n\nCommon issues include:\n\n- incorrect upstream base URLs\n- provider TLS/certificate failures\n- embedding dimension mismatches\n- Qdrant vector-size mismatch\n- unsupported provider behavior\n- semantic threshold tuning\n\n---\n\n# Documentation\n\n| Document | Description |\n|---|---|\n| `docs/architecture.md` | System architecture |\n| `docs/config-reference.md` | Configuration directives |\n| `docs/faq.md` | Frequently asked questions |\n| `docs/how-it-works.md` | Request flow and cache logic |\n| `docs/metrics-and-costs.md` | Cost and savings accounting |\n| `docs/operation.md` | Runtime behavior |\n| `docs/provider-compatibility.md` | OpenAI-compatible providers |\n| `docs/quickstart.md` | Extended setup guide |\n| `docs/troubleshooting.md` | Troubleshooting guide |\n\nFull documentation:\n\nhttps://ai-firewall.docs.vcal-project.com/\n\n---\n\n# Build from Source\n\n```bash\ngit clone https://github.com/vcal-project/ai-firewall.git\ncd ai-firewall\n\ncargo build --release\ncargo run --release\n```\n\n---\n\n# Testing\n\nRun tests:\n\n```bash\ncargo test\n```\n\nAI Cost Firewall includes tests for:\n\n- configuration validation\n- request validation\n- semantic cache requirements\n- semantic cache fail-open behavior\n- environment variable parsing\n- request size parsing\n- cost accounting logic\n\n---\n\n# Contributing\n\nContributions are welcome.\n\nAreas where contributions are especially valuable:\n\n- documentation\n- performance\n- observability\n- provider compatibility\n- deployment examples\n- testing\n\nSee:\n\n```text\nCONTRIBUTING.md\n```\n\n---\n\n# Integration with VCAL Semantic Cache\n\nAI Cost Firewall can optionally integrate with VCAL Semantic Cache for advanced semantic caching and distributed vector storage.\n\nhttps://vcal-project.com/vcal-server\n\n---\n\n# License\n\nApache License 2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvcal-project%2Fai-firewall","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvcal-project%2Fai-firewall","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvcal-project%2Fai-firewall/lists"}