{"id":31078156,"url":"https://github.com/drcbeatz/eks-chat","last_synced_at":"2026-04-13T04:04:04.319Z","repository":{"id":313848300,"uuid":"1052287554","full_name":"DrCBeatz/eks-chat","owner":"DrCBeatz","description":"Demo Medical Policy LLM RAG chatbot deployed on AWS EKS using Claude 3 Haiku (Python/FastAPI/LangChain/Docker/AWS Bedrock/Kubernetes/Helm)","archived":false,"fork":false,"pushed_at":"2025-09-09T04:48:46.000Z","size":55,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-16T08:19:21.269Z","etag":null,"topics":["aws","bedrock","claude-haiku","css","docker","eks","faiss","fastapi","helm","html","javascript","kubernetes","langchain","makefile","python","rag-chatbot","s3","sse","titan-embeddings"],"latest_commit_sha":null,"homepage":"http://eks-chat-web-637423503463-us-east-1.s3-website-us-east-1.amazonaws.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DrCBeatz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-07T19:31:05.000Z","updated_at":"2025-09-09T04:50:58.000Z","dependencies_parsed_at":"2025-09-09T06:29:42.009Z","dependency_job_id":"cafa1364-28f2-4468-82e0-0b0828e7b96f","html_url":"https://github.com/DrCBeatz/eks-chat","commit_stats":null,"previous_names":["drcbeatz/eks-chat"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/DrCBeatz/eks-chat","repository_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/DrCBeatz%2Feks-chat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrCBeatz%2Feks-chat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrCBeatz%2Feks-chat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrCBeatz%2Feks-chat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DrCBeatz","download_url":"https://codeload.github.com/DrCBeatz/eks-chat/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrCBeatz%2Feks-chat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31739050,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T03:27:07.512Z","status":"ssl_error","status_checked_at":"2026-04-13T03:26:53.610Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","bedrock","claude-haiku","css","docker","eks","faiss","fastapi","helm","html","javascript","kubernetes","langchain","makefile","python","rag-chatbot","s3","sse","titan-embeddings"],"created_at":"2025-09-16T08:05:26.737Z","updated_at":"2026-04-13T04:04:04.305Z","avatar_url":"https://github.com/DrCBeatz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# EKS Medical Policy Chat 
(Bedrock) — Streaming + RAG Demo\n\nA tiny streaming chat app that runs on **AWS EKS**, serves a static **web UI on S3**, calls **Amazon Bedrock** (Claude 3 Haiku), and optionally performs **RAG** using **Titan Embeddings v2** + **FAISS**. It’s designed to be easy to demo and cheap to run.\n\n\u003e **Demo intent only.** Not medical advice. Do not enter PHI.\n\n---\n\n## 👩‍⚕️ 2‑Minute Tour (for non‑technical reviewers)\n\n1. **Open** the web app (S3 website URL).\n2. In the header, check the chips:\n   - **API** — current backend host (ALB). Click **“set API”** to change.\n   - **KB** — knowledge‑base status (files \u0026 chunks from `/rag/status`).\n   - **RAG** — appears when RAG is ON.\n   - **STRICT** — appears when Strict mode is ON.\n3. In the toolbar:\n   - **Temperature** — lower = more deterministic; higher = more creative.\n   - **RAG** — when ON, answers can cite the internal **policy knowledge base**.\n   - **Strict** — when ON, answers must come **only** from the policy context; otherwise the bot replies **“Not in policy.”**\n4. (Optional) Add a **System prompt** to change tone/role.\n5. 
Try a few questions (use the sample buttons under the composer):\n   - **In‑policy**: “List covered CGM codes.” → grounded answer + **Sources**.\n   - **Out‑of‑policy**: “What is the capital of France?”  \n     - **RAG ON + Strict ON** → **Not in policy.**  \n     - **RAG ON + Strict OFF** → model may answer from general knowledge (and **won’t** show sources for this query).\n\n\u003e The **Sources** panel appears when the backend has determined the answer was supported by the policy context (always shown in Strict mode, otherwise shown only if supported).\n\n---\n\n## What’s new (since 0.5)\n\n- 🧱 **Strict Mode Guardrail** — JSON “support judge” route gates answers to policy only; strict path short‑circuits to **“Not in policy.”**\n- 🧾 **Citations logic** — in **non‑strict**, sources are shown **only when** the retrieved context supported the answer; in **strict**, sources are always shown.\n- 🧭 **UI chips/badges** — **RAG**, **STRICT**, **API host**, and **KB status** chips.\n- 🎯 **Sample prompt buttons** — one‑click examples for reviewers.\n- ⚠️ **Safety banner** — “Demo only — not medical advice; do not enter PHI.”\n\n---\n\n## Features\n\n- ⚡ **Streaming responses (SSE)** — tokens arrive as the model generates them.\n- 🧠 **RAG** — Titan Embeddings v2 + FAISS; index can be rebuilt from S3.\n- 🧾 **Sources / Evidence** — the UI shows which chunks supported the answer.\n- 🧱 **Strict Mode** — when enabled, answers must be grounded in retrieved policy context; otherwise the bot replies **“Not in policy.”**\n- 🧩 **System prompt \u0026 Temperature** — tweak behavior and creativity (saved in localStorage).\n- ☁️ **One‑command deploy** — Makefile + Helm for EKS, ALB, ECR, and S3 website.\n- 🔐 **Admin endpoint** — `/rag/reindex` protected by a shared token (`RAG_TOKEN`).\n\n---\n\n## Architecture\n\nThe diagram below is also included as `architecture.svg` in the repo.\n\n![Architecture](./architecture.svg)\n\n**High‑level:**\n\n- **Frontend**: S3 static website (`web/`) 
calling the API (`/chat`, `/chat/stream`, `/rag/status`).\n- **Backend**: FastAPI container on EKS (SSE streaming) with a **Support Judge** micro‑prompt to enforce strict grounding.\n- **Models**: Bedrock (Claude 3 Haiku) for chat + judge; Titan Embeddings v2 for RAG.\n- **RAG**: FAISS index persisted under `/app/store` (PVC), refreshed from S3 via `/rag/reindex`.\n- **Auth**: `/rag/reindex` requires the `X-RAG-Token` header (shared secret).\n\n---\n\n## Prerequisites\n\n- AWS account + CLI configured\n- Tools: `docker`, `kubectl`, `helm`, `eksctl`, `jq`, `make`\n- **Bedrock access** in your region (`us-east-1`) for:\n  - `anthropic.claude-3-haiku-20240307-v1:0`\n  - `amazon.titan-embed-text-v2:0`\n\n---\n\n## Quick Start (Local)\n\n1. **Install deps**\n   ```bash\n   python -m venv .venv \u0026\u0026 source .venv/bin/activate\n   pip install -r requirements.txt\n   ```\n\n2. **Run API**\n   ```bash\n   export AWS_REGION=us-east-1\n   export MODEL_ID=anthropic.claude-3-haiku-20240307-v1:0\n   export USE_LANGCHAIN=false     # optional; forces raw boto3 path for /chat\n   uvicorn app.main:app --reload --port 8000\n   ```\n\n3. 
**Open UI**\n   - Create/edit `web/config.js`:\n     ```js\n     window.API_BASE_URL = 'http://localhost:8000';\n     ```\n   - Open `web/index.html` in your browser (or use `?api=http://localhost:8000`).\n\n---\n\n## Deploy to EKS\n\n**Create cluster + bootstrap IAM (one‑time):**\n```bash\nmake cluster-up           # EKS cluster + OIDC + bedrock invoke policy + SA\n```\n\n**Build \u0026 push** the image to ECR:\n```bash\nTAG=$(git rev-parse --short HEAD) make build-push\n```\n\n**Create RAG bucket** and seed demo content:\n```bash\nmake rag-bucket           # creates s3://\u003ccluster\u003e-rag-\u003cacct\u003e-\u003cregion\u003e/docs/demo.md\nmake rag-iam              # attaches S3 read policy to the workload role\n```\n\n**Deploy the app** (ALB Service + envs):\n```bash\nRAG_TOKEN='your-strong-token' make deploy TAG=$(git rev-parse --short HEAD)\nmake url   # prints API base URL (ALB hostname)\n```\n\n---\n\n## Frontend Website (S3)\n\nCreate a public S3 static website (demo‑only approach):\n```bash\nmake frontend-up\n# Point the web UI at your API (ALB hostname)\necho \"window.API_BASE_URL = 'http://\u003cALB_HOSTNAME\u003e'\" \u003e web/config.js\nmake frontend-deploy\nmake frontend-url   # prints the website URL\n```\n\u003e **Production**: prefer CloudFront + OAI/WAF rather than a public S3 website.\n\n---\n\n## RAG Admin \u0026 Data\n\n1. **Upload / update** docs:\n   ```bash\n   aws s3 sync data/ s3://$RAG_BUCKET/docs/\n   ```\n2. **Rebuild** the FAISS index in the pod:\n   ```bash\n   curl -sS -X POST \"$API_URL/rag/reindex\" -H \"X-RAG-Token: $RAG_TOKEN\" | jq .\n   ```\n3. **Verify**:\n   ```bash\n   curl -sS \"$API_URL/rag/status\" | jq .\n   curl -sS \"$API_URL/rag/search?q=CGM%20codes\u0026k=3\" | jq .\n   ```\n4. 
**Manual tests**:\n   - **RAG ON + Strict ON**  \n     - “List covered CGM codes” → policy‑grounded answer + **Sources**\n     - “What is the capital of France?” → **Not in policy.**\n   - **RAG ON + Strict OFF** — model can use general knowledge; sources only appear when the policy context supported the answer.\n   - **RAG OFF** — chat behaves like a general LLM (no Sources).\n\n---\n\n## API Examples\n\n### Non‑streaming `/chat`\n```bash\ncurl -sS -X POST \"$API_URL/chat\" \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n        \"messages\": [{\"role\":\"user\",\"content\":\"List covered CGM codes\"}],\n        \"rag\": true,\n        \"strict\": true,\n        \"temperature\": 0.2\n      }' | jq\n```\n**Response** (truncated):\n```json\n{\n  \"answer\": \"According to the policy context provided...\",\n  \"sources\": [\n    {\"source\":\"s3://.../docs/demo.md\",\"section\":\"Covered Codes (examples)\",\"preview\":\"- A4238 ...\"}\n  ]\n}\n```\n\n### Streaming `/chat/stream` (SSE)\n```bash\ncurl -N -X POST \"$API_URL/chat/stream\" \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n        \"messages\":[{\"role\":\"user\",\"content\":\"What is the capital of France?\"}],\n        \"rag\": true,\n        \"strict\": true,\n        \"temperature\": 0.2\n      }'\n```\n**SSE frames**:\n- Early **`sources`** frame (for the UI)\n- Multiple **`delta`** frames as tokens stream\n- Final **`done`** frame  \nIn strict mode, out‑of‑scope questions short‑circuit with **`delta: \"Not in policy.\"`**.\n\n### RAG Admin\n```bash\n# Reindex from S3\ncurl -sS -X POST \"$API_URL/rag/reindex\" -H \"X-RAG-Token: $RAG_TOKEN\" | jq .\n\n# Status\ncurl -sS \"$API_URL/rag/status\" | jq .\n\n# Debug search\ncurl -sS \"$API_URL/rag/search?q=glp-1%20renewal\u0026k=3\" | jq .\n```\n\n---\n\n## UI Reference\n\n- **RAG** — toggles use of the knowledge base (S3‑backed policy index).\n- **Strict** — answers only if supported by retrieved policy context; else **“Not in policy.”**\n- 
**Temperature** — lower (e.g., 0.2) = more deterministic; higher = more creative.\n- **System prompt** — optional; sets tone/role (saved locally).\n- **Sources** — shows which chunk/section supported the answer (always in strict; conditional in non‑strict).\n- **Header chips** — **API**, **KB**, **RAG**, **STRICT** show live state.\n\n---\n\n## Configuration (env)\n\n| Var             | Default                                           | Notes                                                     |\n|-----------------|---------------------------------------------------|-----------------------------------------------------------|\n| `AWS_REGION`    | `us-east-1`                                       | Must match Bedrock and S3 region                          |\n| `MODEL_ID`      | `anthropic.claude-3-haiku-20240307-v1:0`          | Claude 3 Haiku via Bedrock                                |\n| `USE_LANGCHAIN` | `true`                                            | `false` to force raw boto3 path for `/chat`               |\n| `RAG_S3_BUCKET` | *(none)*                                          | e.g. 
`eks-chat-rag-\u003cacct\u003e-us-east-1`                      |\n| `RAG_S3_PREFIX` | `docs/`                                           | Folder under bucket                                       |\n| `RAG_TOKEN`     | *(empty)*                                         | If set, required for `/rag/reindex`                       |\n\nHelm chart sets these with `--set env.*=...`.\n\n---\n\n## Costs \u0026 Cleanup\n\nApprox hourly costs (us‑east‑1, estimates):\n- EKS control plane: ~**$0.10/h**\n- ALB: ~**$0.02+/h**\n- 1× t4g.small worker: a few **¢/h**\n- S3 website: fractions of a cent for light traffic\n- RAG embeddings: **Titan v2** for tiny docs rounds to **$0.00**\n\n**Stop LB but keep cluster**:\n```bash\nhelm uninstall bedrock-chat\n```\n\n**Delete everything**:\n```bash\nmake down\n```\n\n---\n\n## Troubleshooting\n\n- **`AccessDenied: s3:ListBucket`** — Attach RAG S3 read policy to the workload role:\n  ```bash\n  make rag-iam\n  kubectl rollout restart deploy/bedrock-chat\n  ```\n- **`/rag/search` NameError** — ensure the import \u0026 name match in `main.py`:\n  ```python\n  from app.rag import search as rag_search_impl  # (fix typo)\n  ```\n- **Streaming differs from `/chat`** — `/chat` can use LangChain while `/chat/stream` uses boto3. To unify behavior:\n  ```bash\n  --set env.USE_LANGCHAIN=\"false\"\n  ```\n\n---\n\n## License\n\nMIT. Demo content only.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrcbeatz%2Feks-chat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdrcbeatz%2Feks-chat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrcbeatz%2Feks-chat/lists"}