{"id":50883937,"url":"https://github.com/headless-start/cs2-rag-assistant","last_synced_at":"2026-06-15T15:01:47.350Z","repository":{"id":361906209,"uuid":"1256038604","full_name":"headless-start/cs2-rag-assistant","owner":"headless-start","description":"This repository contains a Retrieval-Augmented Generation assistant for Counter-Strike 2 — hybrid retrieval, cross-encoder reranking, grounded answers with inline citations, and a RAGAS evaluation.","archived":false,"fork":false,"pushed_at":"2026-06-01T18:12:42.000Z","size":290,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-01T20:13:31.417Z","etag":null,"topics":["bm25","counter-strike-2","fastapi","hybrid-search","llm","nlp","python","qdrant","rag","ragas","reranking","retrieval-augmented-generation","semantic-search","sentence-transformers","streamlit","vector-search"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/headless-start.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-01T12:03:16.000Z","updated_at":"2026-06-01T18:55:42.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/headless-start/cs2-rag-assistant","commit_stats":null,"previous_names":["headless-start/cs2-rag-assistant"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/headless-start/cs2-rag-assis
tant","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/headless-start%2Fcs2-rag-assistant","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/headless-start%2Fcs2-rag-assistant/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/headless-start%2Fcs2-rag-assistant/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/headless-start%2Fcs2-rag-assistant/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/headless-start","download_url":"https://codeload.github.com/headless-start/cs2-rag-assistant/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/headless-start%2Fcs2-rag-assistant/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34367696,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-15T02:00:07.085Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bm25","counter-strike-2","fastapi","hybrid-search","llm","nlp","python","qdrant","rag","ragas","reranking","retrieval-augmented-generation","semantic-search","sentence-transformers","streamlit","vector-search"],"created_at":"2026-06-15T15:01:45.586Z","updated_at":"2026-06-15T15:01:47.332Z","avatar_url":"https://github.com/headless-start.png","language":"Python","funding_links
":[],"categories":[],"sub_categories":[],"readme":"# CS2 RAG Assistant\n\n## 📌 Project Overview\nThis project is a Retrieval-Augmented Generation (RAG) assistant for\n**Counter-Strike 2**. It answers questions about weapons and the economy, map\ncallouts, utility and CS2's smoke mechanics, round rules, and strategy — and\nevery answer is **grounded in a curated knowledge base with the source passages\ncited inline**, so you can see exactly what each claim is based on and when the\nassistant doesn't know.\n\nIt is the standalone successor to an earlier TF-IDF / intent-classifier CS2\nchatbot: the same domain, rebuilt as a real retrieval stack with hybrid search,\na cross-encoder reranker, grounded generation, and an offline evaluation.\n\n**Domain**: Counter-Strike 2 — weapons, economy, maps, utility, rules, strategy.  \n**Stack**: `bge-m3` embeddings, Qdrant, BM25, `bge-reranker-v2-m3`,\n`Qwen2.5-3B-Instruct`, FastAPI, Streamlit, RAGAS.  \n**Goal**: Accurate, grounded, cited answers from a small curated corpus — and\nback it with real evaluation numbers.\n\n![The chat UI: a grounded answer with inline citations and the source passages, each with its hybrid-retrieval scores](docs/demo.png)\n\n---\n\n## 🚀 Key Features\n1. **Hybrid Retrieval (not naive top-k)**:\n   - A dense search with **bge-m3** embeddings in **Qdrant** and a lexical\n     **BM25** search run in parallel.\n   - Their rankings are merged with **Reciprocal Rank Fusion**, so a query is\n     matched on both meaning and on exact terms like weapon names and callouts.\n2. **Cross-Encoder Reranking**:\n   - The fused candidates are re-scored by a **bge-reranker-v2-m3** cross-encoder\n     that reads the query and passage together, which is far more precise than\n     the first-stage scores.\n3. 
**Grounded Generation with Citations**:\n   - The top passages are handed to an instruct model that answers **only** from\n     them and cites each claim inline as `[n]`.\n   - When the passages don't cover the question, it says so instead of guessing.\n4. **Offline Evaluation**:\n   - A hand-written question set is scored with **RAGAS** — faithfulness, answer\n     relevancy, context precision and recall — fully offline with a local judge.\n5. **Runs Locally or in Docker**:\n   - Local embeddings and a local LLM by default, with no paid API.\n   - `docker compose up` brings up Qdrant + the API + the Streamlit UI.\n\n---\n\n## 🔍 Findings\nEvaluated on a 38-question set with [RAGAS](https://docs.ragas.io), scored\nentirely offline with a local judge (`Qwen2.5-3B-Instruct`, 4-bit):\n\n- **Context Precision**: **0.93** — retrieval surfaces the right passages.\n- **Context Recall**: **0.82** — coverage across the full 38 questions.\n- **Answer Relevancy**: **0.87** — answers stay on the question (35/38 scored).\n- **Faithfulness**: **0.75** — answers stick to the retrieved context (19/38 scored).\n\n![RAGAS scores](results/ragas_scores.png)\n\nContext recall parses cleanly across the whole set, and answer relevancy\nacross 35 of the 38 questions. The\nstricter faithfulness and context-precision metrics are averaged over the\nquestions the small local judge could score cleanly; pointing a larger or hosted\njudge at the same cached generations (`python -m eval.run_eval --reuse`) closes\nthat gap. Full write-up in [eval/results.md](eval/results.md) — every number is\nreproducible from a real run; none are hand-set.\n\n---\n\n## 🧩 How It Works\n1. **Ingest** — the markdown corpus is split section-aware into ~200–300 token\n   chunks that carry their source file and heading, so answers can cite them.\n2. **Index** — chunks are embedded with bge-m3 into Qdrant, and a BM25 index is\n   built alongside for lexical matching.\n3. 
**Retrieve** — dense top-k and BM25 top-k are fused with Reciprocal Rank\n   Fusion, then reranked by the cross-encoder down to the top few passages.\n4. **Answer** — the passages are numbered and given to the LLM, which answers\n   from them with `[n]` citations, or declines when the context is thin.\n\nA real exchange through the API:\n\n```text\nQ: What is the loss bonus ladder and how much is the bomb plant bonus?\n\nA: The loss bonus ladder goes up to $3400, starting from $1400 and increasing\n   by $500 after each consecutive loss. [2] Planting the bomb gives the team an\n   $800 bonus even on a round it loses. [1]\n\n   [1] economy-rewards.md › The plant bonus   (rerank 0.98)\n   [2] economy-rewards.md › overview          (rerank 0.79)\n```\n\nOut-of-scope questions are declined rather than answered from outside the\ncorpus (e.g. \"Who is the best NBA player of all time?\" → \"I don't have enough\ninformation to answer that.\"). The Streamlit UI shows the answer with an\nexpandable **Sources** panel that marks which passages were cited and lists each\none's dense, BM25 and reranker scores.\n\n---\n\n## ⚙️ How to Run\n### Docker\n```bash\ndocker compose up --build\n```\nStarts Qdrant, builds the index from the corpus, serves the API on `:8000` and\nthe Streamlit chat on `:8501`. Models download on first run into a cached\nvolume; open http://localhost:8501.\n\nThe default image runs on CPU. 
To run the API on a GPU (CUDA torch + 4-bit\ngeneration, needs the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)):\n```bash\ndocker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build\n```\n\n### Local\n```bash\npython -m venv .venv \u0026\u0026 source .venv/bin/activate\npip install -r requirements.txt\n\npython -m scripts.build_index          # chunk, embed, build BM25 + Qdrant\nuvicorn app.api:app --port 8000        # API\nstreamlit run app/ui.py                # UI (in a second shell)\n```\nA CUDA GPU is used automatically if present (set `LLM_4BIT=1` to load the\ngenerator in 4-bit on an 8 GB card); everything also runs on CPU, just slower.\n\nSwapping in a hosted model is one change: set `LLM_PROVIDER=openai`,\n`OPENAI_BASE_URL`, `OPENAI_API_KEY` and `GEN_MODEL`.\n\n---\n\n## 🛠 System Requirements\n- Python 3.11+\n- Key libraries: `sentence-transformers`, `transformers`, `qdrant-client`,\n  `rank-bm25`, `fastapi`, `streamlit`, `ragas` (full list in `requirements.txt`)\n- Optional: Docker and Docker Compose; an NVIDIA GPU for faster local inference\n\n---\n\n## 📄 License\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file\nfor details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheadless-start%2Fcs2-rag-assistant","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fheadless-start%2Fcs2-rag-assistant","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheadless-start%2Fcs2-rag-assistant/lists"}