{"id":50323053,"url":"https://github.com/olesyastorchakprojects/rag_engineering_playground","last_synced_at":"2026-05-29T04:01:48.983Z","repository":{"id":350939676,"uuid":"1207441161","full_name":"olesyastorchakprojects/rag_engineering_playground","owner":"olesyastorchakprojects","description":"Specification-first RAG engineering playground for evaluation, observability, and comparative experiments","archived":false,"fork":false,"pushed_at":"2026-04-12T20:59:55.000Z","size":17377,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-12T22:27:44.163Z","etag":null,"topics":["eval","observability","python","rag","rust","specification-first"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/olesyastorchakprojects.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-11T00:20:34.000Z","updated_at":"2026-04-12T20:59:58.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/olesyastorchakprojects/rag_engineering_playground","commit_stats":null,"previous_names":["olesyastorchakprojects/rag_engineering_playground"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/olesyastorchakprojects/rag_engineering_playground","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olesyastorchakprojects%2Frag_engineering_playground","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olesyastorchakprojects%2Frag_engineering_playground/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olesyastorchakprojects%2Frag_engineering_playground/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olesyastorchakprojects%2Frag_engineering_playground/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/olesyastorchakprojects","download_url":"https://codeload.github.com/olesyastorchakprojects/rag_engineering_playground/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olesyastorchakprojects%2Frag_engineering_playground/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33635961,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["eval","observability","python","rag","rust","specification-first"],"created_at":"2026-05-29T04:01:48.829Z","updated_at":"2026-05-29T04:01:48.969Z","avatar_url":"https://github.com/olesyastorchakprojects.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RAG Engineering Playground\n\nA specification-first RAG engineering playground focused on controlled experimentation, evaluation, and observability.\n\nThis repository is designed for building, inspecting, and comparing retrieval pipelines end to end. It emphasizes request-level evidence capture, offline evaluation, comparative reporting, and observability-driven diagnosis.\n\nThe project focuses on making RAG systems easier to inspect, compare, and improve.\n\n---\n\n## Why this project exists\n\nIn RAG systems answer quality depends on multiple upstream decisions: document preparation, chunking, retrieval, reranking, and generation.\n\nThis project exists to make those layers easier to study as a system.\n\nIt is built around four engineering goals:\n\n- **Inspectability** — each request can be examined across retrieval, reranking, generation, and evaluation.\n- **Comparability** — different pipeline variants can be evaluated on the same request set.\n- **Reproducibility** — evaluation runs operate on persisted request evidence and produce stable artifacts.\n- **Specification-first design** — interfaces, schemas, and boundaries are treated as core engineering assets.\n\n---\n\n## Project focus\n\nThis repository is focused on:\n\n- controlled RAG experiments,\n- request-level evidence capture,\n- offline evaluation workflows,\n- comparative analysis of pipeline variants,\n- observability for diagnosis and iteration.\n\nIt is intended as an engineering playground for understanding system behavior, not only for producing answers.\n\n---\n\n## What is implemented\n\n### Data preparation\n- Structural chunking\n- Fixed-size chunking\n- Dense ingest\n- Hybrid ingest (bag-of-words and bm25)\n\n### Retrieval and ranking\n- Dense retrieval\n- Hybrid retrieval (bag-of-words and bm25)\n- Heuristic reranker\n- Cross-encoder reranker\n\n### Evaluation\n- Request capture for later offline evaluation\n- Persisted evaluation runs\n- Comparative reports across pipeline variants\n- Retrieval and answer-level metrics\n\n### Observability\n- Traces for request-level inspection\n- Metrics and dashboards for aggregate behavior\n\n\n---\n\n## What this project demonstrates\n\nThis repository is designed to support engineering questions such as:\n\n- How does chunking strategy affect retrieval and downstream generation?\n- Do retrieval gains translate into answer-quality gains?\n- How much does reranking change final system behavior?\n- Which metrics expose useful differences between pipeline variants?\n- How can request-level evidence be preserved for later comparison and analysis?\n\nThe purpose of the project is to make those questions easier to answer with artifacts, runs, and system evidence.\n\n### Current findings\n\nBased on the current experiment reports, several patterns already stand out:\n\n- chunking strategy changes retrieval behavior in meaningful ways;\n- retrieval gains do not always propagate to answer-level gains;\n- reranking effects are often easier to observe at ranking level than at final-answer level;\n- comparative evaluation is necessary because intuition alone is not a reliable guide.\n\nThese are working findings rather than final claims, but they already make the project useful as an engineering learning and diagnosis environment.\n\n---\n\n## Core architectural idea\n\nA central design choice in this repository is treating **request capture** as a first-class architectural boundary.\n\nInstead of relying only on live pipeline replay, the system preserves request-level evidence that can later be reused for offline evaluation and comparison. This makes experiment runs easier to inspect, compare, and reason about over time.\n\nThat boundary helps separate:\n\n- online execution,\n- persisted evidence,\n- offline evaluation,\n- and aggregate reporting.\n\n---\n\n## Architecture at a glance\n\n![alt text](Documentation/img/Architecture.svg)\n### Language split\n\n- **Python** is used for parsing, chunking, and ingest workflows.\n- **Rust** is used for runtime structure, orchestration, and stronger system boundaries.\n\nThis split reflects the different engineering needs of document-processing workflows and runtime pipeline components.\n\n---\n\n## Engineering principles\n\nThe repository is organized around a small set of principles:\n\n- **Explicit boundaries** over implicit coupling\n- **Evidence preservation** over one-off inspection\n- **Comparative evaluation** over isolated results\n- **Observability as a system layer**\n- **Specifications and schemas** as tools for keeping behavior explicit\n\n---\n\n## Current scope\n\nThis project is currently optimized for:\n\n- local experimentation,\n- pipeline diagnosis,\n- retrieval and answer-quality comparison,\n- evaluation workflow design,\n- observability-driven analysis.\n\nIt is not currently positioned as a production-ready multi-tenant RAG platform.\n\n---\n\n## Why this repository may be useful\n\nThis repository may be useful if you are interested in:\n\n- RAG system design beyond the happy path,\n- evaluation-first AI engineering,\n- observability-first pipeline iteration,\n- comparing chunking, retrieval, and reranking strategies,\n- building systems that are easier to diagnose and reason about.\n\n---\n\n## Repository reading path\n\nIf you are reviewing this repository, start here:\n\n1. **[Architecture Overview](Documentation/ARCHITECTURE_OVERVIEW.md)**\n   Pipeline shape, subsystem boundaries, and main architectural ideas.\n\n2. **[Evaluation Story](Documentation/EVALUATION_STORY.md)**\n   How request capture, evaluation runs, judge stages, and reports fit together.\n\n3. **[Observability Story](Documentation/OBSERVABILITY_STORY.md)**\n   How traces, metrics, dashboards, and local infrastructure support diagnosis.\n\n4. **[Specification-First Approach](Documentation/SPECIFICATION_FIRST_APPROACH.md)**\n   Why specs, schemas, and explicit contracts are central in this project.\n\n5. **[Documentation README](Documentation/README.md)**\n   Full documentation map and recommended reading order.\n\n---\n\n## Evidence surfaces\n\nThis repository includes several places where the system’s behavior and experiment outcomes can be inspected directly:\n\n- **[Comparative experiment report](Documentation/EXPERIMENTS_20Q_COMPARATIVE_REPORT.md)** — side-by-side findings across evaluated pipeline variants\n- **[Evaluation flow](Documentation/EVALUATION_STORY.md)** — request capture, run boundaries, judge stages, and reporting flow\n- **[Observability documentation](Documentation/OBSERVABILITY_STORY.md)** — traces, dashboards, metrics, and diagnostic surfaces\n- **[Architecture overview](Documentation/ARCHITECTURE_OVERVIEW.md)** — end-to-end system shape and subsystem boundaries\n- **[Key technical decisions](Documentation/KEY_TECHNICAL_DECISIONS.md)** — the main engineering choices and their rationale\n\nThe repository is structured to support inspection, comparison, and iteration through evidence.\n\n---\n\n## Tech stack\n\n- **Rust** — runtime and orchestration components\n- **Python** — parsing, chunking, ingest, and experiment-support workflows\n- **Qdrant** — vector search\n- **Postgres** — request capture store\n- **Grafana / Tempo / Phoenix / OTEL-based tooling** — observability and analysis\n\n---\n\n## Repository structure\n\nThe repository is organized around six top-level areas:\n\n- `Execution/`: runnable code, configs, tests, launcher entrypoints, and local stack definitions\n- `Specification/`: contracts, schemas, architecture docs, and codegen-oriented source of truth\n- `Measurement/`: dashboards, observability assets, and evaluation measurement surfaces\n- `Evidence/`: datasets, run artifacts, manifests, reports, and produced outputs\n- `Documentation/`: human-oriented narrative docs for onboarding, architecture review, evaluation interpretation, observability, and project presentation\n- `AgentContext/`: agent-operating rules, multi-agent workflow conventions, and repository-specific guidance for agent-driven work\n\nThis split is deliberate.\nThe project treats execution, specification, measurement, evidence, documentation, and agent context as separate engineering concerns.\n\n---\n\n## Running the project\n\nSee **[Run From Zero](Documentation/RUN_FROM_ZERO.md)** for environment setup and local execution.\n\n---\n\n## Future directions\n\nNatural next directions for this repository include:\n\n- stronger answer-level validation metrics,\n- broader experiment matrices,\n- richer comparison dashboards,\n- larger evaluation sets,\n- deeper reranker benchmarking,\n- stronger citation and grounding validation.\n\n---\n\n## Documentation\n\nSee **[Documentation README](Documentation/README.md)** for the current documentation map and recommended reading path.\n\n---\n\n## Summary\n\nThis repository is an engineering playground for treating RAG as a system that can be:\n\n- inspected,\n- compared,\n- evaluated,\n- and improved with evidence.\n\nIts purpose is not only to produce answers, but to make pipeline behavior easier to understand.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Folesyastorchakprojects%2Frag_engineering_playground","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Folesyastorchakprojects%2Frag_engineering_playground","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Folesyastorchakprojects%2Frag_engineering_playground/lists"}