{"id":50840016,"url":"https://github.com/vimalyad/sql_optimizer_environment","last_synced_at":"2026-06-14T06:06:23.937Z","repository":{"id":349527842,"uuid":"1202610466","full_name":"vimalyad/sql_optimizer_environment","owner":"vimalyad","description":null,"archived":false,"fork":false,"pushed_at":"2026-04-06T12:21:24.000Z","size":208,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-06T12:22:07.325Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vimalyad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-06T07:53:15.000Z","updated_at":"2026-04-06T12:21:28.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/vimalyad/sql_optimizer_environment","commit_stats":null,"previous_names":["vimalyad/sql_optimizer_environment"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/vimalyad/sql_optimizer_environment","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vimalyad%2Fsql_optimizer_environment","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vimalyad%2Fsql_optimizer_environment/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vimalyad%2Fsql_optimizer_environment/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/vimalyad%2Fsql_optimizer_environment/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vimalyad","download_url":"https://codeload.github.com/vimalyad/sql_optimizer_environment/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vimalyad%2Fsql_optimizer_environment/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34310810,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-14T02:00:07.365Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-14T06:06:23.133Z","updated_at":"2026-06-14T06:06:23.932Z","avatar_url":"https://github.com/vimalyad.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\r\ntitle: SQL Optimizer Environment\r\nemoji: 🗃️\r\ncolorFrom: blue\r\ncolorTo: green\r\nsdk: docker\r\napp_port: 8000\r\npinned: false\r\n---\r\n\r\n# 🗃️ SQL Optimizer Environment\r\n\r\n\u003e An [OpenEnv](https://github.com/meta-pytorch/openenv) reinforcement-learning environment that teaches agents to **rewrite slow SQL into fast SQL** — verified against a real PostgreSQL database with `EXPLAIN ANALYZE`.\r\n\r\n---\r\n\r\n## ✨ What it does\r\n\r\nYou hand the environment a slow query and a Postgres connection string. 
The agent then has up to **N rewrite steps** to make it faster. Every rewrite is **executed against a real database**, and the reward is the actual measured speedup — no simulation, no synthetic cost models.\r\n\r\n```\r\nslow query  ──►  agent picks rewrite  ──►  EXPLAIN ANALYZE  ──►  reward = % faster\r\n                       ▲                                              │\r\n                       └──────────── new observation ◄────────────────┘\r\n```\r\n\r\nThe environment auto-discovers schema, indexes, and statistics from `pg_catalog` / `information_schema` — **no upfront schema definition required**. Point it at any Postgres DB and go.\r\n\r\n---\r\n\r\n## 🧠 The 9 actions\r\n\r\nActions split into **structural rewrites** (always available) and **hint-based rewrites** (require the `pg_hint_plan` extension). The environment hides hint actions automatically when the extension isn't installed.\r\n\r\n\u003e ⚠️ **Heads up:** Actions **1, 2, and 3** (`add_index_hint`, `add_join_order_hint`, `add_join_method_hint`) only work if the [`pg_hint_plan`](https://github.com/ossc-db/pg_hint_plan) extension is installed on your Postgres instance. Without it, these actions are stripped from `legal_actions` and only the structural rewrites (4–9) are available to the agent.\r\n\r\n| ID | Action | Requires | What it does |\r\n|----|--------|----------|--------------|\r\n| 1 | `add_index_hint` | `pg_hint_plan` | Force a specific index via `/*+ IndexScan(...) */` |\r\n| 2 | `add_join_order_hint` | `pg_hint_plan` | Force join order via `/*+ Leading(...) 
*/` |\r\n| 3 | `add_join_method_hint` | `pg_hint_plan` | Force HashJoin / NestLoop / MergeJoin |\r\n| 4 | `push_predicate` | — | Move a `WHERE` filter into the `JOIN ON` clause |\r\n| 5 | `replace_subquery_with_join` | — | Rewrite `IN (SELECT ...)` as a `JOIN` |\r\n| 6 | `remove_redundant_join` | — | Drop a `JOIN` whose columns are never referenced |\r\n| 7 | `replace_select_star` | — | Expand `SELECT *` to only the columns needed |\r\n| 8 | `materialize_cte` | — | Add `MATERIALIZED` to a `WITH` clause |\r\n| 9 | `submit` | — | End the episode and return the final query |\r\n\r\nAdding a new action = one entry in `ACTION_REGISTRY` (`sql_optimizer/models.py`). Nothing else changes.\r\n\r\n---\r\n\r\n## 🎯 Observation, Action, State\r\n\r\nStrict typed contracts via Pydantic:\r\n\r\n```python\r\nfrom typing import Any, Dict, List\r\n\r\nclass SQLAction:           # what the agent sends\r\n    action_id: int\r\n    params: Dict[str, Any]\r\n\r\nclass SQLObservation:      # what the agent gets back\r\n    current_query: str\r\n    observation_vector: List[float]   # featurized plan stats\r\n    legal_actions: List[Dict]         # filtered by available extensions\r\n    explain_plan: Dict\r\n    done: bool\r\n    reward: float\r\n\r\nclass SQLState:            # full episode metadata (env.state())\r\n    original_query: str\r\n    current_query: str\r\n    baseline_time_ms: float\r\n    current_time_ms: float\r\n    rewrites_applied: List[str]\r\n    step_count: int\r\n    total_reward: float\r\n    improvement_pct: float\r\n```\r\n\r\n---\r\n\r\n## 🏗️ Architecture\r\n\r\n```\r\n┌──────────────────────────────────────────────────────┐\r\n│  Agent (your RL loop)                                │\r\n│   └── SQLOptimizerEnv  ◄── client.py (typed wrapper) │\r\n└─────────────────────┬────────────────────────────────┘\r\n                      │ HTTP (FastAPI / OpenEnv core)\r\n┌─────────────────────▼────────────────────────────────┐\r\n│  Env Server  (sql_optimizer/server/app.py)           │\r\n│   • parses \u0026 
validates actions                       │\r\n│   • applies rewrite to current query                 │\r\n│   • runs EXPLAIN ANALYZE on Postgres                 │\r\n│   • computes reward + builds next observation        │\r\n└─────────────────────┬────────────────────────────────┘\r\n                      │ psycopg2\r\n┌─────────────────────▼────────────────────────────────┐\r\n│  PostgreSQL ≥ 13  (+ pg_hint_plan, optional)         │\r\n│   schema discovered live from pg_catalog             │\r\n└──────────────────────────────────────────────────────┘\r\n```\r\n\r\n---\r\n\r\n## 🛠️ Tech stack\r\n\r\n- **Python 3.13** · `pyproject.toml` + `uv` for dep mgmt\r\n- **FastAPI** + **uvicorn** — HTTP env server\r\n- **OpenEnv core** — `EnvClient` / `Action` / `Observation` / `State` base classes\r\n- **Pydantic v2** — typed contracts, automatic action validation\r\n- **psycopg2** — Postgres driver\r\n- **PostgreSQL 13+** with optional **pg_hint_plan** extension\r\n- **Docker / docker-compose** — one-command local stack (env + DB)\r\n- **Hugging Face Spaces** — cloud deployment via `openenv push`\r\n\r\n---\r\n\r\n## 🚀 Quickstart\r\n\r\n### Run locally with Docker\r\n\r\n```bash\r\ngit clone https://github.com/AJ5831A/sql_optimizer_environment\r\ncd sql_optimizer_environment\r\ndocker-compose up -d --build\r\n```\r\n\r\nThis brings up:\r\n- `sql_optimizer_db` — Postgres with `pg_hint_plan` and a sample schema preloaded\r\n- `sql_optimizer_env` — the OpenEnv server on `http://localhost:8000`\r\n\r\n### Use it from Python\r\n\r\n```python\r\nfrom sql_optimizer.client import SQLOptimizerEnv\r\nfrom sql_optimizer.models import SQLAction\r\n\r\nenv = SQLOptimizerEnv(base_url=\"http://localhost:8000\")\r\n\r\nobs = env.reset(query=\"SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE region='EU')\")\r\n\r\n# Agent picks an action from obs.legal_actions\r\nresult = env.step(SQLAction(action_id=5, params={}))   # 
replace_subquery_with_join\r\nprint(result.reward, result.observation.current_query)\r\n\r\nenv.step(SQLAction(action_id=9, params={}))            # submit\r\nprint(env.state().improvement_pct, \"% faster\")\r\n```\r\n\r\n---\r\n\r\n## ⚙️ Configuration\r\n\r\nAll knobs are environment variables (see `openenv.yaml`):\r\n\r\n| Var | Default | Purpose |\r\n|-----|---------|---------|\r\n| `WORKERS` | `4` | uvicorn worker processes |\r\n| `MAX_CONCURRENT_ENVS` | `100` | concurrent sessions per worker |\r\n| `QUERY_TIMEOUT_MS` | `30000` | per-query execution cap |\r\n| `MAX_STEPS` | `10` | max rewrites per episode |\r\n| `DATABASE_URL` | — | Postgres connection string |\r\n\r\n---\r\n\r\n## 📦 Project layout\r\n\r\n```\r\nsql_optimizer_environment/\r\n├── openenv.yaml              # env spec (actions, runtime, hardware)\r\n├── Dockerfile                # HF Spaces / openenv build\r\n├── docker-compose.yml        # local dev stack (env + db)\r\n├── db.Dockerfile             # Postgres + pg_hint_plan + sample schema\r\n├── client.py / models.py     # root re-exports for openenv push\r\n├── sql_optimizer/\r\n│   ├── client.py             # SQLOptimizerEnv (typed client)\r\n│   ├── models.py             # ACTION_REGISTRY + dataclasses\r\n│   ├── db.py                 # schema discovery, EXPLAIN ANALYZE runner\r\n│   └── server/\r\n│       ├── app.py            # FastAPI env server\r\n│       └── Dockerfile\r\n└── pyproject.toml\r\n```\r\n\r\n---\r\n\r\n## ☁️ Deploy to Hugging Face Spaces\r\n\r\n```bash\r\nhf auth login\r\nopenenv push --repo-id \u003cyour-username\u003e/sql-optimizer-environment\r\n```\r\n\r\nLive demo: **[huggingface.co/spaces/ILoveTemples/sql-optimizer-environment](https://huggingface.co/spaces/ILoveTemples/sql-optimizer-environment)**\r\n\r\n---\r\n\r\n## 📜 License\r\n\r\nBSD-style — see 
`LICENSE`.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvimalyad%2Fsql_optimizer_environment","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvimalyad%2Fsql_optimizer_environment","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvimalyad%2Fsql_optimizer_environment/lists"}