# NL2SQL Engine

> **Production-grade Natural Language → SQL runtime with deterministic orchestration.**

NL2SQL treats text-to-SQL as a **distributed systems** problem. The engine compiles a user query into a validated plan, executes it via adapters, and aggregates results through a graph-based pipeline.

---

## 🧭 What you get

- Graph-based orchestration (`LangGraph`) with explicit state (`GraphState`)
- Deterministic planning and validation before SQL generation
- Adapter-based execution with sandbox isolation
- Observability hooks (metrics, logs, audit events)

## 🏗️ System Topology

The runtime is organized around a LangGraph orchestration pipeline and supporting registries. It is designed for fault isolation and deterministic execution.

```mermaid
flowchart TD
    User[User Query] --> Resolver[DatasourceResolverNode]
    Resolver --> Decomposer[DecomposerNode]
    Decomposer --> Planner[GlobalPlannerNode]
    Planner --> Router[Layer Router]

    subgraph SQLAgent["SQL Agent Subgraph"]
        Schema[SchemaRetrieverNode] --> AST[ASTPlannerNode]
        AST -->|ok| Logical[LogicalValidatorNode]
        AST -->|retry| Retry[retry_node]
        Logical -->|ok| Generator[GeneratorNode]
        Logical -->|retry| Retry
        Generator --> Executor[ExecutorNode]
        Retry --> Refiner[RefinerNode]
        Refiner --> AST
    end

    Router --> Schema
    Executor --> Router
    Router --> Aggregator[EngineAggregatorNode]
    Aggregator --> Synthesizer[AnswerSynthesizerNode]
```

### 1. The Control Plane (The Graph)

**Responsibility**: Reasoning, Planning, and Orchestration.

* **Agentic Graph**: Implemented as a Directed Cyclic Graph (LangGraph) to enable refinement loops. If a plan fails validation, the graph routes it through `retry_node` and `RefinerNode` back to `ASTPlannerNode` for another attempt.
* **State Management**: A shared `GraphState` records every decision, which makes runs auditable and reproducible.

### 2. The Security Plane (The Firewall)

**Responsibility**: Invariant Enforcement.

* **Valid-by-Construction**: The LLM generates an **Abstract Syntax Tree (AST)** plan rather than raw SQL; SQL is generated from the plan only after it passes validation.
* **Static Analysis**: The [Logical Validator](docs/agents/nodes.md) enforces RBAC and schema constraints before SQL generation.

### 3. The Data Plane (The Sandbox)

**Responsibility**: Semantic Search and Execution.

* **Blast Radius Isolation**: SQL drivers run in a dedicated **[Sandboxed Process Pool](docs/adr/adr-001-sandboxed-execution.md)**. A segfault in a driver kills a disposable worker, not the Agent.
* **Partitioned Retrieval**: The [Schema Store + Retrieval](docs/schema/store.md) flow injects only the relevant schema context, preventing context window overflow.

### 4. The Reliability Plane (The Guard)

**Responsibility**: Fault Tolerance and Stability.

* **Layered Defense**: A combination of **[Circuit Breakers](docs/observability/error-handling.md)** and **[Sandboxing](docs/execution/sandbox.md)** keeps the system stable during outages.
* **Fail-Fast**: We stop processing immediately if a dependency is unresponsive, preserving resources.

### 5. The Observability Plane (The Watchtower)

**Responsibility**: Visibility, Forensics, and Compliance.

* **Full-Stack Telemetry**: Native [OpenTelemetry](docs/observability/stack.md) integration provides distributed tracing (Jaeger) and metrics (Prometheus) for every node execution.
* **Forensic Audit Logs**: A persistent [Audit Log](docs/observability/stack.md) records AI decisions for compliance and debugging.

---

## 📐 Architectural Invariants

| Invariant | Rationale | Mechanism |
| :--- | :--- | :--- |
| **No Unvalidated SQL** | Prevent hallucinations & data leaks | All plans pass through `LogicalValidator` (AST). `PhysicalValidator` exists but is not wired into the default SQL subgraph. |
| **Zero Shared State** | Crash Safety | Execution happens in isolated processes; no shared memory with the Control Plane. |
| **Fail-Fast** | Reliability | Circuit Breakers and Strict Timeouts prevent cascading failures (Retry Storms). |
| **Determinism** | Debuggability | Temperature-0 generation + Strict Typing (Pydantic) for all LLM outputs. |

---

## 🚀 Quick Start

### Prerequisites

* Python 3.10+
* A configured datasource (`configs/datasources.yaml`)
* A configured LLM (`configs/llm.yaml`)

### 1. Installation

```bash
# Install core only
pip install nl2sql-core

# Install core with selected adapters (quote the extras so shells like zsh do not expand the brackets)
pip install "nl2sql-core[mysql,mssql]"

# Install core with all adapters
pip install "nl2sql-core[all]"
```

For local development:

```bash
git clone https://github.com/nadeem4/nl2sql.git
cd nl2sql

# Set up environment
python -m venv venv
source venv/bin/activate

# Install core engine and adapter SDK
pip install -e packages/core
pip install -e packages/adapter-sdk
```

### 2. Run a query (Python API)

```python
from nl2sql.context import NL2SQLContext
from nl2sql.pipeline.runtime import run_with_graph

ctx = NL2SQLContext()
result = run_with_graph(ctx, "Top 5 customers by revenue last quarter?")

print(result.get("final_answer"))
```

## 🧪 Demo data (CLI-only)

Use the CLI to generate deterministic demo data and configs, then point the API at the generated files.

1. Generate demo data + configs:

```bash
nl2sql setup --demo --lite
```

2. Start the API with demo settings:

```bash
# Option A: load .env.demo via ENV
ENV=demo uvicorn nl2sql_api.main:app

# Option B: load a specific env file
ENV_FILE_PATH=.env.demo uvicorn nl2sql_api.main:app
```

The demo datasource file uses relative paths (e.g. `data/demo_lite/*.db`), so start the API from the repo root.

## 🔖 Versioning Policy

NL2SQL uses unified versioning across the monorepo. Core, adapters, API, and CLI
share the same version number and are released together. Internal dependencies
are pinned to the same version to avoid mismatches.

## 📚 Documentation

- **[System Architecture](docs/architecture/high-level.md)**: runtime topology and core flows
- **[Agent Nodes](docs/agents/nodes.md)**: node-by-node specs and responsibilities
- **[Schema Store + Retrieval](docs/schema/store.md)**: schema snapshots and vector retrieval
- **[Execution Sandbox](docs/execution/sandbox.md)**: process isolation and failures
- **[Observability](docs/observability/stack.md)**: metrics, logging, audit events

---

## 📦 Repository Structure

```text
packages/
├── core/               # The Engine (Graph, State, Logic)
├── adapter-sdk/        # Interface Contract for new Databases
└── adapters/           # Official Dialects (Postgres, MSSQL, MySQL)
configs/                # Runtime Configuration (Policies, Prompts)
docs/                   # Architecture & Operations Manual
```
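The validate-and-refine cycle described under the Control Plane and Security Plane (`ASTPlannerNode` → `LogicalValidatorNode` → `retry_node` → `RefinerNode`) can be illustrated with a small, self-contained Python sketch. It is not the engine's implementation: it assumes a plan is reduced to the tuple of tables it reads, that validation is a simple table allowlist standing in for RBAC checks, and that the retry budget is two refinements. `Plan`, `plan_with_refinement`, `allowlist_validator`, and `fake_planner` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical stand-in for an AST plan: only the tables it reads."""
    tables: Tuple[str, ...]


def plan_with_refinement(
    propose: Callable[[List[str]], Plan],
    validate: Callable[[Plan], List[str]],
    max_retries: int = 2,  # assumed retry budget; the README does not state one
) -> Tuple[Optional[Plan], int, List[str]]:
    """Propose, validate, and refine until a plan passes or the budget runs out.

    Returns (accepted plan or None, attempts made, errors from the last attempt).
    Only a plan returned here would be handed to SQL generation.
    """
    feedback: List[str] = []
    attempts = 0
    for attempts in range(1, max_retries + 2):  # first attempt + max_retries refinements
        plan = propose(feedback)
        errors = validate(plan)
        if not errors:
            return plan, attempts, []
        feedback = errors  # the refiner sees why the previous plan was rejected
    return None, attempts, feedback


# Hypothetical RBAC-style policy: the caller may only read these tables.
ALLOWED_TABLES = {"customers", "orders"}


def allowlist_validator(plan: Plan) -> List[str]:
    return [f"table not allowed: {t}" for t in plan.tables if t not in ALLOWED_TABLES]


def fake_planner(feedback: List[str]) -> Plan:
    # First attempt references a table outside the allowlist; the retry fixes it.
    if not feedback:
        return Plan(("orders", "revenue_v2"))
    return Plan(("orders", "customers"))


print(plan_with_refinement(fake_planner, allowlist_validator))
# → (Plan(tables=('orders', 'customers')), 2, [])
```

Returning the last error list instead of raising keeps the loop deterministic and lets a caller surface a rejected plan explicitly rather than falling through to SQL generation.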
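Partitioned retrieval (Data Plane) comes down to ranking schema objects against the question and putting only the top few into the prompt. The documentation describes vector retrieval for this step; the sketch below swaps in plain keyword overlap so it runs on the standard library alone. `retrieve_schema`, the `CATALOG` descriptions, and the default `k` are illustrative assumptions, not part of NL2SQL.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case alphanumeric tokens; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_schema(question: str, tables: Dict[str, str], k: int = 2) -> List[str]:
    """Return the k table names whose name + description overlap most with the question.

    Ties are broken alphabetically so the result is deterministic.
    """
    q = tokenize(question)

    def score(name: str) -> int:
        return len(q & tokenize(f"{name} {tables[name]}"))

    ranked = sorted(tables, key=lambda name: (-score(name), name))
    return ranked[:k]


# Hypothetical schema catalog: table name -> short description.
CATALOG = {
    "customers": "customer id name region",
    "orders": "order id customer id revenue order date",
    "inventory": "sku warehouse quantity",
}

print(retrieve_schema("Top 5 customers by revenue last quarter?", CATALOG))
# → ['customers', 'orders']
```

Capping the result at `k` tables is what bounds prompt size; a real embedding model would also match synonyms (for example "sales" to `orders`) that keyword overlap misses.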
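The Fail-Fast behaviour under the Reliability Plane is commonly built on a circuit breaker. The class below is a generic closed → open → half-open breaker written for illustration; it is not NL2SQL's own implementation (see the linked error-handling docs for that), and the threshold, timeout, and injectable `clock` parameter are assumptions chosen to make the behaviour easy to test.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    """Generic breaker sketch (hypothetical; not the engine's implementation)."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # let a trial call through
        return "open"

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        state = self.state
        if state == "open":
            # Fail fast: do not touch a dependency that is known to be down.
            raise CircuitOpenError("dependency unavailable; failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open and restart the timeout
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping each datasource in its own breaker (for example `breaker.call(adapter_query, sql)`, where `adapter_query` is a placeholder for an adapter call) means an unresponsive database is skipped immediately instead of being retried by every request, which is the retry-storm scenario the invariants table mentions.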