{"id":47686295,"url":"https://github.com/databricks-solutions/databricks-genie-workbench","last_synced_at":"2026-04-08T07:01:33.518Z","repository":{"id":347718316,"uuid":"1171016490","full_name":"databricks-solutions/databricks-genie-workbench","owner":"databricks-solutions","description":"Genie Workbench is a unified developer tool for creating, scoring, and optimizing Databricks Genie Spaces.","archived":false,"fork":false,"pushed_at":"2026-04-03T00:34:35.000Z","size":7569,"stargazers_count":4,"open_issues_count":20,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-03T02:23:37.191Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databricks-solutions.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS.txt","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE.md","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-02T19:30:46.000Z","updated_at":"2026-04-02T18:34:13.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/databricks-solutions/databricks-genie-workbench","commit_stats":null,"previous_names":["databricks-solutions/databricks-genie-workbench"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/databricks-solutions/databricks-genie-workbench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks-solutions%2Fdatabricks-genie-workbench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks-solutions%2Fdatabricks-genie-workbench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks-solutions%2Fdatabricks-genie-workbench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks-solutions%2Fdatabricks-genie-workbench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databricks-solutions","download_url":"https://codeload.github.com/databricks-solutions/databricks-genie-workbench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks-solutions%2Fdatabricks-genie-workbench/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31544087,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T16:28:08.000Z","status":"online","status_checked_at":"2026-04-08T02:00:06.127Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-02T14:51:33.051Z","updated_at":"2026-04-08T07:01:33.512Z","avatar_url":"https://github.com/databricks-solutions.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Genie Workbench\n\nGenie Workbench is a unified developer tool for creating, scoring, and optimizing Databricks Genie Spaces. The tool helps Genie developers by:\n\n* Creating Genie spaces from scratch using an agent that gathers business logic, profiles data sources, and generates the initial configuration\n* Scoring space quality on a 0-100 rubric across categorized best-practice dimensions with a four-stage maturity model\n* Optimizing configurations through a benchmark-driven loop that compares Genie's generated SQL against expected answers and automatically recommends improvements\n* Tracking history of every configuration change and score over time, stored in Lakebase\n* Versioning and rollback of Genie space configurations, which Genie does not natively support\n* Managing multiple spaces across projects and stakeholders from a single dashboard\n* Providing scientific proof of lift via MLflow experiment tracking on every benchmark run\n\n## Architecture\n\nThe app is a FastAPI backend serving a React/Vite frontend, deployed as a [Databricks App](https://docs.databricks.com/aws/en/dev-tools/databricks-apps/). User identity flows via OBO (On-Behalf-Of) auth so each user operates under their own Databricks permissions. Score history and session state are persisted in Lakebase (PostgreSQL).\n\n## Prerequisites\n\n* [Databricks CLI](https://docs.databricks.com/dev-tools/cli/install.html) (v0.239.0+ required)\n* [uv](https://docs.astral.sh/uv/) — Python package manager (used for dependency management and hash-verified installs)\n* Node.js (18+ recommended) and npm\n* Python 3.11+\n* A Databricks workspace with:\n  * Apps enabled\n  * A SQL Warehouse (Serverless recommended)\n  * A Unity Catalog with CREATE SCHEMA permission\n  * MLflow Prompt Registry enabled (required for Auto-Optimize judge prompt traceability)\n\n## Quick Start\n\n### 1. Clone the repo\n\n```bash\ngit clone \u003crepo-url\u003e\ncd databricks-genie-workbench\n```\n\n### 2. Authenticate with Databricks CLI\n\n```bash\ndatabricks auth login --profile \u003cworkspace-profile\u003e\n```\n\n\u003e **Do NOT run `databricks bundle init`** — it overwrites the project configuration. The deploy scripts handle everything.\n\n### 3. Run the guided installer\n\n```bash\n./scripts/install.sh\n```\n\nThe installer will:\n1. Check prerequisites (CLI, Node, Python, npm, uv)\n2. Ask for your Databricks CLI profile\n3. Ask for catalog (auto-discovered from your workspace)\n4. Ask for SQL warehouse (auto-discovered from your workspace)\n5. Ask for LLM model endpoint\n6. Optionally configure MLflow tracing (creates or links an experiment)\n7. Ask for Lakebase instance name\n8. Ask for app name\n9. Write `.env.deploy` with your configuration\n10. Run `scripts/deploy.sh` to build and deploy the app\n11. Resolve the app's service principal\n12. Optionally grant the SP access to your existing Genie Spaces\n\n### 4. Attach Lakebase (optional but recommended)\n\nWithout Lakebase, scan results and starred spaces are lost on app restart.\n\n\u003e **Note:** If you used `install.sh`, it already collected your Lakebase instance name (stored as `GENIE_LAKEBASE_INSTANCE` in `.env.deploy`). You still need to attach the resource manually in the Apps UI as described below.\n\n**Create a Lakebase instance** (if you don't have one):\n1. In the workspace UI, go to **Catalog → Lakebase** (or **SQL → Lakebase**)\n2. Click **Create** → name it (e.g. `genie-workbench`), capacity **CU_1**\n\n**Grant the app's SP access:**\n1. Go to **Databases → your instance → Roles**\n2. Find the app's SP (e.g. `app-xxxx genie-workbench-v0`) and grant **CREATEDB** attribute\n3. Go to **Databases → your instance → Permissions** and grant the SP **Can manage**\n\n**Attach to your app:**\n1. Open **Databricks Apps UI** → your app → **Resources**\n2. Click **+ Add resource** → **PostgreSQL (Lakebase)** → select your instance\n3. Set resource key to `postgres` with **CAN_CONNECT_AND_CREATE** permission\n4. Save and **redeploy** — the app creates a `genie` schema and all tables automatically\n\nThe app stores data in the `genie` schema within the `databricks_postgres` database. Tables: `scan_results`, `starred_spaces`, `seen_spaces`, `optimization_runs`, `agent_sessions`.\n\n## Manual Setup (without installer)\n\nIf you prefer non-interactive setup:\n\n### 1. Create `.env.deploy` in the project root\n\n```bash\ncat \u003e .env.deploy \u003c\u003c'EOF'\nGENIE_WAREHOUSE_ID=\u003cyour-sql-warehouse-id\u003e\nGENIE_CATALOG=\u003cyour-catalog-name\u003e\nGENIE_APP_NAME=genie-workbench\nGENIE_DEPLOY_PROFILE=genie-workbench\nGENIE_LLM_MODEL=databricks-claude-sonnet-4-6\nGENIE_LAKEBASE_INSTANCE=genie-workbench\nEOF\n```\n\n### 2. Deploy\n\n```bash\n./scripts/deploy.sh\n```\n\n### Configuration Reference\n\nSet these in `.env.deploy` or as environment variables:\n\n| Variable | Required | Default | Description |\n|---|---|---|---|\n| `GENIE_WAREHOUSE_ID` | Yes | — | SQL Warehouse ID (hex string from warehouse URL or detail page) |\n| `GENIE_CATALOG` | Yes | — | Unity Catalog name (you need CREATE SCHEMA permission) |\n| `GENIE_APP_NAME` | No | `genie-workbench` | Databricks App name (must be unique in your workspace) |\n| `GENIE_DEPLOY_PROFILE` | No | `DEFAULT` | Databricks CLI profile name |\n| `GENIE_LLM_MODEL` | No | `databricks-claude-sonnet-4-6` | LLM serving endpoint for analysis |\n| `GENIE_LAKEBASE_INSTANCE` | No | `\u003capp-name\u003e` | Lakebase instance name (patched into `app.yaml` at deploy) |\n\n## Deploy Commands\n\n```bash\n./scripts/deploy.sh                           # Full deploy: create app, sync code, configure, deploy\n./scripts/deploy.sh --update                  # Code-only update: sync + redeploy (faster)\n./scripts/deploy.sh --destroy                 # Tear down app and clean up jobs\n./scripts/deploy.sh --destroy --auto-approve  # Tear down without confirmation prompt\n```\n\n### What `--destroy` cleans up (and what it doesn't)\n\n`--destroy` deletes the Databricks App, runtime-created jobs, and the bundle-managed optimization job. It does **not** remove:\n- Lakebase data (the `genie` schema in `databricks_postgres`)\n- Unity Catalog schema/tables (`\u003ccatalog\u003e.genie_space_optimizer` and its 8 tables)\n- Genie Space SP permissions granted during install\n- MLflow experiments created during install\n- Synced tables (if manually created)\n\nClean these up manually if you want a full teardown.\n\n### What `deploy.sh` does\n\n**Full deploy (8 steps):**\n\n1. **Pre-flight checks** — validates tools, CLI profile, warehouse, catalog, app state\n2. **Build frontend** — `npm ci` + `npm run build`\n3. **Create app** — `databricks apps create` (skipped if app already exists)\n4. **Sync files** — `databricks sync --full` + explicit `frontend/dist/` upload\n5. **Grant UC permissions** — resolves app SP, creates GSO schema/tables, grants SP access, enables CDF\n6. **Set up optimization job** — builds GSO wheel, uploads notebooks, creates/finds the Databricks job, grants SP CAN_MANAGE\n7. **Redeploy app** — patches `app.yaml` with config values, configures scopes, deploys\n8. **Verify** — checks critical files, waits for deployment to succeed\n\n**Code update** (`--update`) skips step 3 (app creation) — use for iterating on code changes.\n\n### Typical workflow\n\n```bash\n# First time\n./scripts/deploy.sh\n\n# After code changes\n./scripts/deploy.sh --update\n\n# Tear down\n./scripts/deploy.sh --destroy\n```\n\n## Auto-Optimize (GSO Package)\n\nThe Auto-Optimize optimization job is created automatically during deploy. The deploy script builds the GSO wheel, uploads job notebooks, and creates the Databricks job — no separate deployment step needed.\n\nIf the job already exists (from a previous deploy), it is reused. To force recreation, delete the job in the Databricks UI and re-run `./scripts/deploy.sh --update`.\n\n## Post-Deploy: Genie Space Access\n\nThe app uses On-Behalf-Of (OBO) auth — users see only Genie Spaces they have permission to manage. The app's service principal also needs access for fallback operations:\n\n1. The installer grants SP access to your existing Genie Spaces\n2. For spaces created after install, share them with the app's service principal (CAN_MANAGE)\n3. The SP needs SELECT on schemas referenced by your Genie Spaces:\n   ```sql\n   GRANT SELECT ON SCHEMA \u003ccatalog\u003e.\u003cschema\u003e TO `\u003cservice-principal-name\u003e`\n   ```\n\n## Troubleshooting\n\n| Symptom | Cause | Fix |\n|---|---|---|\n| App shows blank page | `frontend/dist/` missing (gitignored) | Re-run `./scripts/deploy.sh --update` |\n| `Could not import module \"backend.main\"` | Source files missing on workspace | Re-run `./scripts/deploy.sh --update` (full-sync uploads everything) |\n| `No dependencies file found` | `requirements.txt` not on workspace | Same — `./scripts/deploy.sh --update` |\n| \"Failed to list spaces\" | Lakebase not attached | Attach a `postgres` resource in Apps UI (see step 4 above) |\n| `Catalog 'X' is not accessible` | Wrong catalog or missing permissions | `databricks catalogs list --profile \u003cprofile\u003e` |\n| `Invalid SQL warehouse resource` | Warehouse doesn't exist or no CAN_USE | `databricks warehouses list --profile \u003cprofile\u003e` |\n| `Maximum number of apps` | Workspace hit the 300-app limit | Delete unused apps |\n| Auto-Optimize fails at \"Baseline Evaluation\" with `FEATURE_DISABLED` | Prompt Registry not enabled on workspace | Contact workspace admin to enable MLflow Prompt Registry |\n| Unresolved `__GSO_*__` placeholders | deploy.sh couldn't patch `app.yaml` | Ensure `GENIE_CATALOG` is set; check deploy output for warnings |\n| GSO job creation fails during deploy | Bundle deploy failed (CLI version, auth, or build issue) | Check `databricks bundle deploy -t app` output; ensure CLI \u003e= 0.239.0 and `pip install build` |\n| Notebook upload fails (`RESOURCE_DOES_NOT_EXIST`) | `/Workspace/Shared/` not writable by deployer | Check workspace-level permissions on the upload path |\n\n\u003e **Note on MLflow tracing:** The `MLFLOW_EXPERIMENT_ID` in `app.yaml` is workspace-specific. The app validates it at startup and silently disables tracing if the experiment doesn't exist in your workspace. To enable tracing, create an MLflow experiment and update the value in `app.yaml` before deploying.\n\n**Debug commands:**\n\n```bash\n# View app logs\ndatabricks apps logs \u003capp-name\u003e --profile \u003cprofile\u003e\n\n# Check app status\ndatabricks apps get \u003capp-name\u003e --profile \u003cprofile\u003e\n\n# List workspace files to verify sync\ndatabricks workspace list /Workspace/Users/\u003cemail\u003e/\u003capp-name\u003e/backend --profile \u003cprofile\u003e\n```\n\n## Dependency Security\n\nAll dependencies are pinned to exact versions to guard against supply chain attacks\n(e.g. [CVE-2026-33634 / TeamPCP](https://www.kaspersky.com/blog/critical-supply-chain-attack-trivy-litellm-checkmarx-teampcp/55510/),\nwhich targeted unpinned PyPI packages and GitHub Action tags).\n\n### Lock files (always commit these)\n\n| File | Covers | Tool |\n|---|---|---|\n| `uv.lock` | All root Python transitive deps with SHA256 hashes | uv |\n| `packages/genie-space-optimizer/uv.lock` | GSO Python deps with SHA256 hashes | uv |\n| `frontend/package-lock.json` | All frontend npm deps with SHA-512 integrity hashes | npm |\n| `packages/genie-space-optimizer/bun.lock` | GSO UI deps | bun |\n\n### Updating Python dependencies\n\n```bash\n# Upgrade one package (resolves latest compatible, updates uv.lock with new hashes)\nuv lock --upgrade-package \u003cpackage-name\u003e\n\n# Regenerate requirements.txt from the updated lock file\nuv export --frozen --no-dev --no-hashes --format requirements-txt \\\n  | grep -v \"^-e \" \u003e requirements.txt\necho \"-e ./packages/genie-space-optimizer\" \u003e\u003e requirements.txt\n\n# Commit both\ngit add uv.lock requirements.txt\n```\n\n\u003e **Do not edit `requirements.txt` manually.** It is generated from `uv.lock` and\n\u003e includes all transitive dependencies pinned to exact `==` versions. The generation\n\u003e command is documented at the top of the file.\n\n### Updating npm dependencies\n\n```bash\ncd frontend\nnpm install \u003cpackage\u003e@\u003cnew-version\u003e   # resolves and updates package-lock.json\n# Then update package.json to exact version (remove the ^ prefix)\ngit add package.json package-lock.json  # always commit both together\n```\n\n### Why `npm ci` instead of `npm install` in deploys\n\n`scripts/deploy.sh` uses `npm ci` for the frontend build step. Unlike `npm install`,\n`npm ci`:\n- Reads `package-lock.json` as the single source of truth (never updates it)\n- Verifies SHA-512 integrity hashes for every installed package\n- Fails loudly if `package.json` and `package-lock.json` are out of sync\n\nIf you update `frontend/package.json`, always run `npm install` locally to regenerate\n`package-lock.json`, then commit both files.\n\n## How to Get Help\n\nDatabricks support doesn't cover this content. For questions or bugs, please open a GitHub issue and the team will help on a best effort basis.\n\n## License\n\n\u0026copy; 2025 Databricks, Inc. All rights reserved. The source in this notebook is provided subject to the Databricks License [https://databricks.com/db-license-source]. All included or referenced third party libraries are subject to the licenses set forth below.\n\n| library | description | license | source |\n|---|---|---|---|\n| asyncpg | Fast PostgreSQL client for asyncio | Apache-2.0 | https://pypi.org/project/asyncpg/ |\n| class-variance-authority | CSS class name composition utility | Apache-2.0 | https://github.com/joe-bell/cva |\n| clsx | Utility for constructing className strings | MIT | https://github.com/lukeed/clsx |\n| databricks-sdk | Databricks SDK for Python | Apache-2.0 | https://pypi.org/project/databricks-sdk/ |\n| fastapi | Modern async web framework for APIs | MIT | https://pypi.org/project/fastapi/ |\n| httpx | Async/sync HTTP client | BSD-3-Clause | https://pypi.org/project/httpx/ |\n| lucide-react | Icon library for React | ISC | https://github.com/lucide-icons/lucide |\n| mlflow | ML experiment tracking and model registry | Apache-2.0 | https://pypi.org/project/mlflow/ |\n| pandas | Data manipulation and analysis | BSD-3-Clause | https://pypi.org/project/pandas/ |\n| prism-react-renderer | Syntax highlighting with Prism for React | MIT | https://github.com/FormidableLabs/prism-react-renderer |\n| psycopg | PostgreSQL database adapter (v3) | LGPL-3.0 | https://pypi.org/project/psycopg/ |\n| pydantic | Data validation using Python type hints | MIT | https://pypi.org/project/pydantic/ |\n| pydantic-settings | Settings management with Pydantic | MIT | https://pypi.org/project/pydantic-settings/ |\n| python-dotenv | Load environment variables from .env files | BSD-3-Clause | https://pypi.org/project/python-dotenv/ |\n| pyyaml | YAML parser and emitter | MIT | https://pypi.org/project/PyYAML/ |\n| react | Library for building user interfaces | MIT | https://github.com/facebook/react |\n| react-diff-viewer-continued | Text diff viewer component for React | MIT | https://github.com/aeolun/react-diff-viewer-continued |\n| react-dom | React DOM rendering | MIT | https://github.com/facebook/react |\n| react-markdown | Render Markdown as React components | MIT | https://github.com/remarkjs/react-markdown |\n| recharts | Charting library for React | MIT | https://github.com/recharts/recharts |\n| remark-gfm | GitHub Flavored Markdown support for remark | MIT | https://github.com/remarkjs/remark-gfm |\n| requests | HTTP library for Python | Apache-2.0 | https://pypi.org/project/requests/ |\n| sql-formatter | SQL query formatter | MIT | https://github.com/sql-formatter-org/sql-formatter |\n| sqlglot | SQL parser, transpiler, and optimizer | MIT | https://pypi.org/project/sqlglot/ |\n| sqlmodel | SQL databases with Python and Pydantic | MIT | https://pypi.org/project/sqlmodel/ |\n| tailwind-merge | Merge Tailwind CSS classes without conflicts | MIT | https://github.com/dcastil/tailwind-merge |\n| uvicorn | ASGI web server | BSD-3-Clause | https://pypi.org/project/uvicorn/ |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks-solutions%2Fdatabricks-genie-workbench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabricks-solutions%2Fdatabricks-genie-workbench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks-solutions%2Fdatabricks-genie-workbench/lists"}