{"id":39196511,"url":"https://github.com/viragtripathi/crdb-sql-audit","last_synced_at":"2026-01-17T22:49:32.590Z","repository":{"id":293377771,"uuid":"983843686","full_name":"viragtripathi/crdb-sql-audit","owner":"viragtripathi","description":"CLI tool to extract, deduplicate, and analyze SQL logs for CockroachDB compatibility","archived":false,"fork":false,"pushed_at":"2025-05-21T17:25:47.000Z","size":1389,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-07T03:44:02.199Z","etag":null,"topics":["cockroach","cockroach-cloud","cockroach-database","cockroachdb","postgres","postgresql"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/viragtripathi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-15T02:19:52.000Z","updated_at":"2025-07-28T15:05:56.000Z","dependencies_parsed_at":"2025-05-15T03:49:20.292Z","dependency_job_id":null,"html_url":"https://github.com/viragtripathi/crdb-sql-audit","commit_stats":null,"previous_names":["viragtripathi/crdb-sql-audit"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/viragtripathi/crdb-sql-audit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viragtripathi%2Fcrdb-sql-audit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viragtripathi%2Fcrdb-sql-audit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viragtripathi%2Fcrdb-sql-aud
it/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viragtripathi%2Fcrdb-sql-audit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/viragtripathi","download_url":"https://codeload.github.com/viragtripathi/crdb-sql-audit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viragtripathi%2Fcrdb-sql-audit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28521166,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T22:11:28.393Z","status":"ssl_error","status_checked_at":"2026-01-17T22:11:27.841Z","response_time":85,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cockroach","cockroach-cloud","cockroach-database","cockroachdb","postgres","postgresql"],"created_at":"2026-01-17T22:49:31.815Z","updated_at":"2026-01-17T22:49:32.582Z","avatar_url":"https://github.com/viragtripathi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![PyPI version](https://img.shields.io/pypi/v/crdb-sql-audit)](https://pypi.org/project/crdb-sql-audit/)\n[![Python 
version](https://img.shields.io/pypi/pyversions/crdb-sql-audit)](https://pypi.org/project/crdb-sql-audit/)\n[![License](https://img.shields.io/pypi/l/crdb-sql-audit)](https://pypi.org/project/crdb-sql-audit/)\n[![Build status](https://github.com/viragtripathi/crdb-sql-audit/actions/workflows/python-ci.yml/badge.svg)](https://github.com/viragtripathi/crdb-sql-audit/actions)\n[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?repo=viragtripathi/crdb-sql-audit\u0026machine=standardLinux32gb\u0026devcontainer_path=.devcontainer%2Fdevcontainer.json)\n\n# crdb-sql-audit\n\nA powerful CLI tool to extract, deduplicate, and analyze SQL logs for **CockroachDB compatibility** using a flexible, rule-based engine.\n\n## 🚀 Features\n- Works with **any SQL dialect** (PostgreSQL, MySQL, Oracle, etc.)\n- Extracts SQL and function calls using customizable search terms (e.g. `execute`, `pg_`)\n- Deduplicates repeated SQL statements from logs\n- Analyzes SQL using a **YAML-based rule engine**\n- Supports default compatibility rules (PostgreSQL ➜ CockroachDB)\n- Allows **custom rule sets** via `--rules`\n- Logs analysis output to both terminal and `crdb_sql_audit.log`\n- Automatically detects SQL statement types (e.g. 
SELECT, DELETE)\n- Friendly CLI with `--help` and `--version`\n- Export full reports in multiple formats:\n  - `.sql`: Deduplicated queries\n  - `.csv`: Raw compatibility issue list\n  - `.md`: Developer-friendly Markdown report\n  - `.html`: Interactive browser report with sorting/filtering\n  - `.png`: Visual bar chart of issues\n\n## 🖼 Sample Output\n\n| Report Type | Preview                                                                                                               |\n|-------------|-----------------------------------------------------------------------------------------------------------------------|\n| HTML        | ![HTML Report Screenshot](https://raw.githubusercontent.com/viragtripathi/crdb-sql-audit/main/docs/sample_report.png) |\n| Chart       | ![Bar Chart](https://raw.githubusercontent.com/viragtripathi/crdb-sql-audit/main/docs/sample_chart.png)               |\n| CSV         | ![CSV Snippet](https://raw.githubusercontent.com/viragtripathi/crdb-sql-audit/main/docs/sample_csv.png)               |\n| SQL         | ![SQL Snippet](https://raw.githubusercontent.com/viragtripathi/crdb-sql-audit/main/docs/sample_sql.png)               |\n| Markdown    | ![Markdown Snippet](https://raw.githubusercontent.com/viragtripathi/crdb-sql-audit/main/docs/sample_md.png)           |\n\n\n## 📦 Installation\n\n### Option A: Quick Install from PyPI\n\n```bash\npip install crdb-sql-audit\n```\n\n### Option B: Local Dev Install\n```bash\ngit clone https://github.com/viragtripathi/crdb-sql-audit.git\ncd crdb-sql-audit\npython -m venv venv\nsource venv/bin/activate\npip install .\n```\n\n### Option C: Build via `pyproject.toml`\n```bash\npython -m build\npip install dist/crdb_sql_audit-0.2.0-py3-none-any.whl\n```\n\n## 🧪 Usage\n\n```bash\ncrdb-sql-audit \\\n  --dir /path/to/logs \\\n  --filters execute,pg_ \\\n  --out output/report\n```\n\nYou can also analyze a single file:\n\n```bash\ncrdb-sql-audit \\\n  --file /path/to/logfile.log \\\n  --filters SELECT,INSERT 
\\\n  --raw \\\n  --out output/single_file_report\n```\n\n\u003e ⚠️ You must provide either `--dir` or `--file`, but not both.\n\n### 🔧 Additional Options\n\n```bash\n--dir       Directory containing SQL log files (mutually exclusive with --file)\n--file      Single SQL log file (mutually exclusive with --dir)\n--filters     Comma-separated search keywords to extract SQL (default: 'LOG:  execute', 'pg_', 'LOG:  statement:')\n--raw       Treat each matching line as a raw SQL statement (default: False)\n--rules     Path to YAML rules file (optional, default: built-in PostgreSQL rules)\n--out       Output file prefix (default: crdb_audit_output/report)\n--debug     Enable debug-level logging\n--help      Show usage help\n--version   Show current version\n```\n\n### 📘 CLI Help Example\n\n```bash\ncrdb-sql-audit --help\n```\n\n![CLI help screenshot](https://raw.githubusercontent.com/viragtripathi/crdb-sql-audit/main/docs/cli_help.png)\n\n### Custom Rules Example\n\n```bash\ncrdb-sql-audit \\\n  --dir ./logs \\\n  --filters execute,pg_ \\\n  --rules ./rules/mysql_to_crdb.yaml \\\n  --out output/mysql_report\n```\n\n\u003e 💡 This tool supports auditing **any SQL dialect** — just provide a rule set for your source database (e.g., PostgreSQL, MySQL, Oracle).\n\n## 📁 Output\n```\noutput/\n├── report.sql          # Deduplicated SQL\n├── report.csv          # Compatibility issues\n├── report.md           # Markdown summary\n├── report.html         # Interactive dashboard\n├── report_chart.png    # Visual chart of issues\n├── crdb_sql_audit.log  # Full run log\n```\n\n## 🧹 Preparing Your Log Files\n\nTo analyze SQL logs effectively, we recommend the following preprocessing steps:\n\n### 1. Extract SQL-related Lines\n```bash\ngrep \"execute\" app.log \u003e sql_only.log\n# or to include pg_ built-in function usage:\ngrep -E \"execute|pg_\" app.log \u003e sql_only.log\n```\n\n### 2. 
Split Into Manageable Chunks (Optional but Recommended)\n```bash\nsplit -b 50M sql_only.log chunks/sql_chunk_\n```\n\n### 3. Run the Audit\n```bash\ncrdb-sql-audit --dir chunks --filters execute,pg_ --out output/report\n```\n\n### 🗜 Supported Log Formats\n\nThis tool automatically supports reading:\n\n* ✅ Regular `.log` or `.txt` files\n* ✅ Compressed files: `.gz`, `.xz`\n* ✅ Folders with mixed log formats\n\nYou can pass these directly using `--file` or `--dir`:\n\n```bash\ncrdb-sql-audit --file logs/app.log.gz --out output/report_from_gz\n```\n\n### 🧪 Raw Mode vs. Filtered Mode\n\nThis tool supports two modes of SQL log analysis:\n\n| Mode                  | Behavior                                                                |\n|-----------------------|-------------------------------------------------------------------------|\n| `--filters` (default) | Filters log lines using keywords like `LOG:  execute`, `pg_`, etc.      |\n| `--raw`               | Analyzes every line as a potential SQL statement — no filtering applied |\n\n\u003e ✅ Use `--raw` if you want the most complete coverage, especially for mixed-format or unknown logs.\n\u003e ⚠️ Warning: large logs + `--raw` + `--debug` may generate gigabytes of audit output.\n\n## 📚 Rule Engine Format\n\nRules are written in YAML and matched against each SQL line. 
Example:\n\u003e 💡 This is also the default rule if you don't provide `--rules` param.\n\n```yaml\n# postgres_to_crdb.yaml — Comprehensive CRDB Compatibility Rules based on https://www.cockroachlabs.com/docs/v25.2/sql-feature-support\n\n- id: malformed_dml_statements\n  match: '^(SELECT|INSERT|UPDATE|DELETE FROM)\\s*$'\n  message: \"Possibly malformed or incomplete SQL statement\"\n  level: warning\n  tags: [syntax]\n\n- id: special_char_in_identifier\n  match: '\"[^\\\"]*#\\w*\"'\n  message: \"Table name contains unsupported special character (#)\"\n  level: error\n  tags: [table, identifier]\n\n- id: pg_builtins\n  match: '^.*\\bpg_\\w+\\s*\\(.*$'\n  message: \"PostgreSQL pg_* function not supported in CockroachDB\"\n  level: error\n  tags: [function]\n\n- id: with_cte\n  match: '^\\s*WITH\\s+'\n  message: \"CTE (WITH clause) detected\"\n  level: warning\n  tags: [cte, syntax]\n\n- id: upsert_syntax\n  match: '^\\s*UPSERT\\s+'\n  message: \"UPSERT syntax (CockroachDB supports but should be reviewed)\"\n  level: info\n  tags: [upsert, insert]\n\n- id: json_ops\n  match: '-\u003e|-\u003e\u003e|::json[b]?'  
# Look for JSON navigation or cast\n  message: \"JSON/JSONB usage detected\"\n  level: info\n  tags: [json]\n\n- id: row_values\n  match: '\\(.*\\).*IN\\s*\\('  # e.g., WHERE (a, b) IN ((1, 2))\n  message: \"ROW VALUES in IN clause\"\n  level: warning\n  tags: [rowvalues, comparison]\n\n- id: window_function\n  match: '\\bOVER\\s*\\('\n  message: \"Window function usage (e.g., RANK, ROW_NUMBER)\"\n  level: info\n  tags: [window, analytics]\n\n- id: set_ops\n  match: '\\s+(UNION|INTERSECT|EXCEPT)\\s+'\n  message: \"Set operation (UNION, INTERSECT, EXCEPT)\"\n  level: info\n  tags: [setops]\n\n- id: case_expr\n  match: '\\bCASE\\b.*\\bWHEN\\b.*\\bTHEN\\b'\n  message: \"CASE expression detected\"\n  level: info\n  tags: [case, conditional]\n\n- id: time_interval\n  match: 'INTERVAL\\s+[''\\\"]'\n  message: \"TIME INTERVAL expression\"\n  level: info\n  tags: [interval, time]\n\n- id: group_by_rollup\n  match: 'GROUP BY ROLLUP\\('\n  message: \"ROLLUP clause used\"\n  level: warning\n  tags: [aggregation, rollup]\n\n- id: filter_clause\n  match: 'FILTER\\s*\\(\\s*WHERE'\n  message: \"FILTER clause used in aggregation\"\n  level: warning\n  tags: [aggregation, filter]\n```\n\n\u003e 📦 Multiple rule sets can be created to target different SQL dialects (e.g., `postgres_to_crdb.yaml`, `mysql_to_crdb.yaml`, etc.)\n\n## 🧪 Validate Your Regex Rules\n\n### 🔍 Online (Recommended)\nUse [regex101.com](https://regex101.com/?flavor=python) to test your patterns:\n- Set the **flavor to Python**\n- Paste your rule into the regex field\n- Paste a sample SQL line into the test area\n\n### 🐍 In Python\nYou can also test your rules directly:\n```python\nimport re\npattern = re.compile(r'^.*\\bpg_\\w+\\s*\\(.*$', re.IGNORECASE)\nsql = \"SELECT pg_backend_pid()\"\nprint(bool(pattern.search(sql)))  # ✅ True\n```\n\n### 🛠 Validate with Shell\nYou can use basic Unix commands to check for patterns like pg_ functions directly in your log chunks:\n\n\n| Task                               | 
Command                                                                   |\n|------------------------------------|---------------------------------------------------------------------------|\n| Total matches across chunks        | `grep -oE '\\bpg_[a-zA-Z0-9_]+\\(' chunks/* \\| wc -l`                       |\n| Unique function names              | `grep -oE '\\bpg_[a-zA-Z0-9_]+\\(' chunks/* \\| sort \\| uniq`                |\n| Count occurrences of each function | `grep -oE '\\bpg_[a-zA-Z0-9_]+\\(' chunks/* \\| sort \\| uniq -c \\| sort -nr` |\n| Full SQL lines containing pg\\_\\*   | `grep -E '\\bpg_[a-zA-Z0-9_]+\\(' chunks/*`                                 |\n\nAlso, before or after running `crdb-sql-audit`, you can inspect your logs to see how often common filters appear.\n\nFor example, to count usage of PostgreSQL built-ins and log patterns:\n\n```bash\n{\n  echo \"🔍 pg_* function usage:\"\n  grep -oE '\\bpg_[a-zA-Z0-9_]+\\(' chunks/* | sort | uniq -c | sort -nr\n  echo \"\"\n  echo \"🔍 PostgreSQL LOG prefixes:\"\n  grep -oE 'LOG:  execute|LOG:  statement:|LOG:  duration:' chunks/* | sort | uniq -c | sort -nr\n}\n```\n\nThis will show counts of:\n\n* Each `pg_` function used (e.g. 
`pg_backend_pid(`)\n* Number of log lines using `LOG:  execute`, `LOG:  statement:`, and `LOG:  duration:`\n\n\u003e ✅ Useful for checking whether your filters (`--filters`) are likely to match anything in the input.\n\n---\n\n## 🧪 Running Tests\n\nThis project includes a test suite using sample logs and rules to validate behavior.\n\n### 🔧 To run locally:\n\n```bash\npython tests/test_runner.py\n```\n\n### 🧪 What it does:\n\n* Runs `crdb-sql-audit` on a small sample of PostgreSQL-style logs\n* Uses `tests/rules/test_rules.yaml`\n* Verifies that a CSV report is created with expected issues\n\n✅ This runs automatically in GitHub Actions on every commit to `main`.\n\n---\n\n📓 [Try it in a Jupyter notebook](notebooks/demo_crdb_sql_audit.ipynb)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviragtripathi%2Fcrdb-sql-audit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fviragtripathi%2Fcrdb-sql-audit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviragtripathi%2Fcrdb-sql-audit/lists"}