{"id":17308313,"url":"https://github.com/alexott/pysigma-backend-databricks","last_synced_at":"2026-01-04T15:15:48.204Z","repository":{"id":66695417,"uuid":"537889102","full_name":"alexott/pySigma-backend-databricks","owner":"alexott","description":"pySigma Databricks backend ","archived":false,"fork":false,"pushed_at":"2025-08-03T09:45:29.000Z","size":174,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-08-03T11:23:27.788Z","etag":null,"topics":["cybersecurity","databricks","sigma","spark"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alexott.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-09-17T17:56:10.000Z","updated_at":"2025-08-03T09:45:32.000Z","dependencies_parsed_at":"2024-10-27T10:27:33.381Z","dependency_job_id":"65a6fadd-9248-455f-a2d2-2f36ff9325e5","html_url":"https://github.com/alexott/pySigma-backend-databricks","commit_stats":{"total_commits":35,"total_committers":3,"mean_commits":"11.666666666666666","dds":"0.11428571428571432","last_synced_commit":"23c4be59c64c9f693f52a540a75f9eeb17133359"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/alexott/pySigma-backend-databricks","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexott%2FpySigma-backend-databricks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexott%2FpySigma-backend-databricks/tags","releases_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexott%2FpySigma-backend-databricks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexott%2FpySigma-backend-databricks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alexott","download_url":"https://codeload.github.com/alexott/pySigma-backend-databricks/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexott%2FpySigma-backend-databricks/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271430735,"owners_count":24758361,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-21T02:00:08.990Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cybersecurity","databricks","sigma","spark"],"created_at":"2024-10-15T12:04:27.755Z","updated_at":"2026-01-04T15:15:48.199Z","avatar_url":"https://github.com/alexott.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Tests](https://github.com/alexott/databricks-sigma-backend/actions/workflows/test.yml/badge.svg)\n![Coverage Badge](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/alexott/GitHub Gist identifier containing coverage badge JSON expected by 
shields.io./raw/alexott-databricks-sigma-backend.json)\n![Status](https://img.shields.io/badge/Status-pre--release-orange)\n\nStatus: **experimental**, work in progress:\n\n* Although `cidrmatch` is generated, you still need to provide the corresponding function as a UDF (I'll add an example later)\n* Requires more testing\n\n# pySigma Databricks Backend\n\nThis is the Databricks backend for pySigma. It provides the package `sigma.backends.databricks` with the `DatabricksBackend` class.\nFurther, it contains the following processing pipelines in `sigma.pipelines.databricks`:\n\n* `snake_case`: converts column names to snake case\n\nIt supports the following output formats:\n\n* default: plain Databricks/Apache Spark SQL queries\n* dbsql: Databricks SQL queries with rule metadata (title, status) embedded as a comment\n* detection_yaml: YAML markup for my own detection framework\n\n## Unbound Keyword Search\n\nThe backend supports Sigma rules with unbound keywords (values without field names). These keywords are matched against the raw log line.\n\n### Configuration\n\nBy default, the backend looks for keywords in a field named `raw`. 
You can customize this:\n\n**Command Line:**\n```bash\nsigma convert -t databricks -O raw_log_field=message rule.yml\n```\n\n**Programmatic:**\n```python\nfrom sigma.backends.databricks import DatabricksBackend\n\nbackend = DatabricksBackend(raw_log_field=\"event_data\")\n```\n\n### Examples\n\n**Simple Keywords (OR logic):**\n```yaml\ndetection:\n    keywords:\n        - 'EVILSERVICE'\n        - 'svchost.exe -n evil'\n    condition: keywords\n```\nGenerates: `contains(lower(raw), lower('EVILSERVICE')) OR contains(lower(raw), lower('svchost.exe -n evil'))`\n\n**Keywords with |all (AND logic):**\n```yaml\ndetection:\n    keywords:\n        '|all':\n            - 'Remove-MailboxExportRequest'\n            - ' -Identity '\n    condition: keywords\n```\nGenerates: `contains(lower(raw), lower('Remove-MailboxExportRequest')) AND contains(lower(raw), lower(' -Identity '))`\n\n**Mixed with Field Conditions:**\n```yaml\ndetection:\n    selection:\n        EventID: 4688\n    keywords:\n        - 'mimikatz'\n    condition: selection and keywords\n```\nGenerates: `EventID = 4688 AND contains(lower(raw), lower('mimikatz'))`\n\n**Wildcards in Keywords:**\n```yaml\ndetection:\n    keywords:\n        - '*malware*'      # uses contains()\n        - 'cmd.exe*'       # uses startswith()\n        - '*.dll'          # uses endswith()\n    condition: keywords\n```\n\n**Regex Patterns:**\n```yaml\ndetection:\n    keywords:\n        - '|re': '.*evil(cmd|powershell).*'\n    condition: keywords\n```\nGenerates: `raw rlike '.*evil(cmd|powershell).*'`\n\n## Maintainer\n\nThis backend is currently maintained by:\n\n* [Alex Ott](https://github.com/alexott/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexott%2Fpysigma-backend-databricks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falexott%2Fpysigma-backend-databricks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexott%2Fpysigma-backend-databricks/lists"}