https://github.com/alexott/pysigma-backend-databricks
pySigma Databricks backend
cybersecurity databricks sigma spark
- Host: GitHub
- URL: https://github.com/alexott/pysigma-backend-databricks
- Owner: alexott
- License: mit
- Created: 2022-09-17T17:56:10.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2025-08-03T09:45:29.000Z (11 months ago)
- Last Synced: 2025-08-03T11:23:27.788Z (11 months ago)
- Topics: cybersecurity, databricks, sigma, spark
- Language: Python
- Homepage:
- Size: 170 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README



Status: **experimental**, work in progress:
* Although `cidrmatch` calls are generated, you still need to provide the corresponding function as a UDF (I'll add an example later)
* Requires more testing
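
Until an official example is available, a `cidrmatch` UDF could look roughly like the sketch below. It is only an illustration built on Python's standard `ipaddress` module. The argument order (`ip` first, CIDR range second) and the choice to treat malformed input as "no match" are assumptions, so check them against the queries the backend generates for your rules.

```python
import ipaddress


def cidrmatch(ip, cidr):
    """Return True if `ip` falls inside the `cidr` network, else False.

    Assumption: the generated SQL calls cidrmatch(<column>, '<cidr>').
    """
    if ip is None or cidr is None:
        return False
    try:
        network = ipaddress.ip_network(cidr, strict=False)
        return ipaddress.ip_address(ip) in network
    except ValueError:
        # Malformed address or network: treat as "no match".
        return False


# Registration on a cluster (needs pyspark and an active SparkSession `spark`):
# from pyspark.sql.types import BooleanType
# spark.udf.register("cidrmatch", cidrmatch, BooleanType())
```

A plain Python UDF like this is evaluated row by row, so it may be slow on large tables.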
# pySigma Databricks Backend
This is the Databricks backend for pySigma. It provides the package `sigma.backends.databricks` with the `DatabricksBackend` class.
Further, it contains the following processing pipelines in `sigma.pipelines.databricks`:
* `snake_case`: convert column names into snake case format
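
To give an idea of what a snake case conversion does to typical Sigma field names, here is a small standalone sketch. The `to_snake_case` helper is hypothetical and is not the pipeline's actual implementation, so the real pipeline may treat edge cases (acronyms, digits, separators) differently.

```python
import re


def to_snake_case(name):
    """Hypothetical helper: convert a CamelCase field name to snake_case."""
    # Split before a capitalized word: "CommandLine" -> "Command_Line"
    step1 = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Split between a lowercase letter or digit and a capital: "EventID" -> "Event_ID"
    step2 = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step1)
    return step2.lower()


for field in ("EventID", "CommandLine", "TargetUserName"):
    print(field, "->", to_snake_case(field))
```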
It supports the following output formats:
* default: plain Databricks/Apache Spark SQL queries
* dbsql: Databricks SQL queries with rule metadata (title, status) embedded as a comment
* detection_yaml: YAML markup for my own detection framework
## Unbound Keyword Search
The backend supports Sigma rules with unbound keywords (values without field names). These keywords are matched against the raw log line.
### Configuration
By default, the backend looks for keywords in a field named `raw`. You can customize this:
**Command Line:**
```bash
sigma convert -t databricks -O raw_log_field=message rule.yml
```
**Programmatic:**
```python
from sigma.backends.databricks import DatabricksBackend
backend = DatabricksBackend(raw_log_field="event_data")
```
### Examples
**Simple Keywords (OR logic):**
```yaml
detection:
  keywords:
    - 'EVILSERVICE'
    - 'svchost.exe -n evil'
  condition: keywords
```
Generates: `contains(lower(raw), lower('EVILSERVICE')) OR contains(lower(raw), lower('svchost.exe -n evil'))`
**Keywords with |all (AND logic):**
```yaml
detection:
  keywords:
    '|all':
      - 'Remove-MailboxExportRequest'
      - ' -Identity '
  condition: keywords
```
Generates: `contains(lower(raw), lower('Remove-MailboxExportRequest')) AND contains(lower(raw), lower(' -Identity '))`
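
The mapping from a keyword list to the generated expression can be pictured with the short sketch below. It only mirrors the output shown in the two examples above and is not the backend's code. The `keyword_expr` helper and its parameters are hypothetical, and it assumes the keywords contain no single quotes.

```python
def keyword_expr(values, raw_field="raw", match_all=False):
    """Hypothetical helper: build a case-insensitive substring expression.

    Plain keyword lists are joined with OR, '|all' lists with AND.
    Assumes values contain no single quotes (no SQL escaping is done).
    """
    parts = [f"contains(lower({raw_field}), lower('{value}'))" for value in values]
    joiner = " AND " if match_all else " OR "
    return joiner.join(parts)


print(keyword_expr(["EVILSERVICE", "svchost.exe -n evil"]))
print(keyword_expr(["Remove-MailboxExportRequest", " -Identity "], match_all=True))
print(keyword_expr(["mimikatz"], raw_field="message"))
```

The last call shows how a custom raw log field name would appear in the expression.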
**Mixed with Field Conditions:**
```yaml
detection:
  selection:
    EventID: 4688
  keywords:
    - 'mimikatz'
  condition: selection and keywords
```
Generates: `EventID = 4688 AND contains(lower(raw), lower('mimikatz'))`
**Wildcards in Keywords:**
```yaml
detection:
  keywords:
    - '*malware*'  # uses contains()
    - 'cmd.exe*'   # uses startswith()
    - '*.dll'      # uses endswith()
  condition: keywords
```
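
The choice of function depends on where the wildcard sits in the keyword. The sketch below restates that rule in Python. The `wildcard_function` helper is hypothetical, it handles only leading and trailing `*`, and it says nothing about how the backend treats wildcards in the middle of a value.

```python
def wildcard_function(pattern):
    """Hypothetical helper: pick the SQL function for a wildcard keyword.

    Returns (function_name, literal_text) based on leading/trailing '*'.
    """
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    text = pattern.strip("*")
    if leading and trailing:
        return "contains", text
    if trailing:
        return "startswith", text
    if leading:
        return "endswith", text
    # No wildcard: plain keywords are substring matches, as in the examples above.
    return "contains", text


for keyword in ("*malware*", "cmd.exe*", "*.dll"):
    print(keyword, "->", wildcard_function(keyword))
```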
**Regex Patterns:**
```yaml
detection:
  keywords:
    - '|re': '.*evil(cmd|powershell).*'
  condition: keywords
```
Generates: `raw rlike '.*evil(cmd|powershell).*'`
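
A regex can be sanity-checked locally before it goes into a rule. The sketch below approximates `rlike` with Python's `re.search`, on the assumption that `rlike` matches anywhere in the string. Java and Python regex dialects differ in places, so treat it as a rough check only. Unlike the `contains(lower(...))` expressions above, the generated `rlike` has no `lower()` call, so the match is expected to be case-sensitive.

```python
import re

# Pattern copied from the rule above.
PATTERN = r".*evil(cmd|powershell).*"


def rlike(text, pattern=PATTERN):
    """Approximate SQL `text rlike pattern` with a substring-style regex search."""
    return re.search(pattern, text) is not None


print(rlike("C:\\tmp\\evilcmd.exe /c whoami"))   # lowercase match
print(rlike("C:\\tmp\\EVILCMD.exe /c whoami"))   # differs only in case
```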
## Maintainer
This backend is currently maintained by:
* [Alex Ott](https://github.com/alexott/)