Databricks TruffleHog Scan Detected

Status: Experimental
Severity: medium
Log types: Databricks.Audit
Tags: Databricks, Collection, Credential Access
Reference: https://github.com/databricks-solutions/cybersec-workspace-detection-app/blob/main/base/detections/event-based/trufflehog_scan_detected.py
Source: github.com/panther-labs/panther-analysis

Detects TruffleHog secret scanning activity in Databricks. TruffleHog is a tool used to scan repositories and systems for exposed credentials and secrets. While it can be used legitimately for security audits, unauthorized scanning may indicate credential harvesting attempts. External IP sources are elevated to HIGH severity.
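The severity elevation described above could be sketched as follows. This is a minimal illustration, not the rule's actual code: the `INTERNAL_NETWORKS` list, the `is_external` helper, and the choice to treat only RFC 1918 ranges as internal are all assumptions made for the example.

```python
import ipaddress

# Assumption: "internal" means the RFC 1918 private ranges. The real rule may
# use a different definition (for example, an organization-specific allowlist).
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_external(ip_string):
    """Return True when the address parses and falls outside INTERNAL_NETWORKS."""
    try:
        ip = ipaddress.ip_address(ip_string)
    except ValueError:
        # Missing or malformed addresses are not treated as external here.
        return False
    return not any(ip in network for network in INTERNAL_NETWORKS)


def severity(event):
    """Hypothetical severity hook: HIGH for external sources, MEDIUM otherwise."""
    if is_external(event.get("sourceIPAddress", "")):
        return "HIGH"
    return "MEDIUM"
```

Explicit networks are used instead of `ipaddress`'s `is_private` attribute because `is_private` also covers documentation ranges such as `203.0.113.0/24`, which would classify the external-IP test event in this rule as internal.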

MITRE ATT&CK coverage

Tactic	Techniques
Credential Access	`T1552` Unsecured Credentials
Collection	`T1213` Data from Information Repositories

Rule body yaml

AnalysisType: rule
Filename: databricks_trufflehog_scan_detected.py
RuleID: "Databricks.Audit.TrufflehogScanDetected"
DisplayName: "Databricks TruffleHog Scan Detected"
Enabled: true
Status: Experimental
LogTypes:
  - Databricks.Audit
Tags:
  - Databricks
  - Collection
  - Credential Access
Reports:
  MITRE ATT&CK:
    - TA0006:T1552 # Unsecured Credentials
    - TA0009:T1213 # Data from Information Repositories
Severity: Medium
Description: >
  Detects TruffleHog secret scanning activity in Databricks. TruffleHog is a tool used to scan
  repositories and systems for exposed credentials and secrets. While it can be used legitimately
  for security audits, unauthorized scanning may indicate credential harvesting attempts. External
  IP sources are elevated to HIGH severity.
Runbook: |
  1. Query audit logs for all secret access attempts (getSecret action) by this user in the 24 hours before and after the TruffleHog scan
  2. Check if the source IP (sourceIPAddress) matches known security scanning tools or is from an unexpected geographic location
  3. Find all other unusual secret access patterns from this IP or user in the past 7 days
Reference: https://github.com/databricks-solutions/cybersec-workspace-detection-app/blob/main/base/detections/event-based/trufflehog_scan_detected.py
Tests:
  - Name: TruffleHog Scan from External IP
    ExpectedResult: true
    Log:
      timestamp: 1234567890000
      serviceName: "accounts"
      actionName: "login"
      sourceIPAddress: "203.0.113.50"
      userAgent: "TruffleHog/3.0"
      userIdentity:
        email: "scanner@external.com"
      requestParams:
        tokenId: "token-123"
  - Name: TruffleHog Scan from Internal IP
    ExpectedResult: true
    Log:
      timestamp: 1234567890000
      serviceName: "workspace"
      actionName: "listSecrets"
      sourceIPAddress: "10.0.1.100"
      userAgent: "TruffleHog/3.0 (Security Audit)"
      userIdentity:
        email: "security@example.com"
  - Name: Normal User Activity
    ExpectedResult: false
    Log:
      timestamp: 1234567890000
      serviceName: "accounts"
      actionName: "login"
      sourceIPAddress: "198.51.100.1"
      userAgent: "Mozilla/5.0"
      userIdentity:
        email: "user@example.com"
  - Name: Service Agent Activity
    ExpectedResult: false
    Log:
      timestamp: 1234567890000
      serviceName: "workspace"
      actionName: "executeCommand"
      sourceIPAddress: "10.0.1.50"
      userAgent: "Databricks-Runtime/12.0"
      userIdentity:
        email: "system@databricks.com"

Detection logic

Condition

not (
  userAgent contains "Databricks-Service/driver"
  or userAgent contains "Databricks-Runtime"
  or userAgent contains "Delta-Sharing-SparkStructuredStreaming"
  or userAgent contains "RawDBHttpClient"
  or userAgent contains "mlflow-python"
  or userAgent contains "obsSDK-scala"
  or userAgent contains "wsfs"
  or userAgent contains "feature-store"
  or requestParams.path contains "/telemetry"
  or requestParams.path contains "/delta-commit"
  or requestParams.path contains "/health"
  or requestParams.path contains "/metrics"
  or requestParams.path contains "/status"
  or userIdentity.email regex_match "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
and userAgent contains "TruffleHog"
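
The condition above can be approximated in Python as a sketch. It is not the rule's published source: the constant names are hypothetical, `contains` is assumed to be a case-sensitive substring test, and `regex_match` is assumed to match anywhere in the string. Events are modeled as plain dictionaries.

```python
import re

# Substrings copied from the condition; the constant names are illustrative.
EXCLUDED_USER_AGENTS = (
    "Databricks-Service/driver",
    "Databricks-Runtime",
    "Delta-Sharing-SparkStructuredStreaming",
    "RawDBHttpClient",
    "mlflow-python",
    "obsSDK-scala",
    "wsfs",
    "feature-store",
)
EXCLUDED_PATHS = ("/telemetry", "/delta-commit", "/health", "/metrics", "/status")

# UUID-shaped identities (typically service principals) are suppressed.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def rule(event):
    """Return True when the event looks like TruffleHog and no exclusion applies."""
    user_agent = event.get("userAgent") or ""
    path = (event.get("requestParams") or {}).get("path") or ""
    email = (event.get("userIdentity") or {}).get("email") or ""

    # Exclusions: any single match suppresses the detection.
    if any(marker in user_agent for marker in EXCLUDED_USER_AGENTS):
        return False
    if any(marker in path for marker in EXCLUDED_PATHS):
        return False
    if UUID_PATTERN.search(email):
        return False

    # Indicator: the user agent names TruffleHog.
    return "TruffleHog" in user_agent
```

Run against the four test events in the rule body, this sketch returns true for both TruffleHog user agents and false for the `Mozilla/5.0` and `Databricks-Runtime/12.0` events.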

Exclusions

Top-level NOT(...) conjuncts: predicates this rule actively suppresses.

Field	Kind	Excluded values
`requestParams.path`	contains	`/delta-commit`
`requestParams.path`	contains	`/health`
`requestParams.path`	contains	`/metrics`
`requestParams.path`	contains	`/status`
`requestParams.path`	contains	`/telemetry`
`userAgent`	contains	`Databricks-Runtime`
`userAgent`	contains	`Databricks-Service/driver`
`userAgent`	contains	`Delta-Sharing-SparkStructuredStreaming`
`userAgent`	contains	`RawDBHttpClient`
`userAgent`	contains	`feature-store`
`userAgent`	contains	`mlflow-python`
`userAgent`	contains	`obsSDK-scala`
`userAgent`	contains	`wsfs`
`userIdentity.email`	regex_match	`[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}`

Indicators

Each row is a field, operator, and value that the rule matches.

Field	Kind	Values
`userAgent`	contains	`TruffleHog`

Output fields

Fields the rule emits when it matches.

Field	Source
`sourceIPAddress`
`email`	`userIdentity.email`
