Unusual High Confidence Content Filter Blocks Detected

Status: production
Severity: medium
Time window: 1h
Group by: gen_ai.compliance.violation_code, user.id
Author: Elastic
Source: github.com/elastic/detection-rules

Detects repeated high-confidence 'BLOCKED' actions tied to 'Content Filter' policy violations with codes such as 'MISCONDUCT', 'HATE', 'SEXUAL', 'INSULTS', 'PROMPT_ATTACK', and 'VIOLENCE', indicating persistent misuse or attempts to probe the model's ethical boundaries.

MITRE ATLAS coverage

Adversarial-ML threat framework (not MITRE ATT&CK).

Tactic	Techniques
Execution	`AML.T0051` LLM Prompt Injection
Privilege Escalation	`AML.T0054` LLM Jailbreak
Defense Evasion	`AML.T0054` LLM Jailbreak

Rule body elastic

[metadata]
creation_date = "2024/05/05"
integration = ["aws_bedrock"]
maturity = "production"
updated_date = "2025/11/10"

[rule]
author = ["Elastic"]
description = """
Detects repeated high-confidence 'BLOCKED' actions tied to 'Content Filter' policy violations with codes such as
'MISCONDUCT', 'HATE', 'SEXUAL', 'INSULTS', 'PROMPT_ATTACK', and 'VIOLENCE', indicating persistent misuse or attempts
to probe the model's ethical boundaries.
"""
false_positives = ["New model deployments.", "Testing updates to compliance policies."]
from = "now-60m"
interval = "10m"
language = "esql"
license = "Elastic License v2"
name = "Unusual High Confidence Content Filter Blocks Detected"
note = """## Triage and analysis

### Investigating Unusual High Confidence Content Filter Blocks Detected

Amazon Bedrock Guardrail is a set of features within Amazon Bedrock designed to help businesses apply robust safety and privacy controls to their generative AI applications.

It enables users to set guidelines and filters that manage content quality, relevancy, and adherence to responsible AI practices.

Through Guardrail, organizations can enable content filters for Hate, Insults, Sexual, Violence, and Misconduct, along with Prompt Attack filters,
to prevent the model from generating content on specific, undesired subjects, and they can establish thresholds for harmful content categories.

#### Possible investigation steps

- Identify the user account whose prompts caused high confidence content filter blocks and whether it should perform this kind of action.
- Investigate other alerts associated with the user account during the past 48 hours.
- Consider the time of day. If the user is a human (not a program or script), did the activity take place during a normal time of day?
- Examine the account's prompts and responses in the last 24 hours.
- If you suspect the account has been compromised, scope potentially compromised assets by tracking Amazon Bedrock model access, prompts generated, and responses to the prompts by the account in the last 24 hours.

### False positive analysis

- Verify that the user account that queried denied topics is not testing new model deployments or updated compliance policies in Amazon Bedrock guardrails.

### Response and remediation

- Initiate the incident response process based on the outcome of the triage.
- Disable or limit the account during the investigation and response.
- Identify the possible impact of the incident and prioritize accordingly; the following actions can help you gain context:
    - Identify the account role in the cloud environment.
    - Identify if the attacker is moving laterally and compromising other Amazon Bedrock Services.
    - Identify any regulatory or legal ramifications related to this activity.
- Review the permissions assigned to the implicated user group or role behind these requests to ensure they are authorized and expected to access Amazon Bedrock, and that the principle of least privilege is being followed.
- Determine the initial vector abused by the attacker and take action to prevent reinfection via the same vector.
- Using the incident response data, update logging and audit policies to improve the mean time to detect (MTTD) and the mean time to respond (MTTR).
"""
references = [
    "https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-components.html",
    "https://atlas.mitre.org/techniques/AML.T0051",
    "https://atlas.mitre.org/techniques/AML.T0054",
    "https://www.elastic.co/security-labs/elastic-advances-llm-security",
]
risk_score = 47
rule_id = "4f855297-c8e0-4097-9d97-d653f7e471c4"
setup = """## Setup

This rule requires that guardrails are configured in AWS Bedrock. For more information, see the AWS Bedrock documentation:

https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-create.html
"""
severity = "medium"
tags = [
    "Domain: LLM",
    "Data Source: AWS Bedrock",
    "Data Source: AWS S3",
    "Use Case: Policy Violation",
    "Mitre Atlas: T0051",
    "Mitre Atlas: T0054",
    "Resources: Investigation Guide",
]
timestamp_override = "event.ingested"
type = "esql"

query = '''
from logs-aws_bedrock.invocation-*

// Expand multi-value fields
| mv_expand gen_ai.compliance.violation_code
| mv_expand gen_ai.policy.confidence
| mv_expand gen_ai.policy.name
| mv_expand gen_ai.policy.action

// Filter for high-confidence content policy blocks with targeted violations
| where
  gen_ai.policy.action == "BLOCKED"
  and gen_ai.policy.name == "content_policy"
  and gen_ai.policy.confidence like "HIGH"
  and gen_ai.compliance.violation_code in ("HATE", "MISCONDUCT", "SEXUAL", "INSULTS", "PROMPT_ATTACK", "VIOLENCE")

// keep ECS + compliance fields
| keep
  user.id,
  gen_ai.compliance.violation_code

// count blocked violations per user per violation type
| stats
    Esql.ml_policy_blocked_violation_count = count()
  by
    user.id,
    gen_ai.compliance.violation_code

// Aggregate all violation types per user
| stats
    Esql.ml_policy_blocked_violation_total_count = sum(Esql.ml_policy_blocked_violation_count)
  by
    user.id

// Filter for users with more than 5 total violations
| where Esql.ml_policy_blocked_violation_total_count > 5

// sort by violation volume
| sort Esql.ml_policy_blocked_violation_total_count desc
'''
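
The query above can be read as expand, filter, count, roll up, and threshold. The following is a minimal Python sketch of that flow over in-memory events, meant only to make the logic easy to trace. The helper names (`expand`, `matches`, `flagged_users`, `sample_event`) and the sample events are hypothetical and not part of the rule. The sketch also treats `like "HIGH"` as plain equality, which assumes a pattern without wildcards matches only the exact string.

```python
from collections import Counter
from itertools import product

# Values copied from the rule's where clause.
TARGET_CODES = {"HATE", "MISCONDUCT", "SEXUAL", "INSULTS", "PROMPT_ATTACK", "VIOLENCE"}
THRESHOLD = 5  # the rule keeps users whose total is strictly greater than 5

# Fields the query passes through mv_expand, in the same order.
MV_FIELDS = (
    "gen_ai.compliance.violation_code",
    "gen_ai.policy.confidence",
    "gen_ai.policy.name",
    "gen_ai.policy.action",
)


def as_list(value):
    """Treat a single value (or a missing one) as a one-element list."""
    return value if isinstance(value, list) else [value]


def expand(event):
    """Yield one row per combination of values across the multi-value fields."""
    values = [as_list(event.get(field)) for field in MV_FIELDS]
    for combo in product(*values):
        row = dict(event)
        row.update(zip(MV_FIELDS, combo))
        yield row


def matches(row):
    """Mirror the where clause (like "HIGH" is assumed to act as equality)."""
    return (
        row["gen_ai.policy.action"] == "BLOCKED"
        and row["gen_ai.policy.name"] == "content_policy"
        and row["gen_ai.policy.confidence"] == "HIGH"
        and row["gen_ai.compliance.violation_code"] in TARGET_CODES
    )


def flagged_users(events):
    # First stats: count per (user, violation code).
    per_user_code = Counter()
    for event in events:
        for row in expand(event):
            if matches(row):
                key = (row["user.id"], row["gen_ai.compliance.violation_code"])
                per_user_code[key] += 1
    # Second stats: sum the per-code counts for each user.
    totals = Counter()
    for (user, _code), count in per_user_code.items():
        totals[user] += count
    # Final where and sort: keep totals above the threshold, largest first.
    flagged = [(user, total) for user, total in totals.items() if total > THRESHOLD]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def sample_event(user, codes, confidence="HIGH"):
    """Build a hypothetical event; real Bedrock invocation logs carry many more fields."""
    return {
        "user.id": user,
        "gen_ai.compliance.violation_code": codes,
        "gen_ai.policy.confidence": confidence,
        "gen_ai.policy.name": "content_policy",
        "gen_ai.policy.action": "BLOCKED",
    }


events = (
    [sample_event("alice", ["HATE", "INSULTS"]) for _ in range(3)]
    + [sample_event("bob", "VIOLENCE") for _ in range(5)]
    + [sample_event("carol", "HATE", confidence="LOW") for _ in range(10)]
)

print(flagged_users(events))  # [('alice', 6)]
```

In this sample, alice is flagged after only three requests because each request carries two violation codes and so contributes two rows. bob has exactly five matching rows and is not flagged, since the comparison is strict. carol's blocks are low confidence and never reach the count.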

Stages and Predicates

Stage 1: `from`

from logs-aws_bedrock.invocation-*

Stage 2: `mv_expand`

| mv_expand gen_ai.compliance.violation_code

Stage 3: `mv_expand`

| mv_expand gen_ai.policy.confidence

Stage 4: `mv_expand`

| mv_expand gen_ai.policy.name

Stage 5: `mv_expand`

| mv_expand gen_ai.policy.action
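
Stages 2 through 5 expand four fields one after another, so an event with several values in more than one field becomes the cross product of those values. The sketch below illustrates this with a hypothetical event; the `mv_expand` helper is a simplified stand-in for the ES|QL command, not its implementation. Whether the integration stores these fields as position-aligned arrays is not stated in the source, so the example only shows what successive expansion does to the row count.

```python
MV_FIELDS = (
    "gen_ai.compliance.violation_code",
    "gen_ai.policy.confidence",
    "gen_ai.policy.name",
    "gen_ai.policy.action",
)


def mv_expand(rows, field):
    """Return one output row per value of `field`, copying the other columns."""
    expanded = []
    for row in rows:
        value = row.get(field)
        values = value if isinstance(value, list) else [value]
        for item in values:
            expanded.append({**row, field: item})
    return expanded


# Hypothetical event: two violation codes and two confidence levels.
event = {
    "user.id": "alice",
    "gen_ai.compliance.violation_code": ["HATE", "INSULTS"],
    "gen_ai.policy.confidence": ["HIGH", "MEDIUM"],
    "gen_ai.policy.name": "content_policy",
    "gen_ai.policy.action": "BLOCKED",
}

rows = [event]
for field in MV_FIELDS:
    rows = mv_expand(rows, field)

pairs = [
    (row["gen_ai.compliance.violation_code"], row["gen_ai.policy.confidence"])
    for row in rows
]
print(pairs)
# [('HATE', 'HIGH'), ('HATE', 'MEDIUM'), ('INSULTS', 'HIGH'), ('INSULTS', 'MEDIUM')]
```

Every code is paired with every confidence level, so both `HATE` and `INSULTS` appear with `HIGH` even if only one of them was flagged at high confidence in the original event. This is worth keeping in mind when reading the per-user totals during triage.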

Stage 6: `where`

| where
  gen_ai.policy.action == "BLOCKED"
  and gen_ai.policy.name == "content_policy"
  and gen_ai.policy.confidence like "HIGH"
  and gen_ai.compliance.violation_code in ("HATE", "MISCONDUCT", "SEXUAL", "INSULTS", "PROMPT_ATTACK", "VIOLENCE")
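
The confidence predicate uses `like` with the pattern `"HIGH"`, which contains no wildcard characters. The sketch below models `like` in Python under the assumption that `*` matches any run of characters, `?` matches a single character, and matching is case-sensitive and anchored to the whole value. The `esql_like` function is a hypothetical helper that ignores escape sequences, so treat it as an approximation of the operator's behavior.

```python
import re


def esql_like(value, pattern):
    """Approximate a wildcard match: '*' = any run, '?' = one character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # fullmatch anchors the pattern to the entire value.
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


print(esql_like("HIGH", "HIGH"))      # True
print(esql_like("HIGHEST", "HIGH"))   # False
print(esql_like("HIGHEST", "HIGH*"))  # True
```

Under these assumptions, the predicate in this stage behaves the same as an equality check against `HIGH`.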

Stage 7: `keep`

| keep
  user.id,
  gen_ai.compliance.violation_code

Stage 8: `stats`

| stats
    Esql.ml_policy_blocked_violation_count = count()
  by
    user.id,
    gen_ai.compliance.violation_code

Stage 9: `stats`

| stats
    Esql.ml_policy_blocked_violation_total_count = sum(Esql.ml_policy_blocked_violation_count)
  by
    user.id

Stage 10: `where`

| where Esql.ml_policy_blocked_violation_total_count > 5

Stage 11: `sort`

| sort Esql.ml_policy_blocked_violation_total_count desc

Indicators

Each row is a field, operator, and value that the rule matches. The corpus column counts how many other rules in the catalog look for the same combination: high numbers point to widely-used, community-vetted indicators. Blank or 1 shows that the indicator is specific to this rule.

Field	Kind	Values
`Esql.ml_policy_blocked_violation_total_count`	gt	`5`
`gen_ai.compliance.violation_code`	in	`HATE` `INSULTS` `MISCONDUCT` `PROMPT_ATTACK` `SEXUAL` `VIOLENCE`
`gen_ai.policy.action`	eq	`BLOCKED`
`gen_ai.policy.confidence`	wildcard	`HIGH`
`gen_ai.policy.name`	eq	`content_policy`

Output fields

Fields the rule emits when it matches. Chronicle authors list these in the outcome block; they appear on the detection and $risk_score drives alerting. Sentinel / Defender XDR rules build them up through project / summarize / extend stages. Sentinel maps these into alert fields via entityMappings and customDetails; Defender XDR custom detections surface them as alert fields directly.

Field	Source
`Esql.ml_policy_blocked_violation_total_count`	`STATS Esql.ml_policy_blocked_violation_total_count = sum(Esql.ml_policy_blocked_violation_count)`
`user.id`	`STATS BY`
