M365 Copilot Jailbreak Attempts

Status: experimental
Severity: low
Author: Rod Soto
Source: github.com/splunk/security_content

Detects M365 Copilot jailbreak attempts through prompt injection techniques including rule manipulation, system bypass commands, and AI impersonation requests that attempt to circumvent built-in safety controls. The detection searches exported eDiscovery prompt logs for jailbreak keywords like "pretend you are," "act as," "rules=," "ignore," "bypass," and "override" in the Subject_Title field, assigning severity scores based on the manipulation type (score of 4 for amoral impersonation or explicit rule injection, score of 3 for entity roleplay or bypass commands). Prompts with a jailbreak score of 2 or higher are flagged, prioritizing the most severe attempts to override AI safety mechanisms through direct instruction injection or unauthorized persona adoption.

MITRE ATT&CK coverage

Tactic	Techniques
Defense Impairment	`T1685` Disable or Modify Tools

Rule body splunk

name: M365 Copilot Jailbreak Attempts
id: b05a4f25-e07d-436f-ab03-f954afa922c0
version: 6
creation_date: '2025-10-13'
modification_date: '2026-05-13'
author: Rod Soto
status: experimental
type: Anomaly
description: Detects M365 Copilot jailbreak attempts through prompt injection techniques including rule manipulation, system bypass commands, and AI impersonation requests that attempt to circumvent built-in safety controls. The detection searches exported eDiscovery prompt logs for jailbreak keywords like "pretend you are," "act as," "rules=," "ignore," "bypass," and "override" in the Subject_Title field, assigning severity scores based on the manipulation type (score of 4 for amoral impersonation or explicit rule injection, score of 3 for entity roleplay or bypass commands). Prompts with a jailbreak score of 2 or higher are flagged, prioritizing the most severe attempts to override AI safety mechanisms through direct instruction injection or unauthorized persona adoption.
data_source:
    - M365 Exported eDiscovery Prompts
search: |
    `m365_exported_ediscovery_prompt_logs`
    | search Subject_Title IN (
                                "*act as*",
                                "*bypass*",
                                "*ignore*",
                                "*override*",
                                "*pretend you are*",
                                "*rules=*"
                              )
    | eval user = Sender
    | eval jailbreak_score=case(
                                match(Subject_Title, "(?i)pretend you are.*amoral"), 4,
                                match(Subject_Title, "(?i)act as.*entities"), 3,
                                match(Subject_Title, "(?i)(ignore|bypass|override)"), 3,
                                match(Subject_Title, "(?i)rules\s*="), 4, 1=1, 1
    )
    | where jailbreak_score >= 2
    | table _time, user, Subject_Title, jailbreak_score, Workload, Size
    | sort -jailbreak_score, -_time
    | `m365_copilot_jailbreak_attempts_filter`
how_to_implement: To export M365 Copilot prompt logs, open the Microsoft Purview compliance portal (compliance.microsoft.com) and go to eDiscovery. Create a new eDiscovery case, add the target user accounts as data sources, then create a search query (optionally scoped to a date range) that targets M365 Copilot interactions across the relevant workloads. When the search completes, export the results to produce a package containing prompt logs with fields such as Subject_Title (prompt text), Sender, timestamps, and workload metadata. Download the exported files with the eDiscovery Export Tool and ingest them into Splunk for security analysis and detection of jailbreak attempts, data exfiltration requests, and policy violations.
known_false_positives: Legitimate users discussing AI ethics research, security professionals testing system robustness, developers creating training materials for AI safety, or academic discussions about AI limitations and behavioral constraints may trigger false positives.
references:
    - https://www.splunk.com/en_us/blog/artificial-intelligence/m365-copilot-log-analysis-splunk.html
drilldown_searches:
    - name: View the detection results for - "$user$"
      search: '%original_detection_search% | search  user = "$user$"'
      earliest_offset: $info_min_time$
      latest_offset: $info_max_time$
    - name: View risk events for the last 7 days for "$user$"
      search: '| from datamodel Risk.All_Risk | search normalized_risk_object IN ("$user$") starthoursago=168 | stats count min(_time) as firstTime max(_time) as lastTime values(search_name) as "Search Name" values(risk_message) as "Risk Message" values(analyticstories) as "Analytic Stories" values(annotations._all) as "Annotations" values(annotations.mitre_attack.mitre_tactic) as "ATT&CK Tactics" by normalized_risk_object | `security_content_ctime(firstTime)` | `security_content_ctime(lastTime)`'
      earliest_offset: 7d
      latest_offset: "0"
intermediate_findings:
    entities:
        - field: user
          type: user
          score: 20
          message: User $user$ attempted M365 Copilot Jailbreak with score $jailbreak_score$ using prompt injection techniques to bypass AI safety controls and manipulate system behavior, potentially violating acceptable use policies.
analytic_story:
    - Suspicious Microsoft 365 Copilot Activities
asset_type: Web Application
mitre_attack_id:
    - T1685
product:
    - Splunk Enterprise
    - Splunk Enterprise Security
    - Splunk Cloud
category: application
security_domain: endpoint
tests:
    - name: True Positive Test
      attack_data:
        - data: https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/m365_copilot/copilot_prompt_logs.csv
          sourcetype: csv
          source: csv
      test_type: experimental
      description: This test is a legacy experimental test and may not be accurate.
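The how_to_implement steps end by ingesting the exported package into Splunk. Before that, it can help to inspect the export offline. The Python sketch below assumes that the export (or the test dataset `copilot_prompt_logs.csv`) is a CSV whose header row contains the column names the search reads. That layout is an assumption, not something the rule documents. The helper name `load_prompt_rows` is hypothetical.

```python
import csv
import io
from typing import Iterator, TextIO

# Columns the detection's SPL reads (Subject_Title, Sender, Workload, Size).
# ASSUMPTION: the exported CSV uses exactly these header names.
FIELDS = ("Subject_Title", "Sender", "Workload", "Size")


def load_prompt_rows(stream: TextIO) -> Iterator[dict]:
    """Yield one dict per exported prompt, keeping only the fields the rule uses.

    Columns absent from the CSV header come back as empty strings.
    """
    for row in csv.DictReader(stream):
        yield {name: row.get(name, "") for name in FIELDS}
```

Open the real file with `newline=""`, as the `csv` module documentation recommends, and pass the file object to `load_prompt_rows`. Extra columns in the export are ignored.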

Stages and Predicates

Stage 1: `search`

`m365_exported_ediscovery_prompt_logs`

Stage 2: `search`

| search Subject_Title IN (
                            "*act as*",
                            "*bypass*",
                            "*ignore*",
                            "*override*",
                            "*pretend you are*",
                            "*rules=*"
                          )
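The Python sketch below shows what the stage-2 prefilter keeps. It approximates `Subject_Title IN (...)` with wildcard terms as case-insensitive glob matching in which only `*` is special. This is an approximation of Splunk's behavior, not a reimplementation. The names `KEYWORD_PATTERNS` and `passes_keyword_prefilter` are hypothetical.

```python
import re

# Wildcard terms copied from the rule's `Subject_Title IN (...)` clause.
KEYWORD_PATTERNS = ("*act as*", "*bypass*", "*ignore*", "*override*",
                    "*pretend you are*", "*rules=*")


def wildcard_to_regex(pattern: str):
    # Treat only `*` as a wildcard; every other character is matched literally.
    # IGNORECASE approximates Splunk's case-insensitive value matching;
    # DOTALL lets `*` span newlines in multi-line prompts (an assumption).
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body, re.IGNORECASE | re.DOTALL)


_COMPILED = tuple(wildcard_to_regex(p) for p in KEYWORD_PATTERNS)


def passes_keyword_prefilter(subject_title: str) -> bool:
    """Return True if the prompt matches any stage-2 wildcard term."""
    return any(rx.fullmatch(subject_title) for rx in _COMPILED)
```

Under this approximation, a prompt such as `set rules = none` is dropped here unless another keyword also matches. The reason is that `*rules=*` requires `=` to follow `rules` directly, even though the stage-4 regex `rules\s*=` would tolerate the space.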

Stage 3: `eval`

| eval user = Sender

Stage 4: `eval`

| eval jailbreak_score=case(
                            match(Subject_Title, "(?i)pretend you are.*amoral"), 4,
                            match(Subject_Title, "(?i)act as.*entities"), 3,
                            match(Subject_Title, "(?i)(ignore|bypass|override)"), 3,
                            match(Subject_Title, "(?i)rules\s*="), 4, 1=1, 1
)

jailbreak_score =
    if   match(Subject_Title, "(?i)pretend you are.*amoral") then 4
    elif match(Subject_Title, "(?i)act as.*entities") then 3
    elif match(Subject_Title, "(?i)(ignore|bypass|override)") then 3
    elif match(Subject_Title, "(?i)rules\s*=") then 4
    else 1
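The `case()` expression returns the value of the first branch whose condition is true, so branch order matters. The Python sketch below mirrors the four `match()` branches and the `1=1` fallback in the same order. It uses `re.search` as a stand-in for SPL's unanchored `match()`. The names `SCORING_BRANCHES` and `jailbreak_score` are illustrative.

```python
import re

# (regex, score) pairs in the same order as the rule's case() branches.
SCORING_BRANCHES = (
    (re.compile(r"(?i)pretend you are.*amoral"), 4),
    (re.compile(r"(?i)act as.*entities"), 3),
    (re.compile(r"(?i)(ignore|bypass|override)"), 3),
    (re.compile(r"(?i)rules\s*="), 4),
)
DEFAULT_SCORE = 1  # the trailing `1=1, 1` catch-all branch


def jailbreak_score(subject_title: str) -> int:
    """Return the score of the first matching branch, else the default."""
    for pattern, score in SCORING_BRANCHES:
        # Unanchored search, like SPL match(): the regex may hit anywhere.
        if pattern.search(subject_title):
            return score
    return DEFAULT_SCORE
```

Two consequences follow:

- No branch yields 2, so the `>= 2` threshold in stage 5 effectively keeps scores 3 and 4.
- A prompt containing both `rules=` and `ignore` scores 3, not the 4 the description assigns to explicit rule injection, because the `ignore|bypass|override` branch is evaluated first.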

Stage 5: `where`

| where jailbreak_score >= 2

Stage 6: `table`

| table _time, user, Subject_Title, jailbreak_score, Workload, Size

Stage 7: `sort`

| sort -jailbreak_score, -_time
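Stages 5 to 7 filter, project, and order the scored events. The sketch below reproduces that behavior on plain dictionaries. The function `triage` and the sample event shape are hypothetical, and `_time` is assumed to be a numeric epoch value.

```python
from typing import Dict, List

# Columns kept by the rule's `table` stage, in output order.
COLUMNS = ("_time", "user", "Subject_Title", "jailbreak_score", "Workload", "Size")


def triage(events: List[Dict], threshold: int = 2) -> List[Dict]:
    """Mimic `where jailbreak_score >= 2 | table ... | sort -jailbreak_score, -_time`."""
    # where: drop events below the score threshold.
    kept = [e for e in events if e["jailbreak_score"] >= threshold]
    # table: keep only the listed columns, in order; missing ones become None.
    rows = [{c: e.get(c) for c in COLUMNS} for e in kept]
    # sort: highest score first, then most recent first (both descending).
    rows.sort(key=lambda r: (r["jailbreak_score"], r["_time"]), reverse=True)
    return rows
```

The sketch does not model one SPL detail: `sort` without a leading count returns at most 10,000 results by default, and `sort 0 -jailbreak_score, -_time` removes that cap.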

Stage 8: `search`

| `m365_copilot_jailbreak_attempts_filter`

Indicators

Each row is a field, operator, and value that the rule matches.

Field	Kind	Values
`Subject_Title`	in	`"act as"` `"bypass"` `"ignore"` `"override"` `"pretend you are"` `"rules="`
`jailbreak_score`	ge	`2`
`sourcetype`	eq	`csv`
