MacOS Data Chunking

Status: production
Severity: low
Group by: CurrentDirectory, command_line, computer_name, original_file_name, parent_process_id, process_guid, process_hash, process_id, process_name, user, user_id, vendor_product
Author: Raven Tait, Splunk
Source: github.com/splunk/security_content

This analytic detects data chunking on macOS through `split` or `dd`, which may indicate an attempt to evade detection by breaking a large file into smaller parts. Attackers can use this technique to bypass size-based security controls and covertly exfiltrate sensitive data. Flagging unusual or unauthorized use of these commands gives security teams a chance to intervene before critical information leaves the network.

MITRE ATT&CK coverage

Tactic	Techniques
Exfiltration	`T1030` Data Transfer Size Limits

Rule body splunk

name: MacOS Data Chunking
id: 7f1c8bed-9bd4-40b0-a1df-c262cbade0fc
version: 3
creation_date: '2026-04-14'
modification_date: '2026-05-13'
author: Raven Tait, Splunk
status: production
type: Anomaly
description: |-
    The following analytic detects suspicious data chunking activities that involve the use of split or dd, potentially indicating an attempt to evade detection by breaking large files into smaller parts.
    Attackers may use this technique to bypass size-based security controls, facilitating the covert exfiltration of sensitive data.
    By monitoring for unusual or unauthorized use of these commands, this analytic helps identify potential data exfiltration attempts, allowing security teams to intervene and prevent the unauthorized transfer of critical information from the network.
data_source:
    - Osquery Results
search: |-
    | tstats `security_content_summariesonly`
      count min(_time) as firstTime
            max(_time) as lastTime
    
    from datamodel=Endpoint.Processes where
    
    (
        Processes.process = "dd *"
        Processes.process = "* if=*"
    )
    OR
    (
        Processes.process = "*split *"
        Processes.process = "* -b *"
    )
    
    by Processes.dest Processes.original_file_name Processes.parent_process_id
       Processes.process Processes.process_exec Processes.process_guid
       Processes.process_hash Processes.process_id
       Processes.process_current_directory Processes.process_name
       Processes.process_path Processes.user
       Processes.user_id Processes.vendor_product
    
    | `drop_dm_object_name(Processes)`
    | `security_content_ctime(firstTime)`
    | `security_content_ctime(lastTime)`
    | `macos_data_chunking_filter`
how_to_implement: |-
    This detection uses osquery and endpoint security on macOS. Follow the link in references, which describes how to set up process auditing on macOS with endpoint security and osquery.
    The [TA-OSquery](https://splunkbase.splunk.com/app/8574) add-on must also be deployed across your indexers and universal forwarders so that the osquery data populates the data models.
known_false_positives: |-
    Administrators or network operators may legitimately use these commands for automation purposes. Update the filter macro to remove false positives.
references:
    - https://osquery.readthedocs.io/en/stable/deployment/process-auditing/
    - https://ss64.com/mac/dd.html
    - https://ss64.com/mac/split.html
drilldown_searches:
    - name: View the detection results for - "$user$" and "$dest$"
      search: '%original_detection_search% | search  user = "$user$" dest = "$dest$"'
      earliest_offset: $info_min_time$
      latest_offset: $info_max_time$
    - name: View risk events for the last 7 days for - "$user$" and "$dest$"
      search: '| from datamodel Risk.All_Risk | search normalized_risk_object IN ("$user$", "$dest$") | stats count min(_time) as firstTime max(_time) as lastTime values(search_name) as "Search Name" values(risk_message) as "Risk Message" values(analyticstories) as "Analytic Stories" values(annotations._all) as "Annotations" values(annotations.mitre_attack.mitre_tactic) as "ATT&CK Tactics" by normalized_risk_object | `security_content_ctime(firstTime)` | `security_content_ctime(lastTime)`'
      earliest_offset: 7d
      latest_offset: "0"
intermediate_findings:
    entities:
        - field: user
          type: user
          score: 20
          message: A file was split on $dest$ by $user$ via $process$
        - field: dest
          type: system
          score: 20
          message: A file was split on $dest$ by $user$ via $process$
threat_objects:
    - field: process
      type: process
analytic_story:
    - MacOS Post-Exploitation
asset_type: Endpoint
mitre_attack_id:
    - T1030
product:
    - Splunk Enterprise
    - Splunk Enterprise Security
    - Splunk Cloud
category: endpoint
security_domain: endpoint
tests:
    - name: True Positive Test
      attack_data:
        - data: https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1030/osquery_data_chunking/osquery.log
          source: osquery
          sourcetype: osquery:results
      test_type: unit

Stages and Predicates

Stage 1: `tstats`

| tstats `security_content_summariesonly`
  count min(_time) as firstTime
        max(_time) as lastTime

from datamodel=Endpoint.Processes where

(
    Processes.process = "dd *"
    Processes.process = "* if=*"
)
OR
(
    Processes.process = "*split *"
    Processes.process = "* -b *"
)

by Processes.dest Processes.original_file_name Processes.parent_process_id
   Processes.process Processes.process_exec Processes.process_guid
   Processes.process_hash Processes.process_id
   Processes.process_current_directory Processes.process_name
   Processes.process_path Processes.user
   Processes.user_id Processes.vendor_product
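
The `where` clause above can be read as two AND-groups joined by OR. The following Python sketch mirrors that logic so the patterns can be tried against sample command lines. It is an illustration, not the Splunk implementation: the helper names `wildcard_match` and `matches_rule` are hypothetical, `*` is treated as the only wildcard, each pattern is anchored to the whole field value, and matching is case-sensitive here (an assumption).

```python
import re


def wildcard_match(pattern: str, value: str) -> bool:
    """Return True if value matches pattern, where '*' is the only wildcard."""
    # Escape every literal piece and join the pieces with '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def matches_rule(process: str) -> bool:
    """Mirror the Stage 1 predicate: (dd AND if=) OR (split AND -b)."""
    dd_branch = (
        wildcard_match("dd *", process)
        and wildcard_match("* if=*", process)
    )
    split_branch = (
        wildcard_match("*split *", process)
        and wildcard_match("* -b *", process)
    )
    return dd_branch or split_branch


if __name__ == "__main__":
    samples = [
        "dd if=/tmp/archive.zip of=/tmp/part1 bs=1m count=10",
        "/usr/bin/split -b 10m /tmp/archive.zip /tmp/part_",
        "/bin/dd if=/tmp/archive.zip of=/tmp/part1",  # does not start with "dd "
        "split -l 100 notes.txt",                      # no " -b " argument
    ]
    for cmd in samples:
        print(matches_rule(cmd), cmd)
```

Under these assumptions, the `dd` branch only fires when the command line begins with `dd `, whereas the `split` branch has a leading wildcard and also matches a full path such as `/usr/bin/split`.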

Stage 2: `search`

| `drop_dm_object_name(Processes)`
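
The macro definition is not shown on this page. Assuming it behaves like a `rename Processes.* as *`, its effect on one result row can be sketched in Python as follows. The function name and the sample row are hypothetical.

```python
def drop_dm_object_name(row: dict, obj: str = "Processes") -> dict:
    """Strip the '<obj>.' prefix from field names; leave other fields alone."""
    prefix = obj + "."
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in row.items()
    }


if __name__ == "__main__":
    # Hypothetical row as it might look straight out of tstats.
    row = {"Processes.dest": "mac01", "Processes.user": "alice", "count": 3}
    print(drop_dm_object_name(row))
```

After this stage, later macros and the drilldown searches can refer to plain names such as `user` and `dest`.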

Stage 3: `search`

| `security_content_ctime(firstTime)`

Stage 4: `search`

| `security_content_ctime(lastTime)`
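
Stages 3 and 4 turn the epoch values in `firstTime` and `lastTime` into readable timestamps. The macro definition is not shown on this page, so the Python sketch below rests on two assumptions: that the output format is `%Y-%m-%dT%H:%M:%S`, and that UTC is used (Splunk normally renders times in the user's configured time zone). The function name `ctime` is hypothetical.

```python
from datetime import datetime, timezone


def ctime(epoch_seconds: float) -> str:
    """Format an epoch timestamp as YYYY-MM-DDTHH:MM:SS (UTC assumed)."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


if __name__ == "__main__":
    print(ctime(0))
    print(ctime(86400))
```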

Stage 5: `search`

| `macos_data_chunking_filter`
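
The filter macro is where the known false positives get tuned out. Its contents are site-specific and not shown here. As a rough illustration of the kind of allowlist it might encode, the Python sketch below suppresses events from a known automation account running a known command. Every name in it (`ALLOWLIST`, `passes_filter`, `backup_svc`, the command prefix) is hypothetical.

```python
# Hypothetical allowlist: (user, command-line prefix) pairs considered benign.
ALLOWLIST = [
    {"user": "backup_svc", "process_prefix": "dd if=/dev/zero"},
]


def passes_filter(event: dict, allowlist=ALLOWLIST) -> bool:
    """Return False for events matching an allowlist entry, True otherwise."""
    for entry in allowlist:
        same_user = event.get("user") == entry["user"]
        same_cmd = event.get("process", "").startswith(entry["process_prefix"])
        if same_user and same_cmd:
            return False
    return True


if __name__ == "__main__":
    events = [
        {"user": "backup_svc", "process": "dd if=/dev/zero of=/tmp/pad bs=1m"},
        {"user": "alice", "process": "split -b 10m secrets.tar part_"},
    ]
    for event in events:
        print(passes_filter(event), event["user"], event["process"])
```

Matching on both the user and the command keeps the exclusion narrow, so the same account running an unexpected `split` or `dd` invocation would still surface.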

Indicators

Each row is a field, operator, and value that the rule matches. The corpus column counts how many other rules in the catalog look for the same combination: high numbers point to widely-used, community-vetted indicators. Blank or 1 shows that the indicator is specific to this rule.

Field	Kind	Values
`Processes.process`	eq	`"dd *"` `"* if=*"` `"*split *"` `"* -b *"`
