Kubernetes Process with Resource Ratio Anomalies

Status: experimental
Severity: low
Group by: "host.name", "k8s.cluster.name", "k8s.node.name", "process.executable.name"
Author: Matthew Moore, Splunk
Source: github.com/splunk/security_content

The following analytic detects anomalous changes in resource utilization ratios for processes running on a Kubernetes node. It leverages process metrics collected via an OTEL collector and hostmetrics receiver, analyzed through Splunk Observability Cloud. The detection uses a lookup table containing average and standard deviation values for various resource ratios (e.g., CPU:memory, CPU:disk operations). Significant deviations from these baselines may indicate compromised processes, malicious activity, or misconfigurations. If confirmed malicious, this could signify a security breach, allowing attackers to manipulate workloads, potentially leading to data exfiltration or service disruption.

MITRE ATT&CK coverage

Tactic | Techniques
Execution | T1204 User Execution

Rule body splunk

name: Kubernetes Process with Resource Ratio Anomalies
id: 0d42b295-0f1f-4183-b75e-377975f47c65
version: 9
creation_date: '2024-01-10'
modification_date: '2026-05-13'
author: Matthew Moore, Splunk
status: experimental
type: Anomaly
description: The following analytic detects anomalous changes in resource utilization ratios for processes running on a Kubernetes node. It leverages process metrics collected via an OTEL collector and hostmetrics receiver, analyzed through Splunk Observability Cloud. The detection uses a lookup table containing average and standard deviation values for various resource ratios (e.g., CPU:memory, CPU:disk operations). Significant deviations from these baselines may indicate compromised processes, malicious activity, or misconfigurations. If confirmed malicious, this could signify a security breach, allowing attackers to manipulate workloads, potentially leading to data exfiltration or service disruption.
data_source: []
search: "| mstats avg(process.*) as process.* where `kubernetes_metrics` by host.name k8s.cluster.name k8s.node.name process.executable.name span=10s | eval cpu:mem = 'process.cpu.utilization'/'process.memory.utilization' | eval cpu:disk = 'process.cpu.utilization'/'process.disk.operations' | eval mem:disk = 'process.memory.utilization'/'process.disk.operations' | eval cpu:threads = 'process.cpu.utilization'/'process.threads' | eval disk:threads = 'process.disk.operations'/'process.threads' | eval key = 'k8s.cluster.name' + \":\" + 'host.name' + \":\" + 'process.executable.name' | lookup k8s_process_resource_ratio_baseline key | fillnull | eval anomalies = \"\" | foreach stdev_* [ eval anomalies =if( '<<MATCHSTR>>' > ('avg_<<MATCHSTR>>' + 4 * 'stdev_<<MATCHSTR>>'), anomalies + \"<<MATCHSTR>> ratio higher than average by \" + tostring(round(('<<MATCHSTR>>' - 'avg_<<MATCHSTR>>')/'stdev_<<MATCHSTR>>' ,2)) + \" Standard Deviations. <<MATCHSTR>>=\" + tostring('<<MATCHSTR>>') + \" avg_<<MATCHSTR>>=\" + tostring('avg_<<MATCHSTR>>') + \" 'stdev_<<MATCHSTR>>'=\" + tostring('stdev_<<MATCHSTR>>') + \", \" , anomalies) ] | eval anomalies = replace(anomalies, \",\\s$\", \"\") | where anomalies!=\"\" | stats count values(anomalies) as anomalies by host.name k8s.cluster.name k8s.node.name process.executable.name | where count > 5 | rename host.name as host | `kubernetes_process_with_resource_ratio_anomalies_filter`"
how_to_implement: "To implement this detection, follow these steps:\n* Deploy the OpenTelemetry Collector (OTEL) to your Kubernetes cluster.\n* Enable the hostmetrics/process receiver in the OTEL configuration.\n* Ensure that the process metrics, specifically process.cpu.utilization and process.memory.utilization, are enabled.\n* Install the Splunk Infrastructure Monitoring (SIM) add-on. (ref: https://splunkbase.splunk.com/app/5247)\n * Configure the SIM add-on with your Observability Cloud Organization ID and Access Token.\n* Set up the SIM modular input to ingest Process Metrics. Name this input \"sim_process_metrics_to_metrics_index\".\n* In the SIM configuration, set the Organization ID to your Observability Cloud Organization ID.\n* Set the Signal Flow Program to the following: data('process.threads').publish(label='A'); data('process.cpu.utilization').publish(label='B'); data('process.cpu.time').publish(label='C'); data('process.disk.io').publish(label='D'); data('process.memory.usage').publish(label='E'); data('process.memory.virtual').publish(label='F'); data('process.memory.utilization').publish(label='G'); data('process.cpu.utilization').publish(label='H'); data('process.disk.operations').publish(label='I'); data('process.handles').publish(label='J'); data('process.threads').publish(label='K')\n* Set the Metric Resolution to 10000.\n * Leave all other settings at their default values.\n* Run the Search Baseline Of Kubernetes Process Resource Ratio"
known_false_positives: No false positives have been identified at this time.
references:
    - https://github.com/signalfx/splunk-otel-collector-chart
intermediate_findings:
    entities:
        - field: host
          type: system
          score: 20
          message: Kubernetes Process with Resource Ratio Anomalies on host $host$
analytic_story:
    - Abnormal Kubernetes Behavior using Splunk Infrastructure Monitoring
asset_type: Kubernetes
mitre_attack_id:
    - T1204
product:
    - Splunk Enterprise
    - Splunk Enterprise Security
    - Splunk Cloud
category: cloud
security_domain: network
baselines:
    - Baseline Of Kubernetes Process Resource Ratio

Stages and Predicates

Stage 1: search

| mstats avg(process.*) as process.* where `kubernetes_metrics` by host.name k8s.cluster.name k8s.node.name process.executable.name span=10s

Stage 2: eval

| eval cpu:mem = 'process.cpu.utilization'/'process.memory.utilization'

Stage 3: eval

| eval cpu:disk = 'process.cpu.utilization'/'process.disk.operations'

Stage 4: eval

| eval mem:disk = 'process.memory.utilization'/'process.disk.operations'

Stage 5: eval

| eval cpu:threads = 'process.cpu.utilization'/'process.threads'

Stage 6: eval

| eval disk:threads = 'process.disk.operations'/'process.threads'
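
Stages 2 through 6 derive five ratios from the averaged process metrics. The Python sketch below mirrors that arithmetic for one 10-second sample. It is illustrative only: the function name `resource_ratios` and the sample values are hypothetical, and returning `None` for a zero or missing denominator is an assumption standing in for the null value SPL is expected to produce in that case.

```python
# (ratio name, numerator metric, denominator metric), as in Stages 2-6.
RATIOS = [
    ("cpu:mem", "process.cpu.utilization", "process.memory.utilization"),
    ("cpu:disk", "process.cpu.utilization", "process.disk.operations"),
    ("mem:disk", "process.memory.utilization", "process.disk.operations"),
    ("cpu:threads", "process.cpu.utilization", "process.threads"),
    ("disk:threads", "process.disk.operations", "process.threads"),
]


def resource_ratios(sample: dict) -> dict:
    """Compute the five resource ratios for one averaged metric sample."""
    out = {}
    for name, numerator, denominator in RATIOS:
        n, d = sample.get(numerator), sample.get(denominator)
        # Assumption: a missing metric or zero denominator yields no value.
        out[name] = None if n is None or not d else n / d
    return out


# Hypothetical sample: an idle-disk process with four threads.
sample = {
    "process.cpu.utilization": 0.5,
    "process.memory.utilization": 0.25,
    "process.disk.operations": 0.0,
    "process.threads": 4.0,
}
ratios = resource_ratios(sample)
```

In this sample the two ratios with disk operations as the denominator have no value, which is the case the later `fillnull` stage then replaces with 0.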

Stage 7: eval

| eval key = 'k8s.cluster.name' + ":" + 'host.name' + ":" + 'process.executable.name'

Stage 8: lookup

| lookup k8s_process_resource_ratio_baseline key
Lookup table: k8s_process_resource_ratio_baseline
Key field: key
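
Stages 7 and 8 build a composite key and use it to attach baseline statistics to each row. A minimal Python sketch of that join follows. The `baseline` dictionary stands in for the lookup table, its values are made up, and the `avg_` and `stdev_` field names are inferred from the `foreach stdev_*` template in the rule rather than from the lookup definition itself.

```python
def baseline_key(event: dict) -> str:
    """Build the cluster:host:executable key used by the lookup (Stage 7)."""
    return ":".join(
        [
            event["k8s.cluster.name"],
            event["host.name"],
            event["process.executable.name"],
        ]
    )


def enrich(event: dict, baseline: dict) -> dict:
    """Attach baseline fields to the event when the key matches (Stage 8)."""
    # A key with no baseline entry adds nothing, as with an unmatched lookup.
    return {**event, **baseline.get(baseline_key(event), {})}


# Hypothetical lookup contents, keyed the same way as the rule's key field.
baseline = {
    "prod:node-1:nginx": {"avg_cpu:mem": 1.5, "stdev_cpu:mem": 0.2},
}
event = {
    "k8s.cluster.name": "prod",
    "host.name": "node-1",
    "process.executable.name": "nginx",
    "cpu:mem": 2.0,
}
enriched = enrich(event, baseline)
```

Note that the key omits `k8s.node.name`, so two nodes that report the same `host.name` within one cluster would share a baseline.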

Stage 9: fillnull

| fillnull

Stage 10: eval

| eval anomalies = ""

Stage 11: search

| foreach stdev_* [ eval anomalies =if( '<<MATCHSTR>>' > ('avg_<<MATCHSTR>>' + 4 * 'stdev_<<MATCHSTR>>'), anomalies + "<<MATCHSTR>> ratio higher than average by " + tostring(round(('<<MATCHSTR>>' - 'avg_<<MATCHSTR>>')/'stdev_<<MATCHSTR>>' ,2)) + " Standard Deviations. <<MATCHSTR>>=" + tostring('<<MATCHSTR>>') + " avg_<<MATCHSTR>>=" + tostring('avg_<<MATCHSTR>>') + " 'stdev_<<MATCHSTR>>'=" + tostring('stdev_<<MATCHSTR>>') + ", " , anomalies) ]
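
The `foreach` stage flags any ratio that exceeds its baseline average by more than four standard deviations and appends a description to `anomalies`. The Python sketch below shows the same test on one enriched row. The function name and the sample values are hypothetical, the message text is abridged, and skipping ratios whose baseline standard deviation is zero is a simplification added here to avoid a division by zero; the rule itself has no such guard.

```python
def describe_anomalies(row: dict, threshold: float = 4.0) -> str:
    """Return a comma-separated description of ratios above avg + threshold * stdev."""
    anomalies = ""
    for field in row:
        if not field.startswith("stdev_"):
            continue
        name = field[len("stdev_"):]  # plays the role of <<MATCHSTR>>
        value = row.get(name, 0)
        avg = row.get("avg_" + name, 0)
        stdev = row[field]
        if stdev <= 0:
            # Simplification (not in the rule): no usable baseline spread.
            continue
        if value > avg + threshold * stdev:
            deviations = round((value - avg) / stdev, 2)
            anomalies += (
                f"{name} ratio higher than average by {deviations} "
                f"Standard Deviations. {name}={value} avg_{name}={avg} "
                f"stdev_{name}={stdev}, "
            )
    return anomalies


# Hypothetical row: cpu:mem is far above baseline, cpu:threads is not.
row = {
    "cpu:mem": 10.0, "avg_cpu:mem": 2.0, "stdev_cpu:mem": 1.0,
    "cpu:threads": 0.5, "avg_cpu:threads": 0.4, "stdev_cpu:threads": 0.1,
}
message = describe_anomalies(row)
```

Only upward deviations are tested, so a ratio that falls far below its baseline is not reported by this stage.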

Stage 12: eval

| eval anomalies = replace(anomalies, ",\s$", "")
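
Each description appended by the previous stage ends in a comma and a space, so the pattern `,\s$` trims the final separator. A small Python sketch of the same substitution, using a hypothetical helper name and shortened example strings:

```python
import re


def strip_trailing_separator(anomalies: str) -> str:
    """Remove a comma followed by one whitespace character at the end of the string."""
    return re.sub(r",\s$", "", anomalies)


cleaned = strip_trailing_separator("cpu:mem ratio high, cpu:disk ratio high, ")
```

Separators between entries are left alone because the pattern is anchored to the end of the string.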

Stage 13: where

| where anomalies!=""

Stage 14: stats

| stats count values(anomalies) as anomalies by host.name k8s.cluster.name k8s.node.name process.executable.name

Stage 15: where

| where count > 5
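
Stages 14 and 15 count anomalous rows per process and keep only groups with more than five. Because Stage 1 uses `span=10s`, that means at least six anomalous 10-second intervals within the search window, not necessarily consecutive. The Python sketch below illustrates the grouping; the function name and sample rows are hypothetical.

```python
from collections import defaultdict

GROUP_BY = (
    "host.name",
    "k8s.cluster.name",
    "k8s.node.name",
    "process.executable.name",
)


def persistent_anomalies(rows: list, min_count: int = 5) -> dict:
    """Group anomalous rows and keep groups with more than min_count rows."""
    groups = defaultdict(lambda: {"count": 0, "anomalies": set()})
    for row in rows:
        key = tuple(row[field] for field in GROUP_BY)
        groups[key]["count"] += 1
        groups[key]["anomalies"].add(row["anomalies"])
    return {
        key: {"count": g["count"], "anomalies": sorted(g["anomalies"])}
        for key, g in groups.items()
        if g["count"] > min_count
    }


def make_row(executable: str) -> dict:
    """Build one hypothetical anomalous row for the given executable."""
    return {
        "host.name": "node-1",
        "k8s.cluster.name": "prod",
        "k8s.node.name": "node-1",
        "process.executable.name": executable,
        "anomalies": "cpu:mem ratio high",
    }


# Six anomalous intervals for one process, five for another.
rows = [make_row("miner")] * 6 + [make_row("nginx")] * 5
flagged = persistent_anomalies(rows)
```

Here only the process with six anomalous intervals is retained; the one with exactly five is dropped because the comparison is strictly greater than.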

Stage 16: rename

| rename host.name as host

Stage 17: search

| `kubernetes_process_with_resource_ratio_anomalies_filter`

Indicators

Each row is a field, operator, and value that the rule matches. The corpus column counts how many other rules in the catalog look for the same combination: high numbers point to widely-used, community-vetted indicators. Blank or 1 shows that the indicator is specific to this rule.

Field | Kind | Values
anomalies | ne | ""
count | gt | 5
span | eq | 10s

Search terms

Bare-string tokens in the SPL search body. Splunk matches each token against _raw (the untyped raw event text) anywhere it appears, not against a specific field. These don't surface in the Indicators table because they aren't predicates on a known field.

Stage | Term
1 | mstats
1 | avg
1 | process.*
1 | as
1 | process.*
1 | where
1 | by
1 | process.executable.name
1 | host.name
1 | k8s.cluster.name
1 | k8s.node.name
11 | foreach
11 | stdev_*