Time series anomaly for data size transferred to public internet

Severity: medium
Time window: 14d
Group by: AnomalyHour, TotalBytesSentinMBperHour, anomalies, baselinebytessentperHour, score
Author: Microsoft Security Research
Source: github.com/Azure/Azure-Sentinel

'Identifies anomalous data transfer to public networks. The query leverages built-in KQL anomaly detection algorithms that detect large deviations from a baseline pattern. A sudden increase in data transferred to unknown public networks may indicate a data exfiltration attempt and should be investigated. The higher the score, the further the value is from the baseline. The output is aggregated to provide a summary view of the bytes sent from each unique source IP address to destination IP addresses and ports during the flagged anomaly hour. Source IP addresses that sent bytessentperhourthreshold MB or less in an hour are excluded; adjust this threshold as needed. You may have to run queries for individual source IP addresses from SourceIPList to determine whether anything looks suspicious.'
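As a rough illustration of how a baseline and a score interact, the Python sketch below flags a spike in an hourly series. It is a simplified stand-in, not KQL's series_decompose_anomalies: it assumes a least-squares straight line as the baseline (in the spirit of the rule's 'linefit' trend argument), ignores seasonality, and scores residuals with a Tukey-style fence built from the 25th and 75th percentiles. The function names and the sample series are hypothetical.

```python
from statistics import quantiles


def linear_baseline(values):
    """Least-squares straight line through (index, value) pairs."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    return [mean_y + slope * (x - mean_x) for x in range(n)]


def decompose_anomalies(values, threshold=5.0):
    """Return (anomalies, scores, baseline) for an hourly byte series.

    anomalies is +1 for a spike above the baseline, -1 for a dip, 0 otherwise.
    """
    baseline = linear_baseline(values)
    residuals = [v - b for v, b in zip(values, baseline)]
    q1, _, q3 = quantiles(residuals, n=4)
    iqr = (q3 - q1) or 1e-9  # avoid division by zero on a perfectly flat series
    scores, anomalies = [], []
    for r in residuals:
        if r > q3:
            score = (r - q3) / iqr
        elif r < q1:
            score = (r - q1) / iqr
        else:
            score = 0.0
        scores.append(score)
        anomalies.append(1 if score > threshold else -1 if score < -threshold else 0)
    return anomalies, scores, baseline


# 24 quiet hours at 10 MB with one 500 MB burst at hour 20.
series = [10.0] * 24
series[20] = 500.0
flags, scores, _ = decompose_anomalies(series, threshold=5.0)
print(flags.index(1), round(scores[20], 1))
```

With threshold=5.0, matching the rule's scorethreshold, only the burst at hour 20 is flagged; its score grows with the distance from the fitted line, which mirrors the statement that a higher score means a larger deviation from the baseline.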

MITRE ATT&CK coverage

Tactic: Exfiltration
Technique: T1030 Data Transfer Size Limits

Rule body kusto

id: f2dd4a3a-ebac-4994-9499-1a859938c947
name: Time series anomaly for data size transferred to public internet
description: |
  'Identifies anomalous data transfer to public networks. The query leverages built-in KQL anomaly detection algorithms that detect large deviations from a baseline pattern.
  A sudden increase in data transferred to unknown public networks may indicate a data exfiltration attempt and should be investigated.
  The higher the score, the further the value is from the baseline.
  The output is aggregated to provide a summary view of the bytes sent from each unique source IP address to destination IP addresses and ports during the flagged anomaly hour.
  Source IP addresses that sent bytessentperhourthreshold MB or less in an hour are excluded; adjust this threshold as needed.
  You may have to run queries for individual source IP addresses from SourceIPList to determine whether anything looks suspicious.'
severity: Medium
requiredDataConnectors:
  - connectorId: CiscoASA
    dataTypes:
      - CommonSecurityLog
  - connectorId: PaloAltoNetworks
    dataTypes:
      - CommonSecurityLog
  - connectorId: AzureMonitor(VMInsights)
    dataTypes:
      - VMConnection
queryFrequency: 1d
queryPeriod: 14d
triggerOperator: gt
triggerThreshold: 1
tactics:
  - Exfiltration
relevantTechniques:
  - T1030
tags:
  - DEV-0537
query: |
  let starttime = 14d;
  let endtime = 1d;
  let timeframe = 1h;
  let scorethreshold = 5;
  let bytessentperhourthreshold = 10;
  let TimeSeriesData = (union isfuzzy=true
  (
  VMConnection
  | where TimeGenerated between (startofday(ago(starttime))..startofday(ago(endtime)))
  | where isnotempty(DestinationIp) and isnotempty(SourceIp)
  | extend SourceIP = SourceIp, DestinationIP = DestinationIp
  | where ipv4_is_private(DestinationIP) == false
  | extend DeviceVendor = "VMConnection"
  | project TimeGenerated, BytesSent, DeviceVendor
  | make-series TotalBytesSent=sum(BytesSent) on TimeGenerated from startofday(ago(starttime)) to startofday(ago(endtime)) step timeframe by DeviceVendor
  ),
  (
  CommonSecurityLog
  | where TimeGenerated between (startofday(ago(starttime))..startofday(ago(endtime)))
  | where isnotempty(DestinationIP) and isnotempty(SourceIP)
  | where ipv4_is_private(DestinationIP) == false
  | project TimeGenerated, SentBytes, DeviceVendor
  | make-series TotalBytesSent=sum(SentBytes) on TimeGenerated from startofday(ago(starttime)) to startofday(ago(endtime)) step timeframe by DeviceVendor
  )
  );
  //Detect anomalies in TimeSeriesData
  let TimeSeriesAlerts = materialize(TimeSeriesData
  | extend (anomalies, score, baseline) = series_decompose_anomalies(TotalBytesSent, scorethreshold, -1, 'linefit')
  | mv-expand TotalBytesSent to typeof(double), TimeGenerated to typeof(datetime), anomalies to typeof(double),score to typeof(double), baseline to typeof(double)
  | where anomalies > 0 | extend AnomalyHour = TimeGenerated
  | extend TotalBytesSentinMBperHour = round(((TotalBytesSent / 1024)/1024),2), baselinebytessentperHour = round(((baseline / 1024)/1024),2), score = round(score,2)
  | project DeviceVendor, AnomalyHour, TimeGenerated, TotalBytesSentinMBperHour, baselinebytessentperHour, anomalies, score);
  let AnomalyHours = materialize(TimeSeriesAlerts  | where TimeGenerated > ago(2d) | project TimeGenerated);
  //Union of all BaseLogs aggregated per hour
  let BaseLogs = (union isfuzzy=true
  (
  CommonSecurityLog
  | where isnotempty(DestinationIP) and isnotempty(SourceIP)
  | where TimeGenerated > ago(2d)
  | extend DateHour = bin(TimeGenerated, 1h) // create a new column and round to hour
  | where DateHour in ((AnomalyHours)) //filter the dataset to only selected anomaly hours
  | where ipv4_is_private(DestinationIP) == false
  | extend SentBytesinMB = ((SentBytes / 1024)/1024), ReceivedBytesinMB = ((ReceivedBytes / 1024)/1024)
  | summarize HourlyCount = count(), TimeGeneratedMax=arg_max(TimeGenerated, *), DestinationIPList=make_set(DestinationIP, 100), DestinationPortList = make_set(DestinationPort,100), TotalSentBytesinMB = sum(SentBytesinMB), TotalReceivedBytesinMB = sum(ReceivedBytesinMB) by SourceIP, DeviceVendor, TimeGeneratedHour=bin(TimeGenerated,1h)
  | where TotalSentBytesinMB > bytessentperhourthreshold
  | sort by TimeGeneratedHour asc, TotalSentBytesinMB desc
  | extend Rank=row_number(1, prev(TimeGeneratedHour) != TimeGeneratedHour) // Ranking the dataset per Hourly Partition
  | where Rank < 10  // Rank starts at 1, so this keeps the top 9 rows with the highest BytesSent in each hour
  | project DeviceVendor, TimeGeneratedHour, TimeGeneratedMax, SourceIP, DestinationIPList, DestinationPortList, TotalSentBytesinMB, TotalReceivedBytesinMB, Rank
  ),
  (
  VMConnection
  | where isnotempty(DestinationIp) and isnotempty(SourceIp)
  | where TimeGenerated > ago(2d)
  | extend DateHour = bin(TimeGenerated, 1h) // create a new column and round to hour
  | where DateHour in ((AnomalyHours)) //filter the dataset to only selected anomaly hours
  | extend SourceIP = SourceIp, DestinationIP = DestinationIp
  | where ipv4_is_private(DestinationIP) == false | extend DeviceVendor = "VMConnection"
  | extend SentBytesinMB = ((BytesSent / 1024)/1024), ReceivedBytesinMB = ((BytesReceived / 1024)/1024)
  | summarize HourlyCount = count(),TimeGeneratedMax=arg_max(TimeGenerated, *), DestinationIPList=make_set(DestinationIP, 100), DestinationPortList = make_set(DestinationPort, 100), TotalSentBytesinMB = sum(SentBytesinMB),TotalReceivedBytesinMB = sum(ReceivedBytesinMB) by SourceIP, DeviceVendor, TimeGeneratedHour=bin(TimeGenerated,1h)
  | where TotalSentBytesinMB > bytessentperhourthreshold
  | sort by TimeGeneratedHour asc, TotalSentBytesinMB desc
  | extend Rank=row_number(1, prev(TimeGeneratedHour) != TimeGeneratedHour) // Ranking the dataset per Hourly Partition
  | where Rank < 10  // Rank starts at 1, so this keeps the top 9 rows with the highest BytesSent in each hour
  | project DeviceVendor, TimeGeneratedHour, TimeGeneratedMax, SourceIP, DestinationIPList, DestinationPortList, TotalSentBytesinMB, TotalReceivedBytesinMB, Rank
  )
  );
  // Join against base logs to retrieve records associated with the hour of anomaly
  TimeSeriesAlerts
  | where TimeGenerated > ago(2d)
  | join (
      BaseLogs | extend AnomalyHour = TimeGeneratedHour
  ) on DeviceVendor, AnomalyHour | sort by score desc
  | project DeviceVendor, AnomalyHour,TimeGeneratedMax, SourceIP, DestinationIPList, DestinationPortList, TotalSentBytesinMB, TotalReceivedBytesinMB, TotalBytesSentinMBperHour, baselinebytessentperHour, score, anomalies
  | summarize EventCount = count(), StartTimeUtc= min(TimeGeneratedMax), EndTimeUtc= max(TimeGeneratedMax), SourceIPMax= arg_max(SourceIP,*), TotalBytesSentinMB = sum(TotalSentBytesinMB), TotalBytesReceivedinMB = sum(TotalReceivedBytesinMB), SourceIPList = make_set(SourceIP, 100), DestinationIPList = make_set(DestinationIPList, 100) by AnomalyHour,TotalBytesSentinMBperHour, baselinebytessentperHour, score, anomalies
  | project DeviceVendor, AnomalyHour, StartTimeUtc, EndTimeUtc, SourceIPMax, SourceIPList, DestinationIPList, DestinationPortList, TotalBytesSentinMB, TotalBytesReceivedinMB, TotalBytesSentinMBperHour, baselinebytessentperHour, score, anomalies, EventCount
entityMappings:
  - entityType: IP
    fieldMappings:
      - identifier: Address
        columnName: SourceIPMax
version: 1.0.6
kind: Scheduled
metadata:
    source:
        kind: Community
    author:
        name: Microsoft Security Research
    support:
        tier: Community
    categories:
        domains: [ "Security - Threat Protection" ]
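With triggerOperator gt and triggerThreshold 1, a scheduled run raises an alert only when the query returns more than one row. A run that returns a single row (for example, one anomalous hour for one vendor) does not. The sketch below models just that comparison; the operator mapping and should_alert() are hypothetical and not part of any Sentinel API.

```python
import operator

# Trigger settings copied from the rule YAML above.
TRIGGER_OPERATOR = "gt"
TRIGGER_THRESHOLD = 1

# Assumed mapping from the YAML operator names to Python comparisons.
OPERATORS = {"gt": operator.gt, "lt": operator.lt, "eq": operator.eq, "ne": operator.ne}


def should_alert(result_rows, op=TRIGGER_OPERATOR, threshold=TRIGGER_THRESHOLD):
    """Hypothetical check: does this run's row count satisfy the trigger condition?"""
    return OPERATORS[op](result_rows, threshold)


for count in (0, 1, 2):
    print(count, should_alert(count))
```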

Stages and Predicates

Parameters

let starttime = 14d;
let endtime = 1d;
let timeframe = 1h;
let scorethreshold = 5;
let bytessentperhourthreshold = 10;
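These parameters bound the training series: make-series runs from the start of the day 14 days ago (starttime) to the start of yesterday (endtime) in 1-hour steps (timeframe), while scorethreshold and bytessentperhourthreshold (in MB) are used later in the pipeline. A hypothetical Python helper that lists the hourly bin start times, assuming the make-series upper bound is exclusive:

```python
from datetime import datetime, timedelta, timezone

starttime = timedelta(days=14)
endtime = timedelta(days=1)
timeframe = timedelta(hours=1)


def start_of_day(ts):
    """Midnight of the day containing ts, like KQL startofday()."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def series_bins(now):
    """Hourly bin start times for the make-series range (end assumed exclusive)."""
    start = start_of_day(now - starttime)
    end = start_of_day(now - endtime)
    bins = []
    t = start
    while t < end:
        bins.append(t)
        t += timeframe
    return bins


now = datetime(2024, 5, 15, 9, 30, tzinfo=timezone.utc)
bins = series_bins(now)
print(len(bins), bins[0], bins[-1])
```

For a run at 09:30 UTC on 15 May, this yields 13 days of hourly bins (312 points), from 1 May 00:00 to 13 May 23:00.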

Let binding: AnomalyHours

let AnomalyHours = materialize(TimeSeriesAlerts  | where TimeGenerated > ago(2d) | project TimeGenerated);

Derived from TimeSeriesAlerts.
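AnomalyHours keeps only anomaly timestamps newer than ago(2d). Because the series itself stops at the start of yesterday, the hours that can survive lie between now minus two days and that midnight. Assuming an exclusive make-series upper bound, a run at 09:30 UTC only considers the hours from 10:00 to 23:00 two days earlier. The helper below is a hypothetical sketch of that filter:

```python
from datetime import datetime, timedelta, timezone


def anomaly_hours(anomaly_times, now, lookback=timedelta(days=2)):
    """Keep anomaly hour timestamps newer than now - lookback, like ago(2d)."""
    cutoff = now - lookback
    return [t for t in anomaly_times if t > cutoff]


now = datetime(2024, 5, 15, 9, 30, tzinfo=timezone.utc)
candidates = [
    datetime(2024, 5, 12, 14, tzinfo=timezone.utc),  # more than two days old: dropped
    datetime(2024, 5, 13, 8, tzinfo=timezone.utc),   # before the 09:30 cutoff: dropped
    datetime(2024, 5, 13, 22, tzinfo=timezone.utc),  # inside the window: kept
]
print(anomaly_hours(candidates, now))
```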

Let binding: BaseLogs

let BaseLogs = (union isfuzzy=true
(
CommonSecurityLog
| where isnotempty(DestinationIP) and isnotempty(SourceIP)
| where TimeGenerated > ago(2d)
| extend DateHour = bin(TimeGenerated, 1h)
| where DateHour in ((AnomalyHours))
| where ipv4_is_private(DestinationIP) == false
| extend SentBytesinMB = ((SentBytes / 1024)/1024), ReceivedBytesinMB = ((ReceivedBytes / 1024)/1024)
| summarize HourlyCount = count(), TimeGeneratedMax=arg_max(TimeGenerated, *), DestinationIPList=make_set(DestinationIP, 100), DestinationPortList = make_set(DestinationPort,100), TotalSentBytesinMB = sum(SentBytesinMB), TotalReceivedBytesinMB = sum(ReceivedBytesinMB) by SourceIP, DeviceVendor, TimeGeneratedHour=bin(TimeGenerated,1h)
| where TotalSentBytesinMB > bytessentperhourthreshold
| sort by TimeGeneratedHour asc, TotalSentBytesinMB desc
| extend Rank=row_number(1, prev(TimeGeneratedHour) != TimeGeneratedHour)
| where Rank < 10
| project DeviceVendor, TimeGeneratedHour, TimeGeneratedMax, SourceIP, DestinationIPList, DestinationPortList, TotalSentBytesinMB, TotalReceivedBytesinMB, Rank
),
(
VMConnection
| where isnotempty(DestinationIp) and isnotempty(SourceIp)
| where TimeGenerated > ago(2d)
| extend DateHour = bin(TimeGenerated, 1h)
| where DateHour in ((AnomalyHours))
| extend SourceIP = SourceIp, DestinationIP = DestinationIp
| where ipv4_is_private(DestinationIP) == false | extend DeviceVendor = "VMConnection"
| extend SentBytesinMB = ((BytesSent / 1024)/1024), ReceivedBytesinMB = ((BytesReceived / 1024)/1024)
| summarize HourlyCount = count(),TimeGeneratedMax=arg_max(TimeGenerated, *), DestinationIPList=make_set(DestinationIP, 100), DestinationPortList = make_set(DestinationPort, 100), TotalSentBytesinMB = sum(SentBytesinMB),TotalReceivedBytesinMB = sum(ReceivedBytesinMB) by SourceIP, DeviceVendor, TimeGeneratedHour=bin(TimeGenerated,1h)
| where TotalSentBytesinMB > bytessentperhourthreshold
| sort by TimeGeneratedHour asc, TotalSentBytesinMB desc
| extend Rank=row_number(1, prev(TimeGeneratedHour) != TimeGeneratedHour)
| where Rank < 10
| project DeviceVendor, TimeGeneratedHour, TimeGeneratedMax, SourceIP, DestinationIPList, DestinationPortList, TotalSentBytesinMB, TotalReceivedBytesinMB, Rank
)
);

Derived from bytessentperhourthreshold, AnomalyHours.
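Each BaseLogs branch buckets connections to the hour, keeps only anomaly hours, sums megabytes sent per source IP, vendor, and hour, drops sources at or below bytessentperhourthreshold, and ranks the remainder within each hour. The Python sketch below follows those steps under stated assumptions: records are hypothetical dicts using the CommonSecurityLog field names, destinations are already restricted to public addresses, and bytes are divided as floats. Because row_number(1, ...) starts counting at 1, the Rank < 10 filter keeps nine sources per hour, which the sketch reproduces.

```python
from collections import defaultdict
from datetime import datetime, timezone

BYTES_PER_MB = 1024 * 1024
bytessentperhourthreshold = 10  # MB per source IP per hour, as in the rule
RANK_LIMIT = 10                 # mirrors `where Rank < 10` (exclusive)


def hour_bin(ts):
    """Round a timestamp down to the hour, like bin(TimeGenerated, 1h)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def base_logs(records, anomaly_hours):
    """Hypothetical re-creation of one BaseLogs branch over in-memory records."""
    # Sum megabytes sent per (SourceIP, DeviceVendor, hour), anomaly hours only.
    totals = defaultdict(float)
    for r in records:
        hour = hour_bin(r["TimeGenerated"])
        if hour in anomaly_hours:
            totals[(r["SourceIP"], r["DeviceVendor"], hour)] += r["SentBytes"] / BYTES_PER_MB
    # Drop quiet sources, then sort by hour ascending and megabytes descending.
    rows = [(key, mb) for key, mb in totals.items() if mb > bytessentperhourthreshold]
    rows.sort(key=lambda item: (item[0][2], -item[1]))
    # Restart the rank at 1 whenever the hour changes, like row_number(1, restart).
    ranked, prev_hour, rank = [], None, 0
    for (source_ip, vendor, hour), mb in rows:
        rank = 1 if hour != prev_hour else rank + 1
        prev_hour = hour
        if rank < RANK_LIMIT:
            ranked.append({"SourceIP": source_ip, "DeviceVendor": vendor,
                           "TimeGeneratedHour": hour, "TotalSentBytesinMB": mb, "Rank": rank})
    return ranked


anomaly_hour = datetime(2024, 5, 13, 22, tzinfo=timezone.utc)
records = [
    {"TimeGenerated": anomaly_hour.replace(minute=5), "SourceIP": f"10.0.0.{i}",
     "DeviceVendor": "Palo Alto Networks", "SentBytes": (11 + i) * BYTES_PER_MB}
    for i in range(12)
]
# A quiet source that stays under the 10 MB threshold.
records.append({"TimeGenerated": anomaly_hour.replace(minute=30), "SourceIP": "10.0.1.1",
                "DeviceVendor": "Palo Alto Networks", "SentBytes": 5 * BYTES_PER_MB})
result = base_logs(records, {anomaly_hour})
print(len(result), result[0]["SourceIP"], result[-1]["Rank"])
```

Twelve sources exceed the threshold in this example, yet only nine come back, ranked 1 to 9; a filter of Rank <= 10 would return ten.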

The stages below define the let binding TimeSeriesAlerts (the source of the rule's main pipeline).

Stage 1: source

TimeSeriesData

Stage 2: extend

extend anomalies, baseline, score

Stage 3: mv-expand

mv-expand TotalBytesSent, TimeGenerated, anomalies, score, baseline

Stage 4: where

where anomalies > 0
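make-series leaves one row per DeviceVendor whose columns are hour-aligned arrays. mv-expand over several columns expands them in parallel, pairing the nth elements, and where anomalies > 0 then keeps only upward deviations, since series_decompose_anomalies marks dips with -1. A Python analogue using zip, with a hypothetical three-hour row:

```python
def expand_and_filter(series_row):
    """Zip the per-hour arrays of one series row and keep positive anomalies."""
    rows = []
    # Arrays produced by make-series share one length, so zip loses nothing here.
    for ts, total, flag, score in zip(series_row["TimeGenerated"], series_row["TotalBytesSent"],
                                      series_row["anomalies"], series_row["score"]):
        if flag > 0:  # like `where anomalies > 0`; -1 marks a dip
            rows.append({"DeviceVendor": series_row["DeviceVendor"], "AnomalyHour": ts,
                         "TotalBytesSent": total, "score": score})
    return rows


row = {
    "DeviceVendor": "VMConnection",
    "TimeGenerated": ["01:00", "02:00", "03:00"],
    "TotalBytesSent": [1.0e6, 9.0e8, 1.0e3],
    "anomalies": [0, 1, -1],
    "score": [0.2, 12.5, -6.1],
}
expanded = expand_and_filter(row)
print(expanded)
```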

Stage 5: extend

extend AnomalyHour

Stage 6: extend

extend TotalBytesSentinMBperHour, baselinebytessentperHour, score
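Stage 6 converts bytes to megabytes by dividing by 1024 twice and rounding to two decimals. That only preserves fractions when the input is floating point: in KQL, dividing a long by the integer literal 1024 is integer division, so a column expanded as typeof(long) loses everything below 1 MB before round() runs. Assuming SentBytes and BytesSent are integer columns, as in the standard CommonSecurityLog and VMConnection schemas, the per-record SentBytesinMB values in BaseLogs are truncated the same way. Python's / and // operators show the difference (the helper names are hypothetical):

```python
BYTES_PER_MB = 1024 * 1024


def to_mb_float(num_bytes):
    """Floating-point conversion, like round(todouble(x) / 1024 / 1024, 2)."""
    return round(num_bytes / 1024 / 1024, 2)


def to_mb_integer(num_bytes):
    """Integer division of a non-negative count, analogous to long / 1024 / 1024 in KQL."""
    return round((num_bytes // 1024) // 1024, 2)


sample = 1_572_864  # exactly 1.5 MB
print(to_mb_float(sample), to_mb_integer(sample))

# Summing many small per-record values shows why truncation matters.
records = [300_000] * 100  # 100 connections of about 0.29 MB each
float_total = round(sum(r / BYTES_PER_MB for r in records), 2)
int_total = sum(r // BYTES_PER_MB for r in records)
print(float_total, int_total)
```

Under that assumption, these 100 connections add up to about 28.6 MB with float division but 0 MB with integer division, which would drop the source below the 10 MB threshold.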

Stage 7: project

project AnomalyHour, DeviceVendor, TimeGenerated, TotalBytesSentinMBperHour, anomalies, baselinebytessentperHour, score

The stages below run on TimeSeriesAlerts (the outer pipeline).

Stage 8: where

where ...

Stage 9: join

join (...)
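Stages 8 and 9 keep alert hours from the last two days and join them to BaseLogs on DeviceVendor and AnomalyHour. KQL's default join flavor is innerunique, which deduplicates the left side on the join keys; since TimeSeriesAlerts holds at most one row per vendor and hour, the result should match an ordinary inner join here. A dictionary-index sketch of that inner join, with hypothetical rows:

```python
from collections import defaultdict


def inner_join(alerts, base_logs, keys=("DeviceVendor", "AnomalyHour")):
    """Return one merged row for every (alert, base-log row) pair with equal keys."""
    index = defaultdict(list)
    for row in base_logs:
        index[tuple(row[k] for k in keys)].append(row)
    joined = []
    for alert in alerts:
        for match in index.get(tuple(alert[k] for k in keys), []):
            # Right-hand values win on name clashes; KQL instead keeps both
            # and renames the right-hand copy with a numeric suffix.
            joined.append({**alert, **match})
    return joined


alerts = [{"DeviceVendor": "Palo Alto Networks", "AnomalyHour": "22:00", "score": 9.6}]
base = [
    {"DeviceVendor": "Palo Alto Networks", "AnomalyHour": "22:00", "SourceIP": "10.0.0.11"},
    {"DeviceVendor": "Palo Alto Networks", "AnomalyHour": "22:00", "SourceIP": "10.0.0.9"},
    {"DeviceVendor": "VMConnection", "AnomalyHour": "22:00", "SourceIP": "10.1.0.4"},
]
joined = inner_join(alerts, base)
print(len(joined), [r["SourceIP"] for r in joined])
```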

Stage 10: sort

sort by score

Stage 11: project

project AnomalyHour, DestinationIPList, DestinationPortList, DeviceVendor, SourceIP, TimeGeneratedMax, TotalBytesSentinMBperHour, TotalReceivedBytesinMB, TotalSentBytesinMB, anomalies, baselinebytessentperHour, score

Stage 12: summarize

summarize DestinationIPList, EndTimeUtc, EventCount, SourceIPList, SourceIPMax, StartTimeUtc, TotalBytesReceivedinMB, TotalBytesSentinMB by AnomalyHour, TotalBytesSentinMBperHour, baselinebytessentperHour, score, anomalies
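The final summarize collapses the joined rows into one row per anomaly hour. SourceIPMax comes from arg_max(SourceIP, *), which selects the row with the greatest SourceIP value; it is not necessarily the source that sent the most bytes, even though it is the column mapped to the IP entity. The Python sketch below reproduces a subset of the aggregations, groups only on AnomalyHour and score for brevity, and assumes SourceIP is compared as a string. Names and data are hypothetical.

```python
from collections import defaultdict


def summarize(rows):
    """Group joined rows by anomaly hour and score, like the rule's final summarize."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["AnomalyHour"], r["score"])].append(r)
    out = []
    for (hour, score), items in groups.items():
        top = max(items, key=lambda r: r["SourceIP"])  # arg_max(SourceIP, *), string order assumed
        out.append({
            "AnomalyHour": hour,
            "score": score,
            "EventCount": len(items),
            "StartTimeUtc": min(r["TimeGeneratedMax"] for r in items),
            "EndTimeUtc": max(r["TimeGeneratedMax"] for r in items),
            "SourceIPMax": top["SourceIP"],
            "TotalBytesSentinMB": sum(r["TotalSentBytesinMB"] for r in items),
            "SourceIPList": sorted({r["SourceIP"] for r in items}),
        })
    return out


rows = [
    {"AnomalyHour": "22:00", "score": 9.6, "SourceIP": "10.0.0.11",
     "TimeGeneratedMax": "22:41", "TotalSentBytesinMB": 850.0},
    {"AnomalyHour": "22:00", "score": 9.6, "SourceIP": "10.0.0.9",
     "TimeGeneratedMax": "22:05", "TotalSentBytesinMB": 12.0},
]
summary = summarize(rows)
print(summary[0]["SourceIPMax"], summary[0]["EventCount"])
```

Here SourceIPMax is 10.0.0.9 even though 10.0.0.11 sent far more data, because "9" sorts after "1" in string order.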

Stage 13: project

project AnomalyHour, DestinationIPList, DestinationPortList, DeviceVendor, EndTimeUtc, EventCount, SourceIPList, SourceIPMax, StartTimeUtc, TotalBytesReceivedinMB, TotalBytesSentinMB, TotalBytesSentinMBperHour, anomalies, baselinebytessentperHour, score

Exclusions

Top-level NOT(...) conjuncts: predicates this rule actively suppresses.

Field | Kind | Excluded values
DestinationIP | cidr_match | 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, 127.0.0.0/8
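The exclusion amounts to discarding destinations inside the listed CIDR blocks. The sketch below applies exactly the ranges in the table above with Python's standard ipaddress module; the rule itself relies on KQL's ipv4_is_private(), whose built-in range list may not match this table one for one. is_excluded() is a hypothetical helper.

```python
import ipaddress

# CIDR blocks copied from the exclusion table above.
EXCLUDED = [ipaddress.ip_network(cidr) for cidr in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16", "127.0.0.0/8")]


def is_excluded(destination_ip):
    """True when the destination falls inside any excluded block."""
    addr = ipaddress.ip_address(destination_ip)
    return any(addr in net for net in EXCLUDED)


for ip in ("172.20.1.5", "172.32.0.1", "8.8.8.8", "127.0.0.1"):
    print(ip, is_excluded(ip))
```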

Indicators

Each row is a field, operator, and value that the rule matches. The corpus column counts how many other rules in the catalog look for the same combination: high numbers point to widely-used, community-vetted indicators. Blank or 1 shows that the indicator is specific to this rule.

Field | Kind | Values
DestinationIP | is_not_null | (no value, null check)
DestinationIp | is_not_null | (no value, null check)
Rank | lt | 10 (transforms: cased)
SourceIP | is_not_null | (no value, null check)
SourceIp | is_not_null | (no value, null check)
TotalSentBytesinMB | gt | 10 (transforms: cased)
anomalies | gt | 0 (transforms: cased)
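Read as data, each indicator row is a field, an operator, and an optional value. A hypothetical Python representation that reports which indicators a flat record fails; is_not_null is modeled on isnotempty(), which also treats an empty string as missing:

```python
import operator

# (field, kind, value) triples copied from the indicator table above.
INDICATORS = [
    ("DestinationIP", "is_not_null", None),
    ("SourceIP", "is_not_null", None),
    ("Rank", "lt", 10),
    ("TotalSentBytesinMB", "gt", 10),
    ("anomalies", "gt", 0),
]

# Comparison per kind; is_not_null also rejects "", like isnotempty().
CHECKS = {
    "is_not_null": lambda actual, _value: actual not in (None, ""),
    "gt": operator.gt,
    "lt": operator.lt,
}


def failed_indicators(record):
    """Return the triples the record does not satisfy (numeric fields must be present)."""
    return [(field, kind, value) for field, kind, value in INDICATORS
            if not CHECKS[kind](record.get(field), value)]


record = {"SourceIP": "10.0.0.4", "DestinationIP": "198.51.100.20",
          "TotalSentBytesinMB": 42.5, "Rank": 10, "anomalies": 1}
print(failed_indicators(record))
print(failed_indicators(dict(record, Rank=3)))
```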

Output fields

Fields the rule emits when it matches. Chronicle authors list these in the outcome block; they appear on the detection and $risk_score drives alerting. Sentinel / Defender XDR rules build them up through project / summarize / extend stages. Sentinel maps these into alert fields via entityMappings and customDetails; Defender XDR custom detections surface them as alert fields directly.

Field | Source
AnomalyHour | project
DestinationIPList | project
DestinationPortList | project
DeviceVendor | project
EndTimeUtc | project
EventCount | project
SourceIPList | project
SourceIPMax | project
StartTimeUtc | project
TotalBytesReceivedinMB | project
TotalBytesSentinMB | project
TotalBytesSentinMBperHour | project
anomalies | project
baselinebytessentperHour | project
score | project