AWS S3 Large Download

Severity: informational
Tags: Beta, Data Exfiltration
Source: github.com/panther-labs/panther-analysis

Detects when a user (IAM User, AssumedRole, or FederatedUser) downloads more than the configured threshold of data from S3 buckets within a time window. Configurable thresholds and bucket filtering allow customization for different organizational needs. This may indicate unauthorized data exfiltration or bulk data downloads for analysis.
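
The scheduled query behind this rule is not shown on this page. As a rough illustration of the aggregation the description implies, the sketch below sums downloaded bytes per user and bucket inside a time window and keeps only the groups above a threshold. Everything here is an assumption made for illustration: the function name `find_large_downloads`, the input keys (`bytes_sent`, `time`), and the 50 MiB / 5 minute defaults are hypothetical, not values taken from the real query.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical defaults; the real threshold and window live in the scheduled query.
THRESHOLD_BYTES = 50 * 1024 * 1024  # 50 MiB
WINDOW = timedelta(minutes=5)


def find_large_downloads(events, now, threshold=THRESHOLD_BYTES, window=WINDOW, buckets=None):
    """Sum download bytes per (user_arn, bucket_name) within the window.

    `events` is an iterable of dicts with assumed keys:
    user_arn, bucket_name, bytes_sent (int), time (datetime).
    `buckets` optionally restricts the check to a set of bucket names.
    """
    totals = defaultdict(lambda: {"total_bytes_downloaded": 0, "object_count": 0})
    for e in events:
        # Skip events outside the look-back window.
        if e["time"] > now or now - e["time"] > window:
            continue
        # Optional bucket filter.
        if buckets is not None and e["bucket_name"] not in buckets:
            continue
        key = (e["user_arn"], e["bucket_name"])
        totals[key]["total_bytes_downloaded"] += e["bytes_sent"]
        totals[key]["object_count"] += 1
    # Keep only groups strictly above the threshold ("more than").
    return [
        {"user_arn": user, "bucket_name": bucket, **agg}
        for (user, bucket), agg in totals.items()
        if agg["total_bytes_downloaded"] > threshold
    ]
```

Each returned dict plays the role of one result row handed to the rule, which is why the rule itself can return `True` unconditionally.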

MITRE ATT&CK coverage

Tactic	Techniques
Exfiltration	`T1537` Transfer Data to Cloud Account

Rule body yaml

AnalysisType: scheduled_rule
Filename: aws_s3_large_download_specific_bucket.py
RuleID: "AWS.S3.LargeDownload"
DisplayName: "AWS S3 Large Download"
Enabled: false
CreateAlert: false
ScheduledQueries:
  - AWS S3 Large Download
Severity: Info
Tags:
  - Beta
  - Data Exfiltration
Reports:
  MITRE ATT&CK:
    - "TA0010:T1537"  # Exfiltration: Transfer Data to Cloud Account
Description: >
  Detects when a user (IAM User, AssumedRole, or FederatedUser) downloads more than the 
  configured threshold of data from S3 buckets within a time window. Configurable thresholds 
  and bucket filtering allow customization for different organizational needs. This may 
  indicate unauthorized data exfiltration or bulk data downloads for analysis.
DedupPeriodMinutes: 60
Runbook: |
  1. **Immediate Actions:**
     - Verify if the user activity is authorized
     - Check if the downloads were for legitimate business purposes
     - Consider temporarily restricting bucket access if suspicious
  
  2. **Investigation Steps:**
     - Review the user's recent authentication events
     - Check for any privilege escalation or credential compromise
     - Examine the downloaded objects to determine sensitivity
     - Review source IP and user agent for signs of automation
     - Check for other suspicious activities by the same user
  
  3. **Containment:**
     - If unauthorized: revoke user credentials immediately
     - Apply temporary S3 bucket policies to restrict access
     - Enable S3 MFA delete and additional monitoring
  
  4. **Recovery:**
     - Document the scope of data accessed
     - Review S3 access logs for complete activity timeline
     - Consider rotating any credentials or secrets contained in data that may have been accessed
  
  5. **Prevention:**
     - Implement S3 access monitoring and alerting
     - Review IAM policies for least privilege
     - Consider S3 Access Points for controlled access
     - Enable GuardDuty for enhanced threat detection

Tests:
  - Name: Large download triggers alert
    ExpectedResult: true
    Log:
      user_arn: "arn:aws:iam::111111111111:user/data-engineer"
      user_name: "data-engineer"
      user_type: "IAMUser"
      bucket_name: "sensitive-data-bucket"
      source_ip: "3.3.3.3"
      user_agent: "aws-cli/2.0.0"
      total_bytes_downloaded: 78643200  # 75MB
      object_count: 75
      first_download_time: "2025-01-15 14:30:00"
      last_download_time: "2025-01-15 14:34:30"
      sample_objects: ["logs/2025/01/15/file1.log", "logs/2025/01/15/file2.log"]
      account_id: "111111111111"

  - Name: Critical threshold triggers high severity
    ExpectedResult: true
    Log:
      user_arn: "arn:aws:iam::111111111111:user/suspicious-user" 
      user_name: "suspicious-user"
      user_type: "IAMUser"
      bucket_name: "sensitive-data-bucket"
      source_ip: "2.2.2.2"
      user_agent: "python-requests/2.28.0"
      total_bytes_downloaded: 1073741824  # 1GB
      object_count: 200
      first_download_time: "2025-01-15 15:00:00"
      last_download_time: "2025-01-15 15:04:59"
      sample_objects: ["data/model1.bin", "data/model2.bin", "logs/training.log"]
      account_id: "111111111111"

  - Name: Medium download triggers medium severity
    ExpectedResult: true
    Log:
      user_arn: "arn:aws:iam::111111111111:user/analyst"
      user_name: "analyst"
      user_type: "IAMUser"
      bucket_name: "sensitive-data-bucket"
      source_ip: "1.1.1.1"
      user_agent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
      total_bytes_downloaded: 104857600  # 100MB
      object_count: 40
      first_download_time: "2025-01-15 16:01:00"
      last_download_time: "2025-01-15 16:03:15"
      sample_objects: ["analytics/report1.json", "analytics/report2.json"]
      account_id: "111111111111"

  - Name: AssumedRole large download triggers alert
    ExpectedResult: true
    Log:
      user_arn: "arn:aws:sts::111111111111:assumed-role/sample-role-nervous-banzai/sample-role-frosty-mendeleev"
      user_name: "DataAnalyst"
      user_type: "AssumedRole"
      bucket_name: "analytics-bucket"
      source_ip: "1.2.3.4"
      user_agent: "boto3/1.26.0"
      total_bytes_downloaded: 62914560  # 60MB
      object_count: 30
      first_download_time: "2025-01-15 17:00:00"
      last_download_time: "2025-01-15 17:05:00"
      sample_objects: ["reports/quarterly.csv", "reports/monthly.json"]
      account_id: "111111111111"

  - Name: FederatedUser large download triggers alert
    ExpectedResult: true
    Log:
      user_arn: "arn:aws:sts::123456789012:federated-user/contractor@company.com"
      user_name: "contractor@company.com"
      user_type: "FederatedUser"
      bucket_name: "project-data"
      source_ip: "203.0.113.100"
      user_agent: "aws-sdk-python/1.28.0"
      total_bytes_downloaded: 83886080  # 80MB
      object_count: 150
      first_download_time: "2025-01-15 18:00:00"
      last_download_time: "2025-01-15 18:08:00"
      sample_objects: ["datasets/training.parquet", "datasets/validation.csv"]
      account_id: "123456789012"

Detection logic

Filter

def rule(_) -> bool:
    """Always return True since the query already filtered for violations"""
    return True


def title(event) -> str:
    user_arn = event.get("user_arn", "unknown")
    bucket_name = event.get("bucket_name", "unknown")
    total_mb = round(event.get("total_bytes_downloaded", 0) / (1024 * 1024), 2)
    return f"Large S3 download detected: {user_arn} downloaded {total_mb}MB from {bucket_name}"


def alert_context(event) -> dict:
    total_bytes = event.get("total_bytes_downloaded", 0)
    total_mb = round(total_bytes / (1024 * 1024), 2)
    return {
        "user_arn": event.get("user_arn"),
        "user_name": event.get("user_name"),
        "bucket_name": event.get("bucket_name"),
        "source_ip": event.get("source_ip"),
        "user_agent": event.get("user_agent"),
        "total_bytes_downloaded": total_bytes,
        "total_mb_downloaded": total_mb,
        "object_count": event.get("object_count"),
        "first_download_time": event.get("first_download_time"),
        "last_download_time": event.get("last_download_time"),
        "sample_objects": event.get("sample_objects", [])[:10],
    }
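
Two of the test cases above are named for high and medium severity, yet the rule as shown sets a static `Severity: Info` and defines no `severity` function. If tiered severity is wanted, one possible shape is sketched below. It assumes the platform calls an optional `severity(event)` function and treats the return value `DEFAULT` as "use the static YAML severity". The byte tiers are hypothetical, picked only to line up with the test case names and their size comments.

```python
# Hypothetical tiers; tune them to your environment.
HIGH_BYTES = 1024 ** 3          # 1 GiB
MEDIUM_BYTES = 100 * 1024 ** 2  # 100 MiB


def severity(event) -> str:
    """Escalate above the YAML default as the downloaded volume grows."""
    # Treat a missing or null byte count as zero.
    total_bytes = event.get("total_bytes_downloaded", 0) or 0
    if total_bytes >= HIGH_BYTES:
        return "HIGH"
    if total_bytes >= MEDIUM_BYTES:
        return "MEDIUM"
    # Assumed to fall back to the Severity set in the YAML.
    return "DEFAULT"
```

With these tiers, the 1GB test row would map to `HIGH`, the 100MB row to `MEDIUM`, and the remaining rows would keep the static severity.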

Output fields

Fields the rule emits when it matches. For this rule they come from the scheduled query's result row and are surfaced on the alert through `alert_context`.

Field
`user_arn`
`user_name`
`bucket_name`
`source_ip`
`user_agent`
`total_bytes_downloaded`
`object_count`
`first_download_time`
`last_download_time`
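
Because the rule reads these fields with `event.get(...)`, a query row that lacks one of them produces a `None` in the alert context rather than an error. A small check such as the following sketch can make that visible when testing a query change. The helper name `missing_fields` and the constant `EXPECTED_FIELDS` are hypothetical and not part of the rule.

```python
# Field names copied from the output field list above.
EXPECTED_FIELDS = (
    "user_arn",
    "user_name",
    "bucket_name",
    "source_ip",
    "user_agent",
    "total_bytes_downloaded",
    "object_count",
    "first_download_time",
    "last_download_time",
)


def missing_fields(row: dict) -> list:
    """Return the expected output fields that are absent or null in a row."""
    return [name for name in EXPECTED_FIELDS if row.get(name) is None]
```

An empty list means the row carries every listed field.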
