GitHub Exfiltration via High Number of Repository Clones by User

Status: production
Severity: medium
Time window: 9m
Group by: user.name
Author: Elastic
Source: github.com/elastic/detection-rules

Detects a high number of repository cloning actions by a single user within a short time frame. Adversaries may clone multiple repositories to exfiltrate sensitive data.

MITRE ATT&CK coverage

Tactic	Techniques
Collection	`T1213.003` Data from Information Repositories: Code Repositories
Exfiltration	`T1020` Automated Exfiltration, `T1567.001` Exfiltration Over Web Service: Exfiltration to Code Repository

Event coverage

Provider	Event
GitHub-git	git.clone

Rules detecting the same action

Other rules on this platform that filter on the same API call or operation.

Rule body elastic

[metadata]
creation_date = "2025/12/16"
integration = ["github"]
maturity = "production"
updated_date = "2026/04/10"

[rule]
author = ["Elastic"]
description = """
Detects a high number of repository cloning actions by a single user within a short time frame. Adversaries may
clone multiple repositories to exfiltrate sensitive data.
"""
from = "now-9m"
interval = "8m"
language = "esql"
license = "Elastic License v2"
name = "GitHub Exfiltration via High Number of Repository Clones by User"
note = """ ## Triage and analysis

> **Disclaimer**:
> This investigation guide was created using generative AI technology and has been reviewed to improve its accuracy and relevance. While every effort has been made to ensure its quality, we recommend validating the content and adapting it to suit your specific environment and operational needs.

### Investigating GitHub Exfiltration via High Number of Repository Clones by User

This rule flags a single user rapidly cloning dozens of repositories, a strong indicator of bulk source code exfiltration. Mass cloning enables quick siphoning of proprietary code, embedded secrets, and build artifacts across teams before defenses can respond. A typical pattern is a stolen personal access token used in a script to enumerate org repositories and clone them in rapid succession from a CI runner or cloud VM, including private and internal repos, to stage data for off-platform transfer.

### Possible investigation steps

- Validate whether the actor is a known automation or service account with a documented need to mass-clone, and quickly confirm intent with the account owner and affected repo admins.
- Enumerate the cloned repositories and their visibility, deprioritizing activity dominated by public repos while fast-tracking private/internal codebases with sensitive content across orgs.
- Pivot on the token identifier to determine the token owner, scopes, and creation/last-use details, compare to normal usage patterns, and revoke/reset credentials if anomalous.
- Analyze the user agent and agent identifier to attribute the activity to a specific host or CI runner, correlating with pipeline logs and login locations/times for anomalies.
- Correlate with endpoint/network telemetry from the originating host for large outbound transfers, external Git remotes, or bulk archiving indicating off-platform exfiltration following the clones.

### False positive analysis

- A developer rebuilding a workstation or creating an approved local mirror may legitimately clone dozens of repositories in a short window, especially when activity is dominated by public or low-sensitivity repos.
- A shared automation/service account running scheduled builds or org-wide maintenance tasks can trigger fresh clones across many repositories due to pipeline configuration or cache resets, inflating counts without exfiltration intent.

### Response and remediation

- Immediately revoke the GitHub token used for the clones, force sign out, require password reset and 2FA re-verification for the user, and suspend the account if unauthorized.
- Block and quarantine the originating host or CI runner by revoking its runner registration, removing its SSH keys/credentials, and firewalling its IP until imaged.
- On the cloned private/internal repositories, remove the user from teams, rotate or disable deploy keys and GitHub App installations, and enforce SAML SSO.
- Rotate repository and organization secrets present in those repos (Actions secrets, PATs, SSH keys, cloud access keys) and invalidate any secrets found in commit history.
- Recover by restoring only minimal access after owner approval, issuing a new fine-grained PAT with least privilege and expiry, and re-enabling builds while monitoring for further clone bursts.
- Escalate to incident response leadership and Legal if any private or export-controlled repos were cloned or cloning continues post-revocation, and harden by enforcing org-wide SSO, disallowing classic PATs, IP allowlisting for PAT use, enabling secret scanning with push protection, and alerting on burst git clone patterns from runners and unusual user agents.
"""
references = [
    "https://www.wiz.io/blog/shai-hulud-2-0-ongoing-supply-chain-attack",
    "https://trigger.dev/blog/shai-hulud-postmortem",
    "https://posthog.com/blog/nov-24-shai-hulud-attack-post-mortem",
]
risk_score = 47
rule_id = "19f3674c-f4a1-43bb-a89c-e4c6212275e0"
severity = "medium"
tags = [
    "Domain: Cloud",
    "Use Case: Threat Detection",
    "Tactic: Exfiltration",
    "Data Source: Github",
    "Resources: Investigation Guide",
]
timestamp_override = "event.ingested"
type = "esql"
query = '''
from logs-github.audit-* metadata _id, _index, _version
| where
  data_stream.dataset == "github.audit" and event.type == "change" and event.action == "git.clone"
| stats
  Esql.event_count = COUNT(*),
  Esql.github_org_values = values(github.org),
  Esql.github_repo_values = values(github.repo),
  Esql.github_repository_public_values = values(github.repository_public),
  Esql.github_token_id_values = values(github.token_id),
  Esql.github_user_agent_values = values(github.user_agent),
  Esql.user_name_values = values(user.name),
  Esql.agent_id_values = values(agent.id),
  Esql.data_stream_dataset_values = values(data_stream.dataset),
  Esql.data_stream_namespace_values = values(data_stream.namespace)

  by user.name

| keep Esql.*

| where
  Esql.event_count >= 25
'''

[[rule.threat]]
framework = "MITRE ATT&CK"

[[rule.threat.technique]]
id = "T1020"
name = "Automated Exfiltration"
reference = "https://attack.mitre.org/techniques/T1020/"

[[rule.threat.technique]]
id = "T1567"
name = "Exfiltration Over Web Service"
reference = "https://attack.mitre.org/techniques/T1567/"

[[rule.threat.technique.subtechnique]]
id = "T1567.001"
name = "Exfiltration to Code Repository"
reference = "https://attack.mitre.org/techniques/T1567/001/"

[rule.threat.tactic]
id = "TA0010"
name = "Exfiltration"
reference = "https://attack.mitre.org/tactics/TA0010/"

[[rule.threat]]
framework = "MITRE ATT&CK"

[[rule.threat.technique]]
id = "T1213"
name = "Data from Information Repositories"
reference = "https://attack.mitre.org/techniques/T1213/"

[[rule.threat.technique.subtechnique]]
id = "T1213.003"
name = "Code Repositories"
reference = "https://attack.mitre.org/techniques/T1213/003/"

[rule.threat.tactic]
id = "TA0009"
name = "Collection"
reference = "https://attack.mitre.org/tactics/TA0009/"

Stages and Predicates

Stage 1: `from`

from logs-github.audit-* metadata _id, _index, _version

Stage 2: `where`

| where
  data_stream.dataset == "github.audit" and event.type == "change" and event.action == "git.clone"

Stage 3: `stats`

| stats
  Esql.event_count = COUNT(*),
  Esql.github_org_values = values(github.org),
  Esql.github_repo_values = values(github.repo),
  Esql.github_repository_public_values = values(github.repository_public),
  Esql.github_token_id_values = values(github.token_id),
  Esql.github_user_agent_values = values(github.user_agent),
  Esql.user_name_values = values(user.name),
  Esql.agent_id_values = values(agent.id),
  Esql.data_stream_dataset_values = values(data_stream.dataset),
  Esql.data_stream_namespace_values = values(data_stream.namespace)

  by user.name

Stage 4: `keep`

| keep Esql.*

Stage 5: `where`

| where
  Esql.event_count >= 25

Indicators

Each row is a field, operator, and value that the rule matches. The corpus column counts how many other rules in the catalog look for the same combination: high numbers point to widely-used, community-vetted indicators. Blank or 1 shows that the indicator is specific to this rule.

Field	Kind	Values
`Esql.event_count`	ge	`25`
`data_stream.dataset`	eq	`github.audit`
`event.action`	eq	`git.clone`
`event.type`	eq	`change`

Output fields

Fields the rule emits when it matches. Chronicle authors list these in the outcome block; they appear on the detection and $risk_score drives alerting. Sentinel / Defender XDR rules build them up through project / summarize / extend stages. Sentinel maps these into alert fields via entityMappings and customDetails; Defender XDR custom detections surface them as alert fields directly.

Field	Source
`Esql.*`	`KEEP Esql.*`

`j` / `k`	Scroll down / up
`d` / `u`	Half-page down / up
`gg` / `G`	Top / bottom
`h` / `l`	History back / forward
`f`	Follow link (`Shift` = new tab)
`/`	Focus search
`?`	Toggle this help
`↑` / `↓`	Navigate search results
`Enter`	Open highlighted result
`Esc`	Close results / dialog

`type:`	`events` / `rules` / `providers`
`vendor:`	`sigma` / `elastic` / `splunk` / `kusto` / `chronicle` (vendor name alone also works: `sigma:`, `kql:`, `secops:`…)
`tactic:`	TA-id, slug, or name: `credential_access`, `TA0006`
`technique:`	technique or sub-technique ID: `T1003`, `T1003.001` (alias `tech:`)
`severity:`	`critical` / `high` / `medium` / `low` / `informational` (alias `sev:`)
`risk_score`	Numeric comparison on the Elastic risk score (0 to 100): `risk_score>50`, `risk_score<=20`, `risk_score=99` (alias `risk`; Elastic rules only)
`stages:`	Rules with exactly N pipeline stages
`correlation:`	`single_event` / `sequence` / `alternatives` / `alternatives_cross_log` / `all_required` / `correlated`
`with:`	Co-occurrence event-id; stacks (`with:4624 with:4769`) to require all, while a comma list in one occurrence (`with:4624,4769`) is an either-or group. Implies multi-event
`like:`	Structural neighbors of a rule slug (equivalents + subsumption stricter / broader): `like:comsvcs_lsass_memory_dump-splunk-sysmon`
`groupby:`	Entity-grouping substring match against `group_by_keys`: `groupby:user`, `groupby:host`
`uses:`	Rules whose predicate tree touches the field (any kind, any value): `uses:CommandLine`
`excludes:`	Rules with top-level `not()` clauses on the field (FP whitelists): `excludes:ParentImage`
`field:` / `value:`	Predicate search; narrows rule cards to those with a matching leaf and drives the indicator tier. Unquoted = substring, wildcards allowed (`value:mimikatz`)
`indicator:`	Shorthand for `field:F value:V`: `indicator:Image=*\powershell.exe`
`kind:`	Filter by predicate kind. Narrows rule cards to those carrying a matching predicate leaf (`vendor:elastic kind:cidr_match`) and drives the indicator tier: `contains` / `starts_with` / `ends_with` / `regex` / `cidr` / `eq` / `in` … (operator aliases `op:`/`match:`)
`has:` / `no:`	`sample`, `field`, `notes`, `refs`, `trace`, `thirdparty`, `rule`, `pattern`, `timewindow`, `threshold`, `newterms`, `sigma`/`elastic`/`splunk`/`kusto`/`chronicle`
`-op:val`	Exclude matches; works on most operators but not `type:`/`like:`/`has:`/`no:` (use `no:<flag>` to exclude a rule flag): `tactic:execution -vendor:splunk`. Standalone `-kind:`/`-field:`/`-value:` drop every rule carrying a matching predicate leaf (`type:rules -kind:is_null`)
`field:"…"` / `value:"…"`	Quoted value = anchored exact match (also allows spaces): `value:"net user"`
`a,b`	Comma = OR inside one operator (`vendor:sigma,elastic`, `severity:high,critical`); repeating a facet merges the same way. `field:`/`value:` never split (literal commas)
`vendors:` / `stage:`	Singular and plural spellings fold to the canonical operator and value: `tactics:` = `tactic:`, `type:event` = `type:events`, `correlation:sequences` = `correlation:sequence`, `has:thresholds` = `has:threshold`
`"quoted phrase"`	Exact-match a multi-word phrase (free text)

GitHub Exfiltration via High Number of Repository Clones by User

MITRE ATT&CK coverage

Event coverage

Rules detecting the same action

Rule body elastic

Stages and Predicates

Stage 1: `from`

Stage 2: `where`

Stage 3: `stats`

Stage 4: `keep`

Stage 5: `where`

Indicators

Output fields

Keyboard shortcuts

Search operators

GitHub Exfiltration via High Number of Repository Clones by User

MITRE ATT&CK coverage

Event coverage

Rules detecting the same action

Rule body elastic

Stages and Predicates

Stage 1: from

Stage 2: where

Stage 3: stats

Stage 4: keep

Stage 5: where

Indicators

Output fields

Stage 1: `from`

Stage 2: `where`

Stage 3: `stats`

Stage 4: `keep`

Stage 5: `where`