Detection rules › Elastic

GitHub Exfiltration via High Number of Repository Clones by User

Status
production
Severity
medium
Time window
9m
Group by
user.name
Author
Elastic
Source
github.com/elastic/detection-rules

Detects a high number of repository cloning actions by a single user within a short time frame. Adversaries may clone multiple repositories to exfiltrate sensitive data.

MITRE ATT&CK coverage

Event coverage

ProviderEvent
GitHub-gitgit.clone

Rules detecting the same action

Other rules on this platform that filter on the same API call or operation.

Rule body elastic

[metadata]
creation_date = "2025/12/16"
integration = ["github"]
maturity = "production"
updated_date = "2026/04/10"

[rule]
author = ["Elastic"]
description = """
Detects a high number of repository cloning actions by a single user within a short time frame. Adversaries may
clone multiple repositories to exfiltrate sensitive data.
"""
from = "now-9m"
interval = "8m"
language = "esql"
license = "Elastic License v2"
name = "GitHub Exfiltration via High Number of Repository Clones by User"
note = """ ## Triage and analysis

> **Disclaimer**:
> This investigation guide was created using generative AI technology and has been reviewed to improve its accuracy and relevance. While every effort has been made to ensure its quality, we recommend validating the content and adapting it to suit your specific environment and operational needs.

### Investigating GitHub Exfiltration via High Number of Repository Clones by User

This rule flags a single user rapidly cloning dozens of repositories, a strong indicator of bulk source code exfiltration. Mass cloning enables quick siphoning of proprietary code, embedded secrets, and build artifacts across teams before defenses can respond. A typical pattern is a stolen personal access token used in a script to enumerate org repositories and clone them in rapid succession from a CI runner or cloud VM, including private and internal repos, to stage data for off-platform transfer.

### Possible investigation steps

- Validate whether the actor is a known automation or service account with a documented need to mass-clone, and quickly confirm intent with the account owner and affected repo admins.
- Enumerate the cloned repositories and their visibility, deprioritizing activity dominated by public repos while fast-tracking private/internal codebases with sensitive content across orgs.
- Pivot on the token identifier to determine the token owner, scopes, and creation/last-use details, compare to normal usage patterns, and revoke/reset credentials if anomalous.
- Analyze the user agent and agent identifier to attribute the activity to a specific host or CI runner, correlating with pipeline logs and login locations/times for anomalies.
- Correlate with endpoint/network telemetry from the originating host for large outbound transfers, external Git remotes, or bulk archiving indicating off-platform exfiltration following the clones.

### False positive analysis

- A developer rebuilding a workstation or creating an approved local mirror may legitimately clone dozens of repositories in a short window, especially when activity is dominated by public or low-sensitivity repos.
- A shared automation/service account running scheduled builds or org-wide maintenance tasks can trigger fresh clones across many repositories due to pipeline configuration or cache resets, inflating counts without exfiltration intent.

### Response and remediation

- Immediately revoke the GitHub token used for the clones, force sign out, require password reset and 2FA re-verification for the user, and suspend the account if unauthorized.
- Block and quarantine the originating host or CI runner by revoking its runner registration, removing its SSH keys/credentials, and firewalling its IP until imaged.
- On the cloned private/internal repositories, remove the user from teams, rotate or disable deploy keys and GitHub App installations, and enforce SAML SSO.
- Rotate repository and organization secrets present in those repos (Actions secrets, PATs, SSH keys, cloud access keys) and invalidate any secrets found in commit history.
- Recover by restoring only minimal access after owner approval, issuing a new fine-grained PAT with least privilege and expiry, and re-enabling builds while monitoring for further clone bursts.
- Escalate to incident response leadership and Legal if any private or export-controlled repos were cloned or cloning continues post-revocation, and harden by enforcing org-wide SSO, disallowing classic PATs, IP allowlisting for PAT use, enabling secret scanning with push protection, and alerting on burst git clone patterns from runners and unusual user agents.
"""
references = [
    "https://www.wiz.io/blog/shai-hulud-2-0-ongoing-supply-chain-attack",
    "https://trigger.dev/blog/shai-hulud-postmortem",
    "https://posthog.com/blog/nov-24-shai-hulud-attack-post-mortem",
]
risk_score = 47
rule_id = "19f3674c-f4a1-43bb-a89c-e4c6212275e0"
severity = "medium"
tags = [
    "Domain: Cloud",
    "Use Case: Threat Detection",
    "Tactic: Exfiltration",
    "Data Source: Github",
    "Resources: Investigation Guide",
]
timestamp_override = "event.ingested"
type = "esql"
query = '''
from logs-github.audit-* metadata _id, _index, _version
| where
  data_stream.dataset == "github.audit" and event.type == "change" and event.action == "git.clone"
| stats
  Esql.event_count = COUNT(*),
  Esql.github_org_values = values(github.org),
  Esql.github_repo_values = values(github.repo),
  Esql.github_repository_public_values = values(github.repository_public),
  Esql.github_token_id_values = values(github.token_id),
  Esql.github_user_agent_values = values(github.user_agent),
  Esql.user_name_values = values(user.name),
  Esql.agent_id_values = values(agent.id),
  Esql.data_stream_dataset_values = values(data_stream.dataset),
  Esql.data_stream_namespace_values = values(data_stream.namespace)

  by user.name

| keep Esql.*

| where
  Esql.event_count >= 25
'''

[[rule.threat]]
framework = "MITRE ATT&CK"

[[rule.threat.technique]]
id = "T1020"
name = "Automated Exfiltration"
reference = "https://attack.mitre.org/techniques/T1020/"

[[rule.threat.technique]]
id = "T1567"
name = "Exfiltration Over Web Service"
reference = "https://attack.mitre.org/techniques/T1567/"

[[rule.threat.technique.subtechnique]]
id = "T1567.001"
name = "Exfiltration to Code Repository"
reference = "https://attack.mitre.org/techniques/T1567/001/"

[rule.threat.tactic]
id = "TA0010"
name = "Exfiltration"
reference = "https://attack.mitre.org/tactics/TA0010/"

[[rule.threat]]
framework = "MITRE ATT&CK"

[[rule.threat.technique]]
id = "T1213"
name = "Data from Information Repositories"
reference = "https://attack.mitre.org/techniques/T1213/"

[[rule.threat.technique.subtechnique]]
id = "T1213.003"
name = "Code Repositories"
reference = "https://attack.mitre.org/techniques/T1213/003/"

[rule.threat.tactic]
id = "TA0009"
name = "Collection"
reference = "https://attack.mitre.org/tactics/TA0009/"

Stages and Predicates

Stage 1: from

from logs-github.audit-* metadata _id, _index, _version

Stage 2: where

| where
  data_stream.dataset == "github.audit" and event.type == "change" and event.action == "git.clone"

Stage 3: stats

| stats
  Esql.event_count = COUNT(*),
  Esql.github_org_values = values(github.org),
  Esql.github_repo_values = values(github.repo),
  Esql.github_repository_public_values = values(github.repository_public),
  Esql.github_token_id_values = values(github.token_id),
  Esql.github_user_agent_values = values(github.user_agent),
  Esql.user_name_values = values(user.name),
  Esql.agent_id_values = values(agent.id),
  Esql.data_stream_dataset_values = values(data_stream.dataset),
  Esql.data_stream_namespace_values = values(data_stream.namespace)

  by user.name

Stage 4: keep

| keep Esql.*

Stage 5: where

| where
  Esql.event_count >= 25

Indicators

Each row is a field, operator, and value that the rule matches. The corpus column counts how many other rules in the catalog look for the same combination: high numbers point to widely-used, community-vetted indicators. Blank or 1 shows that the indicator is specific to this rule.

FieldKindValues
Esql.event_countge
  • 25
data_stream.dataseteq
  • github.audit
event.actioneq
  • git.clone
event.typeeq
  • change

Output fields

Fields the rule emits when it matches. Chronicle authors list these in the outcome block; they appear on the detection and $risk_score drives alerting. Sentinel / Defender XDR rules build them up through project / summarize / extend stages. Sentinel maps these into alert fields via entityMappings and customDetails; Defender XDR custom detections surface them as alert fields directly.

FieldSource
Esql.*KEEP Esql.*