Link: Multistage landing - Scribd document

Severity: medium
Type: rule
Source: github.com/sublime-security/sublime-rules

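Detects messages that link to a single Scribd document whose page hosts suspicious embedded links. These are especially links that target Microsoft services or rely on evasion techniques such as CAPTCHA gates and redirects to well-known domains. The rule analyzes both the Scribd page itself (OCR of its screenshot, checked for credential-theft language) and the destinations of its embedded links for suspicious patterns and redirects.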

Threat classification

Sublime's own taxonomy (not MITRE ATT&CK).

Category	Values
Attack types	Credential Phishing
Tactics and techniques	Evasion, Social engineering, Impersonation: Brand, Free file host

Event coverage

Message attribute
body
body.links (collection)
headers.auth_summary
sender.email
type

Rule body MQL

type.inbound
// exactly one distinct Scribd /document URL
and length(distinct(filter(body.links,
                           .href_url.domain.root_domain in ("scribd.com")
                           and strings.istarts_with(.href_url.path, "/document")
                    ),
                    .href_url.url
           )
) == 1
and any(body.links,
        .href_url.domain.root_domain == "scribd.com"
        and strings.istarts_with(.href_url.path, "/document")
        and (
          // target the embedded links via XPath
          any(html.xpath(ml.link_analysis(.).final_dom,
                         '//a[@class="ll"]/@href'
              ).nodes,
              strings.parse_url(.raw).domain.tld in $suspicious_tlds
              or strings.parse_url(.raw).domain.domain in $free_subdomain_hosts
              or strings.parse_url(.raw).domain.root_domain in $free_subdomain_hosts
              // observed pattern in credential theft URLs
              or strings.ilike(strings.parse_url(.raw).path,
                               "*o365*",
                               "*office365*",
                               "*microsoft*"
              )
              // observed pattern in credential theft URLs
              or strings.ilike(strings.parse_url(.raw).query_params,
                               "*o365*",
                               "*office365*",
                               "*microsoft*"
              )
              // observed pattern in credential theft URLs
              or any(beta.scan_base64(strings.parse_url(.raw).query_params),
                     strings.ilike(., "*o365*", "*office365*", "*microsoft*")
              )
              or ml.link_analysis(strings.parse_url(.raw), mode="aggressive").credphish.disposition == "phishing"
              or ml.link_analysis(strings.parse_url(.raw), mode="aggressive").credphish.contains_captcha
              or strings.icontains(ml.link_analysis(strings.parse_url(.raw),
                                                    mode="aggressive"
                                   ).final_dom.display_text,
                                   "I'm Human"
              )
              // bails out to a well-known domain, seen in evasion attempts
              or (
                length(ml.link_analysis(strings.parse_url(.raw),
                                        mode="aggressive"
                       ).redirect_history
                ) > 0
                and ml.link_analysis(strings.parse_url(.raw), mode="aggressive").effective_url.domain.root_domain in $tranco_10k
              )
          )
          // credential theft language on the main Scribd page
          or any(ml.nlu_classifier(beta.ocr(ml.link_analysis(.,
                                                             mode="aggressive"
                                            ).screenshot
                                   ).text
                 ).intents,
                 .name == "cred_theft" and .confidence != "low"
          )
        )
)
// negate highly trusted sender domains unless they fail DMARC authentication
and (
  (
    sender.email.domain.root_domain in $high_trust_sender_root_domains
    and not headers.auth_summary.dmarc.pass
  )
  or sender.email.domain.root_domain not in $high_trust_sender_root_domains
)
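The first clause of the rule counts distinct Scribd `/document` URLs and requires exactly one. Below is a minimal Python sketch of that check, assuming links arrive as plain href strings. The helper names are hypothetical. `root_domain` naively keeps the last two host labels instead of consulting the Public Suffix List, as a real implementation would. The sketch also deduplicates on the raw href, whereas the rule deduplicates on `.href_url.url`.

```python
from urllib.parse import urlsplit


def root_domain(host: str) -> str:
    """Naive root domain: the last two DNS labels (ignores the Public Suffix List)."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def distinct_scribd_documents(hrefs: list[str]) -> set[str]:
    """Collect distinct hrefs on scribd.com whose path starts with /document (case-insensitive)."""
    found = set()
    for href in hrefs:
        parts = urlsplit(href)
        host = parts.hostname or ""  # urlsplit already lowercases .hostname
        if root_domain(host) == "scribd.com" and parts.path.lower().startswith("/document"):
            found.add(href)
    return found


def has_single_scribd_document(hrefs: list[str]) -> bool:
    """Approximates the rule's `length(distinct(filter(...))) == 1` clause."""
    return len(distinct_scribd_documents(hrefs)) == 1


if __name__ == "__main__":
    links = [
        "https://www.scribd.com/document/123/Invoice",
        "https://www.scribd.com/document/123/Invoice",  # duplicate collapses
        "https://example.com/unsubscribe",
    ]
    print(has_single_scribd_document(links))  # True
```

Repeated links to the same document collapse into one entry. A message that links the same Scribd document twice therefore still satisfies the clause, while links to two different documents do not.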

Detection logic

Scope: inbound message.

inbound message
length(distinct(filter(body.links, .href_url.domain.root_domain in ('scribd.com') and strings.istarts_with(.href_url.path, '/document')), .href_url.url)) is 1
any of body.links where all hold:
- .href_url.domain.root_domain is 'scribd.com'
- .href_url.path starts with '/document'
- any of:
  - any of html.xpath(ml.link_analysis(.).final_dom, '//a[@class="ll"]/@href').nodes where any holds:
    - strings.parse_url(.raw).domain.tld in $suspicious_tlds
    - strings.parse_url(.raw).domain.domain in $free_subdomain_hosts
    - strings.parse_url(.raw).domain.root_domain in $free_subdomain_hosts
    - strings.parse_url(.raw).path matches (case-insensitively) any of 3 patterns: *o365*, *office365*, *microsoft*
    - strings.parse_url(.raw).query_params matches (case-insensitively) any of 3 patterns: *o365*, *office365*, *microsoft*
    - any of beta.scan_base64(strings.parse_url(.raw).query_params) where . matches (case-insensitively) any of 3 patterns: *o365*, *office365*, *microsoft*
    - ml.link_analysis(strings.parse_url(.raw), mode='aggressive').credphish.disposition is 'phishing'
    - ml.link_analysis(strings.parse_url(.raw), mode='aggressive').credphish.contains_captcha
    - ml.link_analysis(strings.parse_url(.raw), mode='aggressive').final_dom.display_text contains "I'm Human"
    - all of:
      - length(ml.link_analysis(strings.parse_url(.raw), mode='aggressive').redirect_history) > 0
      - ml.link_analysis(strings.parse_url(.raw), mode='aggressive').effective_url.domain.root_domain in $tranco_10k
  - any of ml.nlu_classifier(beta.ocr(ml.link_analysis(., mode='aggressive').screenshot).text).intents where all hold:
    - .name is 'cred_theft'
    - .confidence is not 'low'
any of:
- all of:
  - sender.email.domain.root_domain in $high_trust_sender_root_domains
  - not headers.auth_summary.dmarc.pass
- sender.email.domain.root_domain not in $high_trust_sender_root_domains
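The final `any of` clause is a sender gate. It keeps a message in scope unless the message comes from a high-trust root domain and also passes DMARC. By absorption, `(trusted and not dmarc_pass) or not trusted` simplifies to `not (trusted and dmarc_pass)`. The Python sketch below checks that equivalence exhaustively. `HIGH_TRUST` is a hypothetical stand-in for `$high_trust_sender_root_domains`. DMARC is modeled as a plain boolean, so the sketch does not represent a missing DMARC result.

```python
from itertools import product

# Hypothetical stand-in for $high_trust_sender_root_domains.
HIGH_TRUST = {"trusted.example"}


def sender_gate(root_domain: str, dmarc_pass: bool) -> bool:
    """Literal translation of the rule's final `any of` clause."""
    trusted = root_domain in HIGH_TRUST
    return (trusted and not dmarc_pass) or not trusted


def sender_gate_simplified(root_domain: str, dmarc_pass: bool) -> bool:
    """Equivalent form: only a trusted sender that passes DMARC is excluded."""
    return not (root_domain in HIGH_TRUST and dmarc_pass)


if __name__ == "__main__":
    for domain, dmarc in product(["trusted.example", "other.example"], [True, False]):
        # The two forms must agree on every combination of inputs.
        assert sender_gate(domain, dmarc) == sender_gate_simplified(domain, dmarc)
        print(f"{domain:16} dmarc_pass={dmarc!s:5} -> in scope: {sender_gate(domain, dmarc)}")
```

Only one combination is excluded: a trusted domain that passes DMARC. A spoofed trusted domain that fails DMARC still reaches the rest of the rule.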

Inspects: body.links, body.links[].href_url.domain.root_domain, body.links[].href_url.path, headers.auth_summary.dmarc.pass, sender.email.domain.root_domain, type.inbound. Sensors: beta.ocr, beta.scan_base64, html.xpath, ml.link_analysis, ml.nlu_classifier, strings.icontains, strings.ilike, strings.istarts_with, strings.parse_url. Reference lists: $free_subdomain_hosts, $high_trust_sender_root_domains, $suspicious_tlds, $tranco_10k.
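Several of the embedded-link checks are plain string heuristics. They apply case-insensitive `*o365*`, `*office365*`, and `*microsoft*` globs to the URL path and query, and the same globs to any base64 blobs decoded from the query. A rough Python approximation follows, under these assumptions:

- `query_params` is taken to be the raw query string.
- `fnmatch` emulates `strings.ilike`.
- `beta.scan_base64` is replaced by a hypothetical regex-based decoder. Its 12-character minimum run length is arbitrary, and Sublime's actual sensor may behave differently.
- The TLD and free-host list lookups and the `ml.link_analysis` checks are not modeled.

```python
import base64
import re
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

PATTERNS = ("*o365*", "*office365*", "*microsoft*")

# Hypothetical heuristic: runs of 12+ base64/base64url characters, optional padding.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/_-]{12,}={0,2}")


def ilike(value: str, *patterns: str) -> bool:
    """Case-insensitive glob match, approximating strings.ilike."""
    lowered = value.lower()
    return any(fnmatchcase(lowered, p.lower()) for p in patterns)


def scan_base64(text: str) -> list[str]:
    """Decode base64-looking runs in text; a rough stand-in for beta.scan_base64."""
    decoded = []
    for run in BASE64_RUN.findall(text):
        # Normalize the base64url alphabet to standard base64 and restore missing padding.
        candidate = run.replace("-", "+").replace("_", "/")
        candidate += "=" * (-len(candidate) % 4)
        try:
            raw = base64.b64decode(candidate, validate=True)
        except ValueError:  # binascii.Error subclasses ValueError
            continue
        decoded.append(raw.decode("utf-8", errors="ignore"))
    return decoded


def keyword_hit(url: str) -> bool:
    """True if the path, the query, or a base64 blob in the query matches a pattern."""
    parts = urlsplit(url)
    if ilike(parts.path, *PATTERNS) or ilike(parts.query, *PATTERNS):
        return True
    return any(ilike(blob, *PATTERNS) for blob in scan_base64(parts.query))


if __name__ == "__main__":
    blob = base64.b64encode(b"https://office365-login.example").decode()
    print(keyword_hit(f"https://evil.example/p?id={blob}"))  # True
    print(keyword_hit("https://example.org/docs?page=2"))  # False
```

The base64 pass matters because the encoded blob in the first example contains none of the keywords in plain text. Only decoding reveals the `office365` string.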

Indicators matched (8)

Field	Match	Value
`body.links[].href_url.domain.root_domain`	member	`scribd.com`
`strings.istarts_with`	prefix	`/document`
`body.links[].href_url.domain.root_domain`	equals	`scribd.com`
`strings.ilike`	substring	`o365`
`strings.ilike`	substring	`office365`
`strings.ilike`	substring	`microsoft`
`strings.icontains`	substring	`I'm Human`
`ml.nlu_classifier(beta.ocr(ml.link_analysis(body.links[], mode='aggressive').screenshot).text).intents[].name`	equals	`cred_theft`
