Embedding Hunt

Verified · Safe

Uses embedding similarity to hunt for related behaviors across the environment. Starting from a confirmed malicious finding, it discovers behavior clusters, related entities, and variations of known attack patterns. Helps assess incident scope and detect coordinated or widespread threats.

by Skills Guide Bot
Security · Intermediate
506/2/2026
Claude Code
#soc #hunting #clustering #investigation #embedding

Our review

Leverages embedding similarity to hunt for related behaviors across entities and identify coordinated or widespread threats.

Strengths

  • Finds similar behaviors that signature-based rules might miss
  • Multi-dimensional analysis (entities, time, techniques)
  • Progressive exploration (increasing k) suited for SOC workflows

Limitations

  • Requires the DeepTempo Findings Server MCP
  • Depends on the quality of pre-generated embeddings
  • May produce many false positives without proper filtering
When to use it

When you have a confirmed malicious finding and need to determine its scope across the environment.

When not to use it

For quick ad-hoc investigations without a reliable embedding seed or access to the required findings server.

Security analysis

Safe
Quality score: 90/100

The skill only uses read-only MCP calls to retrieve findings and embeddings for analysis. It does not instruct any destructive actions, code execution, or data exfiltration.

No concerns found

Examples

Expand from single finding
Use embedding hunt with seed finding ID 'finding_abc123' and k=50 to find similar activity. Filter by data_source=flow and min_anomaly_score=0.8.
Temporal scope assessment
Run an embedding hunt from finding 'finding_xyz' with k=100, then group results by source IP and time window to identify sub-clusters.
Technique-based refinement
For finding 'finding_001', run a nearest-neighbor search with k=30, filtered on MITRE technique T1566, then generate a hunt report.

name: embedding-hunt
description: Pivot from one embedding to discover behavior clusters, find related activity across entities, and identify patterns that may indicate coordinated or widespread threats
version: 1.0.0
author: DeepTempo
tags:
  - soc
  - hunting
  - clustering
  - investigation
requires:
  - mcp/deeptempo-findings-server

Embedding Hunt

Use embedding similarity to hunt for related behaviors across the environment.
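The server's similarity metric is not specified here. The following is a minimal sketch that assumes cosine similarity over the pre-generated embedding vectors. A nearest-neighbor lookup ranks every other finding by its similarity to the seed and keeps the top k. The function names and the toy three-dimensional vectors are hypothetical. Real embeddings have far more dimensions, and a production service would typically use an index rather than this linear scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_neighbors(seed_id: str, embeddings: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k findings most similar to the seed, excluding the seed itself."""
    seed = embeddings[seed_id]
    scored = [
        (fid, cosine_similarity(seed, vec))
        for fid, vec in embeddings.items()
        if fid != seed_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]


# Hypothetical toy embeddings keyed by finding ID.
embeddings = {
    "seed": [1.0, 0.0, 0.0],
    "f1": [0.9, 0.1, 0.0],
    "f2": [0.0, 1.0, 0.0],
    "f3": [0.7, 0.7, 0.0],
}
print(top_k_neighbors("seed", embeddings, k=2))  # f1 ranks first, then f3
```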

When to Use

Use this skill when:

  • You have a confirmed malicious finding and want to find similar activity
  • Investigating whether a behavior is isolated or widespread
  • Looking for variations of a known attack pattern
  • Building a comprehensive view of an incident

Prerequisites

  • Access to the DeepTempo Findings Server MCP
  • A seed finding ID or embedding vector
  • Understanding of behavioral similarity concepts

Instructions

Step 1: Establish the Seed

Start with a known finding that represents the behavior you want to hunt:

get_finding(finding_id="<seed_finding_id>")

Document:

  • The behavioral pattern this finding represents
  • Key characteristics (entities, techniques, timing)
  • Why this is the hunting seed

Step 2: Expand the Search

Run nearest-neighbor searches with increasing k values:

# Start narrow
nearest_neighbors(query="<seed_id>", k=10)

# Expand if pattern holds
nearest_neighbors(query="<seed_id>", k=50)

# Wide search for scope assessment
nearest_neighbors(query="<seed_id>", k=100)
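The "expand if pattern holds" decision can be made explicit. This sketch assumes a hypothetical `search(k)` callable that wraps `nearest_neighbors` and returns `(finding_id, similarity)` pairs sorted by descending similarity. The 0.8 cutoff on the k-th neighbor is an illustrative threshold, not a DeepTempo default.

```python
from typing import Callable, List, Tuple

# (finding_id, similarity) pairs, most similar first.
Neighbors = List[Tuple[str, float]]


def expand_search(
    search: Callable[[int], Neighbors],
    k_values: Tuple[int, ...] = (10, 50, 100),
    min_tail_similarity: float = 0.8,
) -> Tuple[int, Neighbors]:
    """Widen k step by step, stopping once the k-th neighbor falls below the threshold.

    `search` is a hypothetical stand-in for the nearest_neighbors MCP call.
    """
    k_used, results = 0, []
    for k in k_values:
        k_used, results = k, search(k)
        if not results or results[-1][1] < min_tail_similarity:
            # Nothing came back, or the pattern no longer holds at this width.
            break
    return k_used, results


# Simulated result list whose similarity decays by 0.005 per rank.
corpus = [(f"f{i}", 1.0 - i * 0.005) for i in range(200)]
k_used, hits = expand_search(lambda k: corpus[:k])
print(k_used, len(hits))
```

Stopping on the weakest neighbor reserves the wide k=100 pass for cases where similarity is still high at the edge of the previous window.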

Step 3: Analyze the Cluster

For each expansion level, analyze:

  1. Similarity Distribution: How quickly does similarity drop off?
  2. Entity Distribution: Same entity or multiple entities?
  3. Temporal Distribution: Clustered in time or spread out?
  4. Technique Consistency: Do neighbors share MITRE predictions?
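The following sketches the four checks for one expansion level. It assumes each neighbor is a dict with hypothetical `similarity`, `src_ip`, `timestamp`, and `technique` fields; the actual finding schema may differ. The sample values are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_neighbors(neighbors: list[dict], seed_technique: str) -> dict:
    """Summarize one expansion level along the four dimensions listed above."""
    if not neighbors:
        raise ValueError("no neighbors to summarize")
    sims = [n["similarity"] for n in neighbors]
    times = [parse_ts(n["timestamp"]) for n in neighbors]
    entities = Counter(n["src_ip"] for n in neighbors)
    shared = sum(1 for n in neighbors if n["technique"] == seed_technique)
    return {
        "similarity_range": (min(sims), max(sims)),      # 1. similarity distribution
        "unique_entities": len(entities),                # 2. entity distribution
        "top_entity": entities.most_common(1)[0],
        "time_span": max(times) - min(times),            # 3. temporal distribution
        "technique_agreement": shared / len(neighbors),  # 4. technique consistency
    }


# Hypothetical neighbors returned for a C2 seed.
neighbors = [
    {"similarity": 0.97, "src_ip": "10.0.0.5", "timestamp": "2024-01-15T02:00:00Z", "technique": "T1071"},
    {"similarity": 0.91, "src_ip": "10.0.0.5", "timestamp": "2024-01-15T03:30:00Z", "technique": "T1071"},
    {"similarity": 0.82, "src_ip": "10.0.0.9", "timestamp": "2024-01-15T06:00:00Z", "technique": "T1041"},
]
summary = summarize_neighbors(neighbors, seed_technique="T1071")
print(summary)
```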

Step 4: Apply Filters

Refine the hunt with filters:

# Filter by data source
nearest_neighbors(query="<seed_id>", k=50, filters={"data_source": "flow"})

# Filter by time range
nearest_neighbors(query="<seed_id>", k=50, filters={
    "time_range": {"start": "2024-01-15T00:00:00Z", "end": "2024-01-15T23:59:59Z"}
})

# Filter by minimum anomaly score
nearest_neighbors(query="<seed_id>", k=50, filters={"min_anomaly_score": 0.7})
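These filters run on the server. You may want to tighten a filter on results already retrieved, for example to compare thresholds without another call. The following sketch applies the same three criteria locally. It assumes that each finding carries hypothetical `data_source`, `timestamp`, and `anomaly_score` fields and that supplied filters combine with AND. Confirm the server's actual semantics before relying on matching counts.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # Normalize the trailing "Z" so fromisoformat() works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def apply_filters(findings, data_source=None, time_range=None, min_anomaly_score=None):
    """Keep findings that pass every filter that was supplied (logical AND)."""
    kept = []
    for f in findings:
        if data_source is not None and f["data_source"] != data_source:
            continue
        if time_range is not None:
            ts = parse_ts(f["timestamp"])
            # Inclusive on both ends, like the start/end pair in the example above.
            if not (parse_ts(time_range["start"]) <= ts <= parse_ts(time_range["end"])):
                continue
        if min_anomaly_score is not None and f["anomaly_score"] < min_anomaly_score:
            continue
        kept.append(f)
    return kept


# Hypothetical retrieved findings.
findings = [
    {"id": "a", "data_source": "flow", "timestamp": "2024-01-15T10:00:00Z", "anomaly_score": 0.9},
    {"id": "b", "data_source": "dns", "timestamp": "2024-01-15T11:00:00Z", "anomaly_score": 0.95},
    {"id": "c", "data_source": "flow", "timestamp": "2024-01-16T09:00:00Z", "anomaly_score": 0.8},
    {"id": "d", "data_source": "flow", "timestamp": "2024-01-15T12:00:00Z", "anomaly_score": 0.4},
]
day = {"start": "2024-01-15T00:00:00Z", "end": "2024-01-15T23:59:59Z"}
print([f["id"] for f in apply_filters(findings, data_source="flow", time_range=day, min_anomaly_score=0.7)])
```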

Step 5: Identify Sub-Clusters

Look for natural groupings within the results:

  • Group by source IP
  • Group by destination
  • Group by time window
  • Group by technique
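Grouping amounts to a dictionary of lists, keyed by whatever attribute defines the sub-cluster. This sketch assumes hypothetical `id`, `src_ip`, `dst`, `timestamp` (UTC, trailing "Z"), and `technique` fields. Time windows are formed by flooring each timestamp to a fixed bucket, one hour by default.

```python
from collections import defaultdict
from datetime import datetime, timezone


def time_bucket(ts: str, window_minutes: int = 60) -> str:
    """Floor a UTC timestamp to the start of its fixed-size window."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    epoch = int(dt.timestamp())
    start = epoch - epoch % (window_minutes * 60)
    return datetime.fromtimestamp(start, tz=timezone.utc).isoformat()


def group_by(findings, key_fn):
    """Map each grouping key to the IDs of the findings that share it."""
    groups = defaultdict(list)
    for f in findings:
        groups[key_fn(f)].append(f["id"])
    return dict(groups)


# Hypothetical hunt results (203.0.113.0/24 is a documentation range).
findings = [
    {"id": "a", "src_ip": "10.0.0.5", "dst": "203.0.113.7", "timestamp": "2024-01-15T02:10:00Z", "technique": "T1071"},
    {"id": "b", "src_ip": "10.0.0.5", "dst": "203.0.113.7", "timestamp": "2024-01-15T02:50:00Z", "technique": "T1071"},
    {"id": "c", "src_ip": "10.0.0.9", "dst": "203.0.113.7", "timestamp": "2024-01-15T05:05:00Z", "technique": "T1071"},
]
by_src = group_by(findings, lambda f: f["src_ip"])
by_hour = group_by(findings, lambda f: time_bucket(f["timestamp"]))
print(by_src)
print(by_hour)
```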

Step 6: Generate Hunt Report

Document the findings using the Output Format below.

Output Format

# Embedding Hunt Report

**Seed Finding**: [Finding ID]
**Hunt Timestamp**: [Current Time]
**Status**: Requires Human Review

## Seed Behavior Summary

[Describe the behavior pattern being hunted]

### Seed Characteristics
| Attribute | Value |
|-----------|-------|
| Data Source | [source] |
| Primary Technique | [technique] |
| Anomaly Score | [score] |
| Key Entity | [entity] |

## Hunt Results

### Scope Summary

| Metric | Value |
|--------|-------|
| Total Similar Findings | [count] |
| Unique Source IPs | [count] |
| Unique Destinations | [count] |
| Unique Hostnames | [count] |
| Time Span | [duration] |

### Similarity Distribution

| Similarity Range | Count | Interpretation |
|------------------|-------|----------------|
| 0.95 - 1.00 | [n] | Near-identical behavior |
| 0.90 - 0.95 | [n] | Very similar |
| 0.80 - 0.90 | [n] | Related pattern |
| 0.70 - 0.80 | [n] | Loosely related |

### Entity Analysis

#### Affected Entities
| Entity | Finding Count | First Seen | Last Seen |
|--------|---------------|------------|-----------|
| [entity] | [count] | [time] | [time] |

#### Entity Relationships
[Describe connections between entities]

### Temporal Analysis

[Describe timing patterns:
- When did activity start?
- Is it ongoing?
- Are there bursts or steady activity?]

### Technique Distribution

| Technique | Findings | Avg Confidence |
|-----------|----------|----------------|
| [T####] | [count] | [avg] |

## Identified Clusters

### Cluster 1: [Label]
- **Findings**: [count]
- **Common Characteristic**: [description]
- **Entities**: [list]
- **Assessment**: [interpretation]

### Cluster 2: [Label]
[Repeat structure]

## Hunt Conclusions

### Pattern Assessment
[Is this isolated or widespread? Coordinated or independent?]

### Threat Assessment
[What does the scope tell us about the threat?]

### Confidence Level
[High/Medium/Low] - [Reasoning]

## Recommended Actions

### Immediate
1. [Action]

### Investigation
1. [Action]

### Monitoring
1. [Action]

---
*This report was generated by Claude using the Embedding Hunt skill.*
*All findings require human validation.*
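The Similarity Distribution ranges in the template share their endpoints: 0.95 appears in two rows, as do 0.90 and 0.80. One way to fill the table without double-counting is to treat each lower bound as inclusive and assign a boundary score to the higher band. The band edges below come from the template, but the function names are hypothetical. Scores below 0.70 have no row and are counted separately.

```python
# Bands copied from the Similarity Distribution table, checked from highest to lowest.
BANDS = [
    (0.95, "0.95 - 1.00", "Near-identical behavior"),
    (0.90, "0.90 - 0.95", "Very similar"),
    (0.80, "0.80 - 0.90", "Related pattern"),
    (0.70, "0.70 - 0.80", "Loosely related"),
]


def similarity_distribution(similarities):
    """Count each score in the highest band whose lower bound it meets."""
    counts = {label: 0 for _, label, _ in BANDS}
    below = 0  # scores under 0.70 have no row in the template
    for s in similarities:
        for lower, label, _ in BANDS:
            if s >= lower:
                counts[label] += 1
                break
        else:
            below += 1
    return counts, below


def render_rows(similarities):
    """Render the table body rows in the template's Markdown layout."""
    counts, _ = similarity_distribution(similarities)
    return [f"| {label} | {counts[label]} | {meaning} |" for _, label, meaning in BANDS]


# Hypothetical similarity scores from a hunt.
sims = [0.99, 0.95, 0.93, 0.9, 0.85, 0.72, 0.6]
print("\n".join(render_rows(sims)))
```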

Examples

Example 1: Hunting from Confirmed C2

Seed: Confirmed C2 beacon from a compromised host
Hunt Goal: Find other compromised hosts

Approach:

  1. Use the seed embedding to find similar beaconing patterns
  2. Filter to exclude the seed host
  3. Group results by source IP
  4. Treat each unique source IP as a potentially compromised host
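Steps 2 to 4 reduce to a filter followed by a group-by. This sketch assumes hypothetical `src_ip` and `similarity` fields on each neighbor. The 0.9 cutoff is an illustrative choice that keeps only the "very similar" and "near-identical" bands.

```python
def candidate_hosts(neighbors, seed_host, min_similarity=0.9):
    """Group neighbors by source IP, skipping the seed host and weak matches."""
    candidates = {}
    for n in neighbors:
        if n["src_ip"] == seed_host or n["similarity"] < min_similarity:
            continue
        entry = candidates.setdefault(n["src_ip"], {"findings": 0, "max_similarity": 0.0})
        entry["findings"] += 1
        entry["max_similarity"] = max(entry["max_similarity"], n["similarity"])
    # Strongest matches first, so analysts triage the closest beacons before the rest.
    return sorted(candidates.items(), key=lambda kv: kv[1]["max_similarity"], reverse=True)


# Hypothetical neighbors of the seed beacon from 10.0.0.5.
neighbors = [
    {"src_ip": "10.0.0.5", "similarity": 0.99},
    {"src_ip": "10.0.0.12", "similarity": 0.94},
    {"src_ip": "10.0.0.12", "similarity": 0.96},
    {"src_ip": "10.0.0.30", "similarity": 0.92},
    {"src_ip": "10.0.0.44", "similarity": 0.75},
]
for host, stats in candidate_hosts(neighbors, seed_host="10.0.0.5"):
    print(host, stats)
```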

Example 2: Hunting Lateral Movement

Seed: Detected lateral movement attempt
Hunt Goal: Map the full movement path

Approach:

  1. Find similar authentication/movement patterns
  2. Build a timeline of the activity
  3. Identify source and destination hosts
  4. Reconstruct the movement chain
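The following sketches steps 2 to 4. It assumes each lateral-movement finding has been reduced to hypothetical `src`, `dst`, and `timestamp` fields. It orders events in time, then greedily extends the path whenever a hop starts where the previous one ended.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # Normalize the trailing "Z" so fromisoformat() works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def reconstruct_chain(events):
    """Order hops by time and follow those whose source is the current end of the path."""
    ordered = sorted(events, key=lambda e: parse_ts(e["timestamp"]))
    if not ordered:
        return []
    path = [ordered[0]["src"], ordered[0]["dst"]]
    for e in ordered[1:]:
        # Extend only from the last host reached, and never revisit a host.
        if e["src"] == path[-1] and e["dst"] not in path:
            path.append(e["dst"])
    return path


# Hypothetical movement findings, deliberately out of order.
events = [
    {"src": "ws-07", "dst": "srv-db", "timestamp": "2024-01-15T03:40:00Z"},
    {"src": "ws-01", "dst": "ws-07", "timestamp": "2024-01-15T02:15:00Z"},
    {"src": "srv-db", "dst": "dc-01", "timestamp": "2024-01-15T04:05:00Z"},
    {"src": "ws-01", "dst": "print-02", "timestamp": "2024-01-15T02:20:00Z"},
]
print(" -> ".join(reconstruct_chain(events)))
```

The greedy walk follows a single branch, so the sample hop from `ws-01` to `print-02` is left out of the chain. Review hops that are not on the path separately rather than discarding them.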

Guidelines

  1. Start narrow, expand gradually - Don't overwhelm yourself with too many results at first
  2. Document the seed clearly - Others need to understand what you're hunting
  3. Look for natural breakpoints - Similarity drop-offs indicate cluster boundaries
  4. Consider false positives - High similarity does not guarantee malicious activity
  5. Time-bound your hunt - Set reasonable time windows
  6. Validate findings - Spot-check results for relevance
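Guideline 3 can be checked mechanically by locating the largest drop between consecutive similarity scores. This largest-gap heuristic is one simple choice, not necessarily what DeepTempo uses. The 0.05 minimum gap is a hypothetical tuning parameter.

```python
def find_breakpoint(similarities, min_gap=0.05):
    """Return the rank where the largest drop between consecutive scores occurs.

    Results ranked before the returned index form the core cluster. Returns None
    if no drop is at least `min_gap` (a hypothetical threshold to tune per data set).
    """
    ranked = sorted(similarities, reverse=True)
    best_idx, best_gap = None, 0.0
    for i in range(1, len(ranked)):
        gap = ranked[i - 1] - ranked[i]
        if gap >= min_gap and gap > best_gap:
            best_idx, best_gap = i, gap
    return best_idx


# Hypothetical scores: a tight core of four, then a clear drop-off.
scores = [0.98, 0.97, 0.96, 0.95, 0.84, 0.83, 0.80]
cut = find_breakpoint(scores)
print(cut, sorted(scores, reverse=True)[:cut])
```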

Constraints

  • Do not assume all similar findings are malicious
  • Validate clusters before drawing conclusions
  • Note limitations of embedding similarity
  • Require human review for any response actions
  • Document methodology so hunts are reproducible
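To make a hunt reproducible, record the seed, the k values, and the filters in a machine-readable form. The following sketch uses a hypothetical `HuntRecord` dataclass serialized to JSON. The field names are illustrative, and the seed ID is the one from the first example prompt.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HuntRecord:
    """Hypothetical record of the parameters needed to rerun a hunt."""
    seed_finding_id: str
    k_values: list[int]
    filters: dict = field(default_factory=dict)
    notes: str = ""


record = HuntRecord(
    seed_finding_id="finding_abc123",
    k_values=[10, 50],
    filters={"data_source": "flow", "min_anomaly_score": 0.8},
    notes="Expansion stopped at k=50 after similarity dropped off.",
)

# Sorted keys keep the JSON stable, so two identical hunts diff cleanly.
serialized = json.dumps(asdict(record), indent=2, sort_keys=True)
print(serialized)

# The same JSON can be loaded back to rerun the hunt with identical parameters.
restored = HuntRecord(**json.loads(serialized))
```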
Related skills