name: embedding-hunt
description: Pivot from one embedding to discover behavior clusters, find related activity across entities, and identify patterns that may indicate coordinated or widespread threats
version: 1.0.0
author: DeepTempo
tags:
  - soc
  - hunting
  - clustering
  - investigation
requires:
  - mcp/deeptempo-findings-server

Embedding Hunt

Use embedding similarity to hunt for related behaviors across the environment.

When to Use

Use this skill when:

You have a confirmed malicious finding and want to find similar activity
You are investigating whether a behavior is isolated or widespread
You are looking for variations of a known attack pattern
You are building a comprehensive view of an incident

Prerequisites

Access to the DeepTempo Findings Server MCP
A seed finding ID or embedding vector
Understanding of behavioral similarity concepts

Instructions

Step 1: Establish the Seed

Start with a known finding that represents the behavior you want to hunt:

```python
get_finding(finding_id="<seed_finding_id>")
```

Document:

The behavioral pattern this finding represents
Key characteristics (entities, techniques, timing)
Why this is the hunting seed

Step 2: Expand the Search

Use nearest neighbors with increasing k values:

```python
# Start narrow
nearest_neighbors(query="<seed_id>", k=10)

# Expand if pattern holds
nearest_neighbors(query="<seed_id>", k=50)

# Wide search for scope assessment
nearest_neighbors(query="<seed_id>", k=100)
```
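The widening search can also be driven by a loop that stops once the pattern no longer holds. The following is a minimal sketch, not the server's API: it assumes each neighbor comes back as a dict with a `similarity` score, and it uses a hypothetical `fake_search` stand-in plus an illustrative `min_similarity` floor of 0.8 where a real hunt would call `nearest_neighbors`.

```python
from typing import Callable, Dict, List

# Hypothetical result shape: each neighbor is a dict with an "id" and a
# "similarity" score in [0, 1]. The real tool's response may differ.
Neighbor = Dict[str, object]


def expand_search(
    search: Callable[[str, int], List[Neighbor]],
    seed_id: str,
    k_values=(10, 50, 100),
    min_similarity: float = 0.8,
) -> List[Neighbor]:
    """Widen k step by step; stop once the weakest hit falls below the floor."""
    results: List[Neighbor] = []
    for k in k_values:
        results = search(seed_id, k)
        if not results:
            break
        weakest = min(float(n["similarity"]) for n in results)
        if weakest < min_similarity:
            # The pattern no longer holds at this width, so stop expanding.
            break
    return results


def fake_search(seed_id: str, k: int) -> List[Neighbor]:
    """Stand-in for nearest_neighbors: similarity decays by 0.01 per rank."""
    return [{"id": f"f-{i}", "similarity": 1.0 - 0.01 * i} for i in range(1, k + 1)]


hits = expand_search(fake_search, "seed-1")
print(len(hits))
```

With the stand-in data the loop stops at k=50, because the weakest of those 50 hits already sits below the floor. The floor is a judgment call and should be tuned per hunt.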

Step 3: Analyze the Cluster

For each expansion level, analyze:

Similarity Distribution: How quickly does similarity drop off?
Entity Distribution: Same entity or multiple entities?
Temporal Distribution: Clustered in time or spread out?
Technique Consistency: Do neighbors share MITRE predictions?
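The four questions above can be answered with a small summary per expansion level. This is a sketch under stated assumptions: the field names `similarity`, `src_ip`, `timestamp`, and `technique` are hypothetical and may not match the server's response, and the similarity bands are borrowed from the report template in this skill.

```python
from collections import Counter
from datetime import datetime

# Band edges follow the Similarity Distribution table in the report template.
BANDS = ((0.95, "0.95 - 1.00"), (0.90, "0.90 - 0.95"),
         (0.80, "0.80 - 0.90"), (0.70, "0.70 - 0.80"))


def similarity_band(score):
    """Return the label of the first band whose lower edge the score reaches."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    return "below 0.70"


def summarize_neighbors(neighbors):
    """Summarize similarity, entity, temporal, and technique distributions."""
    if not neighbors:
        raise ValueError("no neighbors to summarize")
    # Field names below are assumptions, not a documented schema.
    times = [datetime.fromisoformat(n["timestamp"]) for n in neighbors]
    techniques = Counter(n["technique"] for n in neighbors)
    top_technique, top_count = techniques.most_common(1)[0]
    return {
        "similarity_bands": Counter(similarity_band(n["similarity"]) for n in neighbors),
        "unique_entities": len({n["src_ip"] for n in neighbors}),
        "time_span": max(times) - min(times),
        "top_technique": top_technique,
        "technique_share": top_count / len(neighbors),
    }


neighbors = [
    {"similarity": 0.97, "src_ip": "10.0.0.5",
     "timestamp": "2024-01-15T08:00:00+00:00", "technique": "T1071"},
    {"similarity": 0.95, "src_ip": "10.0.0.5",
     "timestamp": "2024-01-15T09:00:00+00:00", "technique": "T1071"},
    {"similarity": 0.81, "src_ip": "10.0.0.9",
     "timestamp": "2024-01-15T12:00:00+00:00", "technique": "T1071"},
    {"similarity": 0.80, "src_ip": "10.0.0.7",
     "timestamp": "2024-01-16T08:00:00+00:00", "technique": "T1021"},
]
summary = summarize_neighbors(neighbors)
print(summary)
```

A high `technique_share` combined with many unique entities suggests the same behavior on several hosts, while a single entity with a short time span points toward an isolated event. Both readings still need human validation.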

Step 4: Apply Filters

Refine the hunt with filters:

```python
# Filter by data source
nearest_neighbors(query="<seed_id>", k=50, filters={"data_source": "flow"})

# Filter by time range
nearest_neighbors(query="<seed_id>", k=50, filters={
    "time_range": {"start": "2024-01-15T00:00:00Z", "end": "2024-01-15T23:59:59Z"}
})

# Filter by minimum anomaly score
nearest_neighbors(query="<seed_id>", k=50, filters={"min_anomaly_score": 0.7})
```

Step 5: Identify Sub-Clusters

Look for natural groupings within results:

Group by source IP
Group by destination
Group by time window
Group by technique
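These four groupings can be produced with one helper and a key function per dimension. The sketch below assumes hypothetical field names (`id`, `src_ip`, `dst_ip`, `timestamp`, `technique`) and a one-hour window; neither is prescribed by the server.

```python
from collections import defaultdict
from datetime import datetime


def group_by(findings, key_fn):
    """Map each key to the IDs of the findings that share it."""
    groups = defaultdict(list)
    for finding in findings:
        groups[key_fn(finding)].append(finding["id"])
    return dict(groups)


def hour_window(finding):
    """Truncate the finding's timestamp to the start of its hour."""
    ts = datetime.fromisoformat(finding["timestamp"])
    return ts.replace(minute=0, second=0, microsecond=0).isoformat()


# Illustrative data only; field names are assumptions.
findings = [
    {"id": "f-1", "src_ip": "10.0.0.5", "dst_ip": "203.0.113.10",
     "timestamp": "2024-01-15T08:12:00+00:00", "technique": "T1071"},
    {"id": "f-2", "src_ip": "10.0.0.9", "dst_ip": "203.0.113.10",
     "timestamp": "2024-01-15T08:47:00+00:00", "technique": "T1071"},
    {"id": "f-3", "src_ip": "10.0.0.9", "dst_ip": "198.51.100.7",
     "timestamp": "2024-01-15T13:05:00+00:00", "technique": "T1021"},
]

sub_clusters = {
    "source_ip": group_by(findings, lambda f: f["src_ip"]),
    "destination": group_by(findings, lambda f: f["dst_ip"]),
    "time_window": group_by(findings, hour_window),
    "technique": group_by(findings, lambda f: f["technique"]),
}
print(sub_clusters["time_window"])
```

Groups that line up across dimensions, for example the same destination and the same hour, are the strongest sub-cluster candidates.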

Step 6: Generate Hunt Report

Document findings following the output format.
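As one way to fill in the report, the Scope Summary rows can be computed directly from the hunt results. This is a sketch that assumes hypothetical field names (`src_ip`, `dst_ip`, `hostname`, `timestamp`); the metric labels are the ones used in the output format.

```python
from datetime import datetime


def scope_summary_rows(findings):
    """Build the markdown rows of the Scope Summary table."""
    if not findings:
        raise ValueError("no findings to summarize")
    # Field names are assumptions, not a documented schema.
    times = [datetime.fromisoformat(f["timestamp"]) for f in findings]
    metrics = [
        ("Total Similar Findings", len(findings)),
        ("Unique Source IPs", len({f["src_ip"] for f in findings})),
        ("Unique Destinations", len({f["dst_ip"] for f in findings})),
        ("Unique Hostnames", len({f["hostname"] for f in findings})),
        ("Time Span", max(times) - min(times)),
    ]
    rows = ["| Metric | Value |", "|--------|-------|"]
    rows += [f"| {name} | {value} |" for name, value in metrics]
    return rows


findings = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.10", "hostname": "ws-01",
     "timestamp": "2024-01-15T08:00:00+00:00"},
    {"src_ip": "10.0.0.9", "dst_ip": "203.0.113.10", "hostname": "ws-02",
     "timestamp": "2024-01-15T10:00:00+00:00"},
    {"src_ip": "10.0.0.9", "dst_ip": "203.0.113.20", "hostname": "ws-02",
     "timestamp": "2024-01-15T12:00:00+00:00"},
]
rows = scope_summary_rows(findings)
print("\n".join(rows))
```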

Output Format

# Embedding Hunt Report

**Seed Finding**: [Finding ID]
**Hunt Timestamp**: [Current Time]
**Status**: Requires Human Review

## Seed Behavior Summary

[Describe the behavior pattern being hunted]

### Seed Characteristics
| Attribute | Value |
|-----------|-------|
| Data Source | [source] |
| Primary Technique | [technique] |
| Anomaly Score | [score] |
| Key Entity | [entity] |

## Hunt Results

### Scope Summary

| Metric | Value |
|--------|-------|
| Total Similar Findings | [count] |
| Unique Source IPs | [count] |
| Unique Destinations | [count] |
| Unique Hostnames | [count] |
| Time Span | [duration] |

### Similarity Distribution

| Similarity Range | Count | Interpretation |
|------------------|-------|----------------|
| 0.95 - 1.00 | [n] | Near-identical behavior |
| 0.90 - 0.95 | [n] | Very similar |
| 0.80 - 0.90 | [n] | Related pattern |
| 0.70 - 0.80 | [n] | Loosely related |

### Entity Analysis

#### Affected Entities
| Entity | Finding Count | First Seen | Last Seen |
|--------|---------------|------------|-----------|
| [entity] | [count] | [time] | [time] |

#### Entity Relationships
[Describe connections between entities]

### Temporal Analysis

[Describe timing patterns:
- When did activity start?
- Is it ongoing?
- Are there bursts or steady activity?]

### Technique Distribution

| Technique | Findings | Avg Confidence |
|-----------|----------|----------------|
| [T####] | [count] | [avg] |

## Identified Clusters

### Cluster 1: [Label]
- **Findings**: [count]
- **Common Characteristic**: [description]
- **Entities**: [list]
- **Assessment**: [interpretation]

### Cluster 2: [Label]
[Repeat structure]

## Hunt Conclusions

### Pattern Assessment
[Is this isolated or widespread? Coordinated or independent?]

### Threat Assessment
[What does the scope tell us about the threat?]

### Confidence Level
[High/Medium/Low] - [Reasoning]

## Recommended Actions

### Immediate
1. [Action]

### Investigation
1. [Action]

### Monitoring
1. [Action]

---
*This report was generated by Claude using the Embedding Hunt skill.*
*All findings require human validation.*

Examples

Example 1: Hunting from Confirmed C2

Seed: Confirmed C2 beacon from compromised host
Hunt Goal: Find other compromised hosts

Approach:

Use seed embedding to find similar beaconing patterns
Filter to exclude the seed host
Group results by source IP
Each unique source IP is a potential compromise
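The approach above might look like this once the neighbors are in hand. It is a sketch with made-up data, and it assumes each neighbor carries a hypothetical `src_ip` field.

```python
from collections import Counter


def candidate_hosts(neighbors, seed_src_ip):
    """Count similar findings per source IP, leaving out the seed host."""
    counts = Counter(
        n["src_ip"] for n in neighbors if n["src_ip"] != seed_src_ip
    )
    # Most frequent first: more matching findings means a stronger lead.
    return counts.most_common()


# Illustrative neighbors of a C2 seed observed on 10.0.0.5.
neighbors = [
    {"id": "f-2", "src_ip": "10.0.0.5"},
    {"id": "f-3", "src_ip": "10.0.0.9"},
    {"id": "f-4", "src_ip": "10.0.0.9"},
    {"id": "f-5", "src_ip": "10.0.0.12"},
]
candidates = candidate_hosts(neighbors, seed_src_ip="10.0.0.5")
print(candidates)
```

Each entry is a lead to validate, not a confirmed compromise.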

Example 2: Hunting Lateral Movement

Seed: Detected lateral movement attempt
Hunt Goal: Map the full movement path

Approach:

Find similar authentication/movement patterns
Build timeline of activity
Identify source and destination hosts
Reconstruct the movement chain
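A naive reconstruction of the chain could be sketched as follows. The field names `src_host`, `dst_host`, and `timestamp` are hypothetical, and the logic assumes a single linear path: it follows hops whose source matches the last host reached and skips everything else, so branching movement needs a proper graph.

```python
def movement_chain(events):
    """Order events by time and follow hops that extend the current path."""
    # ISO 8601 strings in one format and timezone sort chronologically.
    timeline = sorted(events, key=lambda e: e["timestamp"])
    chain = []
    for event in timeline:
        if not chain:
            chain.extend([event["src_host"], event["dst_host"]])
        elif event["src_host"] == chain[-1]:
            chain.append(event["dst_host"])
        # Events that do not extend the path are skipped in this sketch.
    return timeline, chain


# Illustrative events, deliberately out of order.
events = [
    {"timestamp": "2024-01-15T09:20:00Z", "src_host": "srv-02", "dst_host": "db-03"},
    {"timestamp": "2024-01-15T09:00:00Z", "src_host": "ws-01", "dst_host": "srv-02"},
    {"timestamp": "2024-01-15T09:45:00Z", "src_host": "db-03", "dst_host": "dc-04"},
]
timeline, chain = movement_chain(events)
print(" -> ".join(chain))
```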

Guidelines

Start narrow, expand gradually - Don't overwhelm yourself with too many results at the outset
Document the seed clearly - Others need to understand what you're hunting
Look for natural breakpoints - Similarity drop-offs indicate cluster boundaries
Consider false positives - High similarity doesn't guarantee that activity is malicious
Time-bound your hunt - Set reasonable time windows
Validate findings - Spot-check results for relevance
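One simple way to locate a natural breakpoint is to split the ranked similarity scores at their largest gap. This heuristic is an assumption made for illustration, not a method prescribed by the findings server, and it only makes sense when there is a clear drop-off.

```python
def split_at_largest_gap(similarities):
    """Split scores into (core, tail) at the biggest drop between neighbors."""
    ranked = sorted(similarities, reverse=True)
    if len(ranked) < 2:
        return ranked, []
    gaps = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    # Cut right after the score that precedes the largest drop.
    cut = gaps.index(max(gaps)) + 1
    return ranked[:cut], ranked[cut:]


core, tail = split_at_largest_gap([0.82, 0.97, 0.79, 0.96, 0.81, 0.95])
print(core, tail)
```

Here the scores fall from 0.95 to 0.82, so the first three form the core cluster and the rest form a looser tail. When the scores decline evenly, the largest gap is arbitrary and the split should not be trusted.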

Constraints

Do not assume all similar findings are malicious
Validate clusters before drawing conclusions
Note limitations of embedding similarity
Require human review for any response actions
Document methodology so hunts are reproducible
