# ADR-018: Threat Intelligence Feed Integration

## Status

Accepted

## Date

2026-03-20

## Deciders

Liem Vo-Nguyen

## Context
Cloud Aegis performs AI-powered severity re-scoring on CSPM findings. To produce accurate risk assessments, the enrichment pipeline needs real-time threat intelligence from multiple external feeds:
- EPSS (Exploit Prediction Scoring System) — probability that a CVE will be exploited in the next 30 days
- CISA KEV (Known Exploited Vulnerabilities) — authoritative catalog of actively exploited CVEs
- GreyNoise — IP reputation and internet-wide scanning classification
- HIBP (Have I Been Pwned) — breach exposure for email addresses
- OTX (AlienVault Open Threat Exchange) — IoC intelligence (IPs, domains, hashes)
Each feed has different data models, rate limits, authentication requirements, and freshness guarantees. The platform needs a unified integration pattern.
## Decision

Adopt a client-per-feed architecture with shared caching and rate limiting.

### Architecture

Each threat intel feed gets a dedicated Go client in `internal/cspm/threatintel/`:
| Client | File | Cache TTL | Rate Limit | Auth |
|---|---|---|---|---|
| EPSS | epss.go | 12h | None (bulk CSV) | None |
| CISA KEV | kev.go | 12h (singleflight refresh) | None (bulk JSON) | None |
| GreyNoise | greynoise.go | 12h | 100/day (Community) | API key |
| HIBP | hibp.go | 12h | 10/min (sliding window) | API key |
| OTX | otx.go | 12h | 100/day | API key |
### Shared Patterns

All clients follow these conventions:

- Cache-first lookup — `GetCached()` checks the in-memory cache with TTL validation (TTL check fix: Sprint J QA)
- HTTP client with timeout — 30s default, configurable per client
- URL path escaping — `url.PathEscape()` on all user-supplied values (email, IP) before URL construction (Sprint J security fix)
- Rate limiter — per-client rate limiting with slot reservation before unlock to prevent TOCTOU races (Sprint J HIBP fix)
- Singleflight dedup — `sync/singleflight` on refresh operations to prevent thundering herd (Sprint J KEV fix)
- Error logging — all errors logged with context, never silently discarded (Sprint J KEV fix)
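The cache-first convention can be sketched as follows. This is a stdlib-only illustration with assumed names (`feedClient`, `entry`, the `fetch` field), not the real `threatintel` client code; the real clients additionally wrap the miss path in `sync/singleflight` so concurrent misses for the same key share one upstream request.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry follows the CachedResult convention: payload plus fetch time and TTL.
type entry struct {
	data      string
	fetchedAt time.Time
	ttl       time.Duration
}

// feedClient is an illustrative client skeleton; field and method names are
// assumptions, not the actual API.
type feedClient struct {
	mu    sync.RWMutex
	cache map[string]entry
	fetch func(key string) (string, error) // stands in for the HTTP call
}

// GetCached checks the in-memory cache and validates the TTL.
func (c *feedClient) GetCached(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.cache[key]
	if !ok || time.Since(e.fetchedAt) > e.ttl {
		return "", false // missing or expired: caller must fetch
	}
	return e.data, true
}

// Lookup is cache-first: only on a miss does it hit the upstream feed.
func (c *feedClient) Lookup(key string) (string, error) {
	if v, ok := c.GetCached(key); ok {
		return v, nil
	}
	data, err := c.fetch(key)
	if err != nil {
		return "", err // errors are surfaced, never silently discarded
	}
	c.mu.Lock()
	c.cache[key] = entry{data: data, fetchedAt: time.Now(), ttl: 12 * time.Hour}
	c.mu.Unlock()
	return data, nil
}

func main() {
	calls := 0
	c := &feedClient{
		cache: map[string]entry{},
		fetch: func(key string) (string, error) { calls++; return "intel:" + key, nil },
	}
	c.Lookup("8.8.8.8")
	c.Lookup("8.8.8.8") // second call is a cache hit
	fmt.Println(calls)  // 1: only one upstream fetch
}
```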
### Enrichment Pipeline Integration

The `EnrichmentService` in `cmd/server/service_enrichment.go` orchestrates feed lookups:

```
Finding → [EPSS] → [KEV] → [GreyNoise] → [HIBP] → [OTX] → AI Scoring → Enriched Finding
```

Current limitation: feeds are called sequentially. A P3 follow-up is to switch to `errgroup` for parallel execution when throughput demands it.
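The planned parallel fan-out could look roughly like this. The ADR names `errgroup` (from `golang.org/x/sync`); the sketch below uses only the standard library to show the same shape, and the names (`lookupFn`, `enrichParallel`) are assumptions for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// lookupFn stands in for one feed lookup (EPSS, KEV, ...).
type lookupFn func(finding string) (string, error)

// enrichParallel fans the feed lookups out across goroutines. Results keep
// feed order, and the first error encountered is returned rather than
// dropped — the same contract errgroup.Wait would provide.
func enrichParallel(finding string, feeds []lookupFn) ([]string, error) {
	results := make([]string, len(feeds))
	errs := make([]error, len(feeds))
	var wg sync.WaitGroup
	for i, f := range feeds {
		wg.Add(1)
		go func(i int, f lookupFn) {
			defer wg.Done()
			results[i], errs[i] = f(finding)
		}(i, f)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

func main() {
	feeds := []lookupFn{
		func(f string) (string, error) { return "epss:" + f, nil },
		func(f string) (string, error) { return "kev:" + f, nil },
	}
	out, err := enrichParallel("CVE-2026-0001", feeds)
	fmt.Println(out, err)
}
```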
### Cache Implementation

```go
// CachedResult wraps a feed response with its fetch time and TTL.
type CachedResult struct {
	Data      interface{}
	FetchedAt time.Time
	TTL       time.Duration
}

// IsExpired reports whether the entry has outlived its TTL.
func (c *CachedResult) IsExpired() bool {
	return time.Since(c.FetchedAt) > c.TTL
}
```
## Consequences

### Positive
- Each feed is independently deployable and testable
- Shared caching reduces API calls by 90%+ for repeated CVE/IP lookups
- Singleflight prevents cache stampede on TTL expiry
- URL escaping prevents injection via crafted email/IP values
### Negative
- Sequential enrichment adds latency (~200ms per feed, ~1s total for 5 feeds)
- GreyNoise/HIBP community tier rate limits constrain throughput for large finding volumes
- IP extraction relies on `finding.IPs`, with a regex fallback over `Description` and `ResourceID` that can miss or mis-parse addresses; email extraction uses `finding.Emails`

All five feeds (EPSS, KEV, GreyNoise, HIBP, OTX) are wired and operational.
### Risks
- Feed API changes or deprecation require per-client updates
- Rate limit exhaustion during bulk enrichment (mitigated by cache + backoff)
- Community API key rate limits may be insufficient for production volumes (upgrade to paid tiers)
## References
- ADR-008 (Attack Path Computation) — attack paths benefit from IP reputation context
- ADR-011 (Toxic Combo Detection) — threat intel feeds into toxic combination scoring
- ADR-014 (Event-Driven Ingestion) — newly ingested findings trigger enrichment
- Sprint J implementation: `a15764d`; QA fixes: `fd6e4eb`
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-03-20 | Liem Vo-Nguyen | Initial ADR |