# ADR-004: AI Provider Selection

## Status
Accepted

## Date
2026-01-05

## Context
AI capabilities are required for:
- Contextual risk scoring beyond static severity
- Toxic combination detection across findings
- Natural language remediation recommendations
- Misconfiguration root cause analysis
## Options Considered
- Anthropic Claude Opus 4.6 - Large context, strong reasoning
- OpenAI GPT-4 Turbo - Fast, cost-effective, proven
- Self-hosted LLM (Llama) - Privacy, no API costs
- AWS Bedrock - Managed, multiple models
## Decision
Anthropic Claude Opus 4.6 was selected as the primary provider, with OpenAI GPT-4 Turbo as the fallback.
## Rationale

### Claude Opus 4.6 Primary
| Factor | Reasoning |
|---|---|
| Context Window | 200K tokens - can analyze 50+ findings at once |
| Reasoning | Superior for nuanced security analysis |
| Structured Output | More consistent JSON generation |
| Hallucination | More conservative in risk assessment |
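The large-context batching implied by the first row could look roughly like this sketch; the function name and the 4-chars-per-token estimate are assumptions for illustration, not the production tokenizer or prompt format:

```python
import json

# Hypothetical sketch: pack findings into batches that stay under a token
# budget so a single large-context call can analyze many findings at once.
# The len // 4 estimate is a rough heuristic, not Anthropic's tokenizer.
def batch_findings(findings, max_tokens=150_000):
    batches, current, used = [], [], 0
    for finding in findings:
        cost = len(json.dumps(finding)) // 4 + 1  # rough token estimate
        if current and used + cost > max_tokens:
            batches.append(current)               # budget exceeded: start a new batch
            current, used = [], 0
        current.append(finding)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each batch can then be serialized into one prompt, amortizing the fixed prompt overhead across many findings.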
### GPT-4 Fallback
| Factor | Reasoning |
|---|---|
| Rate Limits | Higher throughput when Claude is limited |
| Cost | 60% cheaper for output-heavy workloads |
| Availability | Different failure domain |
### Why Not Self-Hosted
- Requires significant GPU infrastructure
- Model quality gap for complex reasoning
- Maintenance and update burden
- Security review of model weights
## Implementation

### Request Routing
```python
if request.requires_deep_analysis:
    provider = "claude-opus-4-6"
elif request.high_volume:
    provider = "gpt-4-turbo"
else:
    provider = "claude-opus-4-6"
```
### Cost Optimization
- Cache AI responses for similar findings (30% reduction)
- Batch findings for context efficiency
- Use cheaper models for simple enrichment
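A minimal sketch of the caching idea, assuming an in-memory store and illustrative finding fields (`rule_id`, `resource_type`, `severity`) that are not taken from the actual schema:

```python
import hashlib
import json

# Hypothetical sketch: cache AI responses keyed by a fingerprint of the
# finding's stable fields, so near-duplicate findings reuse one response.
_cache = {}

def finding_fingerprint(finding):
    # Only stable fields participate, so findings that differ just by
    # resource identity hash to the same key.
    stable = {k: finding.get(k) for k in ("rule_id", "resource_type", "severity")}
    return hashlib.sha256(json.dumps(stable, sort_keys=True).encode()).hexdigest()

def analyze_cached(finding, analyze):
    key = finding_fingerprint(finding)
    if key not in _cache:
        _cache[key] = analyze(finding)  # only call the AI provider on a miss
    return _cache[key]
```

With this scheme, two findings for the same rule on different resources cost one API call instead of two.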
### Fallback Logic
```python
try:
    response = claude.analyze(finding)
except RateLimitError:
    response = openai.analyze(finding)
except APIError:
    response = static_analysis(finding)  # Graceful degradation
```
## Consequences

### Positive
- Best-in-class reasoning for security analysis
- Large context enables batch processing
- Fallback ensures availability
### Negative
- ~$2,600/month for 100K findings
- External API dependency
- Data leaves infrastructure
### Mitigations
- Implement response caching
- Use batch processing for efficiency
- Encrypt findings before sending
- Implement circuit breaker for outages
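The circuit-breaker mitigation can be sketched as follows; this is a minimal illustration under assumed defaults (class name, threshold, and cooldown are not the production design):

```python
import time

# Minimal circuit-breaker sketch: after `threshold` consecutive failures the
# breaker opens, and calls go straight to the fallback until `cooldown`
# seconds have elapsed, at which point the primary is probed again.
class CircuitBreaker:
    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def call(self, primary, fallback, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                return fallback(*args)  # breaker open: skip the primary provider
            self.opened_at = None       # cooldown elapsed: probe the primary again
            self.failures = 0
        try:
            result = primary(*args)
            self.failures = 0           # a success resets the failure count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            return fallback(*args)
```

During a provider outage this stops every request from paying the primary's timeout before degrading to static analysis.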
## Cost Estimate
| Volume | Claude Only | Hybrid | Savings |
|---|---|---|---|
| 10K/mo | $260 | $200 | 23% |
| 100K/mo | $2,600 | $1,800 | 31% |
| 1M/mo | $26,000 | $18,000 | 31% |
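As a sanity check, the Savings column can be reproduced from the dollar figures:

```python
# Rows are (claude_only, hybrid) monthly costs from the table above.
rows = [(260, 200), (2_600, 1_800), (26_000, 18_000)]
savings = [round((1 - hybrid / claude_only) * 100) for claude_only, hybrid in rows]
print(savings)  # → [23, 31, 31], matching the Savings column
```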
## Update: AWS Bedrock as Production Enrichment Provider
Date: 2026-03-20
For production enrichment workloads (finding risk scoring, remediation recommendations), Cloud Aegis uses AWS Bedrock (`anthropic.claude-haiku-4-5-20251001-v1:0`) as the managed inference provider. Bedrock eliminates the need to manage API keys for direct Anthropic/OpenAI access in production, uses IAM-based authentication (see the `bedrock-sa` IAM user), and keeps inference traffic within the AWS network boundary.
The provider abstraction (the `AIProvider` interface) remains unchanged; Bedrock is configured via the `AEGIS_AI_ENABLED=true`, `AEGIS_AI_REGION=us-east-1`, and `AEGIS_AI_MODEL=anthropic.claude-haiku-4-5-20251001-v1:0` environment variables. The fallback chain is: Bedrock -> direct Anthropic API -> static analysis.
Direct Anthropic Claude Opus 4.6 and OpenAI GPT-4 remain available for local development and as fallback providers.
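One way the environment-driven configuration and fallback chain might be wired up; `build_provider_chain` and the `provider:model@region` string format are illustrative assumptions, and only the environment variable names and the chain order come from this ADR:

```python
import os

# Sketch: derive the ordered provider chain from the AEGIS_AI_* environment
# variables. When AI is enabled, Bedrock leads; direct Anthropic access and
# static analysis always remain as fallbacks, per the chain described above.
def build_provider_chain(env=None):
    env = os.environ if env is None else env
    chain = []
    if env.get("AEGIS_AI_ENABLED", "false").lower() == "true":
        model = env.get("AEGIS_AI_MODEL", "")
        region = env.get("AEGIS_AI_REGION", "us-east-1")
        chain.append(f"bedrock:{model}@{region}")
    chain += ["anthropic-direct", "static-analysis"]
    return chain
```

Keeping static analysis as the terminal entry means enrichment degrades rather than fails when every AI provider is unavailable.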
## Related Decisions
- ADR-005: Rate Limiting Strategy
- ADR-018: Threat Intelligence Feed Integration