Component Selection Rationale and Cost Analysis

Document	Value
Version	1.0
Author	Liem Vo-Nguyen
Date	January 2026
Project	CloudForge

1. Cache Layer: Redis vs Memcached

Decision: Redis

Criteria	Redis	Memcached
Data Structures	Rich (strings, hashes, lists, sets, sorted sets)	Simple key-value only
Persistence	Optional (RDB/AOF)	None
Replication	Built-in master-replica	Manual
Pub/Sub	Yes	No
Lua Scripting	Yes	No
Cluster Mode	Yes	No (client-side sharding)
Memory Efficiency	Moderate	High

Why Redis

Session Management: Need TTL with complex session objects
Rate Limiting: Atomic increment with expire (INCR + EXPIRE)
Pub/Sub: Real-time finding notifications
Caching Compliance Data: Hash structures for framework controls
Distributed Locks: Redlock for workflow coordination

Cost Analysis (AWS ElastiCache)

Workload	Redis	Memcached
Instance	cache.r6g.large	cache.r6g.large
Monthly (On-Demand)	$219.60	$175.20
Monthly (Reserved 1yr)	$142.35	$113.88
Estimated Usage	50GB cache, 10K ops/sec	50GB cache, 10K ops/sec

Verdict: Redis costs ~25% more but provides essential features (persistence, pub/sub, Lua) that would require custom implementation with Memcached.

2. Compute: Lambda vs EC2 vs EKS

Decision: EKS (Primary), Lambda (Event Processing)

Criteria	Lambda	EC2	EKS
Cold Start	100ms-10s	None	None
Max Duration	15 min	Unlimited	Unlimited
Scaling	Automatic	Manual/ASG	HPA/KEDA
Cost Model	Per invocation	Per hour	Per hour + control plane
Ops Overhead	Low	High	Medium
State	Stateless	Stateful	Stateful pods

Why EKS (Primary)

Long-Running Workflows: Compliance scans can take hours
Consistent Latency: No cold starts for API requests
Multi-Tenancy: Namespace isolation for customers
Sidecar Patterns: Envoy for mTLS, Fluentd for logging
Stateful Workloads: Temporal workers need persistent connections

Why Lambda (Event Processing)

Webhook Receivers: Sporadic GRC callbacks
Scheduled Tasks: Daily compliance reports
Event-Driven: S3 trigger for file processing

Cost Analysis (100 concurrent users, 1M API calls/month)

Component	Lambda	EC2 (m6i.large)	EKS (m6i.large)
Compute/month	$200 (invocations)	$138 (2 instances)	$138 + $73 (control plane)
Load Balancer	N/A	$22	$22 (ingress)
Data Transfer	$10	$50	$50
Total	$210	$210	$283

At Scale (10x)

Component	Lambda	EC2	EKS
10M calls/month	$2,000	$690 (5 inst)	$690 + $73
Scaling	Automatic	ASG lag	KEDA fast

Verdict: EKS for API and long-running workloads ($73/mo premium for orchestration). Lambda for event-driven processing where cold starts acceptable.

3. Database: PostgreSQL vs MySQL vs DynamoDB

Decision: PostgreSQL (Primary), DynamoDB (High-Scale Findings)

Criteria	PostgreSQL	MySQL	DynamoDB
JSONB Support	Excellent	JSON (less performant)	Native
Partitioning	Built-in	Manual	Automatic
Extensions	Rich (pg_trgm, pgvector)	Limited	None
Transactions	Full ACID	Full ACID	Limited
Geospatial	PostGIS	Limited	Limited
Full-Text Search	Built-in	Built-in	OpenSearch needed
Cost (RDS)	Similar	Similar	Higher at scale

Why PostgreSQL

JSONB for Findings: Flexible schema for multi-source findings
Array Types: Compliance framework mappings as arrays
pg_trgm: Fuzzy search on finding descriptions
CTEs: Complex compliance reporting queries
Partitioning: By date for finding history

Why DynamoDB (Adjunct)

High-Volume Findings: 100K+ findings/day ingestion
Single-Table Design: Finding lookup by ID
TTL: Automatic expiration of transient data
Global Tables: Multi-region replication

Cost Analysis (500GB data, 10K reads/sec, 1K writes/sec)

Component	PostgreSQL (RDS)	DynamoDB
Instance	db.r6g.large	N/A
Storage (500GB)	$57.50	$125 (standard)
Compute	$219.60	N/A
Read Capacity	Included	$500 (on-demand)
Write Capacity	Included	$625 (on-demand)
Monthly Total	$277	$1,250

Verdict: PostgreSQL for primary data store. DynamoDB only if needing multi-region active-active or >100K TPS.

4. Message Queue: SQS vs Kafka vs RabbitMQ

Decision: SQS (Primary), Kafka (High-Volume Streams)

Criteria	SQS	Kafka	RabbitMQ
Ordering	FIFO queues	Per-partition	Per-queue
Throughput	3K msg/sec FIFO, unlimited std	100K+ msg/sec	50K msg/sec
Retention	14 days	Configurable	Until consumed
Ops Overhead	None (managed)	High	Medium
Consumer Groups	No (fan-out via SNS)	Yes	Yes
Replay	No	Yes	No

Why SQS

Managed: Zero ops for queue infrastructure
Integration: Native Lambda, ECS triggers
Dead Letter Queues: Built-in DLQ support
Sufficient Throughput: 3K/sec FIFO covers most use cases

Why Kafka (When Needed)

Event Sourcing: Full history replay capability
High Volume: >10K events/sec sustained
Stream Processing: Kafka Streams/ksqlDB
Multi-Consumer: Same events to multiple consumers

Cost Analysis (1M messages/day)

Component	SQS	MSK (Kafka)	AmazonMQ (RabbitMQ)
Messages	$12.60	N/A	N/A
Broker Costs	N/A	$438 (2x kafka.m5.large)	$213 (mq.m5.large)
Storage	N/A	$30 (100GB)	Included
Monthly Total	$13	$468	$213

Verdict: SQS for 99% of use cases. Kafka only for event sourcing requirements.

5. Orchestration: Temporal vs Step Functions vs Airflow

Decision: Temporal

Criteria	Temporal	Step Functions	Airflow
Long-Running	Unlimited	1 year max	DAG-based
Code-Based Workflows	Yes (Go, Java, etc.)	JSON/YAML	Python
Retries	Sophisticated	Basic	Basic
Visibility	Excellent UI	CloudWatch	Good UI
Testing	Unit testable	Difficult	Moderate
Cost	Self-hosted	Per transition	Self-hosted

Why Temporal

Code-Based: Workflows as Go code, testable
Long-Running: Approval workflows can take weeks
Saga Pattern: Complex multi-step provisioning
Visibility: Built-in workflow history
Self-Healing: Automatic retry with backoff

Cost Analysis

Component	Temporal (Self-Hosted)	Step Functions
Compute	2x m6i.large = $138/mo	N/A
Transitions (1M/mo)	N/A	$25
Storage	50GB EBS = $5/mo	Included
Monthly Total	$143	$25

But: Step Functions has state transition limits, no local testing, and JSON-based workflows. Temporal's code-first approach worth the cost for complex workflows.

6. AI Provider: Anthropic Claude vs OpenAI GPT-4

Decision: Anthropic Claude Opus 4.6 (Primary), OpenAI GPT-4 (Fallback)

Criteria	Claude Opus 4.6	GPT-4 Turbo
Context Window	200K tokens	128K tokens
Speed	Moderate	Fast
Reasoning	Excellent	Excellent
Coding	Excellent	Excellent
Cost (Input)	$15/1M tokens	$10/1M tokens
Cost (Output)	$75/1M tokens	$30/1M tokens
Rate Limits	Lower	Higher

Why Claude Opus 4.6 (Primary)

Context Window: 200K tokens for large finding batches
Reasoning: Better at nuanced security analysis
Structured Output: More consistent JSON responses
Less Hallucination: More conservative in risk assessment

Why GPT-4 (Fallback)

Rate Limits: Higher throughput when needed
Cost: 60% cheaper for output-heavy workloads
Availability: Different failure domains

Cost Analysis (100K findings/month, ~500 tokens/finding)

Metric	Claude Opus 4.6	GPT-4 Turbo
Input Tokens	50M @ $15/M = $750	50M @ $10/M = $500
Output Tokens	25M @ $75/M = $1,875	25M @ $30/M = $750
Monthly Total	$2,625	$1,250

Verdict: Use Claude for complex analysis (toxic combos, contextual risk), GPT-4 for high-volume simple enrichment. Hybrid approach ~$1,800/mo.

7. Secret Management: AWS Secrets Manager vs HashiCorp Vault

Decision: AWS Secrets Manager (Cloud), Vault (Hybrid)

Criteria	AWS Secrets Manager	HashiCorp Vault
Multi-Cloud	AWS only	Yes
Dynamic Secrets	Limited (RDS)	Extensive
PKI	No	Yes
Ops Overhead	None	High
Auditing	CloudTrail	Built-in
Cost	Per secret + API calls	Self-hosted

Why AWS Secrets Manager (Cloud-Only)

Zero Ops: Fully managed
IAM Integration: Native AWS IAM policies
Rotation: Built-in for RDS, Redshift
Lambda Integration: Seamless for serverless

Why Vault (Hybrid/Multi-Cloud)

Multi-Cloud: Single pane for AWS/Azure/GCP
Dynamic DB Creds: Ephemeral credentials
PKI: Certificate management
SSH: Dynamic SSH credentials

Cost Analysis (100 secrets, 100K API calls/month)

Component	AWS Secrets Manager	HashiCorp Vault (Self-Hosted)
Secrets (100)	$40/mo	N/A
API Calls (100K)	$5/mo	N/A
Compute	N/A	2x t3.medium = $60/mo
Storage	N/A	50GB EBS = $5/mo
Monthly Total	$45	$65

Verdict: AWS Secrets Manager for AWS-only. Vault if multi-cloud or need dynamic secrets/PKI.

8. Summary: Monthly Cost Estimate

Small Deployment (10 users, 10K findings/month)

Component	Choice	Monthly Cost
Compute (EKS)	2x m6i.large + control plane	$211
Database (RDS)	db.t3.medium	$50
Cache (ElastiCache)	cache.t3.micro	$12
Queue (SQS)	Standard	$5
AI (Claude)	10K findings	$260
Secrets	Secrets Manager	$10
Networking	NAT, LB	$50
Total		~$600/mo

Medium Deployment (100 users, 100K findings/month)

Component	Choice	Monthly Cost
Compute (EKS)	4x m6i.large + control plane	$365
Database (RDS)	db.r6g.large Multi-AZ	$440
Cache (ElastiCache)	cache.r6g.large	$220
Queue (SQS)	Standard	$15
AI (Hybrid)	100K findings	$1,800
Secrets	Secrets Manager	$50
Networking	NAT, LB, Transit Gateway	$200
Monitoring	CloudWatch, X-Ray	$100
Total		~$3,200/mo

Large Deployment (1000 users, 1M findings/month)

Component	Choice	Monthly Cost
Compute (EKS)	10x m6i.xlarge + control plane	$1,500
Database (RDS)	db.r6g.2xlarge Multi-AZ	$1,760
Cache (ElastiCache)	cache.r6g.xlarge cluster	$880
Queue (Kafka MSK)	3x kafka.m5.large	$660
AI (Hybrid)	1M findings	$15,000
Secrets	Vault cluster	$300
Networking	Full mesh, WAF	$800
Monitoring	Datadog/New Relic	$500
Total		~$21,400/mo

9. Cost Optimization Recommendations

Reserved Instances: 35-50% savings on compute/database
Savings Plans: Commit to 1-year for additional discounts
Spot Instances: Use for batch processing (Checkov scans)
AI Caching: Cache AI responses for similar findings (30% reduction)
S3 Intelligent-Tiering: For finding archives
Right-Sizing: Monthly review of instance utilization

1. Cache Layer: Redis vs Memcached​

Decision: Redis​

Why Redis​

Cost Analysis (AWS ElastiCache)​

2. Compute: Lambda vs EC2 vs EKS​

Decision: EKS (Primary), Lambda (Event Processing)​

Why EKS (Primary)​

Why Lambda (Event Processing)​

Cost Analysis (100 concurrent users, 1M API calls/month)​

3. Database: PostgreSQL vs MySQL vs DynamoDB​

Decision: PostgreSQL (Primary), DynamoDB (High-Scale Findings)​

Why PostgreSQL​

Why DynamoDB (Adjunct)​

Cost Analysis (500GB data, 10K reads/sec, 1K writes/sec)​

4. Message Queue: SQS vs Kafka vs RabbitMQ​

Decision: SQS (Primary), Kafka (High-Volume Streams)​

Why SQS​

Why Kafka (When Needed)​

Cost Analysis (1M messages/day)​

5. Orchestration: Temporal vs Step Functions vs Airflow​

Decision: Temporal​

Why Temporal​

Cost Analysis​

6. AI Provider: Anthropic Claude vs OpenAI GPT-4​

Decision: Anthropic Claude Opus 4.6 (Primary), OpenAI GPT-4 (Fallback)​

Why Claude Opus 4.6 (Primary)​

Why GPT-4 (Fallback)​

Cost Analysis (100K findings/month, ~500 tokens/finding)​

7. Secret Management: AWS Secrets Manager vs HashiCorp Vault​

Decision: AWS Secrets Manager (Cloud), Vault (Hybrid)​

Why AWS Secrets Manager (Cloud-Only)​

Why Vault (Hybrid/Multi-Cloud)​

Cost Analysis (100 secrets, 100K API calls/month)​

8. Summary: Monthly Cost Estimate​

Small Deployment (10 users, 10K findings/month)​

Medium Deployment (100 users, 100K findings/month)​

Large Deployment (1000 users, 1M findings/month)​

9. Cost Optimization Recommendations​