Skip to main content

Runbook: Cloud Aegis Deployment

Overview

This runbook covers deploying Cloud Aegis to production, including:

  • Container image builds
  • Database migrations
  • Service rollout
  • Verification procedures

Prerequisites

  • Access to CI/CD pipeline (GitHub Actions)
  • kubectl access to production cluster
  • Database migration permissions
  • Approval from change management (if required)

Pre-Deployment Checklist

# 1. Verify current service health
kubectl get pods -n aegis
kubectl top pods -n aegis

# 2. Check pending database migrations
./aegis migrate status

# 3. Verify ECS service status
aws ecs describe-services --cluster aegis-personal --services aegis-personal-api \
--profile lvn-personal --region us-east-1 --query 'services[0].{Status:status,Running:runningCount}'

# 4. Check CF Pages deployment
wrangler pages deployment list --project-name cloudguard | head -5

Deployment Procedure

Option A: Fly.io Deployment (Primary)

# 1. Deploy to Fly.io (uses fly.toml at repo root)
fly deploy

# 2. Monitor deployment
fly status -a cloudforge-api
fly logs -a cloudforge-api

# 3. Verify health
curl -s https://api.cloudforge-demo.lvonguyen.com/health | jq .

Option B: Standard CI/CD (Alternative — Kubernetes)

# 1. Create release tag
git tag v1.2.3
git push origin v1.2.3

# 2. Monitor pipeline
# GitHub Actions will:
# - Run tests
# - Build container image
# - Push to registry
# - Apply Kubernetes manifests
# - Run smoke tests

# 3. Verify deployment
kubectl rollout status deployment/aegis-api -n aegis

Option C: Manual Deployment (Emergency — Kubernetes)

# 1. Build and push image
docker build -t aegis:v1.2.3 .
docker tag aegis:v1.2.3 123456789.dkr.ecr.us-west-2.amazonaws.com/aegis:v1.2.3
docker push 123456789.dkr.ecr.us-west-2.amazonaws.com/aegis:v1.2.3

# 2. Update deployment
kubectl set image deployment/aegis-api \
api=123456789.dkr.ecr.us-west-2.amazonaws.com/aegis:v1.2.3 \
-n aegis

# 3. Wait for rollout
kubectl rollout status deployment/aegis-api -n aegis --timeout=300s

Database Migration

# 1. Run migrations in dry-run mode first
./aegis migrate --dry-run

# 2. Apply migrations
./aegis migrate up

# 3. Verify migrations
./aegis migrate status

Verification

API Health Check

# Check health endpoint
curl -s https://api.aegis.io/health | jq .

# Expected response:
# {
# "status": "healthy",
# "version": "1.2.3",
# "components": { ... }
# }

Functional Verification

# Test finding creation
curl -X POST https://api.aegis.io/api/v1/findings \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"title": "Test Finding", "severity": "low"}'

# Verify in UI
open https://app.aegis.io/findings

Metrics Verification

# Check Prometheus targets
curl -s http://prometheus:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job=="aegis")'

# Check error rate
curl -s 'http://prometheus:9090/api/v1/query?query=rate(aegis_http_requests_total{status=~"5.."}[5m])'

Rollback Procedure

Automatic Rollback (Kubernetes)

# Rollback to previous version
kubectl rollout undo deployment/aegis-api -n aegis

# Verify rollback
kubectl rollout status deployment/aegis-api -n aegis

Database Rollback

# Rollback last migration
./aegis migrate down 1

# Rollback to specific version
./aegis migrate goto 20260103120000

Post-Deployment

  1. Verify all pods healthy
  2. Check error rate in Grafana
  3. Verify log shipping working
  4. Update deployment ticket
  5. Notify stakeholders

Escalation

ConditionAction
Deployment failsRollback, then investigate
Error rate >1%Rollback immediately
Performance degradation >20%Consider rollback
Security vulnerabilityEmergency rollback

Contact

  • On-Call: PagerDuty
  • Platform Team: #platform-support (Slack)
  • Security Team: #security-ops (Slack)