ADR-019: Multi-Tenant Data Isolation at the Database Layer
Status
Accepted
Date
2026-03-23
Context
Cloud Aegis supports multi-tenancy via a middleware layer that resolves the tenant from JWT claims, HTTP headers, or subdomain, and injects a tenant.Config into the request context. However, no database query enforces tenant isolation. All SQL queries operate without a WHERE tenant_id = ? filter, meaning:
- A compromised or misconfigured middleware allows full cross-tenant data access.
- Internal service calls that bypass HTTP middleware (background jobs, event handlers) have no tenant boundary.
- A bug in the tenant resolution logic silently serves data from the wrong tenant.
The architecture grill assessment scored the resilience pillar at 2.8/5 and security pillar at 3.9/5, partly due to this gap. The red team confirmed that middleware-only tenant isolation is insufficient for a multi-tenant SaaS deployment.
Decision Drivers
- Compliance: SOC 2 Type II and ISO 27001 require demonstrable data isolation controls.
- Defense in depth: Middleware is a single layer; database-layer enforcement provides a second barrier.
- Whitelabel readiness: HAEA deployment serves multiple business units (HMA, KNA, HMNA, HMGNA) — each needs data isolation.
Decision
Add a tenant_id column (with denormalized tenant_name for auditability) to all query-scoped database tables and enforce WHERE tenant_id = $N in every SQL query. The tenant ID is extracted from request context at the data access layer, with a fallback to "default" for backward-compatible single-tenant operation.
Implementation Details
- Migration 005 adds
tenant_id VARCHAR(50) NOT NULL DEFAULT 'default'andtenant_name VARCHAR(255)to all tables. tenantFromCtx(ctx)helper extracts(id, name)fromtenant.FromContext(ctx).- Every
SELECT,INSERT,UPDATEinpostgres_provider.goincludestenant_idin the query. - Indexes on
(tenant_id, created_at DESC)cover the primary query pattern.
Tables Affected
| Table | tenant_id | tenant_name | Index |
|---|---|---|---|
| exception_requests | yes | yes | (tenant_id, created_at DESC) |
| approval_chain | yes | no | via exception FK |
| exception_audit_log | yes | no | via exception FK |
| risk_assessments | yes | no | via exception FK |
| compensating_controls | yes | no | via exception FK |
| findings | yes | yes | (tenant_id) |
| compliance_frameworks | yes | no | reference data |
| ai_agents | yes | no | - |
| agent_traces | yes | no | (tenant_id) |
| remediations | yes | yes | (tenant_id) |
| cost_summaries | yes | no | (tenant_id) |
| audit_log | yes | yes | (tenant_id, timestamp DESC) |
Consequences
Positive
- Cross-tenant data access is structurally impossible at the SQL layer.
- Background jobs and internal calls inherit tenant context, preventing accidental leaks.
tenant_namedenormalization eliminates join lookups for audit/export use cases.DEFAULT 'default'preserves backward compatibility for single-tenant deployments.
Negative
- Every SQL query now has an additional
AND tenant_id = $Nclause (minor performance cost, offset by index). - Developers must remember to pass context through to data access methods.
- Migration adds columns to all tables (one-time schema change, handled by
IF NOT EXISTS).
Risks
- If
tenantFromCtxreturns"default"when it shouldn't (context not propagated), queries will return empty results rather than cross-tenant data. This is a fail-safe default.
Alternatives Considered
- Row-Level Security (RLS): PostgreSQL RLS policies could enforce tenant isolation transparently. Rejected because: (a) requires
SET LOCALper connection, complicating connection pooling; (b) less visible in code reviews; (c) harder to test. - Schema-per-tenant: Separate schemas per tenant. Rejected because: (a) migration complexity scales linearly with tenants; (b) cross-tenant reporting requires schema-qualified queries.
- Middleware-only (status quo): Rejected per red team findings — single layer, no defense in depth.