Single Region Multi-AZ Resiliency

This design transitions from Multi-Region to a mirrored two-AZ topology in US-East. The objective: maximize High Availability while eliminating "Cold Path" failure risks.

The Core Dilemma

  1. Consolidation: Moving all workloads to one region to reduce cost and complexity.

  2. The Conflict: Party_A prefers Active/Passive (fear of locking). Party_B prefers Active/Warm (fear of silent failure).

  3. The Goal: Prove that Multi-AZ Active/Active is safe and superior.

Architecture Topology

A mirrored stack across two Availability Zones.

  • Cloudflare (WAF): HTTPS / API
  • AWS Global Accelerator: TCP / SIM
  • AWS US Region
      Internet Gateway (IGW)
      Zone A (Primary)
        Ingress Firewall (Network Firewall Endpoint)
        ALB / NLB (Load Balancing)
        EKS Cluster A (App Pods, Kafka Workers)
      Zone B (Standby)
        Ingress Firewall (Network Firewall Endpoint)
        ALB / NLB (Load Balancing)
        EKS Cluster B (App Pods, Kafka Workers)
      Shared Data Plane (Active/Active)
        Redis Cluster
        MongoDB Atlas

Decision Matrix

Analysis: Sending 10% of traffic to Zone B provides the best balance. It continuously validates network paths, firewall rules, and IAM permissions in Zone B without the complexity of full bi-directional scaling.
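To make the 10% split concrete, here is a minimal sketch of a smooth weighted round-robin scheduler. It is only an illustration of what a 90/10 distribution looks like request by request; in practice the split would be configured at the load-balancing layer rather than in application code. The zone names and the 9:1 weights are assumptions chosen to match the 10% figure above.

```python
from collections import Counter


def weighted_schedule(weights, n):
    """Return n picks using smooth weighted round-robin.

    Each round, every target's running score grows by its weight; the
    highest score wins and is then reduced by the total weight. This
    spreads the low-weight target evenly instead of bunching its turns.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


# Hypothetical names and weights: 90% to Zone A, 10% to Zone B.
ZONE_WEIGHTS = {"zone-a": 9, "zone-b": 1}

picks = weighted_schedule(ZONE_WEIGHTS, 100)
counts = Counter(picks)
print(counts)
```

Because Zone B receives a steady trickle of real requests, a broken firewall rule or IAM permission there shows up as live errors rather than staying hidden until a failover.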

Risk vs. Value Profile

  • Traffic Distribution: 10% to Zone B
  • Failover Time: ~5s
  • Resource Waste: Low

Debunking the "Locking" Myth

The fear of database locking is inherited from Multi-Region architectures where latency is high. In a Single-Region Multi-AZ setup, the physics change completely.

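1. Sub-2ms Latency

Latency between AZs is < 2ms. To Redis and MongoDB, this looks close to a local LAN, so replication and election protocols (MongoDB's is Raft-based) run without the delays seen across regions.

2. No Application Locks Needed

MongoDB uses Primary/Secondary election. Even in Active/Active, apps in both zones write to the *same* Primary. Redis Active-Active databases (a Redis Enterprise feature) use CRDTs to merge concurrent writes deterministically; a standard Redis Cluster instead routes each key's writes to the single primary that owns its hash slot.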
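The idea of merging writes without locks can be illustrated with a grow-only counter (G-Counter), one of the simplest CRDTs. This is a teaching sketch, not how Redis implements its data types; the `GCounter` class and the zone names are hypothetical.

```python
class GCounter:
    """Grow-only counter CRDT: each replica only increments its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> increments seen from that replica

    def increment(self, amount=1):
        mine = self.counts.get(self.replica_id, 0)
        self.counts[self.replica_id] = mine + amount

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self):
        return sum(self.counts.values())


# Two replicas accept writes independently, with no lock between them.
zone_a = GCounter("zone-a")
zone_b = GCounter("zone-b")
zone_a.increment(3)
zone_b.increment(2)

# Exchange state in both directions; repeating a merge changes nothing.
zone_a.merge(zone_b)
zone_b.merge(zone_a)
zone_a.merge(zone_b)

print(zone_a.value(), zone_b.value())
```

Neither replica waits on the other before accepting a write, which is why this style of replication does not need application-level locks.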

Latency Impact on Consistency

Lower latency means a shorter replication window, which lowers the risk of sync issues.
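A rough back-of-the-envelope model shows why the replication window matters. The sketch below estimates how many other writes to the same key are expected to arrive while one write is still replicating. The 2 ms figure is the cross-AZ upper bound quoted above; the 70 ms cross-region delay and the rate of 5 writes per second are illustrative assumptions, not measurements.

```python
def overlapping_writes(writes_per_second, replication_delay_ms):
    """Expected writes to the same key arriving within one replication window.

    First-order estimate only: it assumes writes arrive at a steady
    average rate and ignores bursts.
    """
    return writes_per_second * (replication_delay_ms / 1000.0)


CROSS_AZ_MS = 2.0       # upper bound quoted in the text
CROSS_REGION_MS = 70.0  # assumed value, for comparison only
KEY_WRITE_RATE = 5.0    # assumed writes per second to one hot key

cross_az = overlapping_writes(KEY_WRITE_RATE, CROSS_AZ_MS)
cross_region = overlapping_writes(KEY_WRITE_RATE, CROSS_REGION_MS)

print(f"cross-AZ:     {cross_az:.3f} overlapping writes per window")
print(f"cross-region: {cross_region:.3f} overlapping writes per window")
```

Under these assumptions the exposure scales linearly with delay, so moving from 70 ms to 2 ms shrinks the window for conflicting writes by the same factor of 35.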