Architectural Impact Analysis: Single vs. Multi-AZ Deployments
Executive Summary
Research Overview
This comprehensive report evaluates the architectural implications of migrating from a multi-region AWS production environment (us-east-1 and us-west-2) to a single-region architecture utilizing Availability Zones (AZs).
The Strategic Divergence
| Stakeholder | Position | Rationale |
|---|---|---|
| Party_A | Single-AZ Active (Passive Standby) | Concerns about database "locking and deadlocks" |
| Party_B | Dual-AZ Active (High Availability) | Critical network security and connectivity requirements |
Technology Stack Analyzed
- Database: MongoDB Atlas (Replica Sets)
- Cache: Redis (ElastiCache)
- Messaging: Apache Kafka (Strimzi)
- Security: AWS Network Firewall
- Connectivity: Mikrotik-based VPNs
Key Finding
Multi-AZ Active-Active architecture is the only viable configuration for a production-grade environment.
- Party_A's concerns about "locking" stem from legacy shared-disk paradigms that do not apply to modern consensus-based systems
- Single-AZ creates catastrophic risks for the network layer (AWS Network Firewall and VPN are strictly zonal)
- A localized AZ failure would result in total blackout of all traffic
1. Theoretical Foundations of Distributed Consistency
Addressing the 'Locking' Concern
The fear of distributed locking typically stems from legacy RDBMS experiences with Two-Phase Commit (2PC) or shared-storage clustering (e.g., Oracle RAC). These paradigms do not apply to our stack.
1.1 Evolution from Locking to Consensus
Legacy Shared-Disk Clustering (e.g., Oracle RAC):
- Complex Distributed Lock Managers (DLM)
- Multiple servers accessing the same physical storage blocks
- Network failures → DLM freezes → perceived "deadlocks"
Problem: If the interconnect fails, the DLM freezes operations to prevent corruption
Modern Shared-Nothing Systems (MongoDB, Redis, Kafka):
- Each node has its own local storage and memory
- Consistency via replication and consensus protocols (Raft, Paxos)
- No global locks required
Benefit: Network latency between AZs does NOT extend write lock duration
No Global Locks in Multi-AZ MongoDB
In a Multi-AZ deployment, write operations do not require a "global lock" spanning both AZs. The Primary node:
- Acquires a local lock on the document
- Writes the data
- Allows Secondary to replicate asynchronously via Oplog
The network latency between AZs does not affect the application's write lock duration.
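A minimal sketch of this flow, assuming a pymongo client pointed at the replica set (the URI and collection names below are hypothetical): the insert is acknowledged by the Primary alone, so the presence of a Secondary in AZ2 never enters the lock path.

```python
from pymongo import MongoClient

# Hypothetical replica-set URI spanning two AZs; only the Primary acknowledges w:1 writes.
client = MongoClient(
    "mongodb://node-az1.example.internal,node-az2.example.internal/?replicaSet=rs0"
)
orders = client["shop"]["orders"]

# Default write concern is w:1: the Primary takes a local document-level lock,
# journals the write, and acknowledges. Replication to AZ2 happens afterwards
# via the oplog and never extends the lock duration.
result = orders.insert_one({"order_id": 1001, "status": "created"})
print(result.acknowledged)  # True as soon as the Primary has the write
```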
1.2 The "Active-Active" Terminology Gap
Semantic Misunderstanding
Much friction between Operations and Leadership stems from different interpretations of "Active-Active."
| Interpretation | Description | Risk Level |
|---|---|---|
| Party_A's Fear | Simultaneous writes to same record in both AZs | ⚠️ Valid concern (without CRDTs) |
| Party_B's Reality | Infrastructure availability with Primary-Replica database | ✅ Zero locking penalty |
Clarification:
- "2 AZs always active" refers to infrastructure availability
- Network paths, load balancers, compute resources handle traffic in both zones
- Database uses Primary-Replica model (Active-Passive writes, Active-Active reads)
- Provides instantaneous failover with zero locking penalty
1.3 Latency vs. Deadlock
Critical Distinction
These are fundamentally different concepts that must not be confused.
| Concept | Definition | Multi-AZ Impact |
|---|---|---|
| Latency | Time for signal to travel AZ1 ↔ AZ2 | < 2ms in AWS (negligible) |
| Deadlock | Two processes waiting for each other indefinitely | Architecturally impossible in MongoDB/Kafka |
Conclusion: Multi-AZ introduces minor latency, but the replication protocols in use are designed to avoid distributed deadlocks.
2. MongoDB Atlas Architecture Deep Dive
2.1 Asynchronous Replication Mechanics
Replica Set Topology
┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│  Primary (AZ1)   │─────▶│ Secondary (AZ2)  │      │  Arbiter (AZ3)   │
│  Accepts writes  │      │   Replicates     │      │   Tie-breaker    │
└──────────────────┘      └──────────────────┘      └──────────────────┘
Write Operation Flow
Default Write Concern (w:1):
sequenceDiagram
participant App
participant Primary as Primary (AZ1)
participant Secondary as Secondary (AZ2)
App->>Primary: Write Request
Primary->>Primary: Apply to memory + journal
Primary->>App: Acknowledge SUCCESS
Note over Primary,Secondary: Replication happens AFTER acknowledgment
Primary-->>Secondary: Async replication via Oplog
Locking Implications:
- ✅ Zero additional locking from AZ2 presence
- ✅ If link to AZ2 severed, Primary continues without pause
- ✅ "Deadlock" fear is unfounded
Majority Write Concern (w:majority):
sequenceDiagram
participant App
participant Primary as Primary (AZ1)
participant Secondary as Secondary (AZ2)
App->>Primary: Write Request (w:majority)
Primary->>Primary: Apply to memory + journal
Primary->>Secondary: Replicate
Secondary->>Primary: Acknowledge
Primary->>App: Acknowledge SUCCESS
Failure Handling:
- If the network to AZ2 fails → the operation times out (based on wtimeout)
- The database engine does NOT lock
- Other operations with lower write concerns continue
- The application receives a timeout error for graceful handling (see the sketch below)
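As a hedged illustration of that failure handling, the sketch below uses pymongo's WriteConcern with a wtimeout (hostnames and timeouts are hypothetical): on timeout the driver raises an error the application can catch, while the server itself keeps processing other operations.

```python
from pymongo import MongoClient, WriteConcern
from pymongo.errors import WTimeoutError

client = MongoClient(
    "mongodb://node-az1.example.internal,node-az2.example.internal/?replicaSet=rs0"
)

# Require acknowledgment from a majority of voting members, but give up
# after 5 seconds if the Secondary in AZ2 is unreachable.
orders = client["shop"].get_collection(
    "orders", write_concern=WriteConcern(w="majority", wtimeout=5000)
)

try:
    orders.insert_one({"order_id": 1002, "status": "created"})
except WTimeoutError:
    # The write may still be applied on the Primary; the engine is not locked.
    # Retry, degrade to w:1, or surface the error -- an application decision.
    print("Majority acknowledgment timed out; handle gracefully")
```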
2.2 Dangers of Single-AZ (Cold Standby)
Party_A's Proposal: Passive Second AZ
Resources in AZ2 are turned off or not part of replication quorum until disaster occurs.
Data Loss Risk (RPO)
AZ1 Failure (fire, power loss)
              │
              ▼
┌───────────────────────────────────────────┐
│  Replication to AZ2 was NOT continuous    │
│        (because it was "Passive")         │
└───────────────────────────────────────────┘
              │
              ▼
ALL DATA SINCE LAST SNAPSHOT
IS IRRETRIEVABLY LOST
(RPO > 0)
Recovery Time Comparison
| Metric | Single-AZ (Cold Standby) | Multi-AZ (Hot Standby) |
|---|---|---|
| Provision instances | 5-15 minutes | N/A (already running) |
| Restore from snapshots | Hours (for TB of data) | N/A |
| Warm cache | Additional hours | Already warm |
| Total RTO | Hours of downtime | 2-10 seconds |
2.3 Distributed Deadlock Myth
Architecturally Impossible
MongoDB uses document-level locking within the WiredTiger storage engine.
Key Facts:
- Locks are local to the node
- No protocol attempts to acquire locks on AZ1 and AZ2 simultaneously
- Classic "distributed deadlock" scenario is impossible in standard Replica Set deployment
Feature Comparison
| Feature | Single-AZ (Party_A) | Multi-AZ (Party_B) |
|---|---|---|
| Write Availability | ❌ Vulnerable to single DC failure | ✅ High Availability (auto failover) |
| Read Scalability | ❌ Limited to single node | ✅ Read Preference: Secondary offloads reads |
| Locking Overhead | None (Local only) | None (Local only) for w:1 |
| Failover Speed | ⏱️ Hours (restore from backup) | ⚡ Seconds (Raft election) |
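A brief sketch of the read-offloading row above, assuming pymongo and hypothetical hostnames: reads tagged secondaryPreferred are served by the AZ2 Secondary when it is available, without any cross-AZ locking; writes still go only to the Primary.

```python
from pymongo import MongoClient, ReadPreference

client = MongoClient(
    "mongodb://node-az1.example.internal,node-az2.example.internal/?replicaSet=rs0"
)

# Route reads to the Secondary in AZ2 when available, falling back to the
# Primary; write traffic is unaffected.
orders = client["shop"].get_collection(
    "orders", read_preference=ReadPreference.SECONDARY_PREFERRED
)
recent = list(orders.find({"status": "created"}).limit(10))
```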
3. Redis and Caching Strategy
3.1 Redis Topology Options
Redis Sentinel:
Purpose: HA for non-clustered Redis
- Monitors the Primary and replicas
- If the Primary in AZ1 fails → promotes the Replica in AZ2 (see the client sketch below)
Split Brain Risk
Requires quorum (usually 3 sentinels). Without 3rd witness:
- Neither side can elect leader, OR
- Both think they are leaders
This validates some Party_A concerns about complexity (but NOT locking).
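A minimal sketch of Sentinel-aware client behavior, assuming redis-py and hypothetical Sentinel addresses: the client always asks the Sentinels who the current Primary is, so an AZ1 failure only changes the answer; it never blocks writes behind a distributed lock.

```python
from redis.sentinel import Sentinel

# Hypothetical Sentinel endpoints, one per AZ plus a tie-breaker.
sentinel = Sentinel(
    [
        ("sentinel-az1.example.internal", 26379),
        ("sentinel-az2.example.internal", 26379),
        ("sentinel-az3.example.internal", 26379),
    ],
    socket_timeout=0.5,
)

# Sentinel resolves the current Primary; after a failover this transparently
# points at the promoted Replica in AZ2.
primary = sentinel.master_for("cache", socket_timeout=0.5)
replica = sentinel.slave_for("cache", socket_timeout=0.5)

primary.set("session:42", "active")
print(replica.get("session:42"))
```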
Redis Cluster (Cluster Mode Enabled):
Purpose: Sharded data across multiple nodes
- Robust but requires careful configuration
- "Replica Groups" must span AZs
- If the Master in AZ1 fails → the Slave in AZ2 takes over
3.2 Clarifying "Active-Active" for Redis
Previous Multi-Region Setup
"Active-Active" Redis used CRDTs (Conflict-Free Replicated Data Types) via Redis Enterprise, allowing simultaneous writes to same key in both US-East and US-West.
New Single-Region Setup:
Message for Party_A
Standard ElastiCache (Redis OSS) does NOT support Active-Active writes (multi-master).
| Aspect | Configuration |
|---|---|
| Writes | Go to AZ1 (Primary) |
| Reads | Can be served from both AZs |
| Failover | AZ2 is hot standby (automatic promotion) |
| Multi-Master | ❌ NOT used |
Result: Eliminates risk of "deadlocks" or write conflicts while preserving AZ failure survival.
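A short sketch of that split, assuming an ElastiCache replication group (the endpoint hostnames below are hypothetical): writes target the primary endpoint, reads may use the reader endpoint, and there is no multi-master path on which to conflict.

```python
import redis

# Hypothetical ElastiCache endpoints: one writable primary, one reader
# endpoint that load-balances across replicas in both AZs.
writer = redis.Redis(host="cache.primary.example.internal", port=6379)
reader = redis.Redis(host="cache.reader.example.internal", port=6379)

writer.set("feature:flags", "v2")   # Always lands on the Primary (AZ1)
print(reader.get("feature:flags"))  # May be served from the AZ2 replica
```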
3.3 The "Cold Cache" Catastrophe
Thundering Herd Problem
Party_A's suggestion of "Passive" second AZ (Cold Standby) is particularly dangerous for Redis.
graph TD
A[AZ1 Fails] --> B[Failover to Cold Redis in AZ2]
B --> C[Cache is EMPTY]
C --> D[Every Query Hits MongoDB]
D --> E[Massive Spike in Load]
E --> F[Database Crashes]
F --> G[Total Systemic Failure]
style A fill:#ff6b6b
style G fill:#ff6b6b
Solution: Multi-AZ Active (Hot Standby)
- Replica in AZ2 constantly receives data updates
- Upon failover, cache is already warm
- Protects database from thundering herd
- Maintains application performance
4. Kafka & Event Streaming (Strimzi)
4.1 Rack Awareness and Data Durability
Kafka's Critical Role
Kafka is the nervous system of the architecture. Its resilience relies on "Rack Awareness" configuration.
Configuration: Strimzi maps each broker to its AZ via the rack.topologyKey setting (shown in the Strimzi configuration under Strategic Recommendations).
Deployment Comparison
Scenario: All Kafka brokers in AZ1
| Event | Impact |
|---|---|
| AZ1 Failure | Total data unavailability |
| Storage Corruption | Permanent message queue loss |
| Recovery | No disaster recovery possible |
Scenario: Strimzi distributes partition replicas across AZs
- Partition leader in AZ1
- Follower in AZ2
- Producer writes with acks=all replicate to both AZs
Producer Configuration:
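A hedged sketch of such a producer, assuming the confluent-kafka Python client and hypothetical broker addresses: with acks=all the broker waits for the in-sync replicas (leader in AZ1, follower in AZ2) before acknowledging.

```python
from confluent_kafka import Producer

# Hypothetical bootstrap servers, one broker per AZ.
producer = Producer({
    "bootstrap.servers": "kafka-az1.example.internal:9092,kafka-az2.example.internal:9092",
    "acks": "all",                # Wait for all in-sync replicas (both AZs)
    "enable.idempotence": True,   # Safe retries without duplicates
})

def on_delivery(err, msg):
    # Invoked once the leader has confirmation from the ISR set.
    if err is not None:
        print(f"Delivery failed: {err}")

producer.produce("payments", key=b"order-1002", value=b"created", on_delivery=on_delivery)
producer.flush()
```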
Is This Locking?
No. This introduces minor latency (network RTT), but not a deadlock.
| Scenario | Behavior |
|---|---|
| AZ2 becomes slow | Producer latency increases (not locked) |
| AZ2 fails | ISR list shrinks, leader continues |
4.2 Pod Placement and Affinity
Kubernetes Configuration Required
Use podAntiAffinity rules to ensure brokers are strictly separated across AZs.
# Required Pod Anti-Affinity Configuration
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values: ["kafka"]
        topologyKey: topology.kubernetes.io/zone
Single-AZ Impact
Without Multi-AZ, proper pod distribution is impossible, creating a Single Point of Failure (SPOF) for the entire event pipeline.
5. Network Infrastructure & Security
The Critical Argument
While database arguments focus on performance and consistency, the Network Infrastructure argument is BINARY:
A Single-AZ deployment will NOT work reliably for the proposed security stack.
5.1 AWS Network Firewall: The Zonal Trap
Critical Limitation
AWS Network Firewall is a zonal service (not regional). It's instantiated in a specific AZ using Gateway Load Balancer (GWLB) endpoints.
Single-AZ Failure Mode
Scenario: Firewall deployed only in AZ1 (Party_A's "Passive AZ2" request)
graph TD
subgraph "Normal Operation"
IGW[Internet Gateway] --> FW1[Firewall AZ1]
FW1 --> APP1[App Servers AZ1]
FW1 --> APP2[App Servers AZ2]
end
subgraph "AZ1 Failure"
IGW2[Internet Gateway] -.->|BLOCKED| FW2[Firewall AZ1 ❌]
FW2 -.->|UNREACHABLE| APP3[App Servers AZ1 ❌]
FW2 -.->|NO PATH| APP4[App Servers AZ2 ✅ but orphaned]
end
Impact of AZ1 Failure
- Fiber cut or GWLB control plane outage in AZ1
- Firewall endpoint becomes unreachable
- ALL ingress/egress traffic for entire VPC stops
- Traffic destined for healthy servers in AZ2 is also blocked
- "Passive" AZ2 infrastructure is orphaned (no path to internet/corporate network)
Result: Total Blackout (even though AZ2 compute resources are healthy)
Multi-AZ Solution
Architecture:
- Firewall endpoint in dedicated subnet in each AZ
- Symmetric routing (stateful inspection requirement)
- Traffic for AZ2 stays within AZ2
Fault Isolation
If AZ1 fails, AZ2 traffic continues uninterrupted via AZ2 firewall endpoint.
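As a hedged sketch of that layout, assuming boto3 and hypothetical subnet IDs and ARNs: AWS Network Firewall's CreateFirewall API accepts one subnet mapping per AZ, which is what creates the per-AZ endpoints the route tables then point at.

```python
import boto3

# Hypothetical IDs/ARNs; one dedicated firewall subnet per AZ in the SEC-VPC.
network_firewall = boto3.client("network-firewall", region_name="us-east-1")

network_firewall.create_firewall(
    FirewallName="sec-vpc-firewall",
    FirewallPolicyArn="arn:aws:network-firewall:us-east-1:123456789012:firewall-policy/base-policy",
    VpcId="vpc-0abc1234",
    SubnetMappings=[
        {"SubnetId": "subnet-az1-firewall"},  # Endpoint in AZ1
        {"SubnetId": "subnet-az2-firewall"},  # Endpoint in AZ2
    ],
)
# Route tables in each AZ must then send traffic to the endpoint in the SAME
# AZ, preserving the symmetric routing that stateful inspection requires.
```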
Cost Analysis:
| Aspect | Value |
|---|---|
| Additional Cost | ~$0.395/hour per endpoint |
| Monthly Cost | ~$284 |
| Value | Prevents entire production from becoming black hole |
5.2 Mikrotik VPN and BGP Failover
VPN Architecture
Dual Mikrotik RouterOS instances terminate IPsec VPNs connecting AWS to the on-premises datacenter.
Multi-AZ BGP Architecture
graph LR
subgraph "On-Premises"
OP[On-Prem Router]
end
subgraph "AWS"
OP -->|Tunnel A| VPN1[VPN Endpoint AZ1]
OP -->|Tunnel B| VPN2[VPN Endpoint AZ2]
VPN1 --> BGP1[BGP Session 1]
VPN2 --> BGP2[BGP Session 2]
BGP1 --> WL[AWS Workloads]
BGP2 --> WL
end
Failover Mechanics:
- Both tunnels exchange BGP routes
- If Tunnel A fails (AZ1 outage):
- BGP detects dead peer (DPD - Dead Peer Detection)
- Withdraws route from Tunnel A
- Traffic automatically shifts to Tunnel B
- Failover time: Seconds
No Locking in BGP
BGP is a routing protocol. "Active-Active" means both paths are valid for:
- Load balancing (ECMP)
- Fast failover without database consistency risks
Maintenance Benefits
Multi-AZ:
- AWS patches one AZ at a time
- BGP session shifts traffic to the other AZ
- Zero downtime
Single-AZ:
- Patch window = hard connectivity outage
- Manual intervention required
- Service disruption inevitable
5.3 Service Chain Dependencies
The Bottleneck
Traffic flows through: DMZ-VPC → SEC-VPC (Firewall) → WL-VPC (Workload)
The Choke Point: SEC-VPC (Security VPC)
- If the SEC-VPC is Single-AZ, it is a SPOF for every application
- Even if the workload spans 2 AZs (one passive), the Security VPC must span both AZs
- This is required to provide a path for the passive workload to become active
6. Operational Resilience & Maintenance
6.1 Rolling Updates and Patching
Cloud Infrastructure Lifecycle
Cloud infrastructure is ephemeral and requires regular maintenance.
Single-AZ Process:
- Schedule service window (downtime)
- Stop service
- Apply patches
- Restart service
- Verify functionality
Result: Planned downtime for every update
Multi-AZ Process:
- Patch Secondary in AZ2 (no impact)
- Seamless failover (step-down Primary in AZ1)
- AZ2 promoted to Primary (2-10 seconds)
- Patch former Primary in AZ1
- Revert to normal operation
Result: Zero downtime deployment
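A hedged sketch of the step-down in that flow, assuming pymongo and a hypothetical maintenance script: replSetStepDown is a standard replica-set command, and the election of the AZ2 Secondary typically completes within seconds.

```python
from pymongo import MongoClient
from pymongo.errors import AutoReconnect

client = MongoClient(
    "mongodb://node-az1.example.internal,node-az2.example.internal/?replicaSet=rs0"
)

# Ask the current Primary (AZ1) to step down for 60 seconds so the patched
# Secondary in AZ2 can win the election and take writes.
try:
    client.admin.command("replSetStepDown", 60)
except AutoReconnect:
    # Older servers drop the connection during step-down; the driver
    # rediscovers the new Primary either way.
    pass

# The application keeps writing through the same client while AZ1 is patched.
```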
6.2 Split Brain Risk and Mitigation
Valid Technical Concern
The one valid concern in a 2-AZ setup is the "Split Brain" scenario.
The Risk:
graph TD
A[Network Link Fails Between AZs] --> B[Both AZs Remain Up]
B --> C[Primary in AZ1 Can't Contact AZ2]
C --> D{Without Quorum}
D -->|AZ1 continues| E[Accepts writes]
D -->|AZ2 elects self| F[Also accepts writes]
E --> G[TWO PRIMARIES]
F --> G
G --> H[Data Corruption]
style H fill:#ff6b6b
The Solution: Quorum with Arbiter
Tie-Breaker Pattern
Deploy a lightweight MongoDB Arbiter in a third AZ (or in a different region if a third AZ is unavailable).
graph TD
subgraph "AZ1 Isolated"
P[Primary AZ1]
end
subgraph "AZ2 + AZ3"
S[Secondary AZ2]
A[Arbiter AZ3]
end
P -.->|Cannot see majority| P2[Steps Down]
S -->|Sees Arbiter| S2[Forms Majority]
S2 --> S3[Promoted to Primary]
style P2 fill:#ffd93d
style S3 fill:#6bcb77
Result:
- If AZ1 is isolated → the Primary realizes it cannot see a majority → steps down
- AZ2 sees the Arbiter → forms a majority → promotes itself
- Consistency preserved. No locking. No split brain.
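A hedged sketch of that quorum layout, assuming pymongo against a fresh deployment (hostnames hypothetical): the third member carries arbiterOnly, so it votes in elections but stores no data.

```python
from pymongo import MongoClient

# Connect directly to the node that will become the first Primary.
seed = MongoClient("node-az1.example.internal", 27017, directConnection=True)

# Three voting members across three AZs; the arbiter breaks ties but holds no data.
config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "node-az1.example.internal:27017"},
        {"_id": 1, "host": "node-az2.example.internal:27017"},
        {"_id": 2, "host": "arbiter-az3.example.internal:27017", "arbiterOnly": True},
    ],
}
seed.admin.command("replSetInitiate", config)
```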
7. Failure Simulation: Power Event in AZ1
Impact Comparison Table
| Timeline | Single-AZ Active (Party_A) | Dual-AZ Active (Party_B) |
|---|---|---|
| T+0s | Power fails in AZ1 | Power fails in AZ1 |
| T+1s | Application hard stop. VPN disconnects. Firewall unreachable. | Load Balancer health checks fail for AZ1. BGP sessions on Tunnel A drop. |
| T+5s | Manual intervention: Team alerted. Must spin up instances in AZ2 (Cold Start). | Automated Failover: MongoDB elects AZ2 Secondary. Redis promotes AZ2 Replica. VPN routes to Tunnel B. |
| T+30s | Downtime continues. Database restoring from snapshots. Network re-routing manually. | Service Restored. Application running on AZ2. Caches warm. |
| T+1h | Still recovering large datasets. Thundering herd as caches warm. | Operations investigates AZ1 root cause. Business continues normally. |
| Data Integrity | ⚠️ Potential Data Loss (RPO > 0) | ✅ Zero Data Loss (with w:majority) |
| Locking? | N/A (System is dead) | ✅ No locking observed |
8. Financial and Risk Modeling
8.1 The Cost of Downtime
Business Context
Production environment running Kafka, Redis, and MongoDB implies high-volume transactions (likely financial or messaging based on "SMS-service").
Risk Formula: Expected Annual Loss = Probability of Failure × Cost of Downtime
| Architecture | Annual Failure Probability | Downtime Cost | Risk Level |
|---|---|---|---|
| Single-AZ | ~0.1% per year per AZ | Catastrophic (hours of downtime) | 🔴 HIGH |
| Multi-AZ | Infinitesimal (requires simultaneous dual-AZ failure) | Minimal (seconds of brownout) | 🟢 LOW |
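A toy calculation of that formula with purely illustrative numbers (the probabilities, recovery times, and hourly cost below are hypothetical placeholders, not measurements):

```python
# Hypothetical inputs for illustration only.
cost_per_hour = 50_000.0          # assumed business cost of downtime ($/hour)

# Single-AZ: an AZ event costs hours of cold-standby recovery.
p_single_az_event = 0.001         # ~0.1% chance per year (assumed)
single_az_outage_hours = 6.0      # assumed recovery time

# Multi-AZ: both AZs must fail together; the outage is a brief brownout.
p_dual_az_event = 0.001 ** 2      # independent simultaneous failures (assumed)
multi_az_outage_hours = 10 / 3600 # ~10 seconds expressed in hours

risk_single = p_single_az_event * single_az_outage_hours * cost_per_hour
risk_multi = p_dual_az_event * multi_az_outage_hours * cost_per_hour

print(f"Single-AZ expected annual loss: ${risk_single:,.2f}")
print(f"Multi-AZ expected annual loss:  ${risk_multi:,.6f}")
```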
8.2 Hidden Costs of Single-AZ
Often Overlooked Expenses
| Hidden Cost | Description |
|---|---|
| Cross-AZ Data Transfer | If any peripheral services (backups, logs, monitoring) run in another AZ, you pay transfer fees anyway |
| Emergency Engineering | Overtime and emergency contractors for recovery often exceed annual cost of redundant instances |
| Reputation Damage | Customer trust erosion from extended outages |
| SLA Penalties | Contractual penalties for failing availability commitments |
9. Strategic Recommendations
Summary of Recommendations
1. Adopt Multi-AZ Active-Active Infrastructure
The Operations team's requirement for 2 AZs is architecturally mandatory for:
- AWS Network Firewall
- VPN connectivity
A Single-AZ network layer is a single point of failure that compromises the entire stack.
2. Implement Active-Standby Database Topology
Clarification for Party_A
"Active-Active" infrastructure does NOT mean "Multi-Master Writes."
| Component | Configuration |
|---|---|
| MongoDB | 3-Voting-Node Replica Set (Primary AZ1, Secondary AZ2, Arbiter AZ3). Use w:1 for standard operations. |
| Redis | ElastiCache Multi-AZ with Automatic Failover. Hot standby (Active-Passive writes). |
3. Optimize Kafka for Durability
# Strimzi Configuration
kafka:
  config:
    min.insync.replicas: 2
  rack:
    topologyKey: topology.kubernetes.io/zone
4. Security Architecture
- Deploy AWS Network Firewall endpoints in both AZs with symmetric routing
- Configure Mikrotik VPNs with BGP and BFD for sub-second failover
Final Conclusion
The Verdict
The fear of "database deadlocks" in a Multi-AZ cloud architecture is a legacy concern that does not apply to modern consensus-based systems (MongoDB, Kafka).
The Real Risks of Single-AZ:
- ❌ Total loss of network connectivity
- ❌ Inability to fail over stateful services
- ❌ Extended downtime (hours vs. seconds)
- ❌ Potential data loss
The Bottom Line:
Dual-AZ Active architecture is not merely an 'option' for High Availability; it is the fundamental baseline for a reliable, production-grade AWS environment.