Architectural Impact Analysis: Single vs. Multi-AZ Deployments
Executive Summary
Research Overview
This comprehensive report evaluates the architectural implications of migrating from a multi-region AWS production environment (us-east-1 and us-west-2) to a single-region architecture utilizing Availability Zones (AZs).
The Strategic Divergence
| Stakeholder | Position | Rationale |
|---|---|---|
| Party_A | Single-AZ Active (Passive Standby) | Concerns about database "locking and deadlocks" |
| Party_B | Dual-AZ Active (High Availability) | Critical network security and connectivity requirements |
Technology Stack Analyzed
- Database: MongoDB Atlas (Replica Sets)
- Cache: Redis (ElastiCache)
- Messaging: Apache Kafka (Strimzi)
- Security: AWS Network Firewall
- Connectivity: Mikrotik-based VPNs
Key Finding
Multi-AZ Active-Active architecture is the only viable configuration for a production-grade environment.
- Party_A's concerns about "locking" stem from legacy shared-disk paradigms that do not apply to modern consensus-based systems
- Single-AZ creates catastrophic risks for the network layer (AWS Network Firewall and VPN are strictly zonal)
- A localized AZ failure would result in total blackout of all traffic
1. Theoretical Foundations of Distributed Consistency
Addressing the 'Locking' Concern
The fear of distributed locking typically stems from legacy RDBMS experiences with Two-Phase Commit (2PC) or shared-storage clustering (e.g., Oracle RAC). These paradigms do not apply to our stack.
1.1 Evolution from Locking to Consensus
Legacy Shared-Disk Clustering (e.g., Oracle RAC):
- Complex Distributed Lock Managers (DLM)
- Multiple servers accessing the same physical storage blocks
- Network failures → DLM freezes → perceived "deadlocks"
Problem: If the interconnect fails, the DLM freezes operations to prevent corruption
Modern Shared-Nothing Systems (MongoDB, Redis, Kafka):
- Each node has its own local storage and memory
- Consistency via replication and consensus protocols (Raft, Paxos)
- No global locks required
Benefit: Network latency between AZs does NOT extend write lock duration
No Global Locks in Multi-AZ MongoDB
In a Multi-AZ deployment, write operations do not require a "global lock" spanning both AZs. The Primary node:
- Acquires a local lock on the document
- Writes the data
- Allows Secondary to replicate asynchronously via Oplog
The network latency between AZs does not affect the application's write lock duration.
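A minimal sketch of this flow, assuming a pymongo client pointed at the replica set (the URI and collection names below are hypothetical): the insert is acknowledged by the Primary alone, so the presence of a Secondary in AZ2 never enters the lock path.

```python
from pymongo import MongoClient

# Hypothetical replica-set URI spanning two AZs; only the Primary acknowledges w:1 writes.
client = MongoClient(
    "mongodb://node-az1.example.internal,node-az2.example.internal/?replicaSet=rs0"
)
orders = client["shop"]["orders"]

# Default write concern is w:1: the Primary takes a local document-level lock,
# journals the write, and acknowledges. Replication to AZ2 happens afterwards
# via the oplog and never extends the lock duration.
result = orders.insert_one({"order_id": 1001, "status": "created"})
print(result.acknowledged)  # True as soon as the Primary has the write
```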
1.2 The "Active-Active" Terminology Gap
Semantic Misunderstanding
Much friction between Operations and Leadership stems from different interpretations of "Active-Active."
| Interpretation | Description | Risk Level |
|---|---|---|
| Party_A's Fear | Simultaneous writes to same record in both AZs | ⚠️ Valid concern (without CRDTs) |
| Party_B's Reality | Infrastructure availability with Primary-Replica database | ✅ Zero locking penalty |
Clarification:
- "2 AZs always active" refers to infrastructure availability
- Network paths, load balancers, compute resources handle traffic in both zones
- Database uses Primary-Replica model (Active-Passive writes, Active-Active reads)
- Provides instantaneous failover with zero locking penalty
1.3 Latency vs. Deadlock
Critical Distinction
These are fundamentally different concepts that must not be confused.
| Concept | Definition | Multi-AZ Impact |
|---|---|---|
| Latency | Time for signal to travel AZ1 ↔ AZ2 | < 2ms in AWS (negligible) |
| Deadlock | Two processes waiting for each other indefinitely | Architecturally impossible in MongoDB/Kafka |
Conclusion: Multi-AZ introduces minor latency, but the replication protocols in use are designed to avoid distributed deadlocks.
2. MongoDB Atlas Architecture Deep Dive
2.1 Asynchronous Replication Mechanics
Replica Set Topology
┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│  Primary (AZ1)   │─────▶│ Secondary (AZ2)  │      │  Arbiter (AZ3)   │
│  Accepts writes  │      │   Replicates     │      │   Tie-breaker    │
└──────────────────┘      └──────────────────┘      └──────────────────┘
Write Operation Flow
Default Write Concern (w:1):
sequenceDiagram
participant App
participant Primary as Primary (AZ1)
participant Secondary as Secondary (AZ2)
App->>Primary: Write Request
Primary->>Primary: Apply to memory + journal
Primary->>App: Acknowledge SUCCESS
Note over Primary,Secondary: Replication happens AFTER acknowledgment
Primary-->>Secondary: Async replication via Oplog
Locking Implications:
- ✅ Zero additional locking from AZ2 presence
- ✅ If link to AZ2 severed, Primary continues without pause
- ✅ "Deadlock" fear is unfounded
Majority Write Concern (w:majority):
sequenceDiagram
participant App
participant Primary as Primary (AZ1)
participant Secondary as Secondary (AZ2)
App->>Primary: Write Request (w:majority)
Primary->>Primary: Apply to memory + journal
Primary->>Secondary: Replicate
Secondary->>Primary: Acknowledge
Primary->>App: Acknowledge SUCCESS
Failure Handling:
- If the network to AZ2 fails → the operation times out (based on wtimeout)
- The database engine does NOT lock
- Other operations with lower write concerns continue
- The application receives a timeout error for graceful handling (see the sketch below)
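As a hedged illustration of that failure handling, the sketch below uses pymongo's WriteConcern with a wtimeout (hostnames and timeouts are hypothetical): on timeout the driver raises an error the application can catch, while the server itself keeps processing other operations.

```python
from pymongo import MongoClient, WriteConcern
from pymongo.errors import WTimeoutError

client = MongoClient(
    "mongodb://node-az1.example.internal,node-az2.example.internal/?replicaSet=rs0"
)

# Require acknowledgment from a majority of voting members, but give up
# after 5 seconds if the Secondary in AZ2 is unreachable.
orders = client["shop"].get_collection(
    "orders", write_concern=WriteConcern(w="majority", wtimeout=5000)
)

try:
    orders.insert_one({"order_id": 1002, "status": "created"})
except WTimeoutError:
    # The write may still be applied on the Primary; the engine is not locked.
    # Retry, degrade to w:1, or surface the error -- an application decision.
    print("Majority acknowledgment timed out; handle gracefully")
```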
2.2 Dangers of Single-AZ (Cold Standby)
Party_A's Proposal: Passive Second AZ
Resources in AZ2 are turned off or not part of replication quorum until disaster occurs.
Data Loss Risk (RPO)
AZ1 Failure (fire, power loss)
              │
              ▼
┌───────────────────────────────────────────┐
│  Replication to AZ2 was NOT continuous    │
│        (because it was "Passive")         │
└───────────────────────────────────────────┘
              │
              ▼
ALL DATA SINCE LAST SNAPSHOT
IS IRRETRIEVABLY LOST
(RPO > 0)
Recovery Time Comparison
| Metric | Single-AZ (Cold Standby) | Multi-AZ (Hot Standby) |
|---|---|---|
| Provision instances | 5-15 minutes | N/A (already running) |
| Restore from snapshots | Hours (for TB of data) | N/A |
| Warm cache | Additional hours | Already warm |
| Total RTO | Hours of downtime | 2-10 seconds |
2.3 Distributed Deadlock Myth
Architecturally Impossible
MongoDB uses document-level locking within the WiredTiger storage engine.
Key Facts:
- Locks are local to the node
- No protocol attempts to acquire locks on AZ1 and AZ2 simultaneously
- Classic "distributed deadlock" scenario is impossible in standard Replica Set deployment
Feature Comparison
| Feature | Single-AZ (Party_A) | Multi-AZ (Party_B) |
|---|---|---|
| Write Availability | ❌ Vulnerable to single DC failure | ✅ High Availability (auto failover) |
| Read Scalability | ❌ Limited to single node | ✅ Read Preference: Secondary offloads reads |
| Locking Overhead | None (Local only) | None (Local only) for w:1 |
| Failover Speed | ⏱️ Hours (restore from backup) | ⚡ Seconds (Raft election) |
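A brief sketch of the read-offloading row above, assuming pymongo and hypothetical hostnames: reads tagged secondaryPreferred are served by the AZ2 Secondary when it is available, without any cross-AZ locking; writes still go only to the Primary.

```python
from pymongo import MongoClient, ReadPreference

client = MongoClient(
    "mongodb://node-az1.example.internal,node-az2.example.internal/?replicaSet=rs0"
)

# Route reads to the Secondary in AZ2 when available, falling back to the
# Primary; write traffic is unaffected.
orders = client["shop"].get_collection(
    "orders", read_preference=ReadPreference.SECONDARY_PREFERRED
)
recent = list(orders.find({"status": "created"}).limit(10))
```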
3. Redis and Caching Strategy
3.1 Redis Topology Options
Redis Sentinel:
Purpose: HA for non-clustered Redis
- Monitors the Primary and replicas
- If the Primary in AZ1 fails → promotes the Replica in AZ2 (see the client sketch below)
Split Brain Risk
Requires quorum (usually 3 sentinels). Without 3rd witness:
- Neither side can elect leader, OR
- Both think they are leaders
This validates some Party_A concerns about complexity (but NOT locking).
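A minimal sketch of Sentinel-aware client behavior, assuming redis-py and hypothetical Sentinel addresses: the client always asks the Sentinels who the current Primary is, so an AZ1 failure only changes the answer; it never blocks writes behind a distributed lock.

```python
from redis.sentinel import Sentinel

# Hypothetical Sentinel endpoints, one per AZ plus a tie-breaker.
sentinel = Sentinel(
    [
        ("sentinel-az1.example.internal", 26379),
        ("sentinel-az2.example.internal", 26379),
        ("sentinel-az3.example.internal", 26379),
    ],
    socket_timeout=0.5,
)

# Sentinel resolves the current Primary; after a failover this transparently
# points at the promoted Replica in AZ2.
primary = sentinel.master_for("cache", socket_timeout=0.5)
replica = sentinel.slave_for("cache", socket_timeout=0.5)

primary.set("session:42", "active")
print(replica.get("session:42"))
```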
Redis Cluster (Cluster Mode Enabled):
Purpose: Sharded data across multiple nodes
- Robust but requires careful configuration
- "Replica Groups" must span AZs
- If the Master in AZ1 fails → the Slave in AZ2 takes over
3.2 Clarifying "Active-Active" for Redis
Previous Multi-Region Setup
"Active-Active" Redis used CRDTs (Conflict-Free Replicated Data Types) via Redis Enterprise, allowing simultaneous writes to same key in both US-East and US-West.
New Single-Region Setup:
Message for Party_A
Standard ElastiCache (Redis OSS) does NOT support Active-Active writes (multi-master).
| Aspect | Configuration |
|---|---|
| Writes | Go to AZ1 (Primary) |
| Reads | Can be served from both AZs |
| Failover | AZ2 is hot standby (automatic promotion) |
| Multi-Master | ❌ NOT used |
Result: Eliminates risk of "deadlocks" or write conflicts while preserving AZ failure survival.
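A short sketch of that split, assuming an ElastiCache replication group (the endpoint hostnames below are hypothetical): writes target the primary endpoint, reads may use the reader endpoint, and there is no multi-master path on which to conflict.

```python
import redis

# Hypothetical ElastiCache endpoints: one writable primary, one reader
# endpoint that load-balances across replicas in both AZs.
writer = redis.Redis(host="cache.primary.example.internal", port=6379)
reader = redis.Redis(host="cache.reader.example.internal", port=6379)

writer.set("feature:flags", "v2")   # Always lands on the Primary (AZ1)
print(reader.get("feature:flags"))  # May be served from the AZ2 replica
```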
3.3 The "Cold Cache" Catastrophe
Thundering Herd Problem
Party_A's suggestion of "Passive" second AZ (Cold Standby) is particularly dangerous for Redis.
graph TD
A[AZ1 Fails] --> B[Failover to Cold Redis in AZ2]
B --> C[Cache is EMPTY]
C --> D[Every Query Hits MongoDB]
D --> E[Massive Spike in Load]
E --> F[Database Crashes]
F --> G[Total Systemic Failure]
style A fill:#ff6b6b
style G fill:#ff6b6b
Solution: Multi-AZ Active (Hot Standby)
- Replica in AZ2 constantly receives data updates
- Upon failover, cache is already warm
- Protects database from thundering herd
- Maintains application performance
4. Kafka & Event Streaming (Strimzi)
4.1 Rack Awareness and Data Durability
Kafka's Critical Role
Kafka is the nervous system of the architecture. Its resilience relies on "Rack Awareness" configuration.
Configuration: Strimzi maps each broker to its AZ via the rack.topologyKey setting (shown in the Strimzi configuration under Strategic Recommendations).
Deployment Comparison
Scenario: All Kafka brokers in AZ1
| Event | Impact |
|---|---|
| AZ1 Failure | Total data unavailability |
| Storage Corruption | Permanent message queue loss |
| Recovery | No disaster recovery possible |
Scenario: Strimzi distributes partition replicas across AZs
- Partition leader in AZ1
- Follower in AZ2
- Producer writes with acks=all replicate to both AZs
Producer Configuration:
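A hedged sketch of such a producer, assuming the confluent-kafka Python client and hypothetical broker addresses: with acks=all the broker waits for the in-sync replicas (leader in AZ1, follower in AZ2) before acknowledging.

```python
from confluent_kafka import Producer

# Hypothetical bootstrap servers, one broker per AZ.
producer = Producer({
    "bootstrap.servers": "kafka-az1.example.internal:9092,kafka-az2.example.internal:9092",
    "acks": "all",                # Wait for all in-sync replicas (both AZs)
    "enable.idempotence": True,   # Safe retries without duplicates
})

def on_delivery(err, msg):
    # Invoked once the leader has confirmation from the ISR set.
    if err is not None:
        print(f"Delivery failed: {err}")

producer.produce("payments", key=b"order-1002", value=b"created", on_delivery=on_delivery)
producer.flush()
```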
Is This Locking?
No. This introduces minor latency (network RTT), but not a deadlock.
| Scenario | Behavior |
|---|---|
| AZ2 becomes slow | Producer latency increases (not locked) |
| AZ2 fails | ISR list shrinks, leader continues |
4.2 Pod Placement and Affinity
Kubernetes Configuration Required
Use podAntiAffinity rules to ensure brokers are strictly separated across AZs.
# Required Pod Anti-Affinity Configuration
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values: ["kafka"]
        topologyKey: topology.kubernetes.io/zone
Single-AZ Impact
Without Multi-AZ, proper pod distribution is impossible, creating a Single Point of Failure (SPOF) for the entire event pipeline.
5. Network Infrastructure & Security
The Critical Argument
While database arguments focus on performance and consistency, the Network Infrastructure argument is BINARY:
A Single-AZ deployment will NOT work reliably for the proposed security stack.
5.1 AWS Network Firewall: The Zonal Trap
Critical Limitation
AWS Network Firewall is a zonal service (not regional). It's instantiated in a specific AZ using Gateway Load Balancer (GWLB) endpoints.
Single-AZ Failure Mode
Scenario: Firewall deployed only in AZ1 (Party_A's "Passive AZ2" request)
graph TD
subgraph "Normal Operation"
IGW[Internet Gateway] --> FW1[Firewall AZ1]
FW1 --> APP1[App Servers AZ1]
FW1 --> APP2[App Servers AZ2]
end
subgraph "AZ1 Failure"
IGW2[Internet Gateway] -.->|BLOCKED| FW2[Firewall AZ1 ❌]
FW2 -.->|UNREACHABLE| APP3[App Servers AZ1 ❌]
FW2 -.->|NO PATH| APP4[App Servers AZ2 ✅ but orphaned]
end
Impact of AZ1 Failure
- Fiber cut or GWLB control plane outage in AZ1
- Firewall endpoint becomes unreachable
- ALL ingress/egress traffic for entire VPC stops
- Traffic destined for healthy servers in AZ2 is also blocked
- "Passive" AZ2 infrastructure is orphaned (no path to internet/corporate network)
Result: Total Blackout (even though AZ2 compute resources are healthy)
Multi-AZ Solution
Architecture:
- Firewall endpoint in dedicated subnet in each AZ
- Symmetric routing (stateful inspection requirement)
- Traffic for AZ2 stays within AZ2
Fault Isolation
If AZ1 fails, AZ2 traffic continues uninterrupted via AZ2 firewall endpoint.
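As a hedged sketch of that layout, assuming boto3 and hypothetical subnet IDs and ARNs: AWS Network Firewall's CreateFirewall API accepts one subnet mapping per AZ, which is what creates the per-AZ endpoints the route tables then point at.

```python
import boto3

# Hypothetical IDs/ARNs; one dedicated firewall subnet per AZ in the SEC-VPC.
network_firewall = boto3.client("network-firewall", region_name="us-east-1")

network_firewall.create_firewall(
    FirewallName="sec-vpc-firewall",
    FirewallPolicyArn="arn:aws:network-firewall:us-east-1:123456789012:firewall-policy/base-policy",
    VpcId="vpc-0abc1234",
    SubnetMappings=[
        {"SubnetId": "subnet-az1-firewall"},  # Endpoint in AZ1
        {"SubnetId": "subnet-az2-firewall"},  # Endpoint in AZ2
    ],
)
# Route tables in each AZ must then send traffic to the endpoint in the SAME
# AZ, preserving the symmetric routing that stateful inspection requires.
```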
Cost Analysis:
| Aspect | Value |
|---|---|
| Additional Cost | ~$0.395/hour per endpoint |
| Monthly Cost | ~$284 |
| Value | Prevents entire production from becoming black hole |
5.2 Mikrotik VPN and BGP Failover
VPN Architecture
Dual Mikrotik RouterOS instances terminate IPsec VPNs connecting AWS to the on-premises datacenter.
Multi-AZ BGP Architecture
graph LR
subgraph "On-Premises"
OP[On-Prem Router]
end
subgraph "AWS"
OP -->|Tunnel A| VPN1[VPN Endpoint AZ1]
OP -->|Tunnel B| VPN2[VPN Endpoint AZ2]
VPN1 --> BGP1[BGP Session 1]
VPN2 --> BGP2[BGP Session 2]
BGP1 --> WL[AWS Workloads]
BGP2 --> WL
end
Failover Mechanics:
- Both tunnels exchange BGP routes
- If Tunnel A fails (AZ1 outage):
- BGP detects dead peer (DPD - Dead Peer Detection)
- Withdraws route from Tunnel A
- Traffic automatically shifts to Tunnel B
- Failover time: Seconds
No Locking in BGP
BGP is a routing protocol. "Active-Active" means both paths are valid for:
- Load balancing (ECMP)
- Fast failover without database consistency risks
Maintenance Benefits
Multi-AZ:
- AWS patches one AZ at a time
- BGP session shifts traffic to the other AZ
- Zero downtime
Single-AZ:
- Patch window = hard connectivity outage
- Manual intervention required
- Service disruption inevitable
5.3 Service Chain Dependencies
The Bottleneck
Traffic flows through: DMZ-VPC → SEC-VPC (Firewall) → WL-VPC (Workload)
The Choke Point: SEC-VPC (Security VPC)
- If the SEC-VPC is Single-AZ, it is a SPOF for every application
- Even if the workload spans 2 AZs (one passive), the Security VPC must span both AZs
- This is required to provide a path for the passive workload to become active
6. Operational Resilience & Maintenance
6.1 Rolling Updates and Patching
Cloud Infrastructure Lifecycle
Cloud infrastructure is ephemeral and requires regular maintenance.
Single-AZ Process:
- Schedule service window (downtime)
- Stop service
- Apply patches
- Restart service
- Verify functionality
Result: Planned downtime for every update
Multi-AZ Process:
- Patch Secondary in AZ2 (no impact)
- Seamless failover (step-down Primary in AZ1)
- AZ2 promoted to Primary (2-10 seconds)
- Patch former Primary in AZ1
- Revert to normal operation
Result: Zero downtime deployment
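A hedged sketch of the step-down in that flow, assuming pymongo and a hypothetical maintenance script: replSetStepDown is a standard replica-set command, and the election of the AZ2 Secondary typically completes within seconds.

```python
from pymongo import MongoClient
from pymongo.errors import AutoReconnect

client = MongoClient(
    "mongodb://node-az1.example.internal,node-az2.example.internal/?replicaSet=rs0"
)

# Ask the current Primary (AZ1) to step down for 60 seconds so the patched
# Secondary in AZ2 can win the election and take writes.
try:
    client.admin.command("replSetStepDown", 60)
except AutoReconnect:
    # Older servers drop the connection during step-down; the driver
    # rediscovers the new Primary either way.
    pass

# The application keeps writing through the same client while AZ1 is patched.
```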
6.2 Split Brain Risk and Mitigation
Valid Technical Concern
The one valid concern in a 2-AZ setup is the "Split Brain" scenario.
The Risk:
graph TD
A[Network Link Fails Between AZs] --> B[Both AZs Remain Up]
B --> C[Primary in AZ1 Can't Contact AZ2]
C --> D{Without Quorum}
D -->|AZ1 continues| E[Accepts writes]
D -->|AZ2 elects self| F[Also accepts writes]
E --> G[TWO PRIMARIES]
F --> G
G --> H[Data Corruption]
style H fill:#ff6b6b
The Solution: Quorum with Arbiter
Tie-Breaker Pattern
Deploy a lightweight MongoDB Arbiter in a third AZ (or in a different region if a third AZ is unavailable).
graph TD
subgraph "AZ1 Isolated"
P[Primary AZ1]
end
subgraph "AZ2 + AZ3"
S[Secondary AZ2]
A[Arbiter AZ3]
end
P -.->|Cannot see majority| P2[Steps Down]
S -->|Sees Arbiter| S2[Forms Majority]
S2 --> S3[Promoted to Primary]
style P2 fill:#ffd93d
style S3 fill:#6bcb77
Result:
- If AZ1 is isolated → the Primary realizes it cannot see a majority → steps down
- AZ2 sees the Arbiter → forms a majority → promotes itself
- Consistency preserved. No locking. No split brain.
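A hedged sketch of that quorum layout, assuming pymongo against a fresh deployment (hostnames hypothetical): the third member carries arbiterOnly, so it votes in elections but stores no data.

```python
from pymongo import MongoClient

# Connect directly to the node that will become the first Primary.
seed = MongoClient("node-az1.example.internal", 27017, directConnection=True)

# Three voting members across three AZs; the arbiter breaks ties but holds no data.
config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "node-az1.example.internal:27017"},
        {"_id": 1, "host": "node-az2.example.internal:27017"},
        {"_id": 2, "host": "arbiter-az3.example.internal:27017", "arbiterOnly": True},
    ],
}
seed.admin.command("replSetInitiate", config)
```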
7. Failure Simulation: Power Event in AZ1
Impact Comparison Table
| Timeline | Single-AZ Active (Party_A) | Dual-AZ Active (Party_B) |
|---|---|---|
| T+0s | Power fails in AZ1 | Power fails in AZ1 |
| T+1s | Application hard stop. VPN disconnects. Firewall unreachable. | Load Balancer health checks fail for AZ1. BGP sessions on Tunnel A drop. |
| T+5s | Manual intervention: Team alerted. Must spin up instances in AZ2 (Cold Start). | Automated Failover: MongoDB elects AZ2 Secondary. Redis promotes AZ2 Replica. VPN routes to Tunnel B. |
| T+30s | Downtime continues. Database restoring from snapshots. Network re-routing manually. | Service Restored. Application running on AZ2. Caches warm. |
| T+1h | Still recovering large datasets. Thundering herd as caches warm. | Operations investigates AZ1 root cause. Business continues normally. |
| Data Integrity | ⚠️ Potential Data Loss (RPO > 0) | ✅ Zero Data Loss (with w:majority) |
| Locking? | N/A (System is dead) | ✅ No locking observed |
8. Financial and Risk Modeling
8.1 The Cost of Downtime
Business Context
Production environment running Kafka, Redis, and MongoDB implies high-volume transactions (likely financial or messaging based on "SMS-service").
Risk Formula: Expected Annual Loss = Probability of Failure × Cost of Downtime
| Architecture | Annual Failure Probability | Downtime Cost | Risk Level |
|---|---|---|---|
| Single-AZ | ~0.1% per year per AZ | Catastrophic (hours of downtime) | 🔴 HIGH |
| Multi-AZ | Infinitesimal (requires simultaneous dual-AZ failure) | Minimal (seconds of brownout) | 🟢 LOW |
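A toy calculation of that formula with purely illustrative numbers (the probabilities, recovery times, and hourly cost below are hypothetical placeholders, not measurements):

```python
# Hypothetical inputs for illustration only.
cost_per_hour = 50_000.0          # assumed business cost of downtime ($/hour)

# Single-AZ: an AZ event costs hours of cold-standby recovery.
p_single_az_event = 0.001         # ~0.1% chance per year (assumed)
single_az_outage_hours = 6.0      # assumed recovery time

# Multi-AZ: both AZs must fail together; the outage is a brief brownout.
p_dual_az_event = 0.001 ** 2      # independent simultaneous failures (assumed)
multi_az_outage_hours = 10 / 3600 # ~10 seconds expressed in hours

risk_single = p_single_az_event * single_az_outage_hours * cost_per_hour
risk_multi = p_dual_az_event * multi_az_outage_hours * cost_per_hour

print(f"Single-AZ expected annual loss: ${risk_single:,.2f}")
print(f"Multi-AZ expected annual loss:  ${risk_multi:,.6f}")
```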
8.2 Hidden Costs of Single-AZ
Often Overlooked Expenses
| Hidden Cost | Description |
|---|---|
| Cross-AZ Data Transfer | If any peripheral services (backups, logs, monitoring) run in another AZ, you pay transfer fees anyway |
| Emergency Engineering | Overtime and emergency contractors for recovery often exceed annual cost of redundant instances |
| Reputation Damage | Customer trust erosion from extended outages |
| SLA Penalties | Contractual penalties for failing availability commitments |
9. Strategic Recommendations
Summary of Recommendations
1. Adopt Multi-AZ Active-Active Infrastructure
The Operations team's requirement for 2 AZs is architecturally mandatory for:
- AWS Network Firewall
- VPN connectivity
A Single-AZ network layer is a single point of failure that compromises the entire stack.
2. Implement Active-Standby Database Topology
Clarification for Party_A
"Active-Active" infrastructure does NOT mean "Multi-Master Writes."
| Component | Configuration |
|---|---|
| MongoDB | 3-Voting-Node Replica Set (Primary AZ1, Secondary AZ2, Arbiter AZ3). Use w:1 for standard operations. |
| Redis | ElastiCache Multi-AZ with Automatic Failover. Hot standby (Active-Passive writes). |
3. Optimize Kafka for Durability
# Strimzi Configuration
kafka:
  config:
    min.insync.replicas: 2
  rack:
    topologyKey: topology.kubernetes.io/zone
4. Security Architecture
- Deploy AWS Network Firewall endpoints in both AZs with symmetric routing
- Configure Mikrotik VPNs with BGP and BFD for sub-second failover
Final Conclusion
The Verdict
The fear of "database deadlocks" in a Multi-AZ cloud architecture is a legacy concern that does not apply to modern consensus-based systems (MongoDB, Kafka).
The Real Risks of Single-AZ:
- ❌ Total loss of network connectivity
- ❌ Inability to fail over stateful services
- ❌ Extended downtime (hours vs. seconds)
- ❌ Potential data loss
The Bottom Line:
Dual-AZ Active architecture is not merely an 'option' for High Availability; it is the fundamental baseline for a reliable, production-grade AWS environment.