UB-ATnT-Single-Region
Architecture Overview
This document details the transition from a Multi-Region Active-Active topology to a highly resilient Single-Region Multi-AZ architecture.
The previous design used us-east-1 and us-west-2 for geographic redundancy; the new architecture maximizes availability within a single region by strictly mirroring resources across two Availability Zones (Zone A and Zone B).
This approach reduces cross-region data transfer costs and complexity while maintaining high availability through redundant network paths and security appliances.
Traffic Entry Points
The architecture utilizes distinct entry points for different traffic types to ensure optimized routing and security:
- HTTPS Traffic (API/Dashboard):
  - Entry Point: Cloudflare
  - Flow: Cloudflare resolves to the AWS Internet Gateway (IGW) → Ingress Firewall Endpoints (Zone A/B) → Application Load Balancer (ALB).
  - Resilience: Cloudflare performs health checks and steers traffic between the healthy AZs.
- TCP Traffic (SIM-Service):
  - Entry Point: AWS Global Accelerator
  - Flow: Global Accelerator (Static IPs) → Network Load Balancer (NLB) on port 9506.
  - Resilience: Traffic is routed over the AWS global network to the nearest healthy endpoint, bypassing public internet congestion.
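The health-based steering described for both entry points can be pictured with a small sketch. This is only an illustration of the idea, not how Cloudflare or Global Accelerator actually compute their routing: the function name `steering_weights`, the zone labels, the even split, and the "send nothing when no zone is healthy" policy are all assumptions made for the example.

```python
def steering_weights(zone_health: dict[str, bool]) -> dict[str, float]:
    """Split traffic evenly across zones whose health check passes.

    Hypothetical helper: the even split is an assumed policy, not one
    stated for this architecture.
    """
    healthy = [zone for zone, ok in zone_health.items() if ok]
    if not healthy:
        # Assumed policy: with no healthy zone, assign no traffic anywhere.
        return {zone: 0.0 for zone in zone_health}
    share = 1.0 / len(healthy)
    return {zone: (share if ok else 0.0) for zone, ok in zone_health.items()}


if __name__ == "__main__":
    # Both zones healthy: traffic is shared.
    print(steering_weights({"zone-a": True, "zone-b": True}))
    # Zone B fails its health check: Zone A takes all traffic.
    print(steering_weights({"zone-a": True, "zone-b": False}))
```

With both zones healthy each receives a weight of 0.5; when one fails, the surviving zone's weight becomes 1.0.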
Network Resilience Strategy
To eliminate single points of failure, the network infrastructure is split into parallel lanes:
- Multi-AZ Firewalls: AWS Network Firewall endpoints are deployed in both AZs. If one zone fails, traffic is automatically routed through the healthy zone's firewall.
- Separated Traffic Flows:
  - Ingress: Enters directly via the Workload VPC (WL-VPC) to minimize hops.
  - Egress: Routes via Transit Gateway to a dedicated Security VPC (SEC-VPC) for centralized inspection before exiting through the DMZ.
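The two lanes and the firewall failover can be modeled as hop lists. The sketch below is a simplified mental model under stated assumptions: the function names, the zone labels "A" and "B", and the preference for the same-zone firewall are hypothetical, and the document does not say how the reroute is actually implemented (for example, through route table updates).

```python
def pick_firewall_zone(source_zone: str, firewall_up: dict[str, bool]) -> str:
    """Prefer the firewall in the source zone; otherwise use any healthy one."""
    if firewall_up.get(source_zone, False):
        return source_zone
    for zone, up in firewall_up.items():
        if up:
            return zone
    raise RuntimeError("no healthy firewall endpoint in any zone")


def ingress_path(zone: str, firewall_up: dict[str, bool]) -> list[str]:
    """Ingress stays inside WL-VPC: gateway, firewall, load balancer, node."""
    fw = pick_firewall_zone(zone, firewall_up)
    return [
        "Internet Gateway",
        f"Ingress Firewall {fw}",
        f"Load Balancer {fw}",
        f"EKS Node {fw}",
    ]


def egress_path(zone: str, firewall_up: dict[str, bool]) -> list[str]:
    """Egress leaves WL-VPC via the Transit Gateway, is inspected in SEC-VPC,
    then exits through the DMZ."""
    fw = pick_firewall_zone(zone, firewall_up)
    return [
        f"EKS Node {zone}",
        "Transit Gateway",
        f"Egress Firewall {fw}",
        f"DMZ Exit {fw}",
    ]


if __name__ == "__main__":
    # Zone A's firewall is down, so egress from Zone A is inspected in Zone B.
    print(egress_path("A", {"A": False, "B": True}))
```

In this model a workload in Zone A whose local firewall endpoint is unavailable still has a complete egress path through Zone B's firewall.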
Network Egress and VPN Connectivity
To maintain secure site-to-site connectivity, redundant VPN tunnels are established:
- Mikrotik VPN Gateways: Deployed in both AZs within the DMZ-VPC. Each gateway connects to the remote sites (e.g., VTC_SAG and STC_SAG), ensuring continuous connectivity even if one AZ becomes unavailable.
- SMSC Routing: Traffic destined for SMSC endpoints routes via the Transit Gateway to the Security VPC for inspection before exiting through the NAT Gateways.
  - Both SMSC endpoints are reachable from either AZ, ensuring no single point of failure.
- NAT Gateways: Deployed in both AZs to handle general internet-bound traffic, providing redundancy and failover capabilities.
Kafka Topics
- Topic Separation: Dedicated Kafka topics are created for each availability zone to isolate traffic for SMS-Service processing.
  - sms-service-us-west-2a
  - sms-service-us-west-2b
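The per-zone topic names follow a visible pattern: a service prefix plus the zone name. A minimal sketch of deriving the topic from the zone is shown below. The helper `topic_for_az` and the constants are hypothetical and only mirror the two topic names listed above; how producers and consumers actually select their topic is not specified here.

```python
TOPIC_PREFIX = "sms-service"
AVAILABILITY_ZONES = ("us-west-2a", "us-west-2b")


def topic_for_az(az: str) -> str:
    """Return the zone-specific Kafka topic name for a known zone."""
    if az not in AVAILABILITY_ZONES:
        raise ValueError(f"unknown availability zone: {az!r}")
    return f"{TOPIC_PREFIX}-{az}"


if __name__ == "__main__":
    for az in AVAILABILITY_ZONES:
        print(topic_for_az(az))
```

Rejecting unknown zones keeps a misconfigured workload from silently writing to a topic that nothing consumes.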
Mermaid Diagram
graph TD
%% --- External Entry Points ---
CF[Cloudflare HTTPS] -->|Traffic| IGW_A
CF -->|Traffic| IGW_B
GA[Global Accelerator TCP] -->|Traffic| IGW_A
GA -->|Traffic| IGW_B
%% --- Region Scope ---
subgraph US_West_2 [Region: us-west-2]
%% --- Workload VPC (Ingress & Compute) ---
subgraph WL_VPC [WL-VPC]
%% Availability Zone A
subgraph AZ_A [Availability Zone A]
IGW_A[Internet Gateway A] -->|All Traffic| FW_Ing_A[Ingress Firewall A]
FW_Ing_A -->|HTTPS| ALB_A[ALB Zone A]
FW_Ing_A -->|TCP/9506| NLB_A[NLB Zone A]
ALB_A --> EKS_A[EKS Node A]
NLB_A --> EKS_A
end
%% Availability Zone B
subgraph AZ_B [Availability Zone B]
IGW_B[Internet Gateway B] -->|All Traffic| FW_Ing_B[Ingress Firewall B]
FW_Ing_B -->|HTTPS| ALB_B[ALB Zone B]
FW_Ing_B -->|TCP/9506| NLB_B[NLB Zone B]
ALB_B --> EKS_B[EKS Node B]
NLB_B --> EKS_B
end
end
%% --- Transit Gateway ---
EKS_A -->|Egress| TGW[Transit Gateway]
EKS_B -->|Egress| TGW
%% --- Security VPC (Inspection) ---
subgraph SEC_VPC [SEC-VPC Inspection]
TGW --> FW_Eg_A[Egress Firewall A]
TGW --> FW_Eg_B[Egress Firewall B]
end
%% --- DMZ VPC (Exit) ---
subgraph DMZ_VPC [DMZ-VPC Edge]
FW_Eg_A --> NAT_A[NAT Gateway A]
FW_Eg_A --> MIK_A[Mikrotik VPN A]
FW_Eg_B --> NAT_B[NAT Gateway B]
FW_Eg_B --> MIK_B[Mikrotik VPN B]
end
end
%% --- External Destinations ---
NAT_A --> Internet((Public Internet))
NAT_B --> Internet
MIK_A -->|Tunnel| VTC[Remote Site: VTC_SAG]
MIK_B -->|Tunnel| STC[Remote Site: STC_SAG]
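One way to sanity-check the diagram's redundancy claim is to treat its egress edges as a graph and test reachability with one zone's nodes removed. The sketch below copies the egress edges from the diagram above. The grouping of nodes into `ZONE_A` and `ZONE_B` and the function name `reachable` are assumptions made for the example, and the Transit Gateway is treated as a regional component that survives a zone failure.

```python
from collections import deque

# Egress edges transcribed from the diagram (node IDs as drawn).
EDGES = {
    "EKS_A": ["TGW"],
    "EKS_B": ["TGW"],
    "TGW": ["FW_Eg_A", "FW_Eg_B"],
    "FW_Eg_A": ["NAT_A", "MIK_A"],
    "FW_Eg_B": ["NAT_B", "MIK_B"],
    "NAT_A": ["Internet"],
    "NAT_B": ["Internet"],
    "MIK_A": ["VTC"],
    "MIK_B": ["STC"],
}

# Assumed zone membership for the failure simulation.
ZONE_A = frozenset({"EKS_A", "FW_Eg_A", "NAT_A", "MIK_A"})
ZONE_B = frozenset({"EKS_B", "FW_Eg_B", "NAT_B", "MIK_B"})


def reachable(start: str, goal: str, failed: frozenset = frozenset()) -> bool:
    """Breadth-first search from start to goal, skipping failed nodes."""
    if start in failed:
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in EDGES.get(node, []):
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append(nxt)
    return False


if __name__ == "__main__":
    # Internet egress from Zone B survives the loss of Zone A.
    print(reachable("EKS_B", "Internet", failed=ZONE_A))
    # VTC has a single tunnel edge in the diagram, from MIK_A.
    print(reachable("EKS_B", "VTC", failed=ZONE_A))
```

With the edges exactly as drawn, internet egress remains reachable when either zone is removed, while `reachable("EKS_B", "VTC", failed=ZONE_A)` returns `False` because the diagram shows one tunnel per remote site. If each gateway terminates tunnels to both sites, as the VPN section suggests, adding the edges `MIK_A → STC` and `MIK_B → VTC` to `EDGES` makes both sites reachable under a single-zone failure.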