🔄 Active-Passive Architecture¶

2025-12-17

🗂️ Table of Contents¶

🗂️ Table of Contents
🚀 Introduction
🏗️ High Level Architecture Overview
🌐 Network Architecture Overview
🛠️ AWS Components
☁️ SaaS Components
🧩 Unibeam Components
🐳 EKS Components
🖥️ System Wide Applications
🕸️ Network Detailed Architecture Overview
- 🛡️ Cloudflare
- 🌍 AWS Global Accelerator
🔗 EKS/K8s Integrations with AWS
- 🆔 AWS Pod Identity Add-on
- 📊 Loki/Thanos/Prometheus

🚀 Introduction¶

This document provides a technical high-level architecture overview for the ATnT application, deployed in a multi-region, active-passive (warm) configuration on AWS EKS. The architecture leverages AWS services, SaaS integrations, and Kubernetes components to deliver high availability, scalability, and security.

🏗️ High Level Architecture Overview¶

The primary workload runs in us-east-1, with us-west-2 serving as a passive region for redundancy and failover.

🌐 Network Architecture Overview¶

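The network follows the same layout as the high-level architecture: us-east-1 is the primary region and us-west-2 is the passive region used for redundancy and failover.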

🛠️ AWS Components¶

Component	Description
EKS	Managed Kubernetes service with high availability and automatic patching
EC2	Elastic Compute Cloud for running worker nodes
ECR	Container registry for storing Docker images
Secrets Manager	Secure storage for sensitive information
S3	Object storage for data and backups
VPC	Isolated network environment
TGW	Transit Gateway for connecting VPCs
Firewall	Managed firewall for network security
ALB	Application Load Balancer for HTTP/HTTPS traffic
NLB	Network Load Balancer for TCP/UDP traffic
Global Accelerator	Enhances availability and performance for global users
Route 53	Managed DNS service for internal DNS resolution

EC2 Workers

The preferred EC2 instance family is t4g (AWS Graviton2), chosen for performance and cost efficiency.

☁️ SaaS Components¶

Service	Description
Atlas	Managed MongoDB with IAM-based ACL, active/active regional endpoints via AWS PrivateLink
RedisLabs	Managed Redis with username/password ACL, active/active cluster, replication, and frequent backups

Connection Details

Atlas MongoDB: Connected via AWS PrivateLink, supports TLS 1.2.
Redis: Connected via VPC peering, supports TLS 1.2 and 1.3.

🧩 Unibeam Components¶

us-east-1: Main region, runs majority of workloads and replication sets.
us-west-2: Passive region, runs lighter workloads, scaled up as needed.
sms-service: Deployed in all regions for SMPP bind availability and warm state.
HPA: Horizontal Pod Autoscaler for dynamic scaling based on metrics.
Karpenter: Automated provisioning of worker nodes for unscheduled pods.
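
As a rough illustration of how HPA scales on metrics, the Kubernetes documentation gives the core rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The sketch below applies that rule in Python. The function name, the replica bounds, and the sample numbers are hypothetical, and the 10% tolerance is the upstream default, not a value confirmed for this deployment.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified version of the HPA scaling rule (illustrative only)."""
    ratio = current_metric / target_metric
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds configured on the autoscaler (assumed values here).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target scale out to 6 pods.
    print(desired_replicas(4, 90, 60))
```

When the computed replica count cannot be scheduled on existing nodes, the pods stay pending, which is the situation Karpenter reacts to by provisioning new worker nodes.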

🐳 EKS Components¶

Worker Group	Workloads
Unibeam Workers	SIM, SMS, MNO, API, Dashboard, Timer, Scheduled-Jobs, Audit
Kafka Workers	Kafka Broker, Coordinator
Spot Workers	Grafana, Loki Querier/Distributor, ArgoCD (except Redis), Strimzi, AWS LB Controller, Reflector
Monitor Workers	Monitoring (Kube-Prometheus-Stack, Loki, ArgoCD, Thanos, Promtail, Karpenter)

Spot Instances

Amazon EC2 Spot Instances offer up to 90% savings compared with On-Demand pricing, but AWS can reclaim them with a two-minute warning.

🖥️ System Wide Applications¶

Application	Purpose
AWS Pod Identity Add-on	Secure IAM role assumption for pods
CoreDNS	DNS service for Kubernetes
Amazon VPC CNI	Pod networking and VPC integration
kube-proxy	Maintains node network rules that route traffic to Service endpoints
Amazon EBS CSI Driver	EBS support for persistent storage
AWS Load Balancer Controller	Manages AWS ELBs for Kubernetes services
mktxp-exporter	Mikrotik-IPSEC metrics exporter
twistlock-defender	Container runtime protection
reflector	Replicates secrets, configmaps, certificates
Strimzi	Kafka operator for Kubernetes
csi-secrets-store	CSI driver for external secrets
secrets-store-csi-driver-provider-aws	AWS Secrets Manager integration

Workload Placement

Workloads are scheduled using nodeSelectors and tolerations for optimal resource utilization.
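
The placement rule above can be sketched as a small matching function. This is a simplified model of what the Kubernetes scheduler checks, not the scheduler itself: it ignores affinity, resource requests, and PreferNoSchedule taints. The label key `workload` and the taint key `dedicated` are hypothetical names chosen for the example, not values taken from this cluster.

```python
def toleration_matches(toleration, taint):
    """Return True if a single toleration covers a single taint."""
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if key is None:
        # An empty key is only valid with Exists and then matches every key.
        if operator != "Exists":
            return False
    elif key != taint["key"]:
        return False
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value") == taint.get("value")


def can_schedule(pod, node):
    """Check nodeSelector labels first, then hard taints on the node."""
    labels = node.get("labels", {})
    for key, value in pod.get("nodeSelector", {}).items():
        if labels.get(key) != value:
            return False
    tolerations = pod.get("tolerations", [])
    for taint in node.get("taints", []):
        if taint["effect"] not in ("NoSchedule", "NoExecute"):
            continue
        if not any(toleration_matches(t, taint) for t in tolerations):
            return False
    return True


# Hypothetical node and pods for illustration.
kafka_node = {
    "labels": {"workload": "kafka"},
    "taints": [{"key": "dedicated", "value": "kafka", "effect": "NoSchedule"}],
}
kafka_pod = {
    "nodeSelector": {"workload": "kafka"},
    "tolerations": [{"key": "dedicated", "operator": "Equal",
                     "value": "kafka", "effect": "NoSchedule"}],
}
api_pod = {"nodeSelector": {"workload": "unibeam"}}

if __name__ == "__main__":
    print(can_schedule(kafka_pod, kafka_node))
    print(can_schedule(api_pod, kafka_node))
```

The nodeSelector pulls a workload toward its worker group, while the taint and toleration pair keeps other workloads off that group.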

🕸️ Network Detailed Architecture Overview¶

🛡️ Cloudflare¶

Cloudflare provides CDN and DDoS protection and performs health checks on the ALBs. The DNS name api.atnt.unibeam.com resolves to the Cloudflare load balancer, which routes traffic to the regional ALBs using "Least outstanding requests" steering, with fallback to us-west-2.

graph TD
    A[Cloudflare-Global-DNS] -->|api.us.unibeam.com| B[Cloudflare-LB]
    B -->|us-east-1| C[ALB-API-East:443]
    B -->|us-west-2| D[ALB-API-West:443]

Cloudflare Health Check

Interval: 60s
Timeout: 5s
Retries: 2
Path: /health
Expected Status: 200
Response Body: {"status":"UP"}
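
The health check settings above can be read as a small decision rule. The sketch below is an approximation for illustration, not Cloudflare's implementation. It assumes that the response body is matched as a case-insensitive substring and that retries apply only to timeouts, so "Retries: 2" allows up to three attempts. The `probe` callable and the `scripted_probe` helper are hypothetical stand-ins for a real HTTPS request to /health.

```python
EXPECTED_STATUS = 200
EXPECTED_BODY = '{"status":"UP"}'


def response_is_healthy(status_code, body,
                        expected_status=EXPECTED_STATUS,
                        expected_body=EXPECTED_BODY):
    """A response passes when the status matches and the body contains the marker."""
    return (status_code == expected_status
            and expected_body.lower() in body.lower())


def origin_is_healthy(probe, retries=2):
    """Call probe() up to 1 + retries times, retrying only on timeouts."""
    for _ in range(1 + retries):
        try:
            status_code, body = probe()
        except TimeoutError:
            continue  # assumed behaviour: a timeout triggers an immediate retry
        return response_is_healthy(status_code, body)
    return False  # every attempt timed out


def scripted_probe(outcomes):
    """Build a fake probe that replays outcomes, raising any that are exceptions."""
    iterator = iter(outcomes)

    def probe():
        outcome = next(iterator)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    return probe


if __name__ == "__main__":
    flaky = scripted_probe([TimeoutError(), (200, '{"status":"UP"}')])
    print(origin_is_healthy(flaky))
```

With a 60-second interval, an origin that starts failing may keep receiving traffic for up to a minute before the next check runs, which is worth keeping in mind when estimating failover time.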

🌍 AWS Global Accelerator¶

AWS Global Accelerator provides static IPs and routes traffic to the nearest NLB based on health checks and per-region traffic dials (us-east-1: 100%, us-west-2: 20%).

graph TD
    A[AWS-Accelerator] -->|us-east-1| B[AWS-NLB:9506]
    A -->|us-west-2| C[AWS-NLB:9506]

Accelerator Health Check

Interval: 30s
Timeout: 5s
Threshold: 3
Port: 9506
Protocol: TCP
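
To make the threshold setting concrete, the sketch below models an endpoint whose state flips only after three consecutive checks disagree with its current state, which is how a threshold count is generally described. It is a simplified model under that assumption, not AWS's implementation. The class name and the `tcp_check` helper are hypothetical; the port and timeout defaults mirror the values listed above.

```python
import socket


def tcp_check(host, port=9506, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class ThresholdHealth:
    """Track endpoint health, flipping state after `threshold` consecutive opposite results."""

    def __init__(self, threshold=3, healthy=True):
        self.threshold = threshold
        self.healthy = healthy
        self._streak = 0  # consecutive results that contradict the current state

    def observe(self, check_passed):
        if check_passed == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.threshold:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy


if __name__ == "__main__":
    state = ThresholdHealth(threshold=3)
    for result in [False, False, False]:
        state.observe(result)
    print(state.healthy)
```

With a 30-second interval and a threshold of 3, this model would take roughly 90 seconds of consecutive failures before an endpoint is treated as unhealthy.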

🔗 EKS/K8s Integrations with AWS¶

🆔 AWS Pod Identity Add-on¶

Enables pods to securely access AWS services by assigning temporary IAM credentials via Kubernetes Service Accounts.

graph TD
    A[Pod] -->|Uses| B[Kubernetes Service Account]
    B -->|Assumes| C[AWS IAM Role]
    C -->|Grants Access to| D[S3, Secrets Manager]

Temporary IAM credentials via AWS STS
Fine-grained permissions per namespace/workload
No hardcoded secrets
Native EKS integration
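
One way to confirm from inside a pod that no hardcoded keys are in use is to inspect the environment. The sketch below is a hedged illustration: to the best of my knowledge the EKS Pod Identity Agent injects AWS_CONTAINER_CREDENTIALS_FULL_URI and AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE into pods that have an association, but this should be verified against the current AWS documentation. The function name and the returned labels are hypothetical.

```python
import os

# Variables assumed to be injected for pods with a Pod Identity association.
POD_IDENTITY_VARS = (
    "AWS_CONTAINER_CREDENTIALS_FULL_URI",
    "AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE",
)
# Long-lived static credentials that should not be present in the pod.
STATIC_KEY_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def credential_source(env=None):
    """Classify how the process appears to obtain AWS credentials."""
    env = os.environ if env is None else env
    if any(env.get(name) for name in STATIC_KEY_VARS):
        return "static-keys"
    if all(env.get(name) for name in POD_IDENTITY_VARS):
        return "pod-identity"
    return "unknown"


if __name__ == "__main__":
    print(credential_source())
```

A result of "static-keys" indicates that long-lived credentials were configured on the workload and would take precedence in most AWS SDK credential chains.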

📊 Loki/Thanos/Prometheus¶

Loki/Thanos: Uses EBS for short-term and S3 for long-term log storage.
Prometheus: Stores metrics in EBS, integrates with Redis Labs, Atlas MongoDB, and CloudWatch.

Thanos Storage

Thanos stores metrics in S3 buckets with regional replication for high availability and durability.

AWS Global Accelerator:¶

AWS Global Accelerator provides a static IP address that serves as a fixed entry point for the application. It routes traffic to the nearest Network Load Balancer (NLB) based on health checks and routing policies.

SIM-Accelerator details:¶

Endpoints are configured with weighted routing policies. The traffic dial is set to 100% for us-east-1 and 20% for us-west-2, so that the majority of traffic is directed to us-east-1 while a smaller portion goes to us-west-2 for redundancy and failover.

75.2.108.23
3.33.243.63
2600:9000:a403:180c:6614:11ab:4d5b:1a99
2600:9000:a700:38a5:62e0:1fe9:a8b5:4bc8

EKS/K8s Integrations with AWS¶

AWS Pod Identity Add-on¶

A Kubernetes add-on that lets pods securely access AWS services by assigning them temporary AWS IAM credentials through Kubernetes Service Accounts.

Temporary IAM Credentials
- Pods receive short-lived AWS credentials (via AWS STS AssumeRole calls).
Fine-Grained Permissions
- Assign IAM roles per namespace or workload (least privilege).
No Hardcoded Secrets
- Eliminates the need for AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
Works with EKS (Amazon EKS Optimized)
- Native integration with Amazon EKS.

graph TD
    A[Pod] -->|Uses| B[Kubernetes Service Account]
    B -->|Assumes| C[AWS IAM Role]
    C -->|Grants Access to| D[S3, Secrets Manager]

Loki/Thanos/Prometheus:¶

Loki/Thanos uses two kinds of storage: EBS and S3.
- EBS: Used for short-term storage of logs, providing fast access and retrieval.
- S3: Used for long-term storage of logs, providing durability and cost-effective storage.

Prometheus metrics are stored in EBS. Service integrations include:
- Redis Labs: For caching and fast access to frequently queried metrics.
- Atlas MongoDB: For storing and querying metrics data.
- CloudWatch: For monitoring and alerting on metrics data.

Thanos Storage

Thanos stores data in designated S3 buckets with replication between regions, providing high availability and durability for metrics, and visibility across both regions.