
🔄 Active-Passive Architecture

🗂️ Table of Contents

  • Introduction
  • High Level Architecture Overview
  • Network Architecture Overview
  • AWS Components
  • SaaS Components
  • Unibeam Components
  • EKS Components
  • System-Wide Applications
  • Network Detailed Architecture Overview
  • Cloudflare
  • AWS Global Accelerator
  • EKS/K8s Integrations with AWS
  • AWS Pod Identity Add-on
  • Loki/Thanos/Prometheus

🚀 Introduction

This document provides a technical high-level architecture overview for the ATnT application, deployed in a multi-region, active-passive (warm) configuration on AWS EKS. The architecture leverages AWS services, SaaS integrations, and Kubernetes components to deliver high availability, scalability, and security.


๐Ÿ—๏ธ High Level Architecture Overviewยถ

ATNT Architecture Active/Passive

The primary workload runs in us-east-1, with us-west-2 serving as a passive region for redundancy and failover.


๐ŸŒ Network Architecture Overviewยถ

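The network follows the same active-passive layout as the high-level architecture above: us-east-1 carries the primary traffic and us-west-2 stands by for redundancy and failover.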


๐Ÿ› ๏ธ AWS Componentsยถ

Component Description
EKS Managed Kubernetes service with high availability and automatic patching
EC2 Elastic Compute Cloud for running worker nodes
ECR Container registry for storing Docker images
Secret Manager Secure storage for sensitive information
S3 Object storage for data and backups
VPC Isolated network environment
TGW Transit Gateway for connecting VPCs
Firewall Managed firewall for network security
ALB Application Load Balancer for HTTP/HTTPS traffic
NLB Network Load Balancer for TCP/UDP traffic
Global Accelerator Enhances availability and performance for global users
Route53 Managed DNS service for internal DNS resolution

EC2 Workers

The preferred EC2 instance type is t4g (AWS Graviton2), chosen for performance and cost efficiency.


โ˜๏ธ SaaS Componentsยถ

Service Description
Atlas Managed MongoDB with IAM-based ACL, active/active regional endpoints via AWS Private Link
RedisLabs Managed Redis with username/password ACL, active/active cluster, replication, and frequent backups

Connection Details

  • Atlas MongoDB: Connected via AWS Private Link, supports TLS 1.2.
  • Redis: Connected via VPC peering, supports TLS 1.2 and 1.3.

🧩 Unibeam Components

  • us-east-1: Main region; runs the majority of workloads and replication sets.
  • us-west-2: Passive region; runs lighter workloads and is scaled up as needed.
  • sms-service: Deployed in all regions to keep SMPP binds available and in a warm state.
  • HPA: Horizontal Pod Autoscaler; scales pods dynamically based on metrics.
  • Karpenter: Automatically provisions worker nodes for pods that cannot be scheduled on existing capacity.
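
The HPA bullet above can be illustrated with the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count scaled by the ratio of the observed metric to its target, rounded up. The sketch below is a minimal illustration of that rule; the metric values, replica bounds, and the function name `desired_replicas` are assumptions for the example and are not taken from this deployment.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count an HPA would request for a single metric (simplified)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Always stay inside the configured bounds (hypothetical values here).
    return max(min_replicas, min(max_replicas, desired))


# Example: 4 pods averaging 90% CPU against a 60% target scale out.
print(desired_replicas(4, 90, 60))
```

When the HPA requests more pods than the existing nodes can hold, the extra pods stay pending, which is the situation Karpenter reacts to by provisioning nodes.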

๐Ÿณ EKS Componentsยถ

Worker Group Workloads
Unibeam Workers SIM, SMS, MNO, API, Dashboard, Timer, Scheduled-Jobs, Audit
Kafka Workers Kafka Broker, Coordinator
Spot Workers Grafana, Loki Querier/Distributor, ArgoCD (except Redis), Strimzi, AWS LB Controller, Reflector
Monitor Workers Monitoring (Kube-Prometheus-Stack, Loki, ArgoCD, Thanos, Promtail, Karpenter)

Spot Instances

Amazon EC2 Spot Instances offer up to 90% savings but can be reclaimed with a 2-minute warning.
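
As an illustration of the 2-minute warning, the sketch below parses a Spot interruption notice and computes how much time is left to drain a node. It assumes the notice has the JSON shape that EC2 publishes at the instance metadata path `spot/instance-action`; fetching the notice over HTTP is left out, and the function name is hypothetical.

```python
import json
from datetime import datetime, timezone

# Metadata path where EC2 publishes the notice once an interruption is scheduled.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def seconds_until_interruption(notice_json: str, now: datetime) -> float:
    """Return the seconds remaining before the instance is reclaimed."""
    notice = json.loads(notice_json)
    # The notice carries a UTC timestamp such as 2024-01-01T12:02:00Z.
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


# Example notice (illustrative values only).
sample = '{"action": "terminate", "time": "2024-01-01T12:02:00Z"}'
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(seconds_until_interruption(sample, now))
```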


๐Ÿ–ฅ๏ธ System Wide Applicationsยถ

Application Purpose
AWS Pod Identity Add-on Secure IAM role assumption for pods
CoreDNS DNS service for Kubernetes
Amazon VPC CNI Pod networking and VPC integration
Kube Proxy Endpoint services support
Amazon EBS CSI Driver EBS support for persistent storage
AWS Load Balancer Controller Manages AWS ELBs for Kubernetes services
mktxp-exporter Mikrotik-IPSEC metrics exporter
twistlock-defender Container runtime protection
reflector Replicates secrets, configmaps, certificates
Strimzi Kafka operator for Kubernetes
csi-secrets-store CSI driver for external secrets
secrets-store-csi-driver-provider-aws AWS Secrets Manager integration

Workload Placement

Workloads are scheduled using nodeSelectors and tolerations for optimal resource utilization.
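
The placement rule can be sketched as two checks: every key in the pod's nodeSelector must match a node label, and every blocking taint on the node must be tolerated by the pod. This is a simplified model of the Kubernetes scheduler's behaviour, not its implementation; the label `workload` and the taint key `dedicated` are hypothetical names chosen for the example.

```python
def selector_matches(node_labels: dict, node_selector: dict) -> bool:
    """True when every nodeSelector entry is present on the node."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


def tolerates(taint: dict, tolerations: list) -> bool:
    """True when at least one toleration covers the taint (simplified)."""
    for tol in tolerations:
        operator = tol.get("operator", "Equal")
        key_ok = tol.get("key") is None or tol.get("key") == taint["key"]
        effect_ok = tol.get("effect") in (None, taint["effect"])
        value_ok = operator == "Exists" or tol.get("value") == taint.get("value")
        if key_ok and effect_ok and value_ok:
            return True
    return False


def can_schedule(pod: dict, node: dict) -> bool:
    """Combine the nodeSelector check with the taint/toleration check."""
    blocking = [t for t in node.get("taints", [])
                if t["effect"] in ("NoSchedule", "NoExecute")]
    return (selector_matches(node.get("labels", {}), pod.get("nodeSelector", {}))
            and all(tolerates(t, pod.get("tolerations", [])) for t in blocking))


# Hypothetical Kafka worker node and broker pod.
kafka_node = {
    "labels": {"workload": "kafka"},
    "taints": [{"key": "dedicated", "value": "kafka", "effect": "NoSchedule"}],
}
broker_pod = {
    "nodeSelector": {"workload": "kafka"},
    "tolerations": [{"key": "dedicated", "operator": "Equal",
                     "value": "kafka", "effect": "NoSchedule"}],
}
print(can_schedule(broker_pod, kafka_node))
```

Tainting a worker group keeps unrelated pods off it, while the nodeSelector keeps the intended pods from drifting onto other groups; both halves are needed for a dedicated group.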


๐Ÿ•ธ๏ธ Network Detailed Architecture Overviewยถ

๐Ÿ›ก๏ธ CloudFlareยถ

Cloudflare provides CDN and DDoS protection and performs health checks on the ALBs. The DNS name api.atnt.unibeam.com resolves to the Cloudflare load balancer, which routes traffic to the ALBs using "Least outstanding requests steering", with fallback to us-west-2.

graph TD
    A[Cloudflare-Global-DNS] -->|api.us.unibeam.com| B[Cloudflare-LB]
    B -->|us-east-1| C[ALB-API-East:443]
    B -->|us-west-2| D[ALB-API-West:443]

Cloudflare Health Check

  • Interval: 60s
  • Timeout: 5s
  • Retries: 2
  • Path: /health
  • Expected Status: 200
  • Response Body: {"status":"UP"}
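
The health check settings above can be read as a small decision procedure. The sketch below is a simplified model under stated assumptions: a probe passes when the status is 200 and the expected body appears in the response (matched here as a case-insensitive substring), an endpoint counts as healthy if the first attempt or one of the two retries passes, and region choice is reduced to "primary if healthy, otherwise fallback". It does not model "Least outstanding requests steering", and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

EXPECTED_STATUS = 200
EXPECTED_BODY = '{"status":"UP"}'
RETRIES = 2


def probe_passes(status_code: int, body: str) -> bool:
    """A single probe passes on the expected status and body content."""
    return status_code == EXPECTED_STATUS and EXPECTED_BODY.lower() in body.lower()


def endpoint_healthy(attempts: List[Tuple[int, str]]) -> bool:
    """Healthy if the first attempt or any of the allowed retries passes."""
    return any(probe_passes(code, body) for code, body in attempts[:1 + RETRIES])


def pick_region(health: Dict[str, bool],
                primary: str = "us-east-1",
                fallback: str = "us-west-2") -> Optional[str]:
    """Prefer the primary region and fall back only when it is unhealthy."""
    if health.get(primary):
        return primary
    if health.get(fallback):
        return fallback
    return None


# Example: us-east-1 fails every attempt, us-west-2 answers correctly.
east = endpoint_healthy([(503, ""), (503, ""), (503, "")])
west = endpoint_healthy([(200, '{"status":"UP"}')])
print(pick_region({"us-east-1": east, "us-west-2": west}))
```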

๐ŸŒ AWS Global Acceleratorยถ

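AWS Global Accelerator provides static IP addresses as a fixed entry point and routes traffic to the nearest healthy NLB based on health checks and routing policies. Traffic dials are set to 100% for us-east-1 and 20% for us-west-2, so most traffic lands in us-east-1 while us-west-2 stays available for redundancy and failover.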

graph TD
    A[AWS-Accelerator] -->|us-east-1| B[AWS-NLB:9506]
    A -->|us-west-2| C[AWS-NLB:9506]

Accelerator Health Check

  • Interval: 30s
  • Timeout: 5s
  • Threshold: 3
  • Port: 9506
  • Protocol: TCP
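
With a 30-second interval and a threshold of 3, an endpoint changes state only after three consecutive contrary results, so a failure takes roughly 90 seconds to register. The sketch below models that behaviour; it ignores probe timeouts and propagation delay, and the function names are hypothetical.

```python
INTERVAL_SECONDS = 30
THRESHOLD = 3


def approx_detection_seconds(interval: int = INTERVAL_SECONDS,
                             threshold: int = THRESHOLD) -> int:
    """Rough time for a state change to register: interval x threshold."""
    return interval * threshold


def replay(results, healthy: bool = True, threshold: int = THRESHOLD) -> bool:
    """Replay probe results (True = TCP connect succeeded) and return the final state.

    The state flips only after `threshold` consecutive results that
    contradict the current state.
    """
    streak = 0
    for ok in results:
        if ok == healthy:
            streak = 0  # result agrees with current state, reset the counter
        else:
            streak += 1
            if streak >= threshold:
                healthy = ok
                streak = 0
    return healthy


print(approx_detection_seconds())
print(replay([False, False, False]))
```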

🔗 EKS/K8s Integrations with AWS

🆔 AWS Pod Identity Add-on

Enables pods to securely access AWS services by assigning temporary IAM credentials via Kubernetes Service Accounts.

graph TD
    A[Pod] -->|Uses| B[Kubernetes Service Account]
    B -->|Assumes| C[AWS IAM Role]
    C -->|Grants Access to| D[S3, Secrets Manager]

  • Temporary IAM credentials: pods receive short-lived credentials via AWS STS
  • Fine-grained permissions: IAM roles assigned per namespace or workload (least privilege)
  • No hardcoded secrets: no need for AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  • Native EKS integration
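
One way to confirm from inside a container that credentials come from Pod Identity rather than static keys is to inspect the environment. The sketch below assumes the add-on injects the container credential variables `AWS_CONTAINER_CREDENTIALS_FULL_URI` and `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE`, which AWS SDKs read automatically; the helper names and the sample values are illustrative only.

```python
import os

# Variables the AWS SDK container credential provider reads (assumed to be
# injected by the Pod Identity add-on).
URI_VAR = "AWS_CONTAINER_CREDENTIALS_FULL_URI"
TOKEN_FILE_VAR = "AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE"


def pod_identity_configured(env=None) -> bool:
    """True when both container credential variables are set."""
    env = os.environ if env is None else env
    return bool(env.get(URI_VAR)) and bool(env.get(TOKEN_FILE_VAR))


def uses_static_keys(env=None) -> bool:
    """True when long-lived access keys are present in the environment."""
    env = os.environ if env is None else env
    return "AWS_ACCESS_KEY_ID" in env or "AWS_SECRET_ACCESS_KEY" in env


# Example environment (sample values, not read from a real pod).
sample_env = {
    URI_VAR: "http://169.254.170.23/v1/credentials",
    TOKEN_FILE_VAR: "/var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token",
}
print(pod_identity_configured(sample_env), uses_static_keys(sample_env))
```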

📊 Loki/Thanos/Prometheus

  • Loki/Thanos: Use EBS for short-term storage and S3 for long-term storage (Loki holds logs, Thanos holds metrics).
  • Prometheus: Stores metrics on EBS and integrates with Redis Labs, Atlas MongoDB, and CloudWatch.

Thanos Storage

Thanos stores metrics in S3 buckets with regional replication for high availability and durability.


SIM-Accelerator details

The endpoints of the SIM AWS Global Accelerator are configured with weighted routing policies. The traffic dial is set to 100% for us-east-1 and 20% for us-west-2, so the majority of traffic goes to us-east-1 while a smaller portion goes to us-west-2 for redundancy and failover.

Static IP addresses:

  • 75.2.108.23
  • 3.33.243.63
  • 2600:9000:a403:180c:6614:11ab:4d5b:1a99
  • 2600:9000:a700:38a5:62e0:1fe9:a8b5:4bc8

Health Check configurations:

Health check interval - 30 seconds Timeout - 5 seconds Threshold count - 3 Health check port - 9506 Health check protocol - TCP

graph TD
		A[AWS-Accelerator] -->|us-east-1| B[AWS-NLB:9506]
    A -->|us-west-2| C[AWS-NLB:9506]
    

EKS/K8s Integrations with AWSยถ

AWS Pod Identity Add-onยถ

Kubernetes add-on that allows pods to securely access AWS services, it assigns temporary AWS IAM credentials to pods using Kubernetes Service Accounts

  1. Temporary IAM Credentials
    • Pods receive short-lived AWS credentials (via AWS_STS AssumeRole calls).
  2. Fine-Grained Permissions
    • Assign IAM roles per namespace or workload (least privilege).
  3. No Hardcoded Secrets
    • Eliminates the need for AWS_ACCESS_KEY_ID & AWS_SECRET_ACCESS_KEY.
  4. Works with EKS (Amazon EKS Optimized)
    • Native integration with Amazon EKS (but can work with other K8s clusters).
graph TD
    A[Pod] -->|Uses| B[Kubernetes Service Account]
    B -->|Assumes| C[AWS IAM Role]
    C -->|Grants Access to| D[S3, SecretManager]
Loki/Thanos/Prometheus:ยถ

Loki/Thanos uses two kinds of storage types: EBS, S3 - EBS - Used for short-term storage of logs, providing fast access and retrieval - S3 - Used for long-term storage of logs, providing durability and cost-effective storage Prometheus metrics are stored in EBS, service integrations include: * Redis Labs - For caching and fast access to frequently queried metrics * Atlas MongoDB - For storing and querying metrics data * CloudWatch - For monitoring and alerting on metrics data

Thanos Storage

Thanos stores data in S3 designated buckets, with region replication, providing high availability and durability for metrics and visibility for both regions.