Active-Passive Architecture
Table of Contents
- Table of Contents
- Introduction
- High Level Architecture Overview
- Network Architecture Overview
- AWS Components
- SaaS Components
- Unibeam Components
- EKS Components
- System Wide Applications
- Network Detailed Architecture Overview
- EKS/K8s Integrations with AWS
Introduction
This document provides a technical high-level architecture overview for the ATnT application, deployed in a multi-region, active-passive (warm) configuration on AWS EKS. The architecture leverages AWS services, SaaS integrations, and Kubernetes components to deliver high availability, scalability, and security.
High Level Architecture Overview
The primary workload runs in us-east-1, with us-west-2 serving as a passive region for redundancy and failover.
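Network Architecture Overview
The network follows the same two-region layout: us-east-1 carries the primary workload and us-west-2 acts as the passive failover region.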
AWS Components
| Component | Description |
|---|---|
| EKS | Managed Kubernetes service with high availability and automatic patching |
| EC2 | Elastic Compute Cloud for running worker nodes |
| ECR | Container registry for storing Docker images |
| Secrets Manager | Secure storage for sensitive information |
| S3 | Object storage for data and backups |
| VPC | Isolated network environment |
| TGW | Transit Gateway for connecting VPCs |
| Firewall | Managed firewall for network security |
| ALB | Application Load Balancer for HTTP/HTTPS traffic |
| NLB | Network Load Balancer for TCP/UDP traffic |
| Global Accelerator | Enhances availability and performance for global users |
| Route 53 | Managed DNS service for internal DNS resolution |
EC2 Workers
Preferred EC2 instance type is t4g (AWS Graviton 2) for performance and cost efficiency.
SaaS Components
| Service | Description |
|---|---|
| Atlas | Managed MongoDB with IAM-based ACL, active/active regional endpoints via AWS PrivateLink |
| RedisLabs | Managed Redis with username/password ACL, active/active cluster, replication, and frequent backups |
Connection Details
- Atlas MongoDB: Connected via AWS PrivateLink, supports TLS 1.2.
- Redis: Connected via VPC peering, supports TLS 1.2 and 1.3.
Unibeam Components
- us-east-1: Main region, runs majority of workloads and replication sets.
- us-west-2: Passive region, runs lighter workloads, scaled up as needed.
- sms-service: Deployed in all regions for SMPP bind availability and warm state.
- HPA: Horizontal Pod Autoscaler for dynamic scaling based on metrics.
- Karpenter: Automated provisioning of worker nodes for unscheduled pods.
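The HPA entry above can be illustrated with the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler, `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`. The following is a minimal sketch only: the replica bounds and metric values are hypothetical and are not taken from this deployment, and the 0.1 tolerance is the Kubernetes default rather than a documented setting of this cluster.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Approximate the HPA rule: ceil(current * currentMetric / targetMetric)."""
    ratio = current_metric / target_metric
    # The HPA leaves the replica count alone when the ratio is close to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (hypothetical defaults here).
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 4 pods at 90% average CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

When the resulting pods cannot be scheduled on existing nodes, Karpenter is the component that provisions additional workers.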
EKS Components
| Worker Group | Workloads |
|---|---|
| Unibeam Workers | SIM, SMS, MNO, API, Dashboard, Timer, Scheduled-Jobs, Audit |
| Kafka Workers | Kafka Broker, Coordinator |
| Spot Workers | Grafana, Loki Querier/Distributor, ArgoCD (except Redis), Strimzi, AWS LB Controller, Reflector |
| Monitor Workers | Monitoring (Kube-Prometheus-Stack, Loki, ArgoCD, Thanos, Promtail, Karpenter) |
Spot Instances
Amazon EC2 Spot Instances offer up to 90% savings but can be reclaimed with a 2-minute warning.
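The 2-minute warning is published through the EC2 instance metadata service as a small JSON document. As a hedged sketch, the helper below parses such a notice and reports how much time remains. The function name is hypothetical, the sample payload follows the shape shown in AWS documentation, and nothing here describes how this cluster actually reacts to interruptions.

```python
import json
from datetime import datetime, timezone

# Instance metadata path that carries the Spot interruption notice.
# It returns HTTP 404 until an interruption has been scheduled.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def seconds_until_interruption(notice_json: str, now: datetime) -> float:
    """Return the seconds left before the instance is reclaimed."""
    notice = json.loads(notice_json)
    # The notice time is expressed in UTC, for example "2017-09-18T08:22:00Z".
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


if __name__ == "__main__":
    sample = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
    now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
    print(seconds_until_interruption(sample, now))
```

The remaining time is the budget available for draining pods from the node before it disappears.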
System Wide Applications
| Application | Purpose |
|---|---|
| AWS Pod Identity Add-on | Secure IAM role assumption for pods |
| CoreDNS | DNS service for Kubernetes |
| Amazon VPC CNI | Pod networking and VPC integration |
| Kube Proxy | Endpoint services support |
| Amazon EBS CSI Driver | EBS support for persistent storage |
| AWS Load Balancer Controller | Manages AWS ELBs for Kubernetes services |
| mktxp-exporter | Mikrotik-IPSEC metrics exporter |
| twistlock-defender | Container runtime protection |
| reflector | Replicates secrets, configmaps, certificates |
| Strimzi | Kafka operator for Kubernetes |
| csi-secrets-store | CSI driver for external secrets |
| secrets-store-csi-driver-provider-aws | AWS Secrets Manager integration |
Workload Placement
Workloads are scheduled using nodeSelectors and tolerations for optimal resource utilization.
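To make the placement rule concrete, here is a simplified model of how a nodeSelector and tolerations decide whether a pod may land on a node. It is a sketch, not the Kubernetes scheduler: the label key `workload`, the taint key `dedicated`, and their values are hypothetical and are not the labels used in this cluster.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified toleration match on key, value, effect and operator."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # With Exists, an empty key tolerates every taint.
        return toleration.get("key") in (None, "", taint["key"])
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def can_schedule(pod: dict, node: dict) -> bool:
    # nodeSelector: every requested label must be present on the node.
    labels_ok = all(node.get("labels", {}).get(key) == value
                    for key, value in pod.get("nodeSelector", {}).items())
    # Every NoSchedule or NoExecute taint must be tolerated by the pod.
    taints_ok = all(
        any(tolerates(tol, taint) for tol in pod.get("tolerations", []))
        for taint in node.get("taints", [])
        if taint["effect"] in ("NoSchedule", "NoExecute")
    )
    return labels_ok and taints_ok


# Hypothetical node and pods for illustration.
kafka_node = {
    "labels": {"workload": "kafka"},
    "taints": [{"key": "dedicated", "value": "kafka", "effect": "NoSchedule"}],
}
kafka_pod = {
    "nodeSelector": {"workload": "kafka"},
    "tolerations": [{"key": "dedicated", "operator": "Equal",
                     "value": "kafka", "effect": "NoSchedule"}],
}
api_pod = {"nodeSelector": {"workload": "unibeam"}}

if __name__ == "__main__":
    print(can_schedule(kafka_pod, kafka_node), can_schedule(api_pod, kafka_node))
```

The selector pulls a workload toward its worker group, while the taint and toleration pair keeps other workloads off that group.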
Network Detailed Architecture Overview
Cloudflare
Cloudflare provides CDN and DDoS protection and performs health checks on the ALBs. The DNS name api.atnt.unibeam.com resolves to the Cloudflare load balancer, which routes traffic to a healthy ALB using "Least outstanding requests" steering, with fallback to us-west-2.
graph TD
A[Cloudflare-Global-DNS] -->|api.us.unibeam.com| B[Cloudflare-LB]
B -->|us-east-1| C[ALB-API-East:443]
B -->|us-west-2| D[ALB-API-West:443]
Cloudflare Health Check
- Interval: 60s
- Timeout: 5s
- Retries: 2
- Path: /health
- Expected Status: 200
- Response Body: {"status":"UP"}
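The health check settings above can be read as a small decision procedure. The sketch below models one probe round under two stated assumptions: retries apply to timeouts and connection errors only, and the body is compared by parsing it as JSON. Cloudflare's own matching logic may differ, and `fetch` is a hypothetical callable standing in for the real HTTP request.

```python
import json
from typing import Callable, Tuple

INTERVAL_S = 60   # time between probe rounds
TIMEOUT_S = 5     # per-request timeout
RETRIES = 2       # extra attempts after a timeout
PATH = "/health"


def response_is_healthy(status: int, body: str) -> bool:
    """Accept only HTTP 200 with a JSON body whose status field is UP."""
    if status != 200:
        return False
    try:
        return json.loads(body).get("status") == "UP"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def probe(fetch: Callable[[str, float], Tuple[int, str]],
          retries: int = RETRIES) -> bool:
    """Run one probe round: the first attempt plus retries on timeout."""
    for _ in range(1 + retries):
        try:
            status, body = fetch(PATH, TIMEOUT_S)
        except OSError:
            # Timeouts and connection errors trigger an immediate retry.
            continue
        return response_is_healthy(status, body)
    return False


if __name__ == "__main__":
    print(probe(lambda path, timeout: (200, '{"status":"UP"}')))
```

Passing the request function in as a parameter keeps the decision logic testable without network access.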
AWS Global Accelerator
AWS Global Accelerator provides static IP addresses and routes traffic to the nearest healthy NLB based on health checks and weighted routing, with traffic dials set to 100% for us-east-1 and 20% for us-west-2.
graph TD
A[AWS-Accelerator] -->|us-east-1| B[AWS-NLB:9506]
A -->|us-west-2| C[AWS-NLB:9506]
Accelerator Health Check
- Interval: 30s
- Timeout: 5s
- Threshold: 3
- Port: 9506
- Protocol: TCP
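The threshold setting means an endpoint changes state only after several consecutive checks disagree with its current state. The class below is a minimal sketch of that behaviour with a hypothetical name. With a 30-second interval and a threshold of 3, a failing NLB would be marked unhealthy after roughly 90 seconds of consecutive failed TCP checks on port 9506.

```python
class EndpointHealth:
    """Track endpoint state using a consecutive-result threshold."""

    def __init__(self, threshold: int = 3, healthy: bool = True) -> None:
        self.threshold = threshold
        self.healthy = healthy
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Record one health check result and return the resulting state."""
        if check_passed == self.healthy:
            # A result that agrees with the current state resets the streak.
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.threshold:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy


if __name__ == "__main__":
    endpoint = EndpointHealth(threshold=3)
    for result in (False, False, False):
        state = endpoint.record(result)
    print(state)
```

A single passing check in the middle of a failure streak resets the count, which protects against flapping on isolated packet loss.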
EKS/K8s Integrations with AWS
AWS Pod Identity Add-on
Enables pods to securely access AWS services by assigning temporary IAM credentials via Kubernetes Service Accounts.
graph TD
A[Pod] -->|Uses| B[Kubernetes Service Account]
B -->|Assumes| C[AWS IAM Role]
C -->|Grants Access to| D[S3, Secrets Manager]
- Temporary IAM credentials via AWS STS
- Fine-grained permissions per namespace/workload
- No hardcoded secrets
- Native EKS integration
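As an illustration of the pieces involved, the sketch below builds the IAM trust policy that EKS Pod Identity expects on a role, plus the fields that link a service account to that role. The cluster name, namespace, service account, and role ARN are hypothetical placeholders, and the helper names are assumptions for this example only.

```python
import json


def pod_identity_trust_policy() -> dict:
    """Trust policy allowing the EKS Pod Identity service to assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "pods.eks.amazonaws.com"},
                "Action": ["sts:AssumeRole", "sts:TagSession"],
            }
        ],
    }


def association_request(cluster: str, namespace: str,
                        service_account: str, role_arn: str) -> dict:
    """Fields of a pod identity association (one service account, one role)."""
    return {
        "clusterName": cluster,
        "namespace": namespace,
        "serviceAccount": service_account,
        "roleArn": role_arn,
    }


if __name__ == "__main__":
    print(json.dumps(pod_identity_trust_policy(), indent=2))
    # Hypothetical values, not taken from this environment.
    print(association_request("example-cluster", "example-namespace",
                              "example-sa",
                              "arn:aws:iam::111122223333:role/example-role"))
```

Scoping each association to one namespace and service account is what gives the per-workload, least-privilege permissions listed above.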
Loki/Thanos/Prometheus
- Loki/Thanos: Uses EBS for short-term and S3 for long-term log storage.
- Prometheus: Stores metrics in EBS, integrates with Redis Labs, Atlas MongoDB, and CloudWatch.
Thanos Storage
Thanos stores metrics in S3 buckets with regional replication for high availability and durability.
AWS Global Accelerator:
AWS Global Accelerator provides static IP addresses that serve as a fixed entry point for the application. It routes traffic to the nearest Network Load Balancer (NLB) based on health checks and routing policies.
SIM-Accelerator details:
Endpoints use weighted routing policies. The traffic dial is set to 100% for us-east-1 and 20% for us-west-2, so that most traffic goes to us-east-1 while a smaller portion goes to us-west-2 for redundancy and failover.
Static IP addresses:
- 75.2.108.23
- 3.33.243.63
- 2600:9000:a403:180c:6614:11ab:4d5b:1a99
- 2600:9000:a700:38a5:62e0:1fe9:a8b5:4bc8
Health Check configurations:
- Health check interval: 30 seconds
- Timeout: 5 seconds
- Threshold count: 3
- Health check port: 9506
- Health check protocol: TCP
EKS/K8s Integrations with AWS
AWS Pod Identity Add-on
A Kubernetes add-on that lets pods securely access AWS services by assigning temporary AWS IAM credentials to pods through Kubernetes Service Accounts.
- Temporary IAM Credentials
  - Pods receive short-lived AWS credentials (via AWS STS AssumeRole calls).
- Fine-Grained Permissions
  - Assign IAM roles per namespace or workload (least privilege).
- No Hardcoded Secrets
  - Eliminates the need for AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
- Works with EKS (Amazon EKS Optimized)
  - Native integration with Amazon EKS.
Loki/Thanos/Prometheus:
Loki/Thanos uses two kinds of storage:
- EBS: short-term storage of logs, providing fast access and retrieval
- S3: long-term storage of logs, providing durable and cost-effective storage
Prometheus metrics are stored in EBS. Service integrations include:
- Redis Labs: for caching and fast access to frequently queried metrics
- Atlas MongoDB: for storing and querying metrics data
- CloudWatch: for monitoring and alerting on metrics data
Thanos Storage
Thanos stores data in designated S3 buckets with regional replication, providing high availability and durability for metrics and visibility across both regions.
