Monitoring¶

2025-12-17

TOC¶

TOC-ThisREADME¶

📈 Monitoring Overview 🚦
🛠️ Metrics Collection with Kube-Prometheus-Stack
📊 Health Checks
📚 Log Aggregation with Loki & Promtail
🛡️ Namespace Best Practices
🧰 Kube-Prometheus-Stack Components
🔗 References

📈 Monitoring Overview 🚦¶

This guide describes the monitoring setup for Unibeam microservices running on AWS EKS.
All applications expose /metrics and /health endpoints on port 8101.
We use Kube-Prometheus-Stack for metrics and alerting, and Loki with Promtail for centralized log aggregation.

🛠️ Metrics Collection with Kube-Prometheus-Stack¶

All Unibeam services (e.g., audit-service, mno-service, scheduled-jobs, sia-service, sim-service, sms-service, timer-service, dashboard-service) expose Prometheus-compatible metrics at:

Endpoint: /metrics
Port: 8101

Kube-Prometheus-Stack is deployed in the monitoring namespace and discovers these endpoints through ServiceMonitor resources, a custom resource type provided by the Prometheus Operator.

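apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
    name: unibeam-app-monitor
    namespace: monitoring
spec:
    # Assumes application Services run outside the monitoring namespace;
    # without a namespaceSelector only Services in "monitoring" are matched.
    namespaceSelector:
        any: true
    selector:
        matchLabels:
            app: unibeam-app
    endpoints:
        # Name of the Service port that maps to 8101
        - port: http
          path: /metrics
          interval: 30s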

ServiceMonitor Setup

Ensure each service has the correct labels and is targeted by a ServiceMonitor for automatic scraping.
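
For illustration, a Service that the ServiceMonitor above could select might look like the following. This is a minimal sketch, not the actual Unibeam manifest: the Service name, the `unibeam-apps` namespace, and the pod selector label are assumptions. The two details that matter are the `app: unibeam-app` label on the Service itself and the port name `http`, because the ServiceMonitor matches on the label and refers to the port by name rather than by number.

```yaml
# Hypothetical Service for one application (name, namespace, and pod label are placeholders).
apiVersion: v1
kind: Service
metadata:
    name: sms-service
    namespace: unibeam-apps        # assumed application namespace
    labels:
        app: unibeam-app           # must match spec.selector.matchLabels in the ServiceMonitor
spec:
    selector:
        app: sms-service           # assumed label on the application pods
    ports:
        - name: http               # the ServiceMonitor endpoint references this name
          port: 8101
          targetPort: 8101
```

If a service does not show up as a scrape target, the Service label and the port name are usually the first two things worth checking.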

📊 Health Checks¶

All apps expose a health endpoint for readiness and liveness probes:

Endpoint: /health
Port: 8101

Configure Kubernetes probes as follows:

livenessProbe:
    httpGet:
        path: /health
        port: 8101
    initialDelaySeconds: 10
    periodSeconds: 30
readinessProbe:
    httpGet:
        path: /health
        port: 8101
    initialDelaySeconds: 5
    periodSeconds: 10

Health Endpoint

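The `/health` endpoint should return HTTP 200 when the service is healthy. Kubernetes treats any status code from 200 to 399 as a passing probe and anything else as a failure.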

📚 Log Aggregation with Loki & Promtail¶

Loki is deployed in the loki namespace for centralized log storage and querying.
Promtail runs in the promtail namespace and is responsible for collecting logs from all pods across the cluster and shipping them to Loki.

All logs are searchable in Grafana using labels such as namespace, app, and pod. Promtail is configured to push logs to the Loki gateway endpoint and can filter out logs from infrastructure namespaces to reduce noise.

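# Promtail Config Example
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
  namespace: promtail
data:
  promtail.yaml: |
    clients:
      - url: http://logz-loki-gateway.loki/loki/api/v1/push
        # Optional: external_labels for multi-region setups
        # external_labels:
        #   region: us-west-2
        #   tenant_id: 1
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        # In a raw promtail.yaml the stages live under the scrape config as
        # pipeline_stages (snippets.pipelineStages is Helm chart values syntax).
        pipeline_stages:
          # Drop log lines from infrastructure namespaces.
          - drop:
              source: "namespace"
              expression: "(kube-system|kube-public|promtail|loki|thanos|monitoring|argocd|strimzi|kafka|twistlock|scheduled-jobs|reflector|karpenter)"
          # Parse the CRI container log format.
          - cri: {}
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          # Assumes pods carry an "app" label; exposes it for Grafana queries.
          - source_labels: [__meta_kubernetes_pod_label_app]
            target_label: app
          # Tell Promtail where the pod's log files are on the node.
          - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
            separator: /
            target_label: __path__
            replacement: /var/log/pods/*$1/*.log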

Promtail Filtering

The pipeline stages above drop logs from infrastructure namespaces to keep application logs focused and relevant.
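
If Promtail is installed through the grafana/promtail Helm chart rather than a hand-written ConfigMap, the same filtering is typically expressed as chart values. The sketch below assumes that chart and its `config.clients` and `config.snippets.pipelineStages` values; the shortened namespace list is only for illustration, and the exact keys should be checked against the chart version in use.

```yaml
# values.yaml sketch for the grafana/promtail Helm chart (assumed deployment method).
config:
  clients:
    - url: http://logz-loki-gateway.loki/loki/api/v1/push
  snippets:
    pipelineStages:
      # Drop log lines whose namespace label matches the expression.
      - drop:
          source: "namespace"
          expression: "(kube-system|kube-public|monitoring)"   # shortened list for illustration
      # Parse the CRI container log format.
      - cri: {}
```

The chart renders these values into the final Promtail configuration, so the stages do not need to be repeated per scrape job.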

Log Search

Use Grafana to query logs by `namespace`, `app`, or `pod` for troubleshooting and auditing.

🛡️ Namespace Best Practices¶

Namespace	Purpose
monitoring	Metrics & alerting
loki	Log aggregation
promtail	Log shipping
	Application workloads

Isolation

Keep monitoring and logging components in dedicated namespaces for security and scalability.

🧰 Kube-Prometheus-Stack Components¶

Kube-Prometheus-Stack is a Helm chart that packages a complete monitoring setup for Kubernetes clusters.
The table below lists its components and the add-ons commonly deployed alongside it for metrics, alerting, and visualization:

Service	Purpose
Prometheus	Collects and stores metrics from Kubernetes and application endpoints.
Alertmanager	Manages alerts sent by Prometheus, including routing and notifications.
Grafana	Visualizes metrics and logs with customizable dashboards.
Node Exporter	Collects hardware and OS metrics from cluster nodes.
Kube State Metrics	Exposes cluster resource metrics (Deployments, Pods, etc).
Prometheus Operator	Simplifies deployment and management of Prometheus resources.
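Blackbox Exporter	Enables synthetic monitoring (HTTP, TCP, ICMP probes). Not bundled with the chart; installed separately.
Pushgateway	Allows ephemeral jobs to push metrics to Prometheus. Not bundled with the chart; installed separately.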
ServiceMonitors & PodMonitors	Discover and scrape metrics from services and pods.
Custom Rules & Alerts	Predefined and user-defined Prometheus alerting rules.
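
User-defined alerts from the "Custom Rules & Alerts" row are declared as PrometheusRule resources. The following is a minimal sketch rather than a rule taken from the Unibeam cluster: the rule name, the `job` value, the severity, and the `release` label are all assumptions. By default the chart's Prometheus only loads rule objects labeled with the Helm release name, and the `job` label of a ServiceMonitor target defaults to the Service name, so both values need adjusting to the actual installation.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
    name: unibeam-app-rules                  # hypothetical name
    namespace: monitoring
    labels:
        release: kube-prometheus-stack       # assumed Helm release name
spec:
    groups:
        - name: unibeam-app.availability
          rules:
              # Fires when Prometheus has failed to scrape the target for 5 minutes.
              - alert: UnibeamTargetDown
                expr: up{job="sms-service"} == 0   # assumed job label (defaults to the Service name)
                for: 5m
                labels:
                    severity: warning
                annotations:
                    summary: "Scrape target {{ $labels.instance }} has been down for 5 minutes"
```

Alertmanager then routes the firing alert according to its own routing configuration, which is outside the scope of this sketch.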

Stack Coverage

The stack covers infrastructure, application, and custom metrics, together with alerting and visualization, for Kubernetes environments.

Extensibility

You can extend the stack with additional exporters or custom dashboards as needed.

For more details, see the Kube-Prometheus-Stack Documentation.