Strimzi Kafka Cluster Configuration

KafkaNodePool Configuration

kafka-dual-role.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: dual-role
  namespace: kafka
  labels:
    strimzi.io/cluster: cluster
spec:
  replicas: 3
  roles:
  - controller
  - broker
  storage:
    type: jbod
    volumes:
    - id: 0
      type: persistent-claim
      size: 250Gi
      class: gp3
      deleteClaim: true
      kraftMetadata: shared
  template:
    pod:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kafka
                    operator: In
                    values:
                      - "true"
      tolerations:
        - key: "kafka"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              strimzi.io/cluster: cluster

🗂️ KafkaNodePool Configuration Explained

This section describes the key elements of the KafkaNodePool manifest for Strimzi Kafka, optimized for a dedicated 3-node worker pool.


๐Ÿท๏ธ Metadata & Labelsยถ

  • name: dual-role – Identifies this node pool.
  • namespace: kafka – Deploys resources in the kafka namespace.
  • labels:
    • strimzi.io/cluster: cluster – Associates this node pool with the main Kafka cluster.

🔢 Spec

  • replicas: 3
    Deploys three Kafka pods, matching your three dedicated worker nodes.

  • roles:
    • controller
    • broker
    Each pod acts as both a controller and a broker, supporting KRaft mode.

  • storage:
    • type: jbod – allows multiple storage volumes.
    • volumes:
      • id: 0
      • type: persistent-claim
      • size: 250Gi
      • class: gp3
      • deleteClaim: true – the PersistentVolumeClaim is removed when the cluster is deleted.
      • kraftMetadata: shared – the KRaft metadata log is stored on this same volume.
    Each pod gets a 250Gi persistent volume using the gp3 storage class.
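
If brokers and controllers ever need to scale independently, the same cluster can use split-role pools instead of a single dual-role pool. The sketch below is a minimal illustration only; the pool names, replica counts, and the smaller controller volume size are assumptions, and the scheduling template from above is omitted for brevity.

kafka-node-pools-split.yaml (hypothetical)
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controllers        # assumed pool name
  namespace: kafka
  labels:
    strimzi.io/cluster: cluster
spec:
  replicas: 3
  roles:
  - controller
  storage:
    type: jbod
    volumes:
    - id: 0
      type: persistent-claim
      size: 50Gi           # assumed; controllers need far less space than brokers
      class: gp3
      deleteClaim: true
      kraftMetadata: shared
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: brokers            # assumed pool name
  namespace: kafka
  labels:
    strimzi.io/cluster: cluster
spec:
  replicas: 3
  roles:
  - broker
  storage:
    type: jbod
    volumes:
    - id: 0
      type: persistent-claim
      size: 250Gi
      class: gp3
      deleteClaim: true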

๐Ÿ—๏ธ Pod Schedulingยถ

  • nodeAffinity:
    Ensures pods are scheduled only on nodes labeled kafka=true.

  • tolerations:
    Allows pods to run on nodes tainted with kafka=true:NoSchedule, ensuring only dedicated Kafka nodes are used.

  • topologySpreadConstraints:
    • maxSkew: 1
      Ensures pods are evenly distributed across nodes (no node has more than one pod difference).
    • topologyKey: kubernetes.io/hostname
      Spreads pods by node hostname.
    • whenUnsatisfiable: DoNotSchedule
      Prevents scheduling if even distribution is not possible.
    • labelSelector:
      Applies only to pods with strimzi.io/cluster: cluster.
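
For the affinity and toleration above to take effect, each dedicated worker node must carry the matching label and taint. A minimal sketch of such a node follows; the node name is hypothetical, and in practice the label and taint are usually applied with kubectl or node-group tooling rather than a Node manifest.

kafka-worker-node.yaml (hypothetical)
apiVersion: v1
kind: Node
metadata:
  name: kafka-worker-1     # hypothetical node name
  labels:
    kafka: "true"          # matched by the pool's nodeAffinity
spec:
  taints:
  - key: kafka
    value: "true"
    effect: NoSchedule     # matched by the pool's tolerations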

Why This Matters

This configuration provides high availability and fault tolerance by ensuring each Kafka pod is isolated on its own dedicated worker node, with persistent storage and strict scheduling rules.


Kafka Configuration

kafka.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: cluster
  namespace: kafka
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafkaExporter:
    topicRegex: ".*"
    groupRegex: ".*"
  kafka:
    version: 3.9.0
    metadataVersion: "3.9"
    jvmOptions:
      -Xms: 4G  # Initial heap size
      -Xmx: 4G  # Maximum heap size
      -XX:
        UseG1GC: "true"  # Use G1 Garbage Collector
        G1HeapRegionSize: 16M  # Region size for G1 GC
        #UnlockExperimentalVMOptions: "true"  # Unlock experimental options
        #G1NewSizePercent: 20M  # New generation size
        #G1MaxNewSizePercent: 40M  # Max new generation size
        MaxGCPauseMillis: "20"  # Target max GC pause time
        InitiatingHeapOccupancyPercent: "35"  # Start GC when heap occupancy is 35%
        MinMetaspaceFreeRatio: "50"  # Keep at least 50% of metaspace free
        MaxMetaspaceFreeRatio: "80"  # Allow up to 80% of metaspace free
    listeners:
    - name: plain
      port: 9092
      type: internal
      tls: false
    - name: tls
      port: 9093
      type: internal
      tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      log.retention.check.interval.ms: 300000 # 5 minutes (more frequent with short retention)
      # For 1-hour topics:
      log.segment.bytes: 1073741824  # 1GB max segment size
      log.roll.ms: 300000            # Roll a new segment every 5 minutes
      log.retention.bytes: -1       # Disable size-based retention
      #log.retention.ms: 3600000     # 1hr (default, override per-topic)
      log.cleanup.policy: delete    # Not compacted
      log.cleaner.threads: 2       # Minimal cleanup overhead
      compression.type: lz4  # Efficient compression
      # CPU/Threading (Critical for 2 vCPUs)
      num.network.threads: 3             # Kafka default; keep close to the available vCPUs
      num.io.threads: 4                  # Slightly higher than cores (for disk I/O)
      background.threads: 2              # For background tasks (e.g., log cleaning)
      socket.send.buffer.bytes: 1024000  # Optimize network buffers
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: kafka-metrics-config.yml
    template:
      clusterCaCert:
        metadata:
          annotations:
            reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
            reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
            reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "sms,sia,sim,mno,audit,timer,scheduled-jobs"
            reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "sms,sia,sim,mno,audit,timer,scheduled-jobs"
  entityOperator:
    topicOperator: {}
    userOperator: {}

⚡ Strimzi Kafka Cluster Resource Explained

This section describes the main Kafka custom resource manifest for deploying a Kafka cluster using Strimzi in Kubernetes.


๐Ÿท๏ธ Metadataยถ

  • name: cluster
    The name of the Kafka cluster.
  • namespace: kafka
    Deploys the cluster in the kafka namespace.
  • annotations:
    • strimzi.io/node-pools: enabled – enables node pool support.
    • strimzi.io/kraft: enabled – enables KRaft mode (no ZooKeeper).

🔢 Spec

  • kafkaExporter:
    Exports metrics for all topics and consumer groups for monitoring.

  • kafka:
    • version: 3.9.0
      Specifies the Kafka version.
    • jvmOptions:
      Configures JVM heap size and garbage collection for optimal performance.
    • listeners:
      Internal listeners for both plain (non-TLS) and TLS traffic.
    • config:
      • Sets replication factors for offsets and transaction logs to 3 for high availability.
      • min.insync.replicas: 2 ensures data safety during broker failures.
      • Log retention and segment settings optimize storage and performance for short-lived data.
      • Compression is set to lz4 for efficient storage.
      • Thread and buffer settings are tuned for typical 2 vCPU nodes.
    • metricsConfig:
      • Uses the JMX Prometheus Exporter for metrics.
      • Loads its configuration from the kafka-metrics ConfigMap.
    • template:
      Customizes cluster CA certificate annotations for cross-namespace secret reflection.

  • entityOperator:
    Enables both the Topic and User Operators for automated topic and user management.
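
The kafka-metrics ConfigMap referenced by metricsConfig is not shown on this page. The sketch below illustrates its expected shape only; the single exporter rule is a placeholder example, not the full rule set shipped in Strimzi's metrics examples.

kafka-metrics.yaml (hypothetical sketch)
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-metrics
  namespace: kafka
data:
  kafka-metrics-config.yml: |
    # Placeholder rule set for the JMX Prometheus Exporter
    lowercaseOutputName: true
    rules:
    - pattern: "kafka.server<type=(.+), name=(.+)><>Value"
      name: "kafka_server_$1_$2"
      type: GAUGE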

Why This Matters

This configuration provides a production-ready, highly available Kafka cluster with built-in monitoring, optimized JVM and log settings, and automated topic/user management. It is designed for AWS EKS or similar Kubernetes environments.
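
With the User Operator enabled, client credentials can also be managed declaratively. A minimal, hypothetical KafkaUser for the TLS listener is sketched below; the name app-user is an example, not part of this deployment.

kafka-user.yaml (hypothetical)
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: app-user           # hypothetical user name
  namespace: kafka
  labels:
    strimzi.io/cluster: cluster
spec:
  authentication:
    type: tls              # the operator issues a client certificate in a Secret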


KafkaTopic Configuration

kafka-topics.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: db.customers.sia.api
  namespace: kafka
  labels:
    strimzi.io/cluster: cluster
spec:
  topicName: db_customers-sia.api
  partitions: 6
  replicas: 3
  config:
    retention.ms: 3600000

🗂️ KafkaTopic Resource Explained

The following KafkaTopic manifest defines a Kafka topic managed by Strimzi in your Kubernetes cluster.


๐Ÿท๏ธ Metadataยถ

  • name: db.customers.sia.api
    The logical name for the topic resource in Kubernetes.
  • labels:
    • strimzi.io/cluster: cluster
      Associates this topic with the main Kafka cluster managed by Strimzi.

🔢 Spec

  • topicName: db_customers-sia.api
    The actual Kafka topic name created in the cluster.
  • partitions: 6
    The topic will be split into 6 partitions, enabling parallelism and higher throughput.
  • replicas: 3
    Each partition will have 3 replicas, ensuring data redundancy and high availability.
  • config:
    • retention.ms: 3600000
      Messages in this topic will be retained for 1 hour (3,600,000 milliseconds).

Best Practices

  • Using multiple partitions improves scalability and consumer performance.
  • Setting replicas to match the number of Kafka brokers ensures fault tolerance.
  • Adjust retention.ms based on your application's data retention requirements.
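
For example, a topic holding longer-lived data might keep messages for a week instead of an hour. The topic name and retention value below are illustrative, not part of this deployment.

kafka-topic-audit.yaml (hypothetical)
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: audit.events        # illustrative topic name
  namespace: kafka
  labels:
    strimzi.io/cluster: cluster
spec:
  partitions: 6
  replicas: 3
  config:
    retention.ms: 604800000  # 7 days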