
• Broker Tuning
• Consumer Tuning
• Cruise Control
• Autorebalancing on Scaling

Example broker configuration for durable offsets and transactions:

transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
offsets.topic.num.partitions=50
offsets.topic.replication.factor=3
offsets.commit.required.acks=-1
  • transaction.state.log.replication.factor=3
    Sets the replication factor for the Kafka transaction state log topic to 3, ensuring high availability and fault tolerance for transactional data.

  • transaction.state.log.min.isr=2
    Specifies the minimum number of in-sync replicas required for the transaction state log topic. A value of 2 means at least two replicas must acknowledge a write for it to be considered successful, improving reliability.

  • offsets.topic.num.partitions=50
    Configures the number of partitions for the Kafka offsets topic, which stores consumer group offsets. More partitions can improve scalability and parallelism for large numbers of consumer groups.

  • offsets.topic.replication.factor=3
    Sets the replication factor for the offsets topic to 3, ensuring that consumer offset data is highly available and protected against broker failures.

  • offsets.commit.required.acks=-1
    Requires all in-sync replicas to acknowledge offset commits before they are considered successful. This increases durability for committed offsets, reducing the risk of data loss.
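In Strimzi, these broker settings typically go under spec.kafka.config in the Kafka custom resource. A minimal sketch (cluster name and surrounding fields are assumptions):

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    config:
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      offsets.topic.num.partitions: 50
      offsets.topic.replication.factor: 3
      offsets.commit.required.acks: -1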

Configuring Cruise Control for Partition Rebalancing in Strimzi Kafka

Since you're using Strimzi with Cruise Control already deployed, here's how to configure it for partition rebalancing:

1. Verify Cruise Control Deployment

First, ensure Cruise Control is properly deployed with your Strimzi Kafka cluster. Your Strimzi Kafka custom resource should include Cruise Control configuration like this:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  # ... other configurations ...
  cruiseControl:
    config:
      default.goals: com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal
      cpu.balance.threshold: 1.1
      replica.count.balance.threshold: 1.1
      disk.balance.threshold: 1.1
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
      limits:
        cpu: 500m
        memory: 1024Mi

2. Triggering Rebalance Operations

There are several ways to trigger rebalancing:

Option A: Using Cruise Control UI

  1. Access the Cruise Control UI (which you're setting up)
  2. Navigate to the "Proposals" section
  3. Review the proposed changes
  4. Click "Apply Proposal" to execute the rebalance

Option B: Using REST API

You can trigger rebalance via Cruise Control's REST API:

# Dry run (only shows what would be done)
curl -X POST "http://localhost:9090/kafkacruisecontrol/rebalance?dryrun=true"

# Execute rebalance
curl -X POST "http://<cruise-control-service>:9090/kafkacruisecontrol/rebalance"

Option C: Using the KafkaRebalance Custom Resource (Recommended)

Strimzi provides a KafkaRebalance custom resource for managing rebalances:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: my-rebalance
  labels:
    strimzi.io/cluster: my-cluster
spec:
  goals:
    - RackAwareDistributionGoal
    - ReplicaCapacityGoal
    - DiskCapacityGoal
  # Optional: Add these for more control
  # skipHardGoalCheck: false
  # replicationThrottle: 1048576

Then interact with it:

# Create the resource
kubectl apply -f kafka-rebalance.yaml

# Check status
kubectl get kafkarebalance my-rebalance -o yaml

# Approve the proposal (after reviewing)
kubectl annotate kafkarebalance my-rebalance strimzi.io/rebalance=approve

# Refresh the proposal
kubectl annotate kafkarebalance my-rebalance strimzi.io/rebalance=refresh

# Stop the rebalance
kubectl annotate kafkarebalance my-rebalance strimzi.io/rebalance=stop
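If you script this workflow, you can block until the operator has generated a proposal before approving it; a small sketch using the resource name from above (the timeout is an arbitrary choice):

# Wait until the rebalance proposal is ready for review
kubectl wait kafkarebalance/my-rebalance --for=condition=ProposalReady --timeout=300s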

3. Common Rebalance Goals

You can customize the goals Cruise Control uses for rebalancing:

spec:
  goals:
    - RackAwareDistributionGoal      # Distribute replicas across racks
    - ReplicaCapacityGoal           # Balance number of replicas per broker
    - DiskCapacityGoal              # Balance disk usage
    - NetworkInboundCapacityGoal    # Balance inbound network traffic
    - NetworkOutboundCapacityGoal   # Balance outbound network traffic
    - CpuCapacityGoal               # Balance CPU usage
    - LeaderReplicaDistributionGoal # Balance leader replicas
    - TopicReplicaDistributionGoal  # Even distribution of replicas per topic

4. Monitoring Rebalance Progress

Monitor rebalance operations through:
  • Cruise Control UI
  • KafkaRebalance resource status
  • Cruise Control logs (kubectl logs <cruise-control-pod>)

5. Important Considerations

  1. Rebalance Impact: Rebalancing can be resource-intensive. Consider doing it during low-traffic periods.

  2. Throttling: Configure replication throttling to minimize impact:

    spec:
      replicationThrottle: 1048576  # 1 MB/s throttle
    

  3. Approval Mode: By default, Strimzi requires manual approval of rebalance proposals. You can enable automatic approval by annotating the KafkaRebalance resource:

    metadata:
      annotations:
        strimzi.io/rebalance-auto-approval: "true"
    

  4. Regular Rebalancing: Consider setting up periodic rebalancing to maintain cluster health.

Would you like me to provide more specific details about any of these aspects or help with troubleshooting any particular issue?

Kafka Performance Optimization for Large-Scale Systems with Strimzi

When running Kafka at scale with Strimzi (3 brokers/controllers), here are key optimizations to consider:

Broker Configuration

  1. JVM Settings (see the jvmOptions sketch after this list):
     • Set appropriate heap sizes (typically 4-8GB for brokers)
     • Use G1 garbage collector: -XX:+UseG1GC
     • Configure GC logging for monitoring

  2. Resource Allocation:
     • Ensure adequate CPU (minimum 4 cores per broker)
     • Fast local storage (preferably NVMe SSDs)
     • Network bandwidth (10Gbps+ recommended)

  3. Kafka Broker Settings:

    config:
      num.io.threads: 8-16 (or 3x number of disks)
      num.network.threads: 12-16
      socket.send.buffer.bytes: 102400
      socket.receive.buffer.bytes: 102400
      socket.request.max.bytes: 104857600
      log.flush.interval.messages: 10000
      log.flush.interval.ms: 1000
      num.replica.fetchers: 4-8
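
For the JVM settings in item 1, Strimzi exposes jvmOptions on the Kafka resource; a minimal sketch, with the heap size and GC flags as assumptions to tune for your nodes:

spec:
  kafka:
    jvmOptions:
      "-Xms": "6g"
      "-Xmx": "6g"
      "-XX":
        "UseG1GC": "true"
        "MaxGCPauseMillis": "20"
      gcLoggingEnabled: true  # emit GC logs for monitoring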
    

Topic Configuration

  1. Partitioning:
     • Aim for 100-200MB per partition
     • Distribute partitions evenly across brokers
     • Monitor partition skew

  2. Replication:
     • For critical topics, keep replication factor of 3 (see the topic-creation example after this list)
     • Consider min.insync.replicas: 2 for durability/performance balance

  3. Retention:
     • For high-throughput topics, consider tiered storage
     • Monitor disk usage despite short retention
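As a concrete illustration of the partitioning and replication settings above, a topic could be created like this (topic name, partition count, and bootstrap address are placeholders to adapt to your sizing):

kafka-topics.sh --create \
  --topic orders \
  --partitions 12 \
  --replication-factor 3 \
  --config min.insync.replicas=2 \
  --bootstrap-server my-cluster-kafka-bootstrap:9092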

Strimzi-Specific Optimizations

  1. Resource Requests/Limits:

    resources:
      requests:
        memory: "8Gi"
        cpu: "2000m"
      limits:
        memory: "8Gi"
        cpu: "4000m"
    

  2. Storage Configuration:
     • Use type: persistent-claim with fast storage class
     • Consider deleteClaim: false for data persistence

  3. Kubernetes Settings (see the anti-affinity sketch after this list):
     • Configure pod anti-affinity for broker distribution
     • Set appropriate pod disruption budgets
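A sketch of the anti-affinity piece, using Strimzi's pod template (the strimzi.io/name label value assumes a cluster named my-cluster):

spec:
  kafka:
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    strimzi.io/name: my-cluster-kafka
                topologyKey: kubernetes.io/hostname  # at most one broker per node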

Monitoring and Scaling

  1. Metrics Collection:
     • Enable Prometheus metrics in Strimzi
     • Monitor key metrics: under-replicated partitions, request latencies, network throughput

  2. Horizontal Scaling:
     • Scale brokers based on CPU/network/disk usage
     • Consider adding dedicated controller nodes at larger scales

  3. Tuning for Workload (starting values sketched after this list):
     • Producer-heavy: optimize batch.size and linger.ms
     • Consumer-heavy: optimize fetch.max.bytes and max.partition.fetch.bytes
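A few starting points for those client settings, in plain properties form; the numbers are assumptions to benchmark against your own workload, not prescriptions:

# Producer (throughput-oriented)
batch.size=131072
linger.ms=10
compression.type=lz4
acks=all

# Consumer (throughput-oriented)
fetch.max.bytes=104857600
max.partition.fetch.bytes=2097152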

Additional Considerations

  1. Network:
     • Ensure low latency between brokers
     • Consider dedicated network interfaces for replication traffic

  2. ZooKeeper:
     • Ensure the ZooKeeper ensemble is properly sized (3-5 nodes)
     • Monitor ZK performance as it can become a bottleneck

  3. Security (a TLS listener sketch follows this list):
     • Enable TLS for inter-broker communication
     • Consider SASL for authentication if needed
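A minimal sketch of a TLS-encrypted internal listener in the Strimzi Kafka resource (the listener name and port are conventional choices; swap the authentication type for SCRAM if you prefer SASL):

spec:
  kafka:
    listeners:
      - name: tls
        port: 9093
        type: internal
        tls: true
        authentication:
          type: tls   # mutual TLS; use scram-sha-512 here for SASL instead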

Would you like me to elaborate on any specific area of optimization for your use case?

Kafka Performance Optimization with KRaft Mode in Strimzi

When using KRaft mode (Kafka Raft metadata mode) instead of ZooKeeper, you can achieve better performance and simpler operations. Here are key optimizations for your 3-broker KRaft cluster:

KRaft-Specific Optimizations

  1. Controller Configuration:

    config:
      node.id: # unique for each broker
      process.roles: "broker,controller" # combined mode
      controller.quorum.voters: "1@broker-1:9093,2@broker-2:9093,3@broker-3:9093"
      controller.listener.names: CONTROLLER
    

  2. Resource Savings:
     • No need for a separate ZooKeeper cluster (reduces resource overhead)
     • More efficient metadata handling (fewer network hops)

Performance Tuning for KRaft

  1. Metadata Performance:

    config:
      metadata.log.max.record.bytes.between.snapshots: 20971520 # 20MB
      metadata.log.max.snapshot.interval.ms: 3600000 # 1 hour
    

  2. Quorum Configuration:
     • With 3 controllers, set controller.quorum.election.timeout.ms: 10000
     • Adjust controller.quorum.fetch.timeout.ms: 2000 for your network latency

  3. Log Retention:

    config:
      log.retention.check.interval.ms: 300000 # 5 minutes (more frequent with short retention)
    

Broker Configuration Enhancements

  1. Combined Broker/Controller Settings:

    resources:
      requests:
        memory: "10Gi" # More memory needed for combined roles
        cpu: "3000m"
    

  2. Network Optimization:

    config:
      controller.socket.timeout.ms: 30000
      controller.message.queue.size: 32
    

Monitoring Considerations

  1. KRaft-Specific Metrics:
     • Monitor kafka.controller:type=KafkaController metrics
     • Track ActiveControllerCount and LastCommittedRecordOffset

  2. Metadata Performance:
     • Watch MetadataLogManager metrics for snapshot timing
     • Monitor leader changes with QuorumController metrics

Example Strimzi Kafka Resource (KRaft Mode)

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.6.0 # Use latest KRaft-supported version
    replicas: 3
    config:
      process.roles: "broker,controller"
      controller.quorum.voters: "1@my-cluster-kafka-0.my-cluster-kafka-brokers:9093,2@my-cluster-kafka-1.my-cluster-kafka-brokers:9093,3@my-cluster-kafka-2.my-cluster-kafka-brokers:9093"
      controller.listener.names: CONTROLLER
      inter.broker.listener.name: INTERNAL
      num.partitions: 12 # Default partitions for new topics
      default.replication.factor: 3
      min.insync.replicas: 2
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 1000Gi
          class: fast-ssd
          deleteClaim: false

Scaling Considerations

  1. Future Expansion:
     • To scale beyond 3 nodes, consider separating controller and broker roles
     • For >10 brokers: dedicated controller nodes (5-7 recommended)

  2. Disaster Recovery:
     • Configure metadata.log.dir on a separate persistent volume
     • Back up metadata snapshots regularly

Would you like me to focus on any particular aspect of KRaft performance optimization for your specific workload patterns?

Here's a cost estimate for running your Kafka cluster on AWS EKS with Graviton2 (ARM-based) instances:

Instance Selection & Configuration

For a 3-broker KRaft cluster, I recommend:
  • Instance Type: r6g.xlarge (Graviton2, better suited for Kafka than t4g)
    • 4 vCPUs
    • 32GB RAM
    • Up to 10Gbps network
    • $0.2016/hour (~$146/month) per instance

Why not t4g:
  • t4g are burstable instances (not ideal for sustained Kafka workloads)
  • Limited network bandwidth (up to 5Gbps only on the largest sizes)
  • Smaller memory options

Monthly Cost Breakdown

  1. EC2 Costs (3 nodes):
     • r6g.xlarge: $0.2016/hour × 24 × 30 = $145.15/month per instance
     • 3 instances: $435.46/month

  2. EBS Storage:
     • Assuming 1TB gp3 volumes per broker (recommended for Kafka):
       • $0.08/GB-month × 1000GB = $80/month per volume
       • 3 volumes: $240/month
     • (Optional) For better performance, use io1/io2 with 3000 IOPS:
       • ~$150/month per 1TB volume

  3. EKS Costs:
     • $0.10/hour for the EKS cluster: $72/month
     • Worker nodes: already included in the EC2 pricing above

  4. Networking:
     • Data transfer costs vary based on usage
     • Inter-AZ traffic: $0.01/GB (if brokers are in different AZs)

Total Estimated Monthly Cost

Component        | Cost (USD)
3x r6g.xlarge    | $435.46
3x 1TB EBS (gp3) | $240.00
EKS Management   | $72.00
Total            | $747.46

Cost Optimization Options

  1. Use Spot Instances:
     • Can save 50-70% (but not recommended for production Kafka brokers)

  2. Reserved Instances:
     • 1-year term: ~40% savings (~$450/month total)
     • 3-year term: ~60% savings (~$300/month total)

  3. Storage Options:
     • Use smaller volumes if your retention needs allow
     • Consider st1 for older data (cheaper but lower performance)

  4. Instance Right-Sizing:
     • Start with r6g.large (2vCPU, 16GB) for development (~$220/month total)
     • Monitor and scale up as needed

Would you like me to provide a more detailed breakdown based on your specific throughput requirements or retention needs? Also, are you planning to run other workloads on these worker nodes or dedicated solely to Kafka?


Why r6g.xlarge over t4g for Kafka?

  1. Memory-Centric Workloads
     • Kafka is memory-intensive (for page caching, JVM heap, OS buffers).
     • r6g.xlarge gives 32GB RAM vs. t4g.xlarge's 16GB max (and t4g.2xlarge is needed for comparable RAM but costs more).
     • More RAM = better disk caching = fewer I/O bottlenecks.

  2. No Burst Credits (Kafka Needs Sustained Performance)
     • t4g uses burst credits (CPU throttles when credits run out).
     • Kafka needs consistent CPU/network for producers, consumers, and replication.
     • r6g offers dedicated vCPUs without throttling.

  3. Network Bandwidth
     • r6g.xlarge: Up to 10Gbps
     • t4g.xlarge: Up to 5Gbps (and only if credits are available).
     • Kafka replication and high-throughput producers/consumers benefit from higher bandwidth.

  4. Graviton2 Performance
     • Both r6g and t4g use Graviton2 (ARM), but r6g is optimized for memory-heavy workloads.
     • Kafka’s performance scales with RAM (for log segment caching) and network.

When Might t4g Work?

  • Dev/Test Environments: If your workload is low-throughput (e.g., <10MB/s).
  • Cost-Sensitive Non-Prod: If you can tolerate occasional throttling.
  • Small Retention/Low Partitions: Fewer partitions → less memory pressure.

But for production-grade Kafka, r6g is safer.


Alternatives to Consider

  1. For Higher Throughput (50MB+/s):
     • r6g.2xlarge (8 vCPUs, 64GB RAM) if you have many partitions or high traffic.

  2. For Cost Savings (If RAM Isn’t Critical):
     • m6g.xlarge (4 vCPUs, 16GB RAM) — balanced CPU/memory, cheaper than r6g.

  3. For Network-Heavy Workloads:
     • c6g.xlarge (4 vCPUs, 8GB RAM) — if CPU is the bottleneck (rare for Kafka).

Revised Cost Estimate (If You Really Want t4g)

Instance   | vCPUs | RAM  | Network | Cost/Month (3 Nodes) | Notes
t4g.xlarge | 4     | 16GB | 5Gbps   | ~$180/month          | Risky for production
r6g.xlarge | 4     | 32GB | 10Gbps  | ~$435/month          | Recommended baseline
m6g.xlarge | 4     | 16GB | 10Gbps  | ~$300/month          | Middle ground

Key Takeaway

  • Use r6g.xlarge if:
    • You have >10 partitions, high throughput, or care about latency.
    • You need stable performance (no burst credits).
  • Use t4g.xlarge only for:
    • Dev/test clusters with low traffic.
    • Very small retention/low partitions.

Would you like me to adjust recommendations based on your expected message volume or partition count?

Disk and Volume Considerations for High-Throughput Kafka on AWS EKS

For high-throughput Kafka (e.g., 100MB+/s), disk performance is critical—it’s often the bottleneck. Here’s how to optimize storage for maximum throughput and low latency.


1. Disk Type Recommendations

AWS EBS Options

Volume Type                    | Max IOPS | Max Throughput | Latency   | Best For
gp3 (General Purpose SSD)      | 16,000   | 1,000 MB/s     | Low       | Default choice (cost-effective)
io2/io1 (Provisioned IOPS SSD) | 64,000+  | 4,000 MB/s     | Ultra-low | High-throughput production
st1 (Throughput HDD)           | 500      | 500 MB/s       | Higher    | Cheap archival (not recommended for prod)
Local NVMe (Instance Store)    | 1M+      | 10,000+ MB/s   | Near-zero | Best performance (ephemeral, risky)

Recommendation:
Use gp3 (with provisioned IOPS) or io2 for production Kafka.
🚫 Avoid st1 (too slow for writes) and t4g (network-bound).


2. Key Configuration Settings

For gp3 (Best Balance of Cost/Performance)

storage:
  type: persistent-claim
  size: 1Ti  # Adjust based on retention needs
  class: gp3 # IOPS/throughput are provisioned on the StorageClass (see sketch below)
  deleteClaim: false
- Why? Kafka is write-heavy, so higher IOPS/throughput means lower latency. Note that Strimzi's storage block does not accept IOPS/throughput fields directly; set them on the gp3 StorageClass (defaults are 3000 IOPS / 125 MB/s, up to 16,000 IOPS / 1,000 MB/s).
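A sketch of such a StorageClass for the AWS EBS CSI driver (the name and the 6000 IOPS / 500 MB/s figures are assumptions to match the tuning above):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"        # gp3 default is 3000, max 16,000
  throughput: "500"   # MB/s; gp3 default is 125, max 1,000
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true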

For io2 (Highest Performance)

storage:
  type: persistent-claim
  size: 1Ti
  class: io2 # io2 StorageClass with provisioned IOPS (e.g. 16,000; can go up to 64,000)
  deleteClaim: false
- When? If you need >10K messages/sec or low-latency commits. As with gp3, the IOPS are provisioned on the StorageClass (io2 throughput scales with the provisioned IOPS rather than being set separately).


3. Kafka-Specific Disk Optimizations

A. JBOD vs. RAID-0

Approach                     | Pros                                    | Cons
JBOD (Just a Bunch of Disks) | Kafka balances partitions across disks  | No redundancy (rely on replication)
RAID-0 (Striping)            | Higher throughput                       | Riskier (single disk failure = data loss)

Recommendation:
  • Use JBOD (simpler, Kafka-native balancing).
  • If using RAID-0, ensure replication factor ≥ 3.

B. Filesystem & Mount Options

# XFS is best for Kafka (ext4 is okay but slower)
mkfs.xfs /dev/nvme1n1

# Mount with noatime (reduces disk writes)
mount -o noatime /dev/nvme1n1 /var/lib/kafka
- Why? Kafka doesn’t need file access times (noatime reduces I/O).


4. Capacity Planning

A. Disk Size Calculation

Required Disk Space = (Daily Data Ingest × Retention Days × Replication Factor) + 20% Buffer
Example:
- 100GB/day ingested × 7-day retention × 3 replicas = 2.1TB (+20% = 2.5TB total).
- Per broker: ~850GB (distributed across 3 brokers).

B. Throughput Requirements

Target Throughput | Recommended Disk Setup
<50 MB/s          | gp3 (3000 IOPS, 125 MB/s)
50-500 MB/s       | gp3 (6000+ IOPS, 500+ MB/s)
500+ MB/s         | io2 (16K+ IOPS, 1000 MB/s)

5. Monitoring & Alerts

Critical Disk Metrics to Watch:
- Disk I/O latency (>10ms is problematic)
- Disk queue length (sustained >1 indicates bottlenecks)
- Free disk space (Kafka crashes if disk fills)

Tools:
- AWS CloudWatch (EBS metrics)
- Kafka’s own metrics (kafka.log:type=LogFlushStats)


Summary: Best Practices for High Throughput

  1. Use gp3 (or io2 for extreme throughput).
  2. Provision extra IOPS/throughput (Kafka needs burstability).
  3. Use JBOD (not RAID) unless you need striping.
  4. Format with XFS + noatime.
  5. Monitor disk latency—it’s the #1 bottleneck.

Would you like a specific recommendation based on your expected message size/volume? (e.g., 1KB vs. 10KB messages?)

Optimized Kafka Storage Configuration for Short Retention (1hr) + One Long Retention (24hr) Topic

Given your retention profile (12 topics @ 1hr, 1 topic @ 24hr), we can optimize costs and performance by tiering storage and adjusting broker configurations.


1. Storage Tiering Strategy

A. For 1-Hour Retention Topics (High Throughput)

  • Data Characteristics:
    • Very low disk usage (data purged quickly)
    • Performance matters more than capacity
  • Recommendation:
    volumes:
      - id: 0  # Fast tier (for active segments)
        type: persistent-claim
        size: 200Gi  # Small (1hr retention needs minimal space)
        class: gp3   # StorageClass provisioned for e.g. 6000 IOPS / 500 MB/s (prioritize IOPS over capacity)
    

B. For 24-Hour Retention Topic

  • Data Characteristics:
    • 24x more data retained than the others
    • Can tolerate slightly higher latency
  • Recommendation:
    volumes:
      - id: 1  # Capacity tier
        type: persistent-claim
        size: 1Ti    # Adjust based on daily ingest
        class: gp3   # Baseline 3000 IOPS is enough; or st1 if throughput <200MB/s
    

2. Kafka Broker Configuration Adjustments

Segment Sizing (Optimized for Short Retention)

config:
  # Broker defaults, sized for the 1-hour topics:
  log.segment.bytes: 268435456  # 256MB (smaller segments = faster cleanup)
  log.segment.ms: 300000        # 5min (rotate frequently)

# For the 24-hour topic, override segment.bytes=1073741824 (1GB, fewer/larger
# segments) at the topic level rather than repeating the broker key; see the
# KafkaTopic sketch below.
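One way to express that per-topic override declaratively is Strimzi's KafkaTopic resource; a minimal sketch (the resource name, partition count, and strimzi.io/cluster label are assumptions matching the examples later in this document):

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: long-retention-topic
  labels:
    strimzi.io/cluster: my-kafka
spec:
  topicName: long_retention_topic  # actual Kafka topic name
  partitions: 6
  replicas: 3
  config:
    retention.ms: 86400000      # 24 hours
    segment.bytes: 1073741824   # 1GB segments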

Retention & Cleanup

config:
  log.retention.bytes: -1       # Disable size-based retention
  log.retention.ms: 3600000     # 1hr (default, override per-topic)
  log.cleanup.policy: delete    # Not compacted
  log.cleaner.threads: 2       # Minimal cleanup overhead

3. Monitoring Considerations

  1. Short-Retention Topics:
     • Alert if disk usage >50% (unexpected backlog)
     • Monitor LogFlushRate (should be high)

  2. 24-Hour Topic:
     • Watch LogSize growth vs retention time
     • Track BytesInPerSec to size storage properly

Method 1: JBOD with Multiple Volumes

Strimzi supports JBOD (Just a Bunch of Disks), where Kafka automatically distributes partitions across multiple volumes. You can influence topic placement as follows:

1. Configure JBOD in Strimzi CR

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-kafka
spec:
  kafka:
    replicas: 3
    storage:
      type: jbod
      volumes:
      - id: 0  # Fast storage (for 1-hour retention topics)
        type: persistent-claim
        size: 200Gi
        class: gp3
        deleteClaim: false
        # IOPS/throughput (e.g. 6000 IOPS / 500 MB/s) are provisioned on the gp3 StorageClass
      - id: 1  # Larger storage (for 24-hour topic)
        type: persistent-claim
        size: 1Ti
        class: gp3
        deleteClaim: false

2. Use Topic Partition Assignment Rules

Kafka automatically distributes partitions across all disks in JBOD. To bias the 24-hour topic toward volume 1, manually assign partitions to brokers/disks:

# Create topic with explicit partition assignments (6 partitions with RF 3 are
# implied by the assignment list; --partitions/--replication-factor cannot be
# combined with --replica-assignment)
kafka-topics.sh --create \
  --topic long_retention_topic \
  --config retention.ms=86400000 \
  --bootstrap-server my-kafka:9092 \
  --replica-assignment 0:1:2,0:1:2,0:1:2,0:1:2,0:1:2,0:1:2
(This forces all partitions to brokers 0,1,2, which may land on volume-1 over time.)


Method 2: Separate Clusters (Extreme Approach)

If strict isolation is required:
1. Deploy two Kafka clusters (one for short-retention, one for long-retention).
2. Use different storage classes for each.
3. Route topics to the appropriate cluster via client configuration.

(Overkill for most cases, but guarantees separation.)


Method 3: Log Directory Management (Advanced)

If you need precise control, manually configure Kafka log directories:
1. Mount separate volumes to different paths:

volumes:
  - name: data-fast
    persistentVolumeClaim:
      claimName: kafka-fast
  - name: data-slow
    persistentVolumeClaim:
      claimName: kafka-slow
volumeMounts:
  - mountPath: /var/lib/kafka/fast
    name: data-fast
  - mountPath: /var/lib/kafka/slow
    name: data-slow
2. Override log.dirs per broker:
config:
  log.dirs: "/var/lib/kafka/fast,/var/lib/kafka/slow"
3. Move partitions between log directories with kafka-reassign-partitions.sh (kafka-log-dirs.sh can only describe placement, not change it), using a reassignment JSON whose "log_dirs" entries point at the slow directory:
kafka-reassign-partitions.sh \
  --bootstrap-server my-kafka:9092 \
  --reassignment-json-file move-long-retention-topic.json \
  --execute


Best Practice for Your Case

  1. Use JBOD (Method 1) and let Kafka balance partitions.
  2. Monitor disk usage:
    kafka-log-dirs.sh --describe --bootstrap-server my-kafka:9092
    
  3. Adjust retention settings per topic:
    kafka-configs.sh --alter \
      --bootstrap-server my-kafka:9092 \
      --topic long_retention_topic \
      --add-config retention.ms=86400000
    

Strimzi Limitations

  • Strimzi does not natively support topic-to-volume mapping.
  • Workarounds involve manual partition management or JBOD tuning.

Would you like a full Strimzi Kafka CR YAML with these settings?

Optimized Kafka Configuration for r6g.large (Combined Broker/Controller Roles)

Since you're using r6g.large instances (2 vCPUs, 16GB RAM) with combined broker/controller roles (KRaft mode), here are the performance-optimized settings for moderate workloads:


1. Critical Thread Pool Adjustments

Parameter            | Recommended Value (2 vCPU) | Explanation
num.network.threads  | 3                          | Handles incoming requests (1.5x vCPUs).
num.io.threads       | 4                          | Disk I/O threads (2x vCPUs; max 8 for r6g.large).
num.replica.fetchers | 2                          | Replication threads (1 per vCPU).
num.partitions       | 6-12 per topic             | Start low (6) to avoid overloading 2 vCPUs.

Strimzi YAML Snippet:

config:
  num.network.threads: 3
  num.io.threads: 4
  num.replica.fetchers: 2


2. Memory Allocation (16GB Total)

  • Kafka Heap: 8G (50% of RAM)
    resources:
      requests:
        memory: "16Gi"
    jvmOptions:        # sibling of resources, not nested under it
      "-Xms": "8G"
      "-Xmx": "8G"
    
  • OS Cache: Leave 8GB for:
    • Page cache (critical for throughput)
    • OS/Kernel overhead

3. Disk & Retention Optimizations

For 1-Hour Retention Topics

config:
  log.segment.bytes: "268435456"  # 256MB segments (faster cleanup)
  log.segment.ms: "300000"        # Rotate every 5min
  log.retention.check.interval.ms: "300000"  # Check every 5min

For 24-Hour Retention Topic

# Override per-topic:
kafka-configs.sh --alter \
  --bootstrap-server YOUR_BROKER:9092 \
  --topic long_retention_topic \
  --add-config segment.bytes=1073741824,retention.ms=86400000

4. Network & Socket Tuning

config:
  socket.send.buffer.bytes: "102400"    # 100KB
  socket.receive.buffer.bytes: "102400" # 100KB
  socket.request.max.bytes: "104857600" # 100MB (max message size)

5. KRaft-Specific Tweaks

config:
  controller.quorum.election.timeout.ms: "10000"  # Raised from the 1000ms default for stability
  controller.quorum.fetch.timeout.ms: "2000"      # Default; adjust for AWS latency

6. Monitoring & Alerts

Key Metrics to Watch on 2 vCPUs:
- CPU Usage (>70% sustained → upgrade to r6g.xlarge)
- Disk I/O Latency (>10ms → increase num.io.threads or upgrade storage)
- Network Thread Idle % (<30% → increase num.network.threads)

Tools:

kafka-topics.sh --describe --bootstrap-server YOUR_BROKER:9092
kafka-configs.sh --describe --all --entity-type brokers --entity-name 0 --bootstrap-server YOUR_BROKER:9092


Here’s the optimized instance sizing for separated broker/controller roles in KRaft mode, using AWS Graviton2 (r6g instances) with your requirements (2 vCPUs + 32GB RAM):


1. Instance Recommendations

For Brokers (3 nodes)

  • Instance Type: r6g.2xlarge (8 vCPUs, 64GB RAM)
  • Why not r6g.xlarge (4vCPU/32GB)?
    • Kafka brokers need extra RAM for OS page cache (critical for performance).
    • 32GB is tight if you have many partitions or high throughput.
  • Alternative: m6g.2xlarge (8 vCPUs, 32GB RAM) if CPU-bound.

For Controllers (3 nodes for HA)

  • Instance Type: r6g.large (2 vCPUs, 16GB RAM)
  • Controllers are lightweight (only handle metadata, not data).
  • 16GB is overkill but ensures stability.

(Note: You suggested 2 vCPUs + 32GB, but AWS doesn’t offer this exact combo. Closest is r6g.xlarge [4vCPU/32GB].)


2. Strimzi Configuration

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.6.0
    replicas: 3  # Brokers
    resources:
      requests:
        cpu: "4000m"  # 4 vCPUs
        memory: "58Gi" # Leave 6GB for OS
    config:
      # Broker configs (optimized for 4 vCPUs)
      num.io.threads: 8
      num.network.threads: 6
    storage:
      type: jbod
      volumes:
        - id: 0
          size: 1Ti
          type: persistent-claim
          class: gp3  # provision IOPS/throughput (e.g. 6000 / 500 MB/s) on the StorageClass
  controller:
    replicas: 3  # Controllers (KRaft)
    resources:
      requests:
        cpu: "2000m"  # 2 vCPUs
        memory: "12Gi" # Lightweight

3. Cost Estimate (US East Pricing)

Component   | Instance    | vCPUs | RAM  | Monthly Cost (x3)
Brokers     | r6g.2xlarge | 8     | 64GB | $290.30 x 3 = $870.90
Controllers | r6g.large   | 2     | 16GB | $72.58 x 3 = $217.74
Total       |             |       |      | $1,088.64

(Compared to combined roles: ~20% cheaper due to smaller controllers.)


4. Key Rationale

  1. Brokers Need More CPU/RAM
  2. Kafka’s throughput scales with vCPUs (for I/O threads) and RAM (for page cache).
  3. 4 vCPUs prevent bottlenecks during replication.

  4. Controllers Are Lightweight

  5. KRaft controllers only manage metadata (no data storage).
  6. 2 vCPUs/16GB is overprovisioned (but ensures quorum stability).

  7. Why Not Your Suggested 2vCPU/32GB?

  8. AWS doesn’t offer this combo in r6g.
  9. 32GB instances (r6g.xlarge) have 4 vCPUs anyway.

5. Scaling Advice

  • If Brokers Are CPU-Bound: Upgrade to r6g.4xlarge (16 vCPUs).
  • If Low Throughput: Downgrade brokers to r6g.xlarge (4vCPU/32GB).

Want me to adjust for a specific throughput target (e.g., 50MB/s vs 500MB/s)?

Summary of Optimizations for r6g.large

  1. Thread Pools:
     • num.io.threads=4, num.network.threads=3 (balanced for 2 vCPUs).
  2. Memory:
     • 8GB JVM heap + 8GB OS cache.
  3. Disks:
     • Small segments (256MB) for 1hr topics, larger (1GB) for the 24hr topic.
  4. KRaft:
     • Conservative timeouts for controller stability.

Upgrade Trigger: If CPU usage >70% or disk latency >10ms, move to r6g.xlarge (4 vCPUs).

Want me to provide a full Kafka CR YAML for Strimzi with these settings?

Yes! To ensure only Kafka pods run on your dedicated worker nodes, you’ll need to:
1. Label the nodes (for targeting)
2. Taint the nodes (for repelling other workloads)
3. Configure Strimzi to tolerate the taint + use node affinity

Here’s how to do it:


1. Label and Taint the Worker Nodes

A. Label the Nodes (e.g., kafka=broker)

# Label each node (run for each Kafka worker)
kubectl label nodes <NODE_NAME> kafka=broker

B. Add a NoSchedule Taint

kubectl taint nodes <NODE_NAME> kafka=broker:NoSchedule
- Effect:
  - Only pods with a matching toleration will schedule here.
  - Other workloads (e.g., deployments) are automatically repelled.


2. Configure Strimzi Kafka CR for Tolerations + Affinity

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-kafka
spec:
  kafka:
    replicas: 3
    template:
      pod:
        # Tolerate the taint
        tolerations:
        - key: "kafka"
          operator: "Equal"
          value: "broker"
          effect: "NoSchedule"
        # Schedule only on labeled nodes
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: kafka
                  operator: In
                  values:
                  - broker

3. For Controller Pods (KRaft Mode)

If you want the controllers (when run as a separate pool or role) on the same nodes, apply the same tolerations and affinity to their pod template:

controller:
  template:
    pod:
      tolerations:
      - key: "kafka"
        operator: "Equal"
        value: "broker"
        effect: "NoSchedule"
      affinity:  # Optional: Co-locate controllers with brokers
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  strimzi.io/name: my-kafka-kafka
              topologyKey: kubernetes.io/hostname


4. Verify Scheduling

# Check node labels/taints
kubectl describe nodes <NODE_NAME> | grep -A 5 Labels
kubectl describe nodes <NODE_NAME> | grep Taints

# Check Kafka pod placement
kubectl get pods -n <NAMESPACE> -o wide | grep kafka
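To double-check that the taint is actually keeping other workloads off a dedicated node, you can also list everything scheduled on it (the node name is a placeholder):

# Should show only Kafka pods plus DaemonSets that tolerate all taints (CNI, kube-proxy, etc.)
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<NODE_NAME>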

Key Notes

  1. Why Taint + Affinity?
     • Taints: block non-Kafka workloads.
     • Affinity: force Kafka onto the labeled nodes only.

  2. If Using Separate Controller Nodes
     • Repeat the steps with controller-specific labels (e.g., kafka=controller).

  3. EKS Managed Node Groups
     • Add labels/taints during node group creation, for example:
    eksctl create nodegroup --taints="kafka=broker:NoSchedule" --labels="kafka=broker"
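Flag support for taints varies between eksctl versions, so the config-file form is the more reliable route; a sketch of the relevant ClusterConfig fields (cluster name, node group name, and instance type are assumptions):

# cluster.yaml (excerpt)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-eks-cluster
  region: us-east-1
managedNodeGroups:
  - name: kafka-workers
    instanceType: r6g.xlarge
    desiredCapacity: 3
    labels:
      kafka: broker
    taints:
      - key: kafka
        value: broker
        effect: NoSchedule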
    

Need adjustments for your EKS setup? For example, if you’re using Karpenter or Fargate, the approach differs slightly.