
• Broker Tuning
• Consumer Tuning
• Cruise Control
• Autorebalancing on Scaling

Example broker configuration for durable offsets and transactions:

transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
offsets.topic.num.partitions=50
offsets.topic.replication.factor=3
offsets.commit.required.acks=-1
  • transaction.state.log.replication.factor=3
    Sets the replication factor for the Kafka transaction state log topic to 3, ensuring high availability and fault tolerance for transactional data.

  • transaction.state.log.min.isr=2
    Specifies the minimum number of in-sync replicas required for the transaction state log topic. A value of 2 means at least two replicas must acknowledge a write for it to be considered successful, improving reliability.

  • offsets.topic.num.partitions=50
    Configures the number of partitions for the Kafka offsets topic, which stores consumer group offsets. More partitions can improve scalability and parallelism for large numbers of consumer groups.

  • offsets.topic.replication.factor=3
    Sets the replication factor for the offsets topic to 3, ensuring that consumer offset data is highly available and protected against broker failures.

  • offsets.commit.required.acks=-1
    Requires all in-sync replicas to acknowledge offset commits before they are considered successful. This increases durability for committed offsets, reducing the risk of data loss.
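In Strimzi, these broker settings typically go under spec.kafka.config in the Kafka custom resource. A minimal sketch (cluster name and surrounding fields are assumptions):

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    config:
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      offsets.topic.num.partitions: 50
      offsets.topic.replication.factor: 3
      offsets.commit.required.acks: -1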

Configuring Cruise Control for Partition Rebalancing in Strimzi Kafka

Since you're using Strimzi with Cruise Control already deployed, here's how to configure it for partition rebalancing:

1. Verify Cruise Control Deployment

First, ensure Cruise Control is properly deployed with your Strimzi Kafka cluster. Your Strimzi Kafka custom resource should include Cruise Control configuration like this:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  # ... other configurations ...
  cruiseControl:
    config:
      default.goals: com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal
      cpu.balance.threshold: 1.1
      replica.count.balance.threshold: 1.1
      disk.balance.threshold: 1.1
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
      limits:
        cpu: 500m
        memory: 1024Mi

2. Triggering Rebalance Operations

There are several ways to trigger rebalancing:

Option A: Using Cruise Control UI

  1. Access the Cruise Control UI (which you're setting up)
  2. Navigate to the "Proposals" section
  3. Review the proposed changes
  4. Click "Apply Proposal" to execute the rebalance

Option B: Using REST API

You can trigger rebalance via Cruise Control's REST API:

# Dry run (only shows what would be done)
curl -X POST "http://localhost:9090/kafkacruisecontrol/rebalance?dryrun=true"

# Execute rebalance
curl -X POST "http://<cruise-control-service>:9090/kafkacruisecontrol/rebalance"

Option C: Using the KafkaRebalance Custom Resource (Recommended)

Strimzi provides a KafkaRebalance custom resource for managing rebalances:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: my-rebalance
  labels:
    strimzi.io/cluster: my-cluster
spec:
  goals:
    - RackAwareDistributionGoal
    - ReplicaCapacityGoal
    - DiskCapacityGoal
  # Optional: Add these for more control
  # skipHardGoalCheck: false
  # replicationThrottle: 1048576

Then interact with it:

# Create the resource
kubectl apply -f kafka-rebalance.yaml

# Check status
kubectl get kafkarebalance my-rebalance -o yaml

# Approve the proposal (after reviewing)
kubectl annotate kafkarebalance my-rebalance strimzi.io/rebalance=approve

# Refresh the proposal
kubectl annotate kafkarebalance my-rebalance strimzi.io/rebalance=refresh

# Stop the rebalance
kubectl annotate kafkarebalance my-rebalance strimzi.io/rebalance=stop
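If you script this workflow, you can block until the operator has generated a proposal before approving it; a small sketch using the resource name from above (the timeout is an arbitrary choice):

# Wait until the rebalance proposal is ready for review
kubectl wait kafkarebalance/my-rebalance --for=condition=ProposalReady --timeout=300s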

3. Common Rebalance Goals

You can customize the goals Cruise Control uses for rebalancing:

spec:
  goals:
    - RackAwareDistributionGoal      # Distribute replicas across racks
    - ReplicaCapacityGoal           # Balance number of replicas per broker
    - DiskCapacityGoal              # Balance disk usage
    - NetworkInboundCapacityGoal    # Balance inbound network traffic
    - NetworkOutboundCapacityGoal   # Balance outbound network traffic
    - CpuCapacityGoal               # Balance CPU usage
    - LeaderReplicaDistributionGoal # Balance leader replicas
    - TopicReplicaDistributionGoal  # Even distribution of replicas per topic

4. Monitoring Rebalance Progress

Monitor rebalance operations through:
  • Cruise Control UI
  • KafkaRebalance resource status
  • Cruise Control logs (kubectl logs <cruise-control-pod>)

5. Important Considerations

  1. Rebalance Impact: Rebalancing can be resource-intensive. Consider doing it during low-traffic periods.

  2. Throttling: Configure replication throttling to minimize impact:

    spec:
      replicationThrottle: 1048576  # 1 MB/s throttle
    

  3. Approval Mode: By default, Strimzi requires manual approval of rebalance proposals. You can enable automatic approval by annotating the KafkaRebalance resource:

    metadata:
      annotations:
        strimzi.io/rebalance-auto-approval: "true"
    

  4. Regular Rebalancing: Consider setting up periodic rebalancing to maintain cluster health.

Would you like me to provide more specific details about any of these aspects or help with troubleshooting any particular issue?

Kafka Performance Optimization for Large-Scale Systems with Strimzi

When running Kafka at scale with Strimzi (3 brokers/controllers), here are key optimizations to consider:

Broker Configuration

  1. JVM Settings (see the jvmOptions sketch after this list):
     • Set appropriate heap sizes (typically 4-8GB for brokers)
     • Use G1 garbage collector: -XX:+UseG1GC
     • Configure GC logging for monitoring

  2. Resource Allocation:
     • Ensure adequate CPU (minimum 4 cores per broker)
     • Fast local storage (preferably NVMe SSDs)
     • Network bandwidth (10Gbps+ recommended)

  3. Kafka Broker Settings:

    config:
      num.io.threads: 8-16 (or 3x number of disks)
      num.network.threads: 12-16
      socket.send.buffer.bytes: 102400
      socket.receive.buffer.bytes: 102400
      socket.request.max.bytes: 104857600
      log.flush.interval.messages: 10000
      log.flush.interval.ms: 1000
      num.replica.fetchers: 4-8
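
For the JVM settings in item 1, Strimzi exposes jvmOptions on the Kafka resource; a minimal sketch, with the heap size and GC flags as assumptions to tune for your nodes:

spec:
  kafka:
    jvmOptions:
      "-Xms": "6g"
      "-Xmx": "6g"
      "-XX":
        "UseG1GC": "true"
        "MaxGCPauseMillis": "20"
      gcLoggingEnabled: true  # emit GC logs for monitoring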
    

Topic Configuration

  1. Partitioning:
     • Aim for 100-200MB per partition
     • Distribute partitions evenly across brokers
     • Monitor partition skew

  2. Replication:
     • For critical topics, keep replication factor of 3 (see the topic-creation example after this list)
     • Consider min.insync.replicas: 2 for durability/performance balance

  3. Retention:
     • For high-throughput topics, consider tiered storage
     • Monitor disk usage despite short retention
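As a concrete illustration of the partitioning and replication settings above, a topic could be created like this (topic name, partition count, and bootstrap address are placeholders to adapt to your sizing):

kafka-topics.sh --create \
  --topic orders \
  --partitions 12 \
  --replication-factor 3 \
  --config min.insync.replicas=2 \
  --bootstrap-server my-cluster-kafka-bootstrap:9092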

Strimzi-Specific Optimizations

  1. Resource Requests/Limits:

    resources:
      requests:
        memory: "8Gi"
        cpu: "2000m"
      limits:
        memory: "8Gi"
        cpu: "4000m"
    

  2. Storage Configuration:
     • Use type: persistent-claim with fast storage class
     • Consider deleteClaim: false for data persistence

  3. Kubernetes Settings (see the anti-affinity sketch after this list):
     • Configure pod anti-affinity for broker distribution
     • Set appropriate pod disruption budgets
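A sketch of the anti-affinity piece, using Strimzi's pod template (the strimzi.io/name label value assumes a cluster named my-cluster):

spec:
  kafka:
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    strimzi.io/name: my-cluster-kafka
                topologyKey: kubernetes.io/hostname  # at most one broker per node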

Monitoring and Scaling

  1. Metrics Collection:
     • Enable Prometheus metrics in Strimzi
     • Monitor key metrics: under-replicated partitions, request latencies, network throughput

  2. Horizontal Scaling:
     • Scale brokers based on CPU/network/disk usage
     • Consider adding dedicated controller nodes at larger scales

  3. Tuning for Workload (starting values sketched after this list):
     • Producer-heavy: optimize batch.size and linger.ms
     • Consumer-heavy: optimize fetch.max.bytes and max.partition.fetch.bytes
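A few starting points for those client settings, in plain properties form; the numbers are assumptions to benchmark against your own workload, not prescriptions:

# Producer (throughput-oriented)
batch.size=131072
linger.ms=10
compression.type=lz4
acks=all

# Consumer (throughput-oriented)
fetch.max.bytes=104857600
max.partition.fetch.bytes=2097152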

Additional Considerations

  1. Network:
     • Ensure low latency between brokers
     • Consider dedicated network interfaces for replication traffic

  2. ZooKeeper:
     • Ensure the ZooKeeper ensemble is properly sized (3-5 nodes)
     • Monitor ZK performance as it can become a bottleneck

  3. Security (a TLS listener sketch follows this list):
     • Enable TLS for inter-broker communication
     • Consider SASL for authentication if needed
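A minimal sketch of a TLS-encrypted internal listener in the Strimzi Kafka resource (the listener name and port are conventional choices; swap the authentication type for SCRAM if you prefer SASL):

spec:
  kafka:
    listeners:
      - name: tls
        port: 9093
        type: internal
        tls: true
        authentication:
          type: tls   # mutual TLS; use scram-sha-512 here for SASL instead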

Would you like me to elaborate on any specific area of optimization for your use case?

Kafka Performance Optimization with KRaft Mode in Strimzi

When using KRaft mode (Kafka Raft metadata mode) instead of ZooKeeper, you can achieve better performance and simpler operations. Here are key optimizations for your 3-broker KRaft cluster:

KRaft-Specific Optimizations

  1. Controller Configuration:

    config:
      node.id: # unique for each broker
      process.roles: "broker,controller" # combined mode
      controller.quorum.voters: "1@broker-1:9093,2@broker-2:9093,3@broker-3:9093"
      controller.listener.names: CONTROLLER
    

  2. Resource Savings:
     • No need for a separate ZooKeeper cluster (reduces resource overhead)
     • More efficient metadata handling (fewer network hops)

Performance Tuning for KRaft

  1. Metadata Performance:

    config:
      metadata.log.max.record.bytes.between.snapshots: 20971520 # 20MB
      metadata.log.max.snapshot.interval.ms: 3600000 # 1 hour
    

  2. Quorum Configuration:
     • With 3 controllers, set controller.quorum.election.timeout.ms: 10000
     • Adjust controller.quorum.fetch.timeout.ms: 2000 for your network latency

  3. Log Retention:

    config:
      log.retention.check.interval.ms: 300000 # 5 minutes (more frequent with short retention)
    

Broker Configuration Enhancements

  1. Combined Broker/Controller Settings:

    resources:
      requests:
        memory: "10Gi" # More memory needed for combined roles
        cpu: "3000m"
    

  2. Network Optimization:

    config:
      controller.socket.timeout.ms: 30000
      controller.message.queue.size: 32
    

Monitoring Considerations

  1. KRaft-Specific Metrics:
     • Monitor kafka.controller:type=KafkaController metrics
     • Track ActiveControllerCount and LastCommittedRecordOffset

  2. Metadata Performance:
     • Watch MetadataLogManager metrics for snapshot timing
     • Monitor leader changes with QuorumController metrics

Example Strimzi Kafka Resource (KRaft Mode)

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.6.0 # Use latest KRaft-supported version
    replicas: 3
    config:
      process.roles: "broker,controller"
      controller.quorum.voters: "1@my-cluster-kafka-0.my-cluster-kafka-brokers:9093,2@my-cluster-kafka-1.my-cluster-kafka-brokers:9093,3@my-cluster-kafka-2.my-cluster-kafka-brokers:9093"
      controller.listener.names: CONTROLLER
      inter.broker.listener.name: INTERNAL
      num.partitions: 12 # Default partitions for new topics
      default.replication.factor: 3
      min.insync.replicas: 2
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 1000Gi
          class: fast-ssd
          deleteClaim: false

Scaling Considerations

  1. Future Expansion:
     • To scale beyond 3 nodes, consider separating controller and broker roles
     • For >10 brokers: dedicated controller nodes (5-7 recommended)

  2. Disaster Recovery:
     • Configure metadata.log.dir on a separate persistent volume
     • Back up metadata snapshots regularly

Would you like me to focus on any particular aspect of KRaft performance optimization for your specific workload patterns?

Here's a cost estimate for running your Kafka cluster on AWS EKS with Graviton2 (ARM-based) instances:

Instance Selection & Configuration

For a 3-broker KRaft cluster, I recommend:
  • Instance Type: r6g.xlarge (Graviton2, better suited for Kafka than t4g)
    • 4 vCPUs
    • 32GB RAM
    • Up to 10Gbps network
    • $0.2016/hour (~$146/month) per instance

Why not t4g:
  • t4g are burstable instances (not ideal for sustained Kafka workloads)
  • Limited network bandwidth (up to 5Gbps only on the largest sizes)
  • Smaller memory options

Monthly Cost Breakdown

  1. EC2 Costs (3 nodes):
     • r6g.xlarge: $0.2016/hour × 24 × 30 = $145.15/month per instance
     • 3 instances: $435.46/month

  2. EBS Storage:
     • Assuming 1TB gp3 volumes per broker (recommended for Kafka):
       • $0.08/GB-month × 1000GB = $80/month per volume
       • 3 volumes: $240/month
     • (Optional) For better performance, use io1/io2 with 3000 IOPS:
       • ~$150/month per 1TB volume

  3. EKS Costs:
     • $0.10/hour for the EKS cluster: $72/month
     • Worker nodes: already included in the EC2 pricing above

  4. Networking:
     • Data transfer costs vary based on usage
     • Inter-AZ traffic: $0.01/GB (if brokers are in different AZs)

Total Estimated Monthly Cost

Component        | Cost (USD)
3x r6g.xlarge    | $435.46
3x 1TB EBS (gp3) | $240.00
EKS Management   | $72.00
Total            | $747.46

Cost Optimization Options

  1. Use Spot Instances:
     • Can save 50-70% (but not recommended for production Kafka brokers)

  2. Reserved Instances:
     • 1-year term: ~40% savings (~$450/month total)
     • 3-year term: ~60% savings (~$300/month total)

  3. Storage Options:
     • Use smaller volumes if your retention needs allow
     • Consider st1 for older data (cheaper but lower performance)

  4. Instance Right-Sizing:
     • Start with r6g.large (2vCPU, 16GB) for development (~$220/month total)
     • Monitor and scale up as needed

Would you like me to provide a more detailed breakdown based on your specific throughput requirements or retention needs? Also, are you planning to run other workloads on these worker nodes or dedicated solely to Kafka?


Why r6g.xlarge over t4g for Kafka?

  1. Memory-Centric Workloads
     • Kafka is memory-intensive (for page caching, JVM heap, OS buffers).
     • r6g.xlarge gives 32GB RAM vs. t4g.xlarge's 16GB max (and t4g.2xlarge is needed for comparable RAM but costs more).
     • More RAM = better disk caching = fewer I/O bottlenecks.

  2. No Burst Credits (Kafka Needs Sustained Performance)
     • t4g uses burst credits (CPU throttles when credits run out).
     • Kafka needs consistent CPU/network for producers, consumers, and replication.
     • r6g offers dedicated vCPUs without throttling.

  3. Network Bandwidth
     • r6g.xlarge: Up to 10Gbps
     • t4g.xlarge: Up to 5Gbps (and only if credits are available).
     • Kafka replication and high-throughput producers/consumers benefit from higher bandwidth.

  4. Graviton2 Performance
     • Both r6g and t4g use Graviton2 (ARM), but r6g is optimized for memory-heavy workloads.
     • Kafka’s performance scales with RAM (for log segment caching) and network.

When Might t4g Work?

  • Dev/Test Environments: If your workload is low-throughput (e.g., <10MB/s).
  • Cost-Sensitive Non-Prod: If you can tolerate occasional throttling.
  • Small Retention/Low Partitions: Fewer partitions → less memory pressure.

But for production-grade Kafka, r6g is safer.


Alternatives to Consider

  1. For Higher Throughput (50MB+/s):
     • r6g.2xlarge (8 vCPUs, 64GB RAM) if you have many partitions or high traffic.

  2. For Cost Savings (If RAM Isn’t Critical):
     • m6g.xlarge (4 vCPUs, 16GB RAM) — balanced CPU/memory, cheaper than r6g.

  3. For Network-Heavy Workloads:
     • c6g.xlarge (4 vCPUs, 8GB RAM) — if CPU is the bottleneck (rare for Kafka).

Revised Cost Estimate (If You Really Want t4g)

Instance   | vCPUs | RAM  | Network | Cost/Month (3 Nodes) | Notes
t4g.xlarge | 4     | 16GB | 5Gbps   | ~$180/month          | Risky for production
r6g.xlarge | 4     | 32GB | 10Gbps  | ~$435/month          | Recommended baseline
m6g.xlarge | 4     | 16GB | 10Gbps  | ~$300/month          | Middle ground

Key Takeaway

  • Use r6g.xlarge if:
    • You have >10 partitions, high throughput, or care about latency.
    • You need stable performance (no burst credits).
  • Use t4g.xlarge only for:
    • Dev/test clusters with low traffic.
    • Very small retention/low partitions.

Would you like me to adjust recommendations based on your expected message volume or partition count?

Disk and Volume Considerations for High-Throughput Kafka on AWS EKS

For high-throughput Kafka (e.g., 100MB+/s), disk performance is critical—it’s often the bottleneck. Here’s how to optimize storage for maximum throughput and low latency.


1. Disk Type Recommendations

AWS EBS Options

Volume Type                    | Max IOPS | Max Throughput | Latency   | Best For
gp3 (General Purpose SSD)      | 16,000   | 1,000 MB/s     | Low       | Default choice (cost-effective)
io2/io1 (Provisioned IOPS SSD) | 64,000+  | 4,000 MB/s     | Ultra-low | High-throughput production
st1 (Throughput HDD)           | 500      | 500 MB/s       | Higher    | Cheap archival (not recommended for prod)
Local NVMe (Instance Store)    | 1M+      | 10,000+ MB/s   | Near-zero | Best performance (ephemeral, risky)

Recommendation:
Use gp3 (with provisioned IOPS) or io2 for production Kafka.
🚫 Avoid st1 (too slow for writes) and t4g (network-bound).


2. Key Configuration Settings

For gp3 (Best Balance of Cost/Performance)

storage:
  type: persistent-claim
  size: 1Ti  # Adjust based on retention needs
  class: gp3 # IOPS/throughput are provisioned on the StorageClass (see sketch below)
  deleteClaim: false
- Why? Kafka is write-heavy, so higher IOPS/throughput means lower latency. Note that Strimzi's storage block does not accept IOPS/throughput fields directly; set them on the gp3 StorageClass (defaults are 3000 IOPS / 125 MB/s, up to 16,000 IOPS / 1,000 MB/s).
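A sketch of such a StorageClass for the AWS EBS CSI driver (the name and the 6000 IOPS / 500 MB/s figures are assumptions to match the tuning above):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"        # gp3 default is 3000, max 16,000
  throughput: "500"   # MB/s; gp3 default is 125, max 1,000
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true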

For io2 (Highest Performance)

storage:
  type: persistent-claim
  size: 1Ti
  class: io2 # io2 StorageClass with provisioned IOPS (e.g. 16,000; can go up to 64,000)
  deleteClaim: false
- When? If you need >10K messages/sec or low-latency commits. As with gp3, the IOPS are provisioned on the StorageClass (io2 throughput scales with the provisioned IOPS rather than being set separately).


3. Kafka-Specific Disk Optimizations

A. JBOD vs. RAID-0

Approach                     | Pros                                    | Cons
JBOD (Just a Bunch of Disks) | Kafka balances partitions across disks  | No redundancy (rely on replication)
RAID-0 (Striping)            | Higher throughput                       | Riskier (single disk failure = data loss)

Recommendation:
  • Use JBOD (simpler, Kafka-native balancing).
  • If using RAID-0, ensure replication factor ≥ 3.

B. Filesystem & Mount Options

# XFS is best for Kafka (ext4 is okay but slower)
mkfs.xfs /dev/nvme1n1

# Mount with noatime (reduces disk writes)
mount -o noatime /dev/nvme1n1 /var/lib/kafka
- Why? Kafka doesn’t need file access times (noatime reduces I/O).


4. Capacity Planning

A. Disk Size Calculation

Required Disk Space = (Daily Data Ingest × Retention Days × Replication Factor) + 20% Buffer
Example:
- 100GB/day ingested × 7-day retention × 3 replicas = 2.1TB (+20% = 2.5TB total).
- Per broker: ~850GB (distributed across 3 brokers).

B. Throughput Requirements

Target Throughput | Recommended Disk Setup
<50 MB/s          | gp3 (3000 IOPS, 125 MB/s)
50-500 MB/s       | gp3 (6000+ IOPS, 500+ MB/s)
500+ MB/s         | io2 (16K+ IOPS, 1000 MB/s)

5. Monitoring & Alerts

Critical Disk Metrics to Watch:
- Disk I/O latency (>10ms is problematic)
- Disk queue length (sustained >1 indicates bottlenecks)
- Free disk space (Kafka crashes if disk fills)

Tools:
- AWS CloudWatch (EBS metrics)
- Kafka’s own metrics (kafka.log:type=LogFlushStats)


Summary: Best Practices for High Throughput

  1. Use gp3 (or io2 for extreme throughput).
  2. Provision extra IOPS/throughput (Kafka needs burstability).
  3. Use JBOD (not RAID) unless you need striping.
  4. Format with XFS + noatime.
  5. Monitor disk latency—it’s the #1 bottleneck.

Would you like a specific recommendation based on your expected message size/volume? (e.g., 1KB vs. 10KB messages?)

Optimized Kafka Storage Configuration for Short Retention (1hr) + One Long Retention (24hr) Topic

Given your retention profile (12 topics @ 1hr, 1 topic @ 24hr), we can optimize costs and performance by tiering storage and adjusting broker configurations.


1. Storage Tiering Strategy

A. For 1-Hour Retention Topics (High Throughput)

  • Data Characteristics:
    • Very low disk usage (data purged quickly)
    • Performance matters more than capacity
  • Recommendation:
    volumes:
      - id: 0  # Fast tier (for active segments)
        type: persistent-claim
        size: 200Gi  # Small (1hr retention needs minimal space)
        class: gp3   # StorageClass provisioned for e.g. 6000 IOPS / 500 MB/s (prioritize IOPS over capacity)
    

B. For 24-Hour Retention Topic

  • Data Characteristics:
    • 24x more data retained than the others
    • Can tolerate slightly higher latency
  • Recommendation:
    volumes:
      - id: 1  # Capacity tier
        type: persistent-claim
        size: 1Ti    # Adjust based on daily ingest
        class: gp3   # Baseline 3000 IOPS is enough; or st1 if throughput <200MB/s
    

2. Kafka Broker Configuration Adjustments

Segment Sizing (Optimized for Short Retention)

config:
  # Broker defaults, sized for the 1-hour topics:
  log.segment.bytes: 268435456  # 256MB (smaller segments = faster cleanup)
  log.segment.ms: 300000        # 5min (rotate frequently)

# For the 24-hour topic, override segment.bytes=1073741824 (1GB, fewer/larger
# segments) at the topic level rather than repeating the broker key; see the
# KafkaTopic sketch below.
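One way to express that per-topic override declaratively is Strimzi's KafkaTopic resource; a minimal sketch (the resource name, partition count, and strimzi.io/cluster label are assumptions matching the examples later in this document):

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: long-retention-topic
  labels:
    strimzi.io/cluster: my-kafka
spec:
  topicName: long_retention_topic  # actual Kafka topic name
  partitions: 6
  replicas: 3
  config:
    retention.ms: 86400000      # 24 hours
    segment.bytes: 1073741824   # 1GB segments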

Retention & Cleanup

config:
  log.retention.bytes: -1       # Disable size-based retention
  log.retention.ms: 3600000     # 1hr (default, override per-topic)
  log.cleanup.policy: delete    # Not compacted
  log.cleaner.threads: 2       # Minimal cleanup overhead

3. Monitoring Considerations

  1. Short-Retention Topics:
     • Alert if disk usage >50% (unexpected backlog)
     • Monitor LogFlushRate (should be high)

  2. 24-Hour Topic:
     • Watch LogSize growth vs retention time
     • Track BytesInPerSec to size storage properly

Method 1: JBOD with Multiple Volumes

Strimzi supports JBOD (Just a Bunch of Disks), where Kafka automatically distributes partitions across multiple volumes. You can influence topic placement as follows:

1. Configure JBOD in Strimzi CR

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-kafka
spec:
  kafka:
    replicas: 3
    storage:
      type: jbod
      volumes:
      - id: 0  # Fast storage (for 1-hour retention topics)
        type: persistent-claim
        size: 200Gi
        class: gp3
        deleteClaim: false
        # IOPS/throughput (e.g. 6000 IOPS / 500 MB/s) are provisioned on the gp3 StorageClass
      - id: 1  # Larger storage (for 24-hour topic)
        type: persistent-claim
        size: 1Ti
        class: gp3
        deleteClaim: false

2. Use Topic Partition Assignment Rules

Kafka automatically distributes partitions across all disks in JBOD. To bias the 24-hour topic toward volume 1, manually assign partitions to brokers/disks:

# Create topic with explicit partition assignments (6 partitions with RF 3 are
# implied by the assignment list; --partitions/--replication-factor cannot be
# combined with --replica-assignment)
kafka-topics.sh --create \
  --topic long_retention_topic \
  --config retention.ms=86400000 \
  --bootstrap-server my-kafka:9092 \
  --replica-assignment 0:1:2,0:1:2,0:1:2,0:1:2,0:1:2,0:1:2
(This forces all partitions to brokers 0,1,2, which may land on volume-1 over time.)


Method 2: Separate Clusters (Extreme Approach)

If strict isolation is required:
1. Deploy two Kafka clusters (one for short-retention, one for long-retention).
2. Use different storage classes for each.
3. Route topics to the appropriate cluster via client configuration.

(Overkill for most cases, but guarantees separation.)


Method 3: Log Directory Management (Advanced)

If you need precise control, manually configure Kafka log directories:
1. Mount separate volumes to different paths:

volumes:
  - name: data-fast
    persistentVolumeClaim:
      claimName: kafka-fast
  - name: data-slow
    persistentVolumeClaim:
      claimName: kafka-slow
volumeMounts:
  - mountPath: /var/lib/kafka/fast
    name: data-fast
  - mountPath: /var/lib/kafka/slow
    name: data-slow
2. Override log.dirs per broker:
config:
  log.dirs: "/var/lib/kafka/fast,/var/lib/kafka/slow"
3. Move partitions between log directories with kafka-reassign-partitions.sh (kafka-log-dirs.sh can only describe placement, not change it), using a reassignment JSON whose "log_dirs" entries point at the slow directory:
kafka-reassign-partitions.sh \
  --bootstrap-server my-kafka:9092 \
  --reassignment-json-file move-long-retention-topic.json \
  --execute


Best Practice for Your Case

  1. Use JBOD (Method 1) and let Kafka balance partitions.
  2. Monitor disk usage:
    kafka-log-dirs.sh --describe --bootstrap-server my-kafka:9092
    
  3. Adjust retention settings per topic:
    kafka-configs.sh --alter \
      --bootstrap-server my-kafka:9092 \
      --topic long_retention_topic \
      --add-config retention.ms=86400000
    

Strimzi Limitations

  • Strimzi does not natively support topic-to-volume mapping.
  • Workarounds involve manual partition management or JBOD tuning.

Would you like a full Strimzi Kafka CR YAML with these settings?

Optimized Kafka Configuration for r6g.large (Combined Broker/Controller Roles)

Since you're using r6g.large instances (2 vCPUs, 16GB RAM) with combined broker/controller roles (KRaft mode), here are the performance-optimized settings for moderate workloads:


1. Critical Thread Pool Adjustments

Parameter            | Recommended Value (2 vCPU) | Explanation
num.network.threads  | 3                          | Handles incoming requests (1.5x vCPUs).
num.io.threads       | 4                          | Disk I/O threads (2x vCPUs; max 8 for r6g.large).
num.replica.fetchers | 2                          | Replication threads (1 per vCPU).
num.partitions       | 6-12 per topic             | Start low (6) to avoid overloading 2 vCPUs.

Strimzi YAML Snippet:

config:
  num.network.threads: 3
  num.io.threads: 4
  num.replica.fetchers: 2


2. Memory Allocation (16GB Total)

  • Kafka Heap: 8G (50% of RAM)
    resources:
      requests:
        memory: "16Gi"
    jvmOptions:        # sibling of resources, not nested under it
      "-Xms": "8G"
      "-Xmx": "8G"
    
  • OS Cache: Leave 8GB for:
    • Page cache (critical for throughput)
    • OS/Kernel overhead

3. Disk & Retention Optimizations

For 1-Hour Retention Topics

config:
  log.segment.bytes: "268435456"  # 256MB segments (faster cleanup)
  log.segment.ms: "300000"        # Rotate every 5min
  log.retention.check.interval.ms: "300000"  # Check every 5min

For 24-Hour Retention Topic

# Override per-topic:
kafka-configs.sh --alter \
  --bootstrap-server YOUR_BROKER:9092 \
  --topic long_retention_topic \
  --add-config segment.bytes=1073741824,retention.ms=86400000

4. Network & Socket Tuning

config:
  socket.send.buffer.bytes: "102400"    # 100KB
  socket.receive.buffer.bytes: "102400" # 100KB
  socket.request.max.bytes: "104857600" # 100MB (max message size)

5. KRaft-Specific Tweaks

config:
  controller.quorum.election.timeout.ms: "10000"  # Raised from the 1000ms default for stability
  controller.quorum.fetch.timeout.ms: "2000"      # Default; adjust for AWS latency

6. Monitoring & Alerts

Key Metrics to Watch on 2 vCPUs:
- CPU Usage (>70% sustained → upgrade to r6g.xlarge)
- Disk I/O Latency (>10ms → increase num.io.threads or upgrade storage)
- Network Thread Idle % (<30% → increase num.network.threads)

Tools:

kafka-topics.sh --describe --bootstrap-server YOUR_BROKER:9092
kafka-configs.sh --describe --all --entity-type brokers --entity-name 0 --bootstrap-server YOUR_BROKER:9092


Here’s the optimized instance sizing for separated broker/controller roles in KRaft mode, using AWS Graviton2 (r6g instances) with your requirements (2 vCPUs + 32GB RAM):


1. Instance Recommendations

For Brokers (3 nodes)

  • Instance Type: r6g.2xlarge (8 vCPUs, 64GB RAM)
  • Why not r6g.xlarge (4vCPU/32GB)?
    • Kafka brokers need extra RAM for OS page cache (critical for performance).
    • 32GB is tight if you have many partitions or high throughput.
  • Alternative: m6g.2xlarge (8 vCPUs, 32GB RAM) if CPU-bound.

For Controllers (3 nodes for HA)

  • Instance Type: r6g.large (2 vCPUs, 16GB RAM)
  • Controllers are lightweight (only handle metadata, not data).
  • 16GB is overkill but ensures stability.

(Note: You suggested 2 vCPUs + 32GB, but AWS doesn’t offer this exact combo. Closest is r6g.xlarge [4vCPU/32GB].)


2. Strimzi Configuration

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.6.0
    replicas: 3  # Brokers
    resources:
      requests:
        cpu: "4000m"  # 4 vCPUs
        memory: "58Gi" # Leave 6GB for OS
    config:
      # Broker configs (optimized for 4 vCPUs)
      num.io.threads: 8
      num.network.threads: 6
    storage:
      type: jbod
      volumes:
        - id: 0
          size: 1Ti
          type: persistent-claim
          class: gp3  # provision IOPS/throughput (e.g. 6000 / 500 MB/s) on the StorageClass
  controller:
    replicas: 3  # Controllers (KRaft)
    resources:
      requests:
        cpu: "2000m"  # 2 vCPUs
        memory: "12Gi" # Lightweight

3. Cost Estimate (US East Pricing)

Component   | Instance    | vCPUs | RAM  | Monthly Cost (x3)
Brokers     | r6g.2xlarge | 8     | 64GB | $290.30 x 3 = $870.90
Controllers | r6g.large   | 2     | 16GB | $72.58 x 3 = $217.74
Total       |             |       |      | $1,088.64

(Compared to combined roles: ~20% cheaper due to smaller controllers.)


4. Key Rationale

  1. Brokers Need More CPU/RAM
  2. Kafka’s throughput scales with vCPUs (for I/O threads) and RAM (for page cache).
  3. 4 vCPUs prevent bottlenecks during replication.

  4. Controllers Are Lightweight

  5. KRaft controllers only manage metadata (no data storage).
  6. 2 vCPUs/16GB is overprovisioned (but ensures quorum stability).

  7. Why Not Your Suggested 2vCPU/32GB?

  8. AWS doesn’t offer this combo in r6g.
  9. 32GB instances (r6g.xlarge) have 4 vCPUs anyway.

5. Scaling Advice

  • If Brokers Are CPU-Bound: Upgrade to r6g.4xlarge (16 vCPUs).
  • If Low Throughput: Downgrade brokers to r6g.xlarge (4vCPU/32GB).

Want me to adjust for a specific throughput target (e.g., 50MB/s vs 500MB/s)?

Summary of Optimizations for r6g.large

  1. Thread Pools:
     • num.io.threads=4, num.network.threads=3 (balanced for 2 vCPUs).
  2. Memory:
     • 8GB JVM heap + 8GB OS cache.
  3. Disks:
     • Small segments (256MB) for 1hr topics, larger (1GB) for the 24hr topic.
  4. KRaft:
     • Conservative timeouts for controller stability.

Upgrade Trigger: If CPU usage >70% or disk latency >10ms, move to r6g.xlarge (4 vCPUs).

Want me to provide a full Kafka CR YAML for Strimzi with these settings?

Yes! To ensure only Kafka pods run on your dedicated worker nodes, you’ll need to:
1. Label the nodes (for targeting)
2. Taint the nodes (for repelling other workloads)
3. Configure Strimzi to tolerate the taint + use node affinity

Here’s how to do it:


1. Label and Taint the Worker Nodes

A. Label the Nodes (e.g., kafka=broker)

# Label each node (run for each Kafka worker)
kubectl label nodes <NODE_NAME> kafka=broker

B. Add a NoSchedule Taint

kubectl taint nodes <NODE_NAME> kafka=broker:NoSchedule
- Effect:
  - Only pods with a matching toleration will schedule here.
  - Other workloads (e.g., deployments) are automatically repelled.


2. Configure Strimzi Kafka CR for Tolerations + Affinity

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-kafka
spec:
  kafka:
    replicas: 3
    template:
      pod:
        # Tolerate the taint
        tolerations:
        - key: "kafka"
          operator: "Equal"
          value: "broker"
          effect: "NoSchedule"
        # Schedule only on labeled nodes
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: kafka
                  operator: In
                  values:
                  - broker

3. For Controller Pods (KRaft Mode)

If you want the controllers (when run as a separate pool or role) on the same nodes, apply the same tolerations and affinity to their pod template:

controller:
  template:
    pod:
      tolerations:
      - key: "kafka"
        operator: "Equal"
        value: "broker"
        effect: "NoSchedule"
      affinity:  # Optional: Co-locate controllers with brokers
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  strimzi.io/name: my-kafka-kafka
              topologyKey: kubernetes.io/hostname


4. Verify Scheduling

# Check node labels/taints
kubectl describe nodes <NODE_NAME> | grep -A 5 Labels
kubectl describe nodes <NODE_NAME> | grep Taints

# Check Kafka pod placement
kubectl get pods -n <NAMESPACE> -o wide | grep kafka
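To double-check that the taint is actually keeping other workloads off a dedicated node, you can also list everything scheduled on it (the node name is a placeholder):

# Should show only Kafka pods plus DaemonSets that tolerate all taints (CNI, kube-proxy, etc.)
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<NODE_NAME>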

Key Notes

  1. Why Taint + Affinity?
     • Taints: block non-Kafka workloads.
     • Affinity: force Kafka onto the labeled nodes only.

  2. If Using Separate Controller Nodes
     • Repeat the steps with controller-specific labels (e.g., kafka=controller).

  3. EKS Managed Node Groups
     • Add labels/taints during node group creation, for example:
    eksctl create nodegroup --taints="kafka=broker:NoSchedule" --labels="kafka=broker"
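Flag support for taints varies between eksctl versions, so the config-file form is the more reliable route; a sketch of the relevant ClusterConfig fields (cluster name, node group name, and instance type are assumptions):

# cluster.yaml (excerpt)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-eks-cluster
  region: us-east-1
managedNodeGroups:
  - name: kafka-workers
    instanceType: r6g.xlarge
    desiredCapacity: 3
    labels:
      kafka: broker
    taints:
      - key: kafka
        value: broker
        effect: NoSchedule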
    

Need adjustments for your EKS setup? For example, if you’re using Karpenter or Fargate, the approach differs slightly.