Installing Kafka Clusters with Helm Charts: A Step-by-Step Guide

Apache Kafka is the backbone of modern data streaming. Running it on Kubernetes gives you declarative scaling, automatic pod restarts, and persistent storage managed by the cluster. In this tutorial, we will set up a Kafka cluster in KRaft mode (without ZooKeeper) using a custom Helm chart.

By the end of this guide, you will have a running Kafka cluster defined as code, ready to handle your streaming data.

Prerequisites

Before we dive in, make sure you have the following tools installed and configured:

  1. Kubernetes Cluster: A running cluster (Minikube, Kind, or a cloud provider like GKE/EKS) with a default StorageClass, since each Kafka pod requests its own PersistentVolume.

  2. kubectl: The Kubernetes command-line tool.

  3. Helm: The package manager for Kubernetes.
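Before going further, it can help to confirm the tools are actually on your PATH. Here is a minimal bash sketch. The helper name `require_tools` is hypothetical; only `command -v` and `kubectl cluster-info` are standard commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any required command-line tools missing from PATH.
# Returns 0 when every tool is found, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check the tools this guide relies on, then confirm the cluster answers.
# require_tools kubectl helm && kubectl cluster-info
```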

Step 1: Installing Helm

If you haven't installed Helm yet, here is how you can do it on Linux/macOS.

For macOS (using Homebrew):

brew install helm

For Linux (using the official install script):

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Verify the installation:

helm version
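The Chart.yaml in Step 3 uses apiVersion: v2, which Helm 3 and later understand but Helm 2 does not. Below is a small bash sketch that checks the major version. It assumes the `vMAJOR.MINOR.PATCH+gCOMMIT` format printed by `helm version --short`. `helm_major` and `require_helm3` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Extract the major version from a string such as "v3.14.2+gc309b6f".
helm_major() {
  v="${1#v}"        # drop the leading "v"
  echo "${v%%.*}"   # keep everything before the first "."
}

# Succeed only if the version string is Helm 3 or newer (needed for apiVersion: v2 charts).
require_helm3() {
  major="$(helm_major "$1")"
  [ "$major" -ge 3 ] 2>/dev/null
}

# Usage:
# require_helm3 "$(helm version --short)" || echo "Helm 3 or newer is required" >&2
```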

Step 2: Initialize the Helm Chart

Let's create the directory structure for our chart. Run the following command to generate a boilerplate chart:

helm create kafka-chart

This creates a folder named kafka-chart. Because we will write our own templates, remove the generated defaults:

rm -rf kafka-chart/templates/*
rm kafka-chart/values.yaml

Now we have a clean slate to add our configuration files.

Step 3: Configuration Files

We need to define our Chart metadata and default values.

  1. Chart Definition (Chart.yaml)

Open kafka-chart/Chart.yaml and replace its content with the following to define our application info:

apiVersion: v2
name: kafka-chart
description: A Helm chart for deploying Kafka with KRaft mode
type: application
version: 0.1.0
appVersion: "1.0"

  2. Default Values (values.yaml)

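Create a new kafka-chart/values.yaml. This file serves as the single source of truth for our configuration (replicas, image, storage, etc.). Set namespace to the namespace you install the release into. The value is passed to the brokers, which combine it with the service name to build each other's DNS addresses.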

replicaCount: 3

service:
  name: kafka-svc
  port: 9092

image:
  repository: doughgle/kafka-kraft
  tag: latest
  pullPolicy: IfNotPresent

pdb:
  minAvailable: 2

storage:
  size: 1Gi

kafka:
  clusterId: "oh-sxaDRTcyAr6pFRbXyzA"
  replicationFactor: 3
  minInSyncReplicas: 2
  shareDir: /mnt/kafka

namespace: default
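The clusterId above is only an example value. Each cluster should get its own ID, and you should set it before the first install, because in KRaft mode the ID is stored on each broker's data volume when that volume is formatted. If you have the Kafka binaries, `kafka-storage.sh random-uuid` generates one. Otherwise, the bash sketch below approximates it. It assumes Kafka's ID format: 16 bytes in URL-safe base64 without padding, which gives 22 characters. `new_cluster_id` is a hypothetical helper, and its output is plain random bytes rather than a strict version-4 UUID.

```shell
#!/usr/bin/env bash
# Hypothetical helper: 16 random bytes -> URL-safe base64 without padding (22 chars).
new_cluster_id() {
  while :; do
    id="$(head -c 16 /dev/urandom | base64 | tr '+/' '-_' | tr -d '=\n')"
    # Skip IDs starting with "-" so they are never mistaken for a CLI flag.
    case "$id" in
      -*) continue ;;
    esac
    echo "$id"
    return 0
  done
}

# Usage: print a fresh ID to paste into kafka.clusterId.
# new_cluster_id
```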

Step 4: Creating Kubernetes Templates

Now, let's create the actual Kubernetes resources inside the kafka-chart/templates/ directory.

  1. Headless Service (templates/services.yaml)

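We use a Headless Service (clusterIP: None) because Kafka brokers need stable network identities. Instead of a single load-balanced IP, a headless service gives every StatefulSet pod its own DNS name (for example, kafka-0.kafka-svc.default.svc.cluster.local), so clients and other brokers can reach a specific broker directly.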

apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.service.name }}
  labels:
    app: kafka-app
spec:
  clusterIP: None
  ports:
    - name: '9092'
      port: {{ .Values.service.port }}
      protocol: TCP
      targetPort: {{ .Values.service.port }}
  selector:
    app: kafka-app
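To make the stable-identity point concrete: a headless service gives each StatefulSet pod a DNS record of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`. The bash sketch below prints those names for this chart's defaults. It assumes the default cluster domain `cluster.local`, which your cluster may not use. `pod_dns_names` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the per-pod DNS names a headless service provides.
# Args: statefulset-name service-name namespace replicas [cluster-domain]
pod_dns_names() {
  sts="$1" svc="$2" ns="$3" replicas="$4" domain="${5:-cluster.local}"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${sts}-${i}.${svc}.${ns}.svc.${domain}"
    i=$((i + 1))
  done
}

# Usage with this chart's defaults (StatefulSet "kafka", service "kafka-svc"):
pod_dns_names kafka kafka-svc default 3
```

With the defaults it prints three names, kafka-0 through kafka-2, each under kafka-svc.default.svc.cluster.local.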

  2. Pod Disruption Budget (templates/pdb.yaml)

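To keep the cluster available during voluntary disruptions (such as node drains during upgrades), we define a PodDisruptionBudget. With minAvailable: 2 and three replicas, Kubernetes evicts at most one Kafka pod at a time.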

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kafka-pdb
spec:
  minAvailable: {{ .Values.pdb.minAvailable }}
  selector:
    matchLabels:
      app: kafka-app
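The numbers in this chart line up. With three replicas, the PDB allows one voluntary disruption at a time. A three-voter KRaft quorum survives one voter failure. With replicationFactor 3 and minInSyncReplicas 2, partitions stay writable for `acks=all` producers while one replica is down. The bash sketch below does that arithmetic. It assumes every pod is also a KRaft controller voter (combined broker/controller mode); the exposed port 9093 suggests this, but the chart does not state it. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Voluntary disruptions the PDB allows at once: replicas - minAvailable.
pdb_allowed_disruptions() {
  echo $(( $1 - $2 ))
}

# A quorum of n voters needs a majority (n/2 + 1) alive,
# so it tolerates n - majority failures. Assumes every pod is a voter.
quorum_tolerated_failures() {
  majority=$(( $1 / 2 + 1 ))
  echo $(( $1 - majority ))
}

# Replicas a partition can lose while acks=all writes still succeed:
# replicationFactor - minInSyncReplicas.
isr_tolerated_failures() {
  echo $(( $1 - $2 ))
}

# Usage with this chart's defaults:
echo "PDB allows $(pdb_allowed_disruptions 3 2) pod(s) down at once"
echo "Quorum tolerates $(quorum_tolerated_failures 3) voter failure(s)"
echo "Partitions tolerate $(isr_tolerated_failures 3 2) replica(s) down"
```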

  3. StatefulSet (templates/statefulset.yaml)

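The StatefulSet manages the deployment and scaling of the Kafka pods. It gives each pod a stable name (kafka-0, kafka-1, kafka-2). It also creates a dedicated PersistentVolumeClaim for each pod from volumeClaimTemplates (data-kafka-0, data-kafka-1, ...). Finally, it passes the environment variables the doughgle/kafka-kraft image needs to configure KRaft mode.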

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  labels:
    app: kafka-app
spec:
  serviceName: {{ .Values.service.name }}
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: kafka-app
  template:
    metadata:
      labels:
        app: kafka-app
    spec:
      containers:
        - name: kafka-container
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - containerPort: 9092
            - containerPort: 9093
          env:
            - name: REPLICAS
              value: "{{ .Values.replicaCount }}"
            - name: SERVICE
              value: "{{ .Values.service.name }}"
            - name: NAMESPACE
              value: "{{ .Values.namespace }}"
            - name: SHARE_DIR
              value: "{{ .Values.kafka.shareDir }}"
            - name: CLUSTER_ID
              value: "{{ .Values.kafka.clusterId }}"
            - name: DEFAULT_REPLICATION_FACTOR
              value: "{{ .Values.kafka.replicationFactor }}"
            - name: DEFAULT_MIN_INSYNC_REPLICAS
              value: "{{ .Values.kafka.minInSyncReplicas }}"
          volumeMounts:
            - name: data
              mountPath: {{ .Values.kafka.shareDir }}
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: {{ .Values.storage.size }}
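The StatefulSet only passes environment variables. Turning them into KRaft settings is the job of the image's startup script, which this guide does not show. The bash sketch below is a rough, hypothetical illustration of what such a script has to do; it is not the doughgle/kafka-kraft entrypoint. It derives a node ID from the pod ordinal and builds a `controller.quorum.voters` list. It assumes pods are named `kafka-<ordinal>` and that controllers listen on port 9093.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the image's real entrypoint.

# Node ID from the StatefulSet ordinal: "kafka-2" -> "2".
node_id_from_hostname() {
  echo "${1##*-}"
}

# Build "0@kafka-0.<svc>.<ns>.svc.cluster.local:9093,1@..." for all replicas.
# Assumes pod prefix "kafka" (the StatefulSet name) and controller port 9093.
quorum_voters() {
  replicas="$1" service="$2" namespace="$3"
  voters=""
  i=0
  while [ "$i" -lt "$replicas" ]; do
    voter="${i}@kafka-${i}.${service}.${namespace}.svc.cluster.local:9093"
    voters="${voters:+${voters},}${voter}"
    i=$((i + 1))
  done
  echo "$voters"
}

# Usage, as it might run inside pod kafka-1 (Kubernetes sets HOSTNAME to the pod name):
echo "node.id=$(node_id_from_hostname kafka-1)"
echo "controller.quorum.voters=$(quorum_voters 3 kafka-svc default)"
```

Run inside pod kafka-1, this would produce node.id=1 and a voters list that names all three pods on port 9093.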

Step 5: Deploying the Chart

With all files in place, we can now install our Kafka cluster.

  1. Dry Run (Optional):

    It's good practice to verify what will be generated before applying it.

     helm install kafka-release ./kafka-chart --dry-run --debug
    
  2. Install the Chart:

    Run the following command to deploy:

     helm install kafka-release ./kafka-chart
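The dry run prints every rendered manifest. `helm template kafka-release ./kafka-chart` renders the same templates locally without contacting the cluster, and `helm lint ./kafka-chart` checks the chart structure. To confirm quickly that the chart produces the three expected resources, you can pipe the rendered output through a small filter. `list_kinds` is a hypothetical bash helper that counts top-level `kind:` lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read YAML manifests on stdin and print "<Kind> <count>", sorted.
list_kinds() {
  awk '/^kind:/ { count[$2]++ } END { for (k in count) print k, count[k] }' | sort
}

# Usage (renders locally, no cluster needed):
# helm template kafka-release ./kafka-chart | list_kinds
```

For this chart, it should report one PodDisruptionBudget, one Service, and one StatefulSet.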
    

Step 6: Verification

Once installed, check the status of your pods:

kubectl get pods -w

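You should see 3 pods (kafka-0, kafka-1, kafka-2) reach the Running state. By default, a StatefulSet creates them one at a time and in order, so kafka-1 starts only after kafka-0 is Ready.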
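If you want a scripted check instead of watching, `kubectl rollout status statefulset/kafka` waits until the StatefulSet's pods are ready. The hypothetical bash helper below then confirms that every expected pod name is present. It reads the `pod/<name>` lines printed by `kubectl get pods -o name`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read pod names on stdin and check kafka-0 .. kafka-(N-1) exist.
expect_kafka_pods() {
  expected="$1"
  names="$(sed 's#^pod/##')"   # strip the "pod/" prefix from each line
  i=0
  while [ "$i" -lt "$expected" ]; do
    if ! printf '%s\n' "$names" | grep -qx "kafka-${i}"; then
      echo "missing pod: kafka-${i}" >&2
      return 1
    fi
    i=$((i + 1))
  done
  echo "all ${expected} Kafka pods found"
}

# Usage:
# kubectl rollout status statefulset/kafka --timeout=300s
# kubectl get pods -l app=kafka-app -o name | expect_kafka_pods 3
```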

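To verify the service, run the command below. Because the service is headless, kafka-svc should show None under CLUSTER-IP: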

kubectl get svc

You have now successfully deployed a Kafka cluster using Helm! This setup runs in KRaft mode, which removes the dependency on ZooKeeper and simplifies the architecture.

Happy Coding! 🚀