
☸️ Kubernetes Cluster Setup Guide

🔴 Advanced Guide

This guide covers deploying and managing a production-ready Kubernetes cluster in your homelab, including high availability, storage, networking, and service deployment.

🎯 Kubernetes Architecture for Homelab

Cluster Design

# Recommended cluster topology:

# Control Plane Nodes (3 nodes for HA)
k8s-master-01: 192.168.10.201 (Concord-NUC)
k8s-master-02: 192.168.10.202 (Homelab-VM)
k8s-master-03: 192.168.10.203 (Chicago-VM)

# Worker Nodes (3+ nodes)
k8s-worker-01: 192.168.10.211 (Bulgaria-VM)
k8s-worker-02: 192.168.10.212 (Guava)
k8s-worker-03: 192.168.10.213 (Setillo)

# Storage Nodes (Ceph/Longhorn)
k8s-storage-01: 192.168.10.221 (Atlantis)
k8s-storage-02: 192.168.10.222 (Calypso)
k8s-storage-03: 192.168.10.223 (Anubis)

Resource Requirements

# Control Plane Nodes (minimum)
CPU: 2 cores
RAM: 4 GB
Storage: 50 GB SSD
Network: 1 Gbps

# Worker Nodes (minimum)
CPU: 4 cores
RAM: 8 GB
Storage: 100 GB SSD
Network: 1 Gbps

# Storage Nodes (recommended)
CPU: 4 cores
RAM: 16 GB
Storage: 500 GB+ SSD + additional storage
Network: 10 Gbps (if available)

🚀 Cluster Installation

Method 1: kubeadm (Full Kubernetes)

Prerequisites on All Nodes

# Update system
sudo apt update && sudo apt upgrade -y

# Install required packages
sudo apt install -y apt-transport-https ca-certificates curl gpg

# Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Load kernel modules
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Configure sysctl
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

sudo sysctl --system

Install Container Runtime (containerd)

# Install containerd
sudo apt install -y containerd

# Configure containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml

# Enable SystemdCgroup
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml

# Restart containerd
sudo systemctl restart containerd
sudo systemctl enable containerd

Install Kubernetes Components

# Add Kubernetes repository
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

# Install Kubernetes
sudo apt update
sudo apt install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# Enable kubelet
sudo systemctl enable kubelet
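
Before running kubeadm init in the next section, make sure the shared control-plane endpoint k8s-api.vish.local resolves on every node, ideally to a VIP or load balancer in front of the masters. A minimal bootstrap sketch, assuming you initially point the name at k8s-master-01 (192.168.10.201 from the topology above) and move it to a proper VIP later:

# Temporary: resolve the API endpoint to the first control plane node on every node
echo "192.168.10.201 k8s-api.vish.local" | sudo tee -a /etc/hosts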

Initialize First Control Plane Node

# On k8s-master-01 (192.168.10.201)
sudo kubeadm init \
  --control-plane-endpoint="k8s-api.vish.local:6443" \
  --upload-certs \
  --apiserver-advertise-address=192.168.10.201 \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12

# Configure kubectl for root
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Save join commands (output from kubeadm init)
# Control plane join command:
kubeadm join k8s-api.vish.local:6443 --token TOKEN \
  --discovery-token-ca-cert-hash sha256:HASH \
  --control-plane --certificate-key CERT_KEY

# Worker join command:
kubeadm join k8s-api.vish.local:6443 --token TOKEN \
  --discovery-token-ca-cert-hash sha256:HASH
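
Bootstrap tokens expire after 24 hours and the uploaded control-plane certificates after two hours, so if you join nodes later, regenerate fresh values instead of reusing the saved output:

# Print a fresh worker join command (run on an existing control plane node)
kubeadm token create --print-join-command

# Re-upload control plane certificates and print a new certificate key
sudo kubeadm init phase upload-certs --upload-certs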

Install CNI Plugin (Flannel)

# Install Flannel for pod networking
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

# Verify installation
kubectl get pods -n kube-flannel
kubectl get nodes

Join Additional Control Plane Nodes

# On k8s-master-02 and k8s-master-03
# Use the control plane join command from kubeadm init output
sudo kubeadm join k8s-api.vish.local:6443 --token TOKEN \
  --discovery-token-ca-cert-hash sha256:HASH \
  --control-plane --certificate-key CERT_KEY

# Configure kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Join Worker Nodes

# On all worker nodes
# Use the worker join command from kubeadm init output
sudo kubeadm join k8s-api.vish.local:6443 --token TOKEN \
  --discovery-token-ca-cert-hash sha256:HASH
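
After the join commands complete, confirm from a control plane node that every node has registered and reached Ready; the worker role label is optional and purely cosmetic:

kubectl get nodes -o wide

# Optional: give workers a visible role in 'kubectl get nodes'
kubectl label node k8s-worker-01 node-role.kubernetes.io/worker=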

Method 2: k3s (Lightweight Alternative)

Install k3s Master

# On first master node
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --disable traefik \
  --disable servicelb \
  --write-kubeconfig-mode 644 \
  --cluster-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12

# Get node token
sudo cat /var/lib/rancher/k3s/server/node-token

Join Additional Masters

# On additional master nodes
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://192.168.10.201:6443 \
  --token NODE_TOKEN \
  --disable traefik \
  --disable servicelb

# Configure kubectl
mkdir -p $HOME/.kube
sudo cp /etc/rancher/k3s/k3s.yaml $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Join Worker Nodes

# On worker nodes
curl -sfL https://get.k3s.io | sh -s - agent \
  --server https://192.168.10.201:6443 \
  --token NODE_TOKEN

🗄️ Storage Configuration

Longhorn Distributed Storage

Install Longhorn

# Add Longhorn Helm repository
helm repo add longhorn https://charts.longhorn.io
helm repo update

# Create namespace
kubectl create namespace longhorn-system

# Install Longhorn
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --set defaultSettings.defaultDataPath="/var/lib/longhorn" \
  --set defaultSettings.defaultReplicaCount=3 \
  --set defaultSettings.defaultDataLocality="best-effort"

# Verify installation
kubectl get pods -n longhorn-system
kubectl get storageclass
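
The storage classes below rely on Longhorn disk and node tags (ssd, hdd, storage), which need to be assigned to the storage nodes first, most easily through the Longhorn UI. A quick way to reach the UI without an ingress, assuming the chart's default longhorn-frontend service name:

kubectl port-forward -n longhorn-system svc/longhorn-frontend 8080:80
# then browse to http://localhost:8080 and set node/disk tags under Node settings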

Configure Storage Classes

# Create storage classes for different use cases
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  diskSelector: "ssd"
  nodeSelector: "storage"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-bulk
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  diskSelector: "hdd"
EOF

NFS Storage (Alternative)

Setup NFS Server (on Atlantis)

# Install NFS server
sudo apt install -y nfs-kernel-server

# Create NFS exports
sudo mkdir -p /volume1/k8s-storage/{pv,dynamic}
sudo chown nobody:nogroup /volume1/k8s-storage/
sudo chmod 777 /volume1/k8s-storage/

# Configure exports
echo "/volume1/k8s-storage 192.168.10.0/24(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports

# Apply exports
sudo exportfs -ra
sudo systemctl restart nfs-kernel-server
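
A quick sanity check that the export is visible before wiring up the CSI driver:

# On the NFS server
sudo exportfs -v

# From any cluster node (requires the nfs-common package)
showmount -e atlantis.vish.local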

Install NFS CSI Driver

# Install NFS CSI driver
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs \
  --namespace kube-system \
  --version v4.5.0

# Create NFS storage class
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  server: atlantis.vish.local
  share: /volume1/k8s-storage/dynamic
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
EOF
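
NFS-backed volumes support ReadWriteMany, which is useful for workloads that share state across pods. A minimal test claim to confirm dynamic provisioning works before relying on it:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-test
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-csi
  resources:
    requests:
      storage: 1Gi
EOF

# The claim should reach Bound within a few seconds
kubectl get pvc nfs-test
kubectl delete pvc nfs-test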

🌐 Networking Configuration

Install Ingress Controller (Nginx)

# Add Nginx Ingress Helm repository
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

# Install Nginx Ingress Controller
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.service.type=LoadBalancer \
  --set controller.service.loadBalancerIP=192.168.10.240 \
  --set controller.metrics.enabled=true \
  --set controller.podAnnotations."prometheus\.io/scrape"="true" \
  --set controller.podAnnotations."prometheus\.io/port"="10254"

# Verify installation (the LoadBalancer service will show <pending> until MetalLB is configured below)
kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx

Install MetalLB Load Balancer

# Install MetalLB
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.12/config/manifests/metallb-native.yaml

# Wait for MetalLB to be ready
kubectl wait --namespace metallb-system \
  --for=condition=ready pod \
  --selector=app=metallb \
  --timeout=90s

# Configure IP address pool
cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.10.240-192.168.10.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - homelab-pool
EOF
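
With the pool in place, the ingress-nginx LoadBalancer service created earlier should pick up 192.168.10.240 instead of sitting in pending (service name per the chart's default naming):

kubectl get svc -n ingress-nginx ingress-nginx-controller
# EXTERNAL-IP should now show 192.168.10.240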

Install Cert-Manager

# Add Cert-Manager Helm repository
helm repo add jetstack https://charts.jetstack.io
helm repo update

# Install Cert-Manager
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.13.3 \
  --set installCRDs=true

# Create Let's Encrypt ClusterIssuer
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@vish.local
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
EOF
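
Note that Let's Encrypt only issues certificates for names it can publicly validate, and .local is not a public TLD at all, so for purely internal hostnames like *.vish.local either use a real domain with a DNS-01 solver or fall back to an internal issuer. A minimal self-signed ClusterIssuer as a fallback sketch:

cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
EOF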

📊 Monitoring and Observability

Install Prometheus Stack

# Add Prometheus Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create monitoring namespace
kubectl create namespace monitoring

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=longhorn-fast \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
  --set grafana.persistence.enabled=true \
  --set grafana.persistence.storageClassName=longhorn-fast \
  --set grafana.persistence.size=10Gi \
  --set grafana.adminPassword="REDACTED_PASSWORD" \
  --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=longhorn-fast \
  --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=10Gi

# Verify installation
kubectl get pods -n monitoring
kubectl get svc -n monitoring
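
Before the ingress below is in place, Grafana can be reached with a port-forward; the service name follows the Helm release name, so with the release called prometheus it is prometheus-grafana:

kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
# then open http://localhost:3000 and log in with the admin password set above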

Create Ingress for Monitoring Services

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: monitoring-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
spec:
  tls:
  - hosts:
    - grafana.k8s.vish.local
    - prometheus.k8s.vish.local
    - alertmanager.k8s.vish.local
    secretName: monitoring-tls
  rules:
  - host: grafana.k8s.vish.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-grafana
            port:
              number: 80
  - host: prometheus.k8s.vish.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-kube-prometheus-prometheus
            port:
              number: 9090
  - host: alertmanager.k8s.vish.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-kube-prometheus-alertmanager
            port:
              number: 9093
EOF
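
The ingress above references a basic-auth secret that still has to be created; the usual ingress-nginx pattern is an htpasswd file stored in a secret under the key auth:

# Requires apache2-utils (provides htpasswd)
htpasswd -c auth admin
kubectl create secret generic basic-auth --from-file=auth -n monitoring
rm auth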

Install Logging Stack (ELK)

# Add Elastic Helm repository
helm repo add elastic https://helm.elastic.co
helm repo update

# Install Elasticsearch
helm install elasticsearch elastic/elasticsearch \
  --namespace logging \
  --create-namespace \
  --set replicas=3 \
  --set volumeClaimTemplate.storageClassName=longhorn-fast \
  --set volumeClaimTemplate.resources.requests.storage=100Gi

# Install Kibana
helm install kibana elastic/kibana \
  --namespace logging \
  --set service.type=ClusterIP

# Install Filebeat
helm install filebeat elastic/filebeat \
  --namespace logging \
  --set daemonset.enabled=true
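
Kibana is exposed only inside the cluster (ClusterIP), so for a first look use a port-forward; the exact service name depends on the release, so list the services first (kibana-kibana is the chart's default):

kubectl get svc -n logging
kubectl port-forward -n logging svc/kibana-kibana 5601:5601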

🚀 Application Deployment

Migrate Docker Compose Services

Convert Docker Compose to Kubernetes

# Install kompose for conversion
curl -L https://github.com/kubernetes/kompose/releases/latest/download/kompose-linux-amd64 -o kompose
chmod +x kompose
sudo mv kompose /usr/local/bin

# Convert existing docker-compose files
cd ~/homelab/Atlantis/uptime-kuma
kompose convert -f docker-compose.yml

# Review and modify generated manifests
# Add ingress, persistent volumes, etc.

Example: Uptime Kuma on Kubernetes

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: uptime-kuma
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: uptime-kuma
  template:
    metadata:
      labels:
        app: uptime-kuma
    spec:
      containers:
      - name: uptime-kuma
        image: louislam/uptime-kuma:1
        ports:
        - containerPort: 3001
        volumeMounts:
        - name: data
          mountPath: /app/data
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "500m"
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: uptime-kuma-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: uptime-kuma-data
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-fast
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  name: uptime-kuma
  namespace: monitoring
spec:
  selector:
    app: uptime-kuma
  ports:
    - protocol: TCP
      port: 3001
      targetPort: 3001
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: uptime-kuma
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - uptime.k8s.vish.local
    secretName: uptime-kuma-tls
  rules:
  - host: uptime.k8s.vish.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: uptime-kuma
            port:
              number: 3001
EOF

Helm Charts for Complex Applications

Create Custom Helm Chart

# Create new Helm chart
helm create homelab-app

# Directory structure (simplified; helm create generates additional templates, and pvc.yaml is added manually):
homelab-app/
├── Chart.yaml
├── values.yaml
├── templates/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   └── pvc.yaml
└── charts/

# Example values.yaml for homelab services:
cat <<EOF > homelab-app/values.yaml
replicaCount: 1

image:
  repository: nginx
  tag: latest
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: app.k8s.vish.local
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: app-tls
      hosts:
        - app.k8s.vish.local

persistence:
  enabled: true
  storageClass: longhorn-fast
  size: 10Gi

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi
EOF

# Install chart
helm install my-app ./homelab-app
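
Rendering the chart locally before installing, then upgrading in place, is the usual iteration loop:

# Render manifests locally to review what would be applied
helm template my-app ./homelab-app | less

# Apply changes after editing values.yaml
helm upgrade my-app ./homelab-app -f homelab-app/values.yaml

# Roll back if an upgrade misbehaves
helm rollback my-app 1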

🔒 Security Configuration

Pod Security Standards

# Label a namespace to enforce the restricted Pod Security Standard (PodSecurityPolicy is removed in current Kubernetes)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: secure-apps
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
EOF
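
Pods that don't satisfy the restricted profile (non-root, seccomp profile, dropped capabilities, and so on) will now be rejected at admission. A quick negative test with a hypothetical pod name:

# This should be refused with a PodSecurity "restricted" violation,
# since a bare nginx pod lacks the required securityContext settings
kubectl run psa-test --image=nginx -n secure-apps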

Network Policies

# Note: NetworkPolicy is only enforced by a policy-capable CNI (Calico, Cilium, etc.);
# plain Flannel, as installed above, does not enforce these objects.
# Example: deny all traffic by default
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
# Allow ingress traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: ingress-nginx
    ports:
    - protocol: TCP
      port: 80
EOF
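
Because the default-deny policy above also covers Egress, pods in the namespace lose DNS resolution; an explicit allowance towards kube-dns in kube-system is usually needed. A sketch:

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
EOF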

RBAC Configuration

# Create service account for applications
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: homelab-app
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: homelab-app-role
rules:
- apiGroups: [""]
  resources: ["pods", "services"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: homelab-app-binding
  namespace: default
subjects:
- kind: ServiceAccount
  name: homelab-app
  namespace: default
roleRef:
  kind: Role
  name: homelab-app-role
  apiGroup: rbac.authorization.k8s.io
EOF
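
You can verify the binding without deploying anything by impersonating the service account:

# Allowed by the role
kubectl auth can-i list pods --as=system:serviceaccount:default:homelab-app

# Not granted, should answer "no"
kubectl auth can-i delete pods --as=system:serviceaccount:default:homelab-app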

🔧 Cluster Management

Backup and Restore

etcd Backup

# Create backup script
cat <<EOF | sudo tee /usr/local/bin/etcd-backup.sh
#!/bin/bash
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot-\$(date +%Y%m%d-%H%M%S).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Keep only last 7 days of backups
find /backup -name "etcd-snapshot-*.db" -mtime +7 -delete
EOF

sudo chmod +x /usr/local/bin/etcd-backup.sh

# Schedule daily backups (append to root's crontab instead of replacing it)
( sudo crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/etcd-backup.sh" ) | sudo crontab -

Velero for Application Backup

# Install Velero CLI (release tarballs are versioned, so pick a release explicitly)
VELERO_VERSION=v1.13.0
wget https://github.com/vmware-tanzu/velero/releases/download/${VELERO_VERSION}/velero-${VELERO_VERSION}-linux-amd64.tar.gz
tar -xzf velero-${VELERO_VERSION}-linux-amd64.tar.gz
sudo mv velero-${VELERO_VERSION}-linux-amd64/velero /usr/local/bin/
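
# The install step below expects ./credentials-velero to exist; it uses the
# standard AWS shared-credentials format, with the MinIO keys as placeholders
cat <<EOF > credentials-velero
[default]
aws_access_key_id = MINIO_ACCESS_KEY
aws_secret_access_key = MINIO_SECRET_KEY
EOF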

# Install Velero server (using MinIO for storage)
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket velero-backups \
  --secret-file ./credentials-velero \
  --use-volume-snapshots=false \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.vish.local:9000

# Create backup schedule
velero schedule create daily-backup --schedule="0 1 * * *"
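
Restores are driven from the same CLI; list the backups produced by the schedule and reference one by name:

velero backup get
velero restore create --from-backup BACKUP_NAME
velero restore get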

Cluster Upgrades

# Upgrade control plane nodes (one at a time)
# 1. Drain node
kubectl drain k8s-master-01 --ignore-daemonsets --delete-emptydir-data

# 2. Upgrade kubeadm (pkgs.k8s.io package revisions look like 1.29.x-1.1, so pin with a wildcard)
sudo apt update
sudo apt-mark unhold kubeadm
sudo apt install -y kubeadm='1.29.x-*'
sudo apt-mark hold kubeadm

# 3. Upgrade cluster
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.29.x

# 4. Upgrade kubelet and kubectl
sudo apt-mark unhold kubelet kubectl
sudo apt install -y kubelet='1.29.x-*' kubectl='1.29.x-*'
sudo apt-mark hold kubelet kubectl
sudo systemctl daemon-reload
sudo systemctl restart kubelet

# 5. Uncordon node
kubectl uncordon k8s-master-01

# Repeat for other control plane nodes and workers
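
For those remaining nodes the cluster-level step differs: after upgrading the kubeadm package, run kubeadm upgrade node instead of kubeadm upgrade apply, then upgrade kubelet/kubectl and restart kubelet exactly as above:

# On every other node (control plane and workers), after upgrading kubeadm:
kubectl drain NODE_NAME --ignore-daemonsets --delete-emptydir-data
sudo kubeadm upgrade node
sudo apt-mark unhold kubelet kubectl
sudo apt install -y kubelet='1.29.x-*' kubectl='1.29.x-*'
sudo apt-mark hold kubelet kubectl
sudo systemctl daemon-reload && sudo systemctl restart kubelet
kubectl uncordon NODE_NAME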

Troubleshooting

# Common troubleshooting commands
kubectl get nodes -o wide
kubectl get pods --all-namespaces
kubectl describe node NODE_NAME
kubectl logs -n kube-system POD_NAME

# Check cluster health
kubectl get componentstatuses   # deprecated since v1.19; 'kubectl get --raw=/readyz?verbose' is the modern equivalent
kubectl cluster-info
kubectl get events --sort-by=.metadata.creationTimestamp

# Debug networking
kubectl run debug --image=nicolaka/netshoot -it --rm -- /bin/bash

📋 Migration Strategy

Phase 1: Cluster Setup

☐ Plan cluster architecture and resource allocation
☐ Install Kubernetes on all nodes
☐ Configure networking and storage
☐ Install monitoring and logging
☐ Set up backup and disaster recovery
☐ Configure security policies
☐ Test cluster functionality

Phase 2: Service Migration

☐ Identify services suitable for Kubernetes
☐ Convert Docker Compose to Kubernetes manifests
☐ Create Helm charts for complex applications
☐ Set up ingress and SSL certificates
☐ Configure persistent storage
☐ Test service functionality
☐ Update DNS and load balancing

Phase 3: Production Cutover

☐ Migrate non-critical services first
☐ Update monitoring and alerting
☐ Test disaster recovery procedures
☐ Migrate critical services during maintenance window
☐ Update documentation and runbooks
☐ Train team on Kubernetes operations
☐ Decommission old Docker Compose services


💡 Pro Tip: Start with a small, non-critical service migration to Kubernetes. Learn the platform gradually before moving mission-critical services. Kubernetes has a steep learning curve, but the benefits of container orchestration, scaling, and management are worth the investment for a growing homelab!