Building an Enterprise-Grade Kubernetes (k3s) Stack on Proxmox VE

Last Updated on June 27, 2026 by Thiago Crepaldi

Today, we are building a homelab-grade, production-ready Kubernetes cluster on a GPU-accelerated server. The stack uses K3s (Lightweight Kubernetes), Rancher for cluster management, Helm as the package manager, the NGINX Ingress Controller as reverse proxy, MetalLB as load balancer, Longhorn for fast distributed block storage, an NFS mount from a Synology NAS for high-capacity storage, Cert-Manager to manage Let’s Encrypt certificates, and the NVIDIA Device Plugin to expose the GPU to Kubernetes.

To do this right, we will implement a Split-Plane Architecture that cleanly separates our management engine (“The Brain”) onto its own dedicated virtual machine (aka rancher-mgmt), while isolating our primary workload node (aka k3s-node-01). To make things more interesting, both the management and workload servers will be Proxmox virtual machines, and the workload VM receives a GPU through PCIe passthrough, so the guest drives the card directly instead of going through an emulated device, for near-native performance.

Before diving in, make sure you have completed our two essential setup guides that lay the groundwork for this environment:

Installing latest NVIDIA GPU Driver on Proxmox 9.2 (Debian Trixie + Linux Kernel 7.x) – A recipe for installing the latest NVIDIA GPU drivers on Proxmox 9.2+ (Debian Trixie / Kernel 7.x), which makes your Proxmox VE host GPU-accelerated and lays the foundation for our Kubernetes workload node.
Setup Nvidia GPU Passthrough for Ubuntu VMs on Proxmox 9.2 – How to build our high-performance k3s-node-01 virtual machine and give it exclusive, direct access to the GPU.

1. The Architectural Blueprint

To make our networking, control routing, and workload isolation easier to follow, I have created a diagram that shows every component we are deploying, where each one resides, and how they interact across the physical and virtual layers:

You might be asking why we split management and workloads into two separate VMs. If we installed Rancher directly inside our compute cluster, the management UI would constantly compete with your active workloads for host memory and CPU cycles. By hosting Rancher on its own lightweight VM (rancher-mgmt) and running the K3s workload engine on a distinct, hardware-accelerated VM (k3s-node-01), we isolate our control loop. Even if a massive machine learning training job pegs k3s-node-01’s CPUs at 100%, the Rancher UI and its control plane on rancher-mgmt remain fully operational, so you can still troubleshoot, inspect logs, and safely orchestrate cluster state.

You might also be wondering why we run Kubernetes inside VMs instead of on bare metal. Running Rancher and K3s inside Proxmox VMs provides enterprise-grade resilience through hard resource isolation, so heavy workloads cannot starve your management plane. Virtualization also gives you instant snapshots for low-risk rollback and disaster recovery, the ability to resize storage or compute with little or no downtime, and hardware abstraction that makes migrating to another host easier if physical components fail. Finally, it shrinks your security “blast radius”: a container breakout is contained to a single VM rather than compromising the entire physical server and its underlying storage pools.

Enough said, let’s dive into it!

2. Optimizing ZFS’s Memory Allocation on Proxmox VE

Proxmox VE relies heavily on ZFS (Zettabyte File System) for software-defined RAID arrays. ZFS keeps an Adaptive Replacement Cache (ARC) in RAM as a read cache, and on many installations it is allowed to grow to as much as 50% of the host’s physical memory. While this speeds up local storage, it can cause serious problems on a virtualization host.

Under heavy disk I/O, the ARC will greedily consume free memory. It is supposed to shrink under memory pressure, but it may not release memory quickly enough. When you then boot large K3s virtual machines with fixed (non-ballooned) memory allocations, the Linux kernel’s Out-Of-Memory (OOM) killer may step in to protect the hypervisor host and terminate the most memory-hungry processes on it: the KVM processes running your Kubernetes VMs.

To prevent this, we configure Proxmox to cap the ARC at 2 GiB (2147483648 bytes), leaving the rest of the memory pool for our virtual machines.

SSH into your Proxmox host (pve2) and persist the limit as a ZFS kernel module option:

echo "options zfs zfs_arc_max=2147483648" | sudo tee /etc/modprobe.d/zfs.conf

Rebuild the initramfs so the option is applied early at boot, then reboot the host for the new limit to take effect:

sudo update-initramfs -u -k all
sudo reboot
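If you would rather size the ARC in GiB than type byte counts, the sketch below computes the value and writes the same modprobe option. It is a minimal illustration, not part of the original procedure: the helper names (zfs_arc_bytes, zfs_arc_conf, zfs_arc_apply_now) and the ZFS_CONF override are hypothetical, and writing the real paths requires root. The optional live update writes to /sys/module/zfs/parameters/zfs_arc_max, which OpenZFS exposes as a runtime tunable; the ARC may take a while to shrink to the new limit.

```sh
#!/bin/sh
# Sketch: compute a ZFS ARC limit in bytes and persist it as a module option.
# ZFS_CONF defaults to the real path; override it to try the helpers safely.
ZFS_CONF="${ZFS_CONF:-/etc/modprobe.d/zfs.conf}"

# Convert whole GiB to bytes (1 GiB = 1024^3 bytes).
zfs_arc_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Write the modprobe option line for the given size in GiB.
zfs_arc_conf() {
    printf 'options zfs zfs_arc_max=%s\n' "$(zfs_arc_bytes "$1")" > "$ZFS_CONF"
}

# Optionally apply the same limit to the running kernel (no reboot needed),
# but only if the ZFS module parameter exists and is writable.
zfs_arc_apply_now() {
    param=/sys/module/zfs/parameters/zfs_arc_max
    if [ -w "$param" ]; then
        zfs_arc_bytes "$1" > "$param"
    else
        echo "ZFS module parameter not writable; skipping live update" >&2
    fi
}
```

In a root shell on the Proxmox host, sourcing the file and running zfs_arc_conf 2 produces the same /etc/modprobe.d/zfs.conf line as the echo command above; you still need to rebuild the initramfs so the limit applies at boot.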

3. Provisioning the Management Plane Hardware

We will create our dedicated management VM (aka rancher-mgmt) directly through the Proxmox Web UI, mirroring our exact storage-optimized setup from the previous guide.

3.1 Download the Cloud Image on the Host Terminal

If you haven’t already, SSH into your Proxmox host shell (aka proxmox.domain.com) and download the official Ubuntu 24.04 LTS (Noble) server cloud image:

wget https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img
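Before importing the image, it is worth checking that the download is intact. Ubuntu publishes a SHA256SUMS file in the same directory as the image; the sketch below, assuming GNU coreutils’ sha256sum and a hypothetical helper named verify_image, checks a single image against it.

```sh
#!/bin/sh
# Sketch: verify one downloaded cloud image against a SHA256SUMS file.
# Ubuntu publishes one next to the image, e.g.
#   https://cloud-images.ubuntu.com/noble/current/SHA256SUMS
verify_image() {
    img="$1"    # image file name, e.g. noble-server-cloudimg-amd64.img
    sums="$2"   # path to the downloaded SHA256SUMS file
    # Keep only the line whose file name (minus an optional '*' binary
    # marker) matches the image exactly, then let sha256sum check it.
    line=$(awk -v f="$img" '{ n = $2; sub(/^\*/, "", n); if (n == f) print }' "$sums")
    if [ -z "$line" ]; then
        echo "no checksum entry for $img" >&2
        return 1
    fi
    printf '%s\n' "$line" | sha256sum -c -
}
```

After wget https://cloud-images.ubuntu.com/noble/current/SHA256SUMS, running verify_image noble-server-cloudimg-amd64.img SHA256SUMS in the same directory should report noble-server-cloudimg-amd64.img: OK.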

3.2 Create the `rancher-mgmt` VM via Web UI

Click Create VM in the top-right corner of the Proxmox UI and configure the tabs precisely as follows:

General: Assign VM ID 101 and name it rancher-mgmt.
OS: Select Do not use any media.
System:
- Graphic card: Select Standard VGA (std).
- Machine: q35 (keeps our VM layouts uniform).
- BIOS: OVMF (UEFI). Add an EFI disk and place it on your local storage pool.
- SCSI Controller: VirtIO SCSI Single.
Disks: Delete the default disk that Proxmox populates automatically so the tab is entirely blank.
CPU: Allocate 4 Cores. Set the Type to host.
Memory: Allocate 16GB (16384 MB) of RAM. Uncheck “Ballooning Device” so the VM keeps a fixed memory allocation; K3s and Rancher behave more predictably when their memory does not fluctuate.
Network: Set Model to VirtIO (paravirtualized).

Click Finish to create the VM.
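If you prefer the shell to the wizard, the same hardware layout can be expressed with qm create. This is a sketch under assumptions: storage local-lvm, network bridge vmbr0, and VM ID 101; the create_rancher_vm and run helpers are hypothetical. It only prints the command unless you set DRY_RUN=0, so you can review it before running it on the Proxmox host.

```sh
#!/bin/sh
# Sketch: CLI equivalent of the rancher-mgmt "Create VM" wizard.
# Assumptions: storage "local-lvm", bridge "vmbr0", VM ID 101.
# Prints the command by default; set DRY_RUN=0 on the Proxmox host to run it.
DRY_RUN="${DRY_RUN:-1}"
VMID="${VMID:-101}"
STORAGE="${STORAGE:-local-lvm}"
BRIDGE="${BRIDGE:-vmbr0}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

create_rancher_vm() {
    # No --scsi0 here: the boot disk is imported from the cloud image next.
    run qm create "$VMID" --name rancher-mgmt --ostype l26 \
        --machine q35 --bios ovmf \
        --efidisk0 "$STORAGE:1,efitype=4m" \
        --scsihw virtio-scsi-single --vga std \
        --cpu host --cores 4 \
        --memory 16384 --balloon 0 \
        --net0 "virtio,bridge=$BRIDGE"
}
```

As in the wizard, the EFI disk is allocated on local-lvm first, so it usually becomes vm-101-disk-0.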

3.3 Import the Disk and Attach a Serial Console

To complete the layout, jump back onto your Proxmox host terminal to import the Ubuntu cloud image as the VM’s boot disk and attach a serial port for flexible debugging:

# 1. Import the Cloud Image directly into your new management VM storage
qm importdisk 101 noble-server-cloudimg-amd64.img local-lvm

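# 2. Attach the imported image to the VM as scsi0. qm importdisk prints the
#    volume it created; if the EFI disk already took vm-101-disk-0, the
#    imported disk will be vm-101-disk-1, so adjust the name accordingly.
qm set 101 --scsi0 local-lvm:vm-101-disk-0,discard=on,iothread=1

# 3. Resize the virtual disk to an absolute size of 64GB
qm resize 101 scsi0 64G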

# 4. Attach the dedicated Cloud-Init hardware drive
qm set 101 --ide0 local-lvm:cloudinit

# 5. Add the virtual serial port adapter hardware
qm set 101 --serial0 socket

# 6. Prioritize scsi0 during boot execution
qm set 101 --boot order=scsi0
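Rather than assuming the imported disk’s name, you can read it back from the VM configuration, where qm importdisk leaves it as an unused disk. The sketch below assumes the import shows up as unused0 in qm config output (for example unused0: local-lvm:vm-101-disk-1); the imported_volume and attach_imported_disk helpers are hypothetical.

```sh
#!/bin/sh
# Sketch: attach whatever volume qm importdisk left as "unused0",
# instead of hard-coding vm-101-disk-0.

# Reads `qm config <vmid>` output on stdin and prints the unused0 volume.
imported_volume() {
    sed -n 's/^unused0: *//p'
}

attach_imported_disk() {
    vmid="$1"
    vol=$(qm config "$vmid" | imported_volume)
    if [ -z "$vol" ]; then
        echo "no unused0 disk on VM $vmid" >&2
        return 1
    fi
    qm set "$vmid" --scsi0 "$vol,discard=on,iothread=1"
}
```

Running attach_imported_disk 101 replaces step 2 above; the resize, Cloud-Init, serial, and boot-order steps stay the same.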

Return to the Proxmox Web UI, click on VM rancher-mgmt, navigate to the Cloud-Init panel, populate your administrative username, password, SSH public key, and your Static IP configuration (e.g., 192.168.1.11/24 with your network gateway). Click Regenerate Image.
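The Cloud-Init panel can also be filled in from the host shell. This sketch assumes a user named admin, a public key file on the host, and the static IP from the example above. qm set with --ciuser, --sshkeys, and --ipconfig0 sets the same fields as the UI, and qm cloudinit update regenerates the drive like the Regenerate Image button on recent Proxmox releases. The configure_cloud_init helper is hypothetical and, by default, only prints the commands.

```sh
#!/bin/sh
# Sketch: set the Cloud-Init fields from the CLI instead of the Web UI.
# Prints the commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

configure_cloud_init() {
    vmid="$1"; ip_cidr="$2"; gw="$3"; user="$4"; pubkey_file="$5"
    run qm set "$vmid" --ciuser "$user" --sshkeys "$pubkey_file" \
        --ipconfig0 "ip=$ip_cidr,gw=$gw"
    # Rebuild the Cloud-Init drive, like "Regenerate Image" in the UI.
    run qm cloudinit update "$vmid"
}
```

For example: DRY_RUN=0 configure_cloud_init 101 192.168.1.11/24 192.168.1.1 admin /root/.ssh/id_ed25519.pub. Set a password separately with --cipassword if you need console logins.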

3.4 Expand VM’s Storage

Ubuntu cloud images ship with a tiny root partition (~2.5GB). cloud-init normally grows it on first boot, but if df -h / still shows the small size, installing K3s (or almost anything else, for that matter) will quickly run out of space and fail.

Start the management VM from the Proxmox terminal or Web UI. Once it is online, connect to it via SSH from your desktop or through the Proxmox console, then force the VM’s kernel to recognize the full 64GB disk and grow the filesystem:

# 1. Force a hardware rescan of the virtual SCSI drive
echo 1 | sudo tee /sys/class/block/sda/device/rescan

# 2. Grow the flat root partition boundaries (Partition 1)
sudo growpart /dev/sda 1

# 3. Force the kernel to refresh the active geometry mappings
sudo partprobe /dev/sda

# 4. Live-expand the filesystem to claim the newly added space
sudo resize2fs /dev/sda1

Verify the layout using df -h /. The root filesystem should now span nearly the full 64GB disk (minus the small boot and EFI partitions).
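To make that check scriptable before moving on to K3s, the sketch below reads the size of / with GNU df and fails if it is still small. The 60 GiB default threshold and the helper names are assumptions chosen for a 64GB disk.

```sh
#!/bin/sh
# Sketch: confirm the root filesystem actually grew before installing K3s.

# Print the size of / in whole GiB (GNU df; header and "G" suffix stripped).
root_size_gib() {
    df --output=size -BG / | tail -n 1 | tr -dc '0-9'
}

# Succeed if / is at least $1 GiB (default 60), otherwise explain and fail.
check_root_grown() {
    min="${1:-60}"
    size=$(root_size_gib)
    if [ "$size" -ge "$min" ]; then
        echo "root filesystem is ${size}G: OK"
    else
        echo "root filesystem is only ${size}G; rerun growpart/resize2fs" >&2
        return 1
    fi
}
```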

4. Installing Rancher (The Management Plane)

Now we deploy the Rancher management engine inside our newly provisioned rancher-mgmt virtual machine. Because this VM is decoupled from any specific compute node, it acts as our centralized administrative hub.

4.1 Install K3s in Master Mode

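Install K3s, then export KUBECONFIG from your user’s ~/.bash_aliases so kubectl and helm always know where to find the cluster credentials. Rancher only supports specific Kubernetes minor versions, so if the Rancher chart later refuses to install, pin K3s to a supported release by setting the INSTALL_K3S_VERSION environment variable (check the Rancher support matrix for the right version):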

curl -sfL https://get.k3s.io | sh -

# Grant local read permission and apply a permanent session environment alias
sudo chmod 644 /etc/rancher/k3s/k3s.yaml
echo "export KUBECONFIG=/etc/rancher/k3s/k3s.yaml" >> ~/.bash_aliases
source ~/.bashrc

Verify the local control engine is active:

kubectl get nodes
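chmod 644 makes the K3s admin credentials readable by every local user. A tighter alternative, sketched below, copies the kubeconfig into your home directory with mode 600. The install_kubeconfig helper is hypothetical; it uses sudo only to read the root-owned source file.

```sh
#!/bin/sh
# Sketch: per-user copy of the K3s admin kubeconfig, readable only by you.
# SUDO is overridable (e.g. SUDO= when already root).
SUDO="${SUDO-sudo}"

install_kubeconfig() {
    src="${1:-/etc/rancher/k3s/k3s.yaml}"
    dest="${2:-$HOME/.kube/config}"
    mkdir -p "$(dirname "$dest")"
    # Read the root-owned file with sudo, but write the copy as this user.
    $SUDO cat "$src" > "$dest" || return 1
    chmod 600 "$dest"
}
```

After install_kubeconfig, point your alias at the copy instead (export KUBECONFIG=$HOME/.kube/config) and skip the chmod on /etc/rancher/k3s/k3s.yaml.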

4.2 Installing Helm

Helm is the official package manager for Kubernetes, acting as the equivalent of apt, yum, or homebrew for your containerized applications. It allows you to define, install, upgrade, and manage complex Kubernetes applications using a single tool.

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Verify helm is installed and kicking:

# Get helm version
helm version

4.3 Deploy Cert-Manager

Cert-Manager is an automated certificate management controller that streamlines the process of issuing, renewing, and managing SSL/TLS certificates within your cluster. By integrating directly with public authorities like Let’s Encrypt, it eliminates the manual overhead of updating certificates by automatically monitoring their expiration and performing the necessary domain validation challenges.

We will deploy cert-manager via Helm to issue the certificates Rancher needs internally:

# 1. Add stable repositories
helm repo add jetstack https://charts.jetstack.io
helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm repo update

# 2. Install Cert-Manager for local Rancher validation tracking
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set crds.enabled=true

Wait a couple of minutes and verify that the cert-manager pods are Running:

kubectl get pods -n cert-manager
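Instead of waiting a fixed amount of time, you can block until the three cert-manager Deployments have finished rolling out. The sketch assumes the Helm release is named cert-manager, as in the install command above (so the Deployments are cert-manager, cert-manager-cainjector, and cert-manager-webhook); the wait_for_cert_manager helper is hypothetical.

```sh
#!/bin/sh
# Sketch: wait until every cert-manager Deployment is fully rolled out.
wait_for_cert_manager() {
    timeout="${1:-180s}"
    for d in cert-manager cert-manager-cainjector cert-manager-webhook; do
        kubectl -n cert-manager rollout status "deployment/$d" --timeout="$timeout" || return 1
    done
}
```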

If you also pass the DNS-01 recursive nameserver arguments to cert-manager (as we do on the workload cluster in section 6.6), you can confirm that they have been applied:

kubectl get deployment cert-manager -n cert-manager -o jsonpath='{.spec.template.spec.containers[0].args}'

In that case, the output should contain:

--dns01-recursive-nameservers=8.8.8.8:53,1.1.1.1:53
--dns01-recursive-nameservers-only=true

4.4 Deploy Rancher

Rancher is an open-source, enterprise-grade container management platform that simplifies the operation of Kubernetes clusters across any environment. It provides a centralized dashboard for deploying, monitoring, and securing Kubernetes clusters, abstracting away the complexity of managing different cloud providers or on-premises infrastructure. By offering unified authentication, centralized security policies, and an integrated catalog of applications, Rancher enables teams to manage their entire containerized fleet from a single, intuitive interface.

Rancher requires secure TLS termination to manage downstream nodes. With the rancher-stable repository and cert-manager already in place from the previous step, install the Rancher chart. By default, the chart has cert-manager issue a certificate signed by a Rancher-generated CA:

# Install Rancher Management Dashboard
helm install rancher rancher-stable/rancher --namespace cattle-system --create-namespace \
  --set hostname=rancher.domain.com \
  --set bootstrapPassword='YourStrongPassword' \
  --set replicas=1

Give the pods a few minutes to initialize. Make sure rancher.domain.com resolves to the rancher-mgmt VM (192.168.1.11 in our example), since K3s’ bundled Traefik ingress serves Rancher on that node’s IP. Then navigate to https://rancher.domain.com to access your new multi-cluster control center, and complete the registration wizard to define your admin password and set the server URL to https://rancher.domain.com.
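If you script the installation, the sketch below waits for the Rancher Deployment and then prints the bootstrap password stored in the cattle-system/bootstrap-secret Secret, using the go-template command suggested in the Rancher chart’s install notes. The function name and the 300s timeout are illustrative assumptions.

```sh
#!/bin/sh
# Sketch: wait for Rancher, then print the bootstrap password from its Secret.
rancher_ready_and_password() {
    kubectl -n cattle-system rollout status deploy/rancher --timeout=300s || return 1
    kubectl get secret --namespace cattle-system bootstrap-secret \
        -o go-template='{{.data.bootstrapPassword|base64decode}}{{"\n"}}'
}
```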

5. Creating the VM for the Accelerated Workload Plane

The k3s-node-01 VM acts as our primary compute powerhouse. This VM must be provisioned according to our prerequisite post: Setup Nvidia GPU Passthrough for Ubuntu VMs on Proxmox 9.2. This ensures that the VM has a virtual PCIe bus mapping your physical graphics card, with the Nvidia drivers compiled directly against the guest kernel.

Our K3s deployment here must be tightly controlled. To construct an enterprise-grade stack, we disable three K3s defaults: the local-path storage provisioner, the Traefik ingress controller, and the built-in ServiceLB load balancer. Omitting these components prevents them from conflicting with the routing (MetalLB, NGINX) and storage (Longhorn) components we deploy in the following steps.

ssh user@k3s-node-01

5.1 Bootstrapping the K3s Workload Worker Node

We will install K3s on this node with custom flags, explicitly disabling the default Traefik ingress controller, the local-path storage provisioner, and ServiceLB (which would otherwise compete with MetalLB for Services of type LoadBalancer):

curl -sfL https://get.k3s.io | sh -s - \
  --disable traefik \
  --disable local-storage \
  --disable servicelb \
  --write-kubeconfig-mode 644

Verify the local control engine is active:

kubectl get nodes

5.2 Connecting the Workload Node to Rancher

Open your Rancher Web UI dashboard.
Navigate to Cluster Management -> Import Existing Cluster -> Generic.
Name the cluster Production-Cluster and click Create.
Copy the registration string (kubectl apply -f ...) presented on your screen.
Paste that exact command directly into your k3s-node-01 SSH terminal.

Within moments, you will see the node register inside the Rancher dashboard, moving smoothly from a status of Pending to a vibrant green Active flag.

Update Ubuntu’s Systemd DNS

Moving the Kubernetes host away from the systemd-resolved stub resolver (127.0.0.53) is essential because it removes a DNS resolution loop that prevents cluster workloads from reaching the public internet. When the host’s resolver configuration points at a loopback address, any component that inherits that configuration inside a pod ends up querying the pod’s own loopback interface, where nothing is listening. Lookups for external services such as plex.tv (used for Plex authentication and claiming) then fail with “Couldn’t resolve host” errors. Pointing the host directly at your upstream DNS server (your router or a public provider) gives CoreDNS a clean target to forward external queries to, so critical processes like the Plex claim handshake can resolve names reliably.

Configure the Host DNS (Ubuntu 24.04 / systemd-resolved)

Edit your configuration: sudo nano /etc/systemd/resolved.conf
In the [Resolve] section, set DNS=192.168.1.1 (or your router’s IP) and DNSStubListener=no.
Point /etc/resolv.conf at the non-stub file, because the default stub file keeps listing 127.0.0.53: sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf
Apply: sudo systemctl restart systemd-resolved.
Verify host resolution: Run nslookup plex.tv. It should now return a valid public IP.
Restart the K3s service to refresh the cluster’s awareness of the host environment: sudo systemctl restart k3s.
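To confirm the change stuck (for example after a reboot), the sketch below reports whether a resolv.conf still lists a loopback resolver such as 127.0.0.53, 127.0.0.1, or ::1. The helper name is hypothetical; it only reads the file.

```sh
#!/bin/sh
# Sketch: succeed if the given resolv.conf still points at a loopback resolver.
has_loopback_nameserver() {
    file="${1:-/etc/resolv.conf}"
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+(127\.|::1([[:space:]]|$))' "$file"
}
```

If has_loopback_nameserver /etc/resolv.conf still succeeds after the steps above, /etc/resolv.conf is most likely still the stub symlink.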

Whitelist K3s in pfSense DNS Resolver

If you are using pfSense as your router, you must tell pfSense that your Kubernetes cluster is a trusted network and is allowed to ask for internet IP addresses.

Log into your pfSense Admin UI.
Navigate to Services -> DNS Resolver.
Click on the Access Lists tab.
Click + Add to create a new Access List:
- Access List Name: K3s-Cluster (or whatever you prefer)
- Action: Allow
- Networks: Add your two K3s subnets here:
  - Network: 10.42.0.0, CIDR: 16 (This is your Pod network)
  - Network: 10.43.0.0, CIDR: 16 (This is your Service network)
Click Save, and then click Apply Changes at the top of the screen.

6. Constructing the Network Pipeline & Core Add-ons

Right now, your virtual machine can see the NVIDIA GPU via nvidia-smi, but Kubernetes is completely blind to it. We must configure our container runtime tools and deploy the resource plugin so K3s can schedule workloads onto the GPU’s CUDA cores. Then, we will bring up our enterprise networking stack.

6.1 Bootstrap Helm on the Workload Node

Kubernetes configuration files (YAMLs) can get incredibly complex. Helm is the official package manager for Kubernetes. It allows us to install, upgrade, and configure complex applications (like our ingress controllers and storage drivers) using pre-packaged templates called Charts, rather than writing thousands of lines of YAML manually.

Because the workload plane is an independent cluster, separate from our management VM, we need to install the Helm binary on k3s-node-01 as well. Then we make sure your user can read the K3s kubeconfig (already the case if you installed K3s with --write-kubeconfig-mode 644) and export KUBECONFIG permanently from ~/.bash_aliases:

# Download and install the binary package
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Grant local read permission and apply a permanent session environment alias
sudo chmod 644 /etc/rancher/k3s/k3s.yaml
echo "export KUBECONFIG=/etc/rancher/k3s/k3s.yaml" >> ~/.bash_aliases
source ~/.bashrc

Verify helm is installed and kicking:

# Get helm version
helm version

6.2 Install the Nvidia Container Toolkit

Before Kubernetes can schedule container workloads onto the GPU, we must install the NVIDIA Container Toolkit, which integrates with K3s’ embedded containerd. Run these commands inside your k3s-node-01 terminal. The sed filters add the signed-by keyring to each repository line and replace the $(ARCH) placeholder with amd64:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sed "s#\$(ARCH)#amd64#g" | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt update && sudo apt install -y nvidia-container-toolkit

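K3s detects the NVIDIA container runtime automatically when it starts. To make it the default runtime for all pods, set default-runtime in the K3s config file. Because list values passed on the command line override the same lists in the config file, keep the disable entries consistent with the flags you used at install time: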

sudo mkdir -p /etc/rancher/k3s
sudo tee /etc/rancher/k3s/config.yaml << 'EOF'
disable:
  - traefik
  - local-storage
  - servicelb
write-kubeconfig-mode: "644"
default-runtime: "nvidia"
EOF

# Restart K3s to apply the changes
sudo systemctl restart k3s

Verify k3s is using nvidia runtime by default:

sudo crictl info | grep "defaultRuntime"

You should see something like "defaultRuntimeName": "nvidia" in the output.
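For a scripted yes/no answer instead of eyeballing grep output, the sketch below checks crictl info JSON on stdin for the default runtime name. It deliberately uses grep instead of jq; the helper name is hypothetical, and the key name matches the output shown above.

```sh
#!/bin/sh
# Sketch: succeed if `crictl info` JSON (on stdin) names nvidia as the
# default containerd runtime.
default_runtime_is_nvidia() {
    grep -Eq '"defaultRuntimeName":[[:space:]]*"nvidia"'
}
```

Usage: sudo crictl info | default_runtime_is_nvidia && echo "nvidia is the default runtime"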

6.3 Installing NVIDIA Device Plugin

Out of the box, Kubernetes has no idea your physical GPU exists. To allow containerized workloads to run code on the GPU, we need two components: the NVIDIA Container Toolkit (which lets containerd expose the GPU devices and driver libraries inside containers) and the NVIDIA Device Plugin (a Kubernetes DaemonSet that detects the GPUs on each node and advertises them to the scheduler as the nvidia.com/gpu resource).

If you deploy the vanilla NVIDIA Device Plugin chart right now, you will likely notice that its DaemonSet reports DESIRED: 0 pods. This is usually not caused by a control-plane taint, because K3s keeps server nodes schedulable by default. The more likely cause is the chart’s default node affinity, which only targets nodes labeled as carrying an NVIDIA GPU (for example feature.node.kubernetes.io/pci-10de.present=true, a label normally applied by Node Feature Discovery).

Enabling GPU Feature Discovery (GFD) through Helm solves this: the chart then also deploys Node Feature Discovery, which scans the hardware and applies the required labels for us, so both the core plugin and GFD land on our node without any manual labeling. The toleration below is a defensive extra that keeps both running if you ever add a NoSchedule taint to the node:

# Add the Nvidia Helm repository track
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update

# Deploy with global tolerations and automated feature discovery enabled
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace kube-system \
  --set-json 'tolerations=[{"operator":"Exists","effect":"NoSchedule"}]' \
  --set gfd.enabled=true

Verify the nvdp pods are up and running:

kubectl get pods -n kube-system -l app.kubernetes.io/name=nvidia-device-plugin

Verify that your cluster successfully tracks your hardware acceleration capacity:

kubectl describe node | grep -E "Allocatable|nvidia.com/gpu"

If successful, the console will print nvidia.com/gpu: 1. The cluster now holds full native scheduling rights over your NVIDIA GPU.
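A quick end-to-end test is a one-shot pod that requests one GPU and runs nvidia-smi. Because nvidia is now K3s’ default runtime, the manifest needs no runtimeClassName. The CUDA image tag is an assumption (pick one whose CUDA version your driver supports), and the write_gpu_smoke_test helper is hypothetical.

```sh
#!/bin/sh
# Sketch: write a one-shot pod manifest that requests the GPU and runs
# nvidia-smi. The image tag below is an assumption; adjust it to your driver.
write_gpu_smoke_test() {
    cat > "${1:-gpu-smoke-test.yaml}" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
}
```

Run write_gpu_smoke_test && kubectl apply -f gpu-smoke-test.yaml; once the pod completes, kubectl logs gpu-smoke-test should print the familiar nvidia-smi table. Remove it afterwards with kubectl delete pod gpu-smoke-test.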

6.4 Setting Up Bare-Metal Load Balancing (MetalLB)

When you request a service of type: LoadBalancer in a cloud provider like AWS, it automatically provisions a physical load balancer for you. On a bare-metal homelab, that request will just sit in a ‘Pending’ state forever. MetalLB bridges this gap. It monitors the cluster for load balancer requests and dynamically assigns them a real, routable IP address from a designated pool on your local network using Layer 2 ARP announcements.

Let’s install MetalLB to act as our local automated network load balancer:

# 1. Add the official MetalLB Helm repository channel
helm repo add metallb https://metallb.github.io/metallb
helm repo update

# 2. Deploy the MetalLB controller and speaker components
helm install metallb metallb/metallb --namespace metallb-system --create-namespace

Wait until all MetalLB controller and speaker pods are running:

kubectl get pods -n metallb-system

Next, we assign a dedicated slice of unused IP addresses from our home network pool. Create a file named metallb-config.yaml (make sure this range is completely outside your local router’s DHCP pool to prevent collisions):

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: local-ip-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.201-192.168.1.210  # Unused addresses outside your DHCP range
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: local-l2-advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
    - local-ip-pool

Apply the networking policy configuration directly:

kubectl apply -f metallb-config.yaml

Verify the configured IPs in the MetalLB pools:

kubectl get ipaddresspools -n metallb-system

You should see something like

NAME            AUTO ASSIGN   AVOID BUGGY IPS   ADDRESSES
local-ip-pool   true          false             ["192.168.1.201-192.168.1.210"]
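Since the pool must stay outside your router’s DHCP range, a small overlap check can catch mistakes before MetalLB hands out a duplicate address. The sketch converts dotted IPv4 addresses to integers and compares the two ranges; the helper names are hypothetical, and the DHCP range in the usage example is a placeholder for whatever your router actually uses.

```sh
#!/bin/sh
# Sketch: detect overlap between the MetalLB pool and the DHCP range.

# Convert a dotted-quad IPv4 address into a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ranges_overlap START1 END1 START2 END2: succeed if the ranges share an IP.
ranges_overlap() {
    a1=$(ip_to_int "$1"); a2=$(ip_to_int "$2")
    b1=$(ip_to_int "$3"); b2=$(ip_to_int "$4")
    [ "$a1" -le "$b2" ] && [ "$b1" -le "$a2" ]
}
```

Usage: ranges_overlap 192.168.1.201 192.168.1.210 192.168.1.100 192.168.1.199 && echo "pool overlaps DHCP range"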

6.5 Deploying NGINX Ingress Controller (Cloud-Native Mode)

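Installing the ingress-nginx chart with its Service type set to LoadBalancer creates the Service that receives traffic from your local network; MetalLB assigns it an IP from the pool we just defined.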

# Add the official repository
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

# Install the controller and request a LoadBalancer IP
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.service.type=LoadBalancer

6.5.1 Verify the LoadBalancer IP

After the installation, run the following command until you see an IP address (e.g., 192.168.1.201) appear under EXTERNAL-IP:

kubectl get svc -n ingress-nginx ingress-nginx-controller -w
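kubectl get svc -w keeps watching until you interrupt it, which is awkward in scripts. The sketch below polls the Service’s status.loadBalancer.ingress[0].ip field until MetalLB fills it in, then prints the IP. The helper name, the default attempt count, and the POLL_INTERVAL variable are assumptions.

```sh
#!/bin/sh
# Sketch: poll a Service until it has an external IP, then print it.
wait_for_lb_ip() {
    ns="$1"; svc="$2"; tries="${3:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        ip=$(kubectl -n "$ns" get svc "$svc" \
            -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
        if [ -n "$ip" ]; then
            echo "$ip"
            return 0
        fi
        i=$((i + 1))
        sleep "${POLL_INTERVAL:-5}"
    done
    echo "no external IP for $ns/$svc after $tries attempts" >&2
    return 1
}
```

INGRESS_IP=$(wait_for_lb_ip ingress-nginx ingress-nginx-controller) captures the address for the DNS step later on.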

6.6 Deploying Certificate Manager

Cert-Manager is an automated certificate management controller. It talks to certificate authorities like Let’s Encrypt to request SSL/TLS certificates for your applications and automatically renews them before they expire.

In a private homelab with split-horizon DNS, your local resolver often answers for your domain differently from the public internet (for example, because of a local wildcard override), so during ACME DNS-01 challenges cert-manager’s self-check may never see the TXT record it just created and the validation stalls. To avoid this, we tell cert-manager to run its DNS-01 self-checks only against public recursive resolvers (Google at 8.8.8.8 and Cloudflare at 1.1.1.1), bypassing local DNS entirely.

# Add the Jetstack repository for Cert-Manager
helm repo add jetstack https://charts.jetstack.io
helm repo update

# Install Cert-Manager with strict recursive DNS parameters enabled.
# Both flags go into one list: repeating --set 'extraArgs={...}' replaces the
# list and keeps only the last flag. The comma inside the first value is escaped.
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set crds.enabled=true \
  --set 'extraArgs={--dns01-recursive-nameservers=8.8.8.8:53\,1.1.1.1:53,--dns01-recursive-nameservers-only=true}'

Verify that the Cert-Manager pods are healthy:

kubectl get pods -n cert-manager
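Because a flag silently missing from extraArgs is easy to overlook, the sketch below reads the container args with the same jsonpath query used earlier in this guide and checks each DNS-01 flag individually. The helper name is hypothetical.

```sh
#!/bin/sh
# Sketch: confirm both DNS-01 flags reached the cert-manager container.
check_cert_manager_dns_args() {
    args=$(kubectl get deployment cert-manager -n cert-manager \
        -o jsonpath='{.spec.template.spec.containers[0].args}')
    for flag in '--dns01-recursive-nameservers=8.8.8.8:53,1.1.1.1:53' \
                '--dns01-recursive-nameservers-only=true'; do
        case "$args" in
            *"$flag"*) echo "found: $flag" ;;
            *) echo "missing: $flag" >&2; return 1 ;;
        esac
    done
}
```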

For this tutorial, we will use Cloudflare to manage our DNS-01 challenge. To allow Cert-Manager to automatically create temporary DNS records for validation, you need a Cloudflare API token.

Generate your Cloudflare API Token:

Log into your Cloudflare Dashboard.
Go to My Profile -> API Tokens and click Create Token.
Select Create Custom Token.
Under Permissions, select Zone -> DNS -> Edit (cert-manager’s documentation also recommends adding Zone -> Zone -> Read).
Under Zone Resources, select Include -> Specific Zone -> Select your domain (e.g., yourdomain.com).
Click Continue to summary and Create Token. Save this token securely.

Create a secure Kubernetes Secret to store your restricted Cloudflare API Token:

kubectl create secret generic cloudflare-api-token-secret \
  --namespace cert-manager \
  --from-literal=api-token="YOUR_CLOUDFLARE_API_TOKEN_HERE"

Create your cluster issuer registration manifest. Create letsencrypt-dns-production.yaml:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token-secret
              key: api-token

Apply the file to register the automated ACME handshake issuer:

kubectl apply -f letsencrypt-dns-production.yaml

Verify if your ClusterIssuer is ready:

kubectl get clusterissuer letsencrypt-prod -o wide

Look at the output under the READY column. If it says True, your Cert-Manager is successfully communicating with Let’s Encrypt and is ready to issue certificates.
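With the issuer ready, you can request a wildcard certificate that Ingress resources in the same namespace can reference. The sketch writes a cert-manager Certificate covering both the apex and the wildcard name; the default namespace, the wildcard-<name>-tls Secret name, and the helper name are assumptions.

```sh
#!/bin/sh
# Sketch: write a wildcard Certificate issued by the letsencrypt-prod
# ClusterIssuer. Namespace and secret naming are placeholders.
write_wildcard_certificate() {
    domain="$1"; out="${2:-wildcard-certificate.yaml}"
    cat > "$out" <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-${domain%%.*}
  namespace: default
spec:
  secretName: wildcard-${domain%%.*}-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - "${domain}"
    - "*.${domain}"
EOF
}
```

Run write_wildcard_certificate domain.com && kubectl apply -f wildcard-certificate.yaml; kubectl get certificate -n default should eventually show READY True. The resulting Secret lives in a single namespace, so Ingresses elsewhere need their own Certificate.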

7. Storage Provisioning (The Hybrid Storage Plane)

To give our containerized apps flexible data persistence, we will implement a hybrid storage plane. We will use Longhorn backed by a newly initialized Proxmox ZFS pool for high-performance block storage, and a Synology NAS via NFS for shared, multi-pod asset storage.

7.1 High-Performance Block Storage (Proxmox ZFS to Longhorn)

For database logs and container states that require low latency (ReadWriteOnce), we want to back our storage using a dedicated ZFS block volume on the host. If your Proxmox server does not have an active ZFS storage pool initialized yet, we will locate an empty disk, clean it, and build our pool fresh.

Step A: Provision the ZFS Pool on the Proxmox Host (`pve2`)

Open a separate terminal window on your physical Proxmox host (pve2) and execute the storage inventory sweep:

# 1. Inspect physical storage topography to locate your unassigned 1TB disk
lsblk

Locate your empty target disk in the output (for this guide, we are targeting /dev/sdb, which offers ~953.9G of raw block space). The next two commands destroy everything on that disk, so double-check the device name; a stable /dev/disk/by-id/ path is safer than /dev/sdb. Clear out any legacy filesystem signatures and create a fresh pool named local-zfs:

# 2. Wipe old filesystem/RAID signatures so ZFS can claim the disk
sudo wipefs -a /dev/sdb

# 3. Initialize the new high-performance ZFS storage pool
sudo zpool create -f local-zfs /dev/sdb

# 4. Verify the pool is active, online, and healthy
sudo zpool status local-zfs

Step B: Carve out the Virtual Block Volume

With our pool initialized on the host, we will carve out an 850GB ZFS volume (zvol). Leaving roughly 10-15% of the raw disk space unallocated gives ZFS the free-space headroom it needs to maintain good write performance:

# 1. Generate an 850GB ZFS Block Volume dataset
zfs create -V 850G local-zfs/k3s-persistent-storage

# 2. Attach the zvol to our Workload VM (ID 100) as SCSI disk 1
qm set 100 -scsi1 /dev/zvol/local-zfs/k3s-persistent-storage

Step C: Format and Mount inside the Workload VM (`k3s-node-01`)

Open a separate terminal window and SSH into your k3s-node-01 VM. The guest kernel should detect the new disk right away, typically as /dev/sdb; confirm the name with lsblk before formatting, since it depends on how many disks the VM already has.

We will format the drive with XFS, an enterprise-standard filesystem. Compared to ext4, XFS splits the disk into independent allocation groups that handle heavily parallel read/write workloads across our 16 CPU cores well, and it allocates inodes dynamically, so it does not run into ext4’s fixed inode limits:

# 1. Install storage formatting libraries
sudo apt install -y xfsprogs

# 2. Create an XFS filesystem on the new disk
sudo mkfs.xfs /dev/sdb

# 3. Create the permanent mounting target directory
sudo mkdir -p /var/lib/longhorn

# 4. Mount the drive to the initialization target path
sudo mount /dev/sdb /var/lib/longhorn

# 5. Persist the mount in fstab by filesystem UUID, since /dev/sdX names can
#    change between boots; nofail keeps the VM bootable if the disk is missing
echo "UUID=$(sudo blkid -s UUID -o value /dev/sdb) /var/lib/longhorn xfs defaults,nofail 0 0" | sudo tee -a /etc/fstab

Verify that your drive is mounted correctly:

df -h | grep longhorn
# Expect to see: /dev/sdb mounted to /var/lib/longhorn

Step D: Deploy Longhorn Storage Engine

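Now that the mount is online, deploy Longhorn via Helm. Longhorn stores its data under /var/lib/longhorn by default, so it will use the new disk automatically. Longhorn also requires the open-iscsi package and the iscsid service on every node, and because this is a single-node cluster we lower the default StorageClass replica count from 3 to 1 so volumes do not stay degraded:

sudo apt install -y open-iscsi
sudo systemctl enable --now iscsid

helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace \
  --set persistence.defaultClassReplicaCount=1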
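To check that Longhorn provisions volumes on the new disk, you can create a ReadWriteOnce claim against the longhorn StorageClass that the chart creates. The claim name, namespace, and size are placeholders, and the write_longhorn_pvc helper is hypothetical.

```sh
#!/bin/sh
# Sketch: write a ReadWriteOnce PVC manifest for the "longhorn" StorageClass.
write_longhorn_pvc() {
    name="$1"; size="$2"; out="${3:-$1-pvc.yaml}"
    cat > "$out" <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: ${size}
EOF
}
```

Run write_longhorn_pvc postgres-data 20Gi && kubectl apply -f postgres-data-pvc.yaml; kubectl get pvc postgres-data should report Bound shortly afterwards (or once a pod uses it, if your StorageClass uses WaitForFirstConsumer).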

7.2 Shared Network Storage (Synology NFS Provisioner)

For workloads where multiple pods need to read and write to the exact same directories simultaneously (ReadWriteMany), we will connect K3s to a Synology NAS NFS export.

First, install the NFS client utilities on your k3s-node-01 VM so the node can mount NFS exports:

sudo apt update && sudo apt install -y nfs-common

Now, implement the storage strategy that fits your workload targets:

Strategy A: Dynamic Subfolder Provisioning

Best used for automated, encapsulated application data paths where Kubernetes manages isolated sub-folders:

# Add the NFS external provisioner repository
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner
helm repo update

# Deploy the provisioner, mapping it to your NAS IP and Shared Folder
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --namespace storage-system --create-namespace \
  --set nfs.server=192.168.1.50 \
  --set nfs.path=/volume1/k3s-nfs-share \
  --set storageClass.name=network-nfs \
  --set storageClass.defaultClass=false

To use the nfs-subdir-external-provisioner, you only need a PersistentVolumeClaim (PVC) that references the correct storageClassName. Unlike static PVs, you do not need to define the PersistentVolume (PV) manually; the provisioner will create it for you automatically. This is an example:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-dynamic-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  # This class name must match the one defined in your 
  # nfs-subdir-external-provisioner configuration
  storageClassName: network-nfs 
  resources:
    requests:
      storage: 10Gi

Strategy B: Static Root Volume Mapping

Best used for mapping top-level media pools or raw download endpoints directly, bypassing subfolders so your NAS apps (like Plex) can see them instantly. Create static-nfs-root.yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: network-multimedia-root-pv
spec:
  capacity:
    storage: 5Ti
  volumeMode: Filesystem
  accessModes: [ReadWriteMany]
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.1.50
    path: /volume1/k3s-nfs-share
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: network-multimedia-root-pvc
spec:
  accessModes: [ReadWriteMany]
  storageClassName: ""
  volumeName: network-multimedia-root-pv
  resources:
    requests:
      storage: 5Ti

Apply it to the cluster:

kubectl apply -f static-nfs-root.yaml

(Swap out 192.168.1.50 and /volume1/k3s-nfs-share with your actual Synology management IP and volume path).
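Because Strategy B relies on volumeName rather than a provisioner, a quick way to confirm the static binding is to check that the claim reports Bound and names the expected PV. In the sketch below, pvc_bound_to is a hypothetical helper. The claim above has no namespace, so it lands in your current namespace, which is assumed here to be default:

```shell
# Hypothetical helper: succeed only if the PVC is Bound to the named PV.
pvc_bound_to() {
  local pvc="$1" pv="$2" ns="${3:-default}" out
  out=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.status.phase} {.spec.volumeName}' 2>/dev/null)
  [ "$out" = "Bound $pv" ]
}

# Usage:
#   pvc_bound_to network-multimedia-root-pvc network-multimedia-root-pv && echo "static binding OK"
```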

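Here is the same static pattern with inline comments, this time pointing at a hostname (synology.domain.com) and a different shared folder (/volume1/k3s-data). A static NFS mount needs two objects: a PersistentVolume, which defines the actual NFS server and path, and a PersistentVolumeClaim, which binds to that volume by name.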

apiVersion: v1
kind: PersistentVolume
metadata:
  name: synology-nfs-static-pv
spec:
  capacity:
    storage: 5Ti
  volumeMode: Filesystem
  accessModes: [ReadWriteMany]
  persistentVolumeReclaimPolicy: Retain
  # storageClassName must be empty ("") to force static binding
  storageClassName: ""
  nfs:
    server: synology.domain.com
    path: /volume1/k3s-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-static-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  # volumeName tells Kubernetes to bind specifically to the PV defined above
  volumeName: synology-nfs-static-pv
  resources:
    requests:
      storage: 5Ti
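If a pod using a static NFS claim hangs in ContainerCreating, a common cause is the export itself, such as a typo in the path. The sketch below is a pre-flight check to run on the K3s node. It assumes showmount from the nfs-common package is installed and that the NAS answers mount-protocol queries (NFSv4-only setups may not). nfs_export_visible is a hypothetical helper:

```shell
# Hypothetical helper: succeed if the NFS server advertises exactly this export path.
# showmount -e prints a header line, then one "<path> <clients>" line per export.
nfs_export_visible() {
  local server="$1" path="$2"
  showmount -e "$server" 2>/dev/null | awk 'NR > 1 {print $1}' | grep -Fxq "$path"
}

# Usage:
#   nfs_export_visible synology.domain.com /volume1/k3s-data && echo "export found"
```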

8. Local DNS Routing

Because this infrastructure operates on your private subnet, hostnames like app.domain.com will not resolve to your internal ingress on their own. Unless we map these names internally, your browser will query public DNS, which either has no record for the name or returns a public address, so the request never reaches your private IP.

We must map our wildcard domain zone directly to our local NGINX Ingress controller IP (e.g., 192.168.1.201) assigned by MetalLB.

Execution Options

Option A: Central DNS Server Override (Pi-hole / pfSense / Unbound)

If you manage a centralized local network DNS server, define a wildcard rewrite rule:

*.domain.com -> 192.168.1.201
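To confirm the wildcard works, resolve a random subdomain that you never configured explicitly. If it comes back with the ingress IP, the wildcard rule is in effect. The sketch below is for Linux clients and assumes getent (part of glibc) is available. macOS does not ship getent, so use dig or nslookup there instead. resolve_v4 and resolves_to are hypothetical helpers:

```shell
# Hypothetical helper: print the first IPv4 address the system resolver returns.
resolve_v4() {
  getent ahostsv4 "$1" 2>/dev/null | awk 'NR == 1 {print $1}'
}

# Succeed if a hostname resolves to the expected IP.
resolves_to() {
  [ "$(resolve_v4 "$1")" = "$2" ]
}

# Usage: a random label proves the wildcard, not just a single record.
#   resolves_to "probe-$(date +%s).domain.com" 192.168.1.201 && echo "wildcard OK"
```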

Option B: Desktop System Hosts File Mapping

If you do not have a local DNS server, you can configure your testing system to point directly to the ingress IP:

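Linux / macOS (/etc/hosts): 192.168.1.201 app.domain.com
Windows (C:\Windows\System32\drivers\etc\hosts): 192.168.1.201 app.domain.com

Hosts files do not support wildcards, so add one line per hostname you want to reach (for example, test.domain.com for the validation app in the next section).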
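When you map several hostnames this way, a small idempotent helper avoids duplicate lines if you rerun your setup. The sketch below assumes a POSIX shell with awk on Linux or macOS. add_hosts_entry is a hypothetical name, and writing to /etc/hosts requires root (for example, run it from a sudo -s shell):

```shell
# Hypothetical helper: append "IP HOST" to a hosts file unless that mapping exists.
# Defaults to /etc/hosts; pass a scratch file as the third argument to try it safely.
add_hosts_entry() {
  local ip="$1" host="$2" file="${3:-/etc/hosts}"
  # Look for an uncommented line whose first field is the IP and that lists the host
  if [ -f "$file" ] && awk -v ip="$ip" -v host="$host" '
      $1 == ip { for (i = 2; i <= NF; i++) { if ($i ~ /^#/) break; if ($i == host) found = 1 } }
      END { exit !found }' "$file"; then
    echo "already present: $ip $host"
    return 0
  fi
  printf '%s %s\n' "$ip" "$host" >> "$file"
}

# Usage (as root):
#   add_hosts_entry 192.168.1.201 app.domain.com
#   add_hosts_entry 192.168.1.201 test.domain.com
```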

9. Putting It All Together and Validating the Cluster

The goal of our validation suite is to test the reliability of our hardware acceleration, network routing, and dynamic storage configurations under load.

Rather than testing these in isolation, we will deploy a multi-container pod architecture:

A high-performance CUDA container is granted access to your passed-through GPU. It runs a continuous monitoring loop and writes the current hardware status to a shared storage directory.
A lightweight NGINX container mounts the exact same shared volume. It reads the exported status file and hosts it as a live web page.
We back these containers with a dynamic Longhorn Storage Class volume to confirm our block-storage layer is fully operational.
In parallel, a second dynamic PVC backed by NFS (through the NFS subdir external provisioner) receives a heartbeat line every 10 seconds. This confirms that both the fast block-storage tier and the high-capacity NFS tier are writable from the same pod.
Lastly, the pod also mounts the static NFS root volume. Nothing is written to it; it is included only as an example of how to consume a static PV in your own deployments.

Create a multi-resource manifest named cluster-validation.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-longhorn-block-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-network-nfs-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: network-nfs
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: synology-nfs-root-pv
spec:
  capacity:
    storage: 15Ti
  volumeMode: Filesystem
  accessModes: [ReadWriteMany]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  nfs:
    server: synology.domain.com
    path: /volume1/k3s-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: validation-nfs-static-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: synology-nfs-root-pv
  resources:
    requests:
      storage: 5Ti
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: validation-web-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: validation-web
  template:
    metadata:
      labels:
        app: validation-web
    spec:
      containers:
        # Container 1: High-Performance Web Server Front-End
        - name: web-server
          image: "nginx:alpine"
          ports:
            - containerPort: 80
          volumeMounts:
            - name: html-mount
              mountPath: /usr/share/nginx/html
        # Container 2: Dedicated CUDA Silicon Pass-Through Tester
        - name: cuda-tester
          image: "nvidia/cuda:12.2.0-runtime-ubuntu22.04"
          command: ["/bin/bash", "-c"]
          args:
            - |
              echo "== starting hardware cluster diagnostics =="
              PAGE=/usr/share/nginx/html/index.html
              while true; do
                # Build the page in a temp file, then rename it into place so
                # NGINX never serves a half-written page
                {
                  echo "<html><head><meta http-equiv='refresh' content='10'></head><body style='font-family:monospace;background:#111;color:#eee;padding:20px;'>"
                  echo "<h2>Cluster Validation Status: ACTIVE</h2>"
                  echo "<h3>[$(date)] Live GPU Metrics:</h3><pre>"
                  nvidia-smi 2>&1
                  echo "</pre></body></html>"
                } > "$PAGE.tmp"
                mv "$PAGE.tmp" "$PAGE"

                # Append write heartbeat test straight to the NFS mount path
                echo "[$(date)] NAS Write Operation Successful" >> /output/network-nfs-test/heartbeat.log
                sleep 10
              done
          resources:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: html-mount
              mountPath: /usr/share/nginx/html
            - name: network-mount
              mountPath: /output/network-nfs-test
            - name: nfs-static-mount
              mountPath: /output/network-nfs-static
      volumes:
        - name: html-mount
          persistentVolumeClaim: {claimName: test-longhorn-block-pvc}
        - name: network-mount
          persistentVolumeClaim: {claimName: test-network-nfs-pvc}
        - name: nfs-static-mount
          persistentVolumeClaim: {claimName: validation-nfs-static-pvc}
---
apiVersion: v1
kind: Service
metadata:
  name: validation-web-service
  namespace: default
spec:
  ports:
    - port: 80
      targetPort: 80
  selector:
    app: validation-web
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: validation-web-ingress
  namespace: default
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - "test.domain.com"
      secretName: validation-web-tls-certs
  rules:
    - host: test.domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: validation-web-service
                port: {number: 80}

Apply the manifest with kubectl:

kubectl apply -f cluster-validation.yaml
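The CUDA runtime image is large, and volume attachment can take a while, so it helps to wait for the rollout before running the diagnostics. The sketch below uses kubectl rollout status. The 180-second default timeout is an arbitrary choice, and wait_for_validation_app is a hypothetical helper:

```shell
# Hypothetical helper: wait for the validation Deployment, and dump recent
# namespace events if it does not become ready in time.
wait_for_validation_app() {
  local ns="${1:-default}" timeout="${2:-180s}"
  if ! kubectl rollout status deployment/validation-web-app -n "$ns" --timeout="$timeout"; then
    echo "Rollout did not complete; most recent events:" >&2
    kubectl get events -n "$ns" --sort-by=.lastTimestamp | tail -n 20 >&2
    return 1
  fi
}

# Usage:
#   wait_for_validation_app default 300s
```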

10. Running the Quality Assurance Diagnostics

Before opening the site in your browser, verify that your Kubernetes components are healthy:

1. Verify Storage Allocation

kubectl get pvc test-longhorn-block-pvc
# Status should be: Bound

kubectl get pvc test-network-nfs-pvc
# Status should be: Bound

kubectl get pvc validation-nfs-static-pvc
# Status should be: Bound
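To check every claim in cluster-validation.yaml in one pass, you can loop over them. The sketch below prints each claim's phase and fails if any claim is not Bound. check_validation_pvcs is a hypothetical helper, and the claim names are taken from the manifest above:

```shell
# Hypothetical helper: report the phase of every claim used by cluster-validation.yaml
# and fail if any of them is not Bound.
check_validation_pvcs() {
  local ns="${1:-default}" pvc phase failed=0
  for pvc in test-longhorn-block-pvc test-network-nfs-pvc validation-nfs-static-pvc; do
    phase=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null)
    printf '%-28s %s\n' "$pvc" "${phase:-missing}"
    [ "$phase" = "Bound" ] || failed=1
  done
  return "$failed"
}
```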

2. Inspect Ingress and Certificate Status

kubectl get ingress validation-web-ingress
# Check that the address is successfully mapped to: 192.168.1.201

kubectl get certificate validation-web-tls-certs
# Expected output: READY -> True
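Certificate issuance can take a minute or two. kubectl wait --for=condition=Ready certificate/validation-web-tls-certs --timeout=300s blocks until cert-manager marks it Ready. If you prefer a single pass/fail check, the sketch below reads both values with JSONPath. The 192.168.1.201 default is the example MetalLB address used in this guide, and check_ingress_and_cert is a hypothetical helper:

```shell
# Hypothetical helper: check that the Ingress got the MetalLB address and that
# cert-manager marked the Certificate Ready.
check_ingress_and_cert() {
  local ns="${1:-default}" want_ip="${2:-192.168.1.201}" ip ready
  ip=$(kubectl get ingress validation-web-ingress -n "$ns" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
  ready=$(kubectl get certificate validation-web-tls-certs -n "$ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)
  echo "ingress IP: ${ip:-none}, certificate Ready: ${ready:-unknown}"
  [ "$ip" = "$want_ip" ] && [ "$ready" = "True" ]
}
```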

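3. Check GPU Access

The diagnostic loop writes the nvidia-smi output into the web page rather than to the container log, so the log should only contain the startup banner. Run nvidia-smi directly to confirm the GPU is reachable:

kubectl logs -l app=validation-web -c cuda-tester --tail=50
# Expected: "== starting hardware cluster diagnostics ==" and no errors

kubectl exec deploy/validation-web-app -c cuda-tester -- nvidia-smi
# Confirm the GPU table is printed instead of an error about communicating with the NVIDIA driver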
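If the pod stays Pending instead, the scheduler may not see any GPU on the node. The NVIDIA device plugin advertises GPUs as the nvidia.com/gpu extended resource in each node's allocatable list, and the sketch below reads that value. node_has_gpu is a hypothetical helper, and k3s-node-01 is the workload node name used in this guide:

```shell
# Hypothetical helper: succeed if the node advertises at least one nvidia.com/gpu.
# The backslash escapes the dot inside the resource name for kubectl's JSONPath.
node_has_gpu() {
  local node="$1" count
  count=$(kubectl get node "$node" -o jsonpath='{.status.allocatable.nvidia\.com/gpu}' 2>/dev/null)
  case "$count" in
    ''|*[!0-9]*) return 1 ;;   # missing or non-numeric means no GPU advertised
  esac
  [ "$count" -ge 1 ]
}

# Usage:
#   node_has_gpu k3s-node-01 && echo "GPU advertised to the scheduler"
```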

4. Check Network NFS Heartbeat Logging

Read the heartbeat file from inside the cuda-tester container:

kubectl exec -it $(kubectl get pods -l app=validation-web -o jsonpath='{.items[0].metadata.name}') -c cuda-tester -- tail -n 10 /output/network-nfs-test/heartbeat.log

Now, open your browser and navigate to:

https://test.domain.com

Verify SSL: Check the browser address bar. You should see a padlock icon confirming a valid, trusted certificate for test.domain.com issued by Let’s Encrypt.
Verify Storage & Hardware Passthrough: The page should display the nvidia-smi table, refreshing every 10 seconds. The page is written by the cuda-tester container but served by the separate web-server container, so seeing it render confirms that both containers are reading and writing the same Longhorn volume.
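The same check can be scripted from any machine on the LAN. The sketch below assumes curl is installed. --resolve pins test.domain.com to the ingress IP for this one request, so it works even before local DNS is in place. Certificate verification stays on (no -k), so an untrusted or mismatched certificate also makes the check fail. check_validation_page is a hypothetical helper:

```shell
# Hypothetical helper: fetch the validation page through the ingress and look
# for the status banner written by the cuda-tester container.
check_validation_page() {
  local host="${1:-test.domain.com}" ip="${2:-192.168.1.201}"
  curl -fsS --max-time 10 --resolve "${host}:443:${ip}" "https://${host}/" \
    | grep -q "Cluster Validation Status: ACTIVE"
}

# Usage:
#   check_validation_page && echo "ingress, TLS and shared volume look healthy"
```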

Your split-plane control cluster is now fully configured, secure, accelerated, and operational.

The Lab Medic: Troubleshooting Common Gotchas

The “Disk Pressure” Pod Eviction Taint

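Symptom: The node reports a DiskPressure condition and receives a disk-pressure taint, so new pods are not scheduled there and your application pods are evicted.
Cause: Standard Ubuntu Cloud Images ship with a very small root partition. As container images and logs fill it, the kubelet embedded in K3s crosses its disk eviction threshold and starts evicting pods to reclaim space.
The Fix: First enlarge the virtual disk in Proxmox (Hardware > Hard Disk > Disk Action > Resize in the web UI, or for example qm resize <vmid> scsi0 +50G on the host, adjusting the VM ID, disk name, and size). Then grow the partition and filesystem from inside the VM. The commands below assume a SCSI disk that appears as /dev/sda, with the root filesystem on partition 1: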

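# Ask the kernel to re-read the (now larger) virtual disk size
echo 1 | sudo tee /sys/class/block/sda/device/rescan
# Grow partition 1 (the root partition) into the new free space
sudo growpart /dev/sda 1
# Make sure the kernel sees the updated partition table
sudo partprobe /dev/sda
# Grow the ext4 filesystem to fill the enlarged partition
sudo resize2fs /dev/sda1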
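To spot the problem before the kubelet does, you can watch filesystem usage on the node. The sketch below uses POSIX df -P. The 85% default threshold is an arbitrary value chosen for this example, not the kubelet's eviction setting, and disk_usage_pct and warn_if_disk_full are hypothetical helpers:

```shell
# Hypothetical helper: print how full (percent, no % sign) the filesystem holding a path is.
# df -P guarantees one data line per filesystem, with capacity in the fifth column.
disk_usage_pct() {
  df -P "${1:-/}" | awk 'NR == 2 {sub("%", "", $5); print $5}'
}

# Warn and fail when usage reaches the threshold (default 85, an arbitrary choice).
warn_if_disk_full() {
  local path="${1:-/}" limit="${2:-85}" pct
  pct=$(disk_usage_pct "$path")
  if [ "$pct" -ge "$limit" ]; then
    echo "WARNING: $path is ${pct}% full (threshold ${limit}%)" >&2
    return 1
  fi
  echo "$path is ${pct}% full"
}

# Usage on the K3s node (/var/lib/rancher holds K3s images and container data):
#   warn_if_disk_full /var/lib/rancher
```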

The Let’s Encrypt DNS Challenge Hang

Symptom: kubectl get certificate shows READY -> False, and the certificate stays stuck in validation.
Cause: Your local router (or another internal resolver) often answers cert-manager's DNS propagation self-check from its own records or cache. Those do not yet contain the _acme-challenge TXT record created through the Cloudflare API, so the self-check never passes.
The Fix: Ensure your cert-manager instance was installed using --set 'extraArgs={--dns01-recursive-nameservers-only=true}' together with the recursive nameserver settings detailed in Step 6.5, so the self-check queries public resolvers directly.
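While the certificate is stuck, you can check whether the TXT record is visible to a public resolver rather than to your router. Keep in mind that the record only exists while a challenge is pending. kubectl get challenges -A lists the cert-manager challenges that are still open. The sketch below assumes dig is installed (dnsutils / bind9-dnsutils on Ubuntu) and uses 1.1.1.1 as an example public resolver. acme_txt_visible is a hypothetical helper:

```shell
# Hypothetical helper: succeed if a public resolver already returns the ACME
# TXT record for a name (for example test.domain.com).
acme_txt_visible() {
  local name="$1" resolver="${2:-1.1.1.1}"
  [ -n "$(dig +short TXT "_acme-challenge.${name}" "@${resolver}")" ]
}

# Usage:
#   acme_txt_visible test.domain.com || echo "TXT record not visible to the public resolver yet"
```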

Summary: Your Proxmox Compute Power is Unlocked

Your technical foundation is now complete. You have a split-plane virtualized Kubernetes environment running on Proxmox VE that can schedule demanding containerized workloads directly onto dedicated GPU hardware, all managed from Rancher's interface. Happy hacking!