
GPU Passthrough

Intel iGPU hardware transcoding is available on worker-0a, a VM on the Proxmox host pve1 (Intel Core i7-1195G7). This page documents how GPU access flows from bare metal into Kubernetes pods.

Architecture

GPU access is a 5-layer stack from hardware to application:

graph TD
    A["Proxmox: PCI Passthrough\nPCI 00:02.0 → VM"] --> B
    B["Talos Node: i915 kernel driver\n/dev/dri/renderD128"] --> C
    C["DaemonSet: Intel GPU Device Plugin\nRegisters gpu.intel.com/i915: 1"] --> D
    D["Kubelet Device Manager\nAllocates device to pod"] --> E
    E["Container\nEmby → VA-API → /dev/dri/renderD128"]

Layer breakdown

Proxmox passes the physical GPU (PCI 00:02.0) directly to the worker-0a VM. Talos loads the i915 driver, which creates device nodes under /dev/dri/:

  • /dev/dri/card0 — display/render control
  • /dev/dri/renderD128 — render node (used by VA-API)

The Intel GPU Device Plugin runs as a DaemonSet on GPU-capable nodes. It:

  1. Scans /dev/dri/ for Intel GPU devices
  2. Registers with kubelet as the provider of the extended resource gpu.intel.com/i915
  3. Reports the available devices to kubelet, which publishes the count in the node's allocatable resources (gpu.intel.com/i915: 1)
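The scan in step 1 can be modelled roughly as below. This is a simplified Python sketch, not the plugin's own implementation. It assumes the usual sysfs layout, where /sys/class/drm/card0/device/vendor holds the PCI vendor ID (0x8086 for Intel) and /sys/class/drm/card0/device/drm/ lists both the card and render nodes belonging to that GPU. It also assumes one resource unit per GPU (no device sharing), which matches the gpu.intel.com/i915: 1 count on this cluster. The names scan_intel_gpus and advertise are hypothetical.

```python
import re
from pathlib import Path

INTEL_VENDOR_ID = "0x8086"  # PCI vendor ID for Intel
CARD_RE = re.compile(r"^card\d+$")  # matches card0 but skips connectors like card0-HDMI-A-1


def scan_intel_gpus(sysfs_drm: Path, devfs_dri: Path) -> dict[str, list[Path]]:
    """Return {device_id: [device node paths]} for every Intel GPU found.

    sysfs_drm is normally /sys/class/drm and devfs_dri is /dev/dri.
    """
    gpus: dict[str, list[Path]] = {}
    for entry in sorted(sysfs_drm.iterdir()):
        if not CARD_RE.match(entry.name):
            continue
        try:
            vendor = (entry / "device" / "vendor").read_text().strip()
        except OSError:
            continue  # no PCI device behind this card (or unreadable)
        if vendor != INTEL_VENDOR_ID:
            continue
        drm_dir = entry / "device" / "drm"
        if not drm_dir.is_dir():
            continue
        # The PCI device's drm/ directory lists both the card and render nodes.
        gpus[entry.name] = sorted(devfs_dri / node.name for node in drm_dir.iterdir())
    return gpus


def advertise(gpus: dict[str, list[Path]]) -> dict[str, int]:
    """Extended-resource count reported to kubelet (assumed: one unit per GPU)."""
    return {"gpu.intel.com/i915": len(gpus)}
```

With the real paths, scan_intel_gpus(Path("/sys/class/drm"), Path("/dev/dri")) on worker-0a would be expected to return a single entry holding card0 and renderD128.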

The plugin is deployed via the intel-device-plugins-operator HelmRelease in infrastructure/.

A pod requests the GPU with:

resources:
  limits:
    gpu.intel.com/i915: "1"
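Listing the resource under limits alone is enough: for extended resources Kubernetes uses the limit as the request, accepts only whole numbers, and rejects a request that differs from the limit because extended resources cannot be overcommitted. A minimal Python sketch of those rules follows; the function name gpu_units is hypothetical, and quantities are compared as plain strings rather than parsed as Kubernetes quantities.

```python
RESOURCE = "gpu.intel.com/i915"


def gpu_units(resources: dict, name: str = RESOURCE) -> int:
    """Return how many GPU units a container's resources block asks for.

    Applies the extended-resource rules: a limit is required, the request
    (if given) must equal it, and the value must be a whole number.
    """
    limit = resources.get("limits", {}).get(name)
    request = resources.get("requests", {}).get(name)
    if limit is None:
        if request is not None:
            raise ValueError(f"{name}: a limit must be set for extended resources")
        return 0  # container does not ask for the GPU
    # Simplification: "1" and 1 compare equal, but "1000m" vs "1" would not.
    if request is not None and str(request) != str(limit):
        raise ValueError(f"{name}: request must equal limit (no overcommit)")
    text = str(limit)
    if not text.isdigit():
        raise ValueError(f"{name}: must be a whole number, got {text!r}")
    return int(text)
```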

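Kubelet calls the plugin's Allocate RPC for the chosen device. The plugin answers with that GPU's device nodes (its card and render nodes, /dev/dri/card0 and /dev/dri/renderD128), and kubelet passes them to containerd, which creates them inside the container and allows access in the container's device cgroup. The plugin does not manage group membership: if the app runs as a non-root user, it also needs the device nodes' group, for example via the pod's supplementalGroups.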
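The Allocate response in the kubelet device plugin API carries device specs (container path, host path, cgroup permissions), environment variables, and mounts; it has no field for supplementary groups. The sketch below is a simplified Python model of that response, not the real gRPC types: the dataclasses only mirror the field names, the allocate function is hypothetical, and the "rw" permission string is assumed as the default.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceSpec:
    # Mirrors the device plugin API's DeviceSpec message.
    container_path: str
    host_path: str
    permissions: str = "rw"  # cgroup device permissions: r = read, w = write


@dataclass
class ContainerAllocateResponse:
    # Only the fields used here; the real message also has mounts and annotations.
    devices: list[DeviceSpec] = field(default_factory=list)
    envs: dict[str, str] = field(default_factory=dict)


def allocate(device_ids: list[str], nodes_by_id: dict[str, list[str]]) -> ContainerAllocateResponse:
    """Build the response for the device IDs kubelet picked for a container."""
    resp = ContainerAllocateResponse()
    for dev_id in device_ids:
        for node in nodes_by_id[dev_id]:
            # Expose each node at the same path inside the container.
            resp.devices.append(DeviceSpec(container_path=node, host_path=node))
    return resp
```

For this cluster, allocate(["card0"], {"card0": ["/dev/dri/card0", "/dev/dri/renderD128"]}) would yield two device specs, one per node.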

The app detects /dev/dri/renderD128 inside the container and uses VA-API to submit hardware decode/encode work. The i915 kernel driver in the worker-0a VM translates these requests into operations on the passed-through GPU.
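When transcoding silently falls back to software, the usual causes are that the render node never reached the container or the process cannot open it. A minimal diagnostic sketch, assuming a Python interpreter is available in the container (not guaranteed for the Emby image); check_render_node is a hypothetical helper, not part of any tool mentioned on this page.

```python
import os
import stat
from pathlib import Path


def check_render_node(path: str = "/dev/dri/renderD128") -> list[str]:
    """Return a list of problems that would stop VA-API from using the node."""
    p = Path(path)
    if not p.exists():
        # The device plugin did not allocate the GPU to this container.
        return [f"{path} is missing"]
    problems = []
    if not stat.S_ISCHR(p.stat().st_mode):
        problems.append(f"{path} is not a character device")
    if not os.access(path, os.R_OK | os.W_OK):
        # Usually a missing group: compare the node's gid with os.getgroups().
        problems.append(
            f"{path} is not readable and writable by uid {os.getuid()} "
            f"(groups {os.getgroups()})"
        )
    return problems
```

An empty list means the node exists and is usable by the current process; any entry points at the layer to investigate.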

Node Targeting

NFD (Node Feature Discovery) is not used. The GPU node is targeted with a static nodeAffinity rule to keep things simple:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-0a

| Approach | Pros | Cons |
|---|---|---|
| NFD + auto-detection | Dynamic discovery, scales to new GPU nodes | Requires NFD, more complex setup |
| Static hostname (used here) | Simple, no dependencies, explicit | Manual update if the GPU node changes |
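The scheduler's handling of the rule above can be sketched as follows: terms in nodeSelectorTerms are ORed, expressions inside one term are ANDed, and an empty term matches nothing. This minimal Python model covers only the In, NotIn, Exists, and DoesNotExist operators (Gt, Lt, and matchFields are left out), and the names matches and node_selected are hypothetical. It also shows the "manual update" cost from the table: if the GPU moves, the values list is the only thing that has to change.

```python
def matches(expr: dict, labels: dict[str, str]) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator {op!r} is not modelled in this sketch")


def node_selected(terms: list[dict], labels: dict[str, str]) -> bool:
    """nodeSelectorTerms are ORed; expressions within a term are ANDed."""
    for term in terms:
        exprs = term.get("matchExpressions", [])
        if exprs and all(matches(e, labels) for e in exprs):
            return True
    return False  # includes the case where every term is empty
```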

Verification

Check the GPU resource is registered on the node:

kubectl get nodes -o=jsonpath="{range .items[*]}{.metadata.name}{'\n'}{' i915: '}{.status.allocatable.gpu\.intel\.com/i915}{'\n'}{end}"

Expected output for the GPU node (any other nodes are listed too, with an empty i915 value):

worker-0a
 i915: 1
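The same check can be scripted, for example as a post-deploy smoke test. A minimal Python sketch that reads kubectl get nodes -o json and keeps only nodes advertising the resource; the names gpu_allocatable and fetch_nodes are hypothetical, and fetch_nodes assumes kubectl is on PATH with access to the cluster.

```python
import json
import subprocess

RESOURCE = "gpu.intel.com/i915"


def gpu_allocatable(nodes_json: dict, resource: str = RESOURCE) -> dict[str, int]:
    """Map node name -> allocatable GPU units, skipping nodes without the resource."""
    out: dict[str, int] = {}
    for node in nodes_json.get("items", []):
        value = node.get("status", {}).get("allocatable", {}).get(resource)
        if value is not None:
            out[node["metadata"]["name"]] = int(value)  # quantities are strings, e.g. "1"
    return out


def fetch_nodes() -> dict:
    """Return the parsed output of kubectl get nodes -o json."""
    raw = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(raw)
```

Calling gpu_allocatable(fetch_nodes()) on this cluster would be expected to return {'worker-0a': 1}; an empty result means the plugin has not registered the resource.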

Check the device plugin pods are running:

kubectl get pods -n intel-device-plugins -l app=intel-gpu-plugin

Apps Using GPU

| App | Namespace | Purpose |
|---|---|---|
| Emby | media | Hardware video transcoding via VA-API |