GPU Virtualization under WSL2

This article records the principles and workflow of managing GPUs with k8s under WSL2 for model inference, transcoding, and WDDM graphics, organized into deployment, inference, transcoding, and WDDM sections.

Deployment

Environment

  • GPU: 4× L20, 8× 5090

  • WSL2

    image-20251223163542672

    1. WSL2: 2.6.2.0
    2. Ubuntu: 24.04
  • Windows

    image-20251223163934497

    1. Windows: Win11 23H2 (build 22631.2428)

Architecture Diagram

GPU虚拟化.drawio

  • Pod view

pod --> nvidia-cuda-toolkit --> wsl2 --> nvidia-container-toolkit --> windows driver

image-20251223162232252

Network (Bridge)

  • Schematic

image-20251223164352885

  • Result

image-20251223164429834

  • Configuration

    1. vSwitch: create a Virtual Switch in Hyper-V

    2. .wslconfig: place a .wslconfig file in the user's home directory with the following content:

      [wsl2]
      networkingMode=bridged
      vmSwitch=WSLBridge
      dhcp=false
      swap=0 # swap must be disabled for a K8S cluster
  • Notes:

    1. Bridge mode is mainly used for mixed-node communication in K8S, putting all nodes on the same network plane
    2. The default Mirrored mode cannot meet this requirement
    3. WSL2 bridge mode does not support attaching multiple NICs
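
The vSwitch step above can be done from an elevated PowerShell prompt on the Windows host. A minimal sketch, assuming the physical adapter is named "Ethernet" (check the actual name with Get-NetAdapter):

```powershell
# Create an external vSwitch bound to the physical NIC
# ("Ethernet" is an assumption; substitute your adapter name)
New-VMSwitch -Name "WSLBridge" -NetAdapterName "Ethernet" -AllowManagementOS $true

# Restart WSL so the new .wslconfig takes effect
wsl --shutdown
```

The vmSwitch value in .wslconfig must match the -Name given here.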

Nvidia

Nvidia-Container-Toolkit

  • Guide

    https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.9.0/nvidia-containerd.html

  • Installation

    # Install the prerequisites for the instructions below:
    sudo apt-get update && sudo apt-get install -y --no-install-recommends \
        curl \
        gnupg2

    # Configure the production repository:
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
        sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
        sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

    # Install the NVIDIA Container Toolkit packages:
    sudo apt-get update
    export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.18.1-1
    sudo apt-get install -y \
        nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
        nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
        libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
        libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
  • Configuration (containerd)

    # Configure the container runtime by using the nvidia-ctk command:
    sudo nvidia-ctk runtime configure --runtime=containerd
    # Restart containerd:
    sudo systemctl restart containerd
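
For reference, after nvidia-ctk runs, /etc/containerd/config.toml gains an nvidia runtime entry along these lines (a sketch; exact plugin paths vary by containerd version):

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
    BinaryName = "/usr/bin/nvidia-container-runtime"
```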
  • Patch

    image-20251223192245372

Nvidia-Device-Plugin

  • Purpose

    Used only for GPU resource requests and scheduling
  • Prerequisites

    The list of prerequisites for running the NVIDIA device plugin is described below:

    NVIDIA drivers ~= 384.81
    nvidia-docker >= 2.0 || nvidia-container-toolkit >= 1.7.0 (>= 1.11.0 to use integrated GPUs on Tegra-based systems)
    nvidia-container-runtime configured as the default low-level runtime
    Kubernetes version >= 1.10
  • Quick Start

    # 1. configure containerd
    # 2. Daemonset
    kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.1/deployments/static/nvidia-device-plugin.yml
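
Once the plugin's DaemonSet is running, pods request GPUs through the nvidia.com/gpu extended resource. A minimal sketch (pod name and image tag are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test        # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.6.3-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: "1"   # scheduled only onto nodes advertising GPUs
```

If the pod's logs show the nvidia-smi table, scheduling and runtime injection are both working.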

Nvidia-CUDA-Toolkit

Taking CUDA 13.0.0 on Ubuntu 24.04 as an example:

  • 13.0.0-base-ubuntu24.04 (FFmpeg)

    Includes the CUDA runtime (cudart)
  • 13.0.0-runtime-ubuntu24.04 (inference)

    Builds on the base and includes the CUDA math libraries, and NCCL. A runtime image that also includes cuDNN is available. Some images may also include TensorRT.
  • 13.0.0-devel-ubuntu24.04 (build)

    Builds on the runtime and includes headers and development tools for building CUDA images. These images are particularly useful for multi-stage builds.
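
The devel/runtime split maps naturally onto a multi-stage build: compile against the devel image, then copy the binary into the slimmer runtime image. A sketch under the assumption of a single hypothetical kernel.cu source file:

```dockerfile
# Stage 1: build against the full toolkit (headers, nvcc)
FROM nvidia/cuda:13.0.0-devel-ubuntu24.04 AS build
WORKDIR /src
COPY kernel.cu .                  # hypothetical source file
RUN nvcc -O2 -o /src/app kernel.cu

# Stage 2: ship only the runtime libraries plus the binary
FROM nvidia/cuda:13.0.0-runtime-ubuntu24.04
COPY --from=build /src/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```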

Inference

vLLM

  • Compatibility:

    1. Qwen3-VL-32B requires vLLM >= 0.11.0; a single card does not have enough VRAM, so multiple GPUs are needed
    2. WSL2 does not support NCCL P2P, so shared memory (SHM) must be used for multi-GPU communication
  • Deployment

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      namespace: dev
      name: vllm-v1
      labels:
        app: vllm-v1
    spec:
      replicas: 1
      strategy:
        type: Recreate
      selector:
        matchLabels:
          app: vllm-v1
      template:
        metadata:
          labels:
            app: vllm-v1
        spec:
          nodeSelector:
            node-role.kubernetes.io/worker-gpu: "true"
          containers:
          - name: vllm
            env:
            - name: NCCL_CUMEM_ENABLE
              value: "0"
            image: docker.1ms.run/vllm/vllm-openai:v0.11.1
            imagePullPolicy: IfNotPresent
            command:
            - sh
            - -c
            - |
              vllm serve "/models/Qwen/Qwen3-VL-32B-Instruct" \
                --served-model-name "Qwen3-VL-32B-Instruct" \
                --tensor-parallel-size 4 \
                --port 9999 \
                --max-model-len 32768 \
                --gpu-memory-utilization 0.85 \
                --max-num-seqs 10 \
                --max-num-batched-tokens 32768 \
                --dtype bfloat16
            ports:
            - containerPort: 9999
              name: http
            resources:
              limits:
                nvidia.com/gpu: "4"
              requests:
                nvidia.com/gpu: "4"
            volumeMounts:
            - mountPath: /models
              name: qwen-data
              subPathExpr: models
            - mountPath: /dev/shm
              name: dshm
          volumes:
          - name: qwen-data
            persistentVolumeClaim:
              claimName: pvc-nas-test
          - name: dshm
            emptyDir:
              medium: Memory
              sizeLimit: 16G
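
Once the Deployment is up, the container serves the OpenAI-compatible API on port 9999. A quick smoke test that builds the request body; the pod address is a placeholder you must substitute:

```shell
# Build an OpenAI-style chat request for the served model name above
REQ='{"model": "Qwen3-VL-32B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
echo "$REQ"

# Send it to the running pod (replace <pod-ip> with the real address):
# curl -s http://<pod-ip>:9999/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$REQ"
```

The model field must match --served-model-name, not the on-disk path.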

Transcoding

FFmpeg

  • Compatibility:

    1. Installation: binaries are taken from https://github.com/BtbN/FFmpeg-Builds; a custom build can also be compiled
    2. WSL2: by default WSL2 only auto-mounts libcuda.so, so the encode/decode libraries FFmpeg needs must be mounted into the container manually
  • Deployment

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      namespace: dev
      name: ffempeg-cuda-v1
      labels:
        app: ffempeg-cuda-v1
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: ffempeg-cuda-v1
      template:
        metadata:
          name: ffempeg-cuda-v1
          labels:
            app: ffempeg-cuda-v1
        spec:
          containers:
          - name: ffempeg-cuda-v1
            image: docker.1ms.run/nvidia/cuda:12.6.3-base-ubuntu22.04
            imagePullPolicy: IfNotPresent
            command:
            - sh
            - -c
            - |
              tail -f /dev/null
          restartPolicy: Always
  • Test case

    ./ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
        -threads 2 \
        -i input.mp4 \
        -vf "scale_cuda=854:480" \
        -c:v h264_nvenc -preset p1 -cq 28 \
        -c:a copy -y output.mp4
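
To address the manual-mount caveat noted under compatibility, the WSL GPU libraries can be exposed to the pod with a hostPath volume. A sketch assuming the usual WSL2 library location /usr/lib/wsl/lib (verify the path and required library names on your host):

```yaml
# Fragment to merge into the container/pod spec above (paths are assumptions)
        volumeMounts:
        - mountPath: /usr/lib/wsl/lib   # where WSL2 exposes libcuda.so etc.
          name: wsl-lib
          readOnly: true
      volumes:
      - name: wsl-lib
        hostPath:
          path: /usr/lib/wsl/lib
          type: Directory
```

Depending on the base image, LD_LIBRARY_PATH may also need to include the mount path so FFmpeg can resolve the NVENC/NVDEC libraries.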

WDDM

L20

  • Purpose: the Windows host recognizes the GPU through WDDM (visible via dxdiag), which lets Windows graphics software use the card

  • Supported modes

    image-20251223182017230

  • Enable Physical Display Modes

    image-20251223182124657

  • Switch from TCC to WDDM

    image-20251223182359183
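
The TCC-to-WDDM switch shown in the screenshot is typically done with nvidia-smi from an elevated prompt on the Windows host; a sketch, assuming GPU index 0 (a reboot is required afterwards):

```
nvidia-smi -i 0 -dm 0   # -dm 0 selects WDDM, -dm 1 selects TCC
```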
