
GPU Partitioning in Windows Server 2025 Hyper-V


GPU Partitioning (GPU-P) is a feature in Windows Server 2025 Hyper-V that allows multiple virtual machines to share a single physical GPU by dividing it into isolated fractions. Each VM is allocated a dedicated portion of the GPU’s resources (memory, compute, encoders, etc.) instead of using the entire GPU. This is achieved via Single-Root I/O Virtualization (SR-IOV), which provides a hardware-enforced isolation between GPU partitions, ensuring each VM can access only its assigned GPU fraction with predictable performance and security. In contrast, GPU Passthrough (also known as Discrete Device Assignment, DDA) assigns a whole physical GPU exclusively to one VM. With DDA, the VM gets full control of the GPU, but no other VMs can use that GPU simultaneously. GPU-P’s ability to time-slice or partition the GPU allows higher utilization and VM density for graphics or compute workloads, whereas DDA offers maximum performance for a single VM at the cost of flexibility.

GPU-P is ideal when you want to share a GPU among multiple VMs, such as for VDI desktops or AI inference tasks that only need a portion of a GPU’s power. DDA (passthrough) is preferred when a workload needs the full GPU (e.g. large model training) or when the GPU doesn’t support partitioning. Another major difference is mobility: GPU-P supports live VM mobility and failover clustering, meaning a VM using a GPU partition can move or restart on another host with minimal downtime. DDA-backed VMs cannot live-migrate. If you need to move a DDA VM, it must be powered off and then started on a target host (in clustering, a DDA VM will be restarted on a node with an available GPU upon failover, since live migration isn’t supported). Additionally, you cannot mix modes on the same device. A physical GPU can be either partitioned for GPU-P or passed through via DDA, but not both simultaneously.
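For comparison, assigning a whole GPU via DDA uses the host-assignable-device cmdlets rather than the partitioning cmdlets. The following is only a minimal sketch (the VM name and the NVIDIA name filter are placeholders, and depending on the GPU you may also need to reserve additional MMIO space with Set-VM):

# DDA (passthrough) sketch: dedicate an entire physical GPU to one VM
# Find the GPU's PCIe location path (placeholder filter on the friendly name)
$gpu = Get-PnpDevice -Class Display | Where-Object FriendlyName -like "*NVIDIA*"
$locationPath = ($gpu | Get-PnpDeviceProperty -KeyName DEVPKEY_Device_LocationPaths).Data[0]

# DDA requires the VM to be off and to use the TurnOff automatic stop action
Set-VM -VMName "Train-VM01" -AutomaticStopAction TurnOff

# Remove the GPU from the host and attach it to the VM
Dismount-VMHostAssignableDevice -LocationPath $locationPath -Force
Add-VMAssignableDevice -LocationPath $locationPath -VMName "Train-VM01"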

Supported GPU Hardware and Driver Requirements

GPU Partitioning in Windows Server 2025 is supported only on GPU hardware that exposes SR-IOV-style virtualization and ships with drivers that support it. In practice this means data-center GPUs, such as NVIDIA's A-series and L-series boards used with the vGPU driver. Only specific GPUs support GPU-P, and you won't be able to configure it on a consumer gaming GPU like your RTX 5090; check Microsoft's and your GPU vendor's support lists for eligible models.

In addition to the GPU itself, certain platform features are required:

  • Modern CPU with IOMMU: The host processors must support Intel VT-d or AMD-Vi with DMA remapping (IOMMU). This is crucial for mapping device memory securely between host and VMs. Older processors lacking these enhancements may not fully support live migration of GPU partitions.
  • BIOS Settings: Ensure that in each host’s UEFI/BIOS, Intel VT-d/AMD-Vi and SR-IOV are enabled. These options may be under virtualization or PCIe settings. Without SR-IOV enabled at the firmware level, the OS will not recognize the GPU as partitionable (in Windows Admin Center it might show status “Paravirtualization” indicating the driver is capable but the platform isn’t).
  • Host GPU Drivers: Use vendor-provided drivers that support GPU virtualization. For NVIDIA, this means installing the NVIDIA virtual GPU (vGPU) driver on the Windows Server 2025 host (the driver package that supports GPU-P). Check the GPU vendor’s documentation for installation specifics. After installing, you can verify the GPU’s status via PowerShell or WAC.
  • Guest VM Drivers: The guest VMs also need appropriate GPU drivers installed (within the VM’s OS) to make use of the virtual GPU. For instance, if using Windows 11 or Windows Server 2025 as a guest, install the GPU driver inside the VM (often the same data-center driver or a guest-compatible subset from the vGPU package) so that the GPU is usable for DirectX/OpenGL or CUDA in that VM. Linux guests (Ubuntu 18.04/20.04/22.04 are supported) likewise need the Linux driver installed. Guest OS support for GPU-P in WS2025 covers Windows 10/11, Windows Server 2019+, and certain Ubuntu LTS versions.

After hardware setup and driver installation, it’s important to verify that the host recognizes the GPU as “partitionable.” You can use Windows Admin Center or PowerShell for this: in WAC’s GPU tab, check the “Assigned status” of the GPU; it should show “Partitioned” if everything is configured correctly (if it shows “Ready for DDA assignment,” the partitioning driver isn’t active, and if it shows “Not assignable,” the GPU/driver doesn’t support either method). In PowerShell, you can run:

Get-VMHostPartitionableGpu | FL Name, ValidPartitionCounts, PartitionCount

This lists each GPU device’s identifier and the partition counts it supports. For example, an NVIDIA A40 might return ValidPartitionCounts : {16, 8, 4, 2 …}, indicating the GPU can be split into 2, 4, 8, or 16 partitions, along with the current PartitionCount setting (by default this is typically the maximum supported count, or whatever value was last configured). If no GPUs are listed, the GPU is not recognized as partitionable (check drivers/BIOS). If the GPU is listed but ValidPartitionCounts is blank or shows only “1,” it may not support SR-IOV and can only be used via DDA.
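As an additional platform-level check, you can ask the Hyper-V host whether SR-IOV is usable at all. The IovSupport/IovSupportReasons properties are primarily reported for SR-IOV networking, but the reasons text is still a quick way to spot missing VT-d/AMD-Vi or firmware settings:

# Check whether the host firmware/chipset exposes SR-IOV with DMA remapping (IOMMU)
Get-VMHost | Format-List IovSupport, IovSupportReasons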

Enabling and Configuring GPU Partitioning

Once the hardware and drivers are ready, enabling GPU Partitioning involves configuring how the GPU will be divided and ensuring all Hyper-V hosts (especially in a cluster) have a consistent setup.

Each physical GPU must be configured with a partition count (how many partitions to create on that GPU). You cannot define an arbitrary number – it must be one of the supported counts reported by the hardware/driver. The default might be the maximum supported (e.g., 16). To set a specific partition count, use PowerShell on each host:

  1. Decide on a partition count that suits your workloads. Fewer partitions means each VM gets more GPU resources (more VRAM and compute per partition), whereas more partitions means you can assign the GPU to more VMs concurrently (each getting a smaller slice). For AI/ML, you might choose a moderate number – e.g. split a 24 GB GPU into 4 partitions of ~6 GB each for inference tasks.
  2. Run the Set-VMHostPartitionableGpu cmdlet. Provide the GPU’s device ID (from the Name field of the earlier Get-VMHostPartitionableGpu output) and the desired -PartitionCount. For example:
Set-VMHostPartitionableGpu -Name "<GPU-device-ID>" -PartitionCount 4

This would configure the GPU to be divided into 4 partitions. Repeat this for each GPU device if the host has multiple GPUs (or specify -Name accordingly for each). Verify the setting by running:

Get-VMHostPartitionableGpu | FL Name,PartitionCount

It should now show the PartitionCount set to your chosen value (e.g., PartitionCount : 4 for each listed GPU).
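If a host contains several partitionable GPUs, you can apply the same setting in one pass instead of repeating the command per device. A minimal sketch (the count of 4 is only an example and must be one of the values in ValidPartitionCounts):

# Apply the same partition count to every partitionable GPU on this host
Get-VMHostPartitionableGpu | ForEach-Object {
    Set-VMHostPartitionableGpu -Name $_.Name -PartitionCount 4
}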

If you are in a clustered environment, apply the same partition count on every host in the cluster for all identical GPUs. Consistency is critical: a VM using a “quarter GPU” partition can only fail over to another host that also has its GPU split into quarters. Windows Admin Center will actually enforce this by warning you if you try to set mismatched counts on different nodes.
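One quick way to confirm that every node reports the same configuration is to query all cluster nodes at once. A sketch, assuming the FailoverClusters module and PowerShell remoting are available:

# Compare GPU partition counts across all cluster nodes
Invoke-Command -ComputerName (Get-ClusterNode).Name -ScriptBlock {
    Get-VMHostPartitionableGpu | Select-Object Name, PartitionCount
} | Format-Table PSComputerName, Name, PartitionCount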

You can also configure the partition count via the WAC GUI. In WAC’s GPU partitions tool, select the GPU (or a set of homogeneous GPUs across hosts) and choose Configure partition count. WAC will present a dropdown of valid partition counts (as reported by the GPU). Selecting a number will show a tooltip of how much VRAM each partition would have (e.g., selecting 8 partitions on a 16 GB card might show ~2 GB per partition). WAC helps ensure you apply the change to all similar GPUs in the cluster together. After applying, it will update the partition count on each host automatically.

After this step, the physical GPUs on the host (or cluster) are partitioned into the configured number of virtual GPUs and are ready to be assigned to VMs. From the host’s perspective, each partition appears as an assignable resource. (Note: you cannot assign more partitions to VMs than the number configured.)

Assigning GPU Partitions to Virtual Machines

With the GPU partitioned at the host level, the next step is to attach a GPU partition to a VM. This is analogous to plugging a virtual GPU device into the VM. Each VM can have at most one GPU partition device attached, so choose the VM that needs GPU acceleration and assign one partition to it. There are two main ways to do this: using PowerShell commands or using the Windows Admin Center UI. Below are the instructions for each method.

To attach a GPU partition to a VM, use the Add-VMGpuPartitionAdapter cmdlet (the VM must be powered off when you add the adapter). For example:

Add-VMGpuPartitionAdapter -VMName "<VMName>"

This allocates one of the available GPU partitions on the host to the specified VM. (There is no parameter to specify which partition or GPU; Hyper-V auto-selects an available partition from a compatible GPU. If no partition is free, or the host GPUs aren’t partitioned, the cmdlet returns an error.)

You can check that the VM has a GPU partition attached by running:

Get-VMGpuPartitionAdapter -VMName "<VMName>" | FL InstancePath,PartitionId

This will show details like the GPU device instance path and a PartitionId for the VM’s GPU device. If you see an entry with an instance path (matching the GPU’s PCI ID) and a PartitionId, the partition is successfully attached.

Power on the VM. On boot, the VM’s OS will detect a new display adapter. In Windows guests, you should see a GPU in Device Manager (it may appear as a GPU with a specific model, or a virtual GPU device name). Install the appropriate GPU driver inside the VM if not already installed, so that the VM can fully utilize the GPU (for example, install NVIDIA drivers in the guest to get CUDA, DirectX, etc. working). Once the driver is active in the guest, the VM will be able to leverage the GPU partition for AI/ML computations or graphics rendering.
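Putting the PowerShell steps together, a minimal end-to-end sketch might look like this (the VM name ML-VM01 is a placeholder; the cache and MMIO values are commonly used examples for GPU virtualization rather than required settings, so follow your GPU vendor’s guidance):

# GPU-P assignment sketch; the VM must be off while the adapter is added
Stop-VM -Name "ML-VM01"

# Attach one partition; Hyper-V picks an available partition from a compatible GPU
Add-VMGpuPartitionAdapter -VMName "ML-VM01"

# Optional: cache and MMIO space settings often used with virtualized GPUs (example values)
Set-VM -VMName "ML-VM01" -GuestControlledCacheTypes $true -LowMemoryMappedIoSpace 1GB -HighMemoryMappedIoSpace 32GB

Start-VM -Name "ML-VM01"

# Confirm the partition is attached
Get-VMGpuPartitionAdapter -VMName "ML-VM01" | Format-List InstancePath, PartitionId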

Using Windows Admin Center:

  1. Open Windows Admin Center and navigate to your Hyper-V cluster or host, then go to the GPUs extension. Ensure you have added the GPUs extension v2.8.0 or later to WAC.
  2. In the GPU Partitions tab, you’ll see a list of the physical GPUs and any existing partitions. Click on “+ Assign partition”. This opens an assignment wizard.
  3. Select the VM: First choose the host server where the target VM currently resides (WAC will list all servers in the cluster). Then select the VM from that host to assign a partition to. (If a VM is greyed out in the list, it likely already has a GPU partition assigned or is incompatible.)
  4. Select Partition Size (VRAM): Choose the partition size from the dropdown. WAC will list options that correspond to the partition counts you configured. For example, if the GPU is split into 4, you might see an option like “25% of GPU (≈4 GB)” or similar. Ensure this matches the partition count you set. You cannot assign more memory than a partition contains.
  5. Offline Action (HA option): If the VM is clustered and you want it to be highly available, check the option for “Configure offline action to force shutdown” (if presented in the UI).
  6. Proceed to assign. WAC will automatically: shut down the VM (if it was running), attach a GPU partition to it, and then power the VM back on. After a brief moment, the VM should come online with the GPU partition attached. In the WAC GPU partitions list, you will now see an entry showing the VM name under the GPU partition it’s using.

At this point, the VM is running with a virtual GPU. You can repeat the process for other VMs, up to the number of partitions available. Each physical GPU can only support a fixed number of active partitions equal to the PartitionCount set. If you attempt to assign more VMs than partitions, the additional VMs will not get a GPU (or the Add command will fail). Also note that a given VM can only occupy one partition on one GPU – you cannot span a single VM across multiple GPU partitions or across multiple GPUs with GPU-P.
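Before assigning another VM, you can tally how many partitions are already claimed on a host and compare that with the configured total. A rough sketch:

# Count GPU partition adapters currently attached to VMs on this host
$assigned = @(Get-VM | ForEach-Object { Get-VMGpuPartitionAdapter -VMName $_.Name -ErrorAction SilentlyContinue }).Count

# Sum the partitions configured across the host's partitionable GPUs
$total = (Get-VMHostPartitionableGpu | Measure-Object -Property PartitionCount -Sum).Sum

"Assigned: $assigned of $total partitions"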

GPU Partitioning in Clustered Environments (Failover Clustering)

One of the major benefits introduced with Windows Server 2025 is that GPU partitions can be used in Failover Clustering scenarios for high availability. This means you can have a Hyper-V cluster where VMs with virtual GPUs are clustered roles, capable of moving between hosts either through live migration (planned) or failover (unplanned). To utilize GPU-P in a cluster, you must pay special attention to configuration consistency and understand the current limitations:

  • Use Windows Server 2025 Datacenter: Clustering features for GPU partitions (such as failover) are supported only on the Datacenter edition.
  • Homogeneous GPU Configuration: All hosts in the cluster should have identical GPU hardware and partitioning setup. Failover/Live Migration with GPU-P does not support mixing GPU models or partition sizes in a GPU-P cluster. Each host should have the same GPU model. The partition count configured (e.g., 4 or 8 etc.) must be the same on every host. This uniformity ensures that a VM expecting a certain size partition will find an equivalent on any other node.

  • Live Migration (Planned Moves): Windows Server 2025 introduces support for live migrating VMs that have a GPU partition attached. However, there are important caveats:

    • Hardware support: Live migration with GPU-P requires that the hosts’ CPUs and chipsets fully support isolating DMA and device state. In practice, as noted, you need Intel VT-d or AMD-Vi enabled, and ideally CPUs that support DMA bit tracking. If this is in place, Hyper-V will attempt to live migrate the VM normally. During such a migration, the GPU’s state is not copied as seamlessly as regular memory; instead, Windows falls back to a slower migration process to preserve integrity. Specifically, when migrating a VM using GPU-P, Hyper-V automatically uses TCP/IP with compression (even if you have faster methods like RDMA configured), because device state transfer is more complex. The migration will still succeed, but you may notice higher CPU usage on the host and a longer migration time than usual.
    • Cross-node compatibility: Ensure that the GPU driver versions on all hosts are the same, and that each host has an available partition for the VM. If a VM is running and you trigger a live migrate, Hyper-V will find a target where the VM can get an identical partition. If none are free, the migration will not proceed (or the VM may have to be restarted elsewhere as a failover).
  • Failover (Unplanned Moves): If a host crashes or goes down, a clustered VM with a GPU partition will be automatically restarted on another node, much like any HA VM. The key difference is that the VM cannot save its state, so it will be a cold start on the new node, attaching to a new GPU partition there. When the VM comes up on the new node, it will request a GPU partition, and Hyper-V will allocate one if available. If the target node has no free partition (say all were assigned to other VMs), the VM might start but not get a GPU (and Windows would likely log an error that the virtual GPU could not start). If full automatic failover is required, administrators should monitor partition usage and consider anti-affinity rules to avoid packing too many GPU VMs on one host, as sketched after this list.
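Spreading GPU VMs across nodes makes it more likely that a free partition exists wherever a VM fails over. A minimal sketch using the classic anti-affinity mechanism (the VM role names and class name are placeholders):

# Tag GPU-P VMs with a shared anti-affinity class so the cluster prefers
# to place them on different nodes
$class = New-Object System.Collections.Specialized.StringCollection
$class.Add("GPU-P VMs") | Out-Null

foreach ($vm in "ML-VM01", "ML-VM02") {
    (Get-ClusterGroup -Name $vm).AntiAffinityClassNames = $class
}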

To learn more about GPU-P on Windows Server 2025, consult the documentation on Learn: https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/gpu-partitioning

 
