Hyper-V on WS2025: 80–90% 4K Q1/T1 hit inside VM — how do I reduce vSCSI small-IO latency?

ReubenTishkoff 10 Reputation points
2025-09-03T15:52:59.1433333+00:00

Goal: Understand and reduce the large 4K Q1/T1 penalty inside a VM vs host on Windows Server 2025 (latest August CU).

Setup

  • Host: WS2025, Hyper-V role, NUMA Spanning OFF, High Performance. Dell PowerEdge R7725, 2× AMD EPYC 9175F (64 LP).
  • Drives (direct to CPU, no RAID): Intel Optane P5800X 800 GB (U.2, PCIe 4.0 x4); Solidigm D7-PS1030 3.2 TB (E3.S, PCIe 5.0 x4)
  • VM (Gen2): WS2025, 32 vCPU (single vNUMA), fixed VHDX 4K/4K, guest NTFS 64K, CPU Groups pinned to one NUMA node. Defender exclusions in place.

Method (DiskSpd, host & VM; same args)

-c64G -W10 -d60 -Sh -L
4K Q1T1 (r/w), 4K Q32T1 (r), 4K T32Q1 (r),
SEQ 1MiB Q1T1 (w), SEQ 1MiB Q8T1 (r)

Host runs are pinned via cmd's start /NODE 1 /AFFINITY FFFFFFFF00000000. I tested both NVMe drives local and remote to that node; the VM was tested with the VHDX on local and remote nodes (drive letters M/K vs F/E).
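For clarity, the 4K Q1T1 random-read shorthand expands to a command along these lines (the test file path is just an example):

  diskspd -c64G -W10 -d60 -Sh -L -b4K -t1 -o1 -r -w0 M:\diskspd\test.dat

4K Q32T1 uses -o32, T32Q1 uses -t32 -o1, the write variants use -w100, and the SEQ 1MiB cases drop -r and use -b1M with -o1 (write) or -o8 (read).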

Headline results (examples)

  • 4K Q1T1 (IOPS/latency): VM is ~−80–90% vs host on both drives (e.g., Optane Read 113k → 22–23k IOPS; Solidigm Read 90k → ~8k IOPS).
  • 4K Q32T1 Read: VM ≈ host or better on Optane (+~14%).
  • SEQ 1MiB Q1T1 Write (logs): VM −25–27%.
  • SEQ 1MiB Q8T1 Read: VM ≈ parity with host.
  • VM local vs remote NUMA storage: ±0–6% only (vSCSI/VMBus overhead dominates).

Questions

  1. Is a ~80–90% 4K Q1/T1 drop inside a VM expected today on WS2025 (vSCSI/StorVSP/StorVSC), or am I missing a tuning?
  2. Any supported knobs to reduce small-IO latency in VMs (queues/channels, interrupt moderation, ring buffers, controller settings)?
  3. Beyond fixed VHDX 4K/4K + NTFS 64K, any layout best practices for low latency? Is DDA the only practical way to narrow the Q1/T1 gap?
  4. Is there an ETA/plan for vNVMe or multi-queue virtual storage in Hyper-V?

I can attach DiskSpd XML logs if helpful. Thanks!

Windows for business | Windows Server | Storage high availability | Virtualization and Hyper-V

2 answers

  1. Domic Vo 8,770 Reputation points Independent Advisor
    2025-09-03T16:58:11.19+00:00

    Dear ReubenTishkoff,

    Based on your setup and DiskSpd results, you've identified a significant performance drop (~80–90%) in 4K Q1/T1 workloads inside the VM compared to the host, while other workloads (e.g., Q32T1 and sequential reads/writes) show near parity or acceptable deltas. Your configuration—including NUMA pinning, fixed VHDX with 4K block size, NTFS 64K allocation, and Defender exclusions—is well-optimized for low-latency scenarios.

    1. Is the 4K Q1/T1 drop expected in current Hyper-V architecture? Yes, this behavior is consistent with known limitations of the vSCSI stack (StorVSP/StorVSC) in Hyper-V. Small I/O operations with low queue depth are particularly sensitive to virtualization overhead, especially when using virtual disk layers (VHDX) and synthetic storage paths.
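    A quick sanity check with your own Optane numbers shows why only Q1/T1 is hit: at queue depth 1, IOPS is simply the inverse of latency, so 113k IOPS on the host is roughly 9 µs per IO, while 22–23k IOPS in the VM is roughly 43–45 µs, i.e. about 35 µs of added per-IO path length through StorVSC → VMBus → StorVSP and back. At Q32 that same per-IO overhead is overlapped across 32 outstanding requests, which is why the deeper-queue tests reach parity or better.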

    2. Are there supported tuning options to reduce small-IO latency? While there are no direct knobs to eliminate the Q1/T1 gap entirely, the following adjustments may help mitigate latency:

    • Use SCSI Controller Type 0 for minimal overhead
    • Increase the number of virtual storage queues via registry or PowerShell (limited support)
    • Review interrupt moderation settings on the host NIC and storage controller
    • Ensure latest integration services and VM configuration version are applied
    • Use fixed-size VHDX over dynamic (already in place in your setup)
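    As a rough sketch of two of the points above (VM name and VHDX path are placeholders), run from an elevated PowerShell prompt on the host:

        # Attach the data VHDX to SCSI controller number 0
        Add-VMHardDiskDrive -VMName "LatencyVM" -ControllerType SCSI -ControllerNumber 0 -Path "D:\VMs\data.vhdx"

        # Check the VM configuration version and raise it if the VM came from an older host (the VM must be off)
        Get-VM -Name "LatencyVM" | Select-Object Name, Version, Generation
        Update-VMVersion -Name "LatencyVM"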

    3. Are there layout best practices beyond VHDX 4K/4K + NTFS 64K? For latency-sensitive workloads, consider:

    • Pass-through disks (though limited in flexibility)
    • Discrete Device Assignment (DDA) for NVMe devices, which bypasses the virtual storage stack entirely
    • Avoid placing the VHDX on SMB shares or tiered storage unless RDMA is enabled
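    For completeness, here is a minimal sketch of the VHDX layout you already describe, so other readers can reproduce it (path, size and drive letter are placeholders):

        # Host: create a fixed VHDX with 4K logical/physical sectors
        New-VHD -Path "D:\VMs\data.vhdx" -SizeBytes 256GB -Fixed -LogicalSectorSizeBytes 4096 -PhysicalSectorSizeBytes 4096

        # Guest: format the volume with a 64K NTFS allocation unit size
        Format-Volume -DriveLetter M -FileSystem NTFS -AllocationUnitSize 65536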

    4. Is DDA the only practical way to narrow the Q1/T1 gap? Currently, DDA remains the most effective method to achieve near-host performance for small I/O workloads. It provides direct access to PCIe devices and eliminates virtualization overhead, but it requires exclusive device access and is best suited for dedicated workloads.
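    If you decide to prototype DDA, the basic flow looks like this (VM name is a placeholder; review Microsoft's DDA planning guidance first, since the device becomes exclusive to the VM and checkpoints/live migration are not available):

        # Host: find the NVMe controller and its PCIe location path
        # (assumes a single match; filter further if more than one NVMe controller is present)
        $dev = Get-PnpDevice -PresentOnly | Where-Object { $_.FriendlyName -like "*NVM Express*" }
        $locationPath = (Get-PnpDeviceProperty -InstanceId $dev.InstanceId -KeyName DEVPKEY_Device_LocationPaths).Data[0]

        # Disable the device on the host, dismount it, and assign it to the VM
        Disable-PnpDevice -InstanceId $dev.InstanceId -Confirm:$false
        Dismount-VMHostAssignableDevice -Force -LocationPath $locationPath
        Add-VMAssignableDevice -LocationPath $locationPath -VMName "LatencyVM"

        # DDA requires the VM's automatic stop action to be TurnOff
        Set-VM -Name "LatencyVM" -AutomaticStopAction TurnOff

    To return the device to the host later, use Remove-VMAssignableDevice and Mount-VMHostAssignableDevice.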

    5. Is there an ETA or roadmap for vNVMe or multi-queue virtual storage in Hyper-V? Microsoft has acknowledged the need for multi-queue virtual storage and vNVMe support in Hyper-V. While no public ETA is available, these features are under active consideration for future releases. You can follow updates via the Windows Server Tech Community.

    I hope this helps. Kindly mark Accept Answer so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    Best regards,

    Domic Vo


  2. ReubenTishkoff 10 Reputation points
    2025-09-03T17:28:11.6833333+00:00

    Thanks for the quick answer! The explanation makes sense (vSCSI overhead on small-IO, Q1/T1), and I can confirm my environment matches your recommendations: WS2025 (Aug CU), fixed VHDX 4K/4K, NTFS 64K, High Performance, Defender exclusions, NUMA pinning (CPU Groups), and identical DiskSpd runs on host vs VM.

    A few specific follow-ups so I can re-test with supported knobs:

    • “Use SCSI Controller Type 0” — do you mean attach VHDX to SCSI Controller index 0? Are controllers 0–3 identical in Hyper-V, or is there a measurable overhead difference? Any official doc on this?

    • “Increase the number of virtual storage queues” — could you share the supported way to do this on WS2025 (PowerShell/registry names, scope: per-VM, per-controller, per-disk)? A concrete example would help a lot.

    • Interrupt moderation — is there guidance that impacts the storage path (StorVSP/StorVSC) specifically? I can certainly tune NIC moderation, but I’d love a pointer if there is a storage-side knob that’s supported.

    • DDA for NVMe — acknowledged. To confirm constraints on WS2025: no checkpoints, no live migration, exclusive device, etc.? Any updated doc you recommend before I prototype this?

    • Finally, is there any public reference stating that a ~80–90% 4K Q1/T1 drop vs host is expected today for vSCSI on WS2025? That would help me set expectations internally.
