Thursday, February 14, 2019

GPU passthrough on KVM

 

A use case came up where multiple virtual machines needed access to various video cards.

 

We had the following for the host:

 

·      ASUS ESC8000 G4
·      2x Intel Xeon Gold 6140
·      4x NVIDIA Quadro RTX 6000
·      4x NVIDIA Tesla P100 (Pascal)
·      Ubuntu 16.04 (Xenial)

 

Before installing, confirm that the host BIOS supports passthrough (IOMMU / VT-d) and that it is enabled.

 

In the BIOS it is usually labeled VT-d or Intel Virtualization Technology for Directed I/O.

 

And of course make sure you enable VMX as well.
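
You can sanity-check from a running Linux install that the CPU exposes VMX (a standard check, nothing host-specific):

grep -c vmx /proc/cpuinfo

A non-zero count means the virtualization extensions are visible to the OS.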

 

 

See if it is already enabled in the kernel by checking the boot log.
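
For example (any filter on the DMAR/IOMMU boot messages will do):

ubuntu@asus_gpu:~$ dmesg | grep -e DMAR -e IOMMU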

DMAR: IOMMU enabled

 

Prepare the host kernel for passthrough

 

Add intel_iommu=on to the kernel command line in /etc/default/grub:

 

GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"

 

Locate the PCI bus address of the GPUs that will be passed through

 

ubuntu@asus_gpu:~$ lspci | grep -i nvidia

1d:00.0 VGA compatible controller: NVIDIA Corporation Device 1e30 (rev a1)

1d:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)

1d:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)

1d:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)

1e:00.0 VGA compatible controller: NVIDIA Corporation Device 1e30 (rev a1)

1e:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)

1e:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)

1e:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)

1f:00.0 VGA compatible controller: NVIDIA Corporation Device 1e30 (rev a1)

1f:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)

1f:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)

1f:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)

20:00.0 VGA compatible controller: NVIDIA Corporation Device 1e30 (rev a1)

20:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)

20:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)

20:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)

21:00.0 3D controller: NVIDIA Corporation Device 15f8 (rev a1)

22:00.0 3D controller: NVIDIA Corporation Device 15f8 (rev a1)

23:00.0 3D controller: NVIDIA Corporation Device 15f8 (rev a1)

24:00.0 3D controller: NVIDIA Corporation Device 15f8 (rev a1)

ubuntu@asus_gpu:~$

 

 

Look at the kernel drivers in use and the vendor and device IDs.

 

For one of the Tesla P100s:

ubuntu@asus_gpu:~$ lspci -nn -k -s 21:00.0

21:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)

              Subsystem: NVIDIA Corporation Device [10de:118f]

              Kernel driver in use: nouveau

              Kernel modules: nouveau

 

For one of the RTX 6000s there are four functions per card: the GPU itself plus an audio device, a USB controller, and a serial bus controller (for HDMI audio and the card's USB-C port, I presume):

 

ubuntu@asus_gpu:~$ lspci -nn -k -s 1d:00.0

1d:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1e30] (rev a1)

                  Subsystem: NVIDIA Corporation Device [10de:12ba]

                  Kernel driver in use: nouveau

                  Kernel modules: nvidiafb, nouveau

ubuntu@asus_gpu:~$ lspci -nn -k -s 1d:00.1

1d:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f7] (rev a1)

                  Subsystem: NVIDIA Corporation Device [10de:12ba]

                  Kernel driver in use: snd_hda_intel

                  Kernel modules: snd_hda_intel

ubuntu@asus_gpu:~$ lspci -nn -k -s 1d:00.2

1d:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad6] (rev a1)

                  Subsystem: NVIDIA Corporation Device [10de:12ba]

                  Kernel driver in use: nouveau

ubuntu@asus_gpu:~$ lspci -nn -k -s 1d:00.3

1d:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad7] (rev a1)

                  Subsystem: NVIDIA Corporation Device [10de:12ba]

                  Kernel driver in use: nouveau

ubuntu@asus_gpu:~$

 

 

Update the file /etc/initramfs-tools/modules

 

With the following, filling in the vendor:device IDs of the devices you want to pass through:

vfio
vfio_iommu_type1
vfio_pci ids=10de:1e30,10de:10f7,10de:1ad6,10de:1ad7,10de:15f8
vhost-net

 

Update the /etc/modules file

 

vfio
vfio_iommu_type1
vfio_pci ids=10de:1e30,10de:10f7,10de:1ad6,10de:1ad7,10de:15f8
vhost-net
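
If you would rather generate that ids= list than type it out, something like this works (a sketch, assuming GNU grep; it scrapes the vendor:device pairs out of the lspci -nn output):

lspci -nn | grep -i nvidia | grep -oE '10de:[0-9a-f]{4}' | sort -u | paste -sd, -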

 

 

Because you updated the GRUB configuration and the initramfs module list, run the following two commands, then reboot.

 

 

sudo update-grub

sudo update-initramfs -u

 

 

Once it comes back up, see if the passthrough is enabled and if the drivers are assigned correctly.
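
A quick first check is that the kernel actually booted with the flag and loaded the vfio modules (both commands are standard):

cat /proc/cmdline
lsmod | grep vfio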

 

 

ubuntu@asus_gpu:~$ lspci -nn -k -s 21:00.0

21:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)

              Subsystem: NVIDIA Corporation Device [10de:118f]

              Kernel driver in use: vfio-pci

              Kernel modules: nvidiafb, nouveau

ubuntu@asus_gpu:~$

ubuntu@asus_gpu:~$ lspci -nn -k -s 1d:00.0

1d:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1e30] (rev a1)

              Subsystem: NVIDIA Corporation Device [10de:12ba]

              Kernel driver in use: vfio-pci

              Kernel modules: nvidiafb, nouveau

ubuntu@asus_gpu:~$ lspci -nn -k -s 1d:00.1

1d:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f7] (rev a1)

              Subsystem: NVIDIA Corporation Device [10de:12ba]

              Kernel driver in use: vfio-pci

              Kernel modules: snd_hda_intel

ubuntu@asus_gpu:~$ lspci -nn -k -s 1d:00.2

1d:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad6] (rev a1)

              Subsystem: NVIDIA Corporation Device [10de:12ba]

              Kernel driver in use: vfio-pci

ubuntu@asus_gpu:~$ lspci -nn -k -s 1d:00.3

1d:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad7] (rev a1)

              Subsystem: NVIDIA Corporation Device [10de:12ba]

              Kernel driver in use: vfio-pci

 

 

If the Kernel driver in use: line shows something other than vfio-pci, double-check the device IDs you added.

 

Building the VM

 

Launch an Ubuntu KVM VM as you normally would. Once it is up and running, start attaching the devices you want by updating the VM's libvirt XML.

 

You can modify it manually using virsh edit or attach a device definition from a file:
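
The manual route looks like this (the VM name is a placeholder):

virsh edit (VM Name)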

 

Locate the address needed for the PCI device. This is the bus number in hex format; in my case it is bus='0x21'.

 

ubuntu@asus_gpu:~$ virsh nodedev-dumpxml pci_0000_21_00_0

<device>

  <name>pci_0000_21_00_0</name>

  <path>/sys/devices/pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:08.0/0000:21:00.0</path>

  <parent>pci_0000_19_08_0</parent>

  <driver>

    <name>vfio-pci</name>

  </driver>

  <capability type='pci'>

    <domain>0</domain>

    <bus>33</bus>

    <slot>0</slot>

    <function>0</function>

    <product id='0x15f8' />

    <vendor id='0x10de'>NVIDIA Corporation</vendor>

    <iommuGroup number='41'>

      <address domain='0x0000' bus='0x21' slot='0x00' function='0x0'/>

    </iommuGroup>

    <numa node='0'/>

    <pci-express>

      <link validity='cap' port='8' speed='8' width='16'/>

      <link validity='sta' speed='8' width='16'/>

    </pci-express>

  </capability>

</device>

 

 

ubuntu@asus_gpu:~$
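
If you are not sure of the node device name to dump, you can list the host PCI devices libvirt knows about:

virsh nodedev-list --cap pci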

 

 

Create an XML file with the following and plug in the address information you got earlier:

 

<hostdev mode='subsystem' type='pci' managed='yes'>

  <driver name='vfio'/>

  <source>

    <address domain='0x0000' bus='0x21' slot='0x00' function='0x0'/>

  </source>

  <alias name='hostdev0'/>

  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>

</hostdev>

 

 

 

Attach it with the following command

 

virsh attach-device (VM Name) --file ~/(your_xml_file) --config

 

 

This can all be done manually or from the GUI using virt-manager as well. The RTX 6000 will need all four of its functions attached to work properly; a loop for that is sketched below.
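
Since writing four hostdev files by hand is tedious, something like this does it in one go (a sketch: the VM name, the temp file paths, and bus 0x1d are assumptions from this host, so adjust them for yours):

for fn in 0 1 2 3; do
  # one hostdev definition per PCI function of the card on bus 0x1d
  cat > /tmp/hostdev-rtx-${fn}.xml <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x1d' slot='0x00' function='0x${fn}'/>
  </source>
</hostdev>
EOF
  virsh attach-device myvm --file /tmp/hostdev-rtx-${fn}.xml --config
done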

 

Shut down the VM and then start it back up (devices attached with --config appear on the next boot).

 

Log in to the VM, run lspci | grep -i nvidia, and confirm you see the GPU.

 

 

NOTE:

 

Some Tesla GPUs will function without defining a CPU model for the VM; some will not.

The ones that do not will still show up in the VM and look like they should work, but they will throw seemingly arbitrary errors when anything tries to access the GPU.

 

Errors like:

clCreateContext(): CL_OUT_OF_RESOURCES

or

code=46(cudaErrorDevicesUnavailable) "cudaEventCreate(&start)"

 

strace will show the process reading the device's memory, but it stops when attempting to write.

 

Update the CPU definition in the VM's XML to a model other than the default hypervisor CPU:

 

<cpu mode='custom' match='exact'>
  <model fallback='allow'>Broadwell-IBRS</model>
</cpu>
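
This snippet goes inside the top-level <domain> element of the VM's XML; virsh edit (VM Name) is the quickest way to add it.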

 

Shut down and restart the VM for the changes to take effect.

 

Install the NVIDIA drivers and CUDA on your VM and have fun.

 

It’s always a hassle for me to install drivers, so just paste the following if you want to do it quickly:

 

sudo apt update

 

sudo apt install wget -y

 

 

# Download the files to install

# Nvidia 410.79 driver

wget http://us.download.nvidia.com/tesla/410.79/nvidia-diag-driver-local-repo-ubuntu1604-410.79_1.0-1_amd64.deb
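# (this .deb is only downloaded here, not installed; the CUDA local
# repo below already provides a 410.48 driver that apt pulls in)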

# Nvidia Cuda 10 installer

wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64

 

sudo dpkg -i cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64

sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub

sudo apt-get update

sudo apt-get install cuda -y

sudo apt install nvidia-cuda-toolkit -y

# reboot
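
After the reboot, nvidia-smi is the quickest sanity check; it ships with the driver and should list the passed-through GPU:

nvidia-smi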

 

All of this can be easily scripted; feel free to hit me up for it, but there is a good chance you will be ignored since I don't log on that often. The next chapter will cover multiple containers concurrently accessing the GPU.