Find PCIe device location in system

From HPCWIKI

In GPGPU systems for AI, it is common to install several GPUs of the same model on different PCIe buses in one system.

When one or more specific GPUs need to be replaced or serviced, how do you identify the physical location of each GPU in the system?

On Linux, lspci and dmidecode together show which PCI card is plugged into which PCI slot.


For example, if there are two identical GPUs in a system, we can identify where they are located with the following steps.

# list all VGA devices
$ lspci | grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1)
45:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
81:00.0 VGA compatible controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1)

# The NVIDIA GPUs are at bus addresses 01:00.0 and 81:00.0

# To find the PCIe slot for each bus address, run dmidecode and filter its output with grep
$ sudo dmidecode -t slot | grep -e Designation -e PCIE -e Bus -e Type
        Designation: PCIE1
        Type: x16 PCI Express
        Designation: PCIE2
        Type: x16 PCI Express
        Bus Address: 0000:02:00.0
        Designation: PCIE3
        Type: x16 PCI Express
        Designation: PCIE4
        Type: x16 PCI Express
        Designation: PCIE5
        Type: x16 PCI Express
        Bus Address: 0000:01:00.0
        Designation: PCIE6
        Type: x16 PCI Express
        Designation: PCIE7
        Type: x16 PCI Express
        Bus Address: 0000:81:00.0
        Designation: OCU1
        Type: x4 PCI Express
        Designation: OCU2
        Type: x4 PCI Express
        Designation: M2_1
        Type: x4 PCI Express
        Bus Address: 0000:02:00.0
        Designation: M2_2
        Type: x4 PCI Express
# Bus addresses 0000:01:00.0 and 0000:81:00.0 appear under PCIE5 and PCIE7, so those two slots hold the NVIDIA GPUs

# Then consult your system manual to find where slots PCIE5 and PCIE7 are physically located
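
The manual matching of bus addresses to slot designations can also be scripted. The following is a minimal sketch, not part of the original procedure: slot_for_bus is a hypothetical helper name, and it assumes the dmidecode output has the "Designation:" and "Bus Address:" lines shown above, with the bus address listed after its slot designation. It expects the full address including the PCI domain (0000:81:00.0), which lspci prints when run with -D.

```shell
# Hypothetical helper: print the slot designation(s) whose "Bus Address"
# matches the address given as the first argument.
# Reads `dmidecode -t slot` output on standard input.
slot_for_bus() {
    awk -v want="$1" '
        # Remember the most recent slot name (may contain spaces).
        /Designation:/ {
            sub(/^[ \t]*Designation:[ \t]*/, "")
            slot = $0
        }
        # On a bus address line, compare case-insensitively with the request.
        /Bus Address:/ {
            if (tolower($3) == tolower(want)) print slot
        }
    '
}

# Example usage (requires root for dmidecode):
#   sudo dmidecode -t slot | slot_for_bus 0000:81:00.0
```

If more than one slot reports the same bus address, as PCIE2 and M2_1 do in the output above, the helper prints every matching designation, one per line.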
