FAQ: Difference between revisions

From HPCWIKI

Revision as of 15:23, 24 April 2023

Failed to load plugin io.containerd and could not use snapshotter

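Reason: at startup containerd tries to load every built-in snapshotter[1] (its image-layer storage backend) and logs a warning for each one the host cannot use. Several snapshotters ship by default, so some of these messages are expected.

Impact: the warning is informational and does not affect normal operation.

Solution:

1. Disable the snapshotter plugins you do not need in the containerd config file, then restart containerd. For example (keep "cri" enabled on Kubernetes nodes, which depend on it):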

# /etc/containerd/config.toml
disabled_plugins = ["cri", "btrfs"]

2. To use ZFS, mount a ZFS dataset at /var/lib/containerd/io.containerd.snapshotter.v1.zfs

3. To use btrfs, mount a btrfs filesystem at /var/lib/containerd/io.containerd.snapshotter.v1.btrfs

4. To use aufs, load the kernel module with modprobe, as the error log suggests
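
To decide which snapshotters are candidates for disabling, one rough signal is whether the kernel currently knows the matching filesystem at all. The sketch below parses text in the format of /proc/filesystems and lists which of aufs, btrfs and zfs are absent. The helper names are hypothetical, and the check is only an approximation: a filesystem whose module is simply not loaded yet will also be reported as absent.

```python
# Snapshotter filesystems mentioned in this FAQ entry.
SNAPSHOTTER_FILESYSTEMS = ("aufs", "btrfs", "zfs")


def supported_filesystems(text):
    """Return the set of filesystem names listed in /proc/filesystems-style text.

    Each non-empty line is either "nodev<TAB>name" or "<TAB>name";
    the filesystem name is always the last whitespace-separated token.
    """
    names = set()
    for line in text.splitlines():
        tokens = line.split()
        if tokens:
            names.add(tokens[-1])
    return names


def absent_snapshotter_filesystems(text, wanted=SNAPSHOTTER_FILESYSTEMS):
    """Return the entries of `wanted` that the kernel does not currently list."""
    supported = supported_filesystems(text)
    return [name for name in wanted if name not in supported]


def read_proc_filesystems(path="/proc/filesystems"):
    """Read the live file (Linux only) and report absent filesystems."""
    with open(path) as handle:
        return absent_snapshotter_filesystems(handle.read())
```

On a Linux host, `read_proc_filesystems()` returns a list such as the names that could go into disabled_plugins, subject to the caveat above.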

Could not select device driver "" with capabilities: GPU

  • Reason: nvidia-container-toolkit is not installed, or the installed package is corrupt
  • Solution: install or reinstall nvidia-container-toolkit, then restart the Docker daemon
$ distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
    && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
    && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

$sudo systemctl restart docker
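
Before reinstalling, it can help to confirm which pieces are actually missing. The following is a minimal Python sketch that looks for expected binaries on the PATH. The binary names are assumptions (docker for the Docker CLI, and nvidia-container-cli, which is typically installed alongside nvidia-container-toolkit), and `missing_tools` is a hypothetical helper, not part of any NVIDIA or Docker tooling.

```python
import shutil

# Assumed binary names; adjust them to match the packages on your system.
DEFAULT_TOOLS = ("docker", "nvidia-container-cli")


def missing_tools(names=DEFAULT_TOOLS):
    """Return the names from `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected tools found on PATH")
```

A missing binary points to the "not installed" case. If everything is present but the error persists, a reinstall followed by the Docker restart shown above is the next thing to try.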

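PyTorch FAQ

  • How to get the CUDA compute capability of a GPU?
    • $python -c "import torch; print(torch.cuda.get_device_capability())" # compute capability of the current GPU as a (major, minor) tuple
    • $python -c "import torch; print(torch.cuda.get_arch_list())" # architectures this PyTorch build was compiled for, not the capability of the GPU itself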
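
The two values can be compared to see whether the installed PyTorch build lists the GPU's architecture. The sketch below is illustrative only: `arch_name`, `build_lists_arch` and `describe_current_gpu` are hypothetical helpers, and an exact match is a rough indicator, since a build may still run on a GPU whose architecture is not listed by name.

```python
def arch_name(capability):
    """Format a (major, minor) compute capability as an 'sm_XY' string."""
    major, minor = capability
    return f"sm_{major}{minor}"


def build_lists_arch(capability, arch_list):
    """Return True if the exact architecture name appears in arch_list."""
    return arch_name(capability) in arch_list


def describe_current_gpu():
    """Print the current GPU's architecture and whether the build lists it.

    PyTorch is imported here so the helpers above work without it.
    """
    import torch

    if not torch.cuda.is_available():
        print("CUDA is not available")
        return
    capability = torch.cuda.get_device_capability()  # (major, minor) tuple
    archs = torch.cuda.get_arch_list()               # list of 'sm_XY' strings
    print(arch_name(capability), "listed in build:", build_lists_arch(capability, archs))
```

Calling `describe_current_gpu()` on a machine with PyTorch and a CUDA GPU prints one summary line.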

Show List Of Network Cards on Linux

  • lspci command : List all PCI devices.
    • #lspci | egrep -i --color 'network|ethernet'
    • #lspci | egrep -i --color 'network|ethernet|wireless|wi-fi'
  • lshw command : Linux identify Ethernet interfaces and NIC hardware.
    • #lshw -class network
    • $sudo lshw -class network -short
  • dmidecode command : List all hardware data from BIOS.
  • ifconfig command : Legacy network configuration tool from net-tools.
    • $ifconfig -a
  • ip command : Recommended replacement for ifconfig.
    • $ip link show
    • $ip a
    • $ip a show wlp82s0
    • $ip -br -c link show # list all interfaces with link status and MAC address
    • $ip -br -c addr show # same list with IP addresses instead of MAC addresses
  • hwinfo command : Probe Linux for network cards.
    • $sudo hwinfo --network --short
  • ethtool command : See NIC/card driver and settings on Linux.
    • $sudo ethtool -i eno1
    • $sudo ethtool -i enp0s31f6
  • /proc/net/dev file : Pseudo-file with per-interface statistics: bytes and packets received and sent, error and collision counts, and other basic counters.
    • $cat /proc/net/dev
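
The counters in /proc/net/dev are easier to work with once parsed. The sketch below assumes the usual layout: two header lines, then one line per interface with eight receive counters followed by eight transmit counters. The function names and dictionary keys are hypothetical, chosen for illustration.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {interface: counters}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interface line
        fields = rest.split()
        if len(fields) < 16:
            continue  # unexpected layout, skip rather than guess
        values = [int(field) for field in fields]
        stats[name.strip()] = {
            "rx_bytes": values[0],
            "rx_packets": values[1],
            "rx_errs": values[2],
            "tx_bytes": values[8],
            "tx_packets": values[9],
            "tx_errs": values[10],
            "collisions": values[13],
        }
    return stats


def read_net_dev(path="/proc/net/dev"):
    """Read and parse the live file (Linux only)."""
    with open(path) as handle:
        return parse_net_dev(handle.read())
```

On a Linux host, `read_net_dev()["eno1"]["rx_errs"]` would give the receive error count for an interface named eno1, if such an interface exists.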

Failed to set iommu for container: Invalid argument

A VM configured with a vGPU that supports SR-IOV may fail to start with this error. The issue occurs when PCIe AER (Advanced Error Reporting) support is disabled in the server's BIOS settings, so enabling AER in the BIOS should allow the VM to start.

Reference

[1] https://dev.to/napicella/what-is-a-containerd-snapshotters-3eo2