FAQ

From HPCWIKI
Revision as of 15:23, 24 April 2023 by Admin (talk | contribs)

Failed to load plugin io.containerd and could not use snapshotter

Reason: a warning or informational message from a containerd snapshotter[1] (the image storage backend). containerd ships several snapshotter plugins, and each one that cannot be initialized on this host logs a message at startup.

Impact: the warning does not affect normal operation of the system.

Solution

1. Disable the snapshotter plugins you don't need by updating the config file for your system, then restart containerd. For example:

# /etc/containerd/config.toml
disabled_plugins = ["cri", "btrfs"]

2. To use ZFS, mount a ZFS dataset on /var/lib/containerd/io.containerd.snapshotter.v1.zfs

3. To use btrfs, mount a btrfs filesystem on /var/lib/containerd/io.containerd.snapshotter.v1.btrfs

4. To use aufs, load the kernel module with modprobe, as the error log explains
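
After restarting containerd, one way to confirm which snapshotters actually loaded is to filter the output of ctr plugins ls. The helper below is a minimal sketch: the function name snapshotter_status is hypothetical, and it assumes the usual TYPE / ID / PLATFORMS / STATUS column layout of that command.

```shell
# Print "<id> <status>" for every snapshotter plugin found in
# `ctr plugins ls` output, which is read from stdin.
# Assumes the status is the last column of each row.
snapshotter_status() {
  awk '$1 ~ /^io\.containerd\.snapshotter\./ { print $2, $NF }'
}
```

Typical use would be: sudo ctr plugins ls | snapshotter_status. Plugins reported as skip or error are the ones that produce the startup warnings.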

Could not select device driver "" with capabilities: GPU

  • Reason: nvidia-container-toolkit is not installed, or the installed package is corrupted
  • Solution: install or reinstall nvidia-container-toolkit, then restart the Docker daemon
$ distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
    && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
    && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

$ sudo systemctl restart docker
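
Before or after reinstalling, it can help to see which pieces of the GPU container path are present on the host. The following is a rough sketch, not an official checklist: the helper names have_cmd and check_gpu_runtime are hypothetical, and the tool list assumes that nvidia-container-cli is installed alongside nvidia-container-toolkit.

```shell
# Return success if the named command is on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Print one "<tool>: found" or "<tool>: MISSING" line per tool.
# The tool list is an assumption for illustration only.
check_gpu_runtime() {
  for tool in docker nvidia-smi nvidia-container-cli; do
    if have_cmd "$tool"; then
      echo "$tool: found"
    else
      echo "$tool: MISSING"
    fi
  done
}

check_gpu_runtime
```

A MISSING line for nvidia-container-cli would point to the toolkit not being installed; a MISSING line for nvidia-smi would point to a driver problem instead.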

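PyTorch FAQ

  • How to get the CUDA compute capability of a GPU?
    • $ python -c "import torch; print(torch.cuda.get_device_capability())" # (major, minor) of the current GPU
    • $ python -c "import torch; print(torch.cuda.get_arch_list())" # architectures this PyTorch build was compiled for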

Show List Of Network Cards on Linux

  • lspci command : List all PCI devices.
    • $ lspci | grep -E -i --color 'network|ethernet'
    • $ lspci | grep -E -i --color 'network|ethernet|wireless|wi-fi'
  • lshw command : Identify Ethernet interfaces and NIC hardware on Linux.
    • $ sudo lshw -class network
    • $ sudo lshw -class network -short
  • dmidecode command : List all hardware data from BIOS.
  • ifconfig command : Legacy network configuration tool (deprecated in favor of ip).
    • $ ifconfig -a
  • ip command : Recommended replacement for ifconfig.
    • $ ip link show
    • $ ip a
    • $ ip a show wlp82s0 # wlp82s0 is an example interface name
    • $ ip -br -c link show # list all interfaces with link status, MAC address, etc.
    • $ ip -br -c addr show # similar list with IP addresses instead of MAC addresses
  • hwinfo command : Probe Linux for network cards.
    • $ sudo hwinfo --network --short
  • ethtool command : Show NIC driver and settings on Linux.
    • $ sudo ethtool -i eno1
    • $ sudo ethtool -i enp0s31f6
  • /proc/net/dev file : This pseudo-file contains network device status information: the number of received and sent packets, the number of errors and collisions, and other basic statistics.
    • $ cat /proc/net/dev
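
The raw /proc/net/dev table is wide and hard to scan. As a small sketch, the hypothetical helper net_dev_summary below pulls out the byte and packet counters per interface, assuming the standard layout of two header lines followed by one row per interface with eight receive columns and then the transmit columns.

```shell
# Summarize receive/transmit counters per interface.
# Reads /proc/net/dev by default; a different file can be passed as $1.
net_dev_summary() {
  awk 'NR > 2 {
         sub(/:/, " ")   # split "eth0:123" style rows into separate fields
         print $1, "rx_bytes=" $2, "rx_packets=" $3, "tx_bytes=" $10, "tx_packets=" $11
       }' "${1:-/proc/net/dev}"
}
```

Calling net_dev_summary with no argument reads the live counters; running it twice a few seconds apart gives a quick view of which interface is carrying traffic.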

Failed to set iommu for container: Invalid argument

A VM configured with a vGPU that supports SR-IOV may fail to start with this error. The issue occurs when PCIe AER (Advanced Error Reporting) support is disabled in the server's BIOS settings; enabling AER in the BIOS should resolve it.

Reference