FAQ: Difference between revisions
Revision as of 18:37, 24 April 2023
System reboot - by Software or Hardware?
If the kernel.panic system parameter is 0, automatic reboot on panic is turned off; a positive value is the number of seconds the kernel waits before rebooting.
With sysctl -w kernel.panic=0 you turn it off, if it is not already off.
If this is set to 0 and your server still reboots itself, this is most likely a hardware issue. If setting it to 0 stops the automatic rebooting, then the reboot is caused by a watchdog timer or another software issue.
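The interpretation above can be sketched as a small shell helper. This is a minimal sketch and not part of the original FAQ: the function name describe_panic_setting is hypothetical, and the handling of negative values (reboot immediately) is taken from the kernel's sysctl documentation rather than from this page.

```shell
# Hypothetical helper: explain what a given kernel.panic value means.
describe_panic_setting() {
    value=$1
    if [ "$value" -eq 0 ]; then
        echo "no automatic reboot on panic"
    elif [ "$value" -gt 0 ]; then
        echo "reboot ${value}s after panic"
    else
        echo "reboot immediately on panic"
    fi
}

# Read the current value without needing root; /proc/sys/kernel/panic
# is the file behind the kernel.panic sysctl.
if [ -r /proc/sys/kernel/panic ]; then
    describe_panic_setting "$(cat /proc/sys/kernel/panic)"
fi
```

Note that sysctl -w only changes the running kernel. To keep the setting across reboots it is usually also written to /etc/sysctl.conf or to a file under /etc/sysctl.d/.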
Failed to load plugin io.containerd and could not use snapshotter
- Reason: a warning or informational message from the snapshotter[1] (image storage), which supports several backends
- Impact: the warning does not affect the operation of the system
- Solution:
1. Disable the snapshotter plugins you don't need by updating the config file for your system, then restart containerd, for example:
# /etc/containerd/config.toml
disabled_plugins = ["cri", "btrfs"]
2. To use ZFS, mount a ZFS dataset on /var/lib/containerd/io.containerd.snapshotter.v1.zfs
3. To use btrfs, mount a btrfs filesystem on /var/lib/containerd/io.containerd.snapshotter.v1.btrfs
4. For aufs, load the kernel module with modprobe, as explained in the error log
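To see whether the mount points from steps 2 and 3 are already in place, a check along these lines may help. It is a sketch under assumptions: the helper name snapshotter_dir is hypothetical, the directory layout is taken from the paths above, and the mountpoint utility (part of util-linux) is assumed to be available.

```shell
# Hypothetical helper: build the snapshotter directory path used above.
snapshotter_dir() {
    echo "/var/lib/containerd/io.containerd.snapshotter.v1.$1"
}

for name in zfs btrfs; do
    dir=$(snapshotter_dir "$name")
    if ! command -v mountpoint >/dev/null 2>&1; then
        echo "$name: cannot check, mountpoint is not installed"
    elif mountpoint -q "$dir" 2>/dev/null; then
        echo "$name: $dir is a mount point"
    else
        echo "$name: nothing mounted at $dir"
    fi
done
```

A snapshotter whose directory is not a mount point of the matching filesystem is a candidate for the disabled_plugins list if you do not intend to use it.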
Could not select device driver "" with capabilities: GPU
- Reason: nvidia-container-toolkit is not installed, or the installed package is corrupt
- Solution: install or reinstall nvidia-container-toolkit, then restart the docker daemon
$ distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
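Before reinstalling, it can be useful to check whether the package is present at all. The sketch below assumes a Debian/Ubuntu system (as the apt-get commands above do) and queries the package database with dpkg-query; the helper name status_is_installed is hypothetical.

```shell
# Hypothetical helper: succeeds when a dpkg status string means
# the package is fully installed.
status_is_installed() {
    [ "$1" = "install ok installed" ]
}

pkg=nvidia-container-toolkit
if command -v dpkg-query >/dev/null 2>&1; then
    # dpkg-query exits non-zero for unknown packages, so tolerate that.
    status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null || true)
    if status_is_installed "$status"; then
        echo "$pkg: installed"
    else
        echo "$pkg: missing or broken (status: ${status:-none})"
    fi
else
    echo "dpkg-query not found; this check assumes a Debian/Ubuntu system"
fi
```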
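PyTorch FAQ
- How to get the CUDA compute capability of a GPU?
- $ python -c "import torch; print(torch.cuda.get_device_capability())"
- To list the CUDA architectures that the installed PyTorch build was compiled for:
- $ python -c "import torch; print(torch.cuda.get_arch_list())"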
Show List Of Network Cards on Linux
- lspci command: list all PCI devices.
# lspci | egrep -i --color 'network|ethernet'
# lspci | egrep -i --color 'network|ethernet|wireless|wi-fi'
- lshw command: identify Ethernet interfaces and NIC hardware.
# lshw -class network
$ sudo lshw -class network -short
- dmidecode command: list all hardware data from the BIOS.
- ifconfig command: the older network configuration tool.
$ ifconfig -a
- ip command: the recommended replacement for ifconfig.
$ ip link show
$ ip a
$ ip a show wlp82s0
$ ip -br -c link show # list all interfaces with link status, MAC address, etc.
$ ip -br -c addr show # similar list with IP addresses instead of MAC addresses
- hwinfo command: probe for network cards.
$ sudo hwinfo --network --short
- ethtool command: show the NIC driver and settings.
$ sudo ethtool -i eno1
$ sudo ethtool -i enp0s31f6
- /proc/net/dev file: this pseudo-file contains network device status information, namely the number of received and sent packets, the number of errors and collisions, and other basic statistics.
$ cat /proc/net/dev
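The counters in /proc/net/dev can also be summarized with a short script. This is a sketch, assuming the usual layout of two header lines followed by one line per interface, with received bytes as the first counter and transmitted bytes as the ninth; the function name net_dev_summary is hypothetical.

```shell
# Hypothetical helper: print "interface rx_bytes tx_bytes" for each
# interface listed in a /proc/net/dev style file.
net_dev_summary() {
    # Skip the two header lines; turn "eth0:" into "eth0 " so the
    # interface name and the counters split into separate fields.
    awk 'NR > 2 { sub(/:/, " "); print $1, $2, $10 }' "$1"
}

if [ -r /proc/net/dev ]; then
    net_dev_summary /proc/net/dev
fi
```

Taking the file name as an argument keeps the helper usable on a saved copy of the file, for example one collected from another machine.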
Failed to set iommu for container: Invalid argument
A VM configured with a vGPU that supports SR-IOV may fail to start with this error. The issue occurs when PCIe AER (Advanced Error Reporting) support is disabled in the BIOS settings of the server; enabling AER in the BIOS should resolve it.