Nvdia-smi tips and tricks

From HPCWIKI
Jump to navigation Jump to search

Ouput example

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:D8:00.0 Off |                  Off |
| 30%   42C    P8              38W / 450W |      2MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Turn on / off ECC[1]

To Turn off the ECC RAM

# nvidia-smi -g 0 --ecc-config=0
(repeat with -g x for each GPU ID)

To Turn back on ECC RAM

# nvidia-smi -g 0 --ecc-config=1
(repeat with -g x for each GPU ID)

To Reset ECC error[2]

# nvidia-smi -g 0 --reset-ecc-errors=TYPE (0|VOLATILE or 1|AGGREGATE)

Reset GPU

# nvidia-smi -g 0 --gpu-reset

Compute mode

#nvidia-smi -g0 -c <mode number>

number Mode
0 Default
1 Exclusive_Thread
2 Prohibited
3 Exclusive_Process

References