Nvidia-smi tips and tricks

From HPCWIKI

Revision as of 11:16, 26 July 2023

Output example

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:D8:00.0 Off |                  Off |
| 30%   42C    P8              38W / 450W |      2MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
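The table above is meant for reading, not for scripts. A minimal sketch of a script-friendly alternative, assuming the status is requested in CSV form with nvidia-smi --query-gpu=index,name,temperature.gpu,memory.used,memory.total,compute_mode --format=csv,noheader,nounits. The helper name summarize_gpu is hypothetical, and the sample line is hand-written from the table above rather than captured output.

```shell
#!/bin/sh
# Sketch: turn one CSV status line into a short summary.
# Field order assumed: index, name, temperature.gpu, memory.used,
# memory.total, compute_mode (as requested in the --query-gpu list).
# summarize_gpu is a hypothetical helper, not part of nvidia-smi.
summarize_gpu() {
    awk -F', ' '{
        printf "GPU %s (%s): %s C, %s/%s MiB used, compute mode %s\n",
               $1, $2, $3, $4, $5, $6
    }'
}

# Hand-written sample line built from the table above (not real output).
echo "0, NVIDIA GeForce RTX 4090, 42, 2, 24564, Default" | summarize_gpu
```

On a machine with a GPU, the query command itself can be piped into summarize_gpu in place of the echo line.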

Turn on / off ECC[1]

To turn off ECC RAM

# nvidia-smi -g 0 --ecc-config=0
(repeat with -g x for each GPU ID)

To turn ECC RAM back on

# nvidia-smi -g 0 --ecc-config=1
(repeat with -g x for each GPU ID)
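Repeating the command by hand for each GPU ID can be sketched as a small loop. This is a dry-run sketch only: the helper name ecc_cmds is hypothetical, and it prints the commands instead of running them so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: print the ECC toggle command for each GPU ID given as argument.
# ecc_cmds is a hypothetical helper; it only prints, it does not run anything.
ecc_cmds() {
    mode="$1"   # 0 = ECC off, 1 = ECC on
    shift       # remaining arguments are GPU IDs
    for id in "$@"; do
        printf 'nvidia-smi -g %s --ecc-config=%s\n' "$id" "$mode"
    done
}

# Dry run: commands to turn ECC off on GPUs 0 and 1.
ecc_cmds 0 0 1
```

To apply the printed commands, they could be piped to sh as root, for example ecc_cmds 0 0 1 | sh. The GPU IDs present on a host can be listed with nvidia-smi --query-gpu=index --format=csv,noheader.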

To reset the ECC error counters[2]

# nvidia-smi -g 0 --reset-ecc-errors=TYPE
(TYPE is 0|VOLATILE or 1|AGGREGATE)

Reset GPU

# nvidia-smi -g 0 --gpu-reset

Compute mode

# nvidia-smi -g 0 -c <mode number>

Number  Mode
0       Default
1       Exclusive_Thread
2       Prohibited
3       Exclusive_Process
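The number-to-mode mapping in the table can be sketched as a lookup that builds the command from a mode name. The helper name compute_mode_cmd is hypothetical; it only prints the command, and the mapping is copied from the table above without checking it against a particular driver version.

```shell
#!/bin/sh
# Sketch: build the compute-mode command from a GPU ID and a mode name.
# compute_mode_cmd is a hypothetical helper; the mapping mirrors the table above.
compute_mode_cmd() {
    gpu="$1"
    case "$2" in
        Default)           n=0 ;;
        Exclusive_Thread)  n=1 ;;
        Prohibited)        n=2 ;;
        Exclusive_Process) n=3 ;;
        *) echo "unknown compute mode: $2" >&2; return 1 ;;
    esac
    echo "nvidia-smi -g $gpu -c $n"
}

# Dry run: command to put GPU 0 into Exclusive_Process mode.
compute_mode_cmd 0 Exclusive_Process
```

An unknown mode name makes the helper return a non-zero status, so a calling script can stop before running anything.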

References

[2] https://developer.download.nvidia.com/compute/DCGM/docs/nvidia-smi-367.38.pdf