Nvidia-smi tips and tricks
Latest revision as of 10:57, 27 August 2024
Output example
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05 Driver Version: 535.86.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:D8:00.0 Off | Off |
| 30% 42C P8 38W / 450W | 2MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Property name | Annotation | Meaning |
---|---|---|
Performance State | Perf | States range from P0 (maximum performance) to P12 (minimum performance). |
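The Perf value can also be read without parsing the table layout. The following is a minimal sketch, assuming the installed driver supports the `pstate` field of `--query-gpu`; `pstate_number` and `show_pstates` are hypothetical helper names, not part of nvidia-smi.

```shell
# Hypothetical helper: convert a performance state such as "P8" to its number,
# so that states can be compared numerically (lower means higher performance).
pstate_number() {
    case "$1" in
        P[0-9]|P[0-9][0-9]) printf '%s\n' "${1#P}" ;;
        *) echo "unexpected performance state: $1" >&2; return 1 ;;
    esac
}

# Print "index, name, pstate" for every GPU (assumes nvidia-smi is on PATH).
show_pstates() {
    nvidia-smi --query-gpu=index,name,pstate --format=csv,noheader
}
```

For the GPU in the output example, `pstate_number P8` prints 8.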
Disable or enable GPU
# Disable the GPU, where xx is the PCIe bus number reported by lspci
$ sudo nvidia-smi -i 0000:xx:00.0 -pm 0
$ sudo nvidia-smi drain -p 0000:xx:00.0 -m 1
# The device is still visible with lspci after running the commands above.
# Enable the GPU
$ sudo nvidia-smi drain -p 0000:xx:00.0 -m 0
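By default lspci prints addresses in the short form d8:00.0, while the commands above use the domain-qualified form 0000:d8:00.0. As a small sketch, a hypothetical helper (`nvsmi_bus_id`, not part of either tool) can add the domain when it is missing. It assumes the GPU sits in PCI domain 0000, which `lspci -D` can confirm.

```shell
# Hypothetical helper: turn an lspci address into the form used with nvidia-smi.
# Assumes PCI domain 0000 when lspci printed no domain.
nvsmi_bus_id() {
    case "$1" in
        *:*:*) printf '%s\n' "$1" ;;        # already has a domain, e.g. 0000:d8:00.0
        *:*)   printf '0000:%s\n' "$1" ;;   # short form, e.g. d8:00.0
        *)     echo "not a PCI address: $1" >&2; return 1 ;;
    esac
}
```

It could then be used as `sudo nvidia-smi drain -p "$(nvsmi_bus_id d8:00.0)" -m 1`.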
--gpus options
To use the --gpus option with docker on Ubuntu, the nvidia-container-toolkit package must be installed.
Without it, running docker with the --gpus option fails with the following error:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
Installing the nvidia-container-toolkit package resolves the issue:
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
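One way to check the result is to run nvidia-smi inside a throwaway container. The sketch below assumes the `ubuntu` image is acceptable for the test and that the docker daemon has been restarted since the installation; `is_missing_gpu_driver` and `check_docker_gpu` are hypothetical helper names.

```shell
# Hypothetical helper: succeed if the text on stdin contains the error docker
# prints when it cannot find a GPU device driver.
is_missing_gpu_driver() {
    grep -q 'could not select device driver'
}

# Hypothetical helper: run nvidia-smi in a throwaway container and report
# whether the --gpus option works on this host.
check_docker_gpu() {
    output=$(docker run --rm --gpus all ubuntu nvidia-smi 2>&1)
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "GPU access from docker works"
    elif printf '%s\n' "$output" | is_missing_gpu_driver; then
        echo "nvidia-container-toolkit appears to be missing" >&2
        return 1
    else
        printf '%s\n' "$output" >&2
        return "$status"
    fi
}
```

Calling `check_docker_gpu` separates the missing-toolkit case from other docker failures, whose output is passed through unchanged.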
Turn on / off ECC[1]
To turn off ECC RAM
# nvidia-smi -g 0 --ecc-config=0 (repeat with -g x for each GPU ID)
To turn ECC RAM back on
# nvidia-smi -g 0 --ecc-config=1 (repeat with -g x for each GPU ID)
To reset ECC error counts[2]
# nvidia-smi -g 0 --reset-ecc-errors=TYPE (TYPE is 0|VOLATILE or 1|AGGREGATE)
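Repeating the command for each GPU ID can be scripted. The sketch below only prints the commands so they can be reviewed before running; `ecc_commands` is a hypothetical helper, and the GPU IDs are passed in explicitly rather than discovered.

```shell
# Hypothetical helper: print one nvidia-smi command per GPU ID.
# $1 is the ECC setting (0 = off, 1 = on); the remaining arguments are GPU IDs.
ecc_commands() {
    setting=$1
    shift
    for id in "$@"; do
        printf 'nvidia-smi -g %s --ecc-config=%s\n' "$id" "$setting"
    done
}
```

For example, `ecc_commands 0 0 1` prints the two commands that turn ECC off on GPUs 0 and 1. The ID list could be taken from `nvidia-smi --query-gpu=index --format=csv,noheader`. According to the nvidia-smi documentation, a change to the ECC configuration takes effect after the next reboot.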
Reset GPU
# nvidia-smi -g 0 --gpu-reset
GPU mode
The mode of the GPU is established at power-on from settings stored in the GPU's non-volatile memory.
The gpumodeswitch tool changes the mode of the GPU by updating those non-volatile memory settings.
Compute mode is a configuration optimized for high-performance computing (HPC) applications. It can cause compatibility problems with operating systems and hypervisors when the GPU is used primarily as a graphics device.
Graphic mode
Compute mode
# nvidia-smi -g 0 -c <mode number>
Number | Mode | Meaning |
---|---|---|
0 | Default | The GPU can be shared by several jobs. |
1 | Exclusive_Thread | Only one job may run on the GPU, and only one thread may use it at a time. |
2 | Prohibited | No job is allowed to run on the GPU. |
3 | Exclusive_Process | Only one job may run on the GPU, and only one process may use it at a time. |
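Because the command takes the mode number, a small lookup keeps scripts readable. This is a minimal sketch using the numbers from the table; `compute_mode_number` is a hypothetical helper, not an nvidia-smi feature.

```shell
# Hypothetical helper: map a compute mode name from the table to its number.
compute_mode_number() {
    case "$1" in
        Default)           echo 0 ;;
        Exclusive_Thread)  echo 1 ;;
        Prohibited)        echo 2 ;;
        Exclusive_Process) echo 3 ;;
        *) echo "unknown compute mode: $1" >&2; return 1 ;;
    esac
}
```

It could be used as `sudo nvidia-smi -g 0 -c "$(compute_mode_number Exclusive_Process)"`. The current mode is shown in the Compute M. column of the output example, or with `nvidia-smi --query-gpu=compute_mode --format=csv,noheader`.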
References
[1] https://thelinuxcluster.com/2013/07/24/turning-off-and-on-ecc-ram-for-nvidia-gp-gpu-cards/