NVIDIA Multi-Instance GPU

NVIDIA introduced MIG (Multi-Instance GPU) with the Ampere architecture.

The MIG feature allows a single GPU to be partitioned into multiple fully isolated GPU instances that are efficiently sized per use-case, specifically smaller use-cases that only require a subset of GPU resources.


MIG ensures that each instance's processors have separate and isolated paths through the entire memory system - the on-chip crossbar ports, L2 cache banks, memory controllers, and DRAM address busses are all assigned uniquely to an individual instance, providing enhanced isolation of GPU resources.

[Image: nvidia-mig-example]


The benefits of MIG on MIG-capable GPUs are:

  • Physical allocation of resources used by parallel GPU workloads - Secure multi-tenant environments with isolation and predictable QoS
  • Versatile profiles with dynamic configuration - Maximized utilization by configuring for specific workloads
  • CUDA programming model unchanged

MIG Terminology

The MIG feature allows one or more GPU instances to be allocated within a GPU, so that a single GPU appears as if it were many.

GPU instance (GI)

is a fully isolated collection of physical GPU resources, such as GPU memory and GPU SMs. It can contain one or more GPU compute instances.

Compute instance (CI)

is an isolated collection of GPU SMs (CUDA cores) that belongs to a single GPU instance. It provides partial isolation within the GPU instance for compute resources and independent workload scheduling.

MIG device

is made up of a GPU instance and a compute instance. MIG devices are assigned GPU UUIDs and can be displayed with

$ nvidia-smi -L
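
As a minimal sketch, a CUDA workload can be pinned to a single MIG device by passing that device's UUID through CUDA_VISIBLE_DEVICES (the UUID is a placeholder copied from the nvidia-smi -L output, ./my_cuda_app is a hypothetical CUDA binary, and the exact UUID format depends on the driver version):

$ nvidia-smi -L                                          # note the UUID of the target MIG device (MIG-...)
$ CUDA_VISIBLE_DEVICES=<MIG-device-UUID> ./my_cuda_app   # hypothetical workload pinned to that single MIG device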

To use the MIG feature effectively, clock speed, MIG profile configuration, and other settings should be optimized based on the expected MIG use-case. There is no 'best' or 'optimal' combination of profiles and configurations.

GPU Slice

is the smallest fraction of the GPU that combines a single GPU memory slice and a single GPU SM slice.

GPU Memory Slice

is the smallest fraction of the GPU's memory, including the corresponding memory controllers and cache. Generally it is roughly 1/8 of the total GPU memory resources.

GPU SM Slice

is the smallest fraction of the GPU's SMs. Generally it is roughly 1/7 of the total number of SMs.

GPU Engine

is what executes work on the GPU. Engines are scheduled independently and execute within a GPU context. Different engines are responsible for different actions, such as the compute engine and the copy engine.

MIG Configuration (profile)

A GPU reset is required to enable or disable MIG mode. This is a one-time operation per GPU, and the setting persists across system reboots.
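
As a small sketch, the current and pending MIG mode of every GPU in the node can be checked with the standard nvidia-smi query fields mig.mode.current and mig.mode.pending:

$ nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending --format=csv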


The number of slices that a GI can be created with is not arbitrary. The NVIDIA driver APIs provide a number of “GPU Instance Profiles”, and users can create GIs by specifying one of these profiles. In total, 18 combinations are possible.

$ nvidia-smi mig -lgip
+-----------------------------------------------------------------------------+
| GPU instance profiles:                                                      |
| GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                              Free/Total   GiB              CE    JPEG  OFA  |
|=============================================================================|
|   0  MIG 1g.5gb        19     7/7        4.75       No     14     0     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.5gb+me     20     1/1        4.75       No     14     1     0   |
|                                                             1     1     1   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.10gb       15     4/4        9.62       No     14     1     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 2g.10gb       14     3/3        9.62       No     28     1     0   |
|                                                             2     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 3g.20gb        9     2/2        19.50      No     42     2     0   |
|                                                             3     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 4g.20gb        5     1/1        19.50      No     56     2     0   |
|                                                             4     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 7g.40gb        0     1/1        39.25      No     98     5     0   |
|                                                             7     1     1   |
+-----------------------------------------------------------------------------+


Enable MIG Mode

When MIG is enabled on the GPU, depending on the GPU product, the driver will attempt to reset the GPU so that MIG mode can take effect.                                                     

$ sudo nvidia-smi -i 0 -mig 1
Enabled MIG Mode for GPU 00000000:36:00.0
All done.

$ nvidia-smi -i 0 --query-gpu=pci.bus_id,mig.mode.current --format=csv
pci.bus_id, mig.mode.current 
00000000:36:00.0, Enabled

** Refer to the MIG user guide for the various post-processing steps required for a MIG configuration change to take effect.
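
MIG mode can be turned off again for the same GPU with the symmetric command below (a sketch assuming GPU 0; any GPU instances and compute instances created on it should be destroyed first, and the same GPU reset behaviour applies):

$ sudo nvidia-smi -i 0 -mig 0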

List GPU Instance Placements

List the possible placements available using the following command

$ nvidia-smi mig -lgipp
GPU  0 Profile ID 19 Placements: {0,1,2,3,4,5,6}:1 #seven instances of 1g.5gb (profile ID 19)
GPU  0 Profile ID 20 Placements: {0,1,2,3,4,5,6}:1 
GPU  0 Profile ID 15 Placements: {0,2,4,6}:2
GPU  0 Profile ID 14 Placements: {0,2,4}:2
GPU  0 Profile ID  9 Placements: {0,4}:4      #create two instances of type 3g.20gb (profile ID 9)
GPU  0 Profile ID  5 Placement : {0}:4
GPU  0 Profile ID  0 Placement : {0}:8

Creating GPU Instances

Without creating GPU instances (and corresponding compute instances), CUDA workloads cannot be run on the GPU.

Create GPU instances using the -cgi option. One of three options can be used to specify the instance profiles to be created:

  1. Profile ID (e.g. 9, 14, 5)
  2. Short name of the profile (e.g. 3g.20gb)
  3. Full profile name of the instance (e.g. MIG 3g.20gb)                               

Once the GPU instances are created, one needs to create the corresponding Compute Instances (CIs) by using the -C option.

For example, create two GPU instances (of type 3g.20gb), with each GPU instance having half of the available compute and memory capacity:

$ sudo nvidia-smi mig -cgi 9,3g.20gb -C
Successfully created GPU instance ID  2 on GPU  0 using profile MIG 3g.20gb (ID  9)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  2 using profile MIG 3g.20gb (ID  2)
Successfully created GPU instance ID  1 on GPU  0 using profile MIG 3g.20gb (ID  9)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  1 using profile MIG 3g.20gb (ID  2)


$ sudo nvidia-smi mig -lgi
+----------------------------------------------------+
| GPU instances:                                     |
| GPU   Name          Profile  Instance   Placement  |
|                       ID       ID       Start:Size |
|====================================================|
|   0  MIG 3g.20gb       9        1          4:4     |
+----------------------------------------------------+
|   0  MIG 3g.20gb       9        2          0:4     |
+----------------------------------------------------+
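
The compute instances created by the -C option can be listed in the same way (a sketch, assuming the two 3g.20gb instances created above):

$ sudo nvidia-smi mig -lci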

Now verify that the GIs and the corresponding CIs are created:

$ nvidia-smi
+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |                      | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    1   0   0  |     11MiB / 20224MiB | 42      0 |  3   0    2    0    0 |
+------------------+----------------------+-----------+-----------------------+
|  0    2   0   1  |     11MiB / 20096MiB | 42      0 |  3   0    2    0    0 |
+------------------+----------------------+-----------+-----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+  
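
When the partitioning is no longer needed, the instances can be torn down again, compute instances first and then GPU instances. A minimal sketch that removes every CI and GI on the node:

$ sudo nvidia-smi mig -dci && sudo nvidia-smi mig -dgi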

Limitations

  • Graphics contexts are not supported
  • P2P is not available (no NVLink)
  • Each VM instance requires vGPU

MIG-supported GPUs[1]

Product    Architecture   Microarchitecture  Compute Capability  Memory Size  Max Number of Instances
H100-SXM5  Hopper         GH100              9.0                 80GB         7
H100-PCIE  Hopper         GH100              9.0                 80GB         7
A100-SXM4  NVIDIA Ampere  GA100              8.0                 40GB         7
A100-SXM4  NVIDIA Ampere  GA100              8.0                 80GB         7
A100-PCIE  NVIDIA Ampere  GA100              8.0                 40GB         7
A100-PCIE  NVIDIA Ampere  GA100              8.0                 80GB         7
A30        NVIDIA Ampere  GA100              8.0                 24GB         4

References