Revision as of 10:02, 22 May 2023
NVIDIA Multi-Instance GPU
NVIDIA introduced MIG (Multi-Instance GPU) with the Ampere architecture.

The MIG feature allows a single GPU to be partitioned into multiple fully isolated virtual GPU devices that are efficiently sized per use-case, specifically smaller use-cases that only require a subset of GPU resources.

MIG ensures that each instance's processors have separate and isolated paths through the entire memory system: the on-chip crossbar ports, L2 cache banks, memory controllers, and DRAM address busses are all assigned uniquely to an individual instance, providing enhanced isolation of GPU resources.
The benefits of MIG on a MIG-capable GPU are:
* Physical allocation of resources used by parallel GPU workloads
* Secure multi-tenant environments with isolation and predictable QoS
* Versatile profiles with dynamic configuration
* Maximized utilization by configuring for specific workloads
* An unchanged CUDA programming model
== MIG Terminology ==
The MIG feature allows one or more GPU instances to be allocated within a GPU, so that a single GPU appears as if it were many.
=== GPU instance (GI) ===
A GPU instance is a fully isolated collection of physical GPU resources such as GPU memory and GPU SMs. It can contain one or more GPU compute instances.
=== Compute instance (CI) ===
A compute instance is an isolated collection of GPU SMs (CUDA cores) belonging to a single GPU instance. It provides partial isolation within the GPU instance for compute resources and independent workload scheduling.
=== MIG device ===
A MIG device is made up of a GPU instance and a compute instance. MIG devices are assigned GPU UUIDs and can be displayed with
 $ nvidia-smi -L
To use the MIG feature effectively, clock speed, MIG profile configuration, and other settings should be optimized based on the expected MIG use-case. There is no single 'best' or 'optimal' combination of profiles and configurations.
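As a quick illustration, a CUDA workload can be restricted to a single MIG device by exporting that device's UUID (the UUID below is a made-up placeholder; substitute one reported by nvidia-smi -L on your own system, as the exact UUID format depends on the driver version):

```shell
# Restrict CUDA applications launched from this shell to one MIG device.
# The UUID is a hypothetical placeholder; copy the real one from `nvidia-smi -L`.
export CUDA_VISIBLE_DEVICES=MIG-GPU-5c89852c-d268-c3f3-1b07-005d5ae1dc3f/1/0
# Applications started from this shell will now see only that MIG device.
echo "$CUDA_VISIBLE_DEVICES"
```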
=== GPU Slice ===
A GPU slice is the smallest fraction of the GPU that combines a single GPU memory slice and a single GPU SM slice.
=== GPU Memory Slice ===
A GPU memory slice is the smallest fraction of the GPU's memory, including the corresponding memory controllers and cache. It is generally roughly 1/8 of the total GPU memory resources.
=== GPU SM Slice ===
A GPU SM slice is the smallest fraction of the GPU's SMs. It is generally roughly 1/7 of the total number of SMs.
=== GPU Engine ===
A GPU engine is what executes work on the GPU. Engines are scheduled independently and operate within a GPU context; different engines are responsible for different actions, such as the compute engine or the copy engine.
== MIG Configuration (profile) ==
A GPU reset is required to enable or disable MIG mode. This is a one-time operation per GPU, and the setting persists across system reboots.
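A minimal sketch of enabling MIG mode, assuming GPU index 0 and administrator privileges (disable again with -mig 0):

```shell
# Enable MIG mode on GPU 0; requires root privileges.
sudo nvidia-smi -i 0 -mig 1
# Check that MIG mode is now reported as Enabled.
nvidia-smi -i 0 --query-gpu=mig.mode.current --format=csv
```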
The number of slices with which a GI can be created is not arbitrary. The [[NVIDIA driver]] APIs provide a number of "GPU Instance Profiles", and users create GIs by specifying one of these profiles; 18 combinations are possible. The available profiles can be listed with
 $ nvidia-smi mig --list-gpu-instance-profiles
When MIG is enabled on the GPU, depending on the GPU product, the driver will attempt to reset the GPU so that MIG mode can take effect.
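The profile-based workflow above can be sketched as follows (a sketch assuming an A100-40GB at GPU index 0; profile ID 9 corresponds to the 3g.20gb profile on that product and is used purely as an example):

```shell
# List the GPU instance profiles supported on GPU 0.
nvidia-smi mig -i 0 -lgip
# Create a GPU instance from a listed profile ID (here 9 = 3g.20gb),
# together with its default compute instance (-C).
sudo nvidia-smi mig -i 0 -cgi 9 -C
# The new MIG device now appears, with its UUID, in the device list.
nvidia-smi -L
```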
=== Limitations ===
* Graphics contexts are not supported
* P2P is not available (no NVLink)
* Each VM instance requires vGPU
== MIG supported GPUs<ref>https://docs.nvidia.com/datacenter/tesla/mig-user-guide/</ref> ==
{| class="wikitable"
! Product !! Architecture !! Microarchitecture !! Compute Capability !! Memory Size !! Max Number of Instances
|-
| H100-SXM5 || Hopper || GH100 || 9.0 || 80GB || 7
|-
| H100-[[PCIe|PCIE]] || Hopper || GH100 || 9.0 || 80GB || 7
|-
| A100-SXM4 || NVIDIA Ampere || GA100 || 8.0 || 40GB || 7
|-
| A100-SXM4 || NVIDIA Ampere || GA100 || 8.0 || 80GB || 7
|-
| A100-PCIE || NVIDIA Ampere || GA100 || 8.0 || 40GB || 7
|-
| A100-PCIE || NVIDIA Ampere || GA100 || 8.0 || 80GB || 7
|-
| A30 || NVIDIA Ampere || GA100 || 8.0 || 24GB || 4
|}
== References ==
<references />