MIG
Revision as of 09:42, 22 May 2023
NVIDIA Multi-Instance GPU
NVIDIA introduced MIG (Multi-Instance GPU) with the Ampere architecture.
The MIG feature allows a single GPU to be partitioned into multiple fully isolated virtual GPU devices that are efficiently sized per use case, specifically smaller use cases that only require a subset of GPU resources.
MIG ensures that each instance's processors have separate and isolated paths through the entire memory system: the on-chip crossbar ports, L2 cache banks, memory controllers, and DRAM address buses are all assigned uniquely to an individual instance, which strengthens the isolation of GPU resources between instances.
The benefits of MIG on MIG-capable GPUs are:
- Physical allocation of resources used by parallel GPU workloads: secure multi-tenant environments with isolation and predictable QoS
- Versatile profiles with dynamic configuration: maximized utilization by configuring for specific workloads
- CUDA programming model unchanged
MIG Terminology
The MIG feature allows one or more GPU instances to be allocated within a GPU, so that a single GPU appears as if it were many.
GPU instance (GI)
A GPU instance is a fully isolated collection of physical GPU resources, such as GPU memory and GPU streaming multiprocessors (SMs). It can contain one or more compute instances.
Compute instance (CI)
A compute instance is an isolated collection of GPU SMs (CUDA cores) that belongs to a single GPU instance. It provides partial isolation of compute resources within the GPU instance, along with independent workload scheduling.
MIG device
A MIG device is made up of a GPU instance and a compute instance.
To use the MIG feature effectively, the clock speed, MIG profile configuration, and other settings should be optimized based on the expected MIG use case. There is no single 'best' or 'optimal' combination of profiles and configurations.
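The relationship between GPU instances, compute instances, and MIG devices can be sketched as a small data model. This is only an illustration of the terminology, not NVIDIA's API: the class names, field names, and the `add_compute_instance` and `mig_devices` helpers are all hypothetical, and the slice counts in the usage lines are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeInstance:
    ci_id: int
    sm_slices: int  # SM slices taken from the parent GPU instance


@dataclass
class GpuInstance:
    gi_id: int
    memory_slices: int  # memory is shared by all compute instances in this GI
    sm_slices: int
    compute_instances: list = field(default_factory=list)

    def add_compute_instance(self, sm_slices: int) -> ComputeInstance:
        """Carve a compute instance out of this GPU instance's SM slices."""
        used = sum(ci.sm_slices for ci in self.compute_instances)
        if sm_slices < 1 or used + sm_slices > self.sm_slices:
            raise ValueError("not enough free SM slices in this GPU instance")
        ci = ComputeInstance(ci_id=len(self.compute_instances), sm_slices=sm_slices)
        self.compute_instances.append(ci)
        return ci

    def mig_devices(self):
        """A MIG device is modelled here as a (GPU instance, compute instance) pair."""
        return [(self.gi_id, ci.ci_id) for ci in self.compute_instances]


# Hypothetical GPU instance with 4 memory slices and 3 SM slices.
gi = GpuInstance(gi_id=0, memory_slices=4, sm_slices=3)
gi.add_compute_instance(1)
gi.add_compute_instance(2)
print(gi.mig_devices())  # [(0, 0), (0, 1)]
```

In this sketch, both compute instances share the memory of their parent GPU instance, which mirrors the partial isolation described for compute instances.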
GPU Slice
A GPU slice is the smallest fraction of the GPU. It combines a single GPU memory slice and a single GPU SM slice.
GPU Memory Slice
A GPU memory slice is the smallest fraction of the GPU's memory, including the corresponding memory controllers and cache. In general, it is roughly 1/8 of the total GPU memory resources.
GPU SM Slice
A GPU SM slice is the smallest fraction of the GPU's SMs. In general, it is roughly 1/7 of the total number of SMs.
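The 1/8 and 1/7 fractions above can be turned into a rough estimate of what a given number of slices provides. The following is a minimal sketch: the function name `approx_resources` and the totals in the usage line (40 GB of memory, 98 SMs) are illustrative assumptions, and the real sizes of a MIG profile are fixed by the driver and may differ from this arithmetic.

```python
MEMORY_SLICES_PER_GPU = 8  # a memory slice is roughly 1/8 of GPU memory
SM_SLICES_PER_GPU = 7      # an SM slice is roughly 1/7 of the SMs


def approx_resources(total_memory_gb, total_sms, memory_slices, sm_slices):
    """Estimate the memory (GB) and SM count behind a number of slices."""
    if not 1 <= memory_slices <= MEMORY_SLICES_PER_GPU:
        raise ValueError("memory_slices must be between 1 and 8")
    if not 1 <= sm_slices <= SM_SLICES_PER_GPU:
        raise ValueError("sm_slices must be between 1 and 7")
    memory_gb = total_memory_gb * memory_slices / MEMORY_SLICES_PER_GPU
    sms = total_sms * sm_slices // SM_SLICES_PER_GPU  # whole SMs only
    return memory_gb, sms


# Illustrative totals, not taken from the text: 40 GB of memory and 98 SMs.
print(approx_resources(40, 98, memory_slices=1, sm_slices=1))  # (5.0, 14)
```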
GPU engine
A GPU engine is what executes work on the GPU. Engines are scheduled independently and work in a GPU context. Different engines are responsible for different actions, for example the compute engine or the copy engine.