NVIDIA GPU
NVIDIA GPU Architecture
nvcc sm flags and what they are used for: when compiling with NVCC,
- the arch flag (-arch) specifies the name of the NVIDIA GPU architecture that the CUDA files will be compiled for.
- gencode flags (-gencode) allow additional PTX generations to be embedded and can be repeated many times for different architectures.
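
To make these flags concrete, here is a minimal sketch: a trivial kernel in a file assumed to be named kernel.cu, with typical nvcc invocations shown in the header comment. The file name and the architecture values (sm_70, sm_80, sm_86) are illustrative assumptions, not part of this page; pick the values from the table below for your own GPUs.

```cuda
// kernel.cu -- illustrative sketch only; file name and flag choices are assumptions.
//
// Single-architecture build (SASS for sm_86 only):
//   nvcc -arch=sm_86 kernel.cu -o kernel
//
// Fat binary covering several architectures by repeating -gencode;
// the last entry also embeds PTX so newer GPUs can JIT-compile it:
//   nvcc -gencode arch=compute_70,code=sm_70 \
//        -gencode arch=compute_80,code=sm_80 \
//        -gencode arch=compute_80,code=compute_80 \
//        kernel.cu -o kernel

#include <cstdio>

__global__ void hello()
{
    // Device-side printf works on every architecture in the table from Fermi (sm_20) onward.
    printf("Hello from thread %d\n", threadIdx.x);
}

int main()
{
    hello<<<1, 4>>>();        // launch one block of four threads
    cudaDeviceSynchronize();  // wait for the kernel and flush its printf output
    return 0;
}
```

Embedding a PTX target (code=compute_80) in the last -gencode entry lets GPUs newer than any listed SASS target JIT-compile the kernel at load time.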
Matching CUDA arch and CUDA gencode for various NVIDIA architectures
| Series | Architecture (--arch) | CUDA gencode (--sm) | Compute Capability | Notable Models | Key Features |
|---|---|---|---|---|---|
| Tesla | Tesla | | 1.0, 1.1, 2.0, 2.1 | C1060, M2050, K80, P100, V100, A100 | First dedicated GPGPU series |
| Fermi | Fermi | sm_20 | 3.0, 3.1 | GTX 400, GTX 500, Tesla 20-series, Quadro 4000/5000 | First to feature CUDA cores and support for ECC memory |
| Kepler | Kepler | sm_30, sm_35, sm_37 | 3.2, 3.5, 3.7 | GTX 600, GTX 700, Tesla K-series, Quadro K-series | First to feature Dynamic Parallelism and Hyper-Q |
| Maxwell | Maxwell | sm_50, sm_52, sm_53 | 5.0, 5.2 | GTX 900, GTX 1000, Quadro M-series | First to support VR and 4K displays |
| Pascal | Pascal | sm_60, sm_61, sm_62 | 6.0, 6.1, 6.2 | GTX 1000, Quadro P-series | First to support simultaneous multi-projection |
| Volta | Volta | sm_70, sm_72 (Xavier) | 7.0, 7.2, 7.5 | Titan V, Tesla V100, Quadro GV100 | First to feature Tensor Cores and NVLink 2.0 |
| Turing | Turing | sm_75 | 7.5, 7.6 | RTX 2000, GTX 1600, Quadro RTX | First to feature Ray Tracing Cores and RTX technology |
| Ampere | Ampere | sm_80, sm_86, sm_87 (Orin) | 8.0, 8.6 | RTX 3000, A-series | Features third-generation Tensor Cores and more |
| Lovelace | Ada Lovelace[1] | sm_89 | 8.7, 8.9 | GeForce RTX 4070 Ti (AD104), GeForce RTX 4080 (AD103), GeForce RTX 4090 (AD102), Nvidia RTX 6000 Ada Generation (AD102, formerly Quadro), Nvidia L40 (AD102, formerly Tesla) | New eighth-generation NVIDIA Encoders (NVENC) with AV1 encoding, enabling a raft of new possibilities for streamers, broadcasters, and video callers; AV1 is 40% more efficient than H.264 and allows users streaming at 1080p to increase their stream resolution to 1440p at the same bitrate and quality |
| Hopper | | sm_90, sm_90a (Thor) | | | |

[1] https://en.wikipedia.org/wiki/Ada_Lovelace_(microarchitecture)
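
A quick way to see which row of this table applies to a given machine is to query the compute capability at runtime. The sketch below (an assumed standalone file, not part of the original page) uses the CUDA runtime API for this.

```cuda
// devinfo.cu -- minimal sketch; compile with: nvcc devinfo.cu -o devinfo
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA-capable device detected\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // prop.major / prop.minor correspond to the "Compute Capability"
        // column above, e.g. 8.6 on many RTX 3000-series (Ampere) cards.
        printf("Device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

The reported major.minor pair indicates which sm_XY code (or compatible PTX) a binary must contain in order to run on that device.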
NVIDIA GPU Models
| Model | Architecture | CUDA Cores | Tensor Cores | RT Cores | Memory Size | Memory Type | Memory Bandwidth | TDP | Launch Date | 
|---|---|---|---|---|---|---|---|---|---|
| Tesla C870 | Tesla | 128 | No | No | 1.5 GB GDDR3 | GDDR3 | 76.8 GB/s | 105W | Jun 2006 | 
| Tesla C1060 | Tesla | 240 | No | No | 4 GB GDDR3 | GDDR3 | 102 GB/s | 238W | Dec 2008 | 
| Tesla M1060 | Tesla | 240 | No | No | 4 GB GDDR3 | GDDR3 | 102 GB/s | 225W | Dec 2008 | 
| Tesla M2050 | Fermi | 448 | No | No | 3 GB GDDR5 | GDDR5 | 148 GB/s | 225W | May 2010 | 
| Tesla M2070 | Fermi | 448 | No | No | 6 GB GDDR5 | GDDR5 | 150 GB/s | 225W | May 2010 | 
| Tesla K10 | Kepler | 3072 | No | No | 8 GB GDDR5 | GDDR5 | 320 GB/s | 225W | May 2012 | 
| Tesla K20 | Kepler | 2496 | No | No | 5/6 GB GDDR5 | GDDR5 | 208 GB/s | 225W | Nov 2012 | 
| Tesla K40 | Kepler | 2880 | No | No | 12 GB GDDR5 | GDDR5 | 288 GB/s | 235W | Nov 2013 | 
| Tesla K80 | Kepler | 4992 | No | No | 24 GB GDDR5 | GDDR5 | 480 GB/s | 300W | Nov 2014 | 
| Tesla M40 | Maxwell | 3072 | No | No | 12 GB GDDR5 | GDDR5 | 288 GB/s | 250W | Nov 2015 | 
| Tesla P4 | Pascal | 2560 | No | No | 8 GB GDDR5 | GDDR5 | 192 GB/s | 75W | Sep 2016 | 
| Tesla P40 | Pascal | 3840 | No | No | 24 GB GDDR5X | GDDR5X | 480 GB/s | 250W | Sep 2016 | 
| Tesla V100 | Volta | 5120 | 640 | Yes | 16/32 GB HBM2 | HBM2 | 900 GB/s | 300W | May 2017 | 
| Tesla T4 | Turing | 2560 | 320 | No | 16 GB | | | | |
| A100 PCIe | Ampere | 6912 | 432 | Yes | 40 GB HBM2 / 80 GB HBM2 | HBM2 | 1555 GB/s | 250W | May 2020 | 
| A100 SXM4 | Ampere | 6912 | 432 | Yes | 40 GB HBM2 / 80 GB HBM2 | HBM2 | 1555 GB/s | 400W | May 2020 | 
| A30 | Ampere | 7424 | 184 | No | 24 GB GDDR6 | GDDR6 | 696 GB/s | 165W | Apr 2021 | 
| A40 | Ampere | 10752 | 336 | No | 48 GB GDDR6 | GDDR6 | 696 GB/s | 300W | Apr 2021 | 
| A10 | Ampere | 10240 | 320 | No | 24 GB GDDR6 | GDDR6 | 624 GB/s | 150W | Mar 2021 | 
| A16 | Ampere | 16384 | 512 | No | 48 GB GDDR6 | GDDR6 | 768 GB/s | 400W | Mar 2021 | 
| A100 80GB | Ampere | 6912 | 432 | Yes | 80 GB HBM2 | HBM2 | 2025 GB/s | 400W | Apr 2021 | 
| A100 40GB | Ampere | 6912 | 432 | Yes | 40 GB HBM2 | HBM2 | 1555 GB/s | 250W | May 2020 | 
| A200 PCIe | Ampere | 10752 | 672 | Yes | 80 GB HBM2 / 160 GB HBM2 | HBM2 | 2050 GB/s | 400W | Nov 2021 | 
| A200 SXM4 | Ampere | 10752 | 672 | Yes | 80 GB HBM2 / 160 GB HBM2 | HBM2 | 2050 GB/s | 400W | Nov 2021 | 
| A5000 | Ampere | 8192 | 256 | Yes | 24 GB GDDR6 | GDDR6 | 768 GB/s | 230W | Apr 2021 | 
| A4000 | Ampere | 6144 | 192 | Yes | 16 GB GDDR6 | GDDR6 | 512 GB/s | 140W | Apr 2021 | 
| A3000 | Ampere | 3584 | 112 | Yes | 24 GB G | | | | |
| Titan RTX | Turing | 4608 | 576 | Yes | 24 GB GDDR6 | GDDR6 | 672 GB/s | 280W | Dec 2018 | 
| GeForce RTX 3090 | Ampere | 10496 | 328 | Yes | 24 GB GDDR6X | GDDR6X | 936 GB/s | 350W | Sep 2020 |
| GeForce RTX 3080 Ti | Ampere | 10240 | 320 | Yes | 12 GB GDDR6X | GDDR6X | 912 GB/s | 350W | May 2021 |
| GeForce RTX 3080 | Ampere | 8704 | 272 | Yes | 10 GB GDDR6X | GDDR6X | 760 GB/s | 320W | Sep 2020 |
| GeForce RTX 3070 Ti | Ampere | 6144 | 192 | Yes | 8 GB GDDR6X | GDDR6X | 608 GB/s | 290W | Jun 2021 |
| GeForce RTX 3070 | Ampere | 5888 | 184 | Yes | 8 GB GDDR6 | GDDR6 | 448 GB/s | 220W | Oct 2020 |
| GeForce RTX 3060 Ti | Ampere | 4864 | 152 | Yes | 8 GB GDDR6 | GDDR6 | 448 GB/s | 200W | Dec 2020 |
| GeForce RTX 3060 | Ampere | 3584 | 112 | No | 12 GB GDDR6 | GDDR6 | 360 GB/s | 170W | Feb 2021 |
| Quadro RTX 8000 | Turing | 4608 | 576 | Yes | 48 GB GDDR6 | GDDR6 | 624 GB/s | 295W | Aug 2018 | 
| Quadro RTX 6000 | Turing | 4608 | 576 | Yes | 24 GB GDDR6 | GDDR6 | 432 GB/s | 260W | Aug 2018 | 
| Quadro RTX 5000 | Turing | 3072 | 384 | Yes | 16 GB GDDR6 | GDDR6 | 448 GB/s | 230W | Nov 2018 | 
| Quadro RTX 4000 | Turing | 2304 | 288 | Yes | 8 GB GDDR6 | GDDR6 | 416 GB/s | 160W | Nov 2018 | 
| Titan RTX (T-Rex) | Turing | 4608 | 576 | Yes | 24 GB | | | | |
| Titan V | Volta | 5120 | 640 | | 12 GB HBM2 | HBM2 | 652.8 GB/s | 250W | Dec 2017 |
| Tesla V100 (PCIe) | Volta | 5120 | 640 | | 16 GB HBM2 | HBM2 | 900 GB/s | 250W | June 2017 |
| Tesla V100 (SXM2) | Volta | 5120 | 640 | | 16 GB HBM2 | HBM2 | 900 GB/s | 300W | June 2017 |
| Quadro GV100 | Volta | 5120 | 640 | | 32 GB HBM2 | HBM2 | 870 GB/s | 250W | Mar 2018 |
| Tesla GV100 (SXM2) | Volta | 5120 | 640 | | 32 GB HBM2 | HBM2 | 900 GB/s | 300W | Mar 2018 |
| DGX-1 (Volta) | Volta | 5120 | 640 | | 16 x 32 GB HBM2 (512 GB total) | HBM2 | 2.7 TB/s | 3200W | Mar 2018 |
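
To cross-check the memory columns above on local hardware, the following sketch can be used (again an assumed standalone file, not part of the original page). The theoretical-bandwidth estimate follows the formula used by NVIDIA's deviceQuery sample (memory clock x 2 for double data rate x bus width); vendor figures in the table may differ slightly from this estimate.

```cuda
// meminfo.cu -- minimal sketch; compile with: nvcc meminfo.cu -o meminfo
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        // Total device memory, comparable to the "Memory Size" column.
        double gib = prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0);

        // Theoretical peak bandwidth as estimated in the deviceQuery sample:
        // memory clock (kHz) * 2 (DDR) * bus width (bits) / 8, reported in GB/s.
        double gbps = 2.0 * prop.memoryClockRate * (prop.memoryBusWidth / 8.0) / 1.0e6;

        printf("Device %d: %s, %.1f GiB, ~%.0f GB/s theoretical bandwidth\n",
               dev, prop.name, gib, gbps);
    }
    return 0;
}
```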
NVIDIA Grace Architecture
NVIDIA has announced that it will partner with server manufacturers such as HPE, Atos, and Supermicro to build servers that integrate the Grace architecture with ARM-based CPUs. These servers are expected to be available in the second half of 2023.
| Architecture | Key Features |
|---|---|
| Grace | CPU-GPU integration, ARM Neoverse CPU, HBM2E memory |
| | 900 GB/s memory bandwidth, support for PCIe 5.0 and NVLink |
| | 10x performance improvement for certain HPC workloads |
| | Energy efficiency improvements through unified memory space |