NVIDIA GPU: Difference between revisions
* '''''SM30 or <code>SM_30, compute_30</code> –''''' '''''Kepler architecture (e.g. generic Kepler, GeForce 700, GT-730).''''' '''''Adds support for unified memory programming.''''' '''''Completely dropped from CUDA 11 onwards.'''''
* '''''SM35 or <code>SM_35, compute_35</code> –''''' '''''Tesla K40.''''' '''''Adds support for dynamic parallelism.''''' '''''Deprecated from CUDA 11, will be dropped in future versions.'''''
* '''''SM37 or <code>SM_37, compute_37</code> –''''' '''''Tesla K80.''''' '''''Adds a few more registers.''''' '''''Deprecated from CUDA 11, will be dropped in future versions; replacing it with a 32GB [[PCIe]] Tesla V100 is strongly suggested.'''''
Revision as of 12:22, 5 April 2023
GPU Tensor performance notes for RTX 4090
According to this thread, NVIDIA appears to have cut the tensor FP16 and TF32 operation rate in half on the RTX 4090, which leaves it with lower tensor FP16 and TF32 throughput than the RTX 4080 16GB. This may have been done to keep the 4090 from cannibalizing Quadro/Tesla sales. When choosing GPUs, the 4090 is therefore the option for memory capacity, at the cost of lower tensor performance than the 4080 16GB, even though the 4090 has more than twice the ray tracing performance of the 4080 12GB.
Metric | RTX 4090 | RTX 4080 16GB | RTX 4080 12GB | RTX 3090 Ti |
---|---|---|---|---|
non-tensor FP32 TFLOPS | 82.6 (206%) | 48.7 (122%) | 40.1 (100%) | 40 (100%) |
non-tensor FP16 TFLOPS | 82.6 (206%) | 48.7 (122%) | 40.1 (100%) | 40 (100%) |
Tensor Cores | 512 (152%) | 304 (90%) | 240 (71%) | 336 (100%) |
Optical flow TOPS | 305 (242%) | 305 (242%) | 305 (242%) | 126 (100%) |
tensor FP16 w/ FP32 accumulate TFLOPS ** | 165.2 (207%) | 194.9 (244%) | 160.4 (200%) | 80 (100%) |
tensor TF32 TFLOPS ** | 82.6 (207%) | 97.5 (244%) | 80.2 (200%) | 40 (100%) |
Ray trace Cores | 128 (152%) | 76 (90%) | 60 (71%) | 84 (100%) |
Ray trace TFLOPS | 191 (245%) | 112.7 (144%) | 92.7 (119%) | 78.1 (100%) |
POWER (W) | 450 (100%) | 320 (71%) | 285 (63%) | 450 (100%) |
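The comparisons made in the note above can be checked directly against the table. The following is a minimal sketch that copies the tensor FP16 and ray tracing figures from the table and computes the two ratios the text relies on; the dictionary and function names are illustrative only.

```python
# TFLOPS figures copied from the comparison table above.
tensor_fp16 = {
    "RTX 4090": 165.2,
    "RTX 4080 16GB": 194.9,
    "RTX 4080 12GB": 160.4,
    "RTX 3090 Ti": 80.0,
}
ray_trace = {
    "RTX 4090": 191.0,
    "RTX 4080 16GB": 112.7,
    "RTX 4080 12GB": 92.7,
    "RTX 3090 Ti": 78.1,
}


def ratio(table, card, baseline):
    """Return the value for `card` divided by the value for `baseline`."""
    return table[card] / table[baseline]


# Tensor FP16: 4090 relative to 4080 16GB (a value below 1 means slower).
print(f"tensor FP16, 4090 vs 4080 16GB: {ratio(tensor_fp16, 'RTX 4090', 'RTX 4080 16GB'):.2f}x")

# Ray tracing: 4090 relative to 4080 12GB (a value above 2 means more than twice).
print(f"ray tracing, 4090 vs 4080 12GB: {ratio(ray_trace, 'RTX 4090', 'RTX 4080 12GB'):.2f}x")
```

With the table's numbers, the first ratio comes out below 1 and the second above 2, which is what the note claims.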
NVIDIA GPU Architecture
nvcc sm flags and what they’re used for: When compiling with NVCC[1],
- the arch flag (`-arch`) specifies the name of the NVIDIA GPU architecture that the CUDA files will be compiled for.
- Gencodes (`-gencode`) allow for more PTX generations and can be repeated many times for different architectures.
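As an illustration of repeating `-gencode` for several architectures, the sketch below builds the flag list for a set of compute capabilities. The `-gencode arch=compute_XX,code=sm_XX` form is standard nvcc syntax; the helper name `gencode_flags`, the choice to embed PTX only for the newest target, and the file names `app` and `app.cu` are assumptions made for this example.

```python
def gencode_flags(capabilities, ptx_for_newest=True):
    """Build nvcc -gencode flags for compute capabilities given as strings like "86".

    One flag per capability produces native code (SASS) for that GPU generation.
    Optionally, one extra flag embeds PTX for the newest capability so that
    later GPUs can JIT-compile it.
    """
    flags = [f"-gencode arch=compute_{cc},code=sm_{cc}" for cc in capabilities]
    if ptx_for_newest and capabilities:
        newest = max(capabilities, key=int)
        flags.append(f"-gencode arch=compute_{newest},code=compute_{newest}")
    return flags


# Hypothetical build line targeting Turing, Ampere and Ada Lovelace.
command = "nvcc " + " ".join(gencode_flags(["75", "86", "89"])) + " -o app app.cu"
print(command)
```

Each extra `-gencode` entry increases binary size and compile time, so it is usually worth listing only the architectures that will actually be deployed.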
Matching CUDA arch and CUDA gencode for various NVIDIA architectures
Series | Architecture (-arch) | CUDA gencode (sm_XX) | Compute Capability | Notable Models | Supported CUDA version | Key Features |
---|---|---|---|---|---|---|
Tesla | Tesla | | 1.0, 1.1, 1.2, 1.3 | Tesla C870, C1060, M1060 | | First dedicated GPGPU series |
Fermi | Fermi | sm_20, sm_21 | 2.0, 2.1 | GTX 400, GTX 500, Tesla 20-series, Quadro 4000/5000 | CUDA 3.2 until CUDA 8 | First to feature CUDA cores and support for ECC memory |
Kepler | Kepler | sm_30, sm_35, sm_37 | 3.0, 3.5, 3.7 | GTX 600, GTX 700, Tesla K-series, Quadro K-series | CUDA 5 until CUDA 10 | First to feature Dynamic Parallelism and Hyper-Q |
Maxwell | Maxwell | sm_50, sm_52, sm_53 | 5.0, 5.2, 5.3 | GTX 900, Quadro M-series | CUDA 6 until CUDA 11 | First to support VR and 4K displays |
Pascal | Pascal | sm_60, sm_61, sm_62 | 6.0, 6.1, 6.2 | GTX 1000, Quadro P-series | CUDA 8 and later | First to support simultaneous multi-projection |
Volta | Volta | sm_70, sm_72 (Xavier) | 7.0, 7.2 | Titan V, Tesla V100, Quadro GV100 | CUDA 9 and later | First to feature Tensor Cores and NVLink 2.0 |
Turing | Turing | sm_75 | 7.5 | RTX 2000, GTX 1600, Quadro RTX | CUDA 10 and later | First to feature Ray Tracing Cores and RTX technology |
Ampere | Ampere | sm_80, sm_86, sm_87 (Orin) | 8.0, 8.6, 8.7 | RTX 3000, A-series | CUDA 11.1 and later | Features third-generation Tensor Cores and more |
Lovelace | Ada Lovelace[2] | sm_89 | 8.9 | GeForce RTX 4070 Ti (AD104), GeForce RTX 4080 (AD103), GeForce RTX 4090 (AD102), Nvidia RTX 6000 Ada Generation (AD102, formerly Quadro), Nvidia L40 (AD102, formerly Tesla) | CUDA 11.8 and later, cuDNN 8.6 and later | |
Hopper | Hopper | sm_90, sm_90a | 9.0 | | CUDA 12 and later | TODO |
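When scripting builds, it can be handy to have the gencode column of the table above available as a lookup. The sketch below encodes the `sm_XX` targets listed there as a plain dictionary; the names `SM_TO_ARCH` and `architecture_for` are made up for this example and are not part of any NVIDIA tool.

```python
# sm_XX targets and their architectures, as listed in the table above.
SM_TO_ARCH = {
    "sm_20": "Fermi",
    "sm_30": "Kepler", "sm_35": "Kepler", "sm_37": "Kepler",
    "sm_50": "Maxwell", "sm_52": "Maxwell", "sm_53": "Maxwell",
    "sm_60": "Pascal", "sm_61": "Pascal", "sm_62": "Pascal",
    "sm_70": "Volta", "sm_72": "Volta",
    "sm_75": "Turing",
    "sm_80": "Ampere", "sm_86": "Ampere", "sm_87": "Ampere",
    "sm_89": "Ada Lovelace",
    "sm_90": "Hopper",
}


def architecture_for(sm):
    """Return the architecture name for an sm_XX string, or "unknown"."""
    return SM_TO_ARCH.get(sm.strip().lower(), "unknown")


for target in ("sm_75", "SM_86", "sm_89", "sm_99"):
    print(target, "->", architecture_for(target))
```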
NVIDIA GPU Models
Model | Architecture | CUDA Cores | Tensor Cores | RT Cores | Memory Size | MIG | Memory Type | Memory Bandwidth | TDP | Launch Date |
---|---|---|---|---|---|---|---|---|---|---|
Tesla C870 | Tesla | 128 | No | No | 1.5 GB GDDR3 | | GDDR3 | 76.8 GB/s | 105W | Jun 2006 |
Tesla C1060 | Tesla | 240 | No | No | 4 GB GDDR3 | | GDDR3 | 102 GB/s | 238W | Dec 2008 |
Tesla M1060 | Tesla | 240 | No | No | 4 GB GDDR3 | | GDDR3 | 102 GB/s | 225W | Dec 2008 |
Tesla M2050 | Fermi | 448 | No | No | 3 GB GDDR5 | | GDDR5 | 148 GB/s | 225W | May 2010 |
Tesla M2070 | Fermi | 448 | No | No | 6 GB GDDR5 | | GDDR5 | 150 GB/s | 225W | May 2010 |
Tesla K10 | Kepler | 3072 | No | No | 8 GB GDDR5 | | GDDR5 | 320 GB/s | 225W | May 2012 |
Tesla K20 | Kepler | 2496 | No | No | 5/6 GB GDDR5 | | GDDR5 | 208 GB/s | 225W | Nov 2012 |
Tesla K40 | Kepler | 2880 | No | No | 12 GB GDDR5 | | GDDR5 | 288 GB/s | 235W | Nov 2013 |
Tesla K80 | Kepler | 4992 | No | No | 24 GB GDDR5 | | GDDR5 | 480 GB/s | 300W | Nov 2014 |
Tesla M40 | Maxwell | 3072 | No | No | 12 GB GDDR5 | | GDDR5 | 288 GB/s | 250W | Nov 2015 |
Tesla P4 | Pascal | 2560 | No | No | 8 GB GDDR5 | | GDDR5 | 192 GB/s | 75W | Sep 2016 |
Tesla P40 | Pascal | 3840 | No | No | 24 GB GDDR5X | | GDDR5X | 480 GB/s | 250W | Sep 2016 |
Tesla V100 | Volta | 5120 | 640 | No | 16/32 GB HBM2 | | HBM2 | 900 GB/s | 300W | May 2017 |
Tesla T4 | Turing | 2560 | 320 | No | 16 GB | | | | | |
A100 PCIe | Ampere | 6912 | 432 | No | 40 GB HBM2 / 80 GB HBM2 | | HBM2 | 1555 GB/s | 250W | May 2020 |
A100 SXM4 | Ampere | 6912 | 432 | No | 40 GB HBM2 / 80 GB HBM2 | | HBM2 | 1555 GB/s | 400W | May 2020 |
A30 | Ampere | 7424 | 184 | No | 24 GB GDDR6 | | GDDR6 | 696 GB/s | 165W | Apr 2021 |
A40 | Ampere | 10752 | 336 | No | 48 GB GDDR6 | | GDDR6 | 696 GB/s | 300W | Apr 2021 |
A10 | Ampere | 10240 | 320 | No | 24 GB GDDR6 | | GDDR6 | 624 GB/s | 150W | Mar 2021 |
A16 | Ampere | 16384 | 512 | No | 48 GB GDDR6 | | GDDR6 | 768 GB/s | 250W | Mar 2021 |
A100 80GB | Ampere | 6912 | 432 | No | 80 GB HBM2 | Up to 7 MIGs @ 10GB | HBM2 | 1935 GB/s | 300W | Apr 2021 |
A100 40GB | Ampere | 6912 | 432 | No | 40 GB HBM2 | Up to 7 MIGs @ 5GB | HBM2 | 1555 GB/s | 250W | May 2020 |
A200 PCIe | Ampere | 10752 | 672 | Yes | 80 GB HBM2 / 160 GB HBM2 | | HBM2 | 2050 GB/s | 400W | Nov 2021 |
A200 SXM4 | Ampere | 10752 | 672 | Yes | 80 GB HBM2 / 160 GB HBM2 | | HBM2 | 2050 GB/s | 400W | Nov 2021 |
A5000 | Ampere | 8192 | 256 | Yes | 24 GB GDDR6 | | GDDR6 | 768 GB/s | 230W | Apr 2021 |
A4000 | Ampere | 6144 | 192 | Yes | 16 GB GDDR6 | | GDDR6 | 512 GB/s | 140W | Apr 2021 |
A3000 | Ampere | 3584 | 112 | Yes | 24 GB G | | | | | |
Titan RTX | Turing | 4608 | 576 | Yes | 24 GB GDDR6 | | GDDR6 | 672 GB/s | 280W | Dec 2018 |
GeForce RTX 3090 | Ampere | 10496 | 328 | Yes | 24 GB GDDR6X | | GDDR6X | 936 GB/s | 350W | Sep 2020 |
GeForce RTX 3080 Ti | Ampere | 10240 | 320 | Yes | 12 GB GDDR6X | | GDDR6X | 912 GB/s | 350W | May 2021 |
GeForce RTX 3080 | Ampere | 8704 | 272 | Yes | 10 GB GDDR6X | | GDDR6X | 760 GB/s | 320W | Sep 2020 |
GeForce RTX 3070 Ti | Ampere | 6144 | 192 | Yes | 8 GB GDDR6X | | GDDR6X | 608 GB/s | 290W | Jun 2021 |
GeForce RTX 3070 | Ampere | 5888 | 184 | Yes | 8 GB GDDR6 | | GDDR6 | 448 GB/s | 220W | Oct 2020 |
GeForce RTX 3060 Ti | Ampere | 4864 | 152 | Yes | 8 GB GDDR6 | | GDDR6 | 448 GB/s | 200W | Dec 2020 |
GeForce RTX 3060 | Ampere | 3584 | 112 | Yes | 12 GB GDDR6 | | GDDR6 | 360 GB/s | 170W | Feb 2021 |
Quadro RTX 8000 | Turing | 4608 | 576 | Yes | 48 GB GDDR6 | | GDDR6 | 624 GB/s | 295W | Aug 2018 |
Quadro RTX 6000 | Turing | 4608 | 576 | Yes | 24 GB GDDR6 | | GDDR6 | 432 GB/s | 260W | Aug 2018 |
Quadro RTX 5000 | Turing | 3072 | 384 | Yes | 16 GB GDDR6 | | GDDR6 | 448 GB/s | 230W | Nov 2018 |
Quadro RTX 4000 | Turing | 2304 | 288 | Yes | 8 GB GDDR6 | | GDDR6 | 416 GB/s | 160W | Nov 2018 |
Titan RTX (T-Rex) | Turing | 4608 | 576 | Yes | 24 GB | | | | | |
Titan V | Volta | 5120 | 640 | | 12 GB HBM2 | | HBM2 | 652.8 GB/s | 250W | Dec 2017 |
Tesla V100 (PCIe) | Volta | 5120 | 640 | | 16 GB HBM2 | | HBM2 | 900 GB/s | 250W | Jun 2017 |
Tesla V100 (SXM2) | Volta | 5120 | 640 | | 16 GB HBM2 | | HBM2 | 900 GB/s | 300W | Jun 2017 |
Quadro GV100 | Volta | 5120 | 640 | | 32 GB HBM2 | | HBM2 | 870 GB/s | 250W | Mar 2018 |
Tesla GV100 (SXM2) | Volta | 5120 | 640 | | 32 GB HBM2 | | HBM2 | 900 GB/s | 300W | Mar 2018 |
DGX-1 (Volta) | Volta | 5120 | 640 | | 16 x 32 GB HBM2 (512 GB total) | | HBM2 | 2.7 TB/s | 3200W | Mar 2018 |
NVIDIA Features by Architecture[3]
NVIDIA Flagship Gaming GPUs | |||||
---|---|---|---|---|---|
VideoCardz.com | AD102 | GA102 | TU102 | GV100 | GP102 |
Launch Year | 2022 | 2020 | 2018 | 2017 | 2017 |
Architecture | Ada Lovelace | Ampere | Turing | Volta | Pascal |
Node | TSMC 4N | SAMSUNG 8N | TSMC 12nm | TSMC 12nm | TSMC 16nm |
Die Size | 608 mm² | 628 mm² | 754 mm² | 815 mm² | 471 mm² |
Transistors | 76.3B | 28.3B | 18.6B | 21.1B | 12.0B |
Trans. Density | 125.5M TRAN/mm2 | 45.1M TRAN/mm2 | 24.7M TRAN/mm2 | 25.9M TRAN/mm2 | 25.5M TRAN/mm2 |
CUDA Cores | 18432 | 10752 | 4608 | 5120 | 3840 |
Tensor Cores | 576 Gen4 | 336 Gen3 | 576 Gen2 | 640 | – |
RT Cores | 144 Gen3 | 84 Gen2 | 72 Gen1 | – | – |
Memory Bus | GDDR6X 384-bit | GDDR6X 384-bit | GDDR6 384-bit | HBM2 3072-bit | GDDR5X 384-bit |
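The "Trans. Density" row follows from the two rows above it: transistor count divided by die size. A short sketch that recomputes it from the table's own figures (the dictionary layout and function name are illustrative):

```python
# (transistors in billions, die size in mm^2), copied from the table above.
chips = {
    "AD102": (76.3, 608),
    "GA102": (28.3, 628),
    "TU102": (18.6, 754),
    "GV100": (21.1, 815),
    "GP102": (12.0, 471),
}


def density_m_per_mm2(transistors_billion, die_mm2):
    """Return transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000 / die_mm2


for name, (transistors, die) in chips.items():
    print(f"{name}: {density_m_per_mm2(transistors, die):.1f}M transistors/mm^2")
```

Rounded to one decimal place, the results agree with the density values listed in the table.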
NVIDIA Grace Architecture
NVIDIA has announced that it will partner with server manufacturers such as HPE, Atos, and Supermicro to build servers based on Grace, its ARM-based CPU architecture. These servers are expected to be available in the second half of 2023.
Architecture | Key Features |
---|---|
Grace | CPU-GPU integration, ARM Neoverse CPU, HBM2E memory |
 | 900 GB/s memory bandwidth, support for PCIe 5.0 and NVLink |
 | 10x performance improvement for certain HPC workloads |
 | Energy efficiency improvements through unified memory space |