NVIDIA GPU
Revision as of 10:12, 11 July 2023
HPCMATE provides GPU models at all levels, in air-cooled or liquid-cooled versions, for any type of server or workstation.
GPU Tensor performance notes for RTX 4090
According to this thread, NVIDIA appears to have cut the tensor FP16 and TF32 operation rate in half, leaving the 4090 with lower tensor FP16 and TF32 performance than the 4080 16GB. This may have been done to prevent the 4090 from cannibalizing Quadro/Tesla sales. So when choosing GPUs, you can pick the 4090 for its memory, but expect lower tensor performance than the 4080 16GB, even though the 4090 has more than twice the ray tracing performance of the 4080 12GB.
Metric | RTX 4090 | RTX 4080 16GB | RTX 4080 12GB | RTX 3090 Ti |
---|---|---|---|---|
non-tensor FP32 tflops | 82.6 (206%) | 48.7 (122%) | 40.1 (100%) | 40 (100%) |
non-tensor FP16 tflops | 82.6 (206%) | 48.7 (122%) | 40.1 (100%) | 40 (100%) |
Tensor Cores | 512 (152%) | 304 (90%) | 240 (71%) | 336 (100%) |
Optical flow TOPS | 305 (242%) | 305 (242%) | 305 (242%) | 126 (100%) |
tensor FP16 w/ FP32 accumulate TFLOPS ** | 165.2 (207%) | 194.9 (244%) | 160.4 (200%) | 80 (100%) |
tensor TF32 TFLOPS ** | 82.6 (207%) | 97.5 (244%) | 80.2 (200%) | 40 (100%) |
Ray trace Cores | 128 (152%) | 76 (90%) | 60 (71%) | 84 (100%) |
Ray trace TFLOPS | 191 (245%) | 112.7 (144%) | 92.7 (119%) | 78.1 (100%) |
POWER (W) | 450 (100%) | 320 (71%) | 285 (63%) | 450 (100%) |
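The halving described in the note above can be checked from the table itself by dividing each card's tensor FP16 throughput by its non-tensor FP16 throughput. The following is a minimal sketch that hard-codes the table values; the dictionary and function names are illustrative and not part of any NVIDIA tool.

```python
# TFLOPS figures copied from the comparison table above.
NON_TENSOR_FP16 = {
    "RTX 4090": 82.6,
    "RTX 4080 16GB": 48.7,
    "RTX 4080 12GB": 40.1,
    "RTX 3090 Ti": 40.0,
}
TENSOR_FP16 = {
    "RTX 4090": 165.2,
    "RTX 4080 16GB": 194.9,
    "RTX 4080 12GB": 160.4,
    "RTX 3090 Ti": 80.0,
}


def tensor_speedup(card):
    """Ratio of tensor FP16 (FP32 accumulate) to non-tensor FP16 throughput."""
    return TENSOR_FP16[card] / NON_TENSOR_FP16[card]


def percent_of_baseline(value, baseline):
    """Express a value as a rounded percentage of a baseline value."""
    return round(100 * value / baseline)


if __name__ == "__main__":
    for card in TENSOR_FP16:
        print(f"{card}: tensor/non-tensor = {tensor_speedup(card):.1f}x")
```

With these numbers the RTX 4090 and RTX 3090 Ti come out at roughly 2x, while both RTX 4080 variants come out at roughly 4x, which is the gap the thread attributes to a halved tensor rate.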
NVIDIA GPU Architecture
nvcc sm flags and what they’re used for: When compiling with NVCC[1],
- the arch flag (‘-arch’) specifies the name of the NVIDIA GPU architecture that the CUDA files will be compiled for.
- Gencodes (‘-gencode’) allow for more PTX generations and can be repeated many times for different architectures.
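As a rough illustration of how repeated ‘-gencode’ flags are put together, the sketch below builds an nvcc command line for several architectures. The helper names and the file names `kernel.cu` and `kernel` are hypothetical; the `arch=compute_XX,code=sm_XX` form is the standard nvcc syntax, and the last flag embeds PTX for the newest target so that later GPUs can JIT-compile it.

```python
def gencode_flags(sm_versions, ptx_for_newest=True):
    """Build nvcc -gencode flags for SM versions given as integers, e.g. [80, 86]."""
    flags = []
    for sm in sorted(sm_versions):
        # Embed native machine code (SASS) for this architecture.
        flags.append(f"-gencode arch=compute_{sm},code=sm_{sm}")
    if ptx_for_newest and sm_versions:
        newest = max(sm_versions)
        # Also embed PTX for the newest target for forward compatibility.
        flags.append(f"-gencode arch=compute_{newest},code=compute_{newest}")
    return flags


def nvcc_command(source, output, sm_versions):
    """Assemble a full nvcc invocation as a single string."""
    return " ".join(["nvcc", *gencode_flags(sm_versions), "-o", output, source])


if __name__ == "__main__":
    # Hypothetical build targeting Ampere (sm_80, sm_86) and Ada Lovelace (sm_89).
    print(nvcc_command("kernel.cu", "kernel", [80, 86, 89]))
```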
Matching CUDA arch and CUDA gencode for various NVIDIA architectures
Series | Architecture (--arch) | CUDA gencode (--sm) | Compute Capability | Notable Models | Supported CUDA version | Key Features |
---|---|---|---|---|---|---|
Tesla | Tesla | | 1.0, 1.1, 1.2, 1.3 | C1060, M2050, K80, P100, V100, A100 | | First dedicated GPGPU series |
Fermi | Fermi | sm_20 | 2.0, 2.1 | GTX 400, GTX 500, Tesla 20-series, Quadro 4000/5000 | CUDA 3.2 until CUDA 8 | First to feature CUDA cores and support for ECC memory |
Kepler | Kepler | sm_30, sm_35, sm_37 | 3.0, 3.2, 3.5, 3.7 | GTX 600, GTX 700, Tesla K-series, Quadro K-series | CUDA 5 until CUDA 10 | First to feature Dynamic Parallelism and Hyper-Q |
Maxwell | Maxwell | sm_50, sm_52, sm_53 | 5.0, 5.2, 5.3 | GTX 900, Quadro M-series | CUDA 6 until CUDA 11 | First to support VR and 4K displays |
Pascal | Pascal | sm_60, sm_61, sm_62 | 6.0, 6.1, 6.2 | GTX 1000, Quadro P-series | CUDA 8 and later | First to support simultaneous multi-projection |
Volta | Volta | sm_70, sm_72 (Xavier) | 7.0, 7.2 | Titan V, Tesla V100, Quadro GV100 | CUDA 9 and later | First to feature Tensor Cores and NVLink 2.0 |
Turing | Turing | sm_75 | 7.5 | RTX 2000, GTX 1600, Quadro RTX | CUDA 10 and later | First to feature Ray Tracing Cores and RTX technology |
Ampere | Ampere | sm_80, sm_86, sm_87 (Orin) | 8.0, 8.6, 8.7 | RTX 3000, A-series | CUDA 11.1 and later | Features third-generation Tensor Cores and more |
Lovelace | Ada Lovelace[2] | sm_89 | 8.9 | GeForce RTX 4070 Ti (AD104); GeForce RTX 4080 (AD103); GeForce RTX 4090 (AD102); Nvidia RTX 6000 Ada Generation (AD102, formerly Quadro); Nvidia L40 (AD102, formerly Tesla) | CUDA 11.8 and later; cuDNN 8.6 and later | |
Hopper[3] | Hopper | sm_90, sm_90a (Thor) | 9.0 | | CUDA 12 and later | TODO |
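The sm column of the table above can be turned into a small lookup, which is handy when a build script needs to report which architecture a given flag targets. This is a sketch only: the dictionary simply mirrors the table, and the function names are made up for illustration.

```python
import re

# sm flag -> architecture, mirroring the table above.
SM_TO_ARCH = {
    "sm_20": "Fermi",
    "sm_30": "Kepler", "sm_35": "Kepler", "sm_37": "Kepler",
    "sm_50": "Maxwell", "sm_52": "Maxwell", "sm_53": "Maxwell",
    "sm_60": "Pascal", "sm_61": "Pascal", "sm_62": "Pascal",
    "sm_70": "Volta", "sm_72": "Volta",
    "sm_75": "Turing",
    "sm_80": "Ampere", "sm_86": "Ampere", "sm_87": "Ampere",
    "sm_89": "Ada Lovelace",
    "sm_90": "Hopper", "sm_90a": "Hopper",
}


def architecture_for(sm_flag):
    """Return the architecture name for an sm flag, or 'unknown' if not listed."""
    return SM_TO_ARCH.get(sm_flag.lower(), "unknown")


def compute_capability(sm_flag):
    """Convert an sm flag such as 'sm_86' into the 'major.minor' form '8.6'."""
    match = re.fullmatch(r"sm_(\d+)[a-z]?", sm_flag.lower())
    if match is None:
        raise ValueError(f"not an sm flag: {sm_flag!r}")
    digits = match.group(1)
    # The last digit is the minor version; everything before it is the major version.
    return f"{digits[:-1]}.{digits[-1]}"


if __name__ == "__main__":
    for flag in ("sm_37", "sm_75", "sm_89"):
        print(flag, architecture_for(flag), compute_capability(flag))
```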
NVIDIA GPU Models
Model | Architecture | CUDA Cores | Tensor Cores | RT Cores | NVLink | FF | Memory Size | MIG[4] | Memory Bandwidth | TDP | Launch Date |
---|---|---|---|---|---|---|---|---|---|---|---|
H100-SXM5 | Hopper (GH100) | 16896 | 4th Gen 528 | No | SXM5 | 80GB HBM3, 50 MB L2 cache | 7@10GB | 3.35TB/s | 700W | Jan 2023 | |
H100-PCIE[5][6] | Hopper (GH100) | 14592 | 4th Gen 456 | No | PCIe Gen 5 x16 | 80 GB HBM2e, 50 MB L2 cache | 7@10GB | 2TB/s | 300~350W | Jan 2023 | |
Tesla C1060 | Tesla | 240 | No | No | 4 GB GDDR3 | 102 GB/s | 238W | Dec 2008 | |||
Tesla K10 | Kepler | 3072 | No | No | 8 GB GDDR5 | 320 GB/s | 225W | May 2012 | |||
Tesla K20 | Kepler | 2496 | No | No | 5/6 GB GDDR5 | 208 GB/s | 225W | Nov 2012 | |||
Tesla K40 | Kepler | 2880 | No | No | 12 GB GDDR5 | 288 GB/s | 235W | Nov 2013 | |||
Tesla K80 | Kepler | 4992 | No | No | 24 GB GDDR5 | 480 GB/s | 300W | Nov 2014 | |||
Tesla M40 | Maxwell | 3072 | No | No | 12 GB GDDR5 | 288 GB/s | 250W | Nov 2015 | |||
Tesla P4 | Pascal | 2560 | No | No | 8 GB GDDR5 | 192 GB/s | 75W | Sep 2016 | |||
Tesla P40 | Pascal | 3840 | No | No | 24 GB GDDR5X | 480 GB/s | 250W | Sep 2016 | |||
Tesla V100 | Volta | 5120 | 640 | Yes | 16/32 GB HBM2 | 900 GB/s | 300W | May 2017 | |||
Tesla T4 | Turing | 2560 | 320 | No | 16 GB | ||||||
A100 PCIe | Ampere (GA100) | 6912 | 432 | Yes | 40 GB HBM2 / 80 GB HBM2 | 1555 GB/s | 250W | May 2020 | |||
A100 SXM4 | Ampere | 6912 | 432 | Yes | 40 GB HBM2 / 80 GB HBM2 | 7 | 1555 GB/s | 400W | May 2020 | ||
A30 | Ampere | 7424 | 184 | No | 24 GB GDDR6 | 4 | 696 GB/s | 165W | Apr 2021 | ||
A40[7] | Ampere | 10752 | 336 | 84 | NVIDIA® NVLink® 112.5 GB/s (bidirectional), PCIe Gen4: 64GB/s | PCI, 4.4" (H) x 10.5" (L) dual slot, passive | 48 GB GDDR6 with ECC | 696 GB/s | 300W | Apr 2021 | |
A10 | Ampere | 10240 | 320 | No | 24 GB GDDR6 | 624 GB/s | 150W | Mar 2021 | |||
A16[8] | Ampere | 5120 | 3rd Gen 160 | 40 | PCIe Gen4 x16 | 64 GB GDDR6 | 800 GB/s | 250W | Mar 2021 | ||
A100 80GB | Ampere (GA100) | 6912 | 432 | - | 80 GB HBM2e | 7@10GB | 1935GB/s | 300W | Apr 2021 | ||
A100 40GB | Ampere (GA100) | 6912 | 432 | Yes | 40 GB HBM2 | 7@5GB | 1555 GB/s | 250W | May 2020 | ||
A200 PCIe | Ampere | 10752 | 672 | Yes | 80 GB HBM2 / 160 GB HBM2 | 2050 GB/s | 400W | Nov 2021 | |||
A200 SXM4 | Ampere | 10752 | 672 | Yes | 80 GB HBM2 / 160 GB HBM2 | 2050 GB/s | 400W | Nov 2021 | |||
RTX 6000 Ada[9] | Ada Lovelace | 18176 | 568 | 142 | 48GB GDDR6 | 960 GB/s | 300 W | Jan 2023 | |||
A6000[10] | Ampere | 10752 | 336 | 84 | 48 GB GDDR6 | 768 GB/s | 300 W | ||||
A5000 | Ampere | 8192 | 256 | Yes | 24 GB GDDR6 | 768 GB/s | 230W | Apr 2021 | |||
A4000[11] | Ampere | 6144 | 192 | Yes | 16 GB GDDR6 | 512 GB/s | 140W | Apr 2021 | |||
A3000 | Ampere | 3584 | 112 | Yes | 24 GB G | ||||||
Titan RTX | Turing | 4608 | 576 | Yes | 24 GB GDDR6 | 672 GB/s | 280W | Dec 2018 | |||
GeForce RTX 4090 | Ada Lovelace | 16384 | 512 | Yes, 128 | 24 GB GDDR6X | 21.2Gbps | 450W | ||||
GeForce RTX 3090 Ti | Ampere | 10752 | 336 | 84 | 24 GB GDDR6X | 21.2Gbps | 450W | ||||
GeForce RTX 3090 | Ampere | 10496 | 328 | Yes | 24 GB GDDR6X | 936 GB/s | 350W | Sep 2020 | |||
GeForce RTX 3080 Ti | Ampere | 10240 | 320 | Yes | 12 GB GDDR6X | 912 GB/s | 350W | May 2021 | |||
GeForce RTX 3080 | Ampere | 8704 | 272 | Yes | 10 GB GDDR6X | 760 GB/s | 320W | Sep 2020 | |||
GeForce RTX 3070 Ti | Ampere | 6144 | 192 | Yes | 8 GB GDDR6X | 608 GB/s | 290W | Jun 2021 | |||
GeForce RTX 3070 | Ampere | 5888 | 184 | Yes | 8 GB GDDR6 | 448 GB/s | 220W | Oct 2020 | |||
GeForce RTX 3060 Ti | Ampere | 4864 | 152 | Yes | 8 GB GDDR6 | 448 GB/s | 200W | Dec 2020 | |||
GeForce RTX 3060 | Ampere | 3584 | 112 | Yes | 12 GB GDDR6 | 360 GB/s | 170W | Feb 2021 | |||
Quadro RTX 8000 | Turing | 4608 | 576 | Yes | 48 GB GDDR6 | 624 GB/s | 295W | Aug 2018 | |||
Quadro RTX 6000 | Turing | 4608 | 576 | Yes | 24 GB GDDR6 | 432 GB/s | 260W | Aug 2018 | |||
Tesla L40[12] | Ada Lovelace | 18,176 | 4th Gen 568 | 3rd Gen 142 | PCIe Gen4x1 | 48GB GDDR6 with ECC | 864GB/s | 300W | Jan 2023 | ||
Quadro RTX 5000 | Turing | 3072 | 384 | Yes | 16 GB GDDR6 | 448 GB/s | 230W | Nov 2018 | |||
Quadro RTX 4000 | Turing | 2304 | 288 | Yes | 8 GB GDDR6 | 416 GB/s | 160W | Nov 2018 | |||
Titan RTX (T-Rex) | Turing | 4608 | 576 | No | 24 GB | 672 GB/s | 280 W | ||||
Titan V | Volta | 5120 | 640 | 12 GB HBM2 | 652.8 GB/s | 250W | Dec 2017 | ||||
Tesla V100 (PCIe) | Volta | 5120 | 640 | No | 32/16 GB HBM2 | 900 GB/s | 250W | June 2017 | |||
Tesla V100 (SXM2) | Volta | 5120 | 640 | No | 32/16 GB HBM2 | 900 GB/s | 300W | June 2017 | |||
Quadro GV100 | Volta | 5120 | 640 | No | 32 GB HBM2 | 870 GB/s | 250W | Mar 2018 | |||
Tesla GV100 (SXM2) | Volta | 5120 | 640 | No | 32 GB HBM2 | 900 GB/s | 300W | Mar 2018 |
NVIDIA Features by Architecture[13]
NVIDIA GPU Architectures
Feature | AD102 | GA102 | GA100 | TU102 | GV100 | GP102 | GP100 |
---|---|---|---|---|---|---|---|
Launch Year | 2022 | 2020 | 2020 | 2018 | 2017 | 2017 | – |
Architecture | Ada Lovelace | Ampere | Ampere | Turing | Volta | Pascal | Pascal |
Form Factor | – | – | SXM4/PCIe | – | SXM2/PCIe | – | SXM/PCIe |
TDP | – | – | 400W | – | 300W | – | 300W |
Node | TSMC 4N | SAMSUNG 8N | – | TSMC 12nm | TSMC 12nm | TSMC 16nm | – |
CUDA Cores | 18432 | 10752 | – | 4608 | 5120 | 3840 | – |
Tensor Cores | 576 Gen4 | 336 Gen3 | – | 576 Gen2 | 640 | – | – |
RT Cores | 144 Gen3 | 84 Gen2 | – | 72 Gen1 | – | – | – |
Memory Bus | GDDR6X 384-bit | GDDR6X 384-bit | – | GDDR6 384-bit | HBM2 3072-bit | GDDR5X 384-bit | – |
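The Memory Bus row can be combined with a memory data rate to estimate peak bandwidth: bus width in bits times per-pin data rate in Gbps, divided by 8 bits per byte. The sketch below applies this to the 21.2Gbps figure that the models table gives for the RTX 4090 and RTX 3090 Ti. Treating that figure as a per-pin data rate is an assumption made here, and the function name is illustrative.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Estimate peak memory bandwidth in GB/s.

    bus_width_bits: memory bus width, e.g. 384 for AD102 and GA102.
    data_rate_gbps: per-pin data rate in Gbit/s (assumed meaning of '21.2Gbps').
    """
    # Each pin moves data_rate_gbps gigabits per second; 8 bits make one byte.
    return bus_width_bits * data_rate_gbps / 8


if __name__ == "__main__":
    # 384-bit bus at the 21.2 Gbps rate listed in the models table.
    print(f"{peak_bandwidth_gb_s(384, 21.2):.1f} GB/s")
```

This is a theoretical peak; the figure quoted on a vendor datasheet may differ slightly if the actual data rate is not exactly the value listed here.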
NVIDIA Grace Architecture
NVIDIA has announced that it will partner with server manufacturers such as HPE, Atos, and Supermicro to create servers that integrate the Grace architecture with ARM-based CPUs. These servers are expected to be available in the second half of 2023, at which point HPCMATE will start offering those products through local and global partners.
Architecture | Key Features |
---|---|
Grace | CPU-GPU integration, ARM Neoverse CPU, HBM2E memory |
 | 900 GB/s memory bandwidth, support for PCIe 5.0 and NVLink |
 | 10x performance improvement for certain HPC workloads |
 | Energy efficiency improvements through unified memory space |
References
- [1] https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
- [2] https://en.wikipedia.org/wiki/Ada_Lovelace_(microarchitecture)
- [3] https://www.nvidia.com/en-us/data-center/h100/
- [4] https://docs.nvidia.com/datacenter/tesla/mig-user-guide/
- [5] https://www.nvidia.com/content/dam/en-zz/Solutions/gtcs22/data-center/h100/PB-11133-001_v01.pdf
- [6] https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet
- [7] https://images.nvidia.com/content/Solutions/data-center/a40/nvidia-a40-datasheet.pdf
- [8] https://images.nvidia.com/content/Solutions/data-center/vgpu-a16-datasheet.pdf
- [9] https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/proviz-print-rtx6000-datasheet-web-2504660.pdf
- [10] https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/quadro-product-literature/proviz-print-nvidia-rtx-a6000-datasheet-us-nvidia-1454980-r9-web%20(1).pdf
- [11] https://www.nvidia.com/content/dam/en-zz/Solutions/gtcs21/rtx-a4000/nvidia-rtx-a4000-datasheet.pdf
- [12] https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/support-guide/NVIDIA-L40-Datasheet-January-2023.pdf
- [13] https://videocardz.com/newz/nvidia-details-ad102-gpu-up-to-18432-cuda-cores-76-3b-transistors-and-608-mm%C2%B2