NVIDIA GPU
Latest revision as of 07:52, 14 January 2025
HPCMATE provides every level of GPU model, in air-cooled or liquid-cooled versions, for any type of server or workstation.
Nvidia GPU tips and tricks
- RTX stands for real-time ray tracing.
- GeForce vs. Quadro - In general, GeForce cards have less compute power and less memory than Quadro cards. So if you are training a neural network, rendering an animated film, or running demanding CAD/CAE applications, that extra memory is very welcome.
GPU Tensor performance notes for RTX 4090
According to this thread, NVIDIA appears to have cut the tensor FP16 & TF32 operation rate in half, leaving the 4090 with lower tensor FP16 & TF32 performance than the 4080 16GB. This may have been done to prevent the 4090 from cannibalizing Quadro/Tesla sales. So when choosing GPUs, you can pick the 4090 for its memory, but expect lower tensor performance than the 4080 16GB, even though the 4090 has more than twice the ray-tracing performance of the 4080 12GB.
| Metric | RTX 4090 | RTX 4080 16GB | RTX 4080 12GB | RTX 3090 Ti |
|---|---|---|---|---|
non-tensor FP32 tflops | 82.6 (206%) | 48.7 (122%) | 40.1 (100%) | 40 (100%) |
non-tensor FP16 tflops | 82.6 (206%) | 48.7 (122%) | 40.1 (100%) | 40 (100%) |
Tensor Cores | 512 (152%) | 304 (90%) | 240 (71%) | 336 (100%) |
Optical flow TOPS | 305 (242%) | 305 (242%) | 305 (242%) | 126 (100%) |
tensor FP16 w/ FP32 accumulate TFLOPS ** | 165.2 (207%) | 194.9 (244%) | 160.4 (200%) | 80 (100%) |
tensor TF32 TFLOPS ** | 82.6 (207%) | 97.5 (244%) | 80.2 (200%) | 40 (100%) |
Ray trace Cores | 128 (152%) | 76 (90%) | 60 (71%) | 84 (100%) |
Ray trace TFLOPS | 191 (245%) | 112.7 (144%) | 92.7 (119%) | 78.1 (100%) |
POWER (W) | 450 (100%) | 320 (71%) | 285 (63%) | 450 (100%) |
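The parenthesized percentages are relative to the RTX 3090 Ti column (taken as 100%). A minimal sketch of that normalization, using values from the table:

```python
# Normalize a spec against a baseline card, reproducing the (x%) figures
# in the table above (baseline: RTX 3090 Ti = 100%).

def relative_pct(value: float, baseline: float) -> int:
    """Return value as a rounded percentage of baseline."""
    return round(value / baseline * 100)

# Non-tensor FP32 TFLOPS, baseline RTX 3090 Ti = 40 TFLOPS
print(relative_pct(82.6, 40))   # RTX 4090 -> 206
print(relative_pct(48.7, 40))   # RTX 4080 16GB -> 122
# Optical flow TOPS, baseline RTX 3090 Ti = 126 TOPS
print(relative_pct(305, 126))   # -> 242
```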
NVIDIA GPU Architecture
nvcc sm flags and what they're used for: when compiling with NVCC[1],

- the `-arch` flag specifies the name of the NVIDIA GPU architecture that the CUDA files will be compiled for, and
- the `-gencode` flag generates code (PTX and/or SASS) for a given architecture and can be repeated many times for different architectures.
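A small sketch of how those flags fit together on an nvcc command line; `build_nvcc_cmd` is an illustrative helper of my own (not an NVIDIA tool), and it only assembles the argument list:

```python
# Assemble an nvcc command with repeated -gencode flags, as described
# above: one SASS target per architecture, plus a PTX fallback for the
# newest one so future GPUs can JIT-compile.

def build_nvcc_cmd(source: str, archs: list[str]) -> list[str]:
    """archs are sm numbers like "86"; returns the nvcc argument list."""
    cmd = ["nvcc", source, "-o", source.rsplit(".", 1)[0]]
    for sm in archs:
        cmd += ["-gencode", f"arch=compute_{sm},code=sm_{sm}"]
    newest = max(archs, key=int)  # PTX for forward compatibility
    cmd += ["-gencode", f"arch=compute_{newest},code=compute_{newest}"]
    return cmd

# e.g. targeting Ampere (sm_86) and Ada (sm_89):
print(" ".join(build_nvcc_cmd("kernel.cu", ["86", "89"])))
```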
Matching CUDA arch and CUDA gencode for various NVIDIA architectures
| Series | Architecture (`--arch`) | CUDA gencode (`--sm`) | Compute Capability | Notable Models | Supported CUDA version | Key Features |
|---|---|---|---|---|---|---|
| Tesla | Tesla | | 1.0–1.3 | C1060, M2050, K80, P100, V100, A100 | | First dedicated GPGPU series |
| Fermi | Fermi | sm_20 | 2.0, 2.1 | GTX 400, GTX 500, Tesla 20-series, Quadro 4000/5000 | CUDA 3.2 until CUDA 8 | First to support ECC memory |
| Kepler | Kepler | sm_30, sm_35, sm_37 | 3.0, 3.5, 3.7 | GTX 600, GTX 700, Tesla K-series, Quadro K-series | CUDA 5 until CUDA 10 | First to feature Dynamic Parallelism and Hyper-Q |
| Maxwell | Maxwell | sm_50, sm_52, sm_53 | 5.0, 5.2, 5.3 | GTX 900, Quadro M-series | CUDA 6 until CUDA 11 | First to support VR and 4K displays |
| Pascal | Pascal | sm_60, sm_61, sm_62 | 6.0, 6.1, 6.2 | GTX 1000, Quadro P-series | CUDA 8 and later | First to support simultaneous multi-projection |
| Volta | Volta | sm_70, sm_72 (Xavier) | 7.0, 7.2 | Titan V, Tesla V100, Quadro GV100 | CUDA 9 and later | First to feature Tensor Cores and NVLink 2.0 |
| Turing | Turing | sm_75 | 7.5 | RTX 2000, GTX 1600, Quadro RTX | CUDA 10 and later | First to feature Ray Tracing Cores and RTX technology |
| Ampere | Ampere | sm_80, sm_86, sm_87 (Orin) | 8.0, 8.6, 8.7 | RTX 3000, A-series | CUDA 11.1 and later | Features third-generation Tensor Cores and more |
| Lovelace | Ada Lovelace[2] | sm_89 | 8.9 | GeForce RTX 4070 Ti (AD104), GeForce RTX 4080 (AD103), GeForce RTX 4090 (AD102), Nvidia RTX 6000 Ada Generation (AD102, formerly Quadro), Nvidia L40 (AD102, formerly Tesla) | CUDA 11.8 and later, cuDNN 8.6 and later | Fourth-gen Tensor Cores, third-gen RT Cores |
| Hopper[3] | Hopper | sm_90, sm_90a | 9.0 | H100 | CUDA 12 and later | Fourth-gen Tensor Cores with the FP8 Transformer Engine |
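The arch/gencode pairing above can be captured as a simple lookup. A sketch distilled from the table (the mapping is illustrative, not exhaustive):

```python
# Compute capability -> sm_ compile target, per the table above.
CC_TO_SM = {
    "7.0": "sm_70", "7.2": "sm_72", "7.5": "sm_75",
    "8.0": "sm_80", "8.6": "sm_86", "8.7": "sm_87", "8.9": "sm_89",
    "9.0": "sm_90",
}

def sm_for(cc: str) -> str:
    """Return the sm_ target for a compute capability string."""
    return CC_TO_SM[cc]

print(sm_for("8.9"))  # Ada Lovelace -> sm_89
print(sm_for("8.6"))  # Ampere -> sm_86
```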
NVIDIA GPU Models[4] [5]
| Model | Architecture | CUDA Cores | Tensor Cores | RT Cores | NVLink | Form Factor | Memory Size | MIG[6] | Memory Bandwidth | Thermal | Power connector | TDP | Launch Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| H100-SXM5 | Hopper (GH100) | 16896 | 4th Gen 528 | No | | SXM5 | 80GB HBM3, 50 MB L2 cache | 7@10GB | 3.35TB/s | | | 700W | Jan 2023 |
| H100-PCIE[7][8][9] | Hopper (GH100) | 14592 | 4th Gen 456 | No | NVLink: 600GB/s; PCIe Gen5: 128GB/s | PCIe Gen 5 x16 | 80 GB HBM2e, 50 MB L2 cache | 7@10GB | 2TB/s | Passive | | 300~350W (configurable) | Jan 2023 |
| H100 NVL[10][11] | Hopper (P1010 SKU 210) | 2x H100 PCIe boards, factory-bridged[12] | | | NVLink: 600GB/s; PCIe Gen5: 128GB/s | 2x PCIe Gen 5 x16 | 94 GB HBM3 | 14@12GB | | Passive | One PCIe 16-pin (12V-2x6) auxiliary power connector | 2x 300~450W (configurable) | Sept 2023 |
| Tesla C1060 | Tesla | 240 | No | No | | | 4 GB GDDR3 | | 102 GB/s | | | 238W | Dec 2008 |
| Tesla K10 | Kepler | 3072 | No | No | | | 8 GB GDDR5 | | 320 GB/s | | | 225W | May 2012 |
| Tesla K20 | Kepler | 2496 | No | No | | | 5/6 GB GDDR5 | | 208 GB/s | | | 225W | Nov 2012 |
| Tesla K40 | Kepler | 2880 | No | No | | | 12 GB GDDR5 | | 288 GB/s | | | 235W | Nov 2013 |
| Tesla K80 | Kepler | 4992 | No | No | | | 24 GB GDDR5 | | 480 GB/s | | | 300W | Nov 2014 |
| Tesla M40 | Maxwell | 3072 | No | No | | | 12 GB GDDR5 | | 288 GB/s | | | 250W | Nov 2015 |
| Tesla P4 | Pascal | 2560 | No | No | | | 8 GB GDDR5 | | 192 GB/s | | | 75W | Sep 2016 |
| Tesla P40 | Pascal | 3840 | No | No | | | 24 GB GDDR5X | | 480 GB/s | | | 250W | Sep 2016 |
| Tesla P100[13][14] | Pascal | 3584 | No | No | | PCIe 3.0 x16 | 16GB CoWoS HBM2 at 732 GB/s or 12GB CoWoS HBM2 at 549 GB/s | | 732.2 GB/s | Passive | | 250 W | |
| Tesla V100 | Volta | 5120 | 640 | No | Yes | | 16/32 GB HBM2 | | 900 GB/s | | | 300W | May 2017 |
| Tesla T4 | Turing | 2560 | 320 | | No | | 16 GB | | | | | | |
| A100 PCIe | Ampere (GA100) | 6912 | 432 | No | Yes | | 40 GB HBM2 / 80 GB HBM2 | | 1555 GB/s | | | 250W | May 2020 |
| A100 SXM4 | Ampere | 6912 | 432 | No | Yes | | 40 GB HBM2 / 80 GB HBM2 | 7 | 1555 GB/s | | | 400W | May 2020 |
| A30[15] | Ampere | 7424 | 184 | No | | | 24GB HBM2 | 4 MIGs @ 6GB, 2 MIGs @ 12GB, or 1 MIG @ 24GB | 933 GB/s | | | 165W | Apr 2021 |
| A40[16] | Ampere | 10752 | 336 | 84 | NVLink 112.5 GB/s (bidirectional); PCIe Gen4 64GB/s | PCIe, 4.4" (H) x 10.5" (L), dual slot | 48 GB GDDR6 with ECC | | 696 GB/s | Passive | | 300W | Apr 2021 |
| A10[17] | Ampere | 10240 | 320 | | No | | 24 GB GDDR6 with ECC | | 600 GB/s | | | 150W | Mar 2021 |
| A16[18] | Ampere | 5120 | 3rd Gen 160 | 40 | | PCIe Gen4 x16 | 64 GB GDDR6 | | 800 GB/s | | | 250W | Mar 2021 |
| A100 80GB | Ampere (GA100) | 6912 | 432 | - | | | 80 GB HBM2e | 7@10GB | 1935GB/s | | | 300W | Apr 2021 |
| A100 40GB | Ampere (GA100) | 6912 | 432 | No | Yes | | 40 GB HBM2 | 7@5GB | 1555 GB/s | | | 250W | May 2020 |
| A200 PCIe | Ampere | 10752 | 672 | | Yes | | 80 GB HBM2 / 160 GB HBM2 | | 2050 GB/s | | | 400W | Nov 2021 |
| A200 SXM4 | Ampere | 10752 | 672 | | Yes | | 80 GB HBM2 / 160 GB HBM2 | | 2050 GB/s | | | 400W | Nov 2021 |
| RTX A4500[19] | Ampere | 7168 | 224 | 56 | 2-way (2 or 3 slots), 112.5 GB/s (bidirectional) | PCIe Gen 4 x16 | 20 GB GDDR6 with ECC | | 640 GB/s | Active | 1x 8-pin PCIe | 200W | 2023 |
| Quadro RTX 6000 Ada[20] | Ada Lovelace | 18176 | 4th Gen 568 | 3rd Gen 142 | No NVLink; VR ready; vGPU ready | PCIe 4.0 x16 | 48GB GDDR6 with ECC | | 960 GB/s | Active | 1x PCIe CEM5 16-pin | 300 W | Jan 2023 |
| A6000[21] | Ampere | 10752 | 336 | 84 | | | 48 GB GDDR6 | | 768 GB/s | | | 300 W | |
| A5000[22] | Ampere | 8192 | 256 | 64 | 112.5 GB/s (bidirectional) | PCIe 4.0 x16 | 24 GB GDDR6 with ECC | | 768 GB/s | | 1x 8-pin PCIe | 230W | Apr 2021 |
| RTX 5000 Ada[23][24] | Ada Lovelace | 12800 | 400 | 100 | | PCIe 4.0 x16 | 32GB GDDR6 with ECC | | 576GB/s | | | 250W | |
| RTX A5500[25] | Ampere | 10,240 | 320 (3rd Gen) | 80 (2nd Gen) | Low-profile bridges connect two RTX A5500 GPUs, 112.5 GB/s (bidirectional) | PCIe 4.0 x16 | 24 GB GDDR6 with ECC | | 768 GB/s | | 1x 8-pin PCIe | 230 W | |
| A4000[26] | Ampere | 6144 | 192 | Yes | | | 16 GB GDDR6 | | 512 GB/s | | | 140W | Apr 2021 |
| RTX 4000 SFF Ada | Ada Lovelace | 6144 | 192 (4th Gen) | 48 (3rd Gen) | | | 20 GB | | 320 GB/s | | | | |
| A3000 | Ampere | 3584 | 112 | Yes | | | 24 GB GDDR6 | | | | | | |
| Titan RTX | Turing | 4608 | 576 | Yes | | | 24 GB GDDR6 | | 672 GB/s | | | 280W | Dec 2018 |
| GeForce RTX 5090 | Blackwell | 21,760 | 5th Gen (3,352 AI TOPS) | 4th Gen (318 TFLOPS) | | PCIe 5.0 x16 | 32GB GDDR7 | | 1,792GB/s | | 1x 16-pin | 575W | Jan 30, 2025 |
| GeForce RTX 4090 | Ada Lovelace | 16384 | 512 | Yes, 128 | | | 24 GB GDDR6X | | 1,008 GB/s | | | 450W | |
| GeForce RTX 3090 Ti | Ampere | 10752 | 336 | 84 | | | 24 GB GDDR6X | | 1,008 GB/s | | | 450W | |
| GeForce RTX 3090 | Ampere | 10496 | 328 | Yes | | | 24 GB GDDR6X | | 936 GB/s | | | 350W | Sep 2020 |
| GeForce RTX 3080 Ti | Ampere | 10240 | 320 | Yes | | | 12 GB GDDR6X | | 912 GB/s | | | 350W | May 2021 |
| GeForce RTX 3080 | Ampere | 8704 | 272 | Yes | | | 10 GB GDDR6X | | 760 GB/s | | | 320W | Sep 2020 |
| GeForce RTX 3070 Ti | Ampere | 6144 | 192 | Yes | | | 8 GB GDDR6X | | 608 GB/s | | | 290W | Jun 2021 |
| GeForce RTX 3070 | Ampere | 5888 | 184 | Yes | | | 8 GB GDDR6 | | 448 GB/s | | | 220W | Oct 2020 |
| GeForce RTX 3060 Ti | Ampere | 4864 | 152 | Yes | | | 8 GB GDDR6 | | 448 GB/s | | | 200W | Dec 2020 |
| GeForce RTX 3060 | Ampere | 3584 | 112 | | No | | 12 GB GDDR6 | | 360 GB/s | | | 170W | Feb 2021 |
| Quadro RTX 8000 | Turing | 4608 | 576 | Yes | | | 48 GB GDDR6 | | 624 GB/s | | | 295W | Aug 2018 |
| Quadro RTX 6000 | Turing | 4608 | 576 | Yes | | | 24 GB GDDR6 | | 432 GB/s | | | 260W | Aug 2018 |
| Tesla L40[27] | Ada Lovelace | 18176 | 4th Gen 568 | 3rd Gen 142 | No | PCIe Gen4 x16: 64GB/s bidirectional | 48GB GDDR6 with ECC | No | 864GB/s | Passive | 16-pin | 300W | Jan 2023 |
| Tesla L40S[28] | Ada Lovelace | 18176 | 4th Gen 568 | 3rd Gen 142 | No | PCIe Gen4 x16: 64GB/s bidirectional | 48GB GDDR6 with ECC | No | 864GB/s | Passive | 16-pin | 350W | 2023 |
| Quadro RTX 5000 | Turing | 3072 | 384 | Yes | | | 16 GB GDDR6 | | 448 GB/s | | | 230W | Nov 2018 |
| Quadro RTX 4000 | Turing | 2304 | 288 | Yes | | | 8 GB GDDR6 | | 416 GB/s | | | 160W | Nov 2018 |
| Titan V | Volta | 5120 | 640 | No | | | 12 GB HBM2 | | 652.8 GB/s | | | 250W | Dec 2017 |
| Tesla V100 (PCIe) | Volta | 5120 | 640 | No | | | 32/16 GB HBM2 | | 900 GB/s | | | 250W | June 2017 |
| Tesla V100 (SXM2) | Volta | 5120 | 640 | No | | | 32/16 GB HBM2 | | 900 GB/s | | | 300W | June 2017 |
| Quadro GV100 | Volta | 5120 | 640 | No | | | 32 GB HBM2 | | 870 GB/s | | | 250W | Mar 2018 |
| Tesla GV100 (SXM2) | Volta | 5120 | 640 | No | | | 32 GB HBM2 | | 900 GB/s | | | 300W | Mar 2018 |
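Memory-bandwidth entries can be cross-checked from the memory's per-pin data rate and bus width: bandwidth (GB/s) = data rate (Gbps) × bus width (bits) ÷ 8. A small sketch; the bus widths used here come from public specs, not from the table:

```python
# Aggregate memory bandwidth from per-pin data rate and bus width.

def mem_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """GB/s = Gbps per pin * bus width in bits / 8 bits-per-byte."""
    return data_rate_gbps * bus_width_bits / 8

# RTX 4090: ~21 Gbps GDDR6X on a 384-bit bus
print(mem_bandwidth_gb_s(21.0, 384))  # -> 1008.0
# RTX 3070: 14 Gbps GDDR6 on a 256-bit bus
print(mem_bandwidth_gb_s(14.0, 256))  # -> 448.0
```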
NVIDIA Features by Architecture[29]
NVIDIA GPU Architectures

| | AD102 | GA102 | GA100 | TU102 | GV100 | GP102 | GP100 |
|---|---|---|---|---|---|---|---|
| Launch Year | 2022 | 2020 | 2020 | 2018 | 2017 | 2016 | 2016 |
| Architecture | Ada Lovelace | Ampere | Ampere | Turing | Volta | Pascal | Pascal |
| Form Factor | – | – | SXM4/PCIe | – | SXM2/PCIe | – | SXM/PCIe |
| TDP | – | – | 400W | – | 300W | – | 300W |
| Node | TSMC 4N | Samsung 8N | TSMC 7nm | TSMC 12nm | TSMC 12nm | TSMC 16nm | TSMC 16nm |
| CUDA Cores | 18432 | 10752 | – | 4608 | 5120 | 3840 | – |
| Tensor Cores | 576 Gen4 | 336 Gen3 | – | 576 Gen2 | 640 | – | – |
| RT Cores | 144 Gen3 | 84 Gen2 | – | 72 Gen1 | – | – | – |
| Memory Bus | GDDR6X 384-bit | GDDR6X 384-bit | – | GDDR6 384-bit | HBM2 4096-bit | GDDR5X 384-bit | – |
NVIDIA Grace Architecture
NVIDIA has announced that it will partner with server manufacturers such as HPE, Atos, and Supermicro to build servers that integrate the Grace architecture with ARM-based CPUs. These servers are expected to be available in the second half of 2023, at which point HPCMATE will begin offering them through local and global partners.
| Architecture | Key Features |
|---|---|
| Grace | CPU-GPU integration; ARM Neoverse CPU; HBM2E memory; 900 GB/s memory bandwidth; support for PCIe 5.0 and NVLink; 10x performance improvement for certain HPC workloads; energy-efficiency improvements through a unified memory space |
References
- ↑ https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
- ↑ https://en.wikipedia.org/wiki/Ada_Lovelace_(microarchitecture)
- ↑ https://www.nvidia.com/en-us/data-center/h100/
- ↑ https://www.nvidia.com/content/dam/en-zz/Solutions/gtcs22/design-visualization/quadro-product-literature/rtx-6000-l40-linecard-nvidia-us-2653097-r7-web.pdf
- ↑ https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/rtx/quadro-ampere-linecard-us-nvidia.pdf
- ↑ https://docs.nvidia.com/datacenter/tesla/mig-user-guide/
- ↑ https://www.nvidia.com/content/dam/en-zz/Solutions/gtcs22/data-center/h100/PB-11133-001_v01.pdf
- ↑ https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet
- ↑ https://www.aspsys.com/wp-content/uploads/2023/09/nvidia-h100-datasheet.pdf
- ↑ https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/h100/PB-11773-001_v01.pdf
- ↑ https://www.aspsys.com/wp-content/uploads/2023/09/nvidia-h100-datasheet.pdf
- ↑ https://www.anandtech.com/show/18780/nvidia-announces-h100-nvl-max-memory-server-card-for-large-language-models
- ↑ https://images.nvidia.com/content/tesla/pdf/nvidia-tesla-p100-PCIe-datasheet.pdf
- ↑ https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/solutions/resources/documents1/NV-tesla-p100-pcie-PB-08248-001-v01.pdf
- ↑ https://www.nvidia.com/content/dam/en-zz/Solutions/data-center/products/a30-gpu/pdf/a30-datasheet.pdf
- ↑ https://images.nvidia.com/content/Solutions/data-center/a40/nvidia-a40-datasheet.pdf
- ↑ https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a10/pdf/datasheet-new/nvidia-a10-datasheet.pdf
- ↑ https://images.nvidia.com/content/Solutions/data-center/vgpu-a16-datasheet.pdf
- ↑ https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/rtx/nvidia-rtx-a4500-datasheet.pdf
- ↑ https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/proviz-print-rtx6000-datasheet-web-2504660.pdf
- ↑ https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/quadro-product-literature/proviz-print-nvidia-rtx-a6000-datasheet-us-nvidia-1454980-r9-web%20(1).pdf
- ↑ https://nvdam.widen.net/s/wrqrqt75vh/nvidia-rtx-a5000-datasheet
- ↑ https://www.nvidia.com/en-us/design-visualization/rtx-5000/
- ↑ https://resources.nvidia.com/en-us-design-viz-stories-ep/rtx-5000-ada-datasheet?lx=CCKW39&contentType=data-sheet
- ↑ https://www.nvidia.com/content/dam/en-zz/Solutions/gtcs22/rtx-a5500/nvidia-rtx-a5500-datasheet.pdf
- ↑ https://www.nvidia.com/content/dam/en-zz/Solutions/gtcs21/rtx-a4000/nvidia-rtx-a4000-datasheet.pdf
- ↑ https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/support-guide/NVIDIA-L40-Datasheet-January-2023.pdf
- ↑ https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413
- ↑ https://videocardz.com/newz/nvidia-details-ad102-gpu-up-to-18432-cuda-cores-76-3b-transistors-and-608-mm%C2%B2