NVIDIA GPU: Difference between revisions
Revision as of 11:58, 21 March 2023
NVIDIA GPU Architecture
nvcc sm flags and what they are used for: when compiling with NVCC[1], the ‘-arch’ flag specifies the NVIDIA GPU architecture that the CUDA files will be compiled for, while the ‘-gencode’ flag adds a PTX/binary code-generation step and can be repeated multiple times to target several architectures in a single build.
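As a sketch of how these flags combine in practice, the helper below builds a repeated ‘-gencode’ flag list from a set of sm versions. The ‘arch=compute_XX,code=sm_XX’ syntax is standard nvcc usage; the helper function itself is hypothetical, not part of any NVIDIA tool:

```python
def gencode_flags(sm_versions, embed_ptx=True):
    """Build nvcc -gencode flags for the given sm versions, e.g. ["70", "86"]."""
    flags = [f"-gencode arch=compute_{sm},code=sm_{sm}" for sm in sm_versions]
    if embed_ptx:
        # Also embed PTX for the newest listed architecture so that GPUs
        # newer than any listed one can JIT-compile the kernel at load time.
        newest = max(sm_versions, key=int)
        flags.append(f"-gencode arch=compute_{newest},code=compute_{newest}")
    return flags

# Usage: nvcc kernel.cu <flags joined with spaces>
```

Passing several ‘-gencode’ flags this way produces a fat binary containing native code for each listed architecture plus forward-compatible PTX.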
Matching CUDA arch and CUDA gencode for various NVIDIA architectures
Series | Architecture (-arch) | gencode (sm) | Compute Capability | Notable Models | Supported CUDA versions | Key Features |
---|---|---|---|---|---|---|
Tesla | Tesla | sm_10, sm_11, sm_12, sm_13 | 1.0, 1.1, 1.2, 1.3 | C870, C1060, M1060 | CUDA 1.0 until CUDA 6.5 | First dedicated GPGPU series |
Fermi | Fermi | sm_20, sm_21 | 2.0, 2.1 | GTX 400, GTX 500, Tesla 20-series, Quadro 4000/5000 | CUDA 3.2 until CUDA 8 | First to support ECC memory and a unified L2 cache |
Kepler | Kepler | sm_30, sm_35, sm_37 | 3.0, 3.5, 3.7 | GTX 600, GTX 700, Tesla K-series, Quadro K-series | CUDA 5 until CUDA 10 | First to feature Dynamic Parallelism and Hyper-Q |
Maxwell | Maxwell | sm_50, sm_52, sm_53 | 5.0, 5.2, 5.3 | GTX 900, Quadro M-series | CUDA 6 until CUDA 11 | First to support VR and 4K displays |
Pascal | Pascal | sm_60, sm_61, sm_62 | 6.0, 6.1, 6.2 | GTX 1000, Tesla P-series, Quadro P-series | CUDA 8 and later | First to support simultaneous multi-projection |
Volta | Volta | sm_70, sm_72 (Xavier) | 7.0, 7.2 | Titan V, Tesla V100, Quadro GV100 | CUDA 9 and later | First to feature Tensor Cores and NVLink 2.0 |
Turing | Turing | sm_75 | 7.5 | RTX 2000, GTX 1600, Quadro RTX, Tesla T4 | CUDA 10 and later | First to feature Ray Tracing Cores and RTX technology |
Ampere | Ampere | sm_80, sm_86, sm_87 (Orin) | 8.0, 8.6, 8.7 | RTX 3000, A-series | CUDA 11.1 and later (sm_80 from CUDA 11.0) | Third-generation Tensor Cores with TF32 support |
Lovelace | Ada Lovelace[2] | sm_89 | 8.9 | GeForce RTX 4070 Ti (AD104), GeForce RTX 4080 (AD103), GeForce RTX 4090 (AD102), Nvidia RTX 6000 Ada Generation (AD102, formerly Quadro), Nvidia L40 (AD102, formerly Tesla) | CUDA 11.8 and later | Fourth-generation Tensor Cores and third-generation RT Cores |
Hopper | Hopper | sm_90, sm_90a | 9.0 | H100 | CUDA 12 and later | Transformer Engine with FP8, fourth-generation NVLink |
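The mapping from a compute capability back to an architecture name can be read straight off the table above. A minimal Python sketch (the helper name is my own, not an NVIDIA API):

```python
def arch_name(major: int, minor: int) -> str:
    """Return the architecture name for a compute capability, per the table above."""
    if major == 7:
        # Compute capability 7.x is split: 7.5 is Turing, 7.0/7.2 are Volta.
        return "Turing" if minor == 5 else "Volta"
    if major == 8:
        # Likewise 8.9 is Ada Lovelace, while 8.0/8.6/8.7 are Ampere.
        return "Ada Lovelace" if minor == 9 else "Ampere"
    return {1: "Tesla", 2: "Fermi", 3: "Kepler", 5: "Maxwell",
            6: "Pascal", 9: "Hopper"}.get(major, "unknown")

# On a CUDA machine the capability pair itself would come from, e.g.,
# cudaGetDeviceProperties (fields major/minor) or an equivalent runtime query.
```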
NVIDIA GPU Models
Model | Architecture | CUDA Cores | Tensor Cores | RT Cores | Memory Size | Memory Type | Memory Bandwidth | TDP | Launch Date |
---|---|---|---|---|---|---|---|---|---|
Tesla C870 | Tesla | 128 | No | No | 1.5 GB GDDR3 | GDDR3 | 76.8 GB/s | 105W | Jun 2006 |
Tesla C1060 | Tesla | 240 | No | No | 4 GB GDDR3 | GDDR3 | 102 GB/s | 238W | Dec 2008 |
Tesla M1060 | Tesla | 240 | No | No | 4 GB GDDR3 | GDDR3 | 102 GB/s | 225W | Dec 2008 |
Tesla M2050 | Fermi | 448 | No | No | 3 GB GDDR5 | GDDR5 | 148 GB/s | 225W | May 2010 |
Tesla M2070 | Fermi | 448 | No | No | 6 GB GDDR5 | GDDR5 | 150 GB/s | 225W | May 2010 |
Tesla K10 | Kepler | 3072 | No | No | 8 GB GDDR5 | GDDR5 | 320 GB/s | 225W | May 2012 |
Tesla K20 | Kepler | 2496 | No | No | 5/6 GB GDDR5 | GDDR5 | 208 GB/s | 225W | Nov 2012 |
Tesla K40 | Kepler | 2880 | No | No | 12 GB GDDR5 | GDDR5 | 288 GB/s | 235W | Nov 2013 |
Tesla K80 | Kepler | 4992 | No | No | 24 GB GDDR5 | GDDR5 | 480 GB/s | 300W | Nov 2014 |
Tesla M40 | Maxwell | 3072 | No | No | 12 GB GDDR5 | GDDR5 | 288 GB/s | 250W | Nov 2015 |
Tesla P4 | Pascal | 2560 | No | No | 8 GB GDDR5 | GDDR5 | 192 GB/s | 75W | Sep 2016 |
Tesla P40 | Pascal | 3840 | No | No | 24 GB GDDR5X | GDDR5X | 480 GB/s | 250W | Sep 2016 |
Tesla V100 | Volta | 5120 | 640 | No | 16/32 GB HBM2 | HBM2 | 900 GB/s | 300W | May 2017 |
Tesla T4 | Turing | 2560 | 320 | Yes | 16 GB GDDR6 | GDDR6 | 320 GB/s | 70W | Sep 2018 |
A100 PCIe | Ampere | 6912 | 432 | No | 40 GB HBM2 / 80 GB HBM2 | HBM2 | 1555 GB/s | 250W | May 2020 |
A100 SXM4 | Ampere | 6912 | 432 | No | 40 GB HBM2 / 80 GB HBM2 | HBM2 | 1555 GB/s | 400W | May 2020 |
A30 | Ampere | 3584 | 224 | No | 24 GB HBM2 | HBM2 | 933 GB/s | 165W | Apr 2021 |
A40 | Ampere | 10752 | 336 | Yes | 48 GB GDDR6 | GDDR6 | 696 GB/s | 300W | Apr 2021 |
A10 | Ampere | 9216 | 288 | Yes | 24 GB GDDR6 | GDDR6 | 600 GB/s | 150W | Apr 2021 |
A16 | Ampere | 5120 (4× 1280) | 160 | Yes | 64 GB GDDR6 (4× 16 GB) | GDDR6 | 4× 200 GB/s | 250W | Apr 2021 |
A100 80GB | Ampere | 6912 | 432 | No | 80 GB HBM2 | HBM2 | 2025 GB/s | 400W | Apr 2021 |
A100 40GB | Ampere | 6912 | 432 | No | 40 GB HBM2 | HBM2 | 1555 GB/s | 250W | May 2020 |
A5000 | Ampere | 8192 | 256 | Yes | 24 GB GDDR6 | GDDR6 | 768 GB/s | 230W | Apr 2021 |
A4000 | Ampere | 6144 | 192 | Yes | 16 GB GDDR6 | GDDR6 | 512 GB/s | 140W | Apr 2021 |
Titan RTX | Turing | 4608 | 576 | Yes | 24 GB GDDR6 | GDDR6 | 672 GB/s | 280W | Dec 2018 |
GeForce RTX 3090 | Ampere | 10496 | 328 | Yes | 24 GB GDDR6X | GDDR6X | 936 GB/s | 350W | Sep 2020 |
GeForce RTX 3080 Ti | Ampere | 10240 | 320 | Yes | 12 GB GDDR6X | GDDR6X | 912 GB/s | 350W | May 2021 |
GeForce RTX 3080 | Ampere | 8704 | 272 | Yes | 10 GB GDDR6X | GDDR6X | 760 GB/s | 320W | Sep 2020 |
GeForce RTX 3070 Ti | Ampere | 6144 | 192 | Yes | 8 GB GDDR6X | GDDR6X | 608 GB/s | 290W | Jun 2021 |
GeForce RTX 3070 | Ampere | 5888 | 184 | Yes | 8 GB GDDR6 | GDDR6 | 448 GB/s | 220W | Oct 2020 |
GeForce RTX 3060 Ti | Ampere | 4864 | 152 | Yes | 8 GB GDDR6 | GDDR6 | 448 GB/s | 200W | Dec 2020 |
GeForce RTX 3060 | Ampere | 3584 | 112 | Yes | 12 GB GDDR6 | GDDR6 | 360 GB/s | 170W | Feb 2021 |
Quadro RTX 8000 | Turing | 4608 | 576 | Yes | 48 GB GDDR6 | GDDR6 | 672 GB/s | 295W | Aug 2018 |
Quadro RTX 6000 | Turing | 4608 | 576 | Yes | 24 GB GDDR6 | GDDR6 | 672 GB/s | 260W | Aug 2018 |
Quadro RTX 5000 | Turing | 3072 | 384 | Yes | 16 GB GDDR6 | GDDR6 | 448 GB/s | 230W | Nov 2018 |
Quadro RTX 4000 | Turing | 2304 | 288 | Yes | 8 GB GDDR6 | GDDR6 | 416 GB/s | 160W | Nov 2018 |
Titan V | Volta | 5120 | 640 | No | 12 GB HBM2 | HBM2 | 652.8 GB/s | 250W | Dec 2017 |
Tesla V100 (PCIe) | Volta | 5120 | 640 | No | 16 GB HBM2 | HBM2 | 900 GB/s | 250W | Jun 2017 |
Tesla V100 (SXM2) | Volta | 5120 | 640 | No | 16 GB HBM2 | HBM2 | 900 GB/s | 300W | Jun 2017 |
Quadro GV100 | Volta | 5120 | 640 | No | 32 GB HBM2 | HBM2 | 870 GB/s | 250W | Mar 2018 |
Tesla V100 32GB (SXM2) | Volta | 5120 | 640 | No | 32 GB HBM2 | HBM2 | 900 GB/s | 300W | Mar 2018 |
DGX-2 (16× Tesla V100) | Volta | 16× 5120 | 16× 640 | No | 16 × 32 GB HBM2 (512 GB total) | HBM2 | 2.7 TB/s (NVSwitch bisection) | 10 kW | Mar 2018 |
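When comparing models from a table like the one above programmatically, a small filter is often enough. A sketch under stated assumptions: the rows below are transcribed by hand from the table, and the selection helper is hypothetical, choosing models suitable for mixed-precision work (Tensor Cores present, enough memory):

```python
# (model, architecture, tensor_cores, memory_gb, tdp_w) — a few rows from the table
MODELS = [
    ("Tesla K80",        "Kepler", 0,   24, 300),
    ("Tesla V100",       "Volta",  640, 16, 300),
    ("A100 PCIe",        "Ampere", 432, 40, 250),
    ("GeForce RTX 3090", "Ampere", 328, 24, 350),
]

def mixed_precision_candidates(models, min_mem_gb=16):
    """Return model names that have Tensor Cores and at least min_mem_gb memory."""
    return [name for name, arch, tensor_cores, mem_gb, tdp_w in models
            if tensor_cores > 0 and mem_gb >= min_mem_gb]
```

Pre-Volta parts (Kepler through Pascal) drop out immediately because they have no Tensor Cores, which matches the Tensor Cores column above.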
NVIDIA Features by Architecture[3]
NVIDIA Flagship Gaming GPUs | |||||
---|---|---|---|---|---|
VideoCardz.com | AD102 | GA102 | TU102 | GV100 | GP102 |
Launch Year | 2022 | 2020 | 2018 | 2017 | 2017 |
Architecture | Ada Lovelace | Ampere | Turing | Volta | Pascal |
Node | TSMC 4N | SAMSUNG 8N | TSMC 12nm | TSMC 12nm | TSMC 16nm |
Die Size | 608 mm² | 628 mm² | 754 mm² | 815 mm² | 471 mm² |
Transistors | 76.3B | 28.3B | 18.6B | 21.1B | 12.0B |
Trans. Density | 125.5M /mm² | 45.1M /mm² | 24.7M /mm² | 25.9M /mm² | 25.5M /mm² |
CUDA Cores | 18432 | 10752 | 4608 | 5120 | 3840 |
Tensor Cores | 576 Gen4 | 336 Gen3 | 576 Gen2 | 640 | – |
RT Cores | 144 Gen3 | 84 Gen2 | 72 Gen1 | – | – |
Memory Bus | GDDR6X 384-bit | GDDR6X 384-bit | GDDR6 384-bit | HBM2 4096-bit | GDDR5X 384-bit |
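The transistor-density row is simply transistor count divided by die size; a quick check of the table's arithmetic:

```python
def density_m_per_mm2(transistors_billions: float, die_size_mm2: float) -> float:
    """Transistor density in millions of transistors per mm²."""
    # Billions -> millions is a factor of 1000.
    return transistors_billions * 1000 / die_size_mm2

print(round(density_m_per_mm2(76.3, 608), 1))  # AD102: 125.5
print(round(density_m_per_mm2(28.3, 628), 1))  # GA102: 45.1
```

The roughly 2.8× density jump from GA102 to AD102 reflects the move from Samsung 8N to TSMC 4N.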
NVIDIA Grace Architecture
NVIDIA has announced that it will partner with server manufacturers such as HPE, Atos, and Supermicro on servers built around Grace, its Arm Neoverse-based data-center CPU. These servers are expected to be available in the second half of 2023.
Architecture | Key Features |
---|---|
Grace | CPU-GPU integration; Arm Neoverse CPU cores; HBM2E memory with 900 GB/s of memory bandwidth; support for PCIe 5.0 and NVLink; up to 10x performance improvement for certain HPC workloads; energy-efficiency improvements through a unified memory space |