NVIDIA GPU

=== NVIDIA GPU Architecture ===
{| class="wikitable sortable"
!Series
!Architecture
!Notable Models
!Key Features
|-
|Tesla
|Tesla
|C1060, M2050, K80, P100, V100, A100
|First dedicated GPGPU series
|-
|Fermi
|Fermi
|GTX 400, GTX 500, Tesla 20-series, Quadro 4000/5000
|First to feature CUDA cores and support for ECC memory
|-
|Kepler
|Kepler
|GTX 600, GTX 700, Tesla K-series, Quadro K-series
|First to feature Dynamic Parallelism and Hyper-Q
|-
|Maxwell
|Maxwell
|GTX 900, GTX 1000, Quadro M-series
|First to support VR and 4K displays
|-
|Pascal
|Pascal
|GTX 1000, Quadro P-series
|First to support simultaneous multi-projection
|-
|Volta
|Volta
|Titan V, Tesla V100, Quadro GV100
|First to feature Tensor Cores and NVLink 2.0
|-
|Turing
|Turing
|RTX 2000, GTX 1600, Quadro RTX
|First to feature Ray Tracing Cores and RTX technology
|-
|Ampere
|Ampere
|RTX 3000, A-series
|Features third-generation Tensor Cores and more
|}
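
In practice it is often easier to ask the CUDA runtime which generation a card belongs to than to look it up by model name. The sketch below is a minimal example, assuming the CUDA toolkit and <code>nvcc</code> are available on the node; it maps the compute capability reported by <code>cudaGetDeviceProperties()</code> to the generation names used in the table above (the file name <code>arch_id.cu</code> is arbitrary).

<syntaxhighlight lang="cuda">
// arch_id.cu -- minimal sketch: map the CUDA compute capability reported by
// the driver to the architecture generations listed in the table above.
// Assumes the CUDA toolkit is installed; build with: nvcc arch_id.cu -o arch_id
#include <cstdio>
#include <cuda_runtime.h>

static const char* architecture(int major, int minor) {
    switch (major) {
        case 1: return "Tesla";
        case 2: return "Fermi";
        case 3: return "Kepler";
        case 5: return "Maxwell";
        case 6: return "Pascal";
        case 7: return (minor >= 5) ? "Turing" : "Volta";
        case 8: return (minor >= 9) ? "Ada Lovelace (newer than this table)"
                                    : "Ampere";
        default: return "newer than this table";
    }
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("GPU %d: %s (compute capability %d.%d) -> %s\n",
               dev, prop.name, prop.major, prop.minor,
               architecture(prop.major, prop.minor));
    }
    return 0;
}
</syntaxhighlight>
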
=== NVIDIA GPU Models ===
{| class="wikitable sortable"
!Model
!Architecture
!CUDA Cores
!Tensor Cores
!RT Cores
!Memory Size
!Memory Type
!Memory Bandwidth
!TDP
!Launch Date
|-
|Tesla C870
|Tesla
|128
|No
|No
|1.5 GB GDDR3
|GDDR3
|76.8 GB/s
|105W
|Jun 2006
|-
|Tesla C1060
|Tesla
|240
|No
|No
|4 GB GDDR3
|GDDR3
|102 GB/s
|238W
|Dec 2008
|-
|Tesla M1060
|Tesla
|240
|No
|No
|4 GB GDDR3
|GDDR3
|102 GB/s
|225W
|Dec 2008
|-
|Tesla M2050
|Fermi
|448
|No
|No
|3 GB GDDR5
|GDDR5
|148 GB/s
|225W
|May 2010
|-
|Tesla M2070
|Fermi
|448
|No
|No
|6 GB GDDR5
|GDDR5
|150 GB/s
|225W
|May 2010
|-
|Tesla K10
|Kepler
|3072
|No
|No
|8 GB GDDR5
|GDDR5
|320 GB/s
|225W
|May 2012
|-
|Tesla K20
|Kepler
|2496
|No
|No
|5/6 GB GDDR5
|GDDR5
|208 GB/s
|225W
|Nov 2012
|-
|Tesla K40
|Kepler
|2880
|No
|No
|12 GB GDDR5
|GDDR5
|288 GB/s
|235W
|Nov 2013
|-
|Tesla K80
|Kepler
|4992
|No
|No
|24 GB GDDR5
|GDDR5
|480 GB/s
|300W
|Nov 2014
|-
|Tesla M40
|Maxwell
|3072
|No
|No
|12 GB GDDR5
|GDDR5
|288 GB/s
|250W
|Nov 2015
|-
|Tesla P4
|Pascal
|2560
|No
|No
|8 GB GDDR5
|GDDR5
|192 GB/s
|75W
|Sep 2016
|-
|Tesla P40
|Pascal
|3840
|No
|No
|24 GB GDDR5X
|GDDR5X
|480 GB/s
|250W
|Sep 2016
|-
|Tesla V100
|Volta
|5120
|640
|No
|16/32 GB HBM2
|HBM2
|900 GB/s
|300W
|May 2017
|-
|Tesla T4
|Turing
|2560
|320
|No
|16 GB
|
|
|
|
|-
|A100 PCIe
|Ampere
|6912
|432
|No
|40 GB HBM2 / 80 GB HBM2
|HBM2
|1555 GB/s
|250W
|May 2020
|-
|A100 SXM4
|Ampere
|6912
|432
|No
|40 GB HBM2 / 80 GB HBM2
|HBM2
|1555 GB/s
|400W
|May 2020
|-
|A30
|Ampere
|7424
|184
|No
|24 GB GDDR6
|GDDR6
|696 GB/s
|165W
|Apr 2021
|-
|A40
|Ampere
|10752
|336
|Yes
|48 GB GDDR6
|GDDR6
|696 GB/s
|300W
|Apr 2021
|-
|A10
|Ampere
|10240
|320
|Yes
|24 GB GDDR6
|GDDR6
|624 GB/s
|150W
|Mar 2021
|-
|A16
|Ampere
|16384
|512
|No
|48 GB GDDR6
|GDDR6
|768 GB/s
|400W
|Mar 2021
|-
|A100 80GB
|Ampere
|6912
|432
|No
|80 GB HBM2
|HBM2
|2025 GB/s
|400W
|Apr 2021
|-
|A100 40GB
|Ampere
|6912
|432
|No
|40 GB HBM2
|HBM2
|1555 GB/s
|250W
|May 2020
|-
|A200 PCIe
|Ampere
|10752
|672
|Yes
|80 GB HBM2 / 160 GB HBM2
|HBM2
|2050 GB/s
|400W
|Nov 2021
|-
|A200 SXM4
|Ampere
|10752
|672
|Yes
|80 GB HBM2 / 160 GB HBM2
|HBM2
|2050 GB/s
|400W
|Nov 2021
|-
|A5000
|Ampere
|8192
|256
|Yes
|24 GB GDDR6
|GDDR6
|768 GB/s
|230W
|Apr 2021
|-
|A4000
|Ampere
|6144
|192
|Yes
|16 GB GDDR6
|GDDR6
|512 GB/s
|140W
|Apr 2021
|-
|A3000
|Ampere
|3584
|112
|Yes
|24 GB G
|
|
|
|
|-
|Titan RTX
|Turing
|4608
|576
|Yes
|24 GB GDDR6
|GDDR6
|672 GB/s
|280W
|Dec 2018
|-
|GeForce RTX 3090
|Ampere
|10496
|328
|Yes
|24 GB GDDR6X
|GDDR6X
|936 GB/s
|350W
|Sep 2020
|-
|GeForce RTX 3080 Ti
|Ampere
|10240
|320
|Yes
|12 GB GDDR6X
|GDDR6X
|912 GB/s
|350W
|May 2021
|-
|GeForce RTX 3080
|Ampere
|8704
|272
|Yes
|10 GB GDDR6X
|GDDR6X
|760 GB/s
|320W
|Sep 2020
|-
|GeForce RTX 3070 Ti
|Ampere
|6144
|192
|Yes
|8 GB GDDR6X
|GDDR6X
|608 GB/s
|290W
|Jun 2021
|-
|GeForce RTX 3070
|Ampere
|5888
|184
|Yes
|8 GB GDDR6
|GDDR6
|448 GB/s
|220W
|Oct 2020
|-
|GeForce RTX 3060 Ti
|Ampere
|4864
|152
|Yes
|8 GB GDDR6
|GDDR6
|448 GB/s
|200W
|Dec 2020
|-
|GeForce RTX 3060
|Ampere
|3584
|112
|Yes
|12 GB GDDR6
|GDDR6
|360 GB/s
|170W
|Feb 2021
|-
|Quadro RTX 8000
|Turing
|4608
|576
|Yes
|48 GB GDDR6
|GDDR6
|624 GB/s
|295W
|Aug 2018
|-
|Quadro RTX 6000
|Turing
|4608
|576
|Yes
|24 GB GDDR6
|GDDR6
|432 GB/s
|260W
|Aug 2018
|-
|Quadro RTX 5000
|Turing
|3072
|384
|Yes
|16 GB GDDR6
|GDDR6
|448 GB/s
|230W
|Nov 2018
|-
|Quadro RTX 4000
|Turing
|2304
|288
|Yes
|8 GB GDDR6
|GDDR6
|416 GB/s
|160W
|Nov 2018
|-
|Titan RTX (T-Rex)
|Turing
|4608
|576
|Yes
|24 GB
|
|
|
|
|-
|Titan V
|Volta
|5120
|640
|No
|12 GB HBM2
|HBM2
|652.8 GB/s
|250W
|Dec 2017
|-
|Tesla V100 (PCIe)
|Volta
|5120
|640
|No
|16 GB HBM2
|HBM2
|900 GB/s
|250W
|June 2017
|-
|Tesla V100 (SXM2)
|Volta
|5120
|640
|No
|16 GB HBM2
|HBM2
|900 GB/s
|300W
|June 2017
|-
|Quadro GV100
|Volta
|5120
|640
|No
|32 GB HBM2
|HBM2
|870 GB/s
|250W
|Mar 2018
|-
|Tesla GV100 (SXM2)
|Volta
|5120
|640
|No
|32 GB HBM2
|HBM2
|900 GB/s
|300W
|Mar 2018
|-
|DGX-1 (Volta)
|Volta
|5120
|640
|No
|16 x 32 GB HBM2 (512 GB total)
|HBM2
|2.7 TB/s
|3200W
|Mar 2018
|}
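
For the cards in this table, the memory size and peak memory bandwidth can also be read (or derived) from the CUDA device properties at run time, which is a quick sanity check on a cluster node; <code>nvidia-smi</code> reports similar information without compiling anything. The sketch below is a minimal example under the same assumption that the CUDA toolkit is installed (the file name <code>gpu_props.cu</code> is arbitrary).

<syntaxhighlight lang="cuda">
// gpu_props.cu -- minimal sketch: print the device properties that correspond
// to the columns above. Note that the "CUDA Cores" column is a derived number
// (SM count x cores per SM, which depends on the architecture); the runtime
// only reports the SM count directly.
// Assumes the CUDA toolkit is installed; build with: nvcc gpu_props.cu -o gpu_props
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        // Theoretical peak bandwidth: 2 (double data rate) x memory clock (kHz)
        // x bus width in bytes; this should roughly match the table's
        // "Memory Bandwidth" column.
        double bw_gbs = 2.0 * prop.memoryClockRate * 1e3
                      * (prop.memoryBusWidth / 8.0) / 1e9;

        printf("GPU %d: %s\n", dev, prop.name);
        printf("  Streaming multiprocessors: %d\n", prop.multiProcessorCount);
        printf("  Global memory:             %.1f GB\n", prop.totalGlobalMem / 1e9);
        printf("  Peak memory bandwidth:     %.0f GB/s (theoretical)\n", bw_gbs);
    }
    return 0;
}
</syntaxhighlight>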
 
=== NVIDIA Grace Architecture ===
NVIDIA has announced partnerships with server manufacturers such as HPE, Atos, and Supermicro to build servers based on the Grace architecture, which pairs NVIDIA GPUs with an Arm Neoverse-based CPU. These servers are expected to become available in the second half of 2023.
{| class="wikitable"
!Architecture
!Key Features
|-
| rowspan="4" |Grace
|CPU-GPU integration, ARM Neoverse CPU, HBM2E memory
|-
|900 GB/s memory bandwidth, support for PCIe 5.0 and NVLink
|-
|10x performance improvement for certain HPC workloads
|-
|Energy efficiency improvements through unified memory space
|}
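
The "unified memory space" listed above means that the CPU and GPU can address the same data through one pointer. On current CUDA GPUs this programming model is already exposed as managed memory; the sketch below is a minimal, non-Grace-specific example of it, assuming any CUDA-capable GPU and the CUDA toolkit (the file name <code>managed_memory.cu</code> is arbitrary).

<syntaxhighlight lang="cuda">
// managed_memory.cu -- minimal sketch, not Grace-specific: CUDA managed memory
// gives the CPU and GPU a single pointer into one address space, the
// programming model that Grace's unified memory is designed to accelerate.
// Assumes any CUDA-capable GPU; build with: nvcc managed_memory.cu -o managed_memory
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(double* x, double a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    double* x = nullptr;
    cudaMallocManaged(&x, n * sizeof(double));   // one allocation, visible to CPU and GPU

    for (int i = 0; i < n; ++i) x[i] = 1.0;      // initialise on the CPU
    scale<<<(n + 255) / 256, 256>>>(x, 2.0, n);  // update on the GPU
    cudaDeviceSynchronize();                     // wait before reading on the CPU again

    printf("x[0] = %.1f (expected 2.0)\n", x[0]);
    cudaFree(x);
    return 0;
}
</syntaxhighlight>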
