NVIDIA GPU Architecture

| Series | Architecture | Notable Models | Key Features |
| --- | --- | --- | --- |
| Tesla | Tesla | C870, C1060, M1060 | First dedicated GPGPU series |
| Fermi | Fermi | GTX 400, GTX 500, Tesla 20-series, Quadro 4000/5000 | First to feature CUDA cores and support for ECC memory |
| Kepler | Kepler | GTX 600, GTX 700, Tesla K-series, Quadro K-series | First to feature Dynamic Parallelism and Hyper-Q |
| Maxwell | Maxwell | GTX 750, GTX 900, Quadro M-series | Major gains in power efficiency; introduced VR-oriented features such as VR Direct |
| Pascal | Pascal | GTX 10 series, Tesla P-series, Quadro P-series | First to support simultaneous multi-projection |
| Volta | Volta | Titan V, Tesla V100, Quadro GV100 | First to feature Tensor Cores and NVLink 2.0 |
| Turing | Turing | RTX 20 series, GTX 16 series, Quadro RTX | First to feature Ray Tracing Cores and RTX technology |
| Ampere | Ampere | RTX 30 series, A-series | Third-generation Tensor Cores and second-generation RT Cores |
| Lovelace | Ada Lovelace | GeForce RTX 40 series | Fourth-gen Tensor Cores, third-gen RT Cores, Shader Execution Reordering, DLSS 3, eighth-gen NVENC with AV1 (details below) |

Ada Lovelace key features:

- Fourth-generation Tensor Cores increase throughput by up to 5x, to 1.4 Tensor petaFLOPS, using the new FP8 Transformer Engine (as in the H100).
- Third-generation RT Cores have twice the ray-triangle intersection throughput, increasing RT-TFLOP performance by over 2x.
- The new RT Cores also include an Opacity Micromap (OMM) Engine and a Displaced Micro-Mesh (DMM) Engine. The OMM Engine enables much faster ray tracing of the alpha-tested textures often used for foliage, particles, and fences. The DMM Engine delivers up to 10x faster Bounding Volume Hierarchy (BVH) build times with up to 20x less BVH storage, enabling real-time ray tracing of geometrically complex scenes.
- Shader Execution Reordering (SER) dynamically reorganizes divergent, inefficient ray-tracing workloads into considerably more coherent ones. SER can improve shader performance for ray-tracing operations by up to 3x and in-game frame rates by up to 25%.
- DLSS 3 is an AI-powered technique that substantially boosts performance. Driven by the fourth-generation Tensor Cores and the Optical Flow Accelerator on GeForce RTX 40 series GPUs, DLSS 3 uses AI to generate additional high-quality frames.
- Graphics cards built on the Ada architecture feature the new eighth-generation NVIDIA Encoder (NVENC) with AV1 encoding, enabling a raft of new possibilities for streamers, broadcasters, and video callers.
- AV1 is 40% more efficient than H.264, allowing users who stream at 1080p to increase their stream resolution to 1440p at the same bitrate and quality.
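Which of these generations a given device belongs to can be determined at runtime from its CUDA compute capability. The sketch below is a minimal example against the standard CUDA runtime API: it queries every visible device and maps its major/minor compute capability to the architecture names above. The mapping in `archName` is an assumption drawn from NVIDIA's published compute-capability numbering (e.g. Volta = 7.0, Turing = 7.5, Ada = 8.9), not something stated in the tables here.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Map a CUDA compute capability (major.minor) to an architecture name.
// Assumption: publicly documented compute-capability numbering.
static const char* archName(int major, int minor) {
    switch (major) {
        case 1: return "Tesla";
        case 2: return "Fermi";
        case 3: return "Kepler";
        case 5: return "Maxwell";
        case 6: return "Pascal";
        case 7: return (minor >= 5) ? "Turing" : "Volta";
        case 8: return (minor >= 9) ? "Ada Lovelace" : "Ampere";
        default: return "Unknown / newer architecture";
    }
}

int main() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::printf("No CUDA devices found.\n");
        return 0;
    }
    for (int i = 0; i < deviceCount; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);  // query static device properties
        std::printf("Device %d: %s (compute capability %d.%d, %s)\n",
                    i, prop.name, prop.major, prop.minor,
                    archName(prop.major, prop.minor));
    }
    return 0;
}
```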
NVIDIA GPU Models

| Model | Architecture | CUDA Cores | Tensor Cores | RT Cores | Memory Size | Memory Type | Memory Bandwidth | TDP | Launch Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Tesla C870 | Tesla | 128 | No | No | 1.5 GB | GDDR3 | 76.8 GB/s | 105W | Jun 2007 |
| Tesla C1060 | Tesla | 240 | No | No | 4 GB | GDDR3 | 102 GB/s | 238W | Dec 2008 |
| Tesla M1060 | Tesla | 240 | No | No | 4 GB | GDDR3 | 102 GB/s | 225W | Dec 2008 |
| Tesla M2050 | Fermi | 448 | No | No | 3 GB | GDDR5 | 148 GB/s | 225W | May 2010 |
| Tesla M2070 | Fermi | 448 | No | No | 6 GB | GDDR5 | 150 GB/s | 225W | May 2010 |
| Tesla K10 | Kepler | 3072 | No | No | 8 GB | GDDR5 | 320 GB/s | 225W | May 2012 |
| Tesla K20 | Kepler | 2496 | No | No | 5/6 GB | GDDR5 | 208 GB/s | 225W | Nov 2012 |
| Tesla K40 | Kepler | 2880 | No | No | 12 GB | GDDR5 | 288 GB/s | 235W | Nov 2013 |
| Tesla K80 | Kepler | 4992 | No | No | 24 GB | GDDR5 | 480 GB/s | 300W | Nov 2014 |
| Tesla M40 | Maxwell | 3072 | No | No | 12 GB | GDDR5 | 288 GB/s | 250W | Nov 2015 |
| Tesla P4 | Pascal | 2560 | No | No | 8 GB | GDDR5 | 192 GB/s | 75W | Sep 2016 |
| Tesla P40 | Pascal | 3840 | No | No | 24 GB | GDDR5X | 480 GB/s | 250W | Sep 2016 |
| Tesla V100 | Volta | 5120 | 640 | No | 16/32 GB | HBM2 | 900 GB/s | 300W | May 2017 |
| Tesla T4 | Turing | 2560 | 320 | Yes | 16 GB | GDDR6 | 300 GB/s | 70W | Sep 2018 |
| A100 PCIe | Ampere | 6912 | 432 | No | 40 GB / 80 GB | HBM2 | 1555 GB/s | 250W | May 2020 |
| A100 SXM4 | Ampere | 6912 | 432 | No | 40 GB / 80 GB | HBM2 | 1555 GB/s | 400W | May 2020 |
| A30 | Ampere | 7424 | 184 | No | 24 GB | GDDR6 | 696 GB/s | 165W | Apr 2021 |
| A40 | Ampere | 10752 | 336 | Yes | 48 GB | GDDR6 | 696 GB/s | 300W | Apr 2021 |
| A10 | Ampere | 10240 | 320 | Yes | 24 GB | GDDR6 | 624 GB/s | 150W | Mar 2021 |
| A16 | Ampere | 16384 | 512 | No | 48 GB | GDDR6 | 768 GB/s | 400W | Mar 2021 |
| A100 80GB | Ampere | 6912 | 432 | No | 80 GB | HBM2 | 2025 GB/s | 400W | Apr 2021 |
| A100 40GB | Ampere | 6912 | 432 | No | 40 GB | HBM2 | 1555 GB/s | 250W | May 2020 |
| A200 PCIe | Ampere | 10752 | 672 | Yes | 80 GB / 160 GB | HBM2 | 2050 GB/s | 400W | Nov 2021 |
| A200 SXM4 | Ampere | 10752 | 672 | Yes | 80 GB / 160 GB | HBM2 | 2050 GB/s | 400W | Nov 2021 |
| A5000 | Ampere | 8192 | 256 | Yes | 24 GB | GDDR6 | 768 GB/s | 230W | Apr 2021 |
| A4000 | Ampere | 6144 | 192 | Yes | 16 GB | GDDR6 | 512 GB/s | 140W | Apr 2021 |
| A3000 | Ampere | 3584 | 112 | Yes | 24 GB | GDDR6 | | | |
| Titan RTX | Turing | 4608 | 576 | Yes | 24 GB | GDDR6 | 672 GB/s | 280W | Dec 2018 |
| GeForce RTX 3090 | Ampere | 10496 | 328 | Yes | 24 GB | GDDR6X | 936 GB/s | 350W | Sep 2020 |
| GeForce RTX 3080 Ti | Ampere | 10240 | 320 | Yes | 12 GB | GDDR6X | 912 GB/s | 350W | May 2021 |
| GeForce RTX 3080 | Ampere | 8704 | 272 | Yes | 10 GB | GDDR6X | 760 GB/s | 320W | Sep 2020 |
| GeForce RTX 3070 Ti | Ampere | 6144 | 192 | Yes | 8 GB | GDDR6X | 608 GB/s | 290W | Jun 2021 |
| GeForce RTX 3070 | Ampere | 5888 | 184 | Yes | 8 GB | GDDR6 | 448 GB/s | 220W | Oct 2020 |
| GeForce RTX 3060 Ti | Ampere | 4864 | 152 | Yes | 8 GB | GDDR6 | 448 GB/s | 200W | Dec 2020 |
| GeForce RTX 3060 | Ampere | 3584 | 112 | Yes | 12 GB | GDDR6 | 360 GB/s | 170W | Feb 2021 |
| Quadro RTX 8000 | Turing | 4608 | 576 | Yes | 48 GB | GDDR6 | 624 GB/s | 295W | Aug 2018 |
| Quadro RTX 6000 | Turing | 4608 | 576 | Yes | 24 GB | GDDR6 | 432 GB/s | 260W | Aug 2018 |
| Quadro RTX 5000 | Turing | 3072 | 384 | Yes | 16 GB | GDDR6 | 448 GB/s | 230W | Nov 2018 |
| Quadro RTX 4000 | Turing | 2304 | 288 | Yes | 8 GB | GDDR6 | 416 GB/s | 160W | Nov 2018 |
| Titan V | Volta | 5120 | 640 | No | 12 GB | HBM2 | 652.8 GB/s | 250W | Dec 2017 |
| Tesla V100 (PCIe) | Volta | 5120 | 640 | No | 16 GB | HBM2 | 900 GB/s | 250W | Jun 2017 |
| Tesla V100 (SXM2) | Volta | 5120 | 640 | No | 16 GB | HBM2 | 900 GB/s | 300W | Jun 2017 |
| Quadro GV100 | Volta | 5120 | 640 | No | 32 GB | HBM2 | 870 GB/s | 250W | Mar 2018 |
| Tesla GV100 (SXM2) | Volta | 5120 | 640 | No | 32 GB | HBM2 | 900 GB/s | 300W | Mar 2018 |
| DGX-1 (Volta) | Volta | 5120 | 640 | No | 16 x 32 GB HBM2 (512 GB total) | HBM2 | 2.7 TB/s | 3200W | Mar 2018 |
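The memory-bandwidth column lists theoretical peak figures, which follow directly from the memory clock and bus width. As a sanity check, the short sketch below computes that peak for whichever GPU it runs on; it assumes the standard CUDA runtime API and double-data-rate memory (which holds for the GDDR and HBM parts in this table). The Tesla V100 numbers in the comment are publicly documented values used purely for illustration.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("No CUDA device available.\n");
        return 0;
    }
    // Theoretical peak bandwidth (GB/s):
    //   memory clock (kHz) * 2 (double data rate) * bus width (bytes) / 1e6
    // Example: Tesla V100 has an 877 MHz memory clock and a 4096-bit bus,
    //   877,000 kHz * 2 * 512 bytes / 1e6 ≈ 898 GB/s,
    // which matches the ~900 GB/s quoted in the table above.
    double peakGBs = 2.0 * prop.memoryClockRate * (prop.memoryBusWidth / 8.0) / 1.0e6;
    std::printf("%s: %d-bit bus @ %.0f MHz -> %.1f GB/s peak\n",
                prop.name, prop.memoryBusWidth,
                prop.memoryClockRate / 1000.0, peakGBs);
    return 0;
}
```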
NVIDIA Grace Architecture

NVIDIA has announced partnerships with server manufacturers such as HPE, Atos, and Supermicro to build servers around Grace, NVIDIA's ARM-based data-center CPU architecture. These servers are expected to be available in the second half of 2023.

| Architecture | Key Features |
| --- | --- |
| Grace | CPU-GPU integration; ARM Neoverse CPU cores; HBM2E memory; 900 GB/s memory bandwidth; support for PCIe 5.0 and NVLink; up to 10x performance improvement for certain HPC workloads; energy-efficiency improvements through a unified memory space |
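The unified memory space highlighted for Grace is expressed in today's CUDA programming model through managed memory. The sketch below is a generic illustration of that programming style rather than Grace-specific code; the `scale` kernel and the array size are invented for the example. The same pointer is used from both CPU and GPU, with the runtime and hardware handling data placement and coherence.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Simple kernel: scale an array in place on the GPU.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float* data = nullptr;

    // One allocation visible to both CPU and GPU: on unified-memory
    // platforms the same pointer is valid on both sides without
    // explicit host<->device copies.
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;    // initialize on the CPU

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f); // update on the GPU
    cudaDeviceSynchronize();                        // wait before the CPU reads

    std::printf("data[0] = %.1f (expected 2.0)\n", data[0]);
    cudaFree(data);
    return 0;
}
```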