Nvidia GPU Tips and Tricks
Xid errors along with the potential causes for each[1]
XID | Nvidia GPU Failure | Linux Kernel message | HW Error | Driver Error | User App Error | System Memory Corruption | Bus Error | Thermal Issue | FB Corruption |
---|---|---|---|---|---|---|---|---|---|
1 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
2 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
3 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
4 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
4 | GPU semaphore timeout | X | X | X | X | X | ||||
5 | Unused | ||||||||
6 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
7 | Invalid or corrupted push buffer address | X | X | X | |||||
8 | GPU stopped processing | X | X | X | X | ||||
9 | Driver error programming GPU | X | |||||||
10 | Unused | ||||||||
11 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
12 | Driver error handling GPU exception | X | |||||||
13 | Graphics Engine Exception | X | X | X | X | X | X | ||
14 | Unused | ||||||||
15 | Unused | ||||||||
16 | Display engine hung | X | |||||||
17 | Unused | ||||||||
18 | Bus mastering disabled in PCI Config Space | X | |||||||
19 | Display Engine error | X | |||||||
20 | Invalid or corrupted MPEG push buffer | X | X | X | X | ||||
21 | Invalid or corrupted Motion Estimation push buffer | X | X | X | X | ||||
22 | Invalid or corrupted Video Processor push buffer | X | X | X | X | ||||
23 | Unused | ||||||||
24 | GPU semaphore timeout | X | X | X | X | X | X | ||
25 | Invalid or illegal push buffer stream | X | X | X | X | X | |||
26 | Framebuffer timeout | X | |||||||
27 | Video processor exception | X | |||||||
28 | Video processor exception | X | |||||||
29 | Video processor exception | X | |||||||
30 | GPU semaphore access error | X | |||||||
31 | GPU memory page fault | X | X | ||||||
32 | Invalid or corrupted push buffer stream | X | X | X | X | X | |||
33 | Internal micro-controller error | X | |||||||
34 | Video processor exception | X | |||||||
35 | Video processor exception | X | |||||||
36 | Video processor exception | X | |||||||
37 | Driver firmware error | X | X | X | |||||
38 | Driver firmware error | X | |||||||
39 | Unused | ||||||||
40 | Unused | ||||||||
41 | Unused | ||||||||
42 | Video processor exception | X | |||||||
43 | GPU stopped processing | X | X | ||||||
44 | Graphics Engine fault during context switch | X | |||||||
45 | Preemptive cleanup due to previous errors. Most likely seen when running multiple CUDA applications and hitting a double-bit ECC error (DBE) | The kernel usually logs the following message before XID 45: sched: RT throttling activated | X | ||||||
46 | GPU stopped processing | X | |||||||
47 | Video processor exception | X | |||||||
48 | Double Bit ECC Error | X | |||||||
49 | Unused | ||||||||
50 | Unused | ||||||||
51 | Unused | ||||||||
52 | Unused | ||||||||
53 | Unused | ||||||||
54 | Auxiliary power is not connected to the GPU board | ||||||||
55 | Unused | ||||||||
56 | Display Engine error | X | X | ||||||
57 | Error programming video memory interface | X | X | X | |||||
58 | Unstable video memory interface detected | X | X | ||||||
58 | EDC error – clarified in printout | X | ||||||||
59 | Internal micro-controller error (older drivers) | X | |||||||
60 | Video processor exception | X | |||||||
61 | Internal micro-controller breakpoint/warning (newer drivers) | ||||||||
62 | Internal micro-controller halt (newer drivers) | X | X | X | |||||
63 | ECC page retirement or row remapping recording event | X | X | X | |||||
64 | ECC page retirement or row remapper recording failure | X | X | ||||||
65 | Video processor exception | X | X | ||||||
66 | Illegal access by driver | X | X | ||||||
67 | Illegal access by driver | X | X | ||||||
68 | NVDEC0 Exception | X | X | ||||||
69 | Graphics Engine class error | X | X | ||||||
70 | CE3: Unknown Error | X | X | ||||||
71 | CE4: Unknown Error | X | X | ||||||
72 | CE5: Unknown Error | X | X | ||||||
73 | NVENC2 Error | X | X | ||||||
74 | NVLINK Error | X | X | X | |||||
75 | CE6: Unknown Error | X | X | ||||||
76 | CE7: Unknown Error | X | X | ||||||
77 | CE8: Unknown Error | X | X | ||||||
78 | vGPU Start Error | X | |||||||
79 | GPU has fallen off the bus | X | X | X | X | X | |||
80 | Corrupted data sent to GPU | X | X | X | X | X | |||
81 | VGA Subsystem Error | X | |||||||
82 | NVJPG0 Error | X | X | ||||||
83 | NVDEC1 Error | X | X | ||||||
84 | NVDEC2 Error | X | X | ||||||
85 | CE9: Unknown Error | X | X | ||||||
86 | OFA Exception | X | X | ||||||
87 | Reserved | ||||||||
88 | NVDEC3 Error | X | X | ||||||
89 | NVDEC4 Error | X | X | ||||||
90 | Reserved | ||||||||
91 | Reserved | ||||||||
92 | High single-bit ECC error rate | X | X | ||||||
93 | Non-fatal violation of provisioned InfoROM wear limit | X | X | ||||||
94 | Contained ECC error | X | X | X | |||||
95 | Uncontained ECC error | X | X | X | |||||
96 | NVDEC5 Error | X | X | ||||||
97 | NVDEC6 Error | X | X | ||||||
98 | NVDEC7 Error | X | X | ||||||
99 | NVJPG1 Error | X | X | ||||||
100 | NVJPG2 Error | X | X | ||||||
101 | NVJPG3 Error | X | X | ||||||
102 | NVJPG4 Error | X | X | ||||||
103 | NVJPG5 Error | X | X | ||||||
104 | NVJPG6 Error | X | X | ||||||
105 | NVJPG7 Error | X | X | ||||||
106 | SMBPBI Test Message | X | |||||||
107 | SMBPBI Test Message Silent | X | |||||||
108-109 | Reserved | ||||||||
110 | Security Fault Error | X | |||||||
111 | Display Bundle Error Event | X | X | X | |||||
112 | Display Supervisor Error | X | X | ||||||
113 | DP Link Training Error | X | X | ||||||
114 | Display Pipeline Underflow Error | X | X | X | |||||
115 | Display Core Channel Error | X | X | ||||||
116 | Display Window Channel Error | X | X | ||||||
117 | Display Cursor Channel Error | X | X | ||||||
118 | Display Pixel Pipeline Error | X | X | ||||||
119 | GSP RPC Timeout | X | X | X | X | X | X | ||
120 | GSP Error | X | X | X | X | X | X | ||
121 | Reserved | ||||||||
122 | SPI PMU RPC Read Failure | X | X | ||||||
123 | SPI PMU RPC Write Failure | X | X | ||||||
124 | SPI PMU RPC Erase Failure | X | X | ||||||
125 | InfoROM FS Failure | X | X |
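
To look up an Xid from the table, the number first has to be pulled out of the kernel log. The following is a minimal Python sketch, not an official tool. It assumes the driver logs lines shaped like `NVRM: Xid (PCI:0000:3b:00): 79, ...`; the exact format varies between driver versions, so the pattern may need adjusting. The sample log line, the `parse_xid` helper, and the `XID_DESCRIPTIONS` dictionary are illustrative names introduced here; the descriptions are copied from a few rows of the table.

```python
import re

# Descriptions copied from a few rows of the table above; extend as needed.
XID_DESCRIPTIONS = {
    13: "Graphics Engine Exception",
    31: "GPU memory page fault",
    48: "Double Bit ECC Error",
    79: "GPU has fallen off the bus",
}

# Assumed log shape: "NVRM: Xid (PCI:0000:3b:00): 79, ..."
# The "PCI:" prefix is treated as optional because some drivers omit it.
XID_PATTERN = re.compile(r"NVRM: Xid \((?:PCI:)?([0-9a-fA-F:.]+)\): (\d+)")


def parse_xid(line):
    """Return (pci_address, xid, description), or None if the line has no Xid."""
    match = XID_PATTERN.search(line)
    if match is None:
        return None
    xid = int(match.group(2))
    description = XID_DESCRIPTIONS.get(xid, "not in local table")
    return match.group(1), xid, description


if __name__ == "__main__":
    # Illustrative log line, not captured from a real system.
    sample = "[1234.5] NVRM: Xid (PCI:0000:3b:00): 79, pid=1234, GPU has fallen off the bus."
    print(parse_xid(sample))
```

On a live system, lines could be fed to `parse_xid` from the output of `dmesg` or `journalctl -k`, for example by piping that output into a script that reads standard input.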