Nvidia GPU Tips and Tricks
XID 62 or 119 error
In many of the cases we have seen, XID 62 and 119 errors were caused by an incompatible driver version. Install the GPU driver version that matches the CUDA version the customer's application requires.
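To check whether the installed driver is new enough for the CUDA version an application needs, the comparison can be scripted. The following is a sketch under stated assumptions, not a definitive tool. The function names (version_at_least, installed_driver_version, check_driver) are hypothetical. It relies on GNU sort -V for version ordering. The minimum version has to be looked up in NVIDIA's CUDA release notes. For example, those notes have listed 525.60.13 as the minimum Linux driver for CUDA 12.x minor-version compatibility, but verify this against the current notes.

```shell
#!/usr/bin/env bash
# Sketch: compare the installed NVIDIA driver against a required minimum.
# Assumption: the minimum comes from NVIDIA's CUDA release notes for the
# CUDA version the application was built with; nothing here looks it up.

# version_at_least CURRENT MINIMUM (hypothetical helper)
# Succeeds when CURRENT >= MINIMUM, using GNU sort's version ordering.
version_at_least() {
  # The smaller of the two versions sorts first; if that is MINIMUM,
  # then CURRENT is equal to or newer than MINIMUM.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# installed_driver_version (hypothetical helper)
# Prints the driver version nvidia-smi reports for GPU 0.
installed_driver_version() {
  nvidia-smi -i 0 --query-gpu=driver_version --format=csv,noheader
}

# check_driver MINIMUM (hypothetical helper)
# Exit 0 if the driver is new enough, 1 if too old, 2 if nvidia-smi failed.
check_driver() {
  local current
  current="$(installed_driver_version)" || return 2
  if version_at_least "$current" "$1"; then
    echo "driver $current meets minimum $1"
  else
    echo "driver $current is older than required $1" >&2
    return 1
  fi
}
```

Usage, after sourcing the functions on the GPU host:
$ check_driver 525.60.13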
Remove Nvidia GPU driver on Ubuntu
$ sudo apt-get remove --purge '^nvidia-.*'
Then find the required driver with apt search (or download it from NVIDIA), install it, and test.
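On Ubuntu, ubuntu-drivers devices lists candidate driver packages and marks one of them as recommended. A minimal sketch for picking that package follows. The function name recommended_driver is hypothetical, and it assumes output lines shaped like "driver   : nvidia-driver-535 - distro non-free recommended". The recommended package is not necessarily the version the customer's CUDA application needs, so compare it against the required minimum before installing.

```shell
#!/usr/bin/env bash
# Sketch: print the package that `ubuntu-drivers devices` marks "recommended".
# Assumed input line format (read on stdin, so it can be tested without a GPU):
#   driver   : nvidia-driver-535 - distro non-free recommended
# Field 1 is "driver", field 2 is ":", field 3 is the package name.
recommended_driver() {
  awk '$1 == "driver" && /recommended/ { print $3; exit }'
}
```

Usage:
$ ubuntu-drivers devices | recommended_driver
$ sudo apt-get install "$(ubuntu-drivers devices | recommended_driver)"
$ sudo reboot
After the reboot, nvidia-smi should report the new driver version.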
Xid errors, the failure each one indicates, and its potential causes (marked X)[1]
XID | Nvidia GPU Failure | Linux Kernel message | HW Error | Driver Error | User App Error | System Memory Corruption | Bus Error | Thermal Issue | FB Corruption
---|---|---|---|---|---|---|---|---|---|
1 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
2 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
3 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
4 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
4 | GPU semaphore timeout | X | X | X | X | X | |||
5 | Unused | ||||||||
6 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
7 | Invalid or corrupted push buffer address | X | X | X | |||||
8 | GPU stopped processing | X | X | X | X | ||||
9 | Driver error programming GPU | X | |||||||
10 | Unused | ||||||||
11 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
12 | Driver error handling GPU exception | X | |||||||
13 | Graphics Engine Exception | X | X | X | X | X | X | ||
14 | Unused | ||||||||
15 | Unused | ||||||||
16 | Display engine hung | X | |||||||
17 | Unused | ||||||||
18 | Bus mastering disabled in PCI Config Space | X | |||||||
19 | Display Engine error | X | |||||||
20 | Invalid or corrupted Mpeg push buffer | X | X | X | X | ||||
21 | Invalid or corrupted Motion Estimation push buffer | X | X | X | X | ||||
22 | Invalid or corrupted Video Processor push buffer | X | X | X | X | ||||
23 | Unused | ||||||||
24 | GPU semaphore timeout | X | X | X | X | X | X | ||
25 | Invalid or illegal push buffer stream | X | X | X | X | X | |||
26 | Framebuffer timeout | X | |||||||
27 | Video processor exception | X | |||||||
28 | Video processor exception | X | |||||||
29 | Video processor exception | X | |||||||
30 | GPU semaphore access error | X | |||||||
31 | GPU memory page fault | X | X | ||||||
32 | Invalid or corrupted push buffer stream | X | X | X | X | X | |||
33 | Internal micro-controller error | X | |||||||
34 | Video processor exception | X | |||||||
35 | Video processor exception | X | |||||||
36 | Video processor exception | X | |||||||
37 | Driver firmware error | X | X | X | |||||
38 | Driver firmware error | X | |||||||
39 | Unused | ||||||||
40 | Unused | ||||||||
41 | Unused | ||||||||
42 | Video processor exception | X | |||||||
43 | GPU stopped processing | X | X | ||||||
44 | Graphics Engine fault during context switch | X | |||||||
45 | Preemptive cleanup, due to previous errors. Most likely seen when running multiple CUDA applications and hitting a double-bit ECC error (DBE) | Usually the kernel shows the following message before XID 45: sched: RT throttling activated | X | ||||||
46 | GPU stopped processing | X | |||||||
47 | Video processor exception | X | |||||||
48 | Double Bit ECC Error | X | |||||||
49 | Unused | ||||||||
50 | Unused | ||||||||
51 | Unused | ||||||||
52 | Unused | ||||||||
53 | Unused | ||||||||
54 | Auxiliary power is not connected to the GPU board | ||||||||
55 | Unused | ||||||||
56 | Display Engine error | X | X | ||||||
57 | Error programming video memory interface | X | X | X | |||||
58 | Unstable video memory interface detected | X | X | ||||||
58 | EDC error (clarified in printout) | X | |||||||
59 | Internal micro-controller error (older drivers) | X | |||||||
60 | Video processor exception | X | |||||||
61 | Internal micro-controller breakpoint/warning (newer drivers) | ||||||||
62 | Internal micro-controller halt (newer drivers) | X | X | X | |||||
63 | ECC page retirement or row remapping recording event | X | X | X | |||||
64 | ECC page retirement or row remapper recording failure | X | X | ||||||
65 | Video processor exception | X | X | ||||||
66 | Illegal access by driver | X | X | ||||||
67 | Illegal access by driver | X | X | ||||||
68 | NVDEC0 Exception | X | X | ||||||
69 | Graphics Engine class error | X | X | ||||||
70 | CE3: Unknown Error | X | X | ||||||
71 | CE4: Unknown Error | X | X | ||||||
72 | CE5: Unknown Error | X | X | ||||||
73 | NVENC2 Error | X | X | ||||||
74 | NVLINK Error | X | X | X | |||||
75 | CE6: Unknown Error | X | X | ||||||
76 | CE7: Unknown Error | X | X | ||||||
77 | CE8: Unknown Error | X | X | ||||||
78 | vGPU Start Error | X | |||||||
79 | GPU has fallen off the bus | X | X | X | X | X | |||
80 | Corrupted data sent to GPU | X | X | X | X | X | |||
81 | VGA Subsystem Error | X | |||||||
82 | NVJPG0 Error | X | X | ||||||
83 | NVDEC1 Error | X | X | ||||||
84 | NVDEC2 Error | X | X | ||||||
85 | CE9: Unknown Error | X | X | ||||||
86 | OFA Exception | X | X | ||||||
87 | Reserved | ||||||||
88 | NVDEC3 Error | X | X | ||||||
89 | NVDEC4 Error | X | X | ||||||
90 | Reserved | ||||||||
91 | Reserved | ||||||||
92 | High single-bit ECC error rate | X | X | ||||||
93 | Non-fatal violation of provisioned InfoROM wear limit | X | X | ||||||
94 | Contained ECC error | X | X | X | |||||
95 | Uncontained ECC error | X | X | X | |||||
96 | NVDEC5 Error | X | X | ||||||
97 | NVDEC6 Error | X | X | ||||||
98 | NVDEC7 Error | X | X | ||||||
99 | NVJPG1 Error | X | X | ||||||
100 | NVJPG2 Error | X | X | ||||||
101 | NVJPG3 Error | X | X | ||||||
102 | NVJPG4 Error | X | X | ||||||
103 | NVJPG5 Error | X | X | ||||||
104 | NVJPG6 Error | X | X | ||||||
105 | NVJPG7 Error | X | X | ||||||
106 | SMBPBI Test Message | X | |||||||
107 | SMBPBI Test Message Silent | X | |||||||
108-109 | Reserved | ||||||||
110 | Security Fault Error | X | |||||||
111 | Display Bundle Error Event | X | X | X | |||||
112 | Display Supervisor Error | X | X | ||||||
113 | DP Link Training Error | X | X | ||||||
114 | Display Pipeline Underflow Error | X | X | X | |||||
115 | Display Core Channel Error | X | X | ||||||
116 | Display Window Channel Error | X | X | ||||||
117 | Display Cursor Channel Error | X | X | ||||||
118 | Display Pixel Pipeline Error | X | X | ||||||
119 | GSP RPC Timeout | X | X | X | X | X | X | ||
120 | GSP Error | X | X | X | X | X | X | ||
121 | Reserved | ||||||||
122 | SPI PMU RPC Read Failure | X | X | ||||||
123 | SPI PMU RPC Write Failure | X | X | ||||||
124 | SPI PMU RPC Erase Failure | X | X | ||||||
125 | Inforom FS Failure | X | X |
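To find which Xid codes a machine has logged, the NVIDIA driver's kernel messages can be filtered and counted, and each code then looked up in the table above. This is a sketch. The function names xid_codes and xid_summary are hypothetical. The pattern assumes messages of the form "NVRM: Xid (PCI:0000:3b:00): 62, pid=..., ...", which may differ between driver versions.

```shell
#!/usr/bin/env bash
# Sketch: extract Xid codes from NVIDIA driver kernel messages.
# Assumed message format (may vary by driver version):
#   NVRM: Xid (PCI:0000:3b:00): 62, pid=..., ...
# Reads log text on stdin and prints one Xid code per matching line.
xid_codes() {
  sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p'
}

# Count how often each Xid code occurs, most frequent first.
xid_summary() {
  xid_codes | sort -n | uniq -c | sort -rn
}
```

Usage:
$ sudo dmesg | xid_summary
$ sudo journalctl -k | xid_summary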