Nvidia GPU Tips and Tricks
Xid errors along with the potential causes for each[1]
| XID | Nvidia GPU Failure | Linux Kernel message | HW Error | Driver Error | User App Error | System Memory Corruption | Bus Error | Thermal Issue | FB Corruption |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
| 2 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
| 3 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
| 4 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
| 4 | GPU semaphore timeout | X | X | X | X | X | ||||
| 5 | Unused | ||||||||
| 6 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
| 7 | Invalid or corrupted push buffer address | X | X | X | |||||
| 8 | GPU stopped processing | X | X | X | X | ||||
| 9 | Driver error programming GPU | X | |||||||
| 10 | Unused | ||||||||
| 11 | Invalid or corrupted push buffer stream | X | X | X | X | ||||
| 12 | Driver error handling GPU exception | X | |||||||
| 13 | Graphics Engine Exception | X | X | X | X | X | X | ||
| 14 | Unused | ||||||||
| 15 | Unused | ||||||||
| 16 | Display engine hung | X | |||||||
| 17 | Unused | ||||||||
| 18 | Bus mastering disabled in PCI Config Space | X | |||||||
| 19 | Display Engine error | X | |||||||
| 20 | Invalid or corrupted Mpeg push buffer | X | X | X | X | ||||
| 21 | Invalid or corrupted Motion Estimation push buffer | X | X | X | X | ||||
| 22 | Invalid or corrupted Video Processor push buffer | X | X | X | X | ||||
| 23 | Unused | ||||||||
| 24 | GPU semaphore timeout | X | X | X | X | X | X | ||
| 25 | Invalid or illegal push buffer stream | X | X | X | X | X | |||
| 26 | Framebuffer timeout | X | |||||||
| 27 | Video processor exception | X | |||||||
| 28 | Video processor exception | X | |||||||
| 29 | Video processor exception | X | |||||||
| 30 | GPU semaphore access error | X | |||||||
| 31 | GPU memory page fault | X | X | ||||||
| 32 | Invalid or corrupted push buffer stream | X | X | X | X | X | |||
| 33 | Internal micro-controller error | X | |||||||
| 34 | Video processor exception | X | |||||||
| 35 | Video processor exception | X | |||||||
| 36 | Video processor exception | X | |||||||
| 37 | Driver firmware error | X | X | X | |||||
| 38 | Driver firmware error | X | |||||||
| 39 | Unused | ||||||||
| 40 | Unused | ||||||||
| 41 | Unused | ||||||||
| 42 | Video processor exception | X | |||||||
| 43 | GPU stopped processing | X | X | ||||||
| 44 | Graphics Engine fault during context switch | X | |||||||
| 45 | Preemptive cleanup, due to previous errors. Most likely seen when running multiple CUDA applications and hitting a DBE (double-bit ECC error) | The kernel usually shows the following message before XID 45: `sched: RT throttling activated` | X | ||||||
| 46 | GPU stopped processing | X | |||||||
| 47 | Video processor exception | X | |||||||
| 48 | Double Bit ECC Error | X | |||||||
| 49 | Unused | ||||||||
| 50 | Unused | ||||||||
| 51 | Unused | ||||||||
| 52 | Unused | ||||||||
| 53 | Unused | ||||||||
| 54 | Auxiliary power is not connected to the GPU board | ||||||||
| 55 | Unused | ||||||||
| 56 | Display Engine error | X | X | ||||||
| 57 | Error programming video memory interface | X | X | X | |||||
| 58 | Unstable video memory interface detected | X | X | ||||||
| 58 | EDC error – clarified in printout | X | ||||||||
| 59 | Internal micro-controller error (older drivers) | X | |||||||
| 60 | Video processor exception | X | |||||||
| 61 | Internal micro-controller breakpoint/warning (newer drivers) | ||||||||
| 62 | Internal micro-controller halt (newer drivers) | X | X | X | |||||
| 63 | ECC page retirement or row remapping recording event | X | X | X | |||||
| 64 | ECC page retirement or row remapper recording failure | X | X | ||||||
| 65 | Video processor exception | X | X | ||||||
| 66 | Illegal access by driver | X | X | ||||||
| 67 | Illegal access by driver | X | X | ||||||
| 68 | NVDEC0 Exception | X | X | ||||||
| 69 | Graphics Engine class error | X | X | ||||||
| 70 | CE3: Unknown Error | X | X | ||||||
| 71 | CE4: Unknown Error | X | X | ||||||
| 72 | CE5: Unknown Error | X | X | ||||||
| 73 | NVENC2 Error | X | X | ||||||
| 74 | NVLINK Error | X | X | X | |||||
| 75 | CE6: Unknown Error | X | X | ||||||
| 76 | CE7: Unknown Error | X | X | ||||||
| 77 | CE8: Unknown Error | X | X | ||||||
| 78 | vGPU Start Error | X | |||||||
| 79 | GPU has fallen off the bus | X | X | X | X | X | |||
| 80 | Corrupted data sent to GPU | X | X | X | X | X | |||
| 81 | VGA Subsystem Error | X | |||||||
| 82 | NVJPG0 Error | X | X | ||||||
| 83 | NVDEC1 Error | X | X | ||||||
| 84 | NVDEC2 Error | X | X | ||||||
| 85 | CE9: Unknown Error | X | X | ||||||
| 86 | OFA Exception | X | X | ||||||
| 87 | Reserved | ||||||||
| 88 | NVDEC3 Error | X | X | ||||||
| 89 | NVDEC4 Error | X | X | ||||||
| 90 | Reserved | ||||||||
| 91 | Reserved | ||||||||
| 92 | High single-bit ECC error rate | X | X | ||||||
| 93 | Non-fatal violation of provisioned InfoROM wear limit | X | X | ||||||
| 94 | Contained ECC error | X | X | X | |||||
| 95 | Uncontained ECC error | X | X | X | |||||
| 96 | NVDEC5 Error | X | X | ||||||
| 97 | NVDEC6 Error | X | X | ||||||
| 98 | NVDEC7 Error | X | X | ||||||
| 99 | NVJPG1 Error | X | X | ||||||
| 100 | NVJPG2 Error | X | X | ||||||
| 101 | NVJPG3 Error | X | X | ||||||
| 102 | NVJPG4 Error | X | X | ||||||
| 103 | NVJPG5 Error | X | X | ||||||
| 104 | NVJPG6 Error | X | X | ||||||
| 105 | NVJPG7 Error | X | X | ||||||
| 106 | SMBPBI Test Message | X | |||||||
| 107 | SMBPBI Test Message Silent | X | |||||||
| 108-109 | Reserved | ||||||||
| 110 | Security Fault Error | X | |||||||
| 111 | Display Bundle Error Event | X | X | X | |||||
| 112 | Display Supervisor Error | X | X | ||||||
| 113 | DP Link Training Error | X | X | ||||||
| 114 | Display Pipeline Underflow Error | X | X | X | |||||
| 115 | Display Core Channel Error | X | X | ||||||
| 116 | Display Window Channel Error | X | X | ||||||
| 117 | Display Cursor Channel Error | X | X | ||||||
| 118 | Display Pixel Pipeline Error | X | X | ||||||
| 119 | GSP RPC Timeout | X | X | X | X | X | X | ||
| 120 | GSP Error | X | X | X | X | X | X | ||
| 121 | Reserved | ||||||||
| 122 | SPI PMU RPC Read Failure | X | X | ||||||
| 123 | SPI PMU RPC Write Failure | X | X | ||||||
| 124 | SPI PMU RPC Erase Failure | X | X | ||||||
| 125 | Inforom FS Failure | X | X | ||||||
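
To use the table when triaging a node, it can help to pull the Xid codes out of the kernel log and look them up. The following is a minimal sketch, not an official tool. It assumes the driver logs lines shaped like `NVRM: Xid (PCI:<bus id>): <code>, ...`; the sample log line, the `find_xids` helper, and the `XID_DESCRIPTIONS` excerpt are illustrative names introduced here, and the dictionary copies only a few rows from the table above.

```python
import re

# Small excerpt of the table above (assumption: extend it with the rows you care about).
XID_DESCRIPTIONS = {
    13: "Graphics Engine Exception",
    31: "GPU memory page fault",
    48: "Double Bit ECC Error",
    79: "GPU has fallen off the bus",
}

# Assumed log shape: "NVRM: Xid (PCI:<bus id>): <code>, ..."
XID_PATTERN = re.compile(r"NVRM: Xid \((?P<device>[^)]+)\): (?P<code>\d+)")


def find_xids(log_lines):
    """Yield (device, code, description) for each line that reports an Xid."""
    for line in log_lines:
        match = XID_PATTERN.search(line)
        if match is None:
            continue  # not an Xid report
        code = int(match.group("code"))
        description = XID_DESCRIPTIONS.get(code, "not in excerpt")
        yield match.group("device"), code, description


if __name__ == "__main__":
    # Illustrative lines only; real input would come from dmesg or the journal.
    sample = [
        "[1234.5] NVRM: Xid (PCI:0000:3b:00): 79, pid=4321, GPU has fallen off the bus.",
        "[1240.0] usb 1-1: new high-speed USB device",
    ]
    for device, code, description in find_xids(sample):
        print(f"{device}: Xid {code} ({description})")
```

Codes missing from the excerpt are reported as "not in excerpt" rather than dropped, so an unfamiliar Xid still shows up during triage.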