AER (Advanced Error Reporting)
Revision as of 17:57, 2 April 2023
NVMe AER Issues
There are many community reports of AER errors at boot on various systems,[1] with the following or a similar error:
Bad TLP associated with device xxxx:xx:x
In these cases, this means that something goes wrong when the PCIe controller uses memory-mapped access (see pci=nommconf below) to reach the configuration space of a particular device.
It may be a hardware bug in the device, in the PCIe root controller on the motherboard, in the interaction between the two, or something else.
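To see how often the error quoted above occurs and which device reports it, the kernel log (for example the output of dmesg or journalctl -k) can be scanned for Bad TLP lines. The sketch below is a minimal, hypothetical helper (count_bad_tlp is not part of any tool). It assumes each relevant line contains "Bad TLP" or "BadTLP" together with a PCI address in domain:bus:device.function form. The exact wording of AER messages varies between kernel versions, so the sample lines are illustrative only.

```python
import re
from collections import Counter

# PCI address in domain:bus:device.function form, e.g. 0000:00:1c.5
PCI_ADDR = re.compile(r"\b([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\b")
# Assumption: reports spell the error as "Bad TLP" or "BadTLP" (any case).
BAD_TLP = re.compile(r"bad\s?tlp", re.IGNORECASE)


def count_bad_tlp(log_text: str) -> Counter:
    """Count log lines mentioning a Bad TLP, keyed by the first PCI address on the line."""
    counts = Counter()
    for line in log_text.splitlines():
        if BAD_TLP.search(line):
            match = PCI_ADDR.search(line)
            if match:
                counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log excerpt; real messages differ between kernel versions.
    sample = (
        "pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5\n"
        "pcieport 0000:00:1c.5:    [ 6] BadTLP\n"
        "pcieport 0000:00:1c.5:    [ 6] BadTLP\n"
        "nvme 0000:02:00.0:    [ 6] BadTLP\n"
    )
    print(dict(count_bad_tlp(sample)))  # {'0000:00:1c.5': 2, '0000:02:00.0': 1}
```

A device that shows up with a steadily growing count is the one worth investigating first.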
Kernel param, pci=noaer
The pci=noaer parameter stops the kernel from using PCIe Advanced Error Reporting, so these errors are no longer reported. Without it, each error report is written to the kernel log, and each one raises an interrupt request (IRQ) that the CPU has to service. A rapid stream of error reports can therefore flood the log on the drive and consume NVMe bandwidth, slowing or even halting boot.
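Since pci=noaer suppresses the reports rather than fixing whatever triggers them, it helps to confirm which pci= options the running kernel was actually booted with; /proc/cmdline holds that command line. The helpers below are hypothetical. They split the command line on whitespace (an assumption: quoted arguments containing spaces are not handled) and collect the options from every pci= argument, which takes a comma-separated list such as pci=noaer,nommconf.

```python
from __future__ import annotations


def pci_options(cmdline: str) -> set[str]:
    """Return every option passed through pci=... on a kernel command line."""
    options: set[str] = set()
    for token in cmdline.split():  # naive whitespace split (assumption)
        if token.startswith("pci="):
            # pci= accepts a comma-separated list, e.g. pci=noaer,nommconf
            options.update(opt for opt in token[len("pci="):].split(",") if opt)
    return options


def aer_reporting_disabled(cmdline: str) -> bool:
    """True if the command line contains the noaer option."""
    return "noaer" in pci_options(cmdline)


if __name__ == "__main__":
    # Hypothetical command line; on a live system, use open("/proc/cmdline").read()
    example = "BOOT_IMAGE=/boot/vmlinuz ro quiet splash pci=noaer"
    print(aer_reporting_disabled(example))  # True
```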
When the AER message names the device as nvme 0000:xx:xx.x, the error comes from the NVMe M.2 drive's connection to the PCIe bus.
So the NVMe drive itself may be healthy, but there could be a problem elsewhere in the PCIe subsystem.
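The address in such a message uses domain:bus:device.function notation (the xx digits above are placeholders). The hypothetical parse_pci_address below splits a concrete address into numeric fields, which can help when matching a message against the device list from lspci -D (the -D flag always prints the domain). Device numbers are 5 bits (0x00 to 0x1f) and function numbers 3 bits (0 to 7).

```python
import re
from typing import NamedTuple


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int


# dddd:bb:dd.f in hex; device 00-1f, function 0-7
_BDF = re.compile(r"^([0-9a-f]{4}):([0-9a-f]{2}):([01][0-9a-f])\.([0-7])$", re.IGNORECASE)


def parse_pci_address(text: str) -> PciAddress:
    """Parse a PCI address such as '0000:02:00.0' into its numeric fields."""
    match = _BDF.match(text.strip())
    if match is None:
        raise ValueError(f"not a PCI address: {text!r}")
    domain, bus, device, function = (int(group, 16) for group in match.groups())
    return PciAddress(domain, bus, device, function)


if __name__ == "__main__":
    print(parse_pci_address("0000:02:00.0"))
    # PciAddress(domain=0, bus=2, device=0, function=0)
```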
Kernel param, pci=nommconf[2]
The kernel option pci=nommconf disables memory-mapped PCI configuration space access, which Linux has supported since kernel 2.6. Very roughly, every PCI device has an area that describes the device (which you can view with lspci -vv). The original method of accessing this area goes through I/O ports, while PCIe allows this space to be mapped into memory for simpler access.
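The two access methods address a configuration register differently. The sketch below assumes x86 configuration mechanism #1 for the I/O-port path (an address written to port 0xCF8, data transferred through port 0xCFC) and the PCIe ECAM layout for the memory-mapped path. The ECAM base address in the example is a made-up value; on a real system it comes from the ACPI MCFG table. The register offset field is the key difference: it has 8 bits in the legacy address and 12 bits in ECAM, so only memory-mapped access reaches beyond the first 256 bytes.

```python
def _check_bdf(bus: int, device: int, function: int) -> None:
    # Bus is 8 bits, device 5 bits, function 3 bits in both schemes.
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("bus/device/function out of range")


def legacy_config_address(bus: int, device: int, function: int, offset: int) -> int:
    """CONFIG_ADDRESS value written to I/O port 0xCF8 (configuration mechanism #1)."""
    _check_bdf(bus, device, function)
    if not 0 <= offset <= 0xFF:
        raise ValueError("I/O-port access only reaches offsets 0x00-0xFF")
    # Bit 31 enables the cycle. The low two offset bits are dropped because a
    # 32-bit register is selected; individual bytes go through ports 0xCFC-0xCFF.
    return (1 << 31) | (bus << 16) | (device << 11) | (function << 8) | (offset & 0xFC)


def ecam_address(base: int, bus: int, device: int, function: int, offset: int) -> int:
    """Physical address of a configuration register under memory-mapped (ECAM) access."""
    _check_bdf(bus, device, function)
    if not 0 <= offset <= 0xFFF:
        raise ValueError("ECAM covers offsets 0x000-0xFFF per function")
    # Each function gets its own 4 KiB window of configuration space.
    return base + (bus << 20) + (device << 15) + (function << 12) + offset


if __name__ == "__main__":
    HYPOTHETICAL_ECAM_BASE = 0xE0000000  # made-up; real value comes from ACPI MCFG
    print(hex(legacy_config_address(0x02, 0x00, 0, 0x10)))  # 0x80020010
    print(hex(ecam_address(HYPOTHETICAL_ECAM_BASE, 0x02, 0x00, 0, 0x10)))  # 0xe0200010
```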
With pci=nommconf, the configuration space of every device is accessed the original way, through I/O ports, and this change of access method works around the AER problem.
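One plausible reason the workaround helps: the I/O-port method can only address the first 256 bytes of a device's configuration space, while AER is a PCIe extended capability located in the region starting at offset 0x100. If the kernel cannot reach that region, it likely cannot find or configure the AER registers at all. The hypothetical find_ext_capability below walks the extended capability list of a configuration-space dump using the PCIe header layout (capability ID in bits 15:0, version in bits 19:16, next offset in bits 31:20). A dump could come from the sysfs file /sys/bus/pci/devices/<address>/config, though unprivileged reads of that file return only the first 64 bytes.

```python
import struct
from typing import Optional

PCI_EXT_CAP_START = 0x100    # extended capabilities begin after the first 256 bytes
PCI_EXT_CAP_ID_AER = 0x0001  # Advanced Error Reporting


def find_ext_capability(config: bytes, cap_id: int) -> Optional[int]:
    """Return the offset of an extended capability in a config-space dump, or None."""
    if len(config) <= PCI_EXT_CAP_START:
        # Only the first 256 bytes are visible: extended capabilities are unreachable.
        return None
    offset = PCI_EXT_CAP_START
    seen = set()
    # Stop on a zero/invalid next pointer, a loop in the list, or the end of the dump.
    while offset >= PCI_EXT_CAP_START and offset not in seen and offset + 4 <= len(config):
        seen.add(offset)
        (header,) = struct.unpack_from("<I", config, offset)  # little-endian dword
        if header in (0, 0xFFFFFFFF):
            return None  # no extended capabilities present
        if header & 0xFFFF == cap_id:
            return offset
        offset = (header >> 20) & 0xFFC  # next pointer, dword aligned
    return None


if __name__ == "__main__":
    # Hypothetical 4 KiB dump: a capability with ID 0x000B at 0x100, then AER at 0x148.
    cfg = bytearray(4096)
    struct.pack_into("<I", cfg, 0x100, 0x000B | (1 << 16) | (0x148 << 20))
    struct.pack_into("<I", cfg, 0x148, PCI_EXT_CAP_ID_AER | (2 << 16))
    print(hex(find_ext_capability(bytes(cfg), PCI_EXT_CAP_ID_AER)))  # 0x148
    print(find_ext_capability(bytes(cfg[:256]), PCI_EXT_CAP_ID_AER))  # None
```

The second call mimics a device whose configuration space is only reachable through the 256-byte window: the same AER capability is simply invisible.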