Intermittent PCIe C5 link when using BSP 32.6.1
Hello,
We are using a Jetson AGX Xavier on a custom carrier board with an embedded FPGA that communicates with the Xavier over PCIe C5. The link is a Gen 2 x4 link. We consider the hardware verified, because this PCIe link comes up consistently and the FPGA works when using BSP 32.5.1. However, when we upgrade to 32.6.1, which is the current production version in our application, the PCIe link comes up only intermittently.
This custom carrier also routes the M.2 slot to PCIe C0 on the Xavier, in the same way that the dev kit does. When we put an NVMe drive into this M.2 slot, both the NVMe and the FPGA PCIe links come up consistently. But if the NVMe drive is missing, the C5 link to the FPGA comes up on only ~25% of boots.
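To put a number on "comes up intermittently", a check along these lines could be run after each boot. It is only a sketch: it assumes `lspci -vv` is available and run as root (so the link capability registers are readable), and the function names are made up for illustration. It looks for a `LnkSta` line reporting 5GT/s (Gen 2) at width x4.

```python
import re
import subprocess

# Matches both older and newer pciutils formats, e.g.
#   "LnkSta: Speed 5GT/s, Width x4, TrErr- ..."
#   "LnkSta: Speed 5GT/s (ok), Width x4 (ok)"
LNKSTA_RE = re.compile(r"LnkSta:\s*Speed\s+([\d.]+GT/s)[^,]*,\s*Width\s+(x\d+)")


def parse_link_status(lspci_text):
    """Return a list of (speed, width) tuples, one per LnkSta line found."""
    return LNKSTA_RE.findall(lspci_text)


def link_is_gen2_x4(lspci_text):
    """True if any device reports a negotiated Gen 2 (5GT/s) x4 link."""
    return ("5GT/s", "x4") in parse_link_status(lspci_text)


def read_lspci():
    """Capture verbose lspci output; needs root to show LnkSta."""
    result = subprocess.run(
        ["lspci", "-vv"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `link_is_gen2_x4(read_lspci())` from a boot-time script and logging the result would give a per-boot success rate. Note that this matches any Gen 2 x4 link in the system, so it would need narrowing to the FPGA's bus address if another such link is present.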
We used the schematics for the dev kit as a reference for routing these PCIe links. Accordingly, M.2 connector pins PERSTN, CLKREQN, ALERT, and WAKEN are routed to Xavier connector pins D10, E11, A55, and A7, respectively. M.2 ALERT and WAKEN are pulled up to 1.8V with 100k resistors. The FPGA does not make use of PERSTN, CLKREQN, or WAKEN. Instead, there are traces connecting these signals to Xavier pins K56, C55, and G56. We are not sure why these need to be connected to GPIO lines, but they are connected that way on the dev kit carrier board, so we carried that design over. We can cut these traces or tie them to gnd/1.8V/3.3V if needed. This connection info is for your reference - again, the hardware design has been proven with a Xavier running an older BSP.
We make use of lane polarity inversion in our design.
Things we have tried so far:
- Echoing to bind and unbind in /sys/bus/platform/drivers/tegra-pcie-dw/, based on this topic
- Removing nvidia,enable-power-down in the device tree
- Removing nvidia,plat-gpios in the device tree
None of these changes seem to make the PCIe link come up consistently.
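For reference, the unbind/bind step can be sketched as below. This is a minimal sketch, not necessarily the exact sequence from the referenced topic: the device name `141a0000.pcie` is an assumption based on the usual C5 controller address on Xavier and should be checked against the entries actually present in the driver directory, and `rebind` and `settle_s` are illustrative names.

```python
import time
from pathlib import Path

DRIVER_DIR = Path("/sys/bus/platform/drivers/tegra-pcie-dw")
# Assumption: platform device name of the C5 controller; verify with
# `ls /sys/bus/platform/drivers/tegra-pcie-dw/` on the target.
C5_DEVICE = "141a0000.pcie"


def rebind(device=C5_DEVICE, driver_dir=DRIVER_DIR, settle_s=1.0):
    """Unbind the controller if it is currently bound, then bind it again."""
    driver_dir = Path(driver_dir)
    # A bound device shows up as an entry inside the driver directory.
    if (driver_dir / device).exists():
        (driver_dir / "unbind").write_text(device)
        time.sleep(settle_s)  # give the driver time to tear the link down
    # Writing the device name to "bind" re-runs the driver probe.
    (driver_dir / "bind").write_text(device)
```

Running `rebind()` as root is equivalent to echoing the device name into `unbind` and then `bind`. If the link failed at boot, the controller may not be bound at all, which is why the unbind is skipped when the device entry is missing.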
What changed between 32.5.1 and 32.6.1? What changes to the device tree or kernel can we try to get the device to come up again consistently?
Thank you for your time,
Anya