PCIe DMA read from NX memory is slow
I have an Intel-based PC connected to the x4 PCIe slot of the Orin NX system through two PCIe Gen3 switch cards. The PCIe switch on the PC is configured with an NT (non-transparent) endpoint, which gives the PC an interface to access NX memory. I have a simple driver on the NX that allocates memory using dma_alloc_coherent(). The software on the PC DMAs to/from the allocated buffers on the NX through the NT bridge’s memory window (BAR).
I have finally got both PCIe DMA reads and writes working, so the PC can read from and write to NX RAM, but for some reason the DMA read speeds are much slower than the write speeds.
For example, for DMA writes to NX RAM I’m getting around 3620 MB/s, while DMA read from NX RAM is only 1450 MB/s. These rates are after running jetson_clocks on NX, which helped to improve performance slightly. I have also tried setting the nvpmodel to 0 (MAXN), but that doesn’t seem to help performance much. I’d expect PCIe reads to be slower than writes, but not less than half of the write rates.
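To put these rates in context, here is a rough sketch that compares them with the raw bandwidth of a Gen3 x4 link. It assumes 8 GT/s per lane with 128b/130b encoding and ignores TLP/DLLP overhead, so the link figure is an upper bound rather than an achievable rate. The function names are only illustrative.

```python
# Raw one-direction bandwidth of a PCIe Gen3 link, before packet overhead.
GEN3_TRANSFERS_PER_S = 8.0e9   # 8 GT/s per lane
ENCODING = 128 / 130           # 128b/130b line encoding
LANES = 4                      # x4 slot on the Orin NX


def link_bandwidth_mb_s(lanes=LANES):
    """Raw bandwidth in MB/s (10^6 bytes/s), ignoring TLP/DLLP overhead."""
    return GEN3_TRANSFERS_PER_S * ENCODING * lanes / 8 / 1e6


def percent_of(value, reference):
    """Express value as a percentage of reference."""
    return 100.0 * value / reference


if __name__ == "__main__":
    raw = link_bandwidth_mb_s()
    print(f"Gen3 x4 raw link bandwidth: {raw:.0f} MB/s")
    # Rates measured in this post, in MB/s.
    for label, rate in [("write to NX", 3620), ("read from NX", 1450),
                        ("read, PC to PC", 3200)]:
        print(f"{label}: {percent_of(rate, raw):.0f}% of raw link, "
              f"{percent_of(rate, 3620):.0f}% of write rate")
```

By this estimate the writes are already close to the link limit, while reads from the NX reach only about 40% of the write rate, compared with about 88% in the PC-to-PC case.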
When I run the same test with two PCs instead of the Orin NX, the DMA read rate is much closer to the DMA write rate. The DMA writes with two PCs were 3620 MB/s and DMA reads were around 3200 MB/s.
Why are PCIe DMA reads so much slower from an Orin NX board? Are there any power management capabilities that might be bottlenecking PCIe DMA reads from NX RAM? Is there anything I can adjust in PCIe configuration space or in the device tree that might help improve a PCIe device’s DMA read performance from NX memory?
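One thing that may be worth comparing between the NX setup and the PC-to-PC setup is the Max_Payload_Size and Max_Read_Request_Size in each device's PCIe Device Control register (lspci -vv reports them on the DevCtl line). The sketch below decodes those two fields from a raw 16-bit register value, and also estimates how many bytes of read requests must be outstanding to sustain a given rate at a given round-trip latency. The sample register value and the latency figures are made up for illustration, not measured on this system, and the helper names are hypothetical.

```python
def decode_devctl(devctl):
    """Decode (Max_Payload_Size, Max_Read_Request_Size) in bytes from a
    16-bit PCIe Device Control register value.

    Bits 7:5 hold Max_Payload_Size and bits 14:12 hold
    Max_Read_Request_Size; both encode the size as 128 << field.
    Field values 6 and 7 are reserved by the spec and are not checked here.
    """
    max_payload = 128 << ((devctl >> 5) & 0x7)
    max_read_req = 128 << ((devctl >> 12) & 0x7)
    return max_payload, max_read_req


def bytes_in_flight_needed(rate_mb_s, round_trip_us):
    """Little's law: bytes that must be outstanding to sustain rate_mb_s
    (10^6 bytes/s) when each read takes round_trip_us to complete."""
    return rate_mb_s * 1e6 * round_trip_us * 1e-6


if __name__ == "__main__":
    # Hypothetical register value, for illustration only.
    mps, mrrs = decode_devctl(0x2810)
    print(f"MaxPayload {mps} bytes, MaxReadReq {mrrs} bytes")

    # Illustrative latencies only; the real round-trip time is unknown.
    for latency_us in (1.0, 2.0):
        need = bytes_in_flight_needed(3200, latency_us)
        print(f"{latency_us} us round trip: ~{need:.0f} bytes in flight "
              f"(~{need / mrrs:.1f} requests of {mrrs} bytes) for 3200 MB/s")
```

Since reads are bounded by outstanding requests times request size divided by round-trip latency, a longer read latency through the NX memory path or a smaller effective request size could each account for part of the gap, though that is only a guess until it is measured.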