Cbb-fabric TIMEOUT_ERR at shutdown
Mar 04, 2024
Environment
L4T 35.4.1
Orin NX based CustomBoard
Since updating from R35.3.1 to R35.4.1, the following error occurs during shutdown:
# shutdown -h now
Session terminated, killing shell...[ 298.034503] CPU:0, Error: cbb-fabric@0x13a00000, irq=25
[ 298.039895] **************************************
[ 298.044829] CPU:0, Error:cbb-fabric, Errmon:2
[ 298.049319] Error Code : TIMEOUT_ERR
[ 298.053360] Overflow : Multiple TIMEOUT_ERR
[ 298.058032]
[ 298.059562] Error Code : TIMEOUT_ERR
[ 298.063602] MASTER_ID : CCPLEX
[ 298.067100] Address : 0x2a080082
[ 298.070780] Cache : 0x1 -- Bufferable
[ 298.075086] Protection : 0x2 -- Unprivileged, Non-Secure, Data Access
[ 298.082075] Access_Type : Read
[ 298.085567] Access_ID : 0x17
[ 298.085569] Fabric : cbb-fabric
[ 298.092467] Slave_Id : 0x14
[ 298.095692] Burst_length : 0x0
[ 298.099188] Burst_type : 0x1
[ 298.102507] Beat_size : 0x1
[ 298.105734] VQC : 0x0
[ 298.108521] GRPSEC : 0x7e
[ 298.111575] FALCONSEC : 0x0
[ 298.114811] **************************************
[ 298.120029] WARNING: CPU: 0 PID: 101 at drivers/soc/tegra/cbb/tegra234-cbb.c:578 tegra234_cbb_isr+0x134/0x180
[ 298.130628] ---[ end trace e109e362407d590f ]---
[ 298.135446] CPU:0, Error: cbb-fabric@0x13a00000, irq=25
[ 298.140810] **************************************
[ 298.145735] CPU:0, Error:cbb-fabric, Errmon:2
[ 298.150216] Error Code : TIMEOUT_ERR
[ 298.154241] Overflow : Multiple TIMEOUT_ERR
[ 298.158905]
[ 298.160435] Error Code : TIMEOUT_ERR
[ 298.164469] MASTER_ID : CCPLEX
[ 298.167963] Address : 0x2a080082
[ 298.171635] Cache : 0x1 -- Bufferable
[ 298.175930] Protection : 0x2 -- Unprivileged, Non-Secure, Data Access
[ 298.182914] Access_Type : Read
[ 298.186405] Access_ID : 0x14
[ 298.186407] Fabric : cbb-fabric
[ 298.193308] Slave_Id : 0x14
[ 298.196529] Burst_length : 0x0
[ 298.200019] Burst_type : 0x1
[ 298.203335] Beat_size : 0x1
[ 298.206560] VQC : 0x0
[ 298.209343] GRPSEC : 0x7e
[ 298.212395] FALCONSEC : 0x0
[ 298.215626] **************************************
[ 298.220774] WARNING: CPU: 0 PID: 101 at drivers/soc/tegra/cbb/tegra234-cbb.c:578 tegra234_cbb_isr+0x134/0x180
[ 298.231278] ---[ end trace e109e362407d5910 ]---
[ 298.669155] nvgpu: 17000000.ga10b ga10b_intr_log_pending_intrs:306 [ERR] Pending TOP[0]: 0x00000004, LEAF[4]: 0x11000000
[ 298.682004] arm-smmu 8000000.iommu: disabling translation
[ 298.687734] arm-smmu 10000000.iommu: disabling translation
[ 298.693435] arm-smmu 12000000.iommu: disabling translation
[ 298.729137] CPU1: shutdown
[ 298.749035] CPU2: shutdown
[ 298.768747] CPU3: shutdown
[ 298.773198] reboot: Power down
Shutdown state requested 0
Shutting down system
Address 0x2a080082 falls within the PCIE_C8_32BIT aperture in the following excerpt of the address map:
PCIE_C7_32BIT_DMA   0x28040000  0x2807ffff  SYSTEM
PCIE_C8_32BIT       0x2a000000  0x2bffffff  SYSTEM
PCIE_C8_32BIT_EP    0x2a000000  0x2a001fff  SYSTEM_CFG.PCIE_C8_CTL.PCIE_RP_APPL_DM_TYPE_0.DEVICE_TYPE.END_POINT
PCIE_C8_32BIT_RP    0x2a000000  0x2a001fff  SYSTEM_CFG.PCIE_C8_CTL.PCIE_RP_APPL_DM_TYPE_0.DEVICE_TYPE.ROOT_PORT
PCIE_C8_32BIT_DMA   0x2a040000  0x2a07ffff  SYSTEM
PCIE_C9_32BIT       0x2c000000  0x2dffffff  SYSTEM
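To double-check which aperture the faulting address belongs to, the ranges from the excerpt above can be tested directly. This is a minimal Python sketch. The table is transcribed by hand, and the helper name `apertures_containing` is chosen only for illustration.

```python
# Apertures transcribed from the address-map excerpt above: (name, start, end), inclusive.
APERTURES = [
    ("PCIE_C7_32BIT_DMA", 0x28040000, 0x2807FFFF),
    ("PCIE_C8_32BIT", 0x2A000000, 0x2BFFFFFF),
    ("PCIE_C8_32BIT_EP", 0x2A000000, 0x2A001FFF),
    ("PCIE_C8_32BIT_RP", 0x2A000000, 0x2A001FFF),
    ("PCIE_C8_32BIT_DMA", 0x2A040000, 0x2A07FFFF),
    ("PCIE_C9_32BIT", 0x2C000000, 0x2DFFFFFF),
]


def apertures_containing(addr):
    """Return the names of every aperture whose inclusive range contains addr."""
    return [name for name, start, end in APERTURES if start <= addr <= end]


print(apertures_containing(0x2A080082))  # → ['PCIE_C8_32BIT']
```

The address lies just past the end of PCIE_C8_32BIT_DMA (0x2a07ffff), so only the general 32-bit C8 aperture matches.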
The problem therefore appears to be related to the PCIE_C8 controller.
# lspci
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 229e (rev a1)
0001:01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
0004:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
0004:01:00.0 Non-Volatile memory controller: Device 1bc0:1002 (rev 01)
0007:00:00.0 PCI bridge: NVIDIA Corporation Device 229a (rev a1)
0007:01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
0008:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
0008:01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
0009:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
0009:01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
To confirm this, we disabled the PCIE_C8 controller in the DTB, and the error no longer occurred:
pcie@140a0000 { /* C8 */
	status = "disabled";
};
The following commit appears to be a related kernel change between R35.3.1 and R35.4.1:
[drivers/pci/controller/dwc/pcie-tegra194.c]
commit d8913655d85710f9987bb47885f9fb2c14ccb12e
Author: Manikanta Maddireddy
AuthorDate: Thu Feb 2 15:31:37 2023 +0530
Revert "PCI: tegra194: Reduce AXI slave timeout value"
This reverts commit 492afa913684df8f212b44e16b819afc57b2818d.
CBB timeout for PCIe is reverted back to default value, 64 msec.
So, there is no need for PCIe completion timeout to be low(<10msec).
Revert the PCIe completion timeout back to default value.
Bug 4017244
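The commit message ties the PCIe completion timeout to the 64 msec CBB timeout. This sketch assumes the intent is that a completion timeout should expire before the CBB timeout, so that an unresponsive endpoint is reported by PCIe rather than as a CBB TIMEOUT_ERR. On that assumption, the check can be written against the Completion Timeout Value encodings of the PCIe Device Control 2 register. The ranges below are transcribed from the PCIe Base Specification. The helper `fires_before_cbb` is hypothetical and is not how the tegra194 driver computes anything.

```python
# PCIe Device Control 2 "Completion Timeout Value" encodings mapped to
# their (min, max) timeout ranges in milliseconds.
COMPLETION_TIMEOUT_RANGES_MS = {
    0b0000: (0.050, 50.0),        # default range: 50 us to 50 ms
    0b0001: (0.050, 0.100),       # range A: 50 us to 100 us
    0b0010: (1.0, 10.0),          # range A: 1 ms to 10 ms
    0b0101: (16.0, 55.0),         # range B: 16 ms to 55 ms
    0b0110: (65.0, 210.0),        # range B: 65 ms to 210 ms
    0b1001: (260.0, 900.0),       # range C: 260 ms to 900 ms
    0b1010: (1000.0, 3500.0),     # range C: 1 s to 3.5 s
    0b1101: (4000.0, 13000.0),    # range D: 4 s to 13 s
    0b1110: (17000.0, 64000.0),   # range D: 17 s to 64 s
}

CBB_TIMEOUT_MS = 64.0  # default CBB timeout for PCIe, per the commit message above


def fires_before_cbb(encoding, cbb_timeout_ms=CBB_TIMEOUT_MS):
    """True if the worst-case completion timeout for encoding is shorter than the CBB timeout."""
    _, worst_case_ms = COMPLETION_TIMEOUT_RANGES_MS[encoding]
    return worst_case_ms < cbb_timeout_ms


print([e for e in sorted(COMPLETION_TIMEOUT_RANGES_MS) if fires_before_cbb(e)])  # → [0, 1, 2, 5]
```

With the 64 msec value, the 0000b default range (50 us to 50 ms) already passes. This is consistent with the commit's statement that a completion timeout below 10 msec is no longer needed.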
However, reverting this commit did not resolve the problem.
Could you please let me know if there is a workaround for this TIMEOUT error? Thanks.
20230824_shutdown_cbb_timeout_verbose_log.txt (117.1 KB)