Problems with nvidia-driver-470-server on Ubuntu 20.04 with GeForce RTX 2060
I’m using Ubuntu 20.04. I had the “nvidia-driver-470-server” driver installed, but TensorFlow did not recognize the GPU, even though “nvidia-smi” did. So I upgraded to “nvidia-driver-535-server”, and TensorFlow now recognizes the GPU, but I have some new errors/issues.
1- nvidia-settings still shows version 470, and I had to reinstall it manually. It seems to have been removed when I changed to the new driver.
❯ nvidia-settings --version
nvidia-settings: version 470.57.01
The NVIDIA X Server Settings tool.
This program is used to configure the NVIDIA Linux graphics driver.
For more detail, please see the nvidia-settings(1) man page.
2- Whenever I run a GPU-related operation in TensorFlow, I get the message below. The URL asks me to report it to Nvidia.
2023-08-11 21:49:12.192557: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
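The message above refers to the per-device `numa_node` attribute documented at the linked sysfs ABI page. As a minimal sketch (not TensorFlow's own code), the same value can be read directly from sysfs to confirm what the kernel reports. The function name `read_numa_nodes` and the `sysfs_root` parameter are hypothetical, introduced only for this illustration; `0x10de` is NVIDIA's PCI vendor ID.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID assigned to NVIDIA


def read_numa_nodes(sysfs_root="/sys/bus/pci/devices", vendor=NVIDIA_VENDOR_ID):
    """Return {pci_address: numa_node} for every PCI device of the given vendor.

    Hypothetical helper for illustration; it only reads the sysfs attributes
    'vendor' and 'numa_node' and does not change anything on the system.
    """
    nodes = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return nodes
    for dev in sorted(root.iterdir()):
        vendor_file = dev / "vendor"
        numa_file = dev / "numa_node"
        # Skip entries that do not expose both attributes.
        if not (vendor_file.is_file() and numa_file.is_file()):
            continue
        if vendor_file.read_text().strip().lower() != vendor:
            continue
        nodes[dev.name] = int(numa_file.read_text().strip())
    return nodes


if __name__ == "__main__":
    for address, node in read_numa_nodes().items():
        note = "no NUMA affinity reported" if node < 0 else "ok"
        print(f"{address}: numa_node={node} ({note})")
```

A value of -1 here matches the "negative value (-1)" in the log line, which TensorFlow prints at the informational ("I") level and then handles by assuming node zero.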
3- Now when I shut down my PC I get these messages, which won’t let me shut down:
[ 917.213263] alx 0000:07:00.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
[ 917.213269] alx 0000:07:00.0: AER: device [1969:e0b1] error status/mask=00000080/00002000
[ 917.213273] alx 0000:07:00.0: AER: [ 7] BadDLLP
[ 917.218152] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:07:00.0
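To see which device these AER messages belong to, the log lines can be grouped by driver name and PCI address. The following is a rough sketch under the assumption that the lines keep the format shown above; the regular expression and the helper `summarize_aer` are hypothetical and written only for this example. In the sample, the `alx` prefix names the kernel driver bound to device 0000:07:00.0, and the severity is "Corrected".

```python
import re
from collections import Counter

# Matches the first line of an AER report as it appears in the log above.
AER_LINE = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+(?P<driver>\S+)\s+"
    r"(?P<address>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"AER: PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<type>[^,]+)"
)


def summarize_aer(lines):
    """Count AER reports per (driver, PCI address, severity, error type)."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.match(line.strip())
        if match:
            key = (match["driver"], match["address"],
                   match["severity"], match["type"])
            counts[key] += 1
    return counts


# The four lines quoted in this post.
SAMPLE = [
    "[ 917.213263] alx 0000:07:00.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)",
    "[ 917.213269] alx 0000:07:00.0: AER: device [1969:e0b1] error status/mask=00000080/00002000",
    "[ 917.213273] alx 0000:07:00.0: AER: [ 7] BadDLLP",
    "[ 917.218152] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:07:00.0",
]

if __name__ == "__main__":
    for (driver, address, severity, kind), n in summarize_aer(SAMPLE).items():
        print(f"{driver} {address}: {n} x {severity} / {kind}")
```

Feeding it the full kernel log (for example, the output of `dmesg` saved to a file and split into lines) would show whether any of the reports involve the GPU's PCI address or only 0000:07:00.0.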
(I didn’t use the NVIDIA installer, just Ubuntu’s “Additional Drivers” tool, but I did try the installer when nothing else was working, so I included its logs. I don’t think it got far enough to cause any issues.)
nvidia-installer.log (1.2 KB)
nvidia-bug-report.log.gz (384.8 KB)
Thank you in advance.