
Ubuntu kernel updates and unstable Nvidia drivers

Notice: this page is an informal log of debugging and troubleshooting for problems with Nvidia drivers and kernel updates. Nvidia provides Linux/Ubuntu drivers out of the box. Installing them is usually as easy as running

`sudo ubuntu-drivers install`

Quite often, though, these drivers are not fully compatible with the system or still have problems, so here are some troubleshooting notes.
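Before troubleshooting, it can help to confirm which driver package Ubuntu recommends for the card. The sketch below filters the output of `ubuntu-drivers devices` for the entry marked as recommended. The helper name `recommended_driver` is hypothetical, and the `driver : <package> - ... recommended` line layout it relies on is an assumption about that command's output, so treat it as a starting point rather than a guaranteed parser.

```shell
#!/bin/sh
# Hypothetical helper: reads `ubuntu-drivers devices` output on stdin and
# prints the package name from the line flagged as "recommended".
# Assumes lines shaped like:
#   driver   : nvidia-driver-535 - distro non-free recommended
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3 }'
}

# Only query the real tool when it is available on this machine.
if command -v ubuntu-drivers >/dev/null 2>&1; then
    ubuntu-drivers devices | recommended_driver
fi
```

If nothing is printed, either no proprietary driver is recommended for the hardware or the output format differs from the one assumed here.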

Ubuntu drivers not fully installed

Sometimes `ubuntu-drivers install` reports that the drivers are already installed, yet the system is unstable: a new kernel has been installed, but the kernel modules for the driver were not rebuilt for it when the drivers were updated. In that case we can rebuild the modules and reboot with

`sudo dkms autoinstall && sudo reboot`
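To see whether a rebuild is needed at all, the DKMS state can be compared with the running kernel. This is a minimal sketch: `nvidia_dkms_state` is a hypothetical helper, and it assumes `dkms status` prints lines such as `nvidia/535.129.03, 6.2.0-37-generic, x86_64: installed` (older DKMS releases use `nvidia, 535.129.03, ...`). On systems where the driver modules are shipped prebuilt rather than through DKMS, it will simply report `missing`.

```shell
#!/bin/sh
# Hypothetical helper: reads `dkms status` output on stdin and reports whether
# an nvidia module is marked as installed for the kernel given as $1.
nvidia_dkms_state() {
    awk -v k="$1" '
        # Match both "nvidia/<ver>, <kernel>, ..." and "nvidia, <ver>, <kernel>, ..."
        $0 ~ "^nvidia[/,]" && index($0, ", " k ",") > 0 && $0 ~ ": installed" { found = 1 }
        END { print (found ? "built" : "missing") }
    '
}

# Compare against the kernel that is running right now.
if command -v dkms >/dev/null 2>&1; then
    dkms status | nvidia_dkms_state "$(uname -r)"
fi
```

A result of `missing` for the running kernel is the situation where rebuilding the modules is worth trying.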


Check if driver is indeed installed

Running `nvidia-smi` prints the version of the currently installed driver along with the state of the GPU. If the command is not found, the proprietary drivers are missing.

$ nvidia-smi
Thu Dec 7 11:28:46 2023 
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2070 Off | 00000000:09:00.0 On | N/A |
| 0% 49C P8 22W / 175W | 1142MiB / 8192MiB | 1% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 6270 G /usr/lib/xorg/Xorg 497MiB |
| 0 N/A N/A 6433 G /usr/bin/gnome-shell 105MiB |
| 0 N/A N/A 7347 G /usr/bin/nextcloud 1MiB |
| 0 N/A N/A 7519 G cairo-dock 5MiB |
| 0 N/A N/A 7691 C+G ...83750398,4784756152597305294,262144 276MiB |
| 0 N/A N/A 26615 G ...zmate/jcef_26128.log --shared-files 13MiB |
| 0 N/A N/A 28209 G ...,WinRetrieveSuggestionsOnlyOnDemand 86MiB |
| 0 N/A N/A 40478 G /usr/lib/thunderbird/thunderbird 150MiB |
+---------------------------------------------------------------------------------------+
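When this check needs to go into a script, the driver version can be pulled out of the header line of that output. The sketch below is only one way to do it; `driver_version_from_header` is a hypothetical helper that relies on the `Driver Version: <number>` text shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: reads nvidia-smi output on stdin and prints the number
# that follows "Driver Version:" in the header line.
driver_version_from_header() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | driver_version_from_header
else
    echo "nvidia-smi not found: the proprietary driver is probably not installed"
fi
```

For the output shown above this would print the 535.129.03 version string.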

Troubleshooting errors by checking system messages

Errors at boot or start-up are usually logged, and they can help explain why the Nvidia card is not working correctly, for instance when only one of the two monitors plugged into the card is detected. `journalctl` is a great tool for checking these errors. To read the kernel messages from the current boot you can run

journalctl -kb | less

In many cases you can see errors like

[   5.004707] nvidia-gpu 0000:05:00.3: i2c timeout error e0000000
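Paging through the whole kernel log can be slow, so a filter helps. The sketch below keeps only lines that mention `nvidia` or `NVRM` (the prefix the proprietary driver uses in kernel messages); `gpu_kernel_messages` is a hypothetical helper name, and the pattern is an assumption that may need widening for other errors.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log lines on stdin and keeps only those
# that mention the Nvidia driver, case-insensitively.
gpu_kernel_messages() {
    grep -iE 'nvidia|nvrm'
}

# -k: kernel messages only, -b: current boot, --no-pager: plain output for piping.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -k -b --no-pager | gpu_kernel_messages \
        || echo "no Nvidia-related kernel messages found"
fi
```

Applied to a log containing the line above, the filter would keep the `nvidia-gpu ... i2c timeout error` entry and drop unrelated messages.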

You can then search for the error. Quite often such errors come from conflicts with other modules loaded at boot or start-up. A conflicting module can be blacklisted by adding a modprobe entry, for example blacklisting `i2c_nvidia_gpu` if it is the one reporting errors

echo "blacklist i2c_nvidia_gpu" | sudo tee /etc/modprobe.d/blacklist_i2c-nvidia-gpu.conf
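After adding the entry it is worth checking that it is really in place and whether the module is still loaded. This is a minimal sketch: `is_blacklisted` is a hypothetical helper, and it assumes the entry is written as a plain `blacklist <module>` line somewhere under `/etc/modprobe.d`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a "blacklist <module>" line exists in any
# file under the given directory (defaults to /etc/modprobe.d).
is_blacklisted() {
    grep -rqsE "^[[:space:]]*blacklist[[:space:]]+$1[[:space:]]*$" "${2:-/etc/modprobe.d}"
}

module="i2c_nvidia_gpu"

if is_blacklisted "$module"; then
    echo "$module has a blacklist entry"
else
    echo "$module has no blacklist entry"
fi

# /proc/modules lists loaded modules, one per line, starting with the name.
if [ -r /proc/modules ] && grep -q "^${module} " /proc/modules; then
    echo "$module is currently loaded"
else
    echo "$module is not loaded"
fi
```

A module that is blacklisted but still loaded usually just means the machine has not been rebooted yet. If it keeps loading after a reboot, the initramfs may also need regenerating with `sudo update-initramfs -u`.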

Change Kernels

In some cases the installed kernel, whether old or new, is not fully compatible with the Nvidia driver. A tool called mainline, once installed on Linux/Ubuntu, lets you install a different kernel and set it as the main one in place of the kernel currently in use.
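Before switching, it helps to know which kernels are already installed and which one is running. The sketch below does not use mainline itself; it lists the kernel images found in `/boot`, assuming the usual Ubuntu naming of `vmlinuz-<version>`. The helper name `list_installed_kernels` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the version of every kernel image found in the
# given directory (defaults to /boot), one per line.
list_installed_kernels() {
    boot_dir="${1:-/boot}"
    for image in "$boot_dir"/vmlinuz-*; do
        # Skip the unexpanded pattern when no kernel image matches.
        [ -e "$image" ] || continue
        basename "$image" | sed 's/^vmlinuz-//'
    done
}

echo "Running kernel: $(uname -r)"
echo "Installed kernels:"
list_installed_kernels
```

Comparing the two shows at a glance whether an older kernel is still available to boot into while a driver problem with the newest one is sorted out.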
