Nvidia-Settings

Nvidia Driver and Configuration Sites

The sites below are excellent references for Nvidia cards, drivers, and configuration settings:

http://us.download.nvidia.com/XFree86/Linux-x86/270.26/README/index.html

Really, read the driver notes above - they are packed with information. Just substitute your version as needed.

http://en.gentoo-wiki.com/wiki/Nvidia

https://wiki.archlinux.org/index.php/NVIDIA

Bug reporting

nvidia-bug-report.sh collects information from dmesg, /proc, xorg.conf and the X log, /var/log/messages, and so on, and compiles it into a single report for posting on the Nvidia forums.

Xid Errors

From the 270.41.19 Driver README

My kernel log contains messages that are prefixed with "Xid"; what do these messages mean?
	
"Xid" messages indicate that a general GPU error occurred, most often due to the driver misprogramming the GPU or to corruption of the commands sent to the GPU. These messages provide diagnostic information that can be used by NVIDIA to aid in debugging reported problems.

I have been unable to find any reference which defines the Xid error classes.

In one instance, I had a GPU that showed up fine in nvidia-smi, but any attempt to actually use it, including any call to cudaMalloc, would segfault and log:

NVRM: Xid (0000:02:00):48, An uncorrectable double bit error (DBE) has been detected on GPU0 (00 00 00).

It turned out that one of the 8-pin power harnesses, which was actually a 6+2-pin harness, did not have its 2-pin connector plugged in, so the GPU was slightly underpowered. At least, that seems to have been the cause.
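To check whether a kernel log contains Xid messages and which codes they carry, a small filter along these lines may help. It is only a sketch: the function names are made up for illustration, and the pattern assumes the log lines look like the NVRM line quoted above, which may differ between driver versions.

```shell
# Print only the NVRM Xid lines from a kernel log read on stdin.
# (xid_lines and xid_codes are hypothetical helper names.)
xid_lines() {
    grep -F 'NVRM: Xid'
}

# Print just the numeric Xid code from each Xid line on stdin.
# Assumes the form "NVRM: Xid (<pci id>):<code>, ..." as in the line above.
xid_codes() {
    sed -n 's/.*NVRM: Xid ([^)]*): *\([0-9][0-9]*\).*/\1/p'
}

# Example use on a live system:
#   dmesg | xid_lines
#   dmesg | xid_codes | sort -n | uniq -c
```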

Kernel Interfaces

Also from the driver readme:

How can I see the source code to the kernel interface layer?

The source files to the kernel interface layer are in the kernel directory of the extracted .run file. To get to these sources, run:

    # sh NVIDIA-Linux-x86-270.41.19.run --extract-only
    # cd NVIDIA-Linux-x86-270.41.19/kernel/

Persistence Mode

When running CUDA on a compute node where X is not running, the Nvidia driver is loaded when needed and then unloaded again. This adds several seconds of sys time overhead to each CUDA program launch, which can be eliminated by setting the GPUs to persistence mode as root (this requires CUDA 4.0, I believe).

Persistence mode avoids reloading the driver on each launch. (If a kernel crashes, you may need to manually unset and reset it.) It is off by default.

$ time ./arraycp-gpu0 

real    0m5.650s
user    0m0.002s
sys     0m5.500s

# nvidia-smi -pm 1
Enabled persistence mode for GPU 0:2:0.
Enabled persistence mode for GPU 0:3:0.

$ time ./arraycp-gpu0 

real    0m0.374s
user    0m0.006s
sys     0m0.367s
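To confirm which GPUs currently have persistence mode enabled, the full query output of nvidia-smi can be filtered. The sketch below assumes that `nvidia-smi -q` prints one "Persistence Mode : Enabled/Disabled" line per GPU; that output format is an assumption and may differ between driver versions. The function name is made up for illustration.

```shell
# Read `nvidia-smi -q` style output on stdin and print the persistence
# state (for example Enabled or Disabled) of each GPU, one per line.
# persistence_states is a hypothetical helper name.
persistence_states() {
    awk -F: '/Persistence Mode/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim whitespace around the value
        print $2
    }'
}

# Example use on a live system (as root for the first command):
#   nvidia-smi -pm 1
#   nvidia-smi -q | persistence_states
```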

Manual Fan Control for nVIDIA Settings

NOTE: Unless you have good cause, leave thermal management alone; otherwise you risk damaging your GPU.

(This entire section is duplicated from the Gentoo Wiki. Once again, I've found the Gentoo wiki to be without peer.)

Some combinations of Nvidia cards and driver versions report the fan speed as "Variable" but never actually change it, regardless of temperature. If your GPU runs unreasonably hot and nvidia-settings reports the fan speed as "Variable" while it never leaves its assigned value, try the following.

It's probably a good idea to read about the CoolBits option before we begin. Take a look at the nvidia-settings manual (man nvidia-settings), and the nvidia-drivers manual, available at http://us.download.nvidia.com/XFree86/Linux-x86/195.36.24/README/xconfigoptions.html

(adjust the version in the URL as appropriate - be careful about looking at out-of-date documentation about the CoolBits option!)

The CoolBits setting is also used for overclocking, so check this guide against the docs linked above (and update this article if things have changed).

File: /etc/X11/xorg.conf

Section "Device"
     ...
     Option "Coolbits" "5" # Check Coolbits documentation for your driver version.
     ...
EndSection
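The Coolbits value is a bitmask, so "5" is the sum of two separate bits (1 + 4). As I understand the README for drivers of this era, the 4 bit enables manual fan control and the 1 bit enables clock controls, but treat those meanings as an assumption and check the documentation for your driver version. The small helper below (a hypothetical name, not part of any Nvidia tool) simply splits a value into the bits it sets:

```shell
# Print the individual bit values that make up a Coolbits number.
# coolbits_bits is a hypothetical helper; it does plain bit arithmetic
# and knows nothing about what each bit means for a given driver.
coolbits_bits() {
    value=$1
    bit=1
    out=""
    while [ "$bit" -le "$value" ]; do
        if [ $(( value & bit )) -ne 0 ]; then
            out="$out${out:+ }$bit"
        fi
        bit=$(( bit * 2 ))
    done
    echo "$out"
}

# Example:
#   coolbits_bits 5
```

Look up each printed value in the xconfigoptions page of your driver's README before enabling it.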

If your card is described in multiple "Device" sections, put the above in each of them.

Be careful setting fan speed manually - it is possible to break your card by letting it get too hot!

Inside X, run nvidia-settings. The "Thermal Settings" section should now contain "GPU Fan Settings" controls. My suggestion is to crank the fan speed up to 100.

You may also modify your fan speed from the command line:

Enable GPU fan control:

nvidia-settings -a "[gpu:0]/GPUFanControlState=1"

Find out the fan's resource id using:

nvidia-settings -q fans

Then set the speed using:

nvidia-settings -a "[fan:0]/GPUCurrentFanSpeed=<n>"

where <n> is the desired percentage of full speed.
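Because a bad value here can leave the card running hot, it may be worth validating the percentage before handing it to nvidia-settings. The following is a minimal sketch that only prints the command it would run (a dry run). The function name is made up, and the attribute names are taken from the commands above; other driver versions may use different ones.

```shell
# Print the nvidia-settings command that would enable manual fan control
# on GPU $1 and set fan $2 to $3 percent. Nothing is executed.
# fan_speed_cmd is a hypothetical helper name.
fan_speed_cmd() {
    gpu=$1
    fan=$2
    pct=$3
    # Reject empty or non-numeric percentages.
    case $pct in
        ''|*[!0-9]*)
            echo "fan_speed_cmd: percent must be an integer from 0 to 100" >&2
            return 1
            ;;
    esac
    if [ "$pct" -gt 100 ]; then
        echo "fan_speed_cmd: percent must be an integer from 0 to 100" >&2
        return 1
    fi
    printf 'nvidia-settings -a "[gpu:%s]/GPUFanControlState=1" -a "[fan:%s]/GPUCurrentFanSpeed=%s"\n' \
        "$gpu" "$fan" "$pct"
}

# Example:
#   fan_speed_cmd 0 0 80
```

Printing the command instead of running it makes it easy to inspect the result first; once it looks right, paste it into a terminal inside X.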

These settings are not permanent. To have them take effect every time X is launched, add the following to your ~/.xinitrc:

nvidia-settings \
	-a "[gpu:0]/GPUFanControlState=1" \
	-a "[fan:0]/GPUCurrentFanSpeed=100" &

KDE 4 users will need to add a symlink to ~/.xinitrc in the Autostart directory since ~/.xinitrc isn't sourced by KDM:

cd ~/.kde4/Autostart
ln -s ~/.xinitrc xinitrc
chmod +x ~/.xinitrc

If ~/.xinitrc is not being autostarted, then make sure your ~/.xinitrc has a shebang (#!/bin/sh) at the top.

Reference Sites for Thermal Settings:

http://us.download.nvidia.com/XFree86/Linux-x86/195.36.24/README/xconfigoptions.html

http://forums.nvidia.com/index.php?showtopic=165327

Python Fan Control Script

A Python script that uses the above nvidia-settings options can be found at:

http://www.evga.com/forums/tm.aspx?m=903040&mpage=1
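The linked script is not reproduced here. As an independent sketch of the general idea (pick a fan speed from the current GPU temperature), a simple step mapping could look like the following. The thresholds and the function name are made up for illustration and are not tuned for any particular card.

```shell
# Map a GPU temperature in degrees C to a fan speed percentage.
# fan_pct_for_temp and its thresholds are hypothetical; adjust them
# for your own card before relying on them.
fan_pct_for_temp() {
    temp=$1
    if [ "$temp" -ge 80 ]; then
        echo 100
    elif [ "$temp" -ge 70 ]; then
        echo 80
    elif [ "$temp" -ge 60 ]; then
        echo 60
    else
        echo 40
    fi
}

# Example use inside X (assumes the GPUCoreTemp attribute is available
# and that fan control was enabled as described above):
#   temp=$(nvidia-settings -q "[gpu:0]/GPUCoreTemp" -t)
#   nvidia-settings -a "[fan:0]/GPUCurrentFanSpeed=$(fan_pct_for_temp "$temp")"
```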

Headless Fan Control

For multiple GPUs that are not attached to a monitor, see the following:

http://blog.cryptohaze.com/2011/02/nvidia-fan-speed-control-for-headless.html

http://sites.google.com/site/akohlmey/random-hacks/nvidia-gpu-coolness

Custom EDID

http://analogbit.com/fix_nvidia_edid

Interesting bit on VGA vs. HDMI and HDCP, etc.

http://www.avsforum.com/avs-vb/showthread.php?t=997923
