Nvidia CUDA
CUDA
You most likely have some idea of what CUDA is all about or you wouldn't be here, but for a general introduction see:
http://en.wikipedia.org/wiki/CUDA
http://www.nvidia.com/object/fermi_architecture.html
In short, CUDA provides a language (CUDA C), a compiler, an SDK and a runtime environment that let you write general-purpose C code which is executed in a massively parallel manner on NVIDIA GPUs. It is actually quite easy to write code for: it requires only basic or intermediate proficiency with C programming and no knowledge of OpenGL or other graphics languages whatsoever. If you have an NVIDIA 8 series or newer card, including many mobile chipsets, it most likely supports CUDA, and you can start writing parallel code to run on the device more easily than you may realize. It is sitting there quietly, waiting for you to dive in and take advantage of it.
OpenCL
As an alternative to Nvidia's CUDA, there is also OpenCL. Newer Nvidia GPUs, when using the correct drivers, also support OpenCL:
http://www.khronos.org/opencl/
Nvidia is one of the companies developing OpenCL, and you can find more information and drivers at the Nvidia OpenCL page:
http://www.nvidia.com/object/cuda_opencl_new.html
AMD also supports OpenCL and describes the implementation of the language at:
http://www.amd.com/us/products/technologies/stream-technology/opencl/pages/opencl-intro.aspx
A short tutorial in OpenCL can be found here.
(You can use the GPU Caps utility to see if your GPU supports OpenCL.)
Tutorials
Here is a good CUDA intro tutorial
There is also the tutorial series "Supercomputing for the Masses" by Rob Farber (a senior scientist at Pacific Northwest National Laboratory) available here and also described in the Linux Journal article.
A 2011 Linux Journal article by Alejandro Segovia is available here
Of course, there is also the Official Programming Guide as a definitive reference.
I would also recommend CUDA by Example: An Introduction to General Purpose GPU Programming by Jason Sanders and Edward Kandrot as an excellent way to get started. I am reading this now and find it quite helpful.
Additionally, there are, of course, the SDK code examples, which you can download and get started with immediately.
CUDA Toolkit 4.0 Update
Documentation is now at:
http://developer.nvidia.com/nvidia-gpu-computing-documentation
Release notes are now here
Getting Started Guide is now here
Earlier versions, release notes, etc. are at http://developer.nvidia.com/cuda-toolkit-archive
CUDA SDK 3.2 Update
UPDATE I just did a fresh install of Suse 11.3 64-bit and the new CUDA SDK 3.2 and found the following issues:
- Unfortunately, the CUDA SDK 3.2 release notes do not include the same mention of the missing symlinks as do the 3.1 release notes. Follow the procedure in the 3.1 notes, which is also described below.
- While I had hoped the new CUDA SDK 3.2 would be compatible with gcc 4.5, it is not, as described in this Nvidia Forum thread. You will still need to install and configure gcc43 and gcc43-c++ as alternatives, as described below in the section GCC Version Issues.
- Several X11 development libraries are required. On Suse 11.3 install these and their dependencies with the following:
zypper install libXi6-devel libXmu-devel xorg-x11-libXmu-devel xorg-x11-libXext-devel
- For RHEL the packages are:
yum install libXi-devel libXmu-devel libXext-devel freeglut freeglut-devel
Make Errors:
The error:
86_64-suse-linux/bin/ld: cannot find -lX11
Is caused by a missing xorg-x11-libXext-devel package.
Errors like:
iomanip(64): error: expected an expression
Are likely caused by using an incompatible version of gcc (like 4.5). This is best resolved by installing gcc43 as an alternative (below) or as described in the Nvidia post:
A simple workaround to compiling the rest of the SDK would be to remove the folder Interval from the SDK's src folder and put it somewhere in your home directory where you will remember it later. The complete list of programs that don't compile from the 3.2 SDK with GCC 4.5.1 are: Interval SobelFilter FunctionPointers
Errors regarding:
AC_PROG_LIBTOOL
Can be solved by installing libtool
The remaining setup for SDK 3.2 is as given for the previous version.
Install / Compile Issues
NOTE: Please see Nvidia-Settings for information on the Nvidia driver itself, module error codes, install options, etc.
Firstly, follow the very detailed Getting Started Linux guide to get your development environment setup. The documentation is quite good and provides all the basic info on installing the Nvidia driver, the CUDA Toolkit and SDK and setting up your paths for locating the binaries and libraries. All required files are available on the CUDA downloads page.
Note: Ensure your NVIDIA driver meets or exceeds the version required by the SDK. The version in your distro's restricted driver (non-OSS) repository may not be sufficient. I recommend installing the NVIDIA driver from the CUDA downloads area to prevent any potential trouble.
Missing Symlinks
(Applies to Red Hat, CentOS, OpenSUSE, etc.)
It is important to read the SDK 3.1 release notes - really, stop now and read them. You will find under Section III. (b) Known Issues on CUDA SDK for Linux that there are often a few key libraries which need to have symlinks created so the linker can find them. In my case this was required for libglut and libGLU. If they are missing you will see errors such as:
/usr/bin/ld: cannot find -lglut
/usr/bin/ld: cannot find -lGLU
In either case, create the required symlinks:
# ln -s /usr/lib/libglut.so.3 /usr/lib/libglut.so
# ln -s /usr/lib/libGLU.so.1 /usr/lib/libGLU.so
# ln -s /usr/lib/libX11.so.6 /usr/lib/libX11.so
(Or /usr/lib64/ if running 64-bit)
If you do not have write access to /usr/lib, create the symlinks in another location and add that directory with -L in the Makefile (or add the directory containing the symlinks to /etc/ld.so.conf and run ldconfig).
If you get a linking error: cannot find -lcuda then create this additional symlink (your library version may vary slightly):
# ln -s /usr/lib/libcuda.so.256.35 /usr/lib/libcuda.so
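For the case where /usr/lib is not writable, the symlink step can be sketched as a small shell helper. This is only an illustration: the function name link_unversioned and the directory $HOME/cuda-links are made up for this example and do not come from the SDK or its release notes.

```shell
# Hypothetical helper (not part of the CUDA SDK): create the unversioned
# name the linker looks for (libfoo.so) as a symlink to a versioned
# library (libfoo.so.N...) inside a directory you own.
link_unversioned() {
    target=$1       # e.g. /usr/lib/libglut.so.3
    linkdir=$2      # e.g. $HOME/cuda-links (assumed location)
    base=$(basename "$target")
    # Drop the version suffix after ".so": libglut.so.3 -> libglut.so
    devname="${base%.so*}.so"
    mkdir -p "$linkdir" || return 1
    ln -sf "$target" "$linkdir/$devname" || return 1
    # Print the path of the symlink that was created.
    echo "$linkdir/$devname"
}

# Example (commented out so nothing is created by accident):
#   link_unversioned /usr/lib/libglut.so.3 "$HOME/cuda-links"
#   link_unversioned /usr/lib/libGLU.so.1  "$HOME/cuda-links"
```

You would then point the linker at that directory, for example by adding -L$HOME/cuda-links to the link flags in the Makefile.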
Compiling Order - Do what the Guide Says
Being an eager beaver, I decided to just compile the deviceQuery program first to make sure everything was working. This was a bad decision as it, like many others, requires shared objects which had not been created yet. As a result, I was getting compile errors such as:
paracelsus@Callandor:~/NVIDIA_GPU_Computing_SDK/C/src/deviceQuery> make
deviceQuery.cpp:126:11: warning: extra tokens at end of #else directive
deviceQuery.cpp:135:11: warning: extra tokens at end of #else directive
/usr/lib/gcc/i586-suse-linux/4.5/../../../../i586-suse-linux/bin/ld: cannot find -lcutil_i386
collect2: ld returned 1 exit status
make: *** [../../bin/linux/release/deviceQuery] Error 1
I resolved this by compiling in this order:
paracelsus@Callandor:~/NVIDIA_GPU_Computing_SDK/C/common> make
paracelsus@bob:~/NVIDIA_GPU_Computing_SDK/shared> make
paracelsus@Callandor:~/NVIDIA_GPU_Computing_SDK/C/src/deviceQuery> make
The real solution though, oddly enough, is to do what the Getting Started Guide said: You should compile them all by changing to NVIDIA_GPU_Computing_SDK/C in the user's home directory and typing make. The resulting binaries will be installed under the home directory in NVIDIA_GPU_Computing_SDK/C/bin/linux/release.
So, just compile them all and avoid such problems:
paracelsus@Callandor:~/NVIDIA_GPU_Computing_SDK/C> make
GCC Version Issues
(For OpenSUSE 11.3, Fedora 13, Ubuntu 10.04, etc.)
If you are running a distro newer than the latest release of the CUDA Toolkit and SDK, then your versions of gcc and glibc may be newer and not compatible. This is generally resolvable by installing the earlier versions and setting up your system to allow easy selection of which version to compile with. On my OpenSUSE 11.3 install I used the following links to implement the solution which follows:
http://lukas.ahrenberg.se/archives/154
http://forums.nvidia.com/index.php?showtopic=157513
http://forums.nvidia.com/index.php?showtopic=170454
http://forums.nvidia.com/lofiversion/index.php?t50404.html
bob:~ # gcc --version
gcc (SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292] ### <== Too new, does not allow code examples to compile!
(Use Yast to install gcc43 gcc43-c++ gcc43-info and any required dependencies. My packages are as follows:)
bob:~ # rpm -qa | grep gcc
gcc-4.5-4.2.i586
gcc45-info-4.5.0_20100604-1.12.noarch
gcc43-4.3.4_20091019-3.1.i586
libgcc45-4.5.0_20100604-1.12.i586
gcc-info-4.5-4.2.i586
gcc45-4.5.0_20100604-1.12.i586
gcc43-c++-4.3.4_20091019-3.1.i586
libstlport_gcc4-4.6.2-6.1.i586
gcc-c++-4.5-4.2.i586
gcc43-info-4.3.4_20091019-3.1.i586
gcc45-c++-4.5.0_20100604-1.12.i586
(Now, set up both versions so you can easily select which one is the default:)
bob:~ # sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.3 60 --slave /usr/bin/g++ g++ /usr/bin/g++-4.3
bob:~ # sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.5 40 --slave /usr/bin/g++ g++ /usr/bin/g++-4.5
bob:~ # update-alternatives --config gcc
There are 2 alternatives which provide `gcc'.
Selection Alternative
-----------------------------------------------
*+ 1 /usr/bin/gcc-4.3
2 /usr/bin/gcc-4.5
Press enter to keep the default[*], or type selection number: 1
Using '/usr/bin/gcc-4.3' to provide 'gcc'.
bob:~ # gcc --version
gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973] ## <== You are now set to use CUDA with SDK version 3.1
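If you switch compilers often, a guard at the top of a build script can catch the wrong default before make does. The following is a minimal sketch: version_le is a made-up helper, and the 4.3 ceiling simply reflects what worked for me above, not an official limit from Nvidia.

```shell
# Hypothetical helper: succeed (return 0) if the major.minor part of
# version $1 is less than or equal to the major.minor part of $2.
# Both arguments are expected to look like "4.3" or "4.3.4".
version_le() {
    a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%.*}
    b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%.*}
    if [ "$a_major" -lt "$b_major" ]; then return 0; fi
    if [ "$a_major" -gt "$b_major" ]; then return 1; fi
    [ "$a_minor" -le "$b_minor" ]
}

# Example use in a build script (commented out; 4.3 is an assumed ceiling):
#   version_le "$(gcc -dumpversion)" 4.3 || {
#       echo "gcc is too new for this SDK, run update-alternatives --config gcc" >&2
#       exit 1
#   }
```

Only the major and minor numbers are compared, so 4.3.4 passes a 4.3 ceiling while 4.5.0 does not.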
No X Windows Running?
If you are installing on a server and do not have X running, you will need to ensure the required device files are created. The Getting Started Guide provides the following script to do this. You can save it as nvidia_setup.sh and then invoke it from /etc/rc.local during boot:
#!/bin/bash
/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`
  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done
  mknod -m 666 /dev/nvidiactl c 195 255
  # Set GPUs to persistent mode so driver stays loaded
  nvidia-smi -pm 1
else
  exit 1
fi
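Before running that script as root, it can be useful to check how many device nodes it would create. The sketch below isolates the counting step as a function that reads lspci-style text from standard input. The name count_nvidia_gpus is made up for this example; unlike the guide's script, which counts the two controller types separately and adds them, it counts matching lines in a single pass.

```shell
# Hypothetical helper: count NVIDIA "3D controller" and
# "VGA compatible controller" entries in lspci-style text on stdin.
count_nvidia_gpus() {
    grep -i nvidia | grep -c -e "3D controller" -e "VGA compatible controller"
}

# Example (commented out; needs lspci on the machine):
#   lspci | count_nvidia_gpus
```

If this prints N, the script above would create /dev/nvidia0 through /dev/nvidia(N-1) plus /dev/nvidiactl.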
MultiGPU Systems
If you have multiple GPUs, you can select which ones are available to the runtime environment with:
$ export CUDA_VISIBLE_DEVICES=0,1,3
This will mask GPU 2. It does not change what nvidia-smi -L shows, but it does control which GPUs the CUDA runtime environment can see and use.
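One thing to keep in mind is that the runtime renumbers the visible devices starting from 0, in the order they are listed, so with the setting above physical GPU 3 shows up as device 2 inside the application. The following sketch makes that mapping explicit. The function visible_index is made up for this example, and it handles only plain comma-separated integer indices.

```shell
# Hypothetical helper: given a CUDA_VISIBLE_DEVICES-style list ($1) and a
# physical GPU index ($2), print the index the application would see.
# Returns 1 (and prints nothing) if that GPU is masked.
visible_index() {
    list=$1
    want=$2
    i=0
    old_ifs=$IFS
    IFS=,                      # split the list on commas
    for dev in $list; do
        if [ "$dev" = "$want" ]; then
            IFS=$old_ifs
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    IFS=$old_ifs
    return 1
}

# Example (commented out):
#   visible_index "0,1,3" 3     # physical GPU 3 is seen as device 2
#   visible_index "0,1,3" 2     # fails, GPU 2 is masked
```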
OS X
Some OS X OpenCL documentation is here
CUDA vs OpenCL
http://www.infoworld.com/d/developer-world/cuda-and-opencl-265
GPU Caps Viewer
This utility polls GPUs and returns the device capabilities.
This utility can run under Wine on Linux as well, at least as of version 1.7:
http://www.geeks3d.com/20090414/gpu-caps-viewer-170-available-with-cuda-support/
http://www.ozone3d.net/gpu_caps_viewer/index.php#screens
Now what?
So what else can you do with CUDA? What applications exist out there to sink the teeth of your GPU into? Well, the list is growing all the time, but I've started a page where I'll be adding some too as I discover them, and you can find it at CUDA Applications
Check out CUDA Data Parallel Primitives Library - CUDPP
Explore the CUDA Performance Profiler
VirtualGL
If you wish to view OpenGL simulations remotely, you will quickly discover that OpenGL is actually rendered on the client. Although the computations may be done on the remote system, all rendering commands are sent to the client to be displayed locally, essentially killing performance. VirtualGL solves this by rendering results on the server, storing them in a pixel buffer and transferring that to the client. It does so in a clever way: it attaches a loadable module to the binary at run time, which intercepts the OpenGL calls and redirects them to be rendered locally on the server.
See the official VirtualGL site for complete details. There is also some useful information in this Sun documentation and this Nvidia forum thread.
(The newest version does not require libjpeg-turbo. There are several steps required to configure X, so see the install guide before trying the commands below.)
To start a VGL forwarded SSH connection from the client:
/opt/VirtualGL/bin/vglconnect -force paracelsus@10.100.10.48
Then, start the OpenGL app on the server with vglrun:
paracelsus@bob:~> cd NVIDIA_GPU_Computing_SDK/C/bin/linux/release/
paracelsus@bob:~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release> /opt/VirtualGL/bin/vglrun ./oceanFFT
Overclocking
Coolbits
You can enable overclocking via the nvidia-settings utility by simply adding an option to your xorg.conf Nvidia Device section:
Option "Coolbits" "1"
Restart X and the nvidia-settings GUI will now have overclocking options.
http://www.overclockers.com/forums/showthread.php?t=605405
http://www.phoronix.com/scan.php?page=article&item=197&num=1
NVClock
http://www.linuxhardware.org/nvclock/
Provides basic overclocking capabilities, though newer cards may not be supported. (Unfortunately, the prospects of porting other overclocking tools such as EVGA's Precision utility are not showing much promise at this time. It does not run under Wine, and a Linux implementation is not planned per this post.)
The qt version seems to be broken in ./configure, but the gtk version compiles fine. The GTK and command-line binaries are:
nvclock0.8b4> src/gtk/nvclock_gtk
nvclock0.8b4> src/nvclock
#!/bin/bash
## This will overclock the GPU to 300 and Memory to 400 – Change accordingly!
nvclock -b coolbits -n 300.000 -m 400.000
Language Bindings
Looking for more ways to leverage CUDA's capabilities from other languages? Try these resources:
Kappa Framework:
CUDA Library for R:
http://brainarray.mbni.med.umich.edu/brainarray/rgpgpu/
CUDA Library for IDL:
PyCUDA:
http://mathema.tician.de/software/pycuda
Books
Programming Massively Parallel Processors: A Hands-on Approach
ISBN-13: 978-0123814722
CUDA by Example: An Introduction to General-Purpose GPU Programming
ISBN-13: 978-0131387683
Scientific Computing with Multicore and Accelerators
ISBN: 978-1-4398253-6-5