Crafty Chess Compiler Benchmarks
After discovering this interesting Macles* blog article on using the Intel C compiler for Linux to compile the Crafty chess engine, I wanted to replicate the author's steps and benchmarks.
The basic premise is to compile Crafty with both the Intel C compiler and gcc using similar CFLAGS and benchmark the results. I am comparing my results to those of Macles*, using what I believe is an identical hardware platform (the Aspire One).
I am compiling and testing on my new Acer Aspire One, a new entry in the Ultra Mobile PC class of tiny notebooks. I had been waiting to buy one until they were available with multi-threaded processors, and the Aspire One has the Intel Atom CPU.
Crafty makes a perfect experimental application: it compiles quickly, runs at extremely high load, and will run multi-threaded if compiled with that support.
Please see Intel C Compiler for Linux for a general article on how to obtain and set up the Intel C compiler for Linux first. The rest of this article assumes it is already installed and the environment is set up to use icc.
Compiling Crafty and CFLAGS
NOTE: ICC version 12 now supports the documented switch -xSSE3_ATOM for the Atom. This article on the Macles* site has some info on benchmarks with this optimization.
I downloaded the Crafty source from http://www.craftychess.com/ and modified the makefile to alter the CFLAGS. Running make by default attempts to build the linux-icc target using the Intel C compiler profile; however, with the default makefile options I could not get it to compile correctly. I changed the makefile to:
linux-icc:
	$(MAKE) target=LINUX \
		CC=icc CXX=icc \
		CFLAGS='$(CFLAGS) -O3 \
			-xL \
			' \
		CXFLAGS='$(CFLAGS) -O3 \
			-xL ' \
		LDFLAGS='$(LDFLAGS) -lstdc++' \
		opt='$(opt) -DTEST -DINLINE32 -DCPUS=2 ' \
		crafty-make
The above CFLAGS, -O3 -xL, are intended to compile with high optimization (-O3) and to target the Atom architecture (-xL). However, including -xL resulted in this error:
egtb.cpp(5206): (col. 9) remark: LOOP WAS VECTORIZED.
icc -lstdc++ -o crafty crafty.o egtb.o -lm
crafty.o: In function `ValidatePosition.':
crafty.c:(.text+0x7926): undefined reference to `__svml_irem4'
crafty.o: In function `DisplayChessBoard.':
crafty.c:(.text+0x1bfb3): undefined reference to `__svml_irem4'
make: *** [crafty] Error 1
make: Leaving directory `/mnt/home/Downloads/crafty-22.1'
make: *** [crafty-make] Error 2
make: Leaving directory `/mnt/home/Downloads/crafty-22.1'
make: *** [linux-icc] Error 2
make: Leaving directory `/mnt/home/Downloads/crafty-22.1'
make: *** [default] Error 2
With the -xL flag removed it compiles fine; however, Crafty's performance in both single- and multi-threaded mode fell far short of the results reported on the blog I was using as a guide.
The same error also occurs when using the -ipo flag, which is discussed in the Intel paper Optimizing Applications with Intel® C++ and Fortran Compilers for Windows*, Linux*, and Mac OS* Version 10.x available here. The lack of interprocedural optimization is likely a large performance hit for Crafty.
The benchmark capability is built into Crafty but appears to be missing from any official documentation I could find. Documentation and a manual page were not included in the source code for version 21, but the version 18 documentation and man page are available here, though the bench function is omitted.
Benchmarking Crafty was done by running the compiled client and then entering:
[root@localhost crafty-22.1]# crafty
Crafty v22.1 (1 cpus)
White(1): bench
Running benchmark. . .
......
Total nodes: 50402631
Raw nodes per second: 413780.732288
Total elapsed time: 121.810000
White(1): smpmt=2
max threads set to 2
White(1): bench
Running benchmark. . .
......
Total nodes: 56588055
Raw nodes per second: 597866.402536
Total elapsed time: 94.650000
After setting smpmt=2 and running the benchmark again, two threads were visible in top and throughput rose from about 413781 to 597866 raw nodes per second (total nodes were also higher, at 56588055). So while it was running with multi-threaded support (even without compiling with -xL), it was still significantly lower than the result Macles* obtained, approx. 6700000 +/- nodes per second. My results compiling with the Intel compiler without -xL were slightly under the gcc results Macles* obtained on the same Aspire One hardware platform. So it would appear that getting this to work with the -xL or -ipo flags is significant.
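To make the single-thread versus two-thread comparison repeatable, the summary lines can be pulled out of a saved session and turned into a speedup figure. The following is only a minimal sketch: it assumes the three summary lines keep the exact wording shown in the session above, and the names parse_bench, speedup and SESSION are hypothetical helpers, not part of Crafty.

```python
import re

# Summary lines copied from the benchmark session above (1 thread, then smpmt=2).
SESSION = """\
Total nodes: 50402631
Raw nodes per second: 413780.732288
Total elapsed time: 121.810000
Total nodes: 56588055
Raw nodes per second: 597866.402536
Total elapsed time: 94.650000
"""


def parse_bench(text):
    """Return one dict per benchmark run found in text, in order of appearance."""
    nodes = re.findall(r"Total nodes:\s*(\d+)", text)
    rates = re.findall(r"Raw nodes per second:\s*([\d.]+)", text)
    times = re.findall(r"Total elapsed time:\s*([\d.]+)", text)
    return [
        {"total_nodes": int(n), "nps": float(r), "elapsed": float(t)}
        for n, r, t in zip(nodes, rates, times)
    ]


def speedup(baseline, candidate):
    """Ratio of raw nodes per second between two parsed runs."""
    return candidate["nps"] / baseline["nps"]


runs = parse_bench(SESSION)
single, dual = runs[0], runs[1]
print(f"1 thread : {single['nps']:.0f} nps")
print(f"2 threads: {dual['nps']:.0f} nps")
print(f"speedup  : {speedup(single, dual):.2f}x")  # 1.44x for the session above
```

Comparing raw nodes per second rather than total nodes keeps the figure independent of how many nodes each run happened to search.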
I will update this as I learn more about resolving the compile issues and post further benchmarks.
As I like chess and might actually want to play against the optimized Crafty engine, I checked out xboard. However, under Fedora Core 8 on the Aspire One, xboard was unable to find crafty as the engine, even when the path to it was specified with -fd. Starting xboard like this:
xboard -fcp crafty -fd /mnt/home/Downloads/crafty-22.1/crafty &
This would result in "Error writing to first chess program: Broken pipe" and "Failed to find the first chess program crafty on localhost No such file or directory", despite the fact that the path was correct and the crafty binary was there.
To resolve this I copied the compiled crafty binary to a location in my path, for example /usr/local/bin, and xboard was then able to find it. It is odd that xboard does not find it when the directory is specified with -fd, but oh well.
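The fact that copying the binary into /usr/local/bin fixed things suggests, though this is an assumption and not something confirmed here, that the engine name given to -fcp is looked up through the normal PATH search. A quick diagnostic sketch of that lookup follows; resolve_engine is a hypothetical helper, and it only mimics a PATH search, it does not call xboard.

```python
import os
import shutil


def resolve_engine(command, extra_dir=None):
    """Return the full path a PATH search would find for command, or None.

    extra_dir, if given, is searched first, as if it had been prepended to PATH.
    """
    search_path = os.environ.get("PATH", "")
    if extra_dir:
        search_path = extra_dir + os.pathsep + search_path
    return shutil.which(command, path=search_path)


if __name__ == "__main__":
    # None here would match the "Failed to find the first chess program" message.
    print(resolve_engine("crafty"))
    # The build directory from this article, searched explicitly.
    print(resolve_engine("crafty", "/mnt/home/Downloads/crafty-22.1"))
```

If the first call prints None while the second prints a path, then either copying the binary onto the PATH or passing a full path as the engine command should let the lookup succeed.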