Getting started with CUDA
The following instructions are for Unix and OS X platforms. Windows support is coming soon.
To use the GPU, Delite requires CUDA 3.2 or later. Delite also requires the CUDA libraries Thrust and cuBLAS, both of which are included in the CUDA toolkit download as of CUDA 4.0.
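Before configuring Delite, you can confirm that the installed toolkit meets the version requirement. The shell sketch below assumes nvcc is on your PATH and parses the "release X.Y" field that "nvcc --version" prints. The helpers cuda_release and cuda_at_least_3_2 are hypothetical names, not part of Delite.

```shell
#!/bin/sh
# Sketch: check that the installed toolkit is CUDA 3.2 or later by parsing
# the "release X.Y" field printed by `nvcc --version`.

# Reads `nvcc --version` output on stdin and prints the release, e.g. "4.0".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeeds if the version given as $1 (e.g. "3.2", "4.0") is at least 3.2.
cuda_at_least_3_2() {
  major=${1%%.*}
  minor=${1#*.}
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 2 ]; }
}

# Check the local toolkit only when nvcc can be found.
if command -v nvcc >/dev/null 2>&1; then
  v=$(nvcc --version | cuda_release)
  if [ -n "$v" ] && cuda_at_least_3_2 "$v"; then
    echo "CUDA $v: OK"
  else
    echo "CUDA ${v:-unknown}: Delite needs 3.2 or later" >&2
  fi
fi
```

If the reported release is older than 4.0, remember that Thrust may need to be installed separately.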
Once CUDA is installed, you must add the information Delite needs to connect to the CUDA installation to Delite's configuration files. From the Delite home directory, navigate to config/delite and create CUDA.xml with a text editor. An example template in this directory can be copied and then modified. You must now add three pieces of information:
1. In the <compiler> tag, add the absolute path to nvcc. For example:
   <compiler> /usr/local/cuda/bin/nvcc </compiler>
2. In the <arch> tag, specify the compute capability of the installed GPU (e.g., "2.0" for Fermi devices or "1.x" for Tesla devices).
3. In the <headers> tag, add the path to "jni.h" of your current JDK distribution as shown below:
   <path> ${JAVA_HOME}/include </path>
   Also add the path to "jni_md.h" (/include/linux for Linux and /include/darwin for OS X):
   <path> ${JAVA_HOME}/include/linux </path>
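A quick shell check can help find the values for the <compiler> and <headers> tags. This is a minimal sketch, assuming a POSIX shell, nvcc on your PATH, and a JDK under JAVA_HOME laid out as described above; abs_path_of and jni_header_dirs are hypothetical helper names. The same abs_path_of call works for gcc or icc when filling in BLAS.xml.

```shell
#!/bin/sh
# Sketch: gather the values for CUDA.xml's <compiler> and <headers> tags.

# Print the absolute path of a program found on PATH (e.g. nvcc, gcc, icc).
abs_path_of() {
  p=$(command -v "$1") || return 1
  case "$p" in
    /*) echo "$p" ;;
    *) return 1 ;;   # builtin, function or relative PATH entry: not usable
  esac
}

# Print the directories holding jni.h and jni_md.h, failing if either file
# is missing. $1 is the JDK home, $2 the OS name as printed by `uname -s`.
jni_header_dirs() {
  java_home="$1"
  case "$2" in
    Linux)  platform=linux ;;    # jni_md.h lives in include/linux
    Darwin) platform=darwin ;;   # jni_md.h lives in include/darwin on OS X
    *) echo "unsupported OS: $2" >&2; return 1 ;;
  esac
  for f in "$java_home/include/jni.h" "$java_home/include/$platform/jni_md.h"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  echo "$java_home/include"
  echo "$java_home/include/$platform"
}

abs_path_of nvcc || echo "nvcc not on PATH; try /usr/local/cuda/bin/nvcc" >&2
if [ -n "${JAVA_HOME:-}" ]; then
  jni_header_dirs "$JAVA_HOME" "$(uname -s)" || echo "JNI headers not found under $JAVA_HOME" >&2
fi
```

The printed directories have JAVA_HOME expanded; they are meant for confirming that the headers exist where the <path> entries point.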
Almost done. Now open cuBLAS.xml and modify the <compiler> tag with the path to nvcc and the <arch> tag with your GPU's compute capability, just as you did in CUDA.xml.
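Because cuBLAS.xml must carry the same compiler path and compute capability as CUDA.xml, a small check can catch a mismatch. This sketch assumes both files are in config/delite and that each tag sits on a single line, as in the nvcc example; tag_value and cuda_configs_agree are hypothetical helpers, and sed is used as a line-based shortcut rather than a real XML parser.

```shell
#!/bin/sh
# Sketch: confirm that cuBLAS.xml and CUDA.xml agree on <compiler> and <arch>.
# Assumes each tag and its value sit on one line.

# Print the trimmed text between <TAG> and </TAG> (first match only).
tag_value() {  # usage: tag_value FILE TAG
  sed -n "s:.*<$2>\(.*\)</$2>.*:\1:p" "$1" | head -n 1 |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# Report each tag whose values differ; return non-zero on any mismatch.
cuda_configs_agree() {  # usage: cuda_configs_agree CONFIG_DIR
  if [ ! -f "$1/CUDA.xml" ] || [ ! -f "$1/cuBLAS.xml" ]; then
    echo "CUDA.xml or cuBLAS.xml missing in $1" >&2
    return 1
  fi
  status=0
  for tag in compiler arch; do
    a=$(tag_value "$1/CUDA.xml" "$tag")
    b=$(tag_value "$1/cuBLAS.xml" "$tag")
    if [ "$a" != "$b" ]; then
      echo "<$tag> differs: CUDA.xml='$a' cuBLAS.xml='$b'" >&2
      status=1
    fi
  done
  return $status
}

# Example, run from the Delite home directory:
#   cuda_configs_agree config/delite && echo "CUDA configs agree"
```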
Congratulations! You're now ready to run OptiML applications on the GPU. GPU execution can be enabled or disabled for each application. To enable CUDA code generation in the compiler, use the "--cuda" flag when calling delitec:
delitec AppRunner --cuda
With this flag, the compiler generates both Scala and CUDA versions of the application kernels. You can still choose to run the application on the CPU(s) only or on the CPU(s) and GPU. To use the GPU when running the application, add the "--cuda 1" flag when calling delite:
delite AppRunner [app_params] --cuda 1
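Since an application compiled with "--cuda" contains both Scala and CUDA kernels, the same build can be run either way. The shell sketch below wraps the two delite invocations shown above; run_app and the DELITE override variable are hypothetical conveniences (DELITE defaults to delite, and setting DELITE=echo prints the command instead of running it).

```shell
#!/bin/sh
# Sketch: run one compiled application CPU-only or CPU+GPU.
# DELITE is a hypothetical override; it defaults to the `delite` launcher.
run_app() {  # usage: run_app cpu|gpu APP [app_params...]
  mode="$1"
  shift
  case "$mode" in
    cpu) "${DELITE:-delite}" "$@" ;;            # CPU(s) only
    gpu) "${DELITE:-delite}" "$@" --cuda 1 ;;   # CPU(s) and GPU
    *) echo "mode must be cpu or gpu" >&2; return 2 ;;
  esac
}

# Examples (my_input is a placeholder application parameter):
#   delitec AppRunner --cuda
#   run_app cpu AppRunner my_input
#   run_app gpu AppRunner my_input
```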
Getting started with BLAS
OptiML can optionally use BLAS libraries to perform certain linear algebra operations on the CPU. Providing OptiML with a BLAS implementation is highly recommended, because certain dense matrix operations (e.g., matrix multiplication) are significantly faster with BLAS than with OptiML's default implementation. Any BLAS implementation that adheres to the CBLAS specification (e.g., MKL, GotoBLAS) should work.
Once BLAS is installed on your machine, navigate from the Delite home directory to config/delite and create BLAS.xml in a text editor. You must now add a few pieces of information so that Delite can link to BLAS when compiling. A few example templates in this directory can be copied and then modified.
1. In the <compiler> tag, add the absolute path to the C compiler installed on your machine (e.g., gcc, icc).
2. In the <headers> tag, add the path to "jni.h" of your current JDK distribution as shown below:
   <path> ${JAVA_HOME}/include </path>
   Also add the path to "jni_md.h" (/include/linux for Linux and /include/darwin for OS X):
   <path> ${JAVA_HOME}/include/linux </path>
3. Also in <headers>, add the name of the header file that contains the CBLAS interface within the <include> tag. This is typically "cblas.h" (or "mkl.h" if using MKL).
4. In the <libs> tag, add all paths to BLAS library directories within <path> tags, then list the libraries to link against within <library> tags (e.g., -lcblas). Delite can only be linked with shared object libraries (ending in ".so" rather than ".a").
Congratulations! Your OptiML applications will now be accelerated with BLAS. To ensure that the OptiML compiler uses BLAS libraries when possible, make sure that the option "no_blas=False" is included in the echoed options when running delitec.
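The BLAS.xml entries can be sanity-checked from the shell as well. This is a minimal sketch, assuming a C compiler that accepts gcc-style -E, -x c and -I flags (gcc and clang do); cblas_header_ok, shared_lib_ok and blas_enabled are hypothetical helpers, and blas_enabled only searches delitec's output for the literal "no_blas=False" string.

```shell
#!/bin/sh
# Sketch: sanity checks for the BLAS.xml entries (hypothetical helpers).

# Succeeds if HEADER (e.g. cblas.h or mkl.h) can be included by compiler CC,
# optionally also searching INCLUDE_DIR. Runs only the preprocessor.
cblas_header_ok() {  # usage: cblas_header_ok CC HEADER [INCLUDE_DIR]
  if [ -n "${3:-}" ]; then
    printf '#include <%s>\n' "$2" | "$1" -I"$3" -E -x c - >/dev/null 2>&1
  else
    printf '#include <%s>\n' "$2" | "$1" -E -x c - >/dev/null 2>&1
  fi
}

# Succeeds if DIR holds libNAME.so, the shared object that -lNAME resolves
# to. A static libNAME.a alone is not enough for Delite.
shared_lib_ok() {  # usage: shared_lib_ok DIR NAME
  [ -e "$1/lib$2.so" ]
}

# Reads delitec's echoed output on stdin; succeeds if BLAS is enabled.
blas_enabled() {
  grep -qF 'no_blas=False'
}

# Examples (paths are placeholders for your own BLAS installation):
#   cblas_header_ok gcc cblas.h /path/to/blas/include
#   shared_lib_ok /path/to/blas/lib cblas
#   delitec AppRunner 2>&1 | blas_enabled && echo "BLAS enabled"
```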