  1. How do I fix a "java.lang.OutOfMemoryError" while compiling?
  2. What is the line { val IR: SimpleVectorScalaOpsPkgExp } in code generator traits for?
  3. What does { this: SimpleVectorApplication => } mean?
  4. What are the methods syms and boundSyms for?
  5. What are reifyEffect(..) and reflectEffect(..) for?
  6. What are the copyTransformedOrElse and mirror methods for?
  7. How to benefit from loop fusion?

How do I fix a "java.lang.OutOfMemoryError" while compiling?

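Modify the sbt startup script (typically located at ~/bin/sbt) to increase the memory available to the JVM. On Java 8 and later, the permanent generation no longer exists, so the -XX:MaxPermSize option has no effect and can be dropped. A typical script that should work on most machines with enough physical memory is: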

java -Xmx3g -XX:MaxPermSize=1g -XX:ReservedCodeCacheSize=128m -jar `dirname $0`/sbt-launch.jar "$@"
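
To confirm that the new limit is actually picked up, the maximum heap size can be printed from inside the JVM. The following is a minimal sketch; the object name HeapCheck is made up for illustration and is not part of Delite or sbt.

```scala
// Hypothetical helper: run it from the sbt console or any Scala REPL.
object HeapCheck {
  // Maximum heap the JVM will try to use, in megabytes (controlled by -Xmx).
  def maxHeapMb: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  def main(args: Array[String]): Unit =
    println(s"Max heap: $maxHeapMb MB")
}
```

With -Xmx3g the reported value should be in the neighborhood of 3072 MB; the JVM may report slightly less than the configured amount.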

What is the line { val IR: SimpleVectorScalaOpsPkgExp } in code generator traits for?

This line declares an abstract value IR of type SimpleVectorScalaOpsPkgExp. It is instantiated in the getCodeGenPkg method to be the value of the DeliteApplication instance that mixes together all of the DSL traits - in this way, the IR definitions are injected into the code generator traits so that they can be referenced without being the same object. Another way of thinking about this is that when all of the traits are finally mixed in, we have two "super" objects. One contains all of the application and IR definitions (DeliteApplication, *Lift, *Ops, *OpsExp, *ImplOps traits), and one contains all the code generators (*CodeGenPkg, *CodeGen{Target}). The IR object defines a method getCodeGenPkg that returns an instance of the code generator object, and the code generator object contains a reference to the IR object via the IR val.

The benefit of keeping code generators in a separate object from the IR is that we can associate multiple generator objects with a single IR object instance and let each generator operate independently.
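
The relationship between the two objects can be sketched in a few lines of plain Scala. This is a simplified illustration, not the Delite source: ToyOpsExp, ToyCodeGen, ToyApp and the target parameter are hypothetical stand-ins, and only the val IR and getCodeGenPkg names are taken from the description above.

```scala
object IrInjectionSketch {
  // Stand-in for the *OpsExp traits: this side owns the IR definitions.
  trait ToyOpsExp {
    sealed trait Exp
    case class Const(value: Int) extends Exp
    case class Plus(a: Exp, b: Exp) extends Exp

    // The IR object hands out generators whose IR val points back at itself.
    def getCodeGenPkg(targetName: String): ToyCodeGen { val IR: ToyOpsExp.this.type } =
      new ToyCodeGen {
        val IR: ToyOpsExp.this.type = ToyOpsExp.this
        val target = targetName
      }
  }

  // Stand-in for a code generator trait: IR is abstract here and injected later.
  trait ToyCodeGen {
    val IR: ToyOpsExp
    import IR._ // Exp, Const and Plus now refer to the injected IR's definitions

    def target: String

    def emit(e: Exp): String = e match {
      case Const(v)   => v.toString
      case Plus(a, b) => s"(${emit(a)} + ${emit(b)})"
    }
  }

  // One IR object, two independent generator objects.
  object ToyApp extends ToyOpsExp
  val scalaGen = ToyApp.getCodeGenPkg("scala")
  val cudaGen  = ToyApp.getCodeGenPkg("cuda")

  val expr = ToyApp.Plus(ToyApp.Const(1), ToyApp.Const(2))
  val emitted: String = scalaGen.emit(expr)
}
```

Because IR is declared with the singleton type of the IR object, the generator accepts nodes built by that object without any casts, while the two generators remain separate instances.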

What does { this: SimpleVectorApplication => } mean?

This declaration is called a self-type declaration in Scala. It means that this trait must eventually be mixed in with a trait of type SimpleVectorApplication when it is instantiated (and therefore declarations from SimpleVectorApplication can be used in this scope). The compiler will check that all traits that extend this one have the same self-type declaration, and that any instantiations of this trait do indeed mix-in SimpleVectorApplication. Note that this is different from inheriting from SimpleVectorApplication; in particular, there is no subtyping relationship between a trait with a self-type declaration and the trait named in the declaration.
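
A small stand-alone example may make this concrete. The trait names below (SimpleVectorApp, Printer) are invented for illustration and do not correspond to actual Delite traits.

```scala
object SelfTypeSketch {
  trait SimpleVectorApp {
    def vectorSize: Int = 4
  }

  // Self-type: whoever instantiates Printer must also mix in SimpleVectorApp.
  trait Printer { this: SimpleVectorApp =>
    // vectorSize is visible here even though Printer does not extend SimpleVectorApp.
    def describe: String = s"vector of size $vectorSize"
  }

  // OK: both traits are mixed in at instantiation time.
  object Ok extends SimpleVectorApp with Printer

  // Does not compile: the self-type requirement is not satisfied.
  // object Bad extends Printer

  // Does not compile either: Printer is not a subtype of SimpleVectorApp.
  // def asApp(p: Printer): SimpleVectorApp = p
}
```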

What are the methods syms and boundSyms for?

syms and boundSyms are methods used to communicate dependency information about ops to the framework. In most cases (when using Delite ops), you don't need to define syms and boundSyms, because we define them for you in DeliteOps.scala. The idea is that syms represents true input dependencies to an op, while boundSyms represents symbols that are bound (created) within the scope of an op and other symbols that should not be hoisted out of the op.

For example, consider the RangeForeach op (for (i <- 0 until 100) ):

case class RangeForeach(start: Exp[Int], end: Exp[Int], i: Sym[Int], body: Exp[Unit])
  extends Def[Unit]

  // True inputs of the op. The loop index i is deliberately left out.
  override def syms(e: Any): List[Sym[Any]] = e match {
    case RangeForeach(start, end, i, body) => syms(start):::syms(end):::syms(body)
    case _ => super.syms(e)
  }

  // Symbols bound inside the op: the loop index plus the effects of the body.
  override def boundSyms(e: Any): List[Sym[Any]] = e match {
    case RangeForeach(start, end, i, y) => i :: effectSyms(y)
    case _ => super.boundSyms(e)
  }

What we are saying here is that i is a bound symbol; it is created inside the RangeForeach op, and therefore is not a dependency *to* the RangeForeach op (its definition does not need to have been emitted before we emit RangeForeach). Thus, i is left out of the syms definition, and included in the boundSyms definition. We also say that any effectful statements inside the RangeForeach body are bound to the body - we don't want to allow these to be hoisted out because that would be unsafe.
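
The same split can be reproduced in a self-contained toy IR. This is only a sketch under simplifying assumptions: the Exp, Sym and Def types here are cut-down stand-ins for the LMS ones, symsOf is a hypothetical helper, and the effectSyms part of boundSyms is omitted.

```scala
object SymsSketch {
  sealed trait Exp
  case class Const(value: Int) extends Exp
  case class Sym(id: Int) extends Exp

  sealed trait Def
  case class Plus(a: Exp, b: Exp) extends Def
  case class RangeForeach(start: Exp, end: Exp, i: Sym, body: Exp) extends Def

  // Symbols mentioned by a single expression (constants mention none).
  def symsOf(e: Exp): List[Sym] = e match {
    case s: Sym => List(s)
    case _      => Nil
  }

  // True input dependencies: these must be emitted before the op itself.
  def syms(d: Def): List[Sym] = d match {
    case Plus(a, b) => symsOf(a) ::: symsOf(b)
    case RangeForeach(start, end, _, body) =>
      symsOf(start) ::: symsOf(end) ::: symsOf(body)
  }

  // Symbols created by the op itself: they only exist inside its scope.
  def boundSyms(d: Def): List[Sym] = d match {
    case RangeForeach(_, _, idx, _) => List(idx)
    case _                          => Nil
  }

  val n    = Sym(1) // loop bound, defined outside the loop
  val i    = Sym(2) // loop index, bound by the loop
  val body = Sym(3) // result symbol of the loop body
  val loop = RangeForeach(Const(0), n, i, body)
}
```

In this toy model, syms(loop) contains n and body but not i, while boundSyms(loop) contains only i.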

What are reifyEffect(..) and reflectEffect(..) for?

At a high level, ReflectXXX and ReifyXXX are used to convert control dependencies into data dependencies. We always keep a context of effects in the current scope in the order they have happened (as denoted by a reflectEffect(..) or reflectWrite(...) call). ReifyXXX then aggregates all of the effects in a given scope by creating an IR node that contains the list of effects that happened in the argument to the ReifyXXX method. When we schedule the program, a Reify node will depend on all of the effects in its list, forcing them to be emitted before the result of the Reify. It may be a little easier to understand with an example:

val block = {
  println("hello world")

  while ( i < 100 ) {
      println(i)
      i += 1
  }
}
There are two scopes inside 'block', each with different effects. The effects at the outer scope are "println('hello world')" and "while (...)". The effect in the inner scope is "println(i)". Reflect and Reify capture this as follows:

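override def __whileDo(cond: => Exp[Boolean], body: => Rep[Unit]) {
    // Symbolically evaluate each by-name argument in its own scope, collecting its effects.
    val c = reifyEffects(cond)
    val a = reifyEffects(body)
    // Summarize what kinds of effects the condition and the body had.
    val ce = summarizeEffects(c)
    val ae = summarizeEffects(a)
    // The loop runs cond once, then (body, cond) zero or more times.
    reflectEffect(While(c, a), ce andThen ((ae andThen ce).star))
}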

This means that when we create a While node, we first symbolically evaluate each function parameter and keep track of what effects happened (using Reify). Then, we say that the While itself is effectful, and the type of effect it has depends on the effects that its parameters had. In the end, we will end up with a representation that looks roughly like this:

Reify((), List(Reflect(Println(..)), Reflect(While(LT(i,100), Reify(Reflect(Println(i)))))))

which means that the result of 'block' was Unit (), but 'block' also had two effects. The representation of the While effect is a pure condition and an effectful body. The effects in the body are encapsulated in the inner Reify node.
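
The bookkeeping behind this can be imitated with a very small model: a mutable list of effects for the current scope, a reflectEffect that appends to it, and a reifyEffects that evaluates a block in a fresh scope and packages up whatever was recorded. The sketch below is not the LMS implementation; node types are simplified, results and conditions are plain strings, and effect summaries are left out entirely.

```scala
object EffectSketch {
  sealed trait Node
  case class Println(arg: String) extends Node
  case class While(cond: String, body: Reify) extends Node
  case class Reflect(node: Node) extends Node
  case class Reify(result: String, effects: List[Reflect]) extends Node

  // Effects recorded so far in the current scope, in program order.
  private var context: List[Reflect] = Nil

  // Record an effectful node in the current scope.
  def reflectEffect(n: Node): Reflect = {
    val r = Reflect(n)
    context = context :+ r
    r
  }

  // Evaluate `block` in a fresh scope and collect the effects it recorded.
  def reifyEffects(block: => String): Reify = {
    val saved = context
    context = Nil
    val result = block
    val effects = context
    context = saved // restore the enclosing scope
    Reify(result, effects)
  }

  // Toy version of the 'block' example: a println followed by a while loop.
  val block: Reify = reifyEffects {
    reflectEffect(Println("hello world"))
    val body = reifyEffects {
      reflectEffect(Println("i"))
      "()"
    }
    reflectEffect(While("i < 100", body))
    "()"
  }
}
```

Here block ends up with two effects at the outer level, and the single Println of the loop body is only visible through the inner Reify stored in the While node.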

What are the copyTransformedOrElse and mirror methods for?

copyTransformedOrElse is used in the fusion algorithm and works together with the mirror method. The idea is that fusion has to rewrite the IR to change where inputs and outputs go when nodes are fused together. mirror tells the framework how to create a new instance of a given IR node with some transformed parameters. copyTransformedOrElse allows mirror to reuse a previous symbol if it already exists, instead of creating a new symbol every time and having to maintain the relations between all these fresh symbols. For example, let's say I have a VectorPlus node. It has a size symbol. If I mirror this node with different inputs (in order to create a fused version of it and something else), I want to reuse my original size symbol instead of creating a new one and trying to remember that the new size symbol is actually exactly the same as the old one. That's what copyTransformedOrElse does.
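
The VectorPlus scenario can be modelled with a toy node class. Everything below is a simplified illustration rather than the Delite code: the Transformer type, the fresh helper and the way the node remembers what it was mirrored from are assumptions made for the sketch, and only the names mirror and copyTransformedOrElse come from the text above.

```scala
object MirrorSketch {
  sealed trait Exp
  case class Sym(id: Int) extends Exp

  // A transformer substitutes some symbols and leaves the rest alone.
  type Transformer = Exp => Exp

  private var nextId = 100
  def fresh(): Sym = { nextId += 1; Sym(nextId) }

  // Toy VectorPlus: `original` records the transformer and node it was mirrored from.
  class VectorPlus(val a: Exp, val b: Exp,
                   val original: Option[(Transformer, VectorPlus)] = None) {
    // Reuse the (transformed) field of the original node if there is one,
    // otherwise fall back to the default, which creates something new.
    def copyTransformedOrElse(field: VectorPlus => Exp)(default: => Exp): Exp =
      original match {
        case Some((f, orig)) => f(field(orig))
        case None            => default
      }

    val size: Exp = copyTransformedOrElse(_.size)(fresh())
  }

  // mirror: rebuild the node with transformed inputs, remembering where it came from.
  def mirror(d: VectorPlus, f: Transformer): VectorPlus =
    new VectorPlus(f(d.a), f(d.b), Some((f, d)))

  val x = Sym(1); val y = Sym(2); val z = Sym(3)
  val plus = new VectorPlus(x, y)
  val subst: Transformer = { case `x` => z; case e => e } // replace x by z
  val mirrored = mirror(plus, subst)
}
```

In this model the mirrored node has z as its first input, but its size is the very same symbol as the size of the original node, so no fresh symbol has to be tracked.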

We know that copyTransformedOrElse and mirror are not particularly user-friendly, and we would like to hide them from the DSL developer completely. We are also working on a more robust analysis and transformation API that abstracts away the details of rewriting the IR like this. Hopefully, as these come together, the situation will improve dramatically.

How to benefit from loop fusion?

Delite supports loop fusion through the AbstractLoop trait from the underlying LMS framework. Operations such as DeliteOpMap and DeliteOpReduce, which are the basic building blocks of parallel DSLs, all extend AbstractLoop and are therefore candidates for loop fusion.

In the optimization phase, the framework tries to apply both vertical fusion (the output of one loop is iterated over by another loop) and horizontal fusion (loops have the same range over different inputs). Vertical fusion is applied only if there are no negative dependencies among the loops and at most one loop has side effects.
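
To illustrate what the two kinds of fusion achieve, here they are written out by hand on ordinary Scala arrays. This is only an analogy for the transformation: it does not use Delite or LMS, and all names in it are made up for the example.

```scala
object FusionSketch {
  val xs: Array[Int] = Array(1, 2, 3, 4)
  val ys: Array[Int] = Array(10, 20, 30, 40)

  // Unfused: the map builds an intermediate array that the reduce then traverses.
  def unfused: Int = {
    val doubled = xs.map(_ * 2)
    doubled.reduce(_ + _)
  }

  // Vertical fusion: producer and consumer share one loop, no intermediate array.
  def verticallyFused: Int = {
    var acc = 0
    var i = 0
    while (i < xs.length) { acc += xs(i) * 2; i += 1 }
    acc
  }

  // Two independent loops over the same range but different inputs.
  def horizontalUnfused: (Int, Int) = (xs.sum, ys.sum)

  // Horizontal fusion: both reductions are computed in a single traversal.
  def horizontallyFused: (Int, Int) = {
    var sumXs = 0
    var sumYs = 0
    var i = 0
    while (i < xs.length) { sumXs += xs(i); sumYs += ys(i); i += 1 }
    (sumXs, sumYs)
  }
}
```

The fused versions compute the same results as the unfused ones; the benefit is fewer traversals and, in the vertical case, no temporary collection.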

Since loop fusion heavily transforms the code, it is the first place where mirroring and effects errors appear. To debug these errors, make sure that the LMS verbosity level (`lms.verbosity` flag) is set to 1 and that loop fusion is enabled (`delite.enable.fusion` flag). The most common errors are:

  • Loops do not fuse - either there is an accidental negative dependency among loops (search for "wtableneg" in logs) or some symbols have unwanted effects.
  • Code is missing after fusion - there is probably a mirroring error. You can check the logs for mirroring warnings which will show which symbol is badly transformed.