Monday, January 5, 2015

Compiling ParaSail to LLVM, first in a series...

We now have an LLVM-based compiler for ParaSail, thanks to great work this past summer by Justin Hendrick, a summer intern from Cornell.  To download the latest source release for both the ParaSail interpreter and the LLVM-based compiler, visit http://parasail-lang.org 

There are a number of interesting design issues associated with creating a compiler for a pointer-free, pervasively parallel language, and it seemed worth writing about some of them here.  So here is the first in a series of such blog entries.

Clearly pervasive fine-grained parallelism is one of the things that makes ParaSail interesting and a bit different from typical languages.  So let's start with that -- how are picothreads, or parallel work items (as we have taken to describing them in some contexts), represented in the generated LLVM?  The ParaSail front end already does the work to decide where it is worth creating a separate thread of control as part of generating instructions for the ParaSail Virtual Machine (PSVM instructions).  In the PSVM, these are represented as nested blocks within a single PSVM routine.  That is, a single ParaSail operation (func or op) produces a single PSVM routine, but that routine may have zero or more nested blocks to represent code suitable for execution by a separate picothread.  A nested block is invoked by the Start_Parallel_Op or Add_Parallel_Op PSVM instructions, awaited by the Wait_For_Parallel_Op, and terminated by an Exit_Op instruction.  On the other hand, a whole routine is invoked by a Call_Op, Start_Parallel_Call_Op, or Add_Parallel_Call_Op instruction, and terminated by the Return_Op instruction -- Wait_For_Parallel_Op is used for parallel invocations of both nested blocks and whole routines.

At the LLVM level, we made the decision to make the calling sequence essentially the same whether it was an invocation of a nested block or a routine.  Every such call has three arguments.  These are the Context, the Param_List, and the Static_Link.  The Param_List points to a stack-based parameter list, where the first parameter is the result, if any, and the other parameters are the input parameters.  The Context points to a data structure that is created when a new picothread is created, and is updated when a new storage region is created (more about that in a later blog entry).  The Context is effectively the per-picothread information.  The Static_Link either points at a type descriptor, for a routine that corresponds to an operation of a module, or to the enclosing stack frame, for a nested block or a routine that corresponds to a nested operation.  The first word of a stack frame holds a copy of the passed-in static link, so this supports up-level references across multiple levels, as well as access to the type descriptor passed into an enclosing module operation.

Type descriptors are fundamental to the ParaSail run-time model.  Remember that at the source level, all ParaSail modules are considered parameterized, and a ParaSail type is produced by instantiating a module.  Operations are generally associated with modules, and unlike in C++ or Ada, instantiating a module does not produce a new set of code for these operations.  Instead, the code is generated once for a given module, and then used for all instantiations.  But that means that the code needs to be passed some information that identifies the particular instantiation for which it is being executed.  This is what a type descriptor provides -- all of the information that is dependent on the particular instantiation being used. 

For example, a Set module might have various operations like Union and Intersection and Insert and Count, but the type of the elements being manipulated is determined by the particular instantiation, be it Set<Integer>, Set<String>, or Set<Map<String, Tree<Float>>>.  Presuming it is a hashed Set, the only requirements on the element type is that it have the operations of the abstract interface Hashable, which generally means Hash and "=?" (aka compare).  So when we call the Insert operation of Set, we pass it the type descriptor (as the Static_Link argument) for the particular instance of Set we are using.  Since Set only has one type parameter, the type descriptor of Set consists primarily of a reference to a type descriptor for this one type parameter.  In addition to having references to the type descriptors for the type parameters, a type descriptor also includes the equivalent of a C++ virtual function table.  That is, for each of the operations of Set, we have a reference to the (compiled or interpreted) routine for that operation.  This is not generally needed for the code within Set itself, since that code generally knows at compile time what routines are used for all of the other Set operations.  However, since we also use a type descriptor for the type parameter of Set, we will need to use the routine table within that type descriptor to gain access to the Hash and "=?" operation for the particular element type being used.  So in the type descriptor for Set<Integer>, there will be a reference to the type descriptor for Integer, and from there you can gain access to the Integer Hash and Integer "=?" routines, which will certainly be needed at least for the Insert routine.

One last wrinkle on type descriptors -- because ParaSail uses a variant of the class-and-interface model of inheritance, it supports multiple inheritance, so different types that implement the Hashable interface might have different sets of routines in their routine table.  So how does the code for the Insert routine know where it can find the Hash and "=?" routines for Integer or String, given that they might also have many other routines?  The solution here is an extra level of indirection.  We actually have two types of type descriptors, one is the full type descriptor, and has all of the information so far described (and more, such as nested constants).  The second kind of type descriptor is called an operation map (or op map for short).  This consists of a reference to the full type descriptor for the type, plus a simple mapping from an interface's operation index to the underlying type's operation index.  If we sequentially number the operations of any given module's interface, then that gives us an operation index which we can use at run-time for selecting from the routine table in a type descriptor for an instance of that module.  The op map provides the necessary conversion when the operations defined in the abstract interface (specified for the element's type when declaring a module like Set) are numbered differently than the actual element type's operations.  An op-map for a type that implements Hashable might be defined by a simple mapping such as (1 => 5, 2 => 3), meaning that the first operation of Hashable is in fact the fifth operation of the actual type, and the second operation of Hashable is the third operation of the actual type.  An op-map provides for constant-time execution of calls on operations, even in the presence of multiple inheritance.

Note that this op-map approach does not work for languages where all objects carry around a reference to their type descriptor at run-time, since a single type descriptor must work for all uses of the object.  One of the features of ParaSail is that typical objects have no type descriptor overhead.  The compiler generally knows where to find a type descriptor for any object, without the object having to carry it around.  The one exception to this are explicitly polymorphic objects.  In ParaSail these are declared with a syntax of a type name followed by a "+".   For example, Set<Imageable+> would be a set of objects that implement the Imageable interface, where each object might be of a different type.  This means that each Imageable+ object consists of a reference to the underlying object, plus a reference to a type descriptor identifying the actual type of the object.  But even these sorts of objects can preserve the constant-time execution of operation calls, because an object is polymorphic for a particular interface (Imageable in this case), and so the type descriptor that such a polymorphic object carries around will likely be an op-map tailored to the particular interface being implemented.   The ParaSail Imageable interface actually has four operations (To_String, From_String, Hash, and "=?"), so such an op-map might look like (1 => 13, 2 => 5, 3 =>9, 4 => 11), which provides constant-time access to any of the operations of Imageable.

One final point worth noting -- type descriptors are used in various contexts, as suggested above (as a static link, as a type parameter, and for run-time-type identification).  In most of these contexts op-maps can be used instead of a full type descriptor.  An important exception to this is a type descriptor used as a static link.  In that case, it is always a full type descriptor, and the compiled code can rely on that fact.

2 comments:

  1. For a somewhat higher-level (but less up-to-date) introduction to the ParaSail Virtual Machine(PSVM), see the blog entry which first described it: 2010/11/virtual-machine-for-parasail-with.html

    ReplyDelete