Friday, December 3, 2010

Deterministic parallelism and aliasing in ParaSail

There is growing interest in parallelization approaches that provide deterministic results.  See for example Deterministic Parallel Java.  In ParaSail, we have a somewhat different model.  Determinism should be provided for computations when it makes sense, but when the program is interactive or part of a real-time system, then clearly the environment is dynamic and time-dependent, and the notion of determinism becomes less relevant, and arguably not even desirable. 

It turns out that it is easy to know when non-determinism might be entering the execution of a ParaSail program.  It occurs only as a result of operations on concurrent objects.  Because ParaSail eliminates all race-conditions associated with non-concurrent objects at compile time, the results of a ParaSail program that manipulates only non-concurrent objects are inherently deterministic.  However, should the programmer decide to use a concurrent object, then the order in which operations on the concurrent object are performed by concurrent threads is unspecified, though the programmer can control it to some extent by specifying dequeue conditions when the operand is marked as queued.  (See blog posting parasail-concurrent-interfaces.html for more information on queued operands.)

Minimizing use of concurrent objects is generally a good idea anyway for performance reasons, as there is inevitably more overhead involved in coordinating access to a shared object, even when using lock-free approaches.  But how can a group of threads work in parallel to build up a large data structure, without the data structure itself being a concurrent object?  The normal answer to such a question is divide and conquer

What does divide and conquer really mean in ParaSail terms?  That means that a data structure needs to be conceptually divided up into non-overlapping parts and passed out to multiple threads for manipulation.  In ParaSail, a function can return a reference to a part of an object passed by reference as an input to the function.  However, the compiler needs to understand the aliasing implications of that, since in ParaSail, the compiler disallows race-conditions, which implies it must understand all aliasing related to non-concurrent objects at compile-time.  If two potentially concurrent computations are both updating parts of the same non-concurrent object, then the compiler must be able to convince itself that the two parts are non-overlapping (i.e. not aliased with one another).

Because there are no (re)assignable pointers in ParaSail, there are no cycles in data structures, so it is clear that Obj.Left and Obj.Right do not overlap.  On the other hand, when dealing with generalized containers, it is more difficult to know whether Container[X] and Container[Y] overlap, and even more complex when we are calling arbitrary functions that happen to return a reference to part of their input parameter, such as Lower_Left(Matrix) and Upper_Right(Matrix).  Clearly the programmer needs some way in a postcondition of such a function to convey the aliasing properties of the result.

ParaSail has a built-in, primitive, Basic_Array interface.  The compiler has built-in knowledge that, given an object BA based on the Basic_Array interface, BA[X] does not alias with BA[Y] if X != Y.  So one way to describe the "part" of a container that is included within the result of calling a function like "Lower_Left" is to describe the set of underlying array indices associated with the elements of the container that can be referenced from the result of Lower_Left.  More generally we can see that there is a set of primitive subcomponents accessible via any reference to a part of a container, and the aliasing problem comes down to proving that the subcomponent set from one reference does not overlap with the subcomponent set from some other reference.  This all implies that ParaSail needs a notation for identifying the set of accessible subcomponents of a reference, and a built-in Subcomponent_Set interface for specifying the values and the relationships between these subcomponent sets.

1 comment:

  1. Making interesting statements about whether parts of a container are non-overlapping seems closely related to the challenge of formalizing programs that manipulate heap-based data structures, where the indices into the container play the role of pointers into the heap.

    These days the leading candidate for the best way to formalize programs that manipulate heap-based data structures seems to be "separation logic." The best introduction to that I have found is the LICS 2002 paper by John Reynolds: