Friday, December 17, 2010

Parallel Quicksort in ParaSail and "slicing"

Quicksort is a classic "divide-and-conquer" algorithm, and lends itself naturally to parallelization.  However, unlike some of the other algorithms we have discussed in past postings, it is updating rather than merely walking a data structure.  This means that we need a way to safely partition the array to be sorted so that the ParaSail compiler knows that the threads are manipulating non-overlapping parts of the array, so it can be certain there are no race conditions possible.

In several languages there is the notion of an array slice.  In particular, in Fortran and Ada there is built-in syntax for selecting a contiguous part of an array.  In newer versions of Fortran this can be done on multi-dimensional arrays.  In Ada slicing is limited to single-dimensional arrays.  In both Fortran and Ada, array slices are updatable, that is they effectively represent a view of the underlying array.  APL (not surprisingly!) has all of these capabilities and more.  In many other languages there are operations for selecting a slice of a string (and in some cases arrays), but these are generally not updatable views, and hence cannot be used on the left-hand side of an assignment nor be passed as an in out parameter.  In some cases there are explicit "replace-slice" operations which do allow for updating a slice of an array.

In any case, in ParaSail, because we expect the compiler to catch race-conditions at compile-time, operations that return a view of a part of an object need to have well-defined properties so the compiler can correctly determine whether two different views have any overlap (i.e. are potentially aliased).  The simplest solution seems to be to recognize a "slicing" operator, similar to the "indexing" operator.  A "slicing" operator would take an indexable container, and then one or more operands at least one of which is an interval or a set, and return a view of the container that represents the subset of components identified by the operand(s).  For example, this might be an interface for a One_Dim_Array with both "indexing" and "slicing" operators:
interface One_Dim_Array
  <Element_Type is Assignable<>;
   Index_Type is Discrete<>> is

    function Bounds(A : One_Dim_Array) -> Interval<Index_Type>; 

    function Create(Bounds : Interval<Index_type>)
      -> One_Dim_Array {Bounds(Create) == Bounds};

    operator "indexing"(
       A : ref One_Dim_Array;
       Index : Index_Type {Index in Bounds(A)})
     -> ref Element_Type;

    operator "slicing"(
       A : ref One_Dim_Array;
       Slice : Interval<Index_Type> {Is_Subset(Slice, Bounds(A))})
     -> ref Result: One_Dim_Array {Bounds(Result) == Slice};
    ... 
end interface One_Dim_Array; 

As with the "indexing" operator, the "slicing" operator would be invoked by using A[...] notation, but where one or more of the operands inside the [] would be intervals or sets.  The ParaSail compiler would impose requirements on implementations of "indexing" and "slicing" operators.  For a given "indexing" operator, if the operands are different in two different calls, then the resulting elements must be distinct.  For a given "slicing" operator, if the operands are non-overlapping sets or intervals, then the resulting slices must be non-overlapping.   Note that the implementor of the "slicing" operator can decide whether the bounds of the result correspond to those of the selecting interval, or slide back to always starting at zero or one.  In the above example, the bounds correspond to the original indices of the selected elements, as indicated by the postcondition on the "slicing" operator.  In an interface designed for string manipulation, it might be more convenient for the slice to have bounds starting at zero or one.

Note that if there are multiple "indexing" operators, or multiple "slicing" operators for the same type, then they are presumed to be creating potentially overlapping elements or views, as they presumably represent different indexing schemes into the same data structure.  For example, a hash-table might be indexable both by the key and by some unique internal index.  That is, there is no guarantee that A["key"] and A[35] are unaliased just because "key" and 35 are distinct values, presuming these represent invocations of different "indexing" operators defined for the same indexable container.

Note that in all languages, you can achieve the equivalent of slicing by passing in the entire array and the bounds for the sub-array of interest.  This would however not allow the compiler to determine as easily whether two different calls are manipulating non-overlapping slices by looking only at the call sites.  A goal with ParaSail is that the compiler can identify any possible race-conditions by looking only at call sites and associated pre- and postconditions.  This argues for having explicit ways of indicating in a precondition or postcondition what parts of an object are read, what parts are updated, and what parts are not referenced at all.  Bu this probably comes back to being able to specify slices of a container, if only in annotations. 

In any case, given the above notion of "slicing" operators, we are now in a position to write our parallel Quicksort:
interface Sorting<One_Dim_Array<>> is
    procedure Quicksort(Arr : ref var One_Dim_Array;
       function Before(Left, Right : One_Dim_Array::Element_Type) 
         -> Boolean is "<");
          // Sort Arr according to the sorting function "Before" which returns
          // True if Left must appear before Right in the sorted order.
          // Before returns False if Left = Right.
end interface Sorting;

class Sorting is
  exports
    procedure Quicksort(Arr : ref var One_Dim_Array;
       function Before(Left, Right : One_Dim_Array::Element_Type) 
         -> Boolean is "<")
    is
        // Handle short arrays directly.  Partition longer arrays.
        case Length(Arr) of
          [0..1] => return;
          
          [2] => 
               if Before(Arr[Arr.Last], Arr[Arr.First]) then
                   // Swap elements
                   Arr[Arr.First] :=: Arr[Arr.Last];
               end if;
               return;
               
          [..] =>
               // Partition array
               const Mid := Arr[Arr.First + Length(Arr)/2];
               var Left : Index_Type := Arr.First;
               var Right : Index_Type := Arr.Last;
               until Left > Right loop
                 var New_Left : Index_Type := Right+1;
                 var New_Right : Index_Type := Left-1;
                 block
                   // Find item in left half to swap
                   for I in Left .. Right forward loop
                       if not Before(Arr[I], Mid) then
                           // Found an item that can go into right partitition
                           New_Left := I;
                           if Before(Mid, Arr[I]) then
                               // Found an item that *must* go into right part
                               exit loop;
                           end if;
                       end if;
                   end loop;
                 ||  
                   // Find item in right half to swap
                   for J in Left .. Right reverse loop
                       if not Before(Mid, Arr[J]) then
                           // Found an item that can go into left partitition
                           New_Right := J;
                           if Before(Arr[J], Mid) then
                               // Found an item that *must* go into left part
                               exit loop;
                           end if;
                       end if;
                   end loop;
                 end block;
                 
                 if New_Left > New_Right then
                     // Nothing more to swap
                     // Exit loop and recurse on two partitions
                     Left := New_Left;
                     Right := New_Right;
                     exit loop;
                 end if;
                 
                 // Swap items
                 Arr[New_Left] :=: Arr[New_Right];
                 
                 // continue looking for items to swap
                 Left := New_Left + 1;
                 Right := New_Right - 1;
               end loop;
               
               // At this point, "Right" is right end of left partition
               // and "Left" is left end of right partition
               // and the partitions don't overlap
               // and neither is the whole array
               // and everything in the left partition can precede Mid
               // and everything in the right partition can follow Mid
               // and everything between the partitions is equal to Mid.
               {Left > Right;
                Right < Arr.Last;
                Left > Arr.First;
                (for all I in Arr.First .. Right : not Before(Mid, Arr[I]));
                (for all J in Left .. Arr.Last : not Before(Arr[J], Mid));
                (for all K in Right+1 .. Left-1 : 
                  not Before(Mid, Arr[K]) and not Before(Arr[K], Mid))}
               
               // Recurse on two halves
               block
                   Quicksort(Arr[Arr.First .. Right], Before);
                 ||
                   Quicksort(Arr[Left .. Arr.Last], Before);
               end block;
        end case;        
   end procedure Quicksort;
end class Sorting;
We have used slicing as part of the recursion. Because the compiler can prove Left > Right, there is no overlap between the two slices, and hence no race condition.  It turns out that in ParaSail we could have written this without explicit recursion by using the continue loop with ... construct. We will save that for a later posting.

Friday, December 3, 2010

Deterministic parallelism and aliasing in ParaSail

There is growing interest in parallelization approaches that provide deterministic results.  See for example Deterministic Parallel Java.  In ParaSail, we have a somewhat different model.  Determinism should be provided for computations when it makes sense, but when the program is interactive or part of a real-time system, then clearly the environment is dynamic and time-dependent, and the notion of determinism becomes less relevant, and arguably not even desirable. 

It turns out that it is easy to know when non-determinism might be entering the execution of a ParaSail program.  It occurs only as a result of operations on concurrent objects.  Because ParaSail eliminates all race-conditions associated with non-concurrent objects at compile time, the results of a ParaSail program that manipulates only non-concurrent objects are inherently deterministic.  However, should the programmer decide to use a concurrent object, then the order in which operations on the concurrent object are performed by concurrent threads is unspecified, though the programmer can control it to some extent by specifying dequeue conditions when the operand is marked as queued.  (See blog posting parasail-concurrent-interfaces.html for more information on queued operands.)

Minimizing use of concurrent objects is generally a good idea anyway for performance reasons, as there is inevitably more overhead involved in coordinating access to a shared object, even when using lock-free approaches.  But how can a group of threads work in parallel to build up a large data structure, without the data structure itself being a concurrent object?  The normal answer to such a question is divide and conquer

What does divide and conquer really mean in ParaSail terms?  That means that a data structure needs to be conceptually divided up into non-overlapping parts and passed out to multiple threads for manipulation.  In ParaSail, a function can return a reference to a part of an object passed by reference as an input to the function.  However, the compiler needs to understand the aliasing implications of that, since in ParaSail, the compiler disallows race-conditions, which implies it must understand all aliasing related to non-concurrent objects at compile-time.  If two potentially concurrent computations are both updating parts of the same non-concurrent object, then the compiler must be able to convince itself that the two parts are non-overlapping (i.e. not aliased with one another).

Because there are no (re)assignable pointers in ParaSail, there are no cycles in data structures, so it is clear that Obj.Left and Obj.Right do not overlap.  On the other hand, when dealing with generalized containers, it is more difficult to know whether Container[X] and Container[Y] overlap, and even more complex when we are calling arbitrary functions that happen to return a reference to part of their input parameter, such as Lower_Left(Matrix) and Upper_Right(Matrix).  Clearly the programmer needs some way in a postcondition of such a function to convey the aliasing properties of the result.

ParaSail has a built-in, primitive, Basic_Array interface.  The compiler has built-in knowledge that, given an object BA based on the Basic_Array interface, BA[X] does not alias with BA[Y] if X != Y.  So one way to describe the "part" of a container that is included within the result of calling a function like "Lower_Left" is to describe the set of underlying array indices associated with the elements of the container that can be referenced from the result of Lower_Left.  More generally we can see that there is a set of primitive subcomponents accessible via any reference to a part of a container, and the aliasing problem comes down to proving that the subcomponent set from one reference does not overlap with the subcomponent set from some other reference.  This all implies that ParaSail needs a notation for identifying the set of accessible subcomponents of a reference, and a built-in Subcomponent_Set interface for specifying the values and the relationships between these subcomponent sets.