Friday, December 17, 2010

Parallel Quicksort in ParaSail and "slicing"

Quicksort is a classic "divide-and-conquer" algorithm, and lends itself naturally to parallelization.  However, unlike some of the other algorithms we have discussed in past postings, it updates the data structure rather than merely walking it.  This means that we need a way to safely partition the array to be sorted so that the ParaSail compiler knows that the threads are manipulating non-overlapping parts of the array, so it can be certain there are no race conditions possible.

In several languages there is the notion of an array slice.  In particular, in Fortran and Ada there is built-in syntax for selecting a contiguous part of an array.  In newer versions of Fortran this can be done on multi-dimensional arrays.  In Ada slicing is limited to single-dimensional arrays.  In both Fortran and Ada, array slices are updatable, that is they effectively represent a view of the underlying array.  APL (not surprisingly!) has all of these capabilities and more.  In many other languages there are operations for selecting a slice of a string (and in some cases arrays), but these are generally not updatable views, and hence cannot be used on the left-hand side of an assignment nor be passed as an in out parameter.  In some cases there are explicit "replace-slice" operations which do allow for updating a slice of an array.

In any case, in ParaSail, because we expect the compiler to catch race-conditions at compile-time, operations that return a view of a part of an object need to have well-defined properties so the compiler can correctly determine whether two different views have any overlap (i.e. are potentially aliased).  The simplest solution seems to be to recognize a "slicing" operator, similar to the "indexing" operator.  A "slicing" operator would take an indexable container, and then one or more operands at least one of which is an interval or a set, and return a view of the container that represents the subset of components identified by the operand(s).  For example, this might be an interface for a One_Dim_Array with both "indexing" and "slicing" operators:
interface One_Dim_Array
  <Element_Type is Assignable<>;
   Index_Type is Discrete<>> is

    function Bounds(A : One_Dim_Array) -> Interval<Index_Type>; 

    function Create(Bounds : Interval<Index_Type>)
      -> One_Dim_Array {Bounds(Create) == Bounds};

    operator "indexing"(
       A : ref One_Dim_Array;
       Index : Index_Type {Index in Bounds(A)})
     -> ref Element_Type;

    operator "slicing"(
       A : ref One_Dim_Array;
       Slice : Interval<Index_Type> {Is_Subset(Slice, Bounds(A))})
     -> ref Result: One_Dim_Array {Bounds(Result) == Slice};
    ... 
end interface One_Dim_Array; 

As with the "indexing" operator, the "slicing" operator would be invoked by using A[...] notation, but where one or more of the operands inside the [] would be intervals or sets.  The ParaSail compiler would impose requirements on implementations of "indexing" and "slicing" operators.  For a given "indexing" operator, if the operands are different in two different calls, then the resulting elements must be distinct.  For a given "slicing" operator, if the operands are non-overlapping sets or intervals, then the resulting slices must be non-overlapping.   Note that the implementor of the "slicing" operator can decide whether the bounds of the result correspond to those of the selecting interval, or slide back to always starting at zero or one.  In the above example, the bounds correspond to the original indices of the selected elements, as indicated by the postcondition on the "slicing" operator.  In an interface designed for string manipulation, it might be more convenient for the slice to have bounds starting at zero or one.
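The requirement the compiler imposes on a "slicing" operator boils down to a simple check: if the selecting intervals are disjoint, the resulting views must be disjoint.  Here is a minimal sketch (in Python, standing in for the compiler-side reasoning) of that interval-overlap test, assuming closed intervals with inclusive bounds as in A[First .. Last]:

```python
# Sketch: the check that two slice views cannot alias reduces to testing
# whether their selecting index intervals overlap.  Bounds are inclusive.

def intervals_overlap(a, b):
    """Return True if closed intervals a=(lo, hi) and b=(lo, hi) share any index."""
    return a[0] <= b[1] and b[0] <= a[1]

# A[1 .. 5] and A[6 .. 10]: provably disjoint, safe to update in parallel.
assert not intervals_overlap((1, 5), (6, 10))
# A[1 .. 5] and A[5 .. 10] share index 5: potential race, must be rejected.
assert intervals_overlap((1, 5), (5, 10))
```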

Note that if there are multiple "indexing" operators, or multiple "slicing" operators for the same type, then they are presumed to be creating potentially overlapping elements or views, as they presumably represent different indexing schemes into the same data structure.  For example, a hash-table might be indexable both by the key and by some unique internal index.  That is, there is no guarantee that A["key"] and A[35] are unaliased just because "key" and 35 are distinct values, presuming these represent invocations of different "indexing" operators defined for the same indexable container.

Note that in all languages, you can achieve the equivalent of slicing by passing in the entire array and the bounds for the sub-array of interest.  This would, however, not allow the compiler to determine as easily whether two different calls are manipulating non-overlapping slices by looking only at the call sites.  A goal with ParaSail is that the compiler can identify any possible race-conditions by looking only at call sites and associated pre- and postconditions.  This argues for having explicit ways of indicating in a precondition or postcondition what parts of an object are read, what parts are updated, and what parts are not referenced at all.  But this probably comes back to being able to specify slices of a container, if only in annotations. 

In any case, given the above notion of "slicing" operators, we are now in a position to write our parallel Quicksort:
interface Sorting<One_Dim_Array<>> is
    procedure Quicksort(Arr : ref var One_Dim_Array;
       function Before(Left, Right : One_Dim_Array::Element_Type) 
         -> Boolean is "<");
          // Sort Arr according to the sorting function "Before" which returns
          // True if Left must appear before Right in the sorted order.
          // Before returns False if Left = Right.
end interface Sorting;

class Sorting is
  exports
    procedure Quicksort(Arr : ref var One_Dim_Array;
       function Before(Left, Right : One_Dim_Array::Element_Type) 
         -> Boolean is "<")
    is
        // Handle short arrays directly.  Partition longer arrays.
        case Length(Arr) of
          [0..1] => return;
          
          [2] => 
               if Before(Arr[Arr.Last], Arr[Arr.First]) then
                   // Swap elements
                   Arr[Arr.First] :=: Arr[Arr.Last];
               end if;
               return;
               
          [..] =>
               // Partition array
               const Mid := Arr[Arr.First + Length(Arr)/2];
               var Left : Index_Type := Arr.First;
               var Right : Index_Type := Arr.Last;
               until Left > Right loop
                 var New_Left : Index_Type := Right+1;
                 var New_Right : Index_Type := Left-1;
                 block
                   // Find item in left half to swap
                   for I in Left .. Right forward loop
                       if not Before(Arr[I], Mid) then
                            // Found an item that can go into right partition
                           New_Left := I;
                           if Before(Mid, Arr[I]) then
                               // Found an item that *must* go into right part
                               exit loop;
                           end if;
                       end if;
                   end loop;
                 ||  
                   // Find item in right half to swap
                   for J in Left .. Right reverse loop
                       if not Before(Mid, Arr[J]) then
                            // Found an item that can go into left partition
                           New_Right := J;
                           if Before(Arr[J], Mid) then
                               // Found an item that *must* go into left part
                               exit loop;
                           end if;
                       end if;
                   end loop;
                 end block;
                 
                 if New_Left > New_Right then
                     // Nothing more to swap
                     // Exit loop and recurse on two partitions
                     Left := New_Left;
                     Right := New_Right;
                     exit loop;
                 end if;
                 
                 // Swap items
                 Arr[New_Left] :=: Arr[New_Right];
                 
                 // continue looking for items to swap
                 Left := New_Left + 1;
                 Right := New_Right - 1;
               end loop;
               
               // At this point, "Right" is right end of left partition
               // and "Left" is left end of right partition
               // and the partitions don't overlap
               // and neither is the whole array
               // and everything in the left partition can precede Mid
               // and everything in the right partition can follow Mid
               // and everything between the partitions is equal to Mid.
               {Left > Right;
                Right < Arr.Last;
                Left > Arr.First;
                (for all I in Arr.First .. Right : not Before(Mid, Arr[I]));
                (for all J in Left .. Arr.Last : not Before(Arr[J], Mid));
                (for all K in Right+1 .. Left-1 : 
                  not Before(Mid, Arr[K]) and not Before(Arr[K], Mid))}
               
               // Recurse on two halves
               block
                   Quicksort(Arr[Arr.First .. Right], Before);
                 ||
                   Quicksort(Arr[Left .. Arr.Last], Before);
               end block;
        end case;        
   end procedure Quicksort;
end class Sorting;
We have used slicing as part of the recursion. Because the compiler can prove Left > Right, there is no overlap between the two slices, and hence no race condition.  It turns out that in ParaSail we could have written this without explicit recursion by using the continue loop with ... construct. We will save that for a later posting.
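For readers who want to experiment with the algorithm outside of ParaSail, here is a rough Python analogue of the Quicksort above.  Explicit threads stand in for ParaSail's implicit parallelism, and index bounds into a shared list stand in for the slice views; each recursive call works on a disjoint [first, last] region, which is exactly what the ParaSail compiler verifies via the "slicing" operator's postconditions:

```python
import operator
import threading

def quicksort(arr, first=0, last=None, before=operator.lt):
    """Sort arr[first..last] in place; `before` mirrors the ParaSail Before parameter."""
    if last is None:
        last = len(arr) - 1
    n = last - first + 1
    if n <= 1:
        return
    if n == 2:
        if before(arr[last], arr[first]):
            arr[first], arr[last] = arr[last], arr[first]   # swap, like :=:
        return
    mid = arr[first + n // 2]          # pivot value, as in the ParaSail code
    left, right = first, last
    while left <= right:
        while before(arr[left], mid):  # find item that can go into right partition
            left += 1
        while before(mid, arr[right]): # find item that can go into left partition
            right -= 1
        if left <= right:
            arr[left], arr[right] = arr[right], arr[left]
            left += 1
            right -= 1
    # arr[first..right] and arr[left..last] are now disjoint: recurse in parallel.
    t = threading.Thread(target=quicksort, args=(arr, first, right, before))
    t.start()
    quicksort(arr, left, last, before)
    t.join()

data = [5, 3, 8, 1, 9, 2, 7]
quicksort(data)
print(data)  # [1, 2, 3, 5, 7, 8, 9]
```

Of course, Python's global interpreter lock means the threads here add overhead rather than speedup; the point of the sketch is only the structure of the recursion on non-overlapping regions.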

Friday, December 3, 2010

Deterministic parallelism and aliasing in ParaSail

There is growing interest in parallelization approaches that provide deterministic results.  See for example Deterministic Parallel Java.  In ParaSail, we have a somewhat different model.  Determinism should be provided for computations when it makes sense, but when the program is interactive or part of a real-time system, then clearly the environment is dynamic and time-dependent, and the notion of determinism becomes less relevant, and arguably not even desirable. 

It turns out that it is easy to know when non-determinism might be entering the execution of a ParaSail program.  It occurs only as a result of operations on concurrent objects.  Because ParaSail eliminates all race-conditions associated with non-concurrent objects at compile time, the results of a ParaSail program that manipulates only non-concurrent objects are inherently deterministic.  However, should the programmer decide to use a concurrent object, then the order in which operations on the concurrent object are performed by concurrent threads is unspecified, though the programmer can control it to some extent by specifying dequeue conditions when the operand is marked as queued.  (See blog posting parasail-concurrent-interfaces.html for more information on queued operands.)

Minimizing use of concurrent objects is generally a good idea anyway for performance reasons, as there is inevitably more overhead involved in coordinating access to a shared object, even when using lock-free approaches.  But how can a group of threads work in parallel to build up a large data structure, without the data structure itself being a concurrent object?  The normal answer to such a question is divide and conquer.

What does divide and conquer really mean in ParaSail terms?  That means that a data structure needs to be conceptually divided up into non-overlapping parts and passed out to multiple threads for manipulation.  In ParaSail, a function can return a reference to a part of an object passed by reference as an input to the function.  However, the compiler needs to understand the aliasing implications of that, since in ParaSail, the compiler disallows race-conditions, which implies it must understand all aliasing related to non-concurrent objects at compile-time.  If two potentially concurrent computations are both updating parts of the same non-concurrent object, then the compiler must be able to convince itself that the two parts are non-overlapping (i.e. not aliased with one another).

Because there are no (re)assignable pointers in ParaSail, there are no cycles in data structures, so it is clear that Obj.Left and Obj.Right do not overlap.  On the other hand, when dealing with generalized containers, it is more difficult to know whether Container[X] and Container[Y] overlap, and even more complex when we are calling arbitrary functions that happen to return a reference to part of their input parameter, such as Lower_Left(Matrix) and Upper_Right(Matrix).  Clearly the programmer needs some way in a postcondition of such a function to convey the aliasing properties of the result.

ParaSail has a built-in, primitive, Basic_Array interface.  The compiler has built-in knowledge that, given an object BA based on the Basic_Array interface, BA[X] does not alias with BA[Y] if X != Y.  So one way to describe the "part" of a container that is included within the result of calling a function like "Lower_Left" is to describe the set of underlying array indices associated with the elements of the container that can be referenced from the result of Lower_Left.  More generally we can see that there is a set of primitive subcomponents accessible via any reference to a part of a container, and the aliasing problem comes down to proving that the subcomponent set from one reference does not overlap with the subcomponent set from some other reference.  This all implies that ParaSail needs a notation for identifying the set of accessible subcomponents of a reference, and a built-in Subcomponent_Set interface for specifying the values and the relationships between these subcomponent sets.
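To make the subcomponent-set idea concrete, here is a small sketch (Python, not ParaSail) that models each view of a container as the set of primitive Basic_Array indices it can reach, and treats two views as potentially aliased exactly when those sets intersect.  The names lower_left and upper_right are hypothetical stand-ins for functions like Lower_Left(Matrix) and Upper_Right(Matrix):

```python
# Model a reference's accessible subcomponents as a frozenset of indices
# into the underlying Basic_Array; the aliasing check is set disjointness.

def subcomponents(rows, cols, n=4):
    """Indices of a 2-D region flattened into an n x n Basic_Array."""
    return frozenset(r * n + c for r in rows for c in cols)

lower_left = subcomponents(rows=range(2, 4), cols=range(0, 2))
upper_right = subcomponents(rows=range(0, 2), cols=range(2, 4))

# Disjoint subcomponent sets: safe to update both regions in parallel.
assert lower_left.isdisjoint(upper_right)
```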

Monday, November 1, 2010

Type conversion in ParaSail

Allowing for user-defined type conversion is complex, as it is an NxN problem, where you want to allow conversion of N different types to each other, and each pair might involve a different set of operations. ParaSail addresses this problem in the following ways:
  • Allows use of the [[ ... ]] operation, which converts the operand to a universal type.  
    • The user defines which universal type(s) a given type converts to by defining the "to_univ" operator(s).  Once the value has been converted to a universal type, it will be implicitly converted to any other type which defines a "from_univ" operator from that same universal type.  
      • Using a prefix can disambiguate if necessary:  T2::[[T1_Obj]] will convert T1_Obj first to a universal type using T1's "to_univ" operator, and then to T2 using T2's "from_univ" operator.
  •  Allows use of the target type's name as a function name.
    • This will convert between two types that are structurally equivalent (that is, the same module parameters were used in their defining instantiation), but which were distinguished by one or both being defined as new.  For example:
      • type T1 is new Vector<Integer<>> and type T2 is new Vector<Integer<>> define two distinct types because of the use of new, but T2(T1_Obj) and T1(T2_Obj) will convert T1_Obj to type T2, and T2_Obj to type T1, respectively.
    • The target type's name can also convert an object from some source type if the user defines a "convert" operator for either of the two types, and this operator's input and output parameter types match the source and target of the conversion.  Note that this matching might be thanks to the input or output formal being itself parameterized.  For example, the interface Int_Vector below provides a "convert" operator that converts to any other instance of Int_Vector:
            interface Int_Vector<Element is Integer<>> is
               ...
               operator "convert"(X : Int_Vector) 
                 -> Result_Type is
                      Int_Vector<Target_Element is Integer<>>;
            end interface Int_Vector;
These capabilities seem to provide enough flexibility for the user to define the desired explicit conversion functions. Note that the only implicit conversion in ParaSail is from a universal type to a type with an appropriate "from_univ" operator.
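The [[ ... ]] chain can be modeled in a few lines of Python.  The Cents and Real types below are made up for illustration; each supplies a "to_univ" into a shared universal type (here, a plain float) and a "from_univ" back out of it, and the two-step conversion corresponds to T2::[[T1_Obj]]:

```python
# Toy model of ParaSail's universal-type conversion: source-to-universal
# via "to_univ", then universal-to-target via "from_univ".

class Cents:
    """Money stored as an integer count of cents; universal form is a real."""
    def __init__(self, cents):
        self.cents = cents
    def to_univ(self):
        return self.cents / 100.0           # -> universal real
    @classmethod
    def from_univ(cls, u):
        return cls(round(u * 100))          # universal real -> Cents

class Real:
    """Plain real number; its universal form is itself."""
    def __init__(self, v):
        self.v = v
    def to_univ(self):
        return self.v
    @classmethod
    def from_univ(cls, u):
        return cls(u)

def convert(obj, target):
    """The two-step chain behind T2::[[T1_Obj]]."""
    return target.from_univ(obj.to_univ())

assert convert(Cents(250), Real).v == 2.5
assert convert(Real(1.25), Cents).cents == 125
```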

A virtual machine for ParaSail with picothread scheduling

As we have worked toward having an initial compiler/interpreter for ParaSail, we have decided to define a ParaSail Virtual Machine (PSVM) which will support the kind of very-light-weight threading structure (picothreading) needed to be able to evaluate small constructs, like parameters in parallel.  We have decided to turn each ParaSail operation into a single PSVM Routine, even if its execution will involve multiple threads executing bits and pieces of the code for the operation. Each PSVM routine is identified by a unique index, the routine index, and is represented by a routine header and a sequence of PSVM instructions.

The ParaSail Virtual Machine instructions use relatively conventional addressing modes to refer to operands in memory. Memory is presumed to be organized into areas, each area being a contiguous sequence of 64-bit words. While executing, a PSVM routine has access to the following areas:
Parameter_Area
An area for input and output parameters. Output parameters, if any, come first, followed by input parameters. Parameters may be a single 64-bit value, or may be a 64-bit pointer to a larger object.
Local_Area
An area for storing local variables, and parameter lists being built up to pass as a Parameter_Area to a called routine. The first word of a local area is a link to the local area of the enclosing block or operation, in the presence of nesting. This sort of link is generally called a static link. The second word of a local area is a link to the associated parameter area.
Type_Area
Each type, which is an instance of a module, is represented by an area containing the actual module parameters, and a table of operations. Each operation in the operation table is represented by a pair: routine index and type area. In many cases the type area for an operation is the same as the enclosing Type_Area, but for inherited operations, the type area for the operation would refer to the type area for the super-class from which the operation was inherited. The actual module parameters are most frequently other types (represented by a pointer to their type area), but can be values, object references, or operations.
Instructions identify a memory location of interest by a pair: memory area and offset. This pair is called an object locator. In addition to the Parameter_Area, Local_Area, and Type_Area, an object locator can specify a memory area pointed-to by a local variable, and the locator offset is the offset within the pointed-to area. The local pointers are called local base registers, and may reside in any of the first 64 words of the local area. Finally, in the presence of nesting, the chain of static links may be followed to find an enclosing local area or enclosing parameter area.
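A simplified sketch of object-locator resolution may help here.  In the Python below (static links omitted), a locator is either a (named area, offset) pair or a ("base", register, offset) triple, where the register is one of the first 64 words of the local area holding a pointer to another area:

```python
# Sketch of PSVM object locators: a memory area plus an offset, where the
# area may also be "the area pointed to by local base register R".

def load(locator, param_area, local_area, type_area):
    """locator is ('param'|'local'|'type', offset) or ('base', reg, offset)."""
    if locator[0] == 'param':
        return param_area[locator[1]]
    if locator[0] == 'local':
        return local_area[locator[1]]
    if locator[0] == 'type':
        return type_area[locator[1]]
    if locator[0] == 'base':
        _, reg, offset = locator
        assert reg < 64, "base registers live in the first 64 words of the local area"
        pointed_to_area = local_area[reg]   # a local word holding an area reference
        return pointed_to_area[offset]
    raise ValueError(locator)

heap_obj = [111, 222, 333]                  # some pointed-to area
locals_ = [heap_obj, 42] + [0] * 62
assert load(('local', 1), [], locals_, []) == 42
assert load(('base', 0, 2), [], locals_, []) == 333
```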

Here is a sampling of ParaSail Virtual Machine instructions:

Move_Word, Store_Int_Literal, Store_String_Literal, Store_Real_Literal, Jump, Conditional_Jump, Call, Return, Parallel_Call, Parallel_Block, Parallel_Wait.

Operands are generally identified with object locators, except for literal operands which are identified either by their actual value, or by an index into a table of literals.

Note that there are no instructions for doing normal arithmetic or logical operations. These are all implemented by calling routines. There are a set of built-in routines for doing the typical set of arithmetic and logical operations, on operands of various sizes.

The more interesting instructions are the Parallel_Call, Parallel_Block, and Parallel_Wait. Parallel_Call is essentially equivalent to Call, where the calling routine computes the input parameters and places them in a parameter area, and then calls the routine. The difference with a Parallel_Call is that the caller also identifies a picothread master and allocates a small area for a picothread control block (pTCB), and the instruction completes without waiting for the call to complete. Instead, the Parallel_Call instruction adds the pTCB to a queue waiting to be serviced by one of the virtual processors (vCPUs) executing the ParaSail program. When the caller thread has finished its own work and wants to wait for the Parallel_Call to complete, it uses the Parallel_Wait instruction, identifying the same picothread master as was specified in the Parallel_Call instruction. This suspends the calling thread until all of the parallel picothreads associated with the picothread master complete.

The Parallel_Block instruction is very similar to the Parallel_Call instruction, except that it identifies instructions that are part of the current routine, rather than calling a separate routine. The execution of these instructions is performed by a separate picothread, which has its own pTCB, and local area. The static link of the local area for the Parallel_Block picothread refers to the local area of the thread invoking the Parallel_Block instruction, allowing the nested picothread to use up-level references to reach the local variables of the enclosing picothread.

The Return instruction is used to complete processing of both a Parallel_Call and a Parallel_Block, and the Parallel_Wait is used to wait for either kind of parallel activity.

We recently completed a prototype implementation of the ParaSail Virtual Machine, including the picothread scheduling. We learned some interesting lessons along the way. Initially, a vCPU that was executing a picothread that performed a Parallel_Wait was itself suspended. That quickly exhausted the number of vCPUs, and led us to start dynamically creating new vCPUs. That caused the overall stack space to grow dramatically, since each vCPU needed its own heavy-weight threading context in the underlying operating system, along with a stack.

At this point, we concluded that a vCPU that executed a Parallel_Wait instruction should service the queue of waiting pTCBs if the picothread master it was waiting for was not yet complete. That significantly reduced the number of vCPUs needed. However, it still didn't seem to be as efficient as it could be. As originally implemented, the queue of waiting pTCBs was first-in, first-out (FIFO). However, after looking at various traces of execution, we realized that it was the last pTCB that was created which was always the first pTCB to be awaited. Hence, we concluded that the pTCB queue should be last-in, first-out (LIFO). That is, a vCPU should preferentially service the most recently queued pTCB when it had cycles to spare, since that would more likely be associated with a picothread master that is being awaited, and by servicing that pTCB first, it will reduce the number of pTCBs that were suspended in a Parallel_Wait instruction. After making this final change, even a heavily recursive algorithm throwing off lots of parallel picothreads was handled efficiently.

Tuesday, September 14, 2010

Notation for "intervals" in ParaSail

It is quite common in a precondition or postcondition to want to indicate that the value of a variable must be within some interval of values (aka "range" of values), such as 1..10, or 0..+inf.  When dealing with real values (e.g. floating point values) as opposed to integer values, it is often desirable to represent an open or half-open interval, where the boundary value is not included in the specified interval.  For example, to specify that X must be in the interval 0.0 .. 10.0, but not including zero itself, the notation "(0.0 .. 10.0]" is sometimes used, where "(" and ")" represent an open (exclusive) boundary, while "[" and "]" represent a closed (inclusive) boundary.

For ParaSail, because "()" and "[]" are already used in the syntax, we have adopted a different notation for open and half-open intervals:

  0.0 ..   10.0    closed interval
  0.0 <..  10.0    half-open on left
  0.0 ..<  10.0    half-open on right
  0.0 <..< 10.0    open on both sides

Hence, one can write in an annotation:

  { X in A <.. B }

as a short-hand for

  { A < X and then X <= B }

with the additional difference that X is only evaluated once (though that will rarely matter in an annotation). 

Like the other relational operators, these interval operators are defined automatically once the "=?" operator has been appropriately defined.
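The derivation from "=?" can be sketched as follows (Python, with a three-way `cmp` function playing the role of "=?" and boolean flags selecting the open or closed boundary on each side):

```python
# The four interval forms, built from a single three-way comparison.

def cmp(a, b):                      # "=?" analogue: -1, 0, or +1
    return (a > b) - (a < b)

def in_interval(x, lo, hi, open_lo=False, open_hi=False):
    """x in lo..hi / lo<..hi / lo..<hi / lo<..<hi, per the flags."""
    lo_ok = cmp(lo, x) < 0 if open_lo else cmp(lo, x) <= 0
    hi_ok = cmp(x, hi) < 0 if open_hi else cmp(x, hi) <= 0
    return lo_ok and hi_ok

assert in_interval(0.0, 0.0, 10.0)                      # 0.0 ..  10.0
assert not in_interval(0.0, 0.0, 10.0, open_lo=True)    # 0.0 <.. 10.0
assert in_interval(10.0, 0.0, 10.0, open_lo=True)
```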

Wednesday, September 1, 2010

Finalization (destructors) and multi-thread exits in ParaSail

Languages such as C++ and Ada that support both finalization (destructors) and exceptions generally face some degree of distributed overhead due to the complex interactions between these two features.  Since exceptions can arise virtually anywhere, if there is finalization to be performed (e.g. destructors to be called) on the way "out" of a scope, there is generally a need for some kind of per-thread cleanup list to be walked as part of propagating exceptions, or the need for numerous implicit exception handlers and a precise indication of exactly how much initialization has been performed (e.g. constructors that have been completed) at any given moment.

As indicated in an earlier post, we have tentatively decided to do without exceptions in ParaSail, but the exit ... with feature can result in potentially numerous scopes being exited in the various other threads which are all being terminated as a result of some thread initiating an exit from a compound statement such as a block or loop.  We have also suggested that ParaSail will support some notion of finalization, if an "end" operator is associated with the type of an object, such that on exiting the scope of the object, the "end" operator will be invoked automatically.  So the question is: must these "end" operator calls be performed on all of the local objects of the threads being terminated as a side effect of an exit by some parallel thread?  Our tentative answer is "no." 

To avoid race conditions between a thread being terminated and the code executing after an exit, we believe that we need to restrict a thread that might be prematurely terminated, so that it can update only objects local to the exitable construct.  The only code of an exitable construct that would be permitted to update objects declared outside the construct would be the code following an "exit ... with" or the code following "end ... with".

For example, here is the exitable construct we used in an earlier post:

const Result : optional Tree_Id;
for T => Root then T.Left || T.Right
  while T not null concurrent loop
    if T.Value == Desired_Value then
        // Found desired node, exit with its identifier
        exit loop with (Result => T.Id);
    end if;
end loop with (Result => null);

In this example, we might have many threads all executing inside the loop concurrently.  To avoid race conditions, we would not allow any of these threads to update objects declared outside the loop, because they might be terminated at any moment, and the update might be disrupted in the middle.  However, we would allow the code following the "exit loop with" to update Result, as well as the code following "end loop with." This is safe because only one of these is ever executed for a given execution of the loop, and once we begin executing such code it won't be disrupted by some other thread of the same loop initiating an exit.

Note that the code following an exit ... with or end ... with might be disrupted by a thread exiting some enclosing construct, but this code would not be allowed to update objects outside that enclosing construct, thereby avoiding the race condition.

Given this rule that code in a compound statement with multiple threads may not update objects declared outside the compound statement, if there is a chance that at least one of those threads might perform an exit ... with, we can simplify the finalization problem.  There is no need to invoke the "end" operator on an object if that operator cannot possibly affect objects that will survive the exit statement.

Thinking more specifically about the "end" operator, and the general lack of access to global variables within an operation, what exactly can an "end" operator do?  The answer is that an "end" operator cannot really do anything unless the object being finalized includes a reference of some sort to an object that will outlive the finalization.  We will talk more in a future posting about what if any restrictions exist on incorporating a reference to one object as part of another object, but for now let us presume that such objects can exist, but they cannot be overwritten by a whole-object assignment statement (i.e. they are not "assignable"). 

What would be the purpose of such an object with an embedded reference?  One use would be to perform a kind of "mark/release" style of resource allocation.  On entry to a scope, a "mark" object could be created with a reference to an enclosing resource manager of some sort.  Inside the scope, allocations of the resource would be mediated by the "mark" object, such that when exiting the scope, the "end" operator applied to the "mark" object could automatically release all of the resources allocated on behalf of objects local to the scope.
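The mark/release pattern can be sketched in a few lines of Python.  The Manager and Mark classes below are illustrative names, not ParaSail library types; the Mark holds a reference to an enclosing resource manager, remembers the allocation high-water mark at its creation, and its "end" operation releases everything allocated through it:

```python
class Manager:
    """A resource manager that hands out allocations and can roll back to a mark."""
    def __init__(self):
        self.allocated = []
    def allocate(self):
        self.allocated.append(object())
        return self.allocated[-1]
    def release_from(self, mark_point):
        del self.allocated[mark_point:]

class Mark:
    """Mediates allocation from a manager; "end" releases everything since creation."""
    def __init__(self, mgr):
        self.mgr = mgr
        self.point = len(mgr.allocated)   # remember the high-water mark
    def allocate(self):
        return self.mgr.allocate()
    def end(self):                        # the "end" operator
        self.mgr.release_from(self.point)

mgr = Manager()
outer = Mark(mgr)                         # Mark1-style outer mark
outer.allocate()
inner = Mark(mgr)                         # Mark2-style inner mark
inner.allocate()
inner.allocate()
# Releasing the *outer* mark also frees the inner mark's allocations,
# which is why the terminated threads' marks need no separate finalization.
outer.end()
assert mgr.allocated == []
```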

Now returning to our original question about whether finalization needs to be performed on all objects local to a thread that is being prematurely terminated -- if we presume that the "end" operators are performing something analogous to a "release" operation, then we can see how we could safely skip all intermediary release operations so long as we perform the ones associated with the actual compound statement being exited.  This also presumes that the only references permitted from an object local to a multi-threaded compound statement with an exit, to an object declared outside the compound statement, are references to concurrent objects that are themselves local to the innermost enclosing multi-threaded-with-exit compound statement.  For example:

var Enclosing_Obj1 : concurrent T := ...
    ...
*Block1*
block
    var Mark1 := Create_Mark(Enclosing_Obj1, ...);
        // Mark1 contains a reference to Enclosing_Obj1
        // and has an "end" operator which performs a release operation.
    for I in 1..10 concurrent loop
          var Mark2 := Create_Mark(Mark1, ...);
              // Here we have an inner mark
          Allocate_Resource(Mark2, ...);
             // Here we allocate some resource using Mark2 to mediate the
             // allocation from Mark1 which in turn is mediating allocation
             // from Enclosing_Obj1.  The marks allow the resources to be 
             // automatically released on block exit as a side effect of
             // finalization via the "end" operator.
          if This_Looks_Good(I, ...) then
               exit block Block1 with (Result => I);
                 // This terminates any other threads inside Block1.
          end if;
    end loop;
end block Block1 with (Result => 0);

Now if some thread is deep within a call on "This_Looks_Good" when some other thread initiates the block exit, the only object that gets explicitly finalized will be Mark1.  The multiple Mark2 objects (one for each thread of the concurrent loop) will not be finalized, as presumably performing an "end" on Mark1 will also release any allocations that were mediated by one of the Mark2 objects.

The bottom line is that when a tree of threads is terminated by an exit from a multi-threaded compound statement, the only finalization that needs to be performed is for the objects immediately local to the compound statement being exited.  Objects local to the various threads being terminated need not be separately finalized.  This avoids the kind of complex interaction that exists between finalization and exception propagation, including avoiding the overhead of maintaining precise initialization information, and avoiding performing a lot of potentially wasteful finalization.

Tuesday, August 24, 2010

No exceptions in ParaSail, but exitable multi-thread constructs

We have been mulling over the idea of exceptions in ParaSail, and have pretty firmly concluded that they aren't worth the trouble.  In a highly parallel language, with lots of threads, exception propagation across threads becomes a significant issue, and that is a nasty area in general.  Also, exceptions can introduce many additional paths into a program, making thorough testing that much harder.  And the whole business of declaring what exceptions might be propagated, and then deciding what to do if some other exception is propagated can create numerous maintenance headaches.

There is a feature in ParaSail as currently designed which provides some of the same capabilities as exceptions, but is particularly suited to parallel programming.  This is the "exit with" statement, which allows a construct to be exited with one or more values specified as results, terminating at the same time any other threads currently executing within the construct.  For example, here is a loop implementing a parallel search of a tree, where the first thread to find the desired node exits, killing off all of the other threads as part of the "exit ... with" statement:

const Result : optional Tree_Id;
for T => Root then T.Left || T.Right
  while T not null concurrent loop
    if T.Value == Desired_Value then
        // Found desired node, exit with its identifier
        exit loop with (Result => T.Id);
    end if;
end loop with (Result => null);

This declares a Result object of type Tree_Id.  It then walks the tree in parallel, starting at Root and continuing with T.Left and T.Right concurrently.  It continues until it reaches "null" on every branch, or some node is found whose Value component matches the Desired_Value.  The value of Result at the end is the identifier of the node having the desired Value, or null to indicate that no node was found.  The presence of optional in the declaration of Result indicates that its value might be null.

Supporting this kind of intentional "race" seems important in parallel programming, as many problems are amenable to a divide and conquer approach, but it is important that as soon as a solution is found, no further time is wasted searching other parts of the solution space.  The "end ... with" phrase allows the specification of one or more results if the construct ends normally, as opposed to via an "exit ... with" (in this case, ending normally means all threads reach a null branch in the walk of the tree without finding the desired node).  Effectively the "exit ... with" skips over the "end ... with" phrase.
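For readers more familiar with mainstream threading libraries, here is a rough Python analogue of the concurrent tree walk above.  Python threads cannot be killed outright, so a threading.Event provides the cooperative equivalent of "exit loop with": once set, every other walker gives up.  Node and find_first are invented names for illustration:

```python
import threading
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    ident: int
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def find_first(root, desired):
    result = [None]               # plays the role of Result
    found = threading.Event()     # set => stop all other walkers

    def walk(node):
        if node is None or found.is_set():
            return                # a null branch, or someone already won
        if node.value == desired:
            result[0] = node.ident
            found.set()           # first finder wins the intentional race
            return
        left = threading.Thread(target=walk, args=(node.left,))
        left.start()              # walk node.left and node.right concurrently
        walk(node.right)
        left.join()

    walk(root)
    return result[0]              # None corresponds to "end loop with null"
```

The ParaSail version is both terser and stronger: the "exit ... with" actually terminates the sibling threads, rather than relying on each one to poll a flag.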

So how does this all relate to exceptions?  Well, given the "exit ... with" capability, one can establish two or more threads: one that monitors for a failure condition, and others that do the needed computation.  The monitoring thread performs an "exit ... with" if it detects a failure, with the result indicating the nature of the failure, killing off any remaining computation threads as a side effect.  If the normal computation succeeds first, then an "exit ... with" giving the final result will kill off the monitoring thread.  Note that "exit ... with" statements must occur textually within the construct being exited, so it is visible whether such a premature exit can occur, unlike an exception, which can arise deep within a call tree and be propagated out many levels.

As an example of the kind of failure condition amenable to this kind of monitoring, imagine a resource manager object which provides up to some fixed maximum of some kind of resource (e.g. storage) to code within a block.  This resource manager (presumably of a concurrent type) could be passed down to operations called within the block for their use.  Meanwhile, a separate monitoring thread created immediately within the block would call an operation on the resource manager, call it Wait_For_Exhaustion, which suspends the calling thread until the resource runs out, at which point the thread is awakened with an appropriate indication of the resource exhaustion and any other information that might be helpful in later diagnosis.  On return from Wait_For_Exhaustion, the monitoring thread would do an "exit block with (Result => Failure, ...)" or equivalent, to indicate that the computation required more resources than were provided.  The code following the block would then be able to take appropriate action.
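This monitor pattern can be sketched in Python under the same cooperative-cancellation caveat as before.  ResourceManager, wait_for_exhaustion, and run_block are all invented names standing in for the concurrent resource manager, the blocking Wait_For_Exhaustion call, and the enclosing block; the stop event models one thread killing off the other via "exit ... with":

```python
import threading

class ResourceManager:
    """Hands out a fixed budget of some resource to code within a block."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.cond = threading.Condition()

    def allocate(self, amount):
        with self.cond:
            self.used += amount
            if self.used > self.capacity:
                self.cond.notify_all()      # wake the monitoring thread

    def wait_for_exhaustion(self, stop):
        # Block until the budget is exceeded, or until `stop` is set
        # because the computation finished first.
        with self.cond:
            while self.used <= self.capacity and not stop.is_set():
                self.cond.wait(timeout=0.05)
            return self.used > self.capacity

def run_block(mgr, computation):
    outcome = {}
    stop = threading.Event()

    def monitor():
        if mgr.wait_for_exhaustion(stop):
            outcome.setdefault("result", "Failure")  # exit block with Failure
            stop.set()            # kill off the computation threads

    t = threading.Thread(target=monitor)
    t.start()
    computation(mgr, stop)
    outcome.setdefault("result", "Success")          # normal exit
    stop.set()                    # ... which kills off the monitoring thread
    t.join()
    return outcome["result"]
```

A computation that allocates past the budget yields "Failure" via the monitor; one that stays within budget completes normally and yields "Success".  Whichever side finishes first determines the block's result, exactly the two-thread race described above.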