Wednesday, September 1, 2010

Finalization (destructors) and multi-thread exits in ParaSail

Languages such as C++ and Ada that support both finalization (destructors) and exceptions generally face some degree of distributed overhead due to the complex interactions between these two features.  Since exceptions can arise virtually anywhere, if there is finalization to be performed (e.g. destructors to be called) on the way "out" of a scope, there is generally a need for some kind of per-thread cleanup list to be walked as part of propagating exceptions, or the need for numerous implicit exception handlers and a precise indication of exactly how much initialization has been performed (e.g. constructors that have been completed) at any given moment.

As indicated in an earlier post, we have tentatively decided to do without exceptions in ParaSail, but the exit ... with feature can result in potentially numerous scopes being exited in the various other threads which are all being terminated as a result of some thread initiating an exit from a compound statement such as a block or loop.  We have also suggested that ParaSail will support some notion of finalization, if an "end" operator is associated with the type of an object, such that on exiting the scope of the object, the "end" operator will be invoked automatically.  So the question is: must these "end" operator calls be performed on all of the local objects of the threads being terminated as a side effect of an exit by some parallel thread?  Our tentative answer is "no." 

To avoid race conditions between a thread being terminated and the code executing after an exit, we believe that we need to restrict a thread that might be prematurely terminated, so that it can update only objects local to the exitable construct.  The only code of an exitable construct that would be permitted to update objects declared outside the construct would be the code following an "exit ... with" or the code following "end ... with".

For example, here is the exitable construct we used in an earlier post:

const Result : optional Tree_Id;
for T => Root then T.Left || T.Right
  while T not null concurrent loop
    if T.Value == Desired_Value then
        // Found desired node, exit with its identifier
        exit loop with (Result => T.Id);
    end if;
end loop with (Result => null);

In this example, we might have many threads all executing inside the loop concurrently.  To avoid race conditions, we would not allow any of these threads to update objects declared outside the loop, because they might be terminated at any moment, and the update might be disrupted in the middle.  However, we would allow the code following the "exit loop with" to update Result, as well as the code following "end loop with." This is safe because only one of these is ever executed for a given execution of the loop, and once we begin executing such code it won't be disrupted by some other thread of the same loop initiating an exit.

Note that the code following an exit ... with or end ... with might be disrupted by a thread exiting some enclosing construct, but this code would not be allowed to update objects outside that enclosing construct, thereby avoiding the race condition.

Given this rule that code in a compound statement with multiple threads may not update objects declared outside the compound statement, if there is a chance that at least one of those threads might perform an exit ... with, we can simplify the finalization problem.  There is no need to invoke the "end" operator on an object if that operator cannot possibly affect objects that will survive the exit statement.

Thinking more specifically about the "end" operator, and the general lack of access to global variables within an operation, what exactly can an "end" operator do?  The answer is that an "end" operator cannot really do anything unless the object being finalized includes a reference of some sort to an object that will outlive the finalization.  We will talk more in a future posting about what if any restrictions exist on incorporating a reference to one object as part of another object, but for now let us presume that such objects can exist, but they cannot be overwritten by a whole-object assignment statement (i.e. they are not "assignable"). 

What would be the purpose of such an object with an embedded reference?  One use would be to perform a kind of "mark/release" style of resource allocation.  On entry to a scope, a "mark" object could be created with a reference to an enclosing resource manager of some sort.  Inside the scope, allocations of the resource would be mediated by the "mark" object, such that when exiting the scope, the "end" operator applied to the "mark" object could automatically release all of the resources allocated on behalf of objects local to the scope.

Now returning to our original question about whether finalization needs to be performed on all objects local to a thread that is being prematurely terminated -- if we presume that the "end" operators are performing something analogous to a "release" operation, then we can see how we could safely skip all intermediary release operations so long as we perform the ones associated with the actual compound statement being exited.  This also presumes that the only references permitted from an object local to a multi-threaded compound statement with an exit, to an object declared outside the compound statement, are references to concurrent objects that are themselves local to the innermost enclosing multi-threaded-with-exit compound statement.  For example:

var Enclosing_Obj1 : concurrent T := ...
    var Mark1 := Create_Mark(Enclosing_Obj1, ...);
        // Mark1 contains a reference to Enclosing_Obj1
        // and has an "end" operator which performs a release operation.
    for I in 1..10 concurrent loop
          var Mark2 := Create_Mark(Mark1, ...);
              // Here we have an inner mark
          Allocate_Resource(Mark2, ...);
             // Here we allocate some resource using Mark2 to mediate the
             // allocation from Mark1 which in turn is mediating allocation
             // from Enclosing_Obj1.  The marks allow the resources to be 
             // automatically released on block exit as a side effect of
             // finalization via the "end" operator.
          if This_Looks_Good(I, ...) then
               exit block Block1 with (Result => I);
                 // This terminates any other threads inside Block1.
          end if;
    end loop;
end block Block1 with (Result => 0);

Now if some thread is deep within a call on "This_Looks_Good" when some other thread initiates the block exit, the only object that gets explicitly finalized will be Mark1.  The multiple Mark2 objects (one for each thread of the concurrent loop) will not be finalized, as presumably performing an "end" on Mark1 will also release any allocations that were mediated by one of the Mark2 objects.

The bottom line is that when a tree of threads is terminated by an exit from a multi-threaded compound statement, the only finalization that needs to be performed is for the objects immediately local to the compound statement being exited.  Objects local to the various threads being terminated need not be separately finalized.  This avoids the kind of complex interaction that exists between finalization and exception propagation, including avoiding the overhead of maintaining precise initialization information, and avoiding performing a lot of potentially wasteful finalization.


  1. Does this guarantee that finalization is done only once? And can a finalizer resurrect an object, e.g. by putting it into a Ref?

    1. Finalization is done at most once. You cannot put an object into a "ref" that outlives the object, since "ref"s are not re-assignable -- refs are initialized when created and necessarily refer to objects that outlive the reference.