Saturday, July 30, 2011

A load-and-null-out operation

As we have continued to write more programs in ParaSail, we have found the pointer-free expandable-object data structuring primitives very interesting to use.  A simple tree structure can be defined as:

interface Binary_Tree<Element_Type is Assignable<>> is
    var Left, Right : optional Binary_Tree;
    var Contents : Element_Type;
end interface Binary_Tree;

In most cases you would keep this structure private by declaring it inside the class defining the Binary_Tree module rather than declaring it in the externally-visible interface, but this illustrates the basic structuring technique, using optional to allow recursive structures to be created. 

Because there are no pointers, only sub-objects, essentially any arbitrarily complex data structure can be copied using a simple assignment:
  var Copy_Of_Tree : optional Binary_Tree := Original_Tree;

This will create a copy of Original_Tree, no matter how complex it is.  Similarly, we can expand a tree by assigning a new tree into its Left or Right subtree:
  Original_Tree.Left := Copy_Of_Tree;

So what did the above two assignments do?  Well they made a complete copy of the Original_Tree, and then assigned that copy into the left subtree of the Original_Tree, wiping out whatever was in the left subtree before.  But we now have a complete copy of what was in the Original_Tree.Left before in Original_Tree.Left.Left.  And Original_Tree.Left.Right is now identical in structure to Original_Tree.Right.  We could have done this without explicitly declaring the intermediate "Copy_Of_Tree" object, with the same effect:
  Original_Tree.Left := Original_Tree; 
I'm not sure whether what we just did was of any use, but it does give you a feel for the power of expandable objects.  The user doesn't have to write explicitly any deep copy operation, since from ParaSail's point of view, these are just subobjects, and ":=" assigns the whole thing, including all of the subobjects. 

The old contents of the left-hand side of an assignment are wiped out, and automatically reclaimed immediately -- no need for the user to write a deep reclaim operation.  We don't need to wait for a garbage collector to decide whether it is safe to reclaim the storage associated with the old contents, since there is no sharing between objects in ParaSail.  If we overwrite the contents of an object or part of an object, whatever was there before can be reclaimed without danger of creating any dangling references.

Here is another example:
  Original_Tree := Original_Tree.Left;
This wipes out the old contents of Original _Tree and replaces it with a copy of its left subtree.  But there seems to be some unnecessary copying going on here.  The underlying sequence is:
  • Make a copy of Original_Tree.Left
  • Reclaim the old contents of Original_Tree
  • Overwrite it with the copy of Original_Tree.Left
This makes a complete copy of Original_Tree.Left, and then proceeds to reclaim all of Original_Tree (including the left subtree), and then stores the copy of the left subtree back into Original_Tree.  But wouldn't it make more sense to just carve off the left subtree, leaving behind a stub (i.e. null), then reclaim Original_Tree which now has a null left subtree, and then set Original_Tree to be the carved-off left subtree?  In fact, if the compiler were somewhat smarter, that is exactly what it would do, and should be expected to do, since it knows that Original_Tree.Left is about to be destroyed by the assignment to Original_Tree, so saving its content and replacing it with null rather than making a complete copy of it would be equivalent and more efficient.  Be that as it may, it is conceivable that there are times when the compiler can't see what is obvious to us, or perhaps the compiler is still just a prototype and isn't quite as smart as it will be eventually (;-).  In that case, we might want to be able to be explicit in cases where we want to fetch the value of an object or a part of an object, and after we fetch the value we want to set the object to null, or perhaps to some other appropriate value.

Given this situation, which seems like it might come up fairly frequently when making structural modifications to a data structure (such as rebalancing the tree), or destructively removing elements from a data structure (e.g. "Remove_First(Queue)"), we believe it might be useful to have a notion of load-and-null-out.  That is, we want the value of Original.Left_Tree, but after fetching the value, we are "done" with it, so set it to null.  This means we may not need to make a copy of the subobject, since it has been carved out of its old enclosing object, with a null left behind.

We considered various notations for representing load-and-null-out, and came up with the notion of a deferred assignment, that is an assignment where before we overwrite the left-hand-side, we fetch its old value and return it.  This is reminiscent of the compare-and-swap operation, or fetch-and-add, or the post increment X++ of C (and PDP-11/Vax-11 fame, for those familiar with those older DEC machines).  Here is a tentative notation for a deferred assignment:

   Original_Tree := (Original_Tree.Left ::= null); 

What the "::= null" operation means is to fetch the value of Original_Tree.Left, and after doing so, assign it to null.  One could imagine generalizing this extra ':' notation to support post-increment:

X := A[(I :+= 1)]; 

This would mean fetch the value of I, and after doing so, apply the "+= 1" operation to it.  It is not obvious whether this generalization is a good idea, or just cute.  But the deferred assignment of null seems like an operation that will be quite useful in the pointer-free world of ParaSail, to provide more control when restructuring a complex object.

4 comments:

  1. On further reflection, this seems like overkill, since probably 90% of the time the use will be to load a value and leave a "null" behind. Hence, I think we are going to go for a "move" operation, probably "<==" which means move the value from the RHS to the LHS, leaving the RHS null. For example, to move the tail of a list to be its head:
      List <== List.Next;
    To move the right subtree to be the left subtree:
      Tree.Left <== Tree.Right;
    If we adopt "<==" to mean "move" then it probably makes sense to change the swap operation from ":=:" to "<=>", since the latter is easier to notice, and is more like a "move" than an "assignment" (since assignment generally involves copying). For example, to switch left and right subtrees:
      Tree.Left <=> Tree.Right;
    Or to simply swap X and Y:
      X <=> Y;
    This seems to make the swap operation more readily recognizable than the subtler ":=:".
    -Tuck

    ReplyDelete
  2. (my first time seeing the language, so it might be inconsistent with other things or have beed discussed in the past).

    another option to overcome the "get-and-null" thing, is to have assignment statements return the ORIGINAL value of the assigned variable.

    reasoning:
    since the old data is no longer needed, there is no need to copy the data again, just to return a reference to it.
    since we don't use the new value any more than needed to the original assignment, we don't need to copy it either.
    because the original value is mostly unwanted, we need to destroy it anyway.

    the use on the tree:
    mytree:=(mytree.left:=NULL)

    potential problems:
    * it's inconsistent with C/C++ way (where assignments return the new value)
    * it forces the compiler work with references

    ReplyDelete
  3. bestdnd wrote:

    another option to overcome the "get-and-null" thing, is to have assignment statements return the ORIGINAL value of the assigned variable.

    That is essentially what the "::=" deferred assignment was doing. However in trying to write some programs using this, I found it hard to read and not very intuitive. Switching to the idea of a "move" operation "<==" seems more natural, and reads pretty well.

    ReplyDelete
  4. (me again)

    yet another option is :.=
    this way, the implementation of the above example is:
    mytree :.= left
    since the problem is only about sub-objects, having the dot is very consistent with :+= (although it only goes one level at a time.

    ReplyDelete