Designing ParaSail, a new programming language: July 2011

Saturday, July 30, 2011

A load-and-null-out operation

As we have continued to write more programs in ParaSail, we have found the pointer-free expandable-object data structuring primitives very interesting to use. A simple tree structure can be defined as:

interface Binary_Tree<Element_Type is Assignable<>> is
    var Left, Right : optional Binary_Tree;
    var Contents : Element_Type;
end interface Binary_Tree;

In most cases you would keep this structure private by declaring it inside the class defining the Binary_Tree module rather than declaring it in the externally-visible interface, but this illustrates the basic structuring technique, using optional to allow recursive structures to be created.

Because there are no pointers, only sub-objects, essentially any arbitrarily complex data structure can be copied using a simple assignment:

  var Copy_Of_Tree : optional Binary_Tree := Original_Tree;

This will create a copy of Original_Tree, no matter how complex it is. Similarly, we can expand a tree by assigning a new tree into its Left or Right subtree:

  Original_Tree.Left := Copy_Of_Tree;

So what did the above two assignments do? Well they made a complete copy of the Original_Tree, and then assigned that copy into the left subtree of the Original_Tree, wiping out whatever was in the left subtree before. But we now have a complete copy of what was in the Original_Tree.Left before in Original_Tree.Left.Left. And Original_Tree.Left.Right is now identical in structure to Original_Tree.Right. We could have done this without explicitly declaring the intermediate "Copy_Of_Tree" object, with the same effect:

  Original_Tree.Left := Original_Tree;

I'm not sure whether what we just did was of any use, but it does give you a feel for the power of expandable objects. The user doesn't have to write explicitly any deep copy operation, since from ParaSail's point of view, these are just subobjects, and ":=" assigns the whole thing, including all of the subobjects.

The old contents of the left-hand side of an assignment are wiped out, and automatically reclaimed immediately -- no need for the user to write a deep reclaim operation. We don't need to wait for a garbage collector to decide whether it is safe to reclaim the storage associated with the old contents, since there is no sharing between objects in ParaSail. If we overwrite the contents of an object or part of an object, whatever was there before can be reclaimed without danger of creating any dangling references.

Here is another example:

  Original_Tree := Original_Tree.Left;

This wipes out the old contents of Original _Tree and replaces it with a copy of its left subtree. But there seems to be some unnecessary copying going on here. The underlying sequence is:

Make a copy of Original_Tree.Left
Reclaim the old contents of Original_Tree
Overwrite it with the copy of Original_Tree.Left

This makes a complete copy of Original_Tree.Left, and then proceeds to reclaim all of Original_Tree (including the left subtree), and then stores the copy of the left subtree back into Original_Tree. But wouldn't it make more sense to just carve off the left subtree, leaving behind a stub (i.e. null), then reclaim Original_Tree which now has a null left subtree, and then set Original_Tree to be the carved-off left subtree? In fact, if the compiler were somewhat smarter, that is exactly what it would do, and should be expected to do, since it knows that Original_Tree.Left is about to be destroyed by the assignment to Original_Tree, so saving its content and replacing it with null rather than making a complete copy of it would be equivalent and more efficient. Be that as it may, it is conceivable that there are times when the compiler can't see what is obvious to us, or perhaps the compiler is still just a prototype and isn't quite as smart as it will be eventually (;-). In that case, we might want to be able to be explicit in cases where we want to fetch the value of an object or a part of an object, and after we fetch the value we want to set the object to null, or perhaps to some other appropriate value.

Given this situation, which seems like it might come up fairly frequently when making structural modifications to a data structure (such as rebalancing the tree), or destructively removing elements from a data structure (e.g. "Remove_First(Queue)"), we believe it might be useful to have a notion of load-and-null-out. That is, we want the value of Original.Left_Tree, but after fetching the value, we are "done" with it, so set it to null. This means we may not need to make a copy of the subobject, since it has been carved out of its old enclosing object, with a null left behind.

We considered various notations for representing load-and-null-out, and came up with the notion of a deferred assignment, that is an assignment where before we overwrite the left-hand-side, we fetch its old value and return it. This is reminiscent of the compare-and-swap operation, or fetch-and-add, or the post increment X++ of C (and PDP-11/Vax-11 fame, for those familiar with those older DEC machines). Here is a tentative notation for a deferred assignment:

   Original_Tree := (Original_Tree.Left ::= null);

What the "::= null" operation means is to fetch the value of Original_Tree.Left, and after doing so, assign it to null. One could imagine generalizing this extra ':' notation to support post-increment:

X := A[(I :+= 1)];

This would mean fetch the value of I, and after doing so, apply the "+= 1" operation to it. It is not obvious whether this generalization is a good idea, or just cute. But the deferred assignment of null seems like an operation that will be quite useful in the pointer-free world of ParaSail, to provide more control when restructuring a complex object.

Monday, July 25, 2011

Adapting one interface to implement another

We bumped into an interesting problem in ParaSail recently. We have an abstract module Countable which has the following interface:

abstract interface Countable<> is
    op "+"(Left : Countable; Right : Univ_Integer) -> Countable;
    op "+"(Left : Univ_Integer; Right : Countable) -> Countable;
    op "-"(Left : Countable; Right : Univ_Integer) -> Countable;
    op "-"(Left, Right : Countable) -> Univ_Integer;
    op "=?"(Left, Right : Countable) -> Ordering;
end interface Countable;

This interface is used when defining a "countable interval" such as "X .. Y" where you want to be able to iterate from X up to Y, even though X and Y are not themselves of an integer type. For example, if X and Y are characters, it makes sense to define an interval such as 'a' .. 'z' and there is a desire to be able to go from 'a' to the next character in the interval, so by using the "+" operator from Countable that is easy to do. Just add 1. Similarly, we can iterate through the interval in reverse by subtracting 1. Or we can find out how big is the interval X..Y by computing (Y - X) + 1, where we are using the second "-" operator.

Other examples of "countable" types are representations of time, where you can add and subtract some number of clock ticks, or compute the number of clock ticks between two times, even if the "time" type itself isn't necessarily represented as a single integer. And in general any ordered enumeration type (such as Enum<[#monday,#tuesday,#wednesday, ...]>) might want to be considered Countable, so that intervals can be defined and manipulated.

A Countable_Set abstraction would be useful as a way to represent sets of Countable elements, while still efficiently representing contiguous ranges of values such as "1..1000" as part of the set. For example:

interface Countable_Set<Element_Type is Countable<>> is
    op ".."(Left, Right : Element_Type) -> Countable_Set;
    op "|"(Left, Right : Countable_Set) -> Countable_Set;
    op "in"(Left : Element_Type; Right : Countable_Set) -> Boolean;
    func Count(CS : Countable_Set) -> Univ_Integer;
    ...
end interface Countable_Set;

Implementing the Count function clearly will need to make use of the "+" or "-" Countable operations. But this now brings us to the question, is Integer itself a Countable type? Can we legally write Countable_Set<Integer>? Well if we look at the binary "+" and "-" operators for Integer we see:

interface Integer<...> is
    op "+"(Left, Right : Integer) -> Integer;
    op "-"(Left, Right : Integer) -> Integer;
    ...
end interface Integer;

These don't actually match the operators expected for a Countable type. And if we were to simply add the missing operators, we would create significant ambiguity, since "X + 1" could either resolve to the existing Integer,Integer->Integer "+" or to the added Countable one, Integer,Univ_Integer->Integer "+". Not ideal.

More generally, we can see a situation where a useful abstraction, such as a Countable_Set, expects a type parameter such as Countable<>, and the type we have is close but is missing some of the required operations, and we don't really want to add the missing operations for general use because of ambiguity, or they don't quite do the right thing, or whatever. Ideally we would want to make them available for implementing a given interface or interfaces, but not for general use.

Some languages provide mechanisms for addressing problems like these. In Eiffel, as part of inheriting from another class the derived class can rename some of the operations. With that approach, on inheriting these Countable operations we would rename them to something like "Countable_Plus" and "Countable_Minus" and then define them to do whatever we want. In C++ it is possible to disambiguate particular operations by specifying from which base type they are inherited.

In ParaSail we are pursuing somewhat of a different approach. Essentially we are allowing a module to publicly declare that it is implementing various operations that might be needed to be considered "Countable" or "Hashable" or whatever, while not making them available for "general" use. Here is how the ParaSail Integer module solves the Countable conundrum:

interface Integer<...> is
    op "+"(Left, Right : Integer) -> Integer;
    op "-"(Left, Right : Integer) -> Integer;
    ...
  implements
      for Countable
    op "+"(Left : Integer; Right : Univ_Integer) -> Integer;
    op "+"(Left : Univ_Integer; Right : Integer) -> Integer;
    op "-"(Left : Integer; Right : Univ_Integer) -> Integer;
    op "-"(Left, Right : Integer) -> Univ_Integer;
end interface Integer;

You can also omit the "for Countable" part and then the operations are available to implement any other module's interface. But the key thing is that you cannot call these operations directly on an Integer type. You can only invoke them when "viewing" an Integer as a "Countable" object. You can have multiple sections after "implements" each section starting with "for Interface1, Interface2, ..." if there are several different sets of operations that need to be implemented to satisfy the needs for different sets of interfaces. In general situations like this are expected to be rare, but when you bump into one, it is useful to have a way to avoid the ambiguity and still satisfy the needs of the other interface.

Saturday, July 23, 2011

Another slight shift in syntax

As we use ParaSail more, we are making some additional small changes in syntax. The first one has to do with allowing a component to be declared as a reference to another object. The ability to return a short-lived object reference is important for user-defined indexing. We also imagine that certain data structures will want to incorporate references as components. Such data structures will not allow assignment, but nevertheless they can be useful for iterating over some other data structure. For example, one might have a complex tree structure, and then an iterator over the structure which contains a reference to the current item in the tree, and references to the enclosing nodes in the tree, so that given the iterator one can generate a reference to the next element of the tree (presuming some ordering such as depth-first left-to-right).

Currently, object references are recognized by using "=>" rather than ":=" for defining their initial value. However, that is somewhat subtle, and for a component, the initial value can't be specified when the component is declared. Having already used the term "ref" for parameters that act as references, we have chosen to change the syntax for other objects that are references by explicitly using the term "ref" for them as well. For example:

    ref X => M[I];
    ref const R => T.Right;
    ref var L => T.Left;

A "ref" by itself means that the reference is a read-write reference only if the object to which it refers is a variable. Explicitly using "ref const" makes the reference read-only even if the object to which it refers is a variable. Explicitly using "ref var" makes the reference read-write, and requires the object to which it refers to be a variable. So the first example above creates a reference to the Ith element of M, providing read-write access only if M is a variable, the second creates a read-only reference to the Right component of T, and the third creates a read-write reference to the Left component of T, and requires that T itself is a variable.

Having adopted the use of "ref", we can now leave off the "=> obj" part and simply specify a type, and in that way declare a component which will be a reference. Here is an example of an iterator that walks over a tree structure. It uses a "ref" component to point to a particular node in the tree:

    class Tree_Iterator is
        ref Current_Node : Tree_Node;
        var Enclosing_Node : optional Tree_Iterator;
        var Which_Child : Child_Index; 
      exports
        func First_Item(ref Root : Tree_Node) -> Tree_Iterator is
            // Return leftmost child of Root to start the iteration
            return Leftmost_Child
              ((Current_Node => Root, Enclosing_Node => null, Which_Child => 0));
        end func First_Item;

        func Leftmost_Child(It : Tree_Iterator) -> Tree_Iterator is
             // Return iterator designating leftmost child
             if Num_Children(It.Current_Node) == 0 then
                 // We are at the leftmost child already
                 return It;
             else
                 // Recurse down the left side
                 return
                   (Current_Node => It.Current_Node[1],
                    Enclosing_Node => It,
                    Which_Child => 1);
             end if;
        end func Leftmost_Child;

        func Next_Item(It : Tree_Iterator) -> optional Tree_Iterator is
            // Advance iterator to next item in left-to-right depth-first walk
            if It.Enclosing_Node is null then 
                // All done
                return null;
            elsif It.Which_Child < Num_Children(It.Enclosing_Node.Current_Node) then
                // Return next child of same enclosing node
                return Leftmost_Child
                  ((Current_Node => It.Enclosing_Node[It.Which_Child + 1],
                    Enclosing_Node => It.Enclosing_Node,
                    Which_Child => It.Which_Child + 1));
            else
                // Recurse to go to next sibling of enclosing node
                return Next_Item(It.Enclosing_Node);
            end if;
        end func Next_Item;
        ...
    end class Tree_Iterator;

This Tree_Iterator has a component that is a reference to a node in a tree, an optional Tree_Iterator for the enclosing node in the tree, and an indication of which child the current node is of the enclosing node.

By specifying simply "ref" rather than "ref var" or "ref const" for the Current_Node reference, we are saying that it is a reference to a variable when the enclosing Tree_Iterator object is a variable.

Any type that has a (constructor) function in its interface that takes a "ref" input and produces a new object of the type as an output is presumed to be burying the "ref" somewhere in the object. (Of course the compiler can just "look" in the class and see this, but a client of the module wants to know strictly from looking at the interface.) If the objects of a type incorporate a reference within them, then objects of that type do not permit assignment, since references in ParaSail are always required to be short-lived. The result of such a constructor function therefore can only be used to initialize a new object, be passed as a parameter, or be used to define the next value for a loop variable. The result of such a function cannot be assigned as the new value for an existing object.

Now that we are using "ref" explicitly for objects and components, it makes the syntax used for parameters, where "ref" comes after the parameter name and the ":", seem inconsistent. Hence we are changing the syntax for parameters to more closely match that of objects and components:

    operation_input ::= 
       id ':' type [ := default ]
     | 'var' id ':' type
     | 'ref' [ 'var' | 'const' ] id ':' type
     | 'global' [ 'var' ] id ':' type
     | [ 'locked' | 'queued' ] [ 'var' ] id ':' type 

   operation_output ::=
       [ id ':' ] type
     | 'ref' [ 'var' | 'const' ] [ id ':' ] type

For example:

    func Append(var L : List; E : Element);
    op "indexing"(ref V : Vector; I : Index) -> ref Element;
    func Get_Next(queued var Q : Queue) -> Element;

We will be posting a new draft of the ParaSail reference manual reflecting these changes shortly to the ParaSail google group:
http://groups.google.com/group/parasail-programming-language

Wednesday, July 13, 2011

Everything in a func?

Here is an issue which is trivial semantically, but potentially has a more substantial effect on the look and feel of the language. Currently, reflecting to some extent the Pascal heritage, ParaSail uses the terms "function," "procedure," and "operator" for the various kinds of operations. Functions must have at least one output, procedures may not have any outputs, and operators may have zero or more outputs, but have a name given in quotes and are used for defining operations that connect into special syntax or semantics, such as unary or binary operators, implicit conversions from universal types, etc.

As we go about creating more examples in ParaSail, and compare with various other recent languages, we find the use of these three distinct terms not particularly helpful, and certainly more verbose than most other languages. On the other hand, the terms do communicate something about the appropriate use of the operations.

We are now leaning toward replacing all three terms with the term "func". Our sense is this would give the syntax a somewhat less heavy, and perhaps more modern feel. We would retain the requirement to echo the term at the end of the operation, hence:

func GCD(X, Y : Integer) -> Integer is
    case X =? Y of
      [#less] => return GCD(X, Y mod X);
      [#equal] => return X;
      [#greater] => return GCD(X mod Y, Y);
    end case;
end func GCD;

func Print(F : ref var File; X : Integer) is
    if abs X >= 10 then
        Print(F, X/10);
    end if;
    Print(F, '0' + (X rem 10))
end func Print;

func "|"(X : Set; E : Elem) -> Set is
    return Union(X, Make_Set(E)); 
end func "|";

The above three "func"s would be "function GCD," "procedure Print," and operator "|" using the current ParaSail syntax. Given that we are already abbreviating "constant" to const and "variable" to var in ParaSail, it doesn't seem inconsistent to adopt a standard abbreviation here for function/procedure/operator, and func is one that appears in a number of languages.

Now this is an issue about which anyone and everyone probably has an opinion...

Thursday, July 7, 2011

Implementing concurrent loops in ParaSail

The implementation of the ParaSail compiler and virtual machine (see http://parasail-programming-language.blogspot.com/2010/11/virtual-machine-for-parasail-with.html) are proceeding apace. Basic support is there for interfaces, classes, operators, functions, procedures, module instantiations, object and type declarations, if, case, while, and assignment statements, user-defined indexing, parallel evaluation of expressions, parallel execution of statements on either side of a "||", etc. However, a big gap at the moment is for loops, and in particular concurrent for loops.

For loops are a bit unusual in ParaSail, as they support both a continue loop and an exit loop (i.e. "break") capability in the presence of parallelism. Here is an example using continue loop:

for X => Root while X not null loop
    case X.Kind of
      [#leaf] => Process(X.Data);
      [#unary] =>
          Process(X.Data);
        ||
          continue loop with X => X.Operand;
      [#binary] =>
          Process(X.Data);
        ||
          continue loop with X => X.Left;
        ||
          continue loop with X => X.Right;
    end case;
end loop;

Here is an example using exit loop:

const Result : Id_Type; 
for X => Root then X.Left || X.Right while X not null 
  concurrent loop
    if X.Value == Desired_Value then
        exit loop with Result => X.Id;
    end if;
end loop with Result => null;

This first example processes an expression tree, and creates parallelism by using an explicit continue loop statement in concert with a concurrent sequence ("||") construct. The second example searches a binary tree and creates parallelism by both the use of concurrent loop and by having two values in the then part of the for loop header, which will result in the execution of the loop splitting into two separate threads at each binary node.

The semantics of saying concurrent loop rather than simply loop is that the next iteration starts before even starting the current iteration. If we had said simply loop in the second example, then it would first check the Root node to see if it has the Desired_Value, and only if it did not would we proceed on to the next iteration(s), at which point we would check both the Left and Right subtrees (presuming both were non-null). Since we said concurrent loop, the iteration on the Left and Right subtrees begins in parallel with executing the loop body on the Root.

The semantics of the exit loop statement are that all other threads within the loop being exited are stopped, and when the thread first reaching the exit loop is the only thread remaining, the assignments specified by the with-values clause are performed. If the loop ends normally without an exit loop, then the assignments specified by the end with-values clause are performed. The net effect is that if the Desired_Value is in the tree, Result ends up with the Id of that node. If it is not found, Result ends up null.

See http://parasail-programming-language.blogspot.com/2010/06/generalized-for-loops-in-parasail.html and http://parasail-programming-language.blogspot.com/2010/06/intentional-race-condition-in-parasail.html for more details about the generalized for loop capabilities.

So given all of the above, what should be the implementation model for a for loop in ParaSail? As mentioned in the blog entry on the ParaSail virtual machine (PSVM -- see link in first paragraph above), the implementation uses pico-threads to represent the potential parallelism in the code. There are PSVM instructions for invoking a separate operation, or a local block, as a separate parallel pico-thread. Currently this is used to support parallel evaluation of expressions or concurrent sequences ("||"), where each operand of a call on an operation, or each side of "||", can be evaluated in its own pico-thread. In fact, the compiler is a bit smarter than that, and it doesn't bother to create a pico-thread if the computation of the operand involves only simple built-in operations (such as basic arithmetic, numeric comparisons, etc.). If it involves an out-of-line call then it becomes a candidate for being turned into a pico-thread (this is clearly an area for tuning of the compiler as more experience is gained).

So it makes sense for the body of a concurrent loop, or a loop that gains parallelism through multiple parallel "next" values or explicit uses of continue loop inside a concurrent sequence, to be executed as a separate pico-thread. The compiler can turn the loop body into a local operation, with the parameter(s) being the loop index (or indices). A continue loop is essentially a (tail) recursive call on this local loop-body operation, with the parameter being the next value specified in the with-values clause. Similarly, if there are explicit "next" values given after then in the loop header, these can be handled using (parallel) calls on the loop-body operation, with the "next" value as the parameter.

Rather than using tail recursion, an alternative is to use a parallel call followed by a return, with the pico-thread for the parallel call being associated with a pico-thread master that controls the entire loop. At the outer level, the whole loop can then be executed by creating the pico-thread master, and then starting the loop off with a parallel call on the loop-body operation, followed by waiting for the pico-thread master to be complete.

The overall flow would be:

Create pico-thread master.
Initiate parallel-call on loop-body operation with parameter referring to initial value of loop index, presuming initial value satisfies the while condition (if any).
Wait for pico-thread master to be complete.

At the end of an iteration of a loop, or at a continue loop statement, the flow would be:

Determine next value(s) for loop index.
Initiate parallel call(s) on loop-body operation with parameter referring to next value, presuming value satisfies the while condition (if any).
Return from the loop-body operation (don't wait for the call(s) to complete).

At the beginning of an iteration of a concurrent loop, the flow would be:

Determine next value(s) for loop index.
Initiate parallel call(s) on loop-body operation with parameter referring to next value, presuming value satisfies the while condition (if any).
Perform the loop body for the current value.
Return from the loop-body operation (don't wait for the call(s) to complete).

When reaching an exit loop statement, the flow would be:

Tell pico-thread master of loop to terminate all but the current thread.
Wait for other threads to terminate.
Perform with-values assignment(s).
Return from the loop-body operation.

When the master is complete:

If there is a with-values clause on the end loop and the master completed normally, as opposed to being terminated early by an exit loop statement:

Perform the assignment(s) specified by the end loop with-values clause.

Designing ParaSail, a new programming language