Designing ParaSail, a new programming language

Wednesday, September 7, 2011

ParaSail compiler reaches a milestone

The ParaSail compiler has reached a bit of a milestone, in that it now can compile and run what might be considered to be "interesting" programs. Annotations are not yet enforced, and explicitly concurrent lock-based objects are not yet there, but many other things are working, including implicit and explicit parallelism, dynamic storage management using expandable/shrinkable objects (rather than pointers and garbage collection), fully-shared generic template instantiations, etc. We are trying to solidify the features that are currently supported so we can make a version of the prototype compiler available for experimentation by others within the next couple of weeks.

Below is one of the "interesting" programs. This one is an implementation of balanced "AA" trees. This doesn't illustrate any explicit parallelism, but does illustrate creating and updating interesting data structures without using explicit pointers. Pointer-free data structures allow the compiler to easily understand any aliasing, since there is no hidden sharing between distinct objects. That allows the compiler to eliminate race conditions at compile-time.

Here is the interface to the AA_Tree module. (As a side-note, the program that "HTML"-ized this code was itself written in ParaSail. I've included it at the bottom of this posting.)

interface AA_Tree<Element is Comparable<>> is

    // This module implements a balanced "AA" tree, originally
    // described by Arne Andersson in the "Proceedings of the Workshop
    // on Algorithms and Data Structures," pp 60-71, Springer Verlag, 1993.
    // The following algorithm and descriptions were taken from the
    // WikiPedia article on AA_Tree: 
    //       http://en.wikipedia.org/wiki/AA_tree
    // Note that various additional checks for a null tree have been added.

    // Only two operations are needed for maintaining balance in an AA tree.
    // These operations are called skew and split. Skew is a right rotation
    // when an insertion or deletion creates a left horizontal link. Split
    // is a conditional left rotation when an insertion or deletion creates two
    // horizontal right links, which once again corresponds to two
    // consecutive red links in red-black trees.

    op "[]"() -> optional AA_Tree;
        // Create an empty tree

    func Insert(var T : optional AA_Tree; X : Element);
        // input: X, the value to be inserted, and 
        // T, the root of the tree to insert it into.
        // output: A balanced T' including X.

    func Delete(var T : optional AA_Tree; X : Element);
        // input: X, the value to delete, and T, 
        // the root of the tree from which it should be deleted.
        // output: T', balanced, without the value X.

    op "in"(X : Element; T : optional AA_Tree) -> Boolean;

    func Overlapping(T : optional AA_Tree; X : Element) -> optional Element;
        // input: X, the value to find, and T, 
        // the root of the tree to be searched.
        // output: the element equal to or "unordered" relative to X.

    op "|="(var T : optional AA_Tree; X : Element);

    func First(T : optional AA_Tree) -> optional Element;
      // Return first (smallest) element in tree

    func Last(T : optional AA_Tree) -> optional Element;
      // Return last (greatest) element in tree

    func Remove_First(var T : optional AA_Tree) -> optional Element;
      // Remove first (smallest) element in tree

    func Remove_Last(var T : optional AA_Tree) -> optional Element;
      // Remove last (greatest) element in tree

    func Remove_Any(var T : optional AA_Tree) -> optional Element;
      // Remove some element from tree

    func Count(T : optional AA_Tree) -> Univ_Integer;
      // Return a count of the nodes in the tree

end interface AA_Tree;

Here is the class that provides the implementation for the AA_Tree module:

class AA_Tree is
    var Left : optional AA_Tree;
    var Right : optional AA_Tree;
    var Value : Element;
    var Level : Univ_Integer := 0;

    func Node(Value : Element; Level : Univ_Integer; Left : optional AA_Tree;
      Right : optional AA_Tree) -> AA_Tree is
        // Create a new tree.
        return (Left => Left, Right => Right, Value => Value,
          Level => Level);
    end func Node;

    func Is_Leaf(T : optional AA_Tree) -> Boolean is
        return T not null and then
          T.Left is null and then T.Right is null;
    end func Is_Leaf;

    func Leftmost(ref T : optional AA_Tree) -> ref optional AA_Tree is
        for L => T loop
            if L not null and then L.Left not null then
                // Continue with Left until we reach null
                continue loop with L.Left;
            else
                // Found left-most
                return L;
            end if;
        end loop;
    end func Leftmost;

    func Successor(T : optional AA_Tree) -> optional Element is
        // Return element in tree greater than but closest to T.Value
        if T.Right not null then
            return Leftmost(T.Right).Value;
        else
            return null;
        end if;
    end func Successor;

    func Rightmost(ref T : optional AA_Tree) -> ref optional AA_Tree is
        for R => T loop
            if R not null and then R.Right not null then
                // Keep following down Right side
                continue loop with R.Right;
            else
                // Found right-most
                return R;
            end if;
        end loop;
    end func Rightmost;

    func Predecessor(T : optional AA_Tree) -> optional Element is
        // Return element in tree less than but closest to T.Value
        if T.Left not null then
            return Rightmost(T.Left).Value;
        else
            return null;
        end if;
    end func Predecessor;

    func Skew(var T : optional AA_Tree) is
      // input: T, a node representing an AA tree that needs to be rebalanced.
      // output: T' Another node representing the rebalanced AA tree.

        if T not null and then
          T.Left not null and then
          T.Left.Level == T.Level then
            // The current T.Left becomes new root

            // Exchange value of T.Left with root
            T.Value <=> T.Left.Value;
           
            // Move old root and T.Left.Right over to right side of tree
            T.Left.Right <=> T.Right;
            T.Left.Left <=> T.Right;
            T.Left <=> T.Right;
        end if;
    end func Skew;

    func Split(var T : optional AA_Tree) is
        // input: T, a node representing an AA tree that needs to be rebalanced.
        // output: T' Another node representing the rebalanced AA tree.

        if T not null and then
          T.Right not null and then
          T.Right.Right not null and then
          T.Level == T.Right.Right.Level then
            // T.Right becomes the new root
            // Exchange value and level between root and T.Right
            T.Value <=> T.Right.Value;
            T.Level <=> T.Right.Level;

            // Move old root and T.Right.Left to left side of tree
            T.Left <=> T.Right.Right;
            T.Right.Left <=> T.Right.Right;
            T.Left <=> T.Right;

            // Increment level
            T.Level += 1;
        end if;
    end func Split;

    func Decrease_Level(var T : optional AA_Tree) is
        // input: T, a tree for which we want to remove links that skip levels.
        // output: T with its level decreased.

        if T is null then
            return;
        end if;
           
        var Should_Be : Univ_Integer := 1;

        if T.Left not null then
            Should_Be := T.Left.Level + 1;
        end if;

        if T.Right not null then
            Should_Be := Min(Should_Be, T.Right.Level + 1);
        end if;
            
        if Should_Be < T.Level then
            T.Level := Should_Be;
            if T.Right not null and then
              Should_Be < T.Right.Level then
                T.Right.Level := Should_Be;
            end if;
        end if;
    end func Decrease_Level;

  exports

    op "[]"() -> optional AA_Tree is
        // Create an empty tree
        return null;
    end op "[]";

    // Insertion begins with the normal binary tree search and insertion
    // procedure. Then, as the call stack unwinds (assuming a recursive
    // implementation of the search), it's easy to check the validity of the
    // tree and perform any rotations as necessary. If a horizontal left link
    // arises, a skew will be performed, and if two horizontal right links
    // arise, a split will be performed, possibly incrementing the level of the
    // new root node of the current subtree. Note, in the code as given above,
    // the increment of T.Level. This makes it necessary to continue checking
    // the validity of the tree as the modifications bubble up from the leaves.
    
    func Insert(var T : optional AA_Tree; X : Element) is
        // input: X, the value to be inserted, and 
        // T, the root of the tree to insert it into.
        // output: A balanced T' including X.

        // Do the normal binary tree insertion procedure. 
        // Set the result of the recursive call to the correct 
        // child in case a new node was created or the
        // root of the subtree changes.

        if T is null then
            // Create a new leaf node with X.
            T := Node(X, 1, null, null);
            return;
        end if;

        case X =? T.Value of
          [#less] =>
            Insert(T.Left, X);
          [#greater] =>
            Insert(T.Right, X);
          [#equal | #unordered] =>
            // Note that the case of X == T.Value is unspecified. 
            // As given, an insert will have no effect. 
            // The implementor may desire different behavior.
            return;
        end case;

        // Perform skew and then split. 
        // The conditionals that determine whether or
        // not a rotation will occur or not are inside 
        // of the procedures, as given above.

        Skew(T);
        Split(T);
    end func Insert;

    // As in most balanced binary trees, the deletion of an internal node can
    // be turned into the deletion of a leaf node by swapping the internal node
    // with either its closest predecessor or successor, depending on which are
    // in the tree or on the implementor's whims. Retrieving a predecessor is
    // simply a matter of following one left link and then all of the remaining
    // right links. Similarly, the successor can be found by going right once
    // and left until a null pointer is found. Because of the AA property of
    // all nodes of level greater than one having two children, the successor
    // or predecessor node will be in level 1, making their removal trivial.
    // 
    // To re-balance a tree, there are a few approaches. The one described by
    // Andersson in his original paper is the simplest, and it is described
    // here, although actual implementations may opt for a more optimized
    // approach. After a removal, the first step to maintaining tree validity
    // is to lower the level of any nodes whose children are two levels below
    // them, or who are missing children. Then, the entire level must be skewed
    // and split. This approach was favored, because when laid down
    // conceptually, it has three easily understood separate steps:
    // 
    //     Decrease the level, if appropriate.
    //     Skew the level.
    //     Split the level.
    // 
    // However, we have to skew and split the entire level this time instead of
    // just a node, complicating our code.

    func Delete(var T : optional AA_Tree; X : Element) is
        // input: X, the value to delete, and T, 
        // the root of the tree from which it should be deleted.
        // output: T', balanced, without the value X.

        if T is null then
            // Not in tree -- should we complain?
            return;
        end if;

        case X =? T.Value of
          [#less] =>
            Delete(T.Left, X);
          [#greater] =>
            Delete(T.Right, X);
          [#equal] =>
            // If we're a leaf, easy, otherwise reduce to leaf case. 
            if Is_Leaf(T) then
                T := null;
            elsif T.Left is null then
                // Get successor value and delete it from right tree,
                // and set root to have that value
                const Succ := Successor(T);
                Delete(T.Right, Succ);
                T.Value := Succ;
            else
                // Get predecessor value and delete it from left tree,
                // and set root to have that value
                const Pred := Predecessor(T);
                Delete(T.Left, Pred);
                T.Value := Pred;
            end if;
          [#unordered] =>
            // Not in tree; should we complain?
            return;  
        end case;

        // Rebalance the tree. Decrease the level of all nodes in this level if
        // necessary, and then skew and split all nodes in the new level.

        if T is null then
            return;
        end if;

        Decrease_Level(T);
        Skew(T);
        Skew(T.Right);
        if T.Right not null then
            Skew(T.Right.Right);
        end if;
        Split(T);
        Split(T.Right);
    end func Delete;

    op "in"(X : Element; T : optional AA_Tree) -> Result : Boolean is
        for P => T while P not null loop
            case X =? P.Value of
              [#less] =>
                continue loop with P.Left;
              [#greater] =>
                continue loop with P.Right;
              [#equal] =>
                return #true;
              [#unordered] =>
                return #false;
            end case;
        end loop;
        return #false;  // Not found
    end op "in";

    func First(T : optional AA_Tree) -> optional Element is
      // Return first (smallest) element in tree
        if T is null then
            return null;
        else 
            return Leftmost(T).Value;
        end if;
    end func First;

    func Last(T : optional AA_Tree) -> optional Element is
      // Return last (greatest) element in tree
        if T is null then
            return null;
        else
            return Rightmost(T).Value;
        end if;
    end func Last;


    func Remove_First(var T : optional AA_Tree) -> Result : optional Element is
      // Remove first (smallest) element in tree
        Result := First(T);
        if Result not null then
            Delete(T, Result);
        end if;
    end func Remove_First;

    func Remove_Last(var T : optional AA_Tree) -> Result : optional Element is
      // Remove last (greatest) element in tree
        Result := Last(T);
        if Result not null then
            Delete(T, Result);
        end if;
    end func Remove_Last;

    func Remove_Any(var T : optional AA_Tree) -> Result : optional Element is
      // Remove some element from tree
        if T is null then
            return null;
        end if;
        Result := T.Value;
        if Result not null then
            Delete(T, Result);
        end if;
    end func Remove_Any;

    func Count(T : optional AA_Tree) -> Univ_Integer is
      // Return a count of the nodes in the tree
        if T is null then
            return 0;
        else
            return Count(T.Left) + Count(T.Right) + 1;
        end if;
    end func Count;

    op "|="(var T : optional AA_Tree; X : Element) is
        Insert(T, X);
    end op "|=";

    func Overlapping(T : optional AA_Tree; X : Element) -> optional Element is
        // input: X, the value to find, and T, 
        // the root of the tree to be searched.
        // output: the element equal to or "unordered" relative to X.
        if T is null or else T.Value is null then
            return null;
        else
            case X =? T.Value of
              [#less] =>
                return Overlapping(T.Left, X);
              [#greater] =>
                return Overlapping(T.Right, X);
              [#equal | #unordered] =>
                // Close enough
                return T.Value;
            end case;
        end if;
    end func Overlapping;

end class AA_Tree;

Here is a small test program for AA_Tree:

func Test_AA_Tree(A : Univ_Integer; B : Univ_Integer; C : Univ_Integer) is
    type Univ_Tree is AA_Tree<Univ_Integer>;
    var T : Univ_Tree := [];
    var X : Univ_Integer := A;

    Insert(T, A);
    Println("Count = " | Count(T) | " after insert of " | A);
    Insert(T, B);
    Println("Count = " | Count(T) | " after insert of " | B);
    Insert(T, C);
    Println("Count = " | Count(T) | " after insert of " | C);

    Insert(T, A);
    Println("Count = " | Count(T) | " after another insert of " | A);

    Println(A | " in T = " | [[A in T]]);
    Println(B | " in T = " | [[B in T]]);
    Println(C | " in T = " | [[C in T]]);
    Println("7 in T = " | [[7 in T]]);

    for E := Remove_First(T) then Remove_First(T) while E not null loop
        Println("Remove_First = " | E);
    end loop;

    Println("Count after loop : " | Count(T));

    for I in 1..10 forward loop
        Insert(T, I);
        Println("Count = " | Count(T) | " after insert of " | I);
    end loop;

    for L := Remove_Last(T) then Remove_Last(T) while L not null loop
        Println("Remove_Last = " | L);
    end loop;

    Println("Count after loop : " | Count(T));

    for J in 1..10 reverse loop
        Insert(T, J);
        Println("Count = " | Count(T) | " after insert of " | J);
    end loop;

    Println("Count after loop : " | Count(T));

    Println("Overlapping(T, 5) = " | Overlapping(T, 5));

    for Z := Remove_Any(T) then Remove_Any(T) while Z not null loop
        Println("Remove_Any = " | Z);
    end loop;

    Println("Count after loop : " | Count(T));

    for K in 1..10 loop
        Insert(T, K);
        Println("Count = " | Count(T) | " after insert of " | K);
    end loop;

    for F := Remove_First(T) then Remove_First(T) while F not null loop
        Println("Remove_First = " | F);
    end loop;

    Println("Count after loop : " | Count(T));

end func Test_AA_Tree;

Here is the ParaSail program that was used to "HTML"-ize the listings above (as well as the one below!). Not terribly sophisticated, but illustrates some of the character handling:

func Html_Escape(C : Univ_Character) -> Univ_String is
    // Do single-character escapes
    case C of
      ['<'] =>
        return "&lt;";
      ['>'] =>
        return "&gt;";
      ['&'] =>
        return "&amp;";
      ['\\'] =>
        return "\\";    // Prevent "Print" from expanding control chars
      [..] =>
        return "" | C;  // Convert character into a string
    end case;
end func Html_Escape;

func Htmlize_Line(Orig : Univ_String) -> Result : Univ_String is
    // bold lower case words that are not in comments or
    // character, string, or enum literals

    Result := "";
    var I := 1;
    const L := Length(Orig);
    var State : Univ_Enumeration := #other;
    while I <= L loop
        var C := Orig[I];
        case C of
          ['\\'] =>
            if State == #string_literal or else State == #char_literal then
                // skip next character no matter what it is
                if I < L then
                    Result |= Html_Escape(C);
                    I += 1;
                    C := Orig[I];
                end if;
            end if;
          ['/'] =>
            if I < L and then Orig[I+1] == '/' then
                // comment, italicize it
                Result |= "<i>" | Html_Escape(C);
                while I < L loop
                    I += 1;
                    Result |= Html_Escape(Orig[I]);
                end loop;
                Result |= "</i>";
                C := null;
            end if;
          ['a' .. 'z'] =>
            if State == #other then
                // Presume this is a reserved word, so bold it
                Result |= "<b>" | C;
                while I < L and then Orig[I+1] in 'a' .. 'z' loop
                    I += 1;
                    Result |= Orig[I];
                end loop;
                Result |= "</b>";
                C := null;
            end if;
            
          ['#' | '0'..'9' | 'A' .. 'Z' | '_'] =>
            // Presume this is not a reserved word, pass through as is
            if State == #other then
                State := #unreserved;
            end if;

          ['"'] =>
            if State == #string_literal then
                // End of string literal
                State := #other;
            elsif State != #char_literal then
                State := #string_literal;
            end if;

          ['\''] =>
            if State == #char_literal then
                // End of char literal
                State := #other;
            elsif State != #string_literal then
                State := #char_literal;
            end if;

          [..] =>
            if State != #string_literal and then State != #char_literal then
                State := #other;
            end if;
        end case;

        if C not null then
            Result |= Html_Escape(C);
        end if;
        I += 1;
    end loop;

    Println(Result);
end func Htmlize_Line;

func Htmlize() is
    // Htmlize the standard input, 
    //  putting the result on the standard output
    Println("<pre>");
    loop
        const Line : Univ_String := Readln();
        if Line is null then
            // End of file indicated by "null"
            exit loop;
        end if;
        Htmlize_Line(Line);
    end loop;
    Println("</pre>");
end func Htmlize;

Friday, September 2, 2011

ParaSail on Intel Parallel Programming Talk video

The talk about ParaSail at OSCON 2011 generated a nice bit of media attention, and as a result we were invited to be on the Intel Software Network "Parallel Programming Talk" video show. The show is now online at:

http://software.intel.com/en-us/blogs/2011/08/26/parasail-a-new-programming-language-parallel-programming-talk-120/

There is a relatively long intro discussion of about 8 minutes, and then we get into the nitty gritty of ParaSail.

Saturday, July 30, 2011

A load-and-null-out operation

As we have continued to write more programs in ParaSail, we have found the pointer-free expandable-object data structuring primitives very interesting to use. A simple tree structure can be defined as:

interface Binary_Tree<Element_Type is Assignable<>> is
    var Left, Right : optional Binary_Tree;
    var Contents : Element_Type;
end interface Binary_Tree;

In most cases you would keep this structure private by declaring it inside the class defining the Binary_Tree module rather than declaring it in the externally-visible interface, but this illustrates the basic structuring technique, using optional to allow recursive structures to be created.

Because there are no pointers, only sub-objects, essentially any arbitrarily complex data structure can be copied using a simple assignment:

  var Copy_Of_Tree : optional Binary_Tree := Original_Tree;

This will create a copy of Original_Tree, no matter how complex it is. Similarly, we can expand a tree by assigning a new tree into its Left or Right subtree:

  Original_Tree.Left := Copy_Of_Tree;

So what did the above two assignments do? Well they made a complete copy of the Original_Tree, and then assigned that copy into the left subtree of the Original_Tree, wiping out whatever was in the left subtree before. But we now have a complete copy of what was in the Original_Tree.Left before in Original_Tree.Left.Left. And Original_Tree.Left.Right is now identical in structure to Original_Tree.Right. We could have done this without explicitly declaring the intermediate "Copy_Of_Tree" object, with the same effect:

  Original_Tree.Left := Original_Tree;

I'm not sure whether what we just did was of any use, but it does give you a feel for the power of expandable objects. The user doesn't have to write explicitly any deep copy operation, since from ParaSail's point of view, these are just subobjects, and ":=" assigns the whole thing, including all of the subobjects.

The old contents of the left-hand side of an assignment are wiped out, and automatically reclaimed immediately -- no need for the user to write a deep reclaim operation. We don't need to wait for a garbage collector to decide whether it is safe to reclaim the storage associated with the old contents, since there is no sharing between objects in ParaSail. If we overwrite the contents of an object or part of an object, whatever was there before can be reclaimed without danger of creating any dangling references.

Here is another example:

  Original_Tree := Original_Tree.Left;

This wipes out the old contents of Original _Tree and replaces it with a copy of its left subtree. But there seems to be some unnecessary copying going on here. The underlying sequence is:

Make a copy of Original_Tree.Left
Reclaim the old contents of Original_Tree
Overwrite it with the copy of Original_Tree.Left

This makes a complete copy of Original_Tree.Left, and then proceeds to reclaim all of Original_Tree (including the left subtree), and then stores the copy of the left subtree back into Original_Tree. But wouldn't it make more sense to just carve off the left subtree, leaving behind a stub (i.e. null), then reclaim Original_Tree which now has a null left subtree, and then set Original_Tree to be the carved-off left subtree? In fact, if the compiler were somewhat smarter, that is exactly what it would do, and should be expected to do, since it knows that Original_Tree.Left is about to be destroyed by the assignment to Original_Tree, so saving its content and replacing it with null rather than making a complete copy of it would be equivalent and more efficient. Be that as it may, it is conceivable that there are times when the compiler can't see what is obvious to us, or perhaps the compiler is still just a prototype and isn't quite as smart as it will be eventually (;-). In that case, we might want to be able to be explicit in cases where we want to fetch the value of an object or a part of an object, and after we fetch the value we want to set the object to null, or perhaps to some other appropriate value.

Given this situation, which seems like it might come up fairly frequently when making structural modifications to a data structure (such as rebalancing the tree), or destructively removing elements from a data structure (e.g. "Remove_First(Queue)"), we believe it might be useful to have a notion of load-and-null-out. That is, we want the value of Original.Left_Tree, but after fetching the value, we are "done" with it, so set it to null. This means we may not need to make a copy of the subobject, since it has been carved out of its old enclosing object, with a null left behind.

We considered various notations for representing load-and-null-out, and came up with the notion of a deferred assignment, that is an assignment where before we overwrite the left-hand-side, we fetch its old value and return it. This is reminiscent of the compare-and-swap operation, or fetch-and-add, or the post increment X++ of C (and PDP-11/Vax-11 fame, for those familiar with those older DEC machines). Here is a tentative notation for a deferred assignment:

   Original_Tree := (Original_Tree.Left ::= null);

What the "::= null" operation means is to fetch the value of Original_Tree.Left, and after doing so, assign it to null. One could imagine generalizing this extra ':' notation to support post-increment:

X := A[(I :+= 1)];

This would mean fetch the value of I, and after doing so, apply the "+= 1" operation to it. It is not obvious whether this generalization is a good idea, or just cute. But the deferred assignment of null seems like an operation that will be quite useful in the pointer-free world of ParaSail, to provide more control when restructuring a complex object.

Monday, July 25, 2011

Adapting one interface to implement another

We bumped into an interesting problem in ParaSail recently. We have an abstract module Countable which has the following interface:

abstract interface Countable<> is
    op "+"(Left : Countable; Right : Univ_Integer) -> Countable;
    op "+"(Left : Univ_Integer; Right : Countable) -> Countable;
    op "-"(Left : Countable; Right : Univ_Integer) -> Countable;
    op "-"(Left, Right : Countable) -> Univ_Integer;
    op "=?"(Left, Right : Countable) -> Ordering;
end interface Countable;

This interface is used when defining a "countable interval" such as "X .. Y" where you want to be able to iterate from X up to Y, even though X and Y are not themselves of an integer type. For example, if X and Y are characters, it makes sense to define an interval such as 'a' .. 'z' and there is a desire to be able to go from 'a' to the next character in the interval, so by using the "+" operator from Countable that is easy to do. Just add 1. Similarly, we can iterate through the interval in reverse by subtracting 1. Or we can find out how big is the interval X..Y by computing (Y - X) + 1, where we are using the second "-" operator.

Other examples of "countable" types are representations of time, where you can add and subtract some number of clock ticks, or compute the number of clock ticks between two times, even if the "time" type itself isn't necessarily represented as a single integer. And in general any ordered enumeration type (such as Enum<[#monday,#tuesday,#wednesday, ...]>) might want to be considered Countable, so that intervals can be defined and manipulated.

A Countable_Set abstraction would be useful as a way to represent sets of Countable elements, while still efficiently representing contiguous ranges of values such as "1..1000" as part of the set. For example:

interface Countable_Set<Element_Type is Countable<>> is
    op ".."(Left, Right : Element_Type) -> Countable_Set;
    op "|"(Left, Right : Countable_Set) -> Countable_Set;
    op "in"(Left : Element_Type; Right : Countable_Set) -> Boolean;
    func Count(CS : Countable_Set) -> Univ_Integer;
    ...
end interface Countable_Set;

Implementing the Count function clearly will need to make use of the "+" or "-" Countable operations. But this now brings us to the question, is Integer itself a Countable type? Can we legally write Countable_Set<Integer>? Well if we look at the binary "+" and "-" operators for Integer we see:

interface Integer<...> is
    op "+"(Left, Right : Integer) -> Integer;
    op "-"(Left, Right : Integer) -> Integer;
    ...
end interface Integer;

These don't actually match the operators expected for a Countable type. And if we were to simply add the missing operators, we would create significant ambiguity, since "X + 1" could either resolve to the existing Integer,Integer->Integer "+" or to the added Countable one, Integer,Univ_Integer->Integer "+". Not ideal.

More generally, we can see a situation where a useful abstraction, such as a Countable_Set, expects a type parameter such as Countable<>, and the type we have is close but is missing some of the required operations, and we don't really want to add the missing operations for general use because of ambiguity, or they don't quite do the right thing, or whatever. Ideally we would want to make them available for implementing a given interface or interfaces, but not for general use.

Some languages provide mechanisms for addressing problems like these. In Eiffel, as part of inheriting from another class the derived class can rename some of the operations. With that approach, on inheriting these Countable operations we would rename them to something like "Countable_Plus" and "Countable_Minus" and then define them to do whatever we want. In C++ it is possible to disambiguate particular operations by specifying from which base type they are inherited.

In ParaSail we are pursuing somewhat of a different approach. Essentially we are allowing a module to publicly declare that it is implementing various operations that might be needed to be considered "Countable" or "Hashable" or whatever, while not making them available for "general" use. Here is how the ParaSail Integer module solves the Countable conundrum:

interface Integer<...> is
    op "+"(Left, Right : Integer) -> Integer;
    op "-"(Left, Right : Integer) -> Integer;
    ...
  implements
      for Countable
    op "+"(Left : Integer; Right : Univ_Integer) -> Integer;
    op "+"(Left : Univ_Integer; Right : Integer) -> Integer;
    op "-"(Left : Integer; Right : Univ_Integer) -> Integer;
    op "-"(Left, Right : Integer) -> Univ_Integer;
end interface Integer;

You can also omit the "for Countable" part and then the operations are available to implement any other module's interface. But the key thing is that you cannot call these operations directly on an Integer type. You can only invoke them when "viewing" an Integer as a "Countable" object. You can have multiple sections after "implements" each section starting with "for Interface1, Interface2, ..." if there are several different sets of operations that need to be implemented to satisfy the needs for different sets of interfaces. In general situations like this are expected to be rare, but when you bump into one, it is useful to have a way to avoid the ambiguity and still satisfy the needs of the other interface.

Saturday, July 23, 2011

Another slight shift in syntax

As we use ParaSail more, we are making some additional small changes in syntax. The first one has to do with allowing a component to be declared as a reference to another object. The ability to return a short-lived object reference is important for user-defined indexing. We also imagine that certain data structures will want to incorporate references as components. Such data structures will not allow assignment, but nevertheless they can be useful for iterating over some other data structure. For example, one might have a complex tree structure, and then an iterator over the structure which contains a reference to the current item in the tree, and references to the enclosing nodes in the tree, so that given the iterator one can generate a reference to the next element of the tree (presuming some ordering such as depth-first left-to-right).

Currently, object references are recognized by using "=>" rather than ":=" for defining their initial value. However, that is somewhat subtle, and for a component, the initial value can't be specified when the component is declared. Having already used the term "ref" for parameters that act as references, we have chosen to change the syntax for other objects that are references by explicitly using the term "ref" for them as well. For example:

    ref X => M[I];
    ref const R => T.Right;
    ref var L => T.Left;

A "ref" by itself means that the reference is a read-write reference only if the object to which it refers is a variable. Explicitly using "ref const" makes the reference read-only even if the object to which it refers is a variable. Explicitly using "ref var" makes the reference read-write, and requires the object to which it refers to be a variable. So the first example above creates a reference to the Ith element of M, providing read-write access only if M is a variable, the second creates a read-only reference to the Right component of T, and the third creates a read-write reference to the Left component of T, and requires that T itself is a variable.

Having adopted the use of "ref", we can now leave off the "=> obj" part and simply specify a type, and in that way declare a component which will be a reference. Here is an example of an iterator that walks over a tree structure. It uses a "ref" component to point to a particular node in the tree:

    class Tree_Iterator is
        ref Current_Node : Tree_Node;
        var Enclosing_Node : optional Tree_Iterator;
        var Which_Child : Child_Index; 
      exports
        func First_Item(ref Root : Tree_Node) -> Tree_Iterator is
            // Return leftmost child of Root to start the iteration
            return Leftmost_Child
              ((Current_Node => Root, Enclosing_Node => null, Which_Child => 0));
        end func First_Item;

        func Leftmost_Child(It : Tree_Iterator) -> Tree_Iterator is
             // Return iterator designating leftmost child
             if Num_Children(It.Current_Node) == 0 then
                 // We are at the leftmost child already
                 return It;
             else
                 // Recurse down the left side
                 return
                   (Current_Node => It.Current_Node[1],
                    Enclosing_Node => It,
                    Which_Child => 1);
             end if;
        end func Leftmost_Child;

        func Next_Item(It : Tree_Iterator) -> optional Tree_Iterator is
            // Advance iterator to next item in left-to-right depth-first walk
            if It.Enclosing_Node is null then 
                // All done
                return null;
            elsif It.Which_Child < Num_Children(It.Enclosing_Node.Current_Node) then
                // Return next child of same enclosing node
                return Leftmost_Child
                  ((Current_Node => It.Enclosing_Node[It.Which_Child + 1],
                    Enclosing_Node => It.Enclosing_Node,
                    Which_Child => It.Which_Child + 1));
            else
                // Recurse to go to next sibling of enclosing node
                return Next_Item(It.Enclosing_Node);
            end if;
        end func Next_Item;
        ...
    end class Tree_Iterator;

This Tree_Iterator has a component that is a reference to a node in a tree, an optional Tree_Iterator for the enclosing node in the tree, and an indication of which child the current node is of the enclosing node.

By specifying simply "ref" rather than "ref var" or "ref const" for the Current_Node reference, we are saying that it is a reference to a variable when the enclosing Tree_Iterator object is a variable.

Any type that has a (constructor) function in its interface that takes a "ref" input and produces a new object of the type as an output is presumed to be burying the "ref" somewhere in the object. (Of course the compiler can just "look" in the class and see this, but a client of the module wants to know strictly from looking at the interface.) If the objects of a type incorporate a reference within them, then objects of that type do not permit assignment, since references in ParaSail are always required to be short-lived. The result of such a constructor function therefore can only be used to initialize a new object, be passed as a parameter, or be used to define the next value for a loop variable. The result of such a function cannot be assigned as the new value for an existing object.

Now that we are using "ref" explicitly for objects and components, it makes the syntax used for parameters, where "ref" comes after the parameter name and the ":", seem inconsistent. Hence we are changing the syntax for parameters to more closely match that of objects and components:

    operation_input ::= 
       id ':' type [ := default ]
     | 'var' id ':' type
     | 'ref' [ 'var' | 'const' ] id ':' type
     | 'global' [ 'var' ] id ':' type
     | [ 'locked' | 'queued' ] [ 'var' ] id ':' type 

   operation_output ::=
       [ id ':' ] type
     | 'ref' [ 'var' | 'const' ] [ id ':' ] type

For example:

    func Append(var L : List; E : Element);
    op "indexing"(ref V : Vector; I : Index) -> ref Element;
    func Get_Next(queued var Q : Queue) -> Element;

We will be posting a new draft of the ParaSail reference manual reflecting these changes shortly to the ParaSail google group:
http://groups.google.com/group/parasail-programming-language

Wednesday, July 13, 2011

Everything in a func?

Here is an issue which is trivial semantically, but potentially has a more substantial effect on the look and feel of the language. Currently, reflecting to some extent the Pascal heritage, ParaSail uses the terms "function," "procedure," and "operator" for the various kinds of operations. Functions must have at least one output, procedures may not have any outputs, and operators may have zero or more outputs, but have a name given in quotes and are used for defining operations that connect into special syntax or semantics, such as unary or binary operators, implicit conversions from universal types, etc.

As we go about creating more examples in ParaSail, and compare with various other recent languages, we find the use of these three distinct terms not particularly helpful, and certainly more verbose than most other languages. On the other hand, the terms do communicate something about the appropriate use of the operations.

We are now leaning toward replacing all three terms with the term "func". Our sense is this would give the syntax a somewhat less heavy, and perhaps more modern feel. We would retain the requirement to echo the term at the end of the operation, hence:

func GCD(X, Y : Integer) -> Integer is
    case X =? Y of
      [#less] => return GCD(X, Y mod X);
      [#equal] => return X;
      [#greater] => return GCD(X mod Y, Y);
    end case;
end func GCD;

func Print(F : ref var File; X : Integer) is
    if abs X >= 10 then
        Print(F, X/10);
    end if;
    Print(F, '0' + (X rem 10))
end func Print;

func "|"(X : Set; E : Elem) -> Set is
    return Union(X, Make_Set(E)); 
end func "|";

The above three "func"s would be "function GCD," "procedure Print," and operator "|" using the current ParaSail syntax. Given that we are already abbreviating "constant" to const and "variable" to var in ParaSail, it doesn't seem inconsistent to adopt a standard abbreviation here for function/procedure/operator, and func is one that appears in a number of languages.

Now this is an issue about which anyone and everyone probably has an opinion...

Thursday, July 7, 2011

Implementing concurrent loops in ParaSail

The implementation of the ParaSail compiler and virtual machine (see http://parasail-programming-language.blogspot.com/2010/11/virtual-machine-for-parasail-with.html) are proceeding apace. Basic support is there for interfaces, classes, operators, functions, procedures, module instantiations, object and type declarations, if, case, while, and assignment statements, user-defined indexing, parallel evaluation of expressions, parallel execution of statements on either side of a "||", etc. However, a big gap at the moment is for loops, and in particular concurrent for loops.

For loops are a bit unusual in ParaSail, as they support both a continue loop and an exit loop (i.e. "break") capability in the presence of parallelism. Here is an example using continue loop:

for X => Root while X not null loop
    case X.Kind of
      [#leaf] => Process(X.Data);
      [#unary] =>
          Process(X.Data);
        ||
          continue loop with X => X.Operand;
      [#binary] =>
          Process(X.Data);
        ||
          continue loop with X => X.Left;
        ||
          continue loop with X => X.Right;
    end case;
end loop;

Here is an example using exit loop:

const Result : Id_Type; 
for X => Root then X.Left || X.Right while X not null 
  concurrent loop
    if X.Value == Desired_Value then
        exit loop with Result => X.Id;
    end if;
end loop with Result => null;

This first example processes an expression tree, and creates parallelism by using an explicit continue loop statement in concert with a concurrent sequence ("||") construct. The second example searches a binary tree and creates parallelism by both the use of concurrent loop and by having two values in the then part of the for loop header, which will result in the execution of the loop splitting into two separate threads at each binary node.

The semantics of saying concurrent loop rather than simply loop is that the next iteration starts before even starting the current iteration. If we had said simply loop in the second example, then it would first check the Root node to see if it has the Desired_Value, and only if it did not would we proceed on to the next iteration(s), at which point we would check both the Left and Right subtrees (presuming both were non-null). Since we said concurrent loop, the iteration on the Left and Right subtrees begins in parallel with executing the loop body on the Root.

The semantics of the exit loop statement are that all other threads within the loop being exited are stopped, and when the thread first reaching the exit loop is the only thread remaining, the assignments specified by the with-values clause are performed. If the loop ends normally without an exit loop, then the assignments specified by the end with-values clause are performed. The net effect is that if the Desired_Value is in the tree, Result ends up with the Id of that node. If it is not found, Result ends up null.

See http://parasail-programming-language.blogspot.com/2010/06/generalized-for-loops-in-parasail.html and http://parasail-programming-language.blogspot.com/2010/06/intentional-race-condition-in-parasail.html for more details about the generalized for loop capabilities.

So given all of the above, what should be the implementation model for a for loop in ParaSail? As mentioned in the blog entry on the ParaSail virtual machine (PSVM -- see link in first paragraph above), the implementation uses pico-threads to represent the potential parallelism in the code. There are PSVM instructions for invoking a separate operation, or a local block, as a separate parallel pico-thread. Currently this is used to support parallel evaluation of expressions or concurrent sequences ("||"), where each operand of a call on an operation, or each side of "||", can be evaluated in its own pico-thread. In fact, the compiler is a bit smarter than that, and it doesn't bother to create a pico-thread if the computation of the operand involves only simple built-in operations (such as basic arithmetic, numeric comparisons, etc.). If it involves an out-of-line call then it becomes a candidate for being turned into a pico-thread (this is clearly an area for tuning of the compiler as more experience is gained).

So it makes sense for the body of a concurrent loop, or a loop that gains parallelism through multiple parallel "next" values or explicit uses of continue loop inside a concurrent sequence, to be executed as a separate pico-thread. The compiler can turn the loop body into a local operation, with the parameter(s) being the loop index (or indices). A continue loop is essentially a (tail) recursive call on this local loop-body operation, with the parameter being the next value specified in the with-values clause. Similarly, if there are explicit "next" values given after then in the loop header, these can be handled using (parallel) calls on the loop-body operation, with the "next" value as the parameter.

Rather than using tail recursion, an alternative is to use a parallel call followed by a return, with the pico-thread for the parallel call being associated with a pico-thread master that controls the entire loop. At the outer level, the whole loop can then be executed by creating the pico-thread master, and then starting the loop off with a parallel call on the loop-body operation, followed by waiting for the pico-thread master to be complete.

The overall flow would be:

Create pico-thread master.
Initiate parallel-call on loop-body operation with parameter referring to initial value of loop index, presuming initial value satisfies the while condition (if any).
Wait for pico-thread master to be complete.

At the end of an iteration of a loop, or at a continue loop statement, the flow would be:

Determine next value(s) for loop index.
Initiate parallel call(s) on loop-body operation with parameter referring to next value, presuming value satisfies the while condition (if any).
Return from the loop-body operation (don't wait for the call(s) to complete).

At the beginning of an iteration of a concurrent loop, the flow would be:

Determine next value(s) for loop index.
Initiate parallel call(s) on loop-body operation with parameter referring to next value, presuming value satisfies the while condition (if any).
Perform the loop body for the current value.
Return from the loop-body operation (don't wait for the call(s) to complete).

When reaching an exit loop statement, the flow would be:

Tell pico-thread master of loop to terminate all but the current thread.
Wait for other threads to terminate.
Perform with-values assignment(s).
Return from the loop-body operation.

When the master is complete:

If there is a with-values clause on the end loop and the master completed normally, as opposed to being terminated early by an exit loop statement:

Perform the assignment(s) specified by the end loop with-values clause.