Designing ParaSail, a new programming language: 2009

Wednesday, December 30, 2009

ParaSail Universal types in Annotations

As explained in the prior entry, ParaSail has four universal types, Univ_Integer, Univ_Real, Univ_Character, and Univ_String, with corresponding literals. The universal numeric types have the usual arithmetic operators, and the universal character and string types have the usual operations for concatenation and indexing. In some ways values of the universal types may be thought of as abstract mathematical objects, while the values of "normal" types are the typical concrete representations of such mathematical objects, with attendant limitations in range or precision. To connect a normal, concrete type to a corresponding universal type, the type defines the operator "from_univ" to convert from the universal type to the concrete type, and "to_univ" to convert back to the universal type.

In the prior entry, we learned that by defining "from_univ" a type can use the corresponding literal notation. For example, imagine an Integer interface:

interface Integer<First, Last : Univ_Integer> is
    operator "from_univ"
      (Univ : Univ_Integer {Univ in First .. Last})
      -> Integer;
    operator "to_univ"(Int : Integer)
      -> Result : Univ_Integer {Result in First .. Last};
    operator "+"(Left, Right : Integer) -> Integer;
    operator "-"(Left, Right : Integer) -> Integer;
    ...
end interface Integer;

Because the "from_univ" operator is defined from Univ_Integer, integer literals are effectively overloaded on any type produced by instantiating the Integer interface. So what about the "to_univ" operator? If "to_univ" is provided, then ParaSail allows the use of a special convert-to-universal notation "[[...]]", which would typically be used in annotations, such as:

{ [[Left]] + [[Right]] in First .. Last }

This is equivalent to:

{ "to_univ"(Left) + "to_univ"(Right) in First .. Last }

This notation is intended to be a nod to the double bracket notation used in denotational semantics to represent the denotation or abstract meaning of a program expression. Using this notation we can give a more complete definition of the semantics of the Integer interface:

interface Integer<First, Last : Univ_Integer> is
    operator "from_univ"
      (Univ : Univ_Integer {Univ in First .. Last})
      -> Integer;
    operator "to_univ"(Int : Integer)
      -> Result : Univ_Integer {Result in First .. Last};
    operator "+"(Left, Right : Integer 
          {[[Left]] + [[Right]] in First .. Last}) 
      -> Result : Integer 
          {[[Result]] == [[Left]] + [[Right]]};
    operator "-"(Left, Right : Integer 
          {[[Left]] - [[Right]] in First .. Last}) 
      -> Result : Integer 
          {[[Result]] == [[Left]] - [[Right]]};
    ...
end interface Integer;

Here we are defining the pre- and postconditions for the various operators by expressing them in terms of corresponding universal values, analogous to the way that denotational semantics defines the meaning of a program by defining a transformation from the concrete program notations to corresponding abstract mathematical objects.

Monday, December 28, 2009

ParaSail character, string, and numeric literals

Almost all programming languages these days have special syntax for character literals (generally using single quotes), string literals (generally using double quotes), and numeric literals (digits, plus a decimal point for floating point, and optionally some sort of radix indicator for octal, hex, etc.). Languages differ in how they distinguish among multiple string, character, or numeric types in the syntax for literals. ParaSail adopts the usual syntax for these literals, but treats them each as being of a particular universal type. Other "normal" types can provide conversions to and from these universal types, and thereby gain use of the corresponding literal notation.

The four basic kinds of literals in ParaSail, and their corresponding universal types, are as follows:

kind of literal	example	universal type
string literal	"this is a string literal"	Univ_String
character literal	'c'	Univ_Character
integer literal	42	Univ_Integer
real literal	3.14159	Univ_Real

The universal types can be used at run-time, but they are primarily intended for use with literals and in annotations. Univ_String corresponds to UTF-32, which is a sequence of 32-bit characters based on the ISO-10646/Unicode standard. Univ_Character corresponds to a single 32-bit ISO-10646/Unicode character (actually, only 31 bits are used). Univ_Integer is an "infinite" precision signed integer type. Univ_Real is an "infinite" precision signed rational type, with signed zeroes and signed infinities.

The universal numeric types have the normal four arithmetic operators, "+", "-", "*", "/". They both also have an exponentiation operator "**", with signed Univ_Integer exponents for Univ_Real and non-negative Univ_Integer exponents for Univ_Integer. Univ_Integer also has "mod" and "rem", corresponding to remainder operations, with "rem" being the remainder for "normal" truncate-toward-zero division, and "mod" being the remainder for truncate-toward-negative-infinity division.

Bitwise operators "and", "or", and "xor" are defined for non-negative Univ_Integers. "<<" and ">>" are for shifting Univ_Integers, with "<<" defined as equivalent to multiplying by the corresponding power of two, and ">>" defined as equivalent to dividing by the corresponding power of two, but with truncation toward negative infinity.

By providing conversions to and from a universal type, a "normal" type can support the use of the corresponding literal. These special conversion operations are declared as follows (these provide for integer literals):

operator "from_univ"(Univ : Univ_Integer) 
  -> My_Integer_Type;
operator "to_univ"(Int : My_Integer_Type) 
  -> Univ_Integer;

If an interface provides the operator "from_univ" converting from a given universal type to the type defined by the interface, then the corresponding literal is effectively overloaded on that type. The complementary operator "to_univ" is optional, but is useful in annotations to connect operations on a user-defined type back to the predefined operators on the universal types.

Annotations may be provided on the conversion operators to indicate the range of values that the conversion operators accept. So for a 32-bit integer type we might see the following:

interface Integer_32<> is
    operator "from_univ"
     (Univ : Univ_Integer {Univ in -2**31 .. +2**31-1}) 
      -> Integer_32;
    operator "to_univ"(Int : Integer_32) 
      -> Result: Univ_Integer {Result in -2**31 .. +2**31-1};
    ...
end interface Integer_32;

With these annotations it would be an error to write an integer literal in a context expecting an Integer_32 if it were outside the specified range.

Wednesday, December 16, 2009

ParaSail module details

ParaSail has two kinds of modules, interface modules and class modules. Every class module has a corresponding interface module. Every non-abstract interface module has at least one class module implementing it. All modules are effectively generic templates, in that they have formal module parameters which typically represent the types on which the module operates, or other module characteristics, represented by simple values. A type is produced by instantiating a module, which means specifying actual module parameters for each of the formal module parameters, or relying on the defaults associated with the formal parameters.

As indicated above, the typical formal module parameter is that of a type. Such a formal module parameter is of the form:

[type_identifier is] interface_name '<' actual_parameter_list '>'

for example:

Element is Assignable<>

String<>

If a type_name is omitted, then within the module, the interface_name is treated as a type name. If actual parameters are omitted, then any instance of the named interface is acceptable as the actual type. If one or more actual parameters are specified, then the actual type ultimately provided must have matching actual parameters, for those that are specified. Any actual parameters not specified are unrestricted.

Another kind of formal module parameter is a simple value:

value_identifier : type_name

value_identifier : interface_name '<' actual_parameter_list '>'
[ '{' constraint '}' ]

Simple values can be used to control selection among multiple classes that implement the same interface. A class may specify a constraint on one of its value parameters which is more restrictive than the constraint on the corresponding parameter for its interface. This class will only be used as the implementation if the actual value satisfies the given constraint. If the actual values given satisfy multiple classes implementing the same interface, then the class with the highest preference value is chosen. For example, an interface for a numeric type might have value parameters that specify the range or precision of numeric values that are to be represented by the numeric type. Generally the class that uses a faster or more compact representation would be preferred, so long as the specified range or precision is satisfied:

interface Float 
  <First : Univ_Real; Last : Univ_Real; 
   Digits : Univ_Integer := 6> 
is
    operator "+"(Left, Right : Float) -> Float;
    operator "-"(Left, Right : Float) -> Float;
    operator "=?"(Left, Right : Float) -> Ordering;
    ...
end interface Float;

Note that here we are presuming that Univ_Real and Univ_Integer are predefined, infinite precision numeric types, and Ordering is the predefined enumeration type returned by the comparison operator "=?" (see the blog entry on "Using the "=?" operator" for details).

The final kind of parameter is a function or procedure. This is intended to be a short-hand for constructing an interface with a single function or procedure in it, and then using that as a formal parameter. It might be used to specify a special comparison function to be used for sorting, when the desired sort should not use the normal comparison operator of the type. A parameter that represents a function or a procedure follows one of the following general patterns:

function function_identifier '(' formal_parameter_list ')' '->' formal_result
procedure procedure_identifier '(' formal_parameter_list ')'
operator operator_string '(' formal_parameter_list ')' [ '->' formal_result ]

Here is an example Sorting interface that has a formal module parameter that is a comparison function:

interface Sorting
   <Sortable is Array<>;
   operator "=?"(Left, Right : Sortable::Element) 
      -> Ordering>
is 
     procedure Sort(Arr : ref var Sortable);
     procedure Reverse_Sort(Arr : ref var Sortable);
end interface Sorting;

ParaSail end-of-scope operator

When an object goes out of scope in ParaSail it may require some amount of cleanup. For example, if the object provided access to some external resource, then when the object goes away, some release action on the external resource might be required. This is provided by the "end" operator. Here is an example of an interface for a Handle type, which will automatically release the associated resource at end of scope:

interface Handle<Filesystem<>> is
    function Create_Handle(
      FS : ref var Filesystem;
      Name : String) -> Handle;
    operator "end"(Obj : ref var Handle);
    ...
end interface Handle;

The compiler inserts a call on the "end" operator automatically at the end of the scope of an object. The compiler ensures that "end" is called no matter how the scope comes to an end. If an object is allocated from a region, then the "end" operator is invoked on the object immediately prior to deallocating the region. If an object has a value and is then assigned a completely new value, the "end" operator, if any, for its type is applied to the object prior to assigning it the new value.

In a composite object, the "end" operator is invoked on the enclosing object before invoking it on the component objects, so that the "end" operator of the enclosing object may safely refer to its components.

Wednesday, November 11, 2009

ParaSail concurrent interfaces

A ParaSail interface can be marked as concurrent, meaning that an object of a type produced by instantiating such a concurrent interface can be accessed concurrently by multiple parallel threads. We use the term concurrent type for a type produced by instantiating a concurrent interface, and concurrent object for an object of a concurrent type.

Both lock-based and lock-free synchronization techniques can be used to coordinate access to a concurrent object. In addition, operation queues are provided to hold operation requests that cannot be serviced until the concurrent object attains a desired state, as specified by the dequeue condition. When a thread initiates an operation request, if it can't be performed immediately because the condition is False, the thread is blocked until the request gets dequeued and the associated operation is completed, or the operation request is canceled for some reason, such as a timeout. In general, multiple operation requests may be queued simultaneously on behalf of a single thread, with the first one to be dequeued causing the automatic cancellation of the others.

With a lock-based concurrent interface, one of the operands of a procedure or function of the interface, if the operand is of the type defined by the interface, can be marked indicating it should be locked upon call. When calling the operation, the operand's lock will be acquired automatically upon entry to the operation, and the lock will be released upon return. Alternatively, one operand of a concurrent interface operation may be marked queued, meaning that the dequeue condition must be true before the operation will be performed, and when it is performed it will be locked automatically as well.

Within the concurrent class implementing a lock-based concurrent interface, the components of an operand marked locked or queued may be referenced directly, knowing that the access is single-threaded at that point. If there are other (non-locked/queued) operands of the associated concurrent type, their non-concurrent components may not be referenced by the operation. An operation with a queued operand has the further assurance that at the point when the operation starts, the dequeue condition is True. The dequeue condition for a given operation may depend on the values of the non-concurrent parameters, plus the internal state of the (concurrent) operand marked queued.

Within the concurrent class implementing a lock-free concurrent interface, all data components must themselves be concurrent objects, since no locking is performed on entry to the operations of the class. Typically these components will be of low-level concurrent types, such as a single memory cell that supports atomic load, atomic store, and atomic compare-and-swap (CAS), or a somewhat higher-level concurrent type that supports multi-word-compare-and-swap (MCAS). The ParaSail standard library provides a small number of such lower-level concurrent types to support implementing lock-free concurrent interfaces. The higher level queued operations are implemented using a lower-level race-free and lock-free concurrent type similar to a private semaphore, which can be used directly if desired.

Here is an example of a concurrent interface and corresponding concurrent class for implementing a simple bounded buffer:

concurrent interface Bounded_Buffer
  <Element_Type is Assignable<>;
   Index_Type is Integer<>> is
    function Create_Buffer(
      Max_In_Buffer : Index_Type {Max_In_Buffer > 0})
      -> Result : Bounded_Buffer;
      // Create buffer of given capacity

    procedure Put(Buffer : queued Bounded_Buffer; 
      Element : Element_Type);
      // Add element to bounded buffer; 
      // remain queued until there is room
      // in the buffer.

    function Get(Buffer : queued Bounded_Buffer)
      -> Element_Type;
      // Retrieve next element from bounded buffer;
      // remain queued until there is an element.
end interface Bounded_Buffer; 

concurrent class Bounded_Buffer
  <Element_Type is Assignable<>;
   Index_Type is Integer<>> is
    const Max_In_Buffer : Index_Type {Max_In_Buffer > 0};
    var Data : Array<Element_Type, Index_Type> := 
      Create_Array(1, Max_In_Buffer);
    var Next_Add : Index_Type 
      {Next_Add in 1..Max_In_Buffer} := 1;
    var Num_In_Buffer : Index_Type 
      {Num_In_Buffer in 0..Max_In_Buffer} := 0;
  exports
    function Create_Buffer(
      Max_In_Buffer : Index_Type {Max_In_Buffer > 0}) 
      -> Bounded_Buffer
      // Create buffer of given capacity
    is
        return (Max_In_Buffer => Max_In_Buffer);
    end function Create_Buffer;

    procedure Put(Buffer : queued Bounded_Buffer; 
      Element : Element_Type)  
      queued until 
        Buffer.Num_In_Buffer < Buffer.Max_In_Buffer 
      // Add element to bounded buffer;
      // remain queued until there is room
      // in the buffer.
    is
        Buffer.Data[Buffer.Next_Add] := Element; 
        // Advance to next element, cyclically.
        Buffer.Next_Add := 
         (Buffer.Next_Add mod Buffer.Max_In_Buffer) + 1;
        Buffer.Num_In_Buffer += 1;
    end procedure Put;

    function Get(Buffer : queued Bounded_Buffer)
      -> Element_Type
      queued until Buffer.Num_In_Buffer > 0 
      // Retrieve next element from bounded buffer;
      // remain queued until there is an element.
    is
         return Buffer.Data[
           ((Buffer.Next_Add - Buffer.Num_In_Buffer - 1)
             mod Buffer.Max_In_Buffer) + 1];
    end function Get;
end class Bounded_Buffer;

This concurrent interface has one constructor, plus two queued operations, Put and Get. The dequeue conditions are provided as part of the implementation of an operation with a queued operand. Here the dequeue conditions are Buffer.Num_In_Buffer < Buffer.Max_In_Buffer for Put, and Buffer.Num_In_Buffer > 0 for Get. These dequeue conditions ensure that when the operations are performed, they can proceed without an error.

An example of a lock-free concurrent interface will be given in a later post.

Monday, November 9, 2009

ParaSail region-based storage management

In ParaSail, heap storage is broken up into regions which are generally deallocated as a whole, rather than one element at a time. The concept of region-based storage management originated a decade or more ago, and was at the heart of the Cyclone programming language. The region-based storage management in ParaSail is a bit different, but is certainly inspired by all this earlier work.

To prevent dangling references, regions in ParaSail are reference counted. The reference counting is done at the region level rather than at the individual object level, since deallocation happens at the region-as-a-whole level. For a similar reason, rather than counting each individual pointer into the region, we (effectively) just keep track of the number of other regions that might contain objects pointing into the given region. Before storing a pointer from an object in region A to an object in region B, a check is made that region B is reachable from region A via one or more region-to-region "handles." The region-to-region handles form a reachability graph.

The programmer decides when to create a region-to-region handle, in anticipation of creating one or more object-to-object pointers. The operation for creating a region-to-region handle is "Allow_Reference(From_Region, To_Region)." After invoking this operation, objects in From_Region are allowed to point at objects in To_Region, or in other words, To_Region is reachable from From_Region. This cannot be revoked -- the region-to-region handles once created are immutable, until the From_Region goes away as a whole. Reachability is transitive, so after this call, anything reachable from the "To_Region" is now also reachable from the "From_Region."

Allow_Reference can create a loop in the reachability graph. Because the region-to-region handles are immutable, once such a loop is created, all regions in the strongly connected subgraph containing From_Region and To_Region will necessarily be deallocated simultaneously (their lifetimes will end at the same moment). From a reference-counting perspective, this means that the reference counting is really best done at the strongly-connected-subgraph level, and the possibly-cyclic region graph can be collapsed into a directed acyclic graph (DAG) between strongly-connected subgraphs of the original region graph.

In addition to region-to-region handles, there are also scope-to-region handles associated with local scopes of the program. To prevent pointers in local variables from ending up as dangling references, a pointer from a local variable to a region is only permitted if the region is reachable from the immediately enclosing program scope: A region is (directly) reachable from a given program scope if there is a scope-to-region handle in that scope targeting that region. Also, a region is reachable from a given scope if it is reachable from an enclosing scope. The transitivity of reachability completes the picture in the usual way, so that if region A is reachable from region B, and region B is reachable from the current scope, the region A is reachable from the current scope.

As with region-to-region handles, scope-to-region handles are immutable, so that once a region is reachable from a given scope, it will remain so until the scope is exited. The fundamental safety rule is that a region must not be deallocated if it is reachable from any still active scope of the program. Coupled with the requirement that local pointers may only point into reachable regions, this ensures that no local pointer can become a dangling reference. Similarly, a pointer in a region A that is reachable from an active scope, cannot become a dangling reference because it can only point at an object in a region B if B is reachable from A and, transitively, from the same active scope.

An alternative way of describing this notion of scope-to-region handles is that each scope has its own local region, and scope-to-region handles are actually region-to-region, the difference being that they go from a local region to a non-local, reference-counted region. Note that local regions would never be reference counted, since upon leaving the scope, they are always deallocated. For similar reasons, one would never allow a handle, nor a pointer from a non-local region to a local region, nor from the local region of an enclosing scope to a local region of a nested scope. So all in all, calling each scope a region may not actually simplify the model very much -- local regions would still have a number of special rules.

In addition to the two kinds of region handles we have described, there are two additional kinds of references that affect region lifetime -- strong references and weak references. Both of these are internally a pair of a region handle and a pointer to an individual object. The pointed-to object is guaranteed to be in a region reachable from the one designated by the handle.

The region handle in a strong reference is reference counted, meaning that as long as the strong reference exists, the designated region and any region reachable from it will not be deallocated. The big difference between the region handle in a strong reference and a normal region handle is that it is not immutable. A strong reference can be a variable, and may be reassigned to point at a different region/object. A normal region handle is immutable because it is "protecting" an arbitrary number of pointers that depend on it to prevent dangling references. By contrast, the handle in a strong reference is only "protecting" the single pointer with which it is bound, so there is no problem allowing it to be reassigned in conjunction with reassigning the pointer. To actually use a strong reference, the reference must be split into its constituent parts, producing a scope-to-region handle at the point of the use, along with a pointer protected by the scope-to-region handle in a local variable. Once the strong reference has been split out, the strong reference could then be reassigned to point elsewhere, or to point nowhere (i.e. set to null).

A weak reference is similar to a strong reference, except that it does not prevent the designated region from being deallocated. Whether the designated region has been deallocated is determined when the weak reference is split out. If the result of splitting the weak reference is a null region handle and a null pointer, then the designated region has at some point been deallocated, presumably due to the underlying storage manager running low on overall available storage.

A weak reference is useful for caching, by pointing to data that can be reconstructed when needed, but where reconstructing it is somewhat expensive -- for example, data coming from a file or a separate database, or data that is the result of a complex calculation. The weak reference can be put in a cache for recently used results, which is checked first when a result is needed. If a weak reference exists for the needed result, and after splitting turns out to be non-null, then the expense of reconstructing the result can be avoided. If is is null, then the data must be reconstructed.

More on regions in a later posting...

Tuesday, October 20, 2009

Using "=?" in ParaSail to implement "==", "!=", "<=", ">=", etc.

Here is a simple ParaSail feature. It is often the case that a user-defined type will have its own definition for equality and comparison. However, it is always a bit of an issue to make sure that "==" and "!=" are complements, and similarly for "<" and ">=", ">" and "<=". ParaSail skirts this issue by requiring the user to define only one operator, "=?", which returns a value from the enumeration Less, Equal, Greater, Unordered. The comparison and equality operators are defined in terms of "=?" (which is pronounced "compare"), in the natural way:

"==" -- Result of "=?" is Equal
"!=" -- Result of "=?" is not Equal
"<" -- Result of "=?" is Less
">" -- Result of "=?" is Greater
"<=" -- Result of "=?" is Less or Equal
">=" -- Result of "=?" is Greater or Equal

"Unordered" is used for types with only a partial ordering (or no ordering at all). "Ordered" is defined as the subtype containing only the values "Equal," "Less," and "Greater." The typical generic "Sort" operation would expect a "=?" operator whose result is guaranteed (via a postcondition) to be within the "Ordered" subtype.

The "=?" operator can be used explicitly if desired, which is often convenient when searching a binary tree:

    case Desired_Key =? Node.Key of
        Equal :   return Node;
        Less :    return Search(Node.Left, Desired_Key);
        Greater : return Search(Node.Right, Desired_Key);
    end case;

This would be another situation where the "=?" would be required to return an Ordered result, since we don't have an alternative that handles "Unordered."

Saturday, October 17, 2009

ParaSail's implicit parallelism

As mentioned in the original overview of ParaSail, implicit parallelism is at the heart of ParaSail (and of its name!). Every procedure/function call with multiple parameters involves implicit parallelism, in that all of the parameters are evaluated in parallel. Handoff semantics is used for writable parameters, meaning that a (non-concurrent) object that is writable by one parameter evaluation, isn't available to any other parameter evaluation. Operations on concurrent objects are not ordered.

Furthermore, in a sequence of statements, the only default limitation on parallelism is that imposed by operations on non-concurrent objects. Parallelism can be inhibited by using ";;" instead of merely ";" to separate two statements. This forces sequencing for operations on concurrent objects, which would otherwise not have any specified order. By contrast, "||" can be used to indicate that parallelism is expected, and it is an error if there are conflicting operations on non-concurrent objects in the statements on either side of the "||". That is, two statements separated by "||" are executed effectively in parallel, just as though there were embedded within two different parameter evaluations to a single procedure or function call.

For example:

    X := F(A, B);
    Y := G(B, C);;
    Z := H(R, S) || Z2 := Q(R, T);

As implied above by the formatting, "||" has highest precedence (binds most tightly), ";" next, and ";;" lowest (binds least tightly). So in the above, the assignments to X and Y are ordered only if that would be required by the shared use of B (at least one of them has B as a writable parameter). The assignments to Z and Z2 occur in parallel, and it would be an error if the shared use of R is unsafe (e.g. because R is writable by one or both, and is not a concurrent object). The assignments to X and Y will complete before beginning the assignments to Z and Z2 (though of course, a "very intelligent" compiler might still find parts that can be executed in parallel because they can't possibly affect one another). In all cases the evaluations of the parameters to a single procedure/function call happen in parallel.

Parentheses can be used to override the precedence for the statement separators:

    (M := N(A, B) ;; P := T(A, X)) ||
      (J := K(X, Y); M2 := N2(Y, Z));

In the above the assignment to M completes before the assignment to P starts. The assignments to J and M2 occur in parallel unless the shared use of Y makes that unsafe. And the assignments to M and P occur in parallel with the assignments to J and M2, and it is an error if there are conflicts due to shared use of X.

Note that ";" is both a statement separator and a statement terminator. On the other hand, ";;" and "||" are only meaningful as statement separators.

Implicit parallelism also shows up in ParaSail loops. The iterations of a for loop in ParaSail are executed as though each iteration were separated from the next by a ";", but with no particular ordering on the iterations. A particular ordering may be specified by using the forward or reverse keyword, which effectively puts a ";;" between the iterations. Alternatively, the concurrent keyword may be used to indicate that the iterations should be effectively separated by "||", in which case it is an error if there are any conflicts due to operations on a non-concurrent variable. For example:

    // compute sum of array A 
    for I in 1..10 loop
        Sum += A[I];
    end loop;

    // find last element equal to Z
    for I in 1..10 reverse loop
        if A[I] == Z then
             Last_Z := I;
             exit loop;
        end if;
    end loop;

    // sum the arrays B and C to produce A
    for I in 1..10 concurrent loop
        A[I] := B[I] + C[I];
    end loop;

Not too surprisingly, an exit loop statement is only permitted in a forward or reverse loop.

As illustrated above, parallelism is the default in ParaSail. It is our belief that that is the only way that programmers will learn to take full advantage of the multicore world. That is, in ParaSail, it is more work to create non-parallel algorithms than to create parallel ones, so programmers, being fundamentally a lazy lot, will end up writing mostly parallel algorithms in ParaSail.

We will cover ParaSail's structured synchronization mechanism (concurrent interfaces) in another post.

Monday, October 12, 2009

ParaSail has no global variables

ParaSail has no global variables. So how does that work? Global variables have long been recognized as trouble makers in the software world, and they get much worse in a language with lots of parallelism. But can we eliminate global variables completely? ParaSail tries to do that by eliminating any notion of statically allocated variables. In C or C++, global variables are those marked static or extern, or in a language with modules/packages like Ada, global variables are those declared at the module level. Although their declaration might be hidden, such variables still represent global, variable state, and still create trouble.

In ParaSail, interface modules are generic types, and class modules implement interfaces. That is, after providing generic parameters to an interface, you end up with a type. Any variable declared in a class is what is often called an instance variable, that is, there is one per object of the type defined by the class. There is nothing corresponding to what are called class variables. In other words, the variables declared in a class will always be effectively fields/components of some enclosing object. You can of course also have local variables inside the body of a procedure or function, but these are never statically allocated; they are stack-resident variables only (automatic in C parlance).

So what do we have instead? In many cases, we simply substitute local variables declared in some relatively high-level procedure or function body, which are then passed (by reference) to other operations as parameters. In ParaSail, passing a variable as a parameter generally has handoff semantics, meaning that it can't be simultaneously passed to multiple operations that might execute in parallel, nor can it be passed twice to the same operation as two different parameters (thereby eliminating certain kinds of aliasing problems).

When it is necessary to have a data structure that can be accessible to multiple operations running in parallel, the data structure must be explicitly designed for concurrent access. In other words it must be a concurrent data structure. In ParaSail, an interface may be specified as a concurrent interface, and then all objects of a type defined by the interface support concurrent access.

Containers represent a kind of middle ground, where the enclosing container might be designed for concurrent access, but the individual elements might not be, and so handoff semantics would again apply to the individual elements.

There may be some concurrent interfaces that typically have exactly one object; that is they are typically singleton interfaces. For example, the file system, or the database manager, or the system clock. ParaSail provides a conceptual global singleton Context interface for these, whose singleton object has one component for each such singleton interface, identified using the name of the interface. The Context must appear as a parameter for operations that ultimately need access to one of these singleton interfaces, signifying that the operation has potential side effects on these singleton objects. It would always be preferable to pass in the actual singleton objects needed, but as a convenience, the Context pseudo-object is available.

The Context pseudo-object is provided as a parameter to the main entry point of a ParaSail application, so that it can gain access to these singletons and pass them on to callers. If an operation has a Context parameter, and its caller also has a Context parameter, then the caller's parameter is passed automatically to the operation if not specified explicitly. The Context parameter may be read-only ("in") or read-write ("in-out"/"var"). Not surprisingly, the actual Context parameter must be read-write if the formal is read-write.

Saturday, October 3, 2009

ParaSail Topics to come

The prior entries gave an overview of ParaSail. Over the next few weeks I hope to write additional entries about:

ParaSail has no global variables, so how does that work?
how iterators, streams and for-loops are interrelated in ParaSail;
using a single compare primitive ("=?") in ParaSail to implement "==", "!=", "<=", ">=", etc.
ParaSail's implicit parallelism and structured synchronization;
ParaSail private interfaces and interface namespace hierarchy;
ParaSail's use of region-based storage management rather than object-at-a-time garbage collection;
ParaSail's use of compile-time checks (mostly), run-time checks (goal is to never need them), and exceptions (only when complexity or race-conditions make them necessary);
how attributes like taintedness (i.e. might this data come from a malicious source?) or physical units (e.g. meters/second, years, ...) can be "layered" onto existing types, along with appropriate pre- and postconditions, to provide additional security, safety, or correctness checking. Add-on attributes can also be associated with statements or other ParaSail constructs, such as an attribute indicating whether a given statement or operation is potentially blocking (that is, it might wait indefinitely on some queue or some other synchronization structure), or the total amount of stack space required by a given statement or operation.

So stay tuned. If you as a reader have other topics you would be interested in seeing further elaborated, please post a comment to that effect.

Thursday, October 1, 2009

ParaSail pass-by-reference

In ParaSail, operations like array indexing generally take and return a ref. What exactly is a ref? The intent is that it is roughly an address, but we would also like to use them as representing an arbitrary place in a more general container-like structure (such as a hash table). And we might want to use them to query whether an object is present, without actually inserting into the container as a side-effect. So it might be useful to think of a ref as a special kind of object with three operations:

fetch contents
store contents
check if present/exists/is-initialized

One could define the "Ref" interface roughly as follows:

  abstract interface Ref<Target_Type is Any<>> 
  is
      function Exists(R : Ref) -> Boolean;
      function Fetch(R : Ref {Exists(R)}) -> Target_Type;
      procedure Store(R : Ref; 
        New_Contents : Target_Type) {Exists(R)};
  end interface Ref;

A ref is sort of a place-holder, or perhaps a prescription for where to store an object, without necessarily making room for the object until it is actually stored. For a hash-table like container, a ref to one of its elements might be the key value, which isn't actually inserted into the hash-table until the Store occurs. One could imagine a hashed-mapping or a sparse array where you specify a "default" value for elements that have never been stored, and for such a structure, Exists would always return True, and Fetch would return the default value if no prior Store into the same ref had occurred. And conceivably Store would be a no-op if the default value is being stored and no prior non-default value had been stored.

Another characteristic of a ref is whether the target object can be written, or only read. That is, a read-only Ref would not have the Store operation. It is desirable if the read-only-ness of a ref automatically flows through a function that takes the ref as a parameter. That is, if the parameter to a function is a read-only ref, then clearly if the function returns a ref, then that returned ref should also be considered read-only.

This flow-through is common in many programming languages, where if an array is a constant, then so are its elements. It is desirable not to have to define two versions of operator "[]", one for read-only refs, and one for writable refs, since presumably they will have an essentially identical implementation. This argues for using ref as a parameter only when this flow-through semantics is desired, and presumably there will be an output that is also a ref. The read-only-ness of the result ref would be determined from that of the ref parameter(s), and only if all of the ref parameters are writable would the result ref be writable.

If we don't want this flow-through, and instead really want to specify that a parameter be writable, then we might as well use something more explicit such as "var" or "in out" for such a parameter mode.

Rather than building in the flow-through rule, we could instead add another query into our hypothetical Ref interface, namely Is_Writable, and then use appropriate pre- and postconditions to constrain how writability flows through. E.g. the postcondition for operator "[]" would be {Is_Writable(Result) = Is_Writable(Arr)}. Clearly also the precondition of the Store procedure would be {Is_Writable(R)}. This approach sounds more flexible, but it may be that we want the above flow-through rules as the default in the absence of some other specification.

Finally, we might want some default rules about whether Exists() should be True for a given ref parameter or result. One would think that in general one would expect Exists(X) to be true if X is an incoming ref parameter, but one would not necessarily expect Exists(Result) to be true if Result is a ref result of an operation like "[]". That is, the container should exist if you are indexing into it, but the selected element of the container doesn't necessarily exist just because you have used its key to index into the container.

Note that this general notion of Ref as an object in its own right would support object persistence fairly naturally, in that the Ref could be some kind of disk address, and not until you actually invoked Fetch or Store would you actually need to be sure that the relevant page of data is in memory. These Refs are analogous to the "smart pointers" of C++, but they have the added advantage that they automatically have limited lifetimes (like "regular" C++ references), so they won't outlive the container into which they refer.

When inside a function, the non-ref result(s) of the function are in fact probably represented as refs provided by the callers, indicating where the results should be stored. Such a ref would want to carry information about what region (in the region-based-storage-management sense) the result should be stored/built, so region can be thought of as another thing that is associated with a ref. These are sort of write-only refs, or at least ones where the function wouldn't want to presume that Fetch is well-defined before performing at least one Store via the ref. That is, Exists() would quite likely be False on entry to the function for each of the refs representing its results (aka "out" parameters).

Note that a general Ref object could also be used to represent an element of a packed array, where Fetch and Store would do the right thing in terms of unpacking/packing. Of course this is getting dangerously close to pass-by-name of Algol 60 fame, and clearly sounds like it would require a thunk or equivalent. But a (static or dynamic) polymorphic object and a thunk are not that much different anyway. The expectation would be that the types of these Ref objects would often be statically known, so a compiler could choose to inline as appropriate to achieve the desired level of performance.

We might want to generalize Ref objects further by adding Create_Referenced_Object and Delete_Referenced_Object operations, since we already have an operation for testing whether the referenced object Exists. This would make sense for a Ref to an element of a hash table, for example. Essentially for any Ref where Exists might return False, it might make sense to be able to Create or Delete the referenced object. In the above discussion, "Store" is presumed to create the referenced object if necessary, which is perhaps adequate for our purposes. But Delete would be useful as a way to restore the original state where Exists is False. This would only make sense for Refs where Exists can return False.

Wednesday, September 30, 2009

ParaSail extension, inheritance, and polymorphism

ParaSail fully supports object-oriented programming, including interface and class extension/inheritance/reuse, and both static (compile-time) and dynamic (run-time) polymorphism.

Interface extension is the most straightforward. One interface can be defined to extend another, meaning that it inherits all of the operations and generic parameters of some existing interface, and optionally adds more of each:

  interface Extensible_Array extends Array  is
     operator "[]"(Arr : ref Extensible_Array;
        Index : Index_Type {Index >= First(Arr)})
         -> ref Element_Type {Last(Arr) >= Index};
  end interface Extensible_Array;

Here we have essentially the same operation, but now the array automatically grows (at only one end) if the indexing operation is applied to it with an index that is greater than the prior value of Last(Arr). Note that the generic parameters for Extensible_Array are all inherited as is from Array.

The operations that are not redeclared in the new interface are inherited from Array, but with each occurrence of Array replaced systematically with Extensible_Array.

Unless Extensible_Array is declared as an abstract interface, there must be a corresponding Extensible_Array class that provides its default implementation. The Extensible_Array class might be defined as an extension of the Array interface (and indirectly its associated class), but it need not be. It could be a completely different implementation. Here is what it would look like if it were an extension of the Array interface.

  class Extensible_Array extends Parent: Array
  is
    ... 

  exports 
    operator "[]"(...) -> ... is ... end "[]"; 
    function Create_Array(...) -> Extensible_Array is ... end Create_Array;
  end class Extensible_Array;

Local operations and objects could be defined as needed, and exported operations could be redefined as appropriate. An object of the interface being extended (the underlying interface) is created as a local variable of the class (this variable is the underlying object). A name can be given to the underlying object -- we use "Parent" above.

Note that the class is extending an interface, rather than directly another class, and in fact any implementation of the interface can later be optionally specified when using Extensible_Array. Effectively the actual class to use as the implementation of the underlying interface becomes another optional generic parameter. If unspecified, it defaults to the default implementation of the underlying interface.

For those operations of the interface that are not provided explicitly in the class, an inherited operation is provided, based on the corresponding operation of the underlying interface. The implementation of this inherited operation invokes the corresponding operation of the Array interface, passing it the underlying Array object of each operand of type Extended_Array.

An exported operation that has any results of the type of the extended interface must always be explicitly defined, because the corresponding operation of the underlying interface returns the wrong type of object (that of the underlying interface), and there is no implicitly provided mechanism for creating an object of the extended interface given an object of the underlying interface. Hence we are obliged to explicitly define Create_Array, because in the underlying interface it returned a result of type Array, while to implement the extended interface it has to return a result of type Extended_Array. Of course inside the new Create_Array we could (and probably would) call the Create_Array operation of the underlying interface to create the underlying object, and then presumably take additional actions needed to finish creating an Extended_Array.

Although an interface may extend any number of other interfaces, a class can only extend one interface. Of course it may have local variables of many different interfaces, but only one of them is the underlying object, and only inherited operations corresponding to operations on the underlying interface will be automatically provided.

One important thing to note about how ParaSail implementation inheritance works. It is a completely black box. Each implicitly provided inherited operation calls the corresponding operation of the underlying interface, passing it the underlying objects of any operands of the type being defined. The underlying interface operation operates only on the underlying object(s), having no knowledge that it was called "on behalf" of some extended class. If the underlying operation calls other operations, they too are operating only on the underlying object(s). There is no "redispatch" on these internal calls, so the extended class can treat these underlying operations as black boxes, and not worry that if it explicitly defines some operations while inheriting others, that that might somehow interact badly with how the underlying operations are implemented.

Which brings us to polymorphism. Static, compile-time polymorphism has already been illustrated, through the use of generic parameters, and the name resolution rules. We also see it here in that the particular class implementing the underlying interface for an extended class can also vary depending on the particular instantiation of the extended class. Although we didn't mention the syntax for that, it makes sense for it to use the extends keyword as well at the point when we are providing the actual generic parameters:

    var XArr : Extended_Array 
       extends Sparse_Array := Create_Array(...);

Now what about dynamic, run-time polymorphism? When do we actually need it? A classic example is that of a heterogeneous tree structure, where all of the nodes in the tree are extensions of some basic Tree_Node type, but each is specialized to carry one particular kind of information or another. Static polymorphism is great for creating general purpose algorithms and data structures, but they are inevitably homogeneous to some degree. The algorithm manipulates numbers all of the same precision, or the data structure holds elements all of the same type. With dynamic polymorphism, heterogeneity rules.

In ParaSail, dynamic polymorphism is indicated by appending a "+" to a type name. What this means is that the corresponding object, parameter, or result might be of a type coming from any extension of the given type's interface, with the proviso that the generic parameters inherited from this original interface have the same bindings for the two types. That is important because, without that, the operations shared between the two types would not necessarily take the same types of parameters.

To illustrate, presume we give a name to the student-array type we have been using in some of the above examples:

  subtype Student_Array is Array<String, Student_ID>;

Now if we indicate a parameter of type "Student_Array+" then we can take parameters of type Student_Array but also objects of type "Extended_Array<String, Student_Id>" and "Sparse_Array<String, Student_ID>". Similarly, if we indicate a result of "Student_Array+", then the object returned might be of type Student_Array, or it might be of any other "similar" type. A Student_Array "factory" might typically specify its result type as "Student_Array+" to allow run-time flexibility in choosing which particular type of array-like structure to return.

So now we have a polymorphic Student_Array+ object, say "SA_Plus," what can we do with it? We can of course pass it to operations that take a parameter of type Student_Array+. But what about all those nice operations of Array? They take operations of type Array, not Array+. What happens if we try to pass SA_Plus to one of those, say "First(SA_Plus)" or "SA_Plus[Joes_Id]"? This is where dynamic, run-time polymorphism kicks in, and we go to the "appropriate" implementation of "First" or operator "[]", based on a run-time type identification carried around with objects of a polymorphic type.

If we happen to have a binary operation in our interface Array, for example Merge, which is defined to take two Array operands, and somehow combine them into a result Array, how does that work? The requirement is that either all or none of the operands are polymorphic, and if all are polymorphic, then they must have the same run-time type identification (henceforth "type-id"), and the result will also be polymorphic but will be guaranteed to similarly have that same type-id. Since all of the operands are required to have the same type-id, that type-id determines which particular implementation of the operation is actually invoked at run-time.

All of this only applies to operations declared in the interface defining the root type of the polymorphic type (Array in this example). Operations declared elsewhere that take some particular array type have no special semantics. But that makes sense because only the operations defined in the root interface are guaranteed to be present in all of the root interface's extensions.

What about operations that return an object of the enclosing interface type, but don't take any operands of the type? This would be typical for a constructor function, such as Create_Array()->Array, or for something resembling an enumeration literal like "Red" for a color type. Can such an operation be called in a "polymorphic" fashion? Yes, if it is used as an operand in a call and some other operand in the call is polymorphic, then the rule that requires all such operands to have the same type-id is used as a kind of self-fulfilling prophecy: the type-id to use for the call on Create_Array is that of the other operand. So if we use our hypothetical binary operation Merge and one of the operands is our polymorphic object SA_Plus, and the other is Create_Array(...), then we call the Create_Array associated with the type-id we get from SA_Plus, and then proceed to call Merge on the SA_Plus and the result of Create_Array, confident in the knowledge that the two operands will now have the same type-id.

This approach ensures that when you get into the implementation of a binary operation, the operands have the same type, and it is the type of the interface whose operation is actually being invoked.

That's probably enough for now!

Monday, September 28, 2009

Resolving names in ParaSail

In ParaSail, the names exported by an interface can in most cases be used without any explicit qualification of the interface from which they came. Hence, continuing with our Array interface example, we can generally write "First(Arr)" without having to specify which "First" function we mean, presuming we know the type of Arr. If qualification is necessary, the "::" notation of C++ is used, such as "Array::First(Arr)". ParaSail reserves "." for selecting a component of a composite object, such as "Arr.First".

ParaSail's name resolution works roughly as follows: Going bottom up, if you have a simple name (not used as the name of an operation being called), and that name is declared in the current scope, you use that interpretation. If you have an operation call, you determine the types of as many parameters as you can, and then look in the interfaces defining each of those types, for operations of the given name. If you find one or more that "work," then you consider all of those possibilities. If there are some operands which have no interpretations yet (perhaps because the operand is actually a parameterless call or an equivalent interface constant), then the type expected for that operand is used to help resolve the operand. If you get to the "top" and you have exactly one interpretation that "works," you use that one. If you have no interpretations, or you have more than one, the construct cannot be resolved, and an error is indicated. This sort of overload resolution sounds more complicated than it typically ends up being, because most constructs are relatively simple.

How a ParaSail class implements its interface

In ParaSail, a class implements an interface. Unless an interface is marked as abstract, there is required to be a class with the same name as the interface, which provides the default implementation of the interface. There may be multiple implementations of an interface. Some may be specializations of the default. Specializations specify certain restrictions on the generic parameters, along with a preference level, which allows the compiler to choose the appropriate implementation automatically if not specified explicitly.

Here is a class implementing the interface Array shown in an earlier post:

 class Array < Element_Type is Assignable<>;
                    Index_Type is Discrete<>>
 is 
   const First : Index_Type;
   const Last : Index_Type;
   function Length(Arr : Array) return Integer is
     return Last-First+1;
   end Length;
   var Data : Length(Array)*Element_Type;    
 exports  
   function First(Arr : Array) -> Index_Type is
     return Arr.First;
   end First;
   function Last(Arr : Array) -> Index_Type is
     return Arr.Last;
   end Last;
   operator "[]"(Arr : ref Array;
     Index : Index_Type {Index in First(Arr)..Last(Arr)})
       -> ref Element_Type is
     return Data[Index-First];
   end "[]";
   function Create_Array(First, Last : Index_Type)
       -> Result:Array
           {First(Result) = First & Last(Result) = Last}
   is
     return (First => First, Last => Last, Data => ());
   end Create_Array ;
   ...
 end class Array;

A class must include implementations for each operation in its interface, plus any number of local declarations as necessary to implement the exported operations. The exported declarations inside a class follow the word exports. Declarations preceding exports are local to the class. Even though the interface associated with the class implies what operations are exported, we make the distinction explicit in the class itself, since changing the specification of an exported declaration has a much larger implication than does changing that of a local declaration. Also, by segregating the exported declarations at the end, we make it easier to find them. Finally, because a class might implement interfaces other than its own, the exported declarations allow it to fulfill the set of operations of other interfaces. C uses "extern" vs. "static" to make a similar distinction. Java uses "public" vs. "private".

Note that the above example is being pretty fast and loose with syntax and semantics. But at this point, ParaSail's design isn't complete yet, so any examples should be taken with a grain of salt. Nevertheless, it seems useful to try to produce examples. It is guaranteed that the examples won't all be consistent with one another, since the syntax and semantics will continue to evolve during this design process. And part of the reason for producing examples is to try out the evolving ideas, and see how they look. One good concrete example can often shatter or confirm many days of abstract philosophizing.

More on ParaSail interfaces

Here is the ParaSail example of an Array interface from the last installment:

 interface Array < Element_Type is Assignable<>;
                    Index_Type is Discrete<>>
 is
   function First(Arr : Array) -> Index_Type;
   function Last(Arr : Array) -> Index_Type;
   function Element(Arr : ref Array;
     Index : Index_Type {Index in First(Arr)..Last(Arr)})
       -> ref Element_Type;
   function Create_Array(First, Last : Index_Type)
       -> Result:Array
           {First(Result) = First & Last(Result) = Last};
   ...
 end interface Array;

To make indexing into an Array look more familiar, we might want to use the "[]" operator instead of the function Element:

  operator "[]"(Arr : ref Array;
     Index : Index_Type {Index in First(Arr)..Last(Arr)})
       -> ref Element_Type;

Now we could use this interface roughly as follows:

   var Names: Array<String, Student_ID> := 
     Create_Array(First => First_Student_ID,
                  Last => Last_Student_ID);
   Names[ID_Of_Mike] := "Mike";

This kind of thing should look pretty familiar.

A few things to note: There is no separate notion of a constructor. A function that returns an object of the type defined by the enclosing interface is effectively a constructor. A function that returns a ref is not a constructor, but may be used on the left-hand side of an assignment, provided the ref parameter(s) to the function can be used on the left-hand side. Parameters are by default read-only, but ref parameters can be updated. The result of a function is specified after a "->". If the function has multiple results, they are enclosed in parentheses in the same way as the (input) parameters:

  function Two_Outputs(X : Input_Type) -> 
         (Y : Output_Type1; Z : Output_Type2);

Once we start talking about statements, we will see that in ParaSail, as in C++ and Java, declarations and executable statements may be interspersed. All declarations start with a reserved word, such as var, const, interface, subtype, etc. so they are easy to distinguish from assignment statements, procedure calls, etc.

Next time we will discuss how a class implements an interface.

Modules in ParaSail

Since ParaSail doesn't really exist at this point, I should probably write everything in the future tense. But instead I will use what my friends call the software present tense which is used to describe how it ought to work and hopefully will work someday, if that day ever comes...

Another thing I am somewhat uncomfortable with is using the first person ("I") so much in this description, so I will probably use "ParaSail" as the subject rather than "I" in many sentences. "We" seems a bit presumptious, though in a scholarly paper seems better than "I", even for a solo researcher. But in a programming language description, one really hopes that the language quickly comes to exist independently of any creator or creators. So (more than) 'nough said. On to the technical content:

ParaSail has only two kinds of modules: interface modules and class modules. A class module implements one or more interface modules. Perhaps a class ought to be called an implementation module, but that is an awfully long word, and in many ways a class module matches what languages like Simula and C++ and Java and C# and Eiffel call a class. In general ParaSail tries to use familiar terminology in familiar ways. Of course there is danger of confusion, but hopefully the benefits of familiarity outweigh the dangers of confusion.

Both interface and class modules are parameterized, essentially generic templates, as used in Ada, C++, Java, etc. Because of their wide-spread adoption, ParaSail uses "<...>" for generic parameters. While we are talking about notation, ParaSail uses "(...)" for normal function/procedure parameters, uses "[...]" for selecting from an array or similar container. ParaSail uses words for bracketing control flow (e.g. "if ... then ... else ... end if;"). ParaSail uses "{...}" for annotations such as constraints, assertions, pre/post-conditions, etc. This is intended to be familiar from Hoare logic, which uses the notation "{P} S {Q}" to represent a statement S with precondition P and postcondition Q.

Interface modules have the following basic syntax:

    interface name < generic_parameters >
    is
        interface_items     
    end interface name;

The interface_items comprise a sequence of function, procedure, and nested interface specifications. In addition constants and variables may be declared, but these are considered equivalent to corresponding functions. A constant is equivalent to a parameterless function returning an object of the constant's type. A variable is equivalent to a function with one by-reference parameter of the enclosing interface type, returning a by-reference object of the variable's type. In addition, operators may be declared. Operators are essentially functions with special syntax for calling them.

As subtly implied above, an interface defines a type, and within the interface module, the interface's name represents that type. A type is essentially an interface that has been bound to a set of actual generic parameters. There are also subtypes in ParaSail, which are types with some additional constraints (such as a range limitation), specified inside "{...}".

Generic parameters are themselves either types derived from specified interfaces, or static constants used to parameterize the implementation of the interface, and/or select among different implementations of the same interface. A generic array interface, with its generic parameter list, might look like this:

 interface Array < Element_Type is Assignable<>;
                    Index_Type is Discrete<>>
 is
   function First(Arr : Array) -> Index_Type;
   function Last(Arr : Array) -> Index_Type;
   function Element(Arr : ref Array;
     Index : Index_Type {Index in First(Arr)..Last(Arr)})
       -> ref Element_Type;
   function Create_Array(First, Last : Index_Type)
       -> Result:Array
           {First(Result) = First & Last(Result) = Last};
   ...
 end interface Array;

Note that there is no implicit parameter in ParaSail (identified as this or self in many object-oriented languages). All parameters are explicit. More about this example in the next installment.

Saturday, September 26, 2009

ParaSail language themes and philosophy

So what will make ParaSail an interesting programming language? What is the philosophy behind the language? ParaSail tries to minimize implicit operations, implicit parameters, implicit dynamic binding (virtual function calls), implicit initializations, implicit conversions, etc. This is both in the name of clarity for the human reader, and in the name of formal testability and verifiability. ParaSail uses a small number of concepts to represent all of the various composition mechanisms such as records, packages, classes, modules, templates, capsules, structures, etc. Arrays and more general containers are treated uniformly.

On the other hand, ParaSail allows many things to proceed in parallel by default, effectively inserting implicit parallelism everywhere. Parameter evaluation is logically performed in parallel. The language disallows uses that would make the result depend on the order or concurrency of parameter evaluation. The iterations of a for loop are by default executed in parallel. Explicit ordering must be specified if it is required by the algorithm. Even sequential statements are essentially converted into a data-flow based DAG which is then evaluated in parallel in so far as possible. In all cases, the language disallows code that could result in race conditions due to inadequately synchronized access to shared data, either by using per-thread data, structured safe synchronization, or a handoff semantics (similar to that of linear types, distributed languages like Hermes, or the UVM virtual memory system).

Of course much of this implicit parallelism is possible in pure functional languages, and ParaSail will support a functional programming style where it works naturally. Unfortunately, doing some relatively straightforward things in pure functional languages can require mind-bending (for me) concepts like monads, while a good old assignment statement is something that most programmers understand. With apologies to the political world, ParaSail will try to make assignments statements safe, legal, and rare.

Why blog about designing a programming language?

So why did I decide to start this blog? Designing a new language is a long process, but it is hard to find someone who is willing to sit down and discuss it. Those who like to design languages generally have their own strong biases, and the discussions tend to be more distracting than satisfying, because there are so many good answers to any interesting language design question. Those who don't have any interest in designing a new language tend to lack the patience to even talk about it, as they believe we already have more than enough programming languages, and all we need are better tools, better processes, and better training to be more productive and to achieve higher quality.

So writing a blog seems like a nice way to record the process, to perhaps get some feedback (though that may be optimistic), and to hopefully make progress by being forced to actually get the ideas down onto "paper."

That's probably enough meta-discussion for now. In the next post I plan to dive into the technical issues.

Friday, September 25, 2009

Why design a new programming language?

So why would anyone want to design a new programming language? For some of us who have the bug, it is the ultimate design project. Imagine actually creating the language in which you can express yourself. But there is another reason. I have been in the software business for over 40 years, and despite everything that might have been said to the contrary, I still believe that a well-designed programming language can result in more productive programmers building higher quality software. In the particular area of high-integrity software, including both safety-critical software and high-security software, there is all the more reason to use the very best programming language you can, because the problems you are trying to solve and the level of quality required is at the very limits of what can be accomplished.

This new language is meant to address the goals of producing inherently safe and secure software, while taking advantage of the wider availability of true parallel processing in the form of multi-core chips. It is intended to promote a formal approach to software, where the program text includes pre- and postconditions, liberal use of assertions and invariants, etc., with tool-supported proof of correctness with respect to the formal annotations.

The language is tentatively named ParaSail, for Parallel Specification and Implementation Language. I would have spelled it "ParaSAIL" but for the danger of confusion with the original Stanford AI Language, "SAIL," and its more modern follow-on "MAINSAIL" (for Machine Independent SAIL). I don't mind making the connection with SAIL, as it was a very interesting language in its day, and MAINSAIL remains worth a look today. ParaSail is a completely new language, but it steals liberally from other programming languages, including the ML series (SML, CAML, OCAML, etc.), the Algol/Pascal family (Algol, Pascal, Ada, Modula, Eiffel, Oberon, etc.), the C family (C, C++, Java, C#), and the region-based languages (especially Cyclone). Perhaps one significant deviation from the excellent baseline established by SAIL, ML, Eiffel, Java, etc. is that ParaSail is intended to avoid "fine-granule" garbage collection in favor of stack and region-based storage management.

So why a blog? I guess I'll leave that to the next post.

Designing ParaSail, a new programming language