Monday, July 19, 2010

Pointer-free primitives for ParaSail

Having pretty much settled on eliminating explicit pointers from ParaSail (largely to simplify compile-time alias analysis and race-condition elimination), the question is: what sorts of primitive data structures are needed in ParaSail to allow the user to construct essentially arbitrary abstractions on top of the primitives without using explicit pointers? The ParaSail interface/class module already provides a basic kind of basic record or struct capability.  As mentioned in an earlier posting, we have also suggested the use of the keyword optional to signify that an object might or might not have a value of its specified type, with its default being a null value.  Internally this could be implemented either with an extra bit to signify null-ness, or with an implicit level of indirection with a null pointer used to indicate null-ness (some indirection is clearly necessary for a recursive data structure). 

By using the optional keyword and the basic record capability provided by a class, the user can quite easily construct structures such as linked lists and binary trees.  However, more complex structures will have to end up falling back on a generalized notion of container with pointers replaced by the use of generalized indexing.  For example, a directed graph structure can be represented by a vector of nodes with the edges between the nodes being represented by indices into the vector.  To some extent this is going back to a Fortran array-oriented view of the world.  However, with the arrays being extensible, and the elements of the arrays being arbitrarily complex objects, which themselves are potentially optional and extensible, there is significantly more flexibility available to the programmer than what the original Fortran arrays provided.

As a first stab at what primitive container types are needed, it would seem that a basic array type, indexed by integers, and extensible only by whole-array assignment with a longer or shorter array, might be adequate.  The elements of the array would be of an arbitrary type, including user-defined types.  The whole-array assignment would deal with whatever storage management is necessary to reuse or reclaim the prior value of the object being assigned.  The compiler would attempt to arrange to have the new value be created in the storage region associated with the object to which it will be assigned, so as to avoid having to physically move the new value into that storage region as part of the assignment (see earlier post on region-based storage management).  For example, given:
    var X : Basic_Array<...> := Build_Array(Length => 10);
    ...
    X := Extend_Array(X, New_Length => 20);
the compiler would arrange for the calls on Build_Array and Extend_Array to create their resulting array in a storage region appropriate for X, so that the assignment need not move the created array into that region.  Essentially, as part of being called, a function would receive information about the region in which its result(s) should be allocated.  This information would be propagated as necessary to try to minimize the unnecessary movement of data between storage regions as part of an assignment operation.

It would seem that with this combination of the record-like capability provided by classes, the notion of optional, and something like this Basic_Array type extensible only by whole-array assignment, we have all of the building blocks needed to create arbitrary data structures in ParaSail without the use of pointers.

3 comments:

  1. Hi. Could you give an example in which pointers lead to race-condition and pointers elimination lead to race-condition elimination?

    ReplyDelete
  2. To detect race conditions at compile-time, it is essential to know whether two object references might refer to the same object. If you have a tree represented with arbitrary pointers, it is very difficult for the compiler to prove that the structure is in fact "tree"-ish, so that if threads are operating on different parts of the tree, they won't "run into" each other. On the other hand, if you represent the branches of the tree as sub-objects of the enclosing node, it is trivial to prove that two threads operating in separate sub-objects can never run into each other and end up in a race condition.

    Of course you could adopt a very conservative race-condition detection algorithm in the presence of pointers, but then there would likely be a lot of false alarms, preventing the use of what are in fact "safe" algorithms, presuming potential race conditions are disallowed at compile-time.

    ReplyDelete
  3. Thanks for explanation.

    I also keep eye on this work http://coherence-lang.org, but your project is more concrete already.

    ReplyDelete