Thursday, October 1, 2009

ParaSail pass-by-reference

In ParaSail, operations like array indexing generally take and return a ref.  What exactly is a ref?  The intent is that it is roughly an address, but we would also like a ref to be able to represent an arbitrary place in a more general container-like structure (such as a hash table).  We might also want to use a ref to query whether an object is present, without actually inserting into the container as a side effect.  So it might be useful to think of a ref as a special kind of object with three operations:
  • fetch contents
  • store contents
  • check if present/exists/is-initialized
One could define the "Ref" interface roughly as follows:
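  abstract interface Ref<Target_Type is Any<>> 
      function Exists(R : Ref) -> Boolean;
      // Precondition on R: the referenced object must exist.
      function Fetch(R : Ref {Exists(R)}) -> Target_Type;
      // The trailing annotation reads as a postcondition:
      // the referenced object exists after a Store.
      procedure Store(R : Ref; 
        New_Contents : Target_Type) {Exists(R)};
  end interface Ref;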
A ref is sort of a place-holder, or perhaps a prescription for where to store an object, without necessarily making room for the object until it is actually stored.  For a hash-table-like container, a ref to one of its elements might be the key value, which isn't actually inserted into the hash table until the Store occurs.  One could imagine a hashed mapping or a sparse array where you specify a "default" value for elements that have never been stored.  For such a structure, Exists would always return True, and Fetch would return the default value if no prior Store into the same ref had occurred.  Conceivably, Store would be a no-op if the default value is being stored and no prior non-default value had been stored.
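As a rough illustration, here is the idea modeled in Python rather than ParaSail.  The names KeyRef and DefaultingRef, and the use of a plain dict as the hash table, are assumptions made only for this sketch.  It also assumes that the annotation on Store is a postcondition, so that Store may be applied to a ref whose object does not yet exist.

```python
from typing import Any, Dict, Hashable


class KeyRef:
    """Hypothetical ref to one slot of a dict-backed table, identified by key."""

    def __init__(self, table: Dict[Hashable, Any], key: Hashable) -> None:
        self._table = table
        self._key = key          # creating the ref does not touch the table

    def exists(self) -> bool:
        return self._key in self._table

    def fetch(self) -> Any:
        assert self.exists(), "precondition of Fetch: Exists(R)"
        return self._table[self._key]

    def store(self, new_contents: Any) -> None:
        self._table[self._key] = new_contents   # room is made only now
        assert self.exists(), "postcondition of Store: Exists(R)"


class DefaultingRef(KeyRef):
    """Variant for a sparse mapping with a default: Exists is always True."""

    def __init__(self, table: Dict[Hashable, Any], key: Hashable,
                 default: Any) -> None:
        super().__init__(table, key)
        self._default = default

    def exists(self) -> bool:
        return True

    def fetch(self) -> Any:
        return self._table.get(self._key, self._default)

    def store(self, new_contents: Any) -> None:
        if new_contents == self._default and self._key not in self._table:
            return               # storing the default over nothing is a no-op
        self._table[self._key] = new_contents


table: Dict[str, int] = {}
r = KeyRef(table, "apples")
before = (r.exists(), len(table))             # the key is not inserted yet
r.store(3)
after = (r.exists(), r.fetch(), len(table))   # now the slot exists

sparse: Dict[int, float] = {}
d = DefaultingRef(sparse, 1000, 0.0)
d.store(0.0)                                  # no-op, the table stays empty
sparse_state = (d.exists(), d.fetch(), len(sparse))
```

Merely constructing a KeyRef leaves the table untouched, which is the "prescription for where to store" behavior described above.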

Another characteristic of a ref is whether the target object can be written, or only read.  That is, a read-only Ref would not have the Store operation.  It is desirable for the read-only-ness of a ref to flow automatically through a function that takes the ref as a parameter.  That is, if a parameter to a function is a read-only ref and the function returns a ref, then the returned ref should also be considered read-only.

This flow-through is common in many programming languages, where if an array is a constant, then so are its elements.  It is desirable not to have to define two versions of operator "[]", one for read-only refs, and one for writable refs, since presumably they will have an essentially identical implementation.  This argues for using ref as a parameter only when this flow-through semantics is desired, and presumably there will be an output that is also a ref.  The read-only-ness of the result ref would be determined from that of the ref parameter(s), and only if all of the ref parameters are writable would the result ref be writable.
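A minimal sketch of this flow-through rule, again modeled in Python, might look as follows.  ArrayRef, ElementRef, and the function index (standing in for operator "[]") are hypothetical names, and a boolean flag stands in for what ParaSail would presumably track statically.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ArrayRef:
    """Hypothetical ref to a whole array, tagged as writable or read-only."""
    items: List[int]
    writable: bool


@dataclass(frozen=True)
class ElementRef:
    """Hypothetical ref to one element; its writability is inherited."""
    items: List[int]
    index: int
    writable: bool

    def fetch(self) -> int:
        return self.items[self.index]

    def store(self, value: int) -> None:
        if not self.writable:
            raise PermissionError("Store through a read-only ref")
        self.items[self.index] = value


def index(arr: ArrayRef, i: int) -> ElementRef:
    """Single stand-in for operator "[]"; it serves both kinds of ref."""
    ref_params = [arr]
    # The result is writable only if every ref parameter is writable.
    return ElementRef(arr.items, i,
                      writable=all(p.writable for p in ref_params))


data = [10, 20, 30]
index(ArrayRef(data, writable=True), 1).store(99)    # allowed

try:
    index(ArrayRef(data, writable=False), 0).store(0)
    rejected = False
except PermissionError:
    rejected = True      # read-only-ness flowed through to the element ref
```

Only one definition of index is needed; the caller's ref decides whether the result can be stored into.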

If we don't want this flow-through, and instead really want to specify that a parameter be writable, then we might as well use something more explicit such as "var" or "in out" for such a parameter mode.

Rather than building in the flow-through rule, we could instead add another query into our hypothetical Ref interface, namely Is_Writable, and then use appropriate pre- and postconditions to constrain how writability flows through.  E.g. the postcondition for operator "[]" would be {Is_Writable(Result) = Is_Writable(Arr)}.  Clearly also the precondition of the Store procedure would be {Is_Writable(R)}.  This approach sounds more flexible, but it may be that we want the above flow-through rules as the default in the absence of some other specification.
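The contract-based alternative can be sketched in the same Python model, with the pre- and postconditions written as runtime checks.  The helper require and the operation peek are hypothetical; peek is included only to show an operation that opts out of the default flow-through.

```python
from typing import List, Optional


def require(condition: bool, message: str) -> None:
    """Runtime stand-in for a ParaSail-style annotation."""
    if not condition:
        raise AssertionError(message)


class Ref:
    """Hypothetical ref to a list (index is None) or to one of its elements."""

    def __init__(self, items: List[int], index: Optional[int] = None,
                 writable: bool = True) -> None:
        self.items, self.index, self.writable = items, index, writable

    def is_writable(self) -> bool:
        return self.writable

    def store(self, value: int) -> None:
        require(self.is_writable(), "precondition of Store: Is_Writable(R)")
        require(self.index is not None, "this sketch only stores into elements")
        self.items[self.index] = value


def index(arr: Ref, i: int) -> Ref:
    """Stand-in for operator "[]" with the default contract."""
    result = Ref(arr.items, i, writable=arr.is_writable())
    require(result.is_writable() == arr.is_writable(), "postcondition of []")
    return result


def peek(arr: Ref, i: int) -> Ref:
    """Hypothetical operation with a different contract: never writable."""
    result = Ref(arr.items, i, writable=False)
    require(not result.is_writable(), "postcondition of peek")
    return result


data = [1, 2, 3]
arr = Ref(data)                      # writable ref to the whole list
index(arr, 0).store(10)              # the contract lets writability through
flags = (index(arr, 1).is_writable(), peek(arr, 1).is_writable())

try:
    peek(arr, 1).store(20)           # violates the precondition of Store
    violated = False
except AssertionError:
    violated = True
```

Here each operation states its own writability contract, which is the extra flexibility mentioned above; index simply restates the default rule.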

Finally, we might want some default rules about whether Exists() should be True for a given ref parameter or result.  In general, one would expect Exists(X) to be True if X is an incoming ref parameter, but one would not necessarily expect Exists(Result) to be True if Result is a ref result of an operation like "[]".  That is, the container should exist if you are indexing into it, but the selected element of the container doesn't necessarily exist just because you have used its key to index into the container.

Note that this general notion of Ref as an object in its own right would support object persistence fairly naturally, in that the Ref could be some kind of disk address, and not until you actually invoked Fetch or Store would you actually need to be sure that the relevant page of data is in memory.  These Refs are analogous to the "smart pointers" of C++, but they have the added advantage that they automatically have limited lifetimes (like "regular" C++ references), so they won't outlive the container into which they refer.
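To make the persistence idea concrete, here is a toy Python sketch in which a dict plays the role of the disk and a second dict plays the role of memory.  PagedStore, DiskRef, PAGE_SIZE, and the address layout are all assumptions of the sketch, not anything ParaSail defines.

```python
from typing import Dict, List

PAGE_SIZE = 4   # hypothetical: number of values per page


class PagedStore:
    """Toy stand-in for a disk file; `cache` holds the pages now in memory."""

    def __init__(self, disk: Dict[int, List[int]]) -> None:
        self.disk = disk
        self.cache: Dict[int, List[int]] = {}
        self.loads = 0                       # counts simulated disk reads

    def page(self, page_no: int) -> List[int]:
        if page_no not in self.cache:
            self.cache[page_no] = list(self.disk[page_no])
            self.loads += 1
        return self.cache[page_no]

    def flush(self) -> None:
        for page_no, values in self.cache.items():
            self.disk[page_no] = list(values)


class DiskRef:
    """Hypothetical ref that is only a (store, address) pair until used."""

    def __init__(self, store: PagedStore, address: int) -> None:
        self._store = store
        self._page_no, self._offset = divmod(address, PAGE_SIZE)

    def fetch(self) -> int:
        return self._store.page(self._page_no)[self._offset]

    def store(self, value: int) -> None:
        self._store.page(self._page_no)[self._offset] = value


backing = PagedStore({0: [1, 2, 3, 4], 1: [5, 6, 7, 8]})
r = DiskRef(backing, 6)          # page 1, offset 2
loads_before = backing.loads     # making the ref reads nothing
value = r.fetch()                # first use brings page 1 into memory
r.store(70)
backing.flush()
```

The page is loaded only on the first Fetch or Store through the ref, and a page that is never referenced through a ref is never read.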

When inside a function, the non-ref result(s) of the function are in fact probably represented as refs provided by the caller, indicating where the results should be stored.  Such a ref would want to carry information about the region (in the region-based-storage-management sense) in which the result should be stored or built, so a region can be thought of as another thing that is associated with a ref.  These are sort of write-only refs, or at least ones where the function wouldn't want to presume that Fetch is well-defined before performing at least one Store via the ref.  That is, Exists() would quite likely be False on entry to the function for each of the refs representing its results (aka "out" parameters).
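A loose Python model of such a result ref follows.  ResultRef and concat are hypothetical names, and a Python list stands in for a storage region, which is a considerable simplification of real region-based storage management.

```python
from typing import Any, List


class ResultRef:
    """Hypothetical caller-provided ref for a function result ("out" parameter)."""

    _UNSET = object()

    def __init__(self, region: List[Any]) -> None:
        self.region = region              # where the result object should live
        self._value: Any = ResultRef._UNSET

    def exists(self) -> bool:
        return self._value is not ResultRef._UNSET

    def store(self, value: Any) -> None:
        self.region.append(value)         # "build" the result in that region
        self._value = value

    def fetch(self) -> Any:
        if not self.exists():
            raise LookupError("Fetch before the first Store")
        return self._value


def concat(a: str, b: str, result: ResultRef) -> None:
    """Callee: it may not rely on Fetch until it has stored a result."""
    assert not result.exists()            # Exists is False on entry
    result.store(a + b)


caller_region: List[Any] = []
out = ResultRef(caller_region)
concat("para", "sail", out)
```

After the call, the result is reachable through out and has been placed in the region the caller chose.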

Note that a general Ref object could also be used to represent an element of a packed array, where Fetch and Store would do the right thing in terms of unpacking/packing.  Of course this is getting dangerously close to pass-by-name of Algol 60 fame, and clearly sounds like it would require a thunk or equivalent.  But a (static or dynamic) polymorphic object and a thunk are not that much different anyway.  The expectation would be that the types of these Ref objects would often be statically known, so a compiler could choose to inline as appropriate to achieve the desired level of performance.
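For the packed-array case, a small Python sketch shows what Fetch and Store would have to do.  PackedBits and PackedRef are hypothetical names, and a single Python integer stands in for the packed storage.

```python
class PackedBits:
    """Array of fixed-width unsigned fields packed into one integer."""

    def __init__(self, width: int, length: int) -> None:
        self.width, self.length = width, length
        self.word = 0


class PackedRef:
    """Hypothetical ref to one field; Fetch unpacks and Store packs."""

    def __init__(self, arr: PackedBits, index: int) -> None:
        if not 0 <= index < arr.length:
            raise IndexError(index)
        self._arr = arr
        self._shift = index * arr.width
        self._mask = (1 << arr.width) - 1

    def fetch(self) -> int:
        return (self._arr.word >> self._shift) & self._mask

    def store(self, value: int) -> None:
        if not 0 <= value <= self._mask:
            raise ValueError("value does not fit in the field")
        cleared = self._arr.word & ~(self._mask << self._shift)
        self._arr.word = cleared | (value << self._shift)


nibbles = PackedBits(width=4, length=8)
PackedRef(nibbles, 0).store(0xA)
PackedRef(nibbles, 2).store(0x5)
middle = PackedRef(nibbles, 1).fetch()   # untouched field between the two
```

The ref carries the shift and mask, which is essentially the information a thunk would capture for a pass-by-name parameter.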

We might want to generalize Ref objects further by adding Create_Referenced_Object and Delete_Referenced_Object operations, since we already have an operation for testing whether the referenced object Exists.  This would make sense for a Ref to an element of a hash table, for example.  Essentially, for any Ref where Exists might return False, it might make sense to be able to Create or Delete the referenced object.  In the above discussion, "Store" is presumed to create the referenced object if necessary, which is perhaps adequate for our purposes.  But Delete would be useful as a way to restore the original state where Exists is False.
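A sketch of this extension, in the same Python model, is shown below.  SlotRef is a hypothetical name, and whether Create should take an initial value is an assumption of the sketch.

```python
from typing import Any, Dict, Hashable


class SlotRef:
    """Hypothetical ref to a dict slot, extended with Create and Delete."""

    def __init__(self, table: Dict[Hashable, Any], key: Hashable) -> None:
        self._table, self._key = table, key

    def exists(self) -> bool:
        return self._key in self._table

    def create(self, initial: Any = None) -> None:
        if self.exists():
            raise KeyError(f"{self._key!r} already exists")
        self._table[self._key] = initial

    def delete(self) -> None:
        if not self.exists():
            raise KeyError(f"{self._key!r} does not exist")
        del self._table[self._key]

    def store(self, new_contents: Any) -> None:
        self._table[self._key] = new_contents   # creates the slot if necessary

    def fetch(self) -> Any:
        if not self.exists():
            raise KeyError(f"{self._key!r} does not exist")
        return self._table[self._key]


table: Dict[str, int] = {}
r = SlotRef(table, "k")
states = [r.exists()]          # nothing there yet
r.store(1)
states.append(r.exists())      # Store created the slot
r.delete()
states.append(r.exists())      # Delete restored the original state
```

Delete is the only operation here that Store cannot emulate, which matches the observation that Store alone is perhaps adequate for creation.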

1 comment:

  1. That's the first I've heard of references as first class objects. Indeed, the applications for persistence are really neat, especially as I am assuming one could inherit from the reference interface and easily change the memory storage mechanism for a program (or parts of a program) rather seamlessly...