For simple types, coming up with a null value is pretty straightforward. For unsigned integer types, a value one greater than the maximum value could be used for null. For example, given an unsigned type ranging from 0 to 100, the value 101 could represent null. Similarly, for enumeration types that use an internal unsigned integer representation, such as 0 for #false and 1 for #true, null could be represented by one more than the maximum internal code, for example 2 for the null value of an optional Boolean type. For a signed integer type, one less than the minimum value might make sense, particularly on a two's complement machine, where there is one "extra" negative value anyway. So for a 64-bit signed integer, one might use -2**63+1 .. 2**63-1 for "normal" values of the type and reserve -2**63 for the null value. Most floating-point representations include the notion of "Not a Number" (NaN), and some NaN value would make sense for null. Since ParaSail has no run-time checks (all checking is done at compile time), it would be fine to use a "non-signaling" (quiet) NaN for null.
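The encodings described above can be sketched as follows. This is a minimal Python model, not the ParaSail implementation. The names (unsigned_null, BOOLEAN_NULL, INT64_NULL, float_is_null) are hypothetical, and treating every NaN as null is an assumption of the sketch.

```python
import math

# Hypothetical null encodings for "simple" types: each type reserves one
# value outside its normal range to stand for null.

def unsigned_null(max_value: int) -> int:
    """Null for an unsigned type 0 .. max_value: one past the maximum."""
    return max_value + 1

# Enumeration with internal codes 0 .. n-1, here #false = 0 and #true = 1.
BOOLEAN_CODES = {"#false": 0, "#true": 1}
BOOLEAN_NULL = max(BOOLEAN_CODES.values()) + 1  # one past the largest code

# 64-bit signed integer: normal values are -2**63+1 .. 2**63-1, and the
# "extra" two's complement negative value is reserved for null.
INT64_FIRST = -2**63 + 1
INT64_LAST = 2**63 - 1
INT64_NULL = -2**63

def int64_is_null(x: int) -> bool:
    return x == INT64_NULL

# Floating point: a quiet (non-signaling) NaN stands for null.
FLOAT_NULL = float("nan")

def float_is_null(x: float) -> bool:
    # NaN never compares equal to itself, so use math.isnan; this treats
    # every NaN as null, which is an assumption of this sketch.
    return math.isnan(x)
```

Reserving a value outside the normal range means null costs no extra storage. The price is that the usable range shrinks by one value.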
For more complex types, the representation for null is a bit more interesting. One common kind of type would be a simple "wrapper," that is, a type defined by a class that has exactly one var component. For example:
```
class Box<Contents_Type is Assignable<>> is
    var Contents : Contents_Type;   // the single var component
  exports
    function Create(Value : Contents_Type) -> Box is
        return (Contents => Value);
    end function Create;
    ...
end class Box;
```

In this case, it would be nice to have the wrapper type use exactly the same representation as its underlying component type (here, Contents_Type). The null value for the wrapper would then be the same as the null value for the component type. This does require that the component type not itself be marked optional. Otherwise there would be no way to distinguish a non-null wrapper containing a null component from a wrapper that is itself null.
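The following minimal Python sketch shows a wrapper that reuses its component's representation. The component type is assumed to be an unsigned range 0 .. 100 whose null is encoded as 101. The names box_create, BOX_NULL and box_contents are hypothetical and do not come from any ParaSail implementation.

```python
# Hypothetical component type: unsigned 0 .. 100, null encoded as 101.
COMPONENT_MAX = 100
COMPONENT_NULL = COMPONENT_MAX + 1

def box_create(value: int) -> int:
    # A Box is represented by exactly the bits of its Contents.
    # The assertion reflects that Contents_Type is *not* optional here.
    assert 0 <= value <= COMPONENT_MAX, "component must be non-null"
    return value

# The null Box shares the null encoding of its component type.
BOX_NULL = COMPONENT_NULL

def box_is_null(box: int) -> bool:
    return box == BOX_NULL

def box_contents(box: int) -> int:
    assert not box_is_null(box)
    return box  # no unwrapping needed: same representation
```

If Contents_Type were optional, box_create(COMPONENT_NULL) would have to produce 101. That is indistinguishable from BOX_NULL, which is exactly the ambiguity described above.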
So our conclusion is that a wrapper type can use the same representation as its component type, as long as the component type is not itself marked optional. If the component type is marked optional, then the wrapper needs its "own" representation for null. This can be accommodated either by reserving yet one more value, if the component type is "simple," or by adding a level of indirection, for a more complex component type.
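For the "simple" case, the extra reserved value can be sketched as follows. This is a hypothetical Python model: the optional component uses 0 .. 100 for values and 101 for its own null, so the wrapper reserves 102 for its null. All names are illustrative only.

```python
# Hypothetical optional component: values 0 .. 100, its own null is 101.
COMPONENT_MAX = 100
COMPONENT_NULL = COMPONENT_MAX + 1   # null Contents inside a non-null wrapper
WRAPPER_NULL = COMPONENT_NULL + 1    # the wrapper itself is null

def wrap(contents: int) -> int:
    # Contents may be any normal value or the component's own null.
    assert 0 <= contents <= COMPONENT_NULL
    return contents

def wrapper_is_null(w: int) -> bool:
    return w == WRAPPER_NULL

def contents_is_null(w: int) -> bool:
    # Only meaningful for a non-null wrapper.
    assert not wrapper_is_null(w)
    return w == COMPONENT_NULL
```

The two kinds of null now have distinct encodings, so the wrapper still fits in the same cell as its component.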
Now what about more complex types? For example, a type defined by a class with multiple var components:
```
class Pair<Element_Type is Assignable<>> is
    var Left : Element_Type;    // first var component
    var Right : Element_Type;   // second var component
  exports
    function Cons(Left, Right : Element_Type) -> Pair is
        return (Left => Left, Right => Right);
    end function Cons;
    ...
end class Pair;
```

One obvious representation for a type with multiple components is a sequence of memory cells long enough to hold the representation of each component, plus some provision for representing null. That provision could piggy-back on one of the components, if that component is not itself allowed to be null, or it could add an extra bit to indicate null-ness. However, in our initial (prototype, ParaSail-Virtual-Machine-based) implementation, we have chosen to represent every object using a single 64-bit memory cell. This means that if a value cannot fit in a single 64-bit cell, it will need to use a level of indirection. To simplify further, we won't initially do any "packing." So even if the components are each quite short (such as Booleans), we will nevertheless use an indirect representation. We do anticipate supporting packed arrays, but those would initially be handled by explicit shifting and masking, rather than by building the notion of packing into the ParaSail Virtual Machine instruction set. In the ParaSail Virtual Machine, pretty much everything occupies a single 64-bit word.
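The layout rule just described (one 64-bit cell per object, no packing) can be sketched as a small decision function. This is a hypothetical Python model: TypeDesc and is_large are illustrative names, and the sketch assumes single components are not marked optional.

```python
from dataclasses import dataclass, field

@dataclass
class TypeDesc:
    """Hypothetical type descriptor: a scalar, or a class with components."""
    name: str
    components: list["TypeDesc"] = field(default_factory=list)

def is_large(t: TypeDesc) -> bool:
    """True if values of t must live behind a pointer (indirect)."""
    if not t.components:
        return False  # scalars fit in a single 64-bit cell
    if len(t.components) == 1:
        # A wrapper reuses its component's representation
        # (assuming the component is not marked optional).
        return is_large(t.components[0])
    # Two or more components: no packing, so always indirect,
    # even if each component is as small as a Boolean.
    return True
```

For example, Box<Int64> stays direct while Pair<Boolean> goes indirect, even though two Booleans would easily fit in 64 bits.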
So back to the initial question: how will we represent objects with multiple components (or with a single component whose type is marked optional)? And how will we represent null for such objects? One important thing to remember is that (large) ParaSail objects live in regions, and the regions are associated with stack frames. When a value is assigned to an object, if the new value can't "fit" in the space occupied by the current value, then the space for the old value must be released and space for the new value allocated, in the appropriate region. Since a "ref var" parameter (or a subcomponent thereof) might be assigned a new value, it won't necessarily be known at compile time which region the object being assigned "lives" in. This suggests that every value of a large object must identify its region. That way, its space can be released back to that region when it is freed, and the new value can be allocated in that same region. This requirement applies even if the object starts out with a null value. Hence, we conclude that a null value for a "large" type needs to identify its region.
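To see why even null must identify its region, here is a minimal Python model of assignment through an object whose region is not known statically. Region, LargeValue, Var and assign are hypothetical names, and counting live allocations stands in for real storage management.

```python
class LargeValue:
    """A "large" value: a handle that always records its region."""
    def __init__(self, region, components):
        self.region = region
        self.components = components  # None marks the region's null value

    def is_null(self) -> bool:
        return self.components is None

class Region:
    """Hypothetical region associated with a stack frame."""
    def __init__(self, name: str):
        self.name = name
        self.live = 0                       # number of non-null allocations
        self.null = LargeValue(self, None)  # a null that identifies this region

    def allocate(self, components) -> LargeValue:
        self.live += 1
        return LargeValue(self, list(components))

    def release(self, value: LargeValue) -> None:
        assert value.region is self and not value.is_null()
        self.live -= 1

class Var:
    """An object, e.g. the target of a "ref var" parameter."""
    def __init__(self, value: LargeValue):
        self.value = value

def assign(target: Var, new_components) -> None:
    old = target.value
    region = old.region           # learned from the value, not at compile time
    if not old.is_null():
        region.release(old)       # return the old space to its region
    target.value = region.allocate(new_components)  # same region as before
```

Because a null value still carries its region, the first assignment to a null object knows where to allocate.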
Another issue for "large" objects is that we need to know how to reclaim the entire "subtree" of subcomponents when we release the enclosing object. In some cases we will know the type at compile-time, but in the case of polymorphic objects, or in cases where the type of the object is a formal type parameter of the enclosing module, we won't necessarily know. This implies that we may want to have some kind of "type-id" associated with large objects. This then results in the following representation for large objects:
A 64-bit pointer to: [Region, Type-Id, component1, component2, component3, ...]
where the Type-Id provides enough information to know which of the components are themselves "large" and hence need to be recursively released. In addition, there would probably be a special Null-Type-Id, which would allow the "is null" operation to be implemented relatively efficiently by comparing the object's Type-Id against that standard Null-Type-Id value. Each region would need only a single null value, which points back to the region and has the Null-Type-Id. Such null objects would not be reclaimed when they are the "old" value of the left-hand side of an assignment. There is only one such value per region, and it is shared among all objects in that region that currently have a null value.
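The layout [Region, Type-Id, component1, component2, ...] and its reclamation rules can be modeled as follows. This is a hypothetical Python sketch: a Python list stands in for the block of memory the 64-bit pointer refers to, and the Type-Id numbers and TYPE_TABLE contents are invented for illustration.

```python
NULL_TYPE_ID = 0  # hypothetical reserved Null-Type-Id

# Hypothetical Type-Id table: for each Type-Id, which component slots hold
# "large" values (pointers) that must be released recursively.
TYPE_TABLE = {
    1: [False, False],  # e.g. Pair<Boolean>: both components stored inline
    2: [True, True],    # e.g. Pair<Pair<Boolean>>: both components indirect
}

class Region:
    def __init__(self):
        self.live = 0
        self.null = [self, NULL_TYPE_ID]  # the single shared null of this region

    def allocate(self, type_id, components):
        self.live += 1
        # Layout: [Region, Type-Id, component1, component2, ...]
        return [self, type_id, *components]

def is_null(obj) -> bool:
    return obj[1] == NULL_TYPE_ID  # just a Type-Id comparison

def release(obj) -> None:
    if is_null(obj):
        return  # the shared per-region null is never reclaimed
    region, type_id, *components = obj
    for component, large in zip(components, TYPE_TABLE[type_id]):
        if large:
            release(component)  # reclaim the whole subtree
    region.live -= 1
```

A large component that happens to be null is simply the region's shared null. The early return in release skips it.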
So to summarize, null is not a single value. Rather, each simple type potentially has its own representation for null, and for "large" types that use a level of indirection, there is a separate null value for each region. So the "is null" operation is not necessarily a simple check for equality with one global null value. Instead, it depends on the type. In particular, for "large" objects it is a check of the object's Type-Id field to see whether it holds the Null-Type-Id.
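Because there is no single global null, a compiler would pick the "is null" test per type. The following hypothetical Python sketch selects that test from a type "kind." The kind names and the list-based layout of large objects are assumptions made for illustration.

```python
import math

NULL_TYPE_ID = 0  # hypothetical reserved Null-Type-Id

def make_is_null(kind: str, **params):
    """Return the "is null" test for a type of the given (hypothetical) kind."""
    if kind == "unsigned":
        null = params["max_value"] + 1  # one past the maximum
        return lambda x: x == null
    if kind == "int64":
        return lambda x: x == -2**63    # the extra two's complement value
    if kind == "float":
        return math.isnan               # any NaN counts as null here
    if kind == "large":
        # x models a pointer to [Region, Type-Id, components...]
        return lambda x: x[1] == NULL_TYPE_ID
    raise ValueError(f"unknown kind: {kind}")
```

For large objects, any region's null passes the test, since only the Type-Id is inspected.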
On a final note, when a function returns "large" values, it needs to be "told" in which region to allocate the value(s) being returned. One simple way to do this is for the "large" output parameter(s) to be initialized with a (large) null value by the caller. Since such a null value identifies the region, the function can use that information when allocating the space for the returned value.
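The caller-provides-the-region convention can be modeled as follows. This is a hypothetical Python sketch: Python has no output parameters, so the Out class stands in for one. The function cons merely echoes Pair's Cons and is not ParaSail code.

```python
class Region:
    """Hypothetical region; remembers what was allocated in it."""
    def __init__(self, name: str):
        self.name = name
        self.allocated = []
        self.null = {"region": self, "components": None}  # shared null

    def allocate(self, components):
        obj = {"region": self, "components": list(components)}
        self.allocated.append(obj)
        return obj

class Out:
    """A hypothetical output parameter slot written by the callee."""
    def __init__(self, initial):
        self.value = initial

def cons(result: Out, left, right) -> None:
    # The caller pre-initialized 'result' with a null that identifies the
    # caller's region; the callee allocates the returned value there.
    region = result.value["region"]
    result.value = region.allocate([left, right])

caller_region = Region("caller frame")
result = Out(caller_region.null)  # caller: initialize output with its region's null
cons(result, 1, 2)
```

The callee never needs to be passed the region separately. The null value it receives already carries that information.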