Tuesday, January 24, 2012

Implementing Polymorphic Objects

ParaSail has a somewhat different approach to (run-time) polymorphism than is typical.  There is a separate notation for indicating an object is polymorphic.  If you declare "var X : T" then X is of type T.  If you declare "var X : T+" then X is a polymorphic object, meaning it can hold an object of type T or any type that extends or implements T.  Adding a "+" sign to the end of a type name is used to indicate a polymorphic type rooted at the given type.  For a longer discussion of ParaSail's object-oriented model, see:

http://parasail-programming-language.blogspot.com/2009/09/parasail-extension-inheritance-and.html

We have just completed the initial implementation of polymorphic objects, and there were some interesting trade-offs involved.  First we should recap the implementation of "regular" objects in ParaSail.  Some of this is covered in:

http://parasail-programming-language.blogspot.com/2011/05/when-null-isnt-just-null-value.html

Regular objects come in two basic varieties, small and large.  A small object is represented by a single 64-bit word.  It has no type information at run-time.  The compiler "knows" from context the type of the object.  One bit pattern is reserved to represent null for a given small type.  As mentioned in the above blog entry, we use NaN for a floating point null, and -2**63 for a 64-bit integer null.  Other values can be used for types with fewer values, of course, and one less than the smallest normal value or one greater than the largest normal value would work.

A large object is represented by a 64-bit address, pointing to a sequence of 64-bit words, the first of which is a header, followed by the components.  The header identifies the object's size (in 64-bit words), its structural type-id (primarily an indication of which components are small and which are large), its storage region-id, and a lock-id if the object is a concurrent object.

If a composite type has exactly one component, then rather than creating another level of indirection, we simply adopt the representation for the component as the representation for the composite type.  This means that a composite type whose only component is of a small type is itself represented as a small object.  Hence, there is no extra storage overhead in wrapping an object in another type, just to give it a new interface.  So a user-defined numeric type, for instance, need be no bigger than the underlying component used to represent its value.  We call these wrapper types.  Objects of a wrapper type are indistinguishable at run-time from objects of their component type, and if their component is large, then the type-id in the header of an object of the wrapper type will be the same as the type-id in the header of an object of the component type.  This is one reason we call the type-id in the header of a large object its structural type-id.

A null for a large type was originally represented by the 64-bit address of an object consisting only of a header, with a structural type-id indicating it was a null object, and the storage region-id indicating where to allocate storage for it when it is assigned a non-null value.  We have since switched over to a representation where the 64-bit address itself is recognizable as representing a large null value, with the storage region-id encoded in 16 bits of the 64-bit address representation, avoiding the extra level of indirection, and the need to create null objects when first creating a new region.

So where does that leave us for polymorphic objects?  Polymorphic objects need to carry around an indication of their true run-time type-id, not just structural information.  For example, a polymorphic object that started out as an object of a wrapper type, needs to be distinguishable at run-time from a polymorphic object that started out as an object of the wrapper's component type.  So we choose to make polymorphic objects always use the large representation, meaning they always have a header.  Furthermore, the type-id in this header explicitly identifies this object as being a polymorphic object, with a single "component." The true (run-time) type-id of this single "component" is also identified by the polymorphic type-id.  Furthermore, if this true type implements (rather than extends) the root type of the polymorphic type, an operation map is provided to map the true type's operations to those of the root type.  For a discussion of these operation maps (called slot-number mappings in the blog entry), see:

http://parasail-programming-language.blogspot.com/2010/08/initial-implementation-model-for.html

So the representation of a polymorphic object is essentially a "real" wrapper object, to allow us to specify the true type of the wrapped underlying object.  To some degree this is the price we pay for having small objects that don't carry around any type information, and for allowing wrapper types to be represented identically to their component types.  When we really need a polymorphic object, we need to add an extra level of header, so we can record the true type information.

What can you do with polymorphic objects?  They can be assigned a value of any type that extends or implements the root type of the polymorphic type.  Furthermore, when passed as a parameter to an operation of the root type, the true type-id of the wrapped object is consulted, and that type-id is used to find the actual body to execute for this given operation.

For example, if T has a visible "Print(X : T)" operation, then any type NT that extends or implements T must also have a corresponding Print operation, but taking NT rather than T (i.e. "Print(X : NT)").  If we pass a polymorphic object of type T+ to the Print operation of T, it will dispatch to the appropriate body for Print.  So if the true type-id for the polymorphic object passed identifies NT, then we will execute the "Print(X : NT)" operation.

As part of passing the polymorphic object, the polymorphic "wrapper" is stripped off, so by the time we get to the appropriate operation's body, the representation for the parameter is back to the "normal" representation expected by that operation.  For example, this means that operations designed to take "small" representations for values, will work fine in this context, because the large-object, polymorphic header will have been stripped off, and just the small value will be passed to the operation's body, as expected.

3 comments: