The following is a first draft of a submission for the forthcoming "Foundations of Object-Oriented Languages" (FOOL 2012) workshop accompanying this year's SPLASH/OOPSLA conference in Tuscon. Any comments would be welcome.
-----------
ParaSail: Pointer-Free Path to Object-Oriented Parallel Programming
Pointers are ubiquitous in modern object-oriented programming languages, and many data structures such as trees, lists, graphs, hash tables, etc. depend on them heavily. Unfortunately, pointers can add significant complexity to programming. Pointers can make storage management more complex, pointers can make assignment and equality semantics more complex, pointers can increase the ways two different names (access paths) can designate the same object, pointers can make program analysis and proof more complex, and pointers can make it harder to "divide and conquer" a data structure for parallel processing.
Is there an alternative to using pointers? ParaSail, a new parallel object-oriented programming language, adopts a different paradigm for defining data structures. Rather than using pointers, ParaSail supports flexible data structuring using "expandable" (and shrinkable) objects, along with generalized indexing. By eliminating pointers, ParaSail significantly reduces the complexity for the programmer, while also allowing ParaSail to provide pervasive, safe, object-oriented parallel programming.
Expandable and Optional Objects
An "expandable" object is one which can grow without using pointers, much as a house can grow through additions. Where once there was a door to the back yard, a new screened-in porch can be added. Where once there was only one floor, a new floor can be added. The basic mechanism for expansion in ParaSail is that every type has one additional value, called "null." A component can initially be null, and then be replaced by a non-null value, thereby expanding the enclosing object. At some later point the enclosing object could shrink, by replacing a non-null component with null.
Not every component of an object is allowed to be null. The component must be declared as "optional" if it is allowed to take on a null value. For example, a Tree structure might have a (non-optional) "Payload" component, and then two additional components, "Left" and "Right," which are each declared as "optional Tree." Similarly, a stand-alone object may be declared to be of a type "T," or of a type "optional T." Only if it is declared "optional" may it take on the null value. The value of an object X declared as "optional" may be tested for nullness using "X is null" or "X not null."
Another example of a data structure using optional components would be a linked list, with each node having two components, one "Payload" component, and a "Tail" component of type "optional List." There is also a built-in parameterized type, "Basic_Array<Component_Type>" which allows the Component_Type to be specified as "optional." This allows the construction of a hash table with buckets represented as linked-lists, by declaring the "backbone" of the hash table as a "Basic_Array<optional List<Hash_Table_Item>>." The components of the hash table would start out as null, but as items are added to the hash table, one or more of the component lists would begin to grow.
Assignment, Move, and Swap Operations
Because there are no pointers, the semantics of assignment in ParaSail are very straightforward, namely the entire right-hand-side object is copied and assigned into the left-hand side, replacing whatever prior value was there. However, there are times when it is desirable to "move" a component from one object to another, or "swap" two components. Because implementing these on top of an assignment that uses copying would be painful, in ParaSail, "move" and "swap" are separate operations. The semantics of "move" is that the value of the left-hand-side is replaced with the value of the right-hand-side, and the right-hand-side ends up null. For "swap," the values of the left- and right-hand-side are swapped. Syntactically, ParaSail uses ":=" for (copying) assignment, "<==" for move, and "<=>" for swap. The ParaSail compiler is smart enough to automatically use "move" semantics when the right-hand-side is the result of a computation, rather than an object or component that persists after the assignment.
As an example of where "move" might be used, if our hash table grows to the point that it would be wise to lengthen the backbone, we could create a new Basic_Array twice as large (for example), and then "move" each list node from the old array into the new array in an appropriate spot, rebuilding each linked list, and then finally "move" the new array into the original hash-table object, replacing the old array. The "swap" operation is also useful in many contexts, for example when balancing a tree structure, or when sorting an array.
Cyclic Data Structures and Generalized Indexing
Expandable objects allow the construction of many kinds of data structures, but a general, possibly cyclic graph is not one of them. For this, ParaSail provides generalized indexing. The array-indexing syntax, "A[I]," is generalized in ParaSail to be usable with any container-like data structure, where A is the container and I is the key into that data structure. A directed graph in ParaSail could be represented as an indexed set of Nodes, where the index is a unique node "Id" of some sort, with edges represented as Predecessor and Successor indice sets that are components of each Node. There is no harm in following a "dangling" unique node-id in ParaSail, as that is easy to detect because the node-id would be missing from the indexed set. In a language with pointers, either a dangling pointer will cause a storage leak, because the target node cannot be reclaimed, or it will lead to a potentially destructive reference to reclaimed storage.
Region-Based Storage Management
Storage management without pointers is significantly simplified. All of the objects declared in a given scope are associated with a storage "region," essentially a local heap. As an object grows, all new storage for it is allocated out of this region. As an object shrinks, the old storage can be immediately released back to this region. When a scope is exited, the entire region is reclaimed. There is no need for asynchronous garbage collection, as garbage never accumulates.
Every object identifies its region, and in addition, when a function is called, the region in which the result object should be allocated is passed as an implicit parameter. This "target" region is determined by how the function result is used. If it is a temporary, then it will be allocated out of a temporary region associated with the point of call. If it is assigned into a longer-lived object, then the function will be directed to allocated the result object out of the region associated with this longer-lived object. The net effect is that there is no copying at the call site upon function return, since the result object is already sitting in the correct region.
Note that pointers are likely still used "behind the scenes" in any implementation of ParaSail, but eliminating them from the surface syntax and semantics can eliminate essentially all of the complexity associated with pointers. That is, a semantic model of expandable and shrinkable objects, operating under "value semantics," rather than a semantic model of nodes connected with pointers, operating under "reference semantics," provides a number of benefits, such as simpler storage management, simpler assignment semantics, easier analyzability, etc.
Parallel and Distributed Programming
In addition to removing pointers, certain other simplifications are made in ParaSail to ease parallel and distributed programming. In particular, there are no global variables; functions may only update objects passed to them as "var" (in-out) parameters. Furthermore, as part of passing an object as a "var" parameter, it is effectively "handed off" to the receiving function, and no further reference may be made to the object, until the function completes. In particular, no part of the "var" parameter may be passed to any other function, or even to this same function as a separate parameter. This eliminates the possibility of aliasing between a "var" parameter and any other object visible to the function. These two additional rules, coupled with the lack of pointers, means that all parameter evaluation may happen in parallel (e.g. in "F(G(X), H(Y))", G(X) and H(Y) may be evaluated in parallel), and function calls may easily cross address-space boundaries, since the objects are self-contained (with no incoming or outgoing references), and only one function at a time can update a given object.
Concurrent Objects
All of the above rules apply to objects that are not designed for concurrent access. ParaSail also supports the construction of concurrent objects, which allow lock-free, locked, and queued simultaneous access. These objects are *not* "handed off" as part of parameter passing, but instead provide operations which synchronize any attempts at concurrent access. Three kinds of synchronization are supported. "Lock-free" synchronization relies on low-level hardware-supported operations such as atomic load and store, and compare-and-swap. "Locked" synchronization relies on automatic locking as part of calling a "locked" operation of a concurrent object, and automatic unlocking as part of returning from the operation. Finally, "queued" synchronization is provided, which evaluates a "dequeue condition" upon call (under a lock), and only if the condition is satisfied is the call allowed to proceed, still under the lock. A typical "dequeue condition" might be that a buffer is not full, or that a mailbox has at least one element in it. If the dequeue condition is not satisfied, then the caller is added to a queue. At the end of any operation on the concurrent object that might change the result of the dequeue condition for a queued caller, the dequeue condition is evaluated and if true, the operation requested by the queued caller is performed before the lock is released. If there are multiple queued callers, then they are serviced in turn until there are none with satisfied dequeue conditions.
Pointer-Free Object-Oriented Parallel Programming
The pointer-free nature of ParaSail is not just an interesting quirk. Rather, we believe it represents a significantly simpler way to build large object-oriented systems. By itself it simplifies storage management, assignment semantics, and analyzability, and when combined with the elimination of global variables and parameter aliasing, it allows for the easy parallelization of all expression evaluation, and the easy distribution of computations across address spaces, without having to make the jump to a pure side-effect-free functional language.