Designing ParaSail, a new programming language: A Pointer-Free Path to Object-Oriented Parallel Programming

Wednesday, August 1, 2012

A Pointer-Free Path to Object-Oriented Parallel Programming

The following is a first draft of a submission for the forthcoming "Foundations of Object-Oriented Languages" (FOOL 2012) workshop accompanying this year's SPLASH/OOPSLA conference in Tuscon. Any comments would be welcome.

-----------

ParaSail: Pointer-Free Path to Object-Oriented Parallel Programming

Pointers are ubiquitous in modern object-oriented programming languages, and many data structures such as trees, lists, graphs, hash tables, etc. depend on them heavily. Unfortunately, pointers can add significant complexity to programming. Pointers can make storage management more complex, pointers can make assignment and equality semantics more complex, pointers can increase the ways two different names (access paths) can designate the same object, pointers can make program analysis and proof more complex, and pointers can make it harder to "divide and conquer" a data structure for parallel processing.

Is there an alternative to using pointers? ParaSail, a new parallel object-oriented programming language, adopts a different paradigm for defining data structures. Rather than using pointers, ParaSail supports flexible data structuring using "expandable" (and shrinkable) objects, along with generalized indexing. By eliminating pointers, ParaSail significantly reduces the complexity for the programmer, while also allowing ParaSail to provide pervasive, safe, object-oriented parallel programming.

Expandable and Optional Objects

An "expandable" object is one which can grow without using pointers, much as a house can grow through additions. Where once there was a door to the back yard, a new screened-in porch can be added. Where once there was only one floor, a new floor can be added. The basic mechanism for expansion in ParaSail is that every type has one additional value, called "null." A component can initially be null, and then be replaced by a non-null value, thereby expanding the enclosing object. At some later point the enclosing object could shrink, by replacing a non-null component with null.

Not every component of an object is allowed to be null. The component must be declared as "optional" if it is allowed to take on a null value. For example, a Tree structure might have a (non-optional) "Payload" component, and then two additional components, "Left" and "Right," which are each declared as "optional Tree." Similarly, a stand-alone object may be declared to be of a type "T," or of a type "optional T." Only if it is declared "optional" may it take on the null value. The value of an object X declared as "optional" may be tested for nullness using "X is null" or "X not null."

Another example of a data structure using optional components would be a linked list, with each node having two components, one "Payload" component, and a "Tail" component of type "optional List." There is also a built-in parameterized type, "Basic_Array<Component_Type>" which allows the Component_Type to be specified as "optional." This allows the construction of a hash table with buckets represented as linked-lists, by declaring the "backbone" of the hash table as a "Basic_Array<optional List<Hash_Table_Item>>." The components of the hash table would start out as null, but as items are added to the hash table, one or more of the component lists would begin to grow.

Assignment, Move, and Swap Operations

Because there are no pointers, the semantics of assignment in ParaSail are very straightforward, namely the entire right-hand-side object is copied and assigned into the left-hand side, replacing whatever prior value was there. However, there are times when it is desirable to "move" a component from one object to another, or "swap" two components. Because implementing these on top of an assignment that uses copying would be painful, in ParaSail, "move" and "swap" are separate operations. The semantics of "move" is that the value of the left-hand-side is replaced with the value of the right-hand-side, and the right-hand-side ends up null. For "swap," the values of the left- and right-hand-side are swapped. Syntactically, ParaSail uses ":=" for (copying) assignment, "<==" for move, and "<=>" for swap. The ParaSail compiler is smart enough to automatically use "move" semantics when the right-hand-side is the result of a computation, rather than an object or component that persists after the assignment.

As an example of where "move" might be used, if our hash table grows to the point that it would be wise to lengthen the backbone, we could create a new Basic_Array twice as large (for example), and then "move" each list node from the old array into the new array in an appropriate spot, rebuilding each linked list, and then finally "move" the new array into the original hash-table object, replacing the old array. The "swap" operation is also useful in many contexts, for example when balancing a tree structure, or when sorting an array.

Cyclic Data Structures and Generalized Indexing

Expandable objects allow the construction of many kinds of data structures, but a general, possibly cyclic graph is not one of them. For this, ParaSail provides generalized indexing. The array-indexing syntax, "A[I]," is generalized in ParaSail to be usable with any container-like data structure, where A is the container and I is the key into that data structure. A directed graph in ParaSail could be represented as an indexed set of Nodes, where the index is a unique node "Id" of some sort, with edges represented as Predecessor and Successor indice sets that are components of each Node. There is no harm in following a "dangling" unique node-id in ParaSail, as that is easy to detect because the node-id would be missing from the indexed set. In a language with pointers, either a dangling pointer will cause a storage leak, because the target node cannot be reclaimed, or it will lead to a potentially destructive reference to reclaimed storage.

Region-Based Storage Management

Storage management without pointers is significantly simplified. All of the objects declared in a given scope are associated with a storage "region," essentially a local heap. As an object grows, all new storage for it is allocated out of this region. As an object shrinks, the old storage can be immediately released back to this region. When a scope is exited, the entire region is reclaimed. There is no need for asynchronous garbage collection, as garbage never accumulates.

Every object identifies its region, and in addition, when a function is called, the region in which the result object should be allocated is passed as an implicit parameter. This "target" region is determined by how the function result is used. If it is a temporary, then it will be allocated out of a temporary region associated with the point of call. If it is assigned into a longer-lived object, then the function will be directed to allocated the result object out of the region associated with this longer-lived object. The net effect is that there is no copying at the call site upon function return, since the result object is already sitting in the correct region.

Note that pointers are likely still used "behind the scenes" in any implementation of ParaSail, but eliminating them from the surface syntax and semantics can eliminate essentially all of the complexity associated with pointers. That is, a semantic model of expandable and shrinkable objects, operating under "value semantics," rather than a semantic model of nodes connected with pointers, operating under "reference semantics," provides a number of benefits, such as simpler storage management, simpler assignment semantics, easier analyzability, etc.

Parallel and Distributed Programming

In addition to removing pointers, certain other simplifications are made in ParaSail to ease parallel and distributed programming. In particular, there are no global variables; functions may only update objects passed to them as "var" (in-out) parameters. Furthermore, as part of passing an object as a "var" parameter, it is effectively "handed off" to the receiving function, and no further reference may be made to the object, until the function completes. In particular, no part of the "var" parameter may be passed to any other function, or even to this same function as a separate parameter. This eliminates the possibility of aliasing between a "var" parameter and any other object visible to the function. These two additional rules, coupled with the lack of pointers, means that all parameter evaluation may happen in parallel (e.g. in "F(G(X), H(Y))", G(X) and H(Y) may be evaluated in parallel), and function calls may easily cross address-space boundaries, since the objects are self-contained (with no incoming or outgoing references), and only one function at a time can update a given object.

Concurrent Objects

All of the above rules apply to objects that are not designed for concurrent access. ParaSail also supports the construction of concurrent objects, which allow lock-free, locked, and queued simultaneous access. These objects are *not* "handed off" as part of parameter passing, but instead provide operations which synchronize any attempts at concurrent access. Three kinds of synchronization are supported. "Lock-free" synchronization relies on low-level hardware-supported operations such as atomic load and store, and compare-and-swap. "Locked" synchronization relies on automatic locking as part of calling a "locked" operation of a concurrent object, and automatic unlocking as part of returning from the operation. Finally, "queued" synchronization is provided, which evaluates a "dequeue condition" upon call (under a lock), and only if the condition is satisfied is the call allowed to proceed, still under the lock. A typical "dequeue condition" might be that a buffer is not full, or that a mailbox has at least one element in it. If the dequeue condition is not satisfied, then the caller is added to a queue. At the end of any operation on the concurrent object that might change the result of the dequeue condition for a queued caller, the dequeue condition is evaluated and if true, the operation requested by the queued caller is performed before the lock is released. If there are multiple queued callers, then they are serviced in turn until there are none with satisfied dequeue conditions.

Pointer-Free Object-Oriented Parallel Programming

The pointer-free nature of ParaSail is not just an interesting quirk. Rather, we believe it represents a significantly simpler way to build large object-oriented systems. By itself it simplifies storage management, assignment semantics, and analyzability, and when combined with the elimination of global variables and parameter aliasing, it allows for the easy parallelization of all expression evaluation, and the easy distribution of computations across address spaces, without having to make the jump to a pure side-effect-free functional language.

16 comments:

maciasAugust 2, 2012 at 3:35 PM
Call me dumb, but those no-pointers look for me like pointers. And assignments with copy semantics is nothing more like dereference on the fly when using regular pointers.

Anyway, a little example would not harm -- for example a single linked list and adding and removing object.
ReplyDelete
Replies
StepheAugust 4, 2012 at 8:42 AM
I like this summary; I have followed this blog, but had forgotten the assign/move/swap distinction.

typo in Region-Based Storage Management: "function will be directed to allocated the result" should be "allocate".

Now if I could just write an Android app in Parasail ...
ReplyDelete
Replies
UnknownAugust 8, 2012 at 8:57 PM
Seems to be a standard OO language, but without the possibility of sharing an object that is not designed to be concurrent.

Looks like a step further for saving the OO model in concurrent environments.

Just a question, do you plan adding lazy/parallel-free/fork-join objects?

Let me try to explain... I have an object encapsulating a huge computation, in order to trigger that computation and store the result, I would need to handle the instance off to a separate function. But I do not want to trigger that computation if none of the other objects needs it, but I do not want to trigger the same computation more than one time. Like a Future of Java.

Objects A, B and C may need the value of a huge computation encapsulated in an object D.
I want to fire a function in all the three objects (A, B and C) that may or may not use the value of D. So I will only fire off the D computation if one of the objects need it, and then if any of the others also need it, they should block and wait the first computation without trigger a new one.

By the way, great post.
ReplyDelete
Replies
FrankyAugust 16, 2012 at 2:32 PM
"Furthermore, as part of passing an object as a "var" parameter, it is effectively "handed off" to the receiving function, and no further reference may be made to the object, until the function completes."
You should take a look at the Clean programming language, http://wiki.clean.cs.ru.nl/ and its uniqueness types, which seem to capture the same meaning.
ReplyDelete
Replies
jojopig.comOctober 22, 2012 at 10:51 AM
thanks for the posts. good stuff.
ReplyDelete
Replies
me22October 27, 2012 at 1:40 AM
'Syntactically, ParaSail uses ":=" for (copying) assignment, "<==" for move, and "<=>" for swap.'

I can't help but think of C++11's Move Semantics, where the equivalent of "x <== y" is spelled "x = std::move(y)". (And " x <=> y" is "swap(x, y)", the implementation of which is just three moves.)

If you're removing pointers entirely, perhaps you should go all the way to a Linear Type System?
ReplyDelete
Replies
QwertieNovember 1, 2012 at 7:58 PM
This reminds me of Rust, which does allow shared objects but also takes pains to do as much as possible with "unique pointers"--objects referenced from only one place. Of course, that's to simplify memory management first and for parallelism second.

ParaSail sounds promising as a very simple and intuitive way to write highly parallel programs (though I am mostly ignorant about highly parallel programming). The trouble with being able to run all functions in parallel, though, is "what kind of processor allows running all functions in parallel"? I don't know of a system that makes forks and joins trivially cheap.

But assuming suitable hardware, it seems like ParaSail would be a neat addition to a conventional programming language, where, if you simply limit parts of your code to certain operations, those parts suddenly become massively parallel. I don't think I'd like to program under these restrictions all the time, though. Using indexes instead of actual pointers sounds terribly cumbersome, because the code would always have to keep track of which array goes with which index, manually, and you cannot, of course, simulate a pointer by bundling the array and index together ... unless you've developed a new abstraction to "fake it".
ReplyDelete
Replies
QwertieMarch 3, 2019 at 7:36 PM
Somewhere in the options it is possible to turn on comment moderation so you can avoid letting spam through.
ReplyDelete
Replies

Add comment

Designing ParaSail, a new programming language

Wednesday, August 1, 2012

A Pointer-Free Path to Object-Oriented Parallel Programming

16 comments:

Contributors

Blog Archive