ParaSail has a somewhat different approach to (run-time) polymorphism than is typical. There is a separate notation for indicating that an object is polymorphic. If you declare "var X : T" then X is of type T. If you declare "var X : T+" then X is a polymorphic object, meaning it can hold an object of type T or of any type that extends or implements T. In general, appending a "+" to a type name denotes a polymorphic type rooted at the given type. For a longer discussion of ParaSail's object-oriented model, see:
http://parasail-programming-language.blogspot.com/2009/09/parasail-extension-inheritance-and.html
We have just completed the initial implementation of polymorphic objects, and there were some interesting trade-offs involved. First we should recap the implementation of "regular" objects in ParaSail. Some of this is covered in:
http://parasail-programming-language.blogspot.com/2011/05/when-null-isnt-just-null-value.html
Regular objects come in two basic varieties, small and large. A small object is represented by a single 64-bit word. It has no type information at run-time; the compiler "knows" from context the type of the object. One bit pattern is reserved to represent null for a given small type. As mentioned in the above blog entry, we use NaN for a floating point null, and -2**63 for a 64-bit integer null. For types with fewer values, other bit patterns can be used, of course; one less than the smallest normal value, or one greater than the largest, works fine.
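As an illustration, the reserved null bit patterns for small types might be checked as follows. This is a Python model of the encoding only, not the actual implementation; the function names are hypothetical:

```python
import math

INT_NULL = -2**63          # reserved null for 64-bit integers

def int_is_null(word):
    # The compiler knows from context that this word holds an integer,
    # so a single comparison suffices; no run-time type info is needed.
    return word == INT_NULL

def float_is_null(word):
    # NaN is the reserved null for floating point; since NaN != NaN,
    # we test with math.isnan rather than equality.
    return math.isnan(word)

# usage
assert int_is_null(-2**63)
assert not int_is_null(42)
assert float_is_null(float("nan"))
assert not float_is_null(3.5)
```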
A large object is represented by a 64-bit address, pointing to a sequence of 64-bit words, the first of which is a header, followed by the components. The header identifies the object's size (in 64-bit words), its structural type-id (primarily an indication of which components are small and which are large), its storage region-id, and a lock-id if the object is a concurrent object.
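A minimal model of such a header might look like the following (the field names are hypothetical, and the real layout packs this information into 64-bit words rather than Python objects):

```python
from dataclasses import dataclass

@dataclass
class LargeObjHeader:
    size_in_words: int       # size of the whole object, in 64-bit words
    structural_type_id: int  # which components are small vs. large
    region_id: int           # storage region this object lives in
    lock_id: int = 0         # nonzero only for concurrent objects

# A large object is then a header followed by its component words.
obj = (LargeObjHeader(size_in_words=3, structural_type_id=17, region_id=2),
       [111, 222])           # two component words

# The header's size covers itself plus the components.
assert obj[0].size_in_words == 1 + len(obj[1])
```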
If a composite type has exactly one component, then rather than creating another level of indirection, we simply adopt the representation for the component as the representation for the composite type. This means that a composite type whose only component is of a small type is itself represented as a small object. Hence, there is no extra storage overhead in wrapping an object in another type, just to give it a new interface. So a user-defined numeric type, for instance, need be no bigger than the underlying component used to represent its value. We call these wrapper types. Objects of a wrapper type are indistinguishable at run-time from objects of their component type, and if their component is large, then the type-id in the header of an object of the wrapper type will be the same as the type-id in the header of an object of the component type. This is one reason we call the type-id in the header of a large object its structural type-id.
A null for a large type was originally represented by the 64-bit address of an object consisting only of a header, with a structural type-id indicating it was a null object, and the storage region-id indicating where to allocate storage for it when it is assigned a non-null value. We have since switched over to a representation where the 64-bit address itself is recognizable as representing a large null value, with the storage region-id encoded in 16 bits of the 64-bit address representation, avoiding the extra level of indirection, and the need to create null objects when first creating a new region.
So where does that leave us for polymorphic objects? Polymorphic objects need to carry around an indication of their true run-time type-id, not just structural information. For example, a polymorphic object that started out as an object of a wrapper type, needs to be distinguishable at run-time from a polymorphic object that started out as an object of the wrapper's component type. So we choose to make polymorphic objects always use the large representation, meaning they always have a header. Furthermore, the type-id in this header explicitly identifies this object as being a polymorphic object, with a single "component." The true (run-time) type-id of this single "component" is also identified by the polymorphic type-id. Furthermore, if this true type implements (rather than extends) the root type of the polymorphic type, an operation map is provided to map the true type's operations to those of the root type. For a discussion of these operation maps (called slot-number mappings in the blog entry), see:
http://parasail-programming-language.blogspot.com/2010/08/initial-implementation-model-for.html
So the representation of a polymorphic object is essentially a "real" wrapper object, to allow us to specify the true type of the wrapped underlying object. To some degree this is the price we pay for having small objects that don't carry around any type information, and for allowing wrapper types to be represented identically to their component types. When we really need a polymorphic object, we need to add an extra level of header, so we can record the true type information.
What can you do with polymorphic objects? They can be assigned a value of any type that extends or implements the root type of the polymorphic type. Furthermore, when passed as a parameter to an operation of the root type, the true type-id of the wrapped object is consulted, and that type-id is used to find the actual body to execute for this given operation.
For example, if T has a visible "Print(X : T)" operation, then any type NT that extends or implements T must also have a corresponding Print operation, but taking NT rather than T (i.e. "Print(X : NT)"). If we pass a polymorphic object of type T+ to the Print operation of T, it will dispatch to the appropriate body for Print. So if the true type-id for the polymorphic object passed identifies NT, then we will execute the "Print(X : NT)" operation.
As part of passing the polymorphic object, the polymorphic "wrapper" is stripped off, so by the time we get to the appropriate operation's body, the representation for the parameter is back to the "normal" representation expected by that operation. For example, this means that operations designed to take "small" representations for values, will work fine in this context, because the large-object, polymorphic header will have been stripped off, and just the small value will be passed to the operation's body, as expected.
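The whole wrap/dispatch/unwrap cycle described above can be modeled as follows (again in Python, with hypothetical names; this stands in for the per-type operation tables and operation maps, not the actual VM code):

```python
class PolyObj:
    """A 'real' wrapper: the underlying value plus its true type-id."""
    def __init__(self, type_id, value):
        self.type_id = type_id
        self.value = value

# One Print body per true type, indexed by type-id.
PRINT_BODIES = {
    "T":  lambda x: "T: "  + str(x),
    "NT": lambda x: "NT: " + str(x),
}

def dispatch_print(poly):
    body = PRINT_BODIES[poly.type_id]   # consult the true type-id
    return body(poly.value)             # strip the wrapper: the body sees
                                        # the normal representation

assert dispatch_print(PolyObj("NT", 42)) == "NT: 42"
assert dispatch_print(PolyObj("T", 42)) == "T: 42"
```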
This blog will follow the trials and tribulations of designing a new programming language designed to allow productive development of parallel, high-integrity (safety-critical, high-security) software systems. The language is tentatively named "ParaSail" for Parallel, Specification and Implementation Language.
Tuesday, January 24, 2012
Friday, January 6, 2012
Rev 1.8 of alpha 0.3 release of ParaSail compiler and virtual machine
Rev 1.8 of the 0.3 Alpha release of ParaSail is now available at:
http://bit.ly/AAWk7Y
This release has initial support for generic operations, where the type of one of the operands is not specified exactly, but merely as being any type that implements a given interface. For example, to declare a concatenation operator between Univ_String and any type that implements the Imageable interface, you can write:
op "|"(Left : Univ_String; Right : Right_Type is Imageable<>) -> Univ_String;
This operator can be implemented using operations on Right that are defined in the Imageable interface, such as To_String.
Monday, November 7, 2011
Rev 1.6 of alpha 0.2 release of ParaSail compiler and VM
Here is a new revision (rev 1.6) of the alpha 0.2 release of the ParaSail compiler and virtual machine:
http://bit.ly/tJyc5E
The big addition here is a new example, the "drinking philosophers" example (drinking_phils.psl). This example is a variant on the classic "dining philosophers" example, and makes heavy use of timers and complex dequeue conditions. A lot of changes under the hood, but no major new features.
Monday, October 24, 2011
Alpha release 0.2 of ParaSail Compiler and Virtual Machine
We are now making available alpha release 0.2 (rev 1.5) of the ParaSail Compiler and Virtual Machine:
http://bit.ly/rsxQDX
This release implements several important additional features, including queued parameters for operations on concurrent objects, floating-point and fixed-point numbers along with the universal type Univ_Real, and timers supporting delay for a time interval and delay until a specified timer value, or until a specific wall clock time.
This release also includes two different versions of the VM, one which simulates multi-threading on a single O/S thread, and another which uses multiple O/S threads to provide true parallelism on a multi-core/multi-processor machine.
Please read the release notes included in the above download for additional information.
-Tuck
Friday, October 14, 2011
Upcoming panels, tutorials, and workshops on ParaSail
ParaSail will be discussed at two panels at the upcoming SPLASH/OOPSLA conference in Portland, Oregon, and will be covered in depth in a tutorial and workshop/birds-of-a-feather session at the upcoming SIGAda conference in Denver.
The panels at SPLASH/OOPSLA are:
- A panel that is part of the Transitioning to Multicore (TMC) workshop, on Sunday Oct. 23rd, at 2:00PM -- see http://tmc.supertriceratops.com/
- A panel on Multicore, Manycore, and Cloud computing -- Is a new programming language paradigm required?, on Wednesday Oct. 26th, at 4:00PM -- see http://splashcon.org/2011/program/229
The tutorial and workshop at SIGAda are:
- A tutorial on Experimenting with ParaSail, which will be held Monday November 7th, from 9:00AM to 12:30PM -- see http://www.sigada.org/conf/sigada2011/tutorials-descriptions.html#MA1
- A workshop/birds-of-a-feather session on ParaSail, which will be on Tuesday, November 8th, from 4:30PM to 5:30PM -- see http://www.sigada.org/conf/sigada2011/workshops.html
Thursday, October 13, 2011
Deadlocking the Virtual Machine
As mentioned in an earlier posting, ParaSail has its own virtual machine which directly supports the notion of picothreads, very light-weight threads which are served by heavier-weight server processes (see http://parasail-programming-language.blogspot.com/2010/11/virtual-machine-for-parasail-with.html). We are now compiling and executing various interesting ParaSail examples, and we recently began to see some apparent deadlocks in the virtual machine implementation.
The actual ParaSail program (N Queens -- see http://parasail-programming-language.blogspot.com/2011/09/n-queens-join-party.html) does not have any inherent deadlock itself. In fact, it has only one concurrent object, a vector of solutions, and the only locking operation used is to add another solution onto the end of the vector. Nevertheless, with a true multi-threaded virtual machine implementation, we are seeing a deadlock in the virtual machine such that all forward progress stops. Like many situations in concurrent programs (at least pre-ParaSail...), these kinds of problems are timing dependent. In some cases we can get the N-Queens solver for 8 queens to finish, sometimes not. The version using a single-kernel-thread virtual machine, which merely simulates multi-threading, never has a problem.
So what could be causing the deadlock? After lots of debugging, a hypothesis has emerged. We are now in the process of acting on this hypothesis, so we don't know whether it is correct, or whether making a corresponding fix will solve the entire problem. So here is the hypothesis:
As explained in the posting about the virtual machine (actually, in a comment on that posting), we serve the queue of picothreads associated with a particular server process in a LIFO manner. But when a server steals threads from another server, it picks up the oldest picothread, i.e. uses FIFO. This is essentially identical to the approach used in the Cilk thread scheduler (see http://supertech.csail.mit.edu/papers/steal.pdf). Our hypothesis is that it is the stealing of picothreads which is creating the deadlock.
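This LIFO-serve/FIFO-steal discipline can be sketched with a double-ended queue (a Python model of the scheduling policy only, with hypothetical names):

```python
from collections import deque

class Server:
    def __init__(self):
        self.q = deque()

    def spawn(self, picothread):
        self.q.append(picothread)     # push the newest thread on the right

    def take_own(self):
        return self.q.pop()           # LIFO: a server serves its own
                                      # queue newest-first

    def steal_from(self, victim):
        return victim.q.popleft()     # FIFO: a thief takes the oldest

s1, s2 = Server(), Server()
for t in ["t1", "t2", "t3"]:
    s1.spawn(t)
assert s1.take_own() == "t3"          # owner gets the newest thread
assert s2.steal_from(s1) == "t1"      # thief gets the oldest thread
```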
So how could the stealing of threads create a deadlock? The hypothesized mechanism begins with a server executing a picothread that gets a lock on an object. This happens when there is a call on an operation with a locked or queued parameter, or when an individual concurrent object (such as the Solutions vector in N-Queens) is passed to an operation where it is not known to be concurrent (such as the "|=" operation in the Vectors module).
In the virtual machine implementation, getting a lock means acquiring a mutex or equivalent. This mutex is automatically released when the operation completes. However, suppose the operation has internal parallelism. In ParaSail, parallelism is everywhere, and any operation of any complexity is likely to have some internal parallelism. The "|=" operation is relatively complex, particularly if the underlying array needs to be expanded before adding on another element to the vector. So there is plenty of opportunity for some internal parallelism. That is manifested in the virtual machine by one or more additional picothreads being created and queued on the same server's queue, and then eventually, the server waits for the parallel picothreads to complete.
As soon as new picothreads show up on the server's queue, other servers may start stealing them. As mentioned above, stealing works FIFO, i.e. oldest first, but if the original server's queue is essentially empty at the moment when a picothread from the locked operation is queued, then it will be the oldest thread on the queue, and so is a candidate for stealing. So let's presume that this new sub-picothread gets stolen. And perhaps that was the only picothread spawned by the original server. When it gets around to waiting for it to complete, it finds it is not yet done, and in fact its own queue is now empty, so it steals from a third server. So now we have our original server thread which is executing a picothread which is waiting for a subthread, going off and stealing work from some random third server's queue. Now let's suppose that the thread on that third server's queue wants to get the original lock. So the original server is now waiting on the lock that it already holds on behalf of some other picothread. A classic tail-chasing deadlock. Pretty soon all of the other servers stack up behind this same lock and everything grinds to a halt.
So how do we avoid this problem? Fundamentally, we don't want a server that is holding a lock to go off and start executing a thread which is not a sub-thread of the original thread which acquired the lock. If each server and each picothread keeps track of the innermost lock it holds, and locks are chained together so that an inner lock points to the next outer lock held, if any, then we should be able to avoid the problem. Whenever a server goes looking for a thread, if it holds a lock, then the thread it chooses must also hold that same lock. The thread might hold additional locks, but somewhere on its chain of locks, starting at the innermost one, it must have the same lock as that held by the server. This argues for having at least two queues per server: one of picothreads holding no locks, and one of picothreads holding at least one lock. Then when a queue is served by a server holding a lock, either FIFO or LIFO, it will only look at the queue of picothreads holding at least one lock, and it will ignore those on that queue that don't have the server's lock somewhere in their chain of locks.
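The proposed lock-chain check can be sketched as follows (a Python model under the assumptions above; the structure and names are hypothetical):

```python
class Lock:
    def __init__(self, name, outer=None):
        self.name = name
        self.outer = outer        # next outer lock in the chain, if any

def chain_contains(innermost, lock):
    # Walk from the innermost lock outward, looking for `lock`.
    l = innermost
    while l is not None:
        if l is lock:
            return True
        l = l.outer
    return False

def may_execute(server_lock, thread_innermost_lock):
    # A server holding no lock may run anything; otherwise the chosen
    # thread's lock chain must include the server's lock.
    return (server_lock is None
            or chain_contains(thread_innermost_lock, server_lock))

outer = Lock("Solutions")
inner = Lock("Subvector", outer=outer)
assert may_execute(outer, inner)      # sub-thread of the lock holder: OK
assert not may_execute(outer, None)   # unrelated lock-free thread: skip
assert may_execute(None, None)        # an unlocked server runs anything
```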
Deadlocks are fun -- no doubt about it...
Monday, October 3, 2011
Alpha release of ParaSail Compiler and Virtual Machine
We are very pleased to announce the first, alpha, release of the ParaSail prototype Compiler and Virtual Machine. This release includes executables for the Mac, Linux, and Windows. Please visit the ParaSail Google Group to download this release:
http://groups.google.com/group/parasail-programming-language