Saturday, April 5, 2014

A simple and useful iterator pattern in ParaSail

ParaSail has three basic iterator formats, with some sub-formats (unbolded "[ ... ]" indicates optional parts, unbolded "|" indicates alternatives, and bold indicates reserved words or symbols that are part of the syntax itself):

    1. Set iterator:
    for I in Set [forward | reverse | concurrent] loop ...

    2. Element iterator:
    for each E of Container [forward | reverse | concurrent] loop ...
    for each [K => V] of Container 
                         [forward | reverse | concurrent] loop ...

    3. Value iterator:
    for V := Initial [then Next_Value(V)] [while Predicate(V)] loop ...
    for T => Root [then T.Left | T.Right] [while T not null] loop ...

These iterators can be combined into a single loop by parenthesizing them;  the loop quits as soon as any of the iterators completes:

   for (each E of Container; I in 1..100) forward loop ...
   //  Display up to 100 elements of Container

As we have written more ParaSail, one pattern that has come up multiple times (particularly when doing output) is the case where you have a special value to be used for the first iteration, and a different value for every iteration thereafter.  For example, presume we want to display each of the values of a Vector, separated by commas and enclosed in brackets.  A common way of doing this is to have an "Is_First" boolean which is tested, and then set to #false, to decide whether to display the opening "[" or the separating ", ".  By using a combination of two iterators, this becomes simpler, with no extra conditionals:

    for (each V of Vec; Sep := "[" then ", ") forward loop
       Print(Sep | V);
    end loop

Note that the value iterator has no "while" part, so it will continue forever to have the loop variable "Sep" bound to the value ", "; when combined with another iterator as it is in this case, the loop will end once the container iterator ends.
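For readers more familiar with mainstream languages, the same two-iterator separator pattern can be approximated in Python by zipping the container with an infinite iterator of separators; `zip` stops when the finite iterator is exhausted, mirroring the combined-iterator termination rule (a sketch, not ParaSail; the function name is invented for illustration):

```python
from itertools import chain, repeat

def print_bracketed(vec):
    # Pair each element with "[" first, then ", " thereafter.
    # zip stops when the shorter (finite) iterator runs out,
    # just as the combined ParaSail loop ends with the container iterator.
    out = []
    for sep, v in zip(chain(["["], repeat(", ")), vec):
        out.append(sep + str(v))
    out.append("]")
    return "".join(out)

print(print_bracketed([1, 2, 3]))  # prints [1, 2, 3]
```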

Nothing too magical here, but it is sometimes interesting to see a pattern like this that keeps cropping up.  The ParaSail reference manual has more details on these iterators.  The most recent manual may be found at .

Thursday, February 20, 2014

ParaSail Work Stealing speeds up

We are now releasing revision 5.2 of the ParaSail compiler and interpreter, which incorporates a new implementation of work stealing.  Links to both binaries and sources can be found at:

in the Documentation and Download section of the web page.

Here is the new entry in the release notes for revision 5.2:

* [rev 5.2] Re-implementation of work stealing to reduce contention between processors.  Each server now has a private double-ended queue (deque) of pico-threads, in addition to the shared triplet of deques that existed before.  This produced another 2x speedup (on top of the rev 4.9 improvements), for a total speedup of four times or more since rev 4.8.  The main procedures for each language (ParaSail, Sparkel, etc.) have been refactored to share more common code.  A command-line flag "-servers nnn" may now be given to specify the number of heavy-weight server threads to use; the default is 6.  The command can also be given interactively, as "servers nnn".  Interactively it only works to increase the number; there is no way to shut down servers that have already been activated.
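To make the deque-per-server idea concrete, here is a toy single-threaded Python sketch (not the ParaSail runtime; all names are invented for illustration) in which each server pops work from the local end of its own deque (LIFO) and, when idle, steals from the opposite end (FIFO) of another server's deque:

```python
import random
from collections import deque

class Server:
    def __init__(self):
        self.deque = deque()   # private deque of work items

def run(servers, results):
    # Round-robin simulation: each server either serves its own deque
    # (LIFO end) or steals from a random victim's deque (FIFO end).
    idle_rounds = 0
    while idle_rounds < len(servers):
        idle_rounds = 0
        for me in servers:
            if me.deque:
                task = me.deque.pop()          # local end: LIFO
            else:
                victims = [s for s in servers if s is not me and s.deque]
                if not victims:
                    idle_rounds += 1
                    continue
                task = random.choice(victims).deque.popleft()  # steal: FIFO
            task(me, results)

def make_task(n):
    # Each task records its number and spawns subtasks onto the LOCAL deque.
    def task(server, results):
        results.append(n)
        if n < 4:
            server.deque.append(make_task(2 * n))
            server.deque.append(make_task(2 * n + 1))
    return task

servers = [Server() for _ in range(3)]
servers[0].deque.append(make_task(1))
out = []
run(servers, out)
print(sorted(out))  # nodes 1..7, each processed exactly once
```

Serving locally in LIFO order while stealing in FIFO order matches the hybrid discussed in the "Worklist Algorithms and Parallel Iterators" post below.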

Wednesday, February 19, 2014

Worklist Algorithms and Parallel Iterators

ParaSail has some very general parallel iterator constructs.  One of the more unusual involves the use of a continue statement to initiate a new iteration of an outer loop statement, as illustrated in this breadth-first search of a graph:

        var Seen : Array<Atomic<Boolean>, Indexed_By => Node_Id> :=
          [for I in 1 .. |DGraph.G| => Create(#false)];

        *Outer* for Next_Set => Root_Set loop  // Start with the initial set of roots
            for N in Next_Set concurrent loop  // Look at each node in set
                if not Value(Seen[N]) then
                    Set_Value(Seen[N], #true);
                    if Is_Desired_Node(DGraph.G[N]) then // Found a node
                        return N;
                    end if;
                    // Continue outer loop with successors of this node
                    continue loop Outer with Next_Set => DGraph.G[N].Succs;
                end if;
            end loop;
        end loop Outer;
        // No node found
        return null;

It has been a bit hard to explain what it means for an inner concurrent loop to continue an outer loop.   More recently, we have developed an alternative explanation.  The basic idea is to appeal to the notion of a worklist algorithm.  Various (sequential) algorithms are organized around the use of a list of work items to be processed, with the overall algorithm continuing until the worklist is empty, or, for a search, until an item of interest is found.  For example, the work-list approach to data flow analysis algorithms is described here:
Here is another worklist-based algorithm, for solving a constraint satisfaction problem using the Arc Consistency approach:
Finally, here is a breadth-first graph search algorithm, which uses a work queue instead of a work list:
So one way of explaining this kind of ParaSail iterator is as a built-in work-list-based iterator, where the loop keeps going until there are no more work items to process, and continue adds a new work-item to the worklist.  This approach makes it clear that such iterators can work even if there is only one physical processor available, and also suggests that the work items are not full threads of control, but rather light-weight representations of some work to be done.  This terminology of work items and work list or work queue also matches well with the terminology of the underlying work stealing scheduling mechanism in use.
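Under this reading, the ParaSail breadth-first search above corresponds to the following worklist loop (a Python sketch with an assumed adjacency-list graph representation; "continue loop Outer with ..." becomes appending successors to the work queue):

```python
from collections import deque

def find_node(graph, roots, is_desired):
    # Worklist-based breadth-first search.  The loop keeps going until
    # the worklist is empty or a desired node is found.
    seen = [False] * len(graph)     # one flag per node id
    work = deque(roots)             # the worklist (FIFO => breadth-first)
    while work:
        n = work.popleft()
        if not seen[n]:
            seen[n] = True
            if is_desired(graph[n]["value"]):
                return n
            # "continue with successors" = add them as new work items
            work.extend(graph[n]["succs"])
    return None  # no node found

# hypothetical 4-node graph: 0 -> 1,2 ; 1 -> 3
g = [
    {"value": "a", "succs": [1, 2]},
    {"value": "b", "succs": [3]},
    {"value": "c", "succs": []},
    {"value": "d", "succs": []},
]
print(find_node(g, [0], lambda v: v == "d"))  # prints 3
```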

We used to describe such ParaSail parallel iterators in terms of a bag of threads, but now it seems easier to describe each iterator as having its own list of work items (i.e., a worklist), with the loop continuing until the worklist is empty, or until we encounter a return or exit (aka break) statement.

One interesting detail at this point is that for a breadth-first search, you want the work list/queue to be processed in a FIFO manner, while for a depth-first search, you want to process the work list/queue as a stack, in a LIFO manner.  The work-stealing scheduler we use does some of both, using FIFO when stealing, and LIFO when the work-items are served by the same core that generated them.  This argues for perhaps giving more control to the programmer when using an explicit continue statement, to be able to specify FIFO or LIFO.  However, it is not entirely clear if that added complexity is worthwhile.
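The FIFO/LIFO point can be seen directly: in the sketch below (plain Python, hypothetical graph), the same worklist loop yields breadth-first order when work items are popped FIFO and depth-first order when popped LIFO:

```python
from collections import deque

def traversal(graph, root, lifo):
    # Identical worklist loop; only the end of the deque we pop differs.
    seen, order, work = set(), [], deque([root])
    while work:
        n = work.pop() if lifo else work.popleft()  # LIFO = depth-first, FIFO = breadth-first
        if n not in seen:
            seen.add(n)
            order.append(n)
            work.extend(graph[n])
    return order

g = {0: [1, 2], 1: [3], 2: [4], 3: [], 4: []}
print(traversal(g, 0, lifo=False))  # FIFO: [0, 1, 2, 3, 4]
print(traversal(g, 0, lifo=True))   # LIFO: [0, 2, 4, 1, 3]
```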

Monday, December 23, 2013

Concurrency vs. Parallelism

Over the past few years there seems to have been an increasing number of discussions of the difference between concurrency and parallelism.  These discussions didn't seem very convincing at first, but over time a useful distinction did begin to emerge.  So here is another attempt to distinguish the two:
  • Concurrent programming constructs allow a programmer to simplify their program by using multiple (heavy-weight) threads to reflect the natural concurrency in the problem domain.  Because restructuring the program in this way can provide significant benefits in understandability, relatively heavy weight constructs are acceptable.  Performance is also not necessarily as critical, so long as it is predictable.
  • Parallel programming constructs allow a programmer to divide and conquer a problem, using multiple (light-weight) threads to work in parallel on independent parts of the problem.  Such restructuring does not necessarily simplify the program, and as such parallel programming constructs need to be relatively light weight, both syntactically and at run-time.  If the constructs introduce too much overhead, they defeat the purpose.
Given sufficiently flexible parallel programming constructs, it is not clear that you also need traditional concurrent programming constructs.  But it may be useful to provide some higher-level notion of process, which forgoes the single-threaded presumption of a thread, but brings some of the benefits from explicitly representing the natural concurrency of the problem domain within the programming language.

ParaSail is focused on providing parallel programming constructs, but concurrent objects are useful for more traditional concurrent programming approaches, particularly when coupled with explicit use of the "||" operator.  It is an interesting question whether some higher-level process-like construct might be useful.

Thursday, November 21, 2013

First release of Javallel, Parython; new release 5.1 of ParaSail and Sparkel

The ParaSail family of languages is growing, with two more additions now available for experimentation.  We have made a new release 5.1 which includes all four members of the family -- ParaSail itself, Sparkel based on the SPARK subset of Ada, Javallel based on Java, and Parython based on Python.  Binaries plus examples for these are all available in a single (large) download:

As before, if you are interested in sources, visit:

The biggest change in ParaSail was a rewrite of the region-based storage manager (actually, this same storage manager is used for all four languages), to dramatically reduce the contention between cores/processors related to storage management.  The old implementation was both slower and subject to a number of race conditions.  The new one is faster and (knock on wood) free of (at least those ;-) race conditions.

As far as how Javallel relates to Java, here are some of the key differences:
  1. Classes require a "class interface" to declare their visible operations and fields
  2. There is no notion of a "this" parameter -- all parameters must be declared
  3. There is no notion of "static" -- a method is effectively static if it doesn't have any parameters whose type matches the enclosing class; no variables are static
  4. You only say "public" once, in the class, separating the private stuff (before the word "public") from the implementation of the visible methods.
  5. Semi-colons are optional at the end of a line
  6. Parentheses are optional around the condition of an "if"
  7. "for" statements use a different syntax; e.g:
    •  for I in 1..10 [forward | reverse | parallel] { ... }
  8. "{" and "}" are mandatory for all control structures
  9. You can give a name to the result of a method via:  
    • Vector createVec(...) as Result { ... Result = blah; ... } 
    and then use that name (Result) inside as a variable whose final value is returned
  10. You have to say "T+" rather than simply "T" if you want to accept actual parameters that are of any subclass of T (aka polymorphic).  "T" by itself only allows actuals of exactly class T.
  11. Object declarations must start with "var," "final," or "ref" corresponding to variable objects, final objects, or ref objects (short-lived renames).
  12. There are no special constructors; any method that returns a value of the enclosing class is effectively a constructor;  objects may be created inside a method using a tuple-like syntax "(a => 2, b => 3)" whose type is determined from context
  13. X.Foo(Y) is equivalent to Foo(X, Y)
  14. Top-level methods are permitted, to simplify creating a "main" method
  15. Javallel uses "and then" and "or else" instead of "&&" and "||", and uses "||" to introduce explicit parallelism.
  16. "synchronized" applies to classes, not to methods or blocks
  17. enumeration literals start with "#"
There are examples in javallel_examples/*.jl?, which should give a better idea of what Javallel is really like.  Parython examples are in parython_examples/*.pr?

Thursday, November 14, 2013

Using ParaSail as a Modeling Language for Distributed Systems

The ACM HILT 2013 conference just completed in Pittsburgh, and we had some great tutorials, keynotes, and sessions on model-based engineering, as well as on formal methods applied to both modeling and programming languages.  One of the biggest challenges identified was integrating complex systems with components defined in various modeling or domain-specific languages, along with an overall architecture, which might be specified in a language like AADL or SysML or might just be sketches on a whiteboard somewhere.  A big part of the challenge is that different languages have different underlying semantic models, with different type systems, different notions of time, different concurrency and synchronization models (if any),  etc.  The programming language designer in me wants to somehow bring these various threads (so to speak) together in a well-defined semantic framework, ideally founded on a common underlying language.

One way to start is by asking how you can "grow" a programming language into a modeling language (without killing it ;-)?  ParaSail has some nice features that might fit well at the modeling level, in that its pointer-free, implicitly parallel control and data semantics are already at a relatively high level, and don't depend on a single-address-space view, nor a central-processor-oriented view.  As an aside, Sebastian Burckhardt from Microsoft Research gave a nice talk on "cloud sessions" at the recent SPLASH/OOPSLA conference in Indianapolis, and we chatted afterward about what a perfect fit the ParaSail pointer-free type model was to the Cloud Sessions indefinitely-persistent data model.  Modeling often abstracts away some of the details of the distribution and persistence of processing and data, so the friendliness of the ParaSail model to Cloud Sessions might also bode well for its friendliness to modeling of other kinds of long-lived distributed systems.

ParaSail's basic model is quite simple, involving parameterized modules, with separate definition of interface and implementation, types as instantiations of modules, objects as instances of types, and operations defined as part of defining modules, operating on objects.  Polymorphism is possible in that an object may be explicitly identified as having a polymorphic type (denoted by T+ rather than simply T) and then the object carries a run-time type identifier, and the object can hold a value of any type that implements the interface defined by the type T, including T itself (if T is not an abstract type), as well as types that provide in their interface all the same operations defined in T's interface.

So how does this model relate to a modeling language like Simulink or Statemate?  Is a Simulink "block" a module, a type, an object, or an operation (or something entirely different)?  What about a box on a state-machine chart?  For Simulink, one straightforward answer is that a Simulink block is a ParaSail object.  The associated type of the block object defines a set of operations or parameter values that determine how it is displayed, how it is simulated, how it is code-generated, how it is imported/exported using some XML-like representation, etc.  A Simulink graph would be an object as well, being an instance of a directed graph type, with a polymorphic type, say "Simulink_Block+," being the type of the elements in the graph (e.g. DGraph).

Clearly it would be useful to define new block types using the Simulink-like modeling language itself, rather than having to "drop down" to the underlying programming language.  One could imagine a predefined block type "User_Defined_Block" used to represent such blocks, where the various display/simulation/code-generation/import/export operations would be defined in a sub-language that is itself graphical, but relies on some additional (built-in) block types specifically designed for defining such lower-level operations.  Performing code-generation on these graphs defining the various primitive operations of this new user-defined block type would ideally create code in the underlying programming language (e.g. ParaSail) that mimics closely what a (ParaSail) programmer might have written to define a new block type directly in the (ParaSail) programming language.  This begins to become somewhat of a "meta" programming language, which always makes my head spin a little...

A practical issue at the programming language level when you go this direction is that, what was a simple interface/implementation module model, may want to support "sub-modules" in various dimensions.  In particular, there may be sets of operations associated with a given type devoted to relatively distinct problems, such as display vs. code generation, and it might be useful to allow both the interface, and certainly the implementation of a block-type-defining module to be broken up into sub-modules.  The ParaSail design includes this notion, which we have called "interface extensions" (which is a bit ambiguous, so the term "interface add-on" might be clearer).  These were described in:
but have not yet been implemented.  Clearly interface add-ons, for say, [#display] or [#code_gen], could help separate out the parts of the definition of a given block type.

A second dimension for creating sub-modules would be alternative implementations of the same interface, with automatic selection of the particular implementation based on properties of the parameters specified when the module is instantiated.  In particular, each implementation might have its own instantiation "preconditions" which indicate what must be true about the actual module parameters provided before a given implementation is chosen.  In addition, there needs to be some sort of preference rule to use when the preconditions of more than one implementation are satisfied by a given instantiation.  For example, presume we have one implementation of an Integer interface that handles 32-bit ranges of integers, a second that handles 64-bit ranges, and a third that handles an infinite range.  Clearly the 32-bit implementation would have a precondition that the range required be within +/- 2^31, the 64-bit one would require the range be within +/- 2^63, and the infinite-range-handling implementation would have no precondition.  If we were to instantiate this Integer module with a 25-bit range, the preconditions of all three implementations would be satisfied, but there would presumably be a preference for the 32-bit implementation over the other two.  The approach we have considered is to allow a numeric "preference" level to be specified when providing an implementation of a module, along with the implementation "preconditions," with the default level being "0" and the default precondition being "#true."  The compiler would choose the implementation with the maximum preference level among those with satisfied preconditions.  It would complain if there were a tie, requiring the user to specify explicitly which implementation of the module is to be chosen at the point of instantiation.
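The selection rule described above (filter by precondition, then take the unique maximum-preference implementation) can be sketched as follows; the three Integer implementations, their names, and the preference levels are illustrative assumptions, not ParaSail syntax:

```python
# hypothetical registry of module implementations, each with a
# precondition on the instantiation parameters and a preference level
impls = [
    ("int32",  lambda lo, hi: -2**31 <= lo and hi < 2**31, 2),
    ("int64",  lambda lo, hi: -2**63 <= lo and hi < 2**63, 1),
    ("bignum", lambda lo, hi: True,                        0),  # no precondition
]

def choose_impl(lo, hi):
    # Keep implementations whose precondition holds, then pick the
    # (unique) one with the maximum preference level; a tie is an error.
    ok = [(name, pref) for name, pre, pref in impls if pre(lo, hi)]
    best = max(p for _, p in ok)
    winners = [n for n, p in ok if p == best]
    if len(winners) > 1:
        raise ValueError("ambiguous choice: " + ", ".join(winners))
    return winners[0]

print(choose_impl(-2**24, 2**24 - 1))   # 25-bit range: all three qualify, int32 preferred
print(choose_impl(-2**40, 2**40))       # needs 64 bits: prints int64
print(choose_impl(-2**100, 2**100))     # prints bignum
```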

Wednesday, November 6, 2013

Tech talk and Tutorial at SPLASH 2013 on parallel programming

I gave an 80-minute "tech talk" and a 3-hour tutorial on parallel programming last week at SPLASH 2013 in Indianapolis.  The audiences were modest but enthusiastic. 

The tech talk was entitled:

  Living without Pointers: Bringing Value Semantics to Object-Oriented Parallel Programming

Here is the summary:

The heavy use of pointers in modern object-oriented programming languages interferes with the ability to adapt computations to the new distributed and/or multicore architectures. This tech talk will introduce an alternative pointer-free approach to object-oriented programming, which dramatically simplifies adapting computations to the new hardware architectures. This tech talk will illustrate the pointer-free approach by demonstrating the transformation of two popular object-oriented languages, one compiled, and one scripting, into pointer-free languages, gaining easier adaptability to distributed and/or multicore architectures, while retaining power and ease of use.

Here is a link to the presentation:

The tutorial was entitled:

   Multicore Programming using Divide-and-Conquer and Work Stealing

Here is the summary for the tutorial:

This tutorial will introduce attendees to the various paradigms for creating algorithms that take advantage of the growing number of multicore processors, while avoiding the overhead of excessive synchronization overhead. Many of these approaches build upon the basic divide-and-conquer approach, while others might be said to use a multiply-and-conquer paradigm. Attendees will also learn the theory and practice of "work stealing," a multicore scheduling approach which is being adopted in numerous multicore languages and frameworks, and how classic work-list algorithms can be restructured to take best advantage of the load balancing inherent in work stealing. Finally, the tutorial will investigate some of the tradeoffs associated with different multicore storage management approaches, including task-local garbage collection and region-based storage management.

Here is a link to the presentation:

Comments welcome!