Friday, July 25, 2014

Progress Update

Bootstrapping

The compiler is now able to compile many ParaSail programs and most of the ParaSail Standard Library. It's probably easier to just list the things we know it can't do yet:

  • functions as parameters
  • functions with locked or queued parameters

Pretty small list, right! Well, to be honest, it's the list of known bugs. There are probably bugs we don't know about.

The most exciting thing about this is that the compiler itself uses neither of these features, so it's able to compile itself! The process is called bootstrapping. The executable it produces is able to compile other ParaSail programs.

Debugging Information

Over the past few weeks, we've spent a lot of time debugging compiled ParaSail programs. We're tired of stepping through assembly, and so, if given a "-g", the compiler will produce some debugging information usable by gdb. Right now it's only ParaSail source file line numbers and function names, but it's a start.

Syntax Highlighting on the Web with Pygments

Pygments is a generic syntax highlighter written in python. We've submitted a pull request to add a ParaSail highlighter and are awaiting a response from the maintainers.

Static Analysis

Next, we're going to spend more time on static analysis, especially checking for nullness of objects and recognizing unused variables.

OpenMP

We've been evaluating different alternatives for synchronization primitives in the ParaSail runtime and we've found that OpenMP performs the best when compared against Ada protected objects and our own library. Thus, we are moving our runtime system to OpenMP for the performance benefits and the fact that it's supported in both gcc and llvm.

Thursday, June 12, 2014

Linkers and Types and Built-ins! Oh, my!

Hello, It's Justin again. See the previous blog post for a bit of context. (Hint: we're building the compiler)

Linking to the Built-ins

The interpreter has a library of functions it uses to evaluate ParaSail code. We don't want to and can't write every ParaSail operation directly in llvm. So, it was necessary to link the generated llvm code with the interpreter's built-in functions. At first we thought the built-ins needed to be translated to llvm code to successfully link with our generated llvm. To accomplish this, we used a tool called AdaMagic to convert the Ada source code (in which the built-ins are currently written) and Ada's run time system (RTS) to C source code then used the llvm C front-end "clang" to compile the rest of the way. Clang complained with hundreds of warnings, but, it worked. We were able to print integers, floats, and characters! We couldn't do much more with the built-ins at the time because we didn't yet have type descriptors working (see below).

After all the effort and dealing with the messy generated C code, we found out that the system linker could link object files compiled with the llvm backend called "llc" together with object files produced by the GNAT Ada compiler with no problem. That was a relief, as we really didn't want to go mucking around in the generated C code more than we had to.

Types

ParaSail types are represented in the interpreter with Type_Descriptors. Basically,a type descriptor is a large record with everything you need to know about a type. Most importantly, it holds a list of operations, a "null" value to be used for the type, and descriptors for any other type it depends on (such as the component type for an array type). When we want to execute compiled ParaSail code that has types in it, we need much of this information at run time, particularly when executing code that is "generic" in some way (such as a Set where we need to get the Hash operation for Some_Hashable_Type, and perhaps its representation for "null").

Here's an example:

func Count_Nulls(Numbers : Basic_Array<optional Univ_Integer>) -> Univ_Integer is
   var Count := 0;
   var I := 1;
   while I <= Length(Numbers) loop
      if Numbers[I] is null then
         Count += 1;
      end if;
      I += 1;
   end loop;
   return Count;
end func Count_Nulls;

func main(Args : Basic_Array<Univ_String>) is
   var A : Basic_Array<optional Univ_Integer> := Create(10, null);
   A[1] := 42;
   A[5] := 0;
   Print("There are ");
   Print(Count_Nulls(A));
   Println(" nulls");
end func main;

There are quite a few places that need type information, but the easiest to show is "Numbers[I] is null". There is a specific 64 bit value that represents null for the Univ_Integer type that needs to be checked against Numbers[I].

To accomplish this, we encode the type descriptors as "linkonce" global byte arrays in llvm. Before the ParaSail main routine is invoked, these byte streams are reconstructed into Type_Descriptors for use at run time. We reconstruct these by appending our own parameterless function to the @llvm.global_ctors global variable. But, we need to call into the Ada RTS while reconstructing type descriptors, and, much to our dismay, the functions in @llvm.global_ctors are called before the Ada RTS has a chance to initialize. To solve this problem, the parameterless function in global_ctors adds a node to a todo (linked) list. Then, after the Ada RTS has been initialized, we walk the "todo" list and reconstruct the type descriptors from the streams.

A very similar method is used to reconstruct the appropriate run-time representations for Univ_Strings and Univ_Enumerations from their stream representations.

Back to the example

First of all: It compiles and runs! Here's what it prints out:

There are 8 nulls

Woohoo!
(that part was me)

You might wonder why we used a 'while' loop instead of a 'for each' loop. Well, that's because for loops use parallel stuff we haven't implemented yet. Even a forward/reverse for loop uses parallel stuff because it allows you to do things like this

for I := 1 while I < N forward loop
   block
      continue loop with I => I * 2;
   ||
      continue loop with I => I + 1;
   end block;
end loop;

While loops can be implemented with simple llvm 'br' instructions.

We can't compile aaa.psi (the ParaSail standard library) yet, so we don't have access to all the modules defined there, but Basic_Array::Create, Basic_Array::"var_indexing", and Basic_Array::"indexing" are all built-ins, so we're able to use that container.

Other Updates

Main

If a ParaSail function has the name "main," with one argument of type Basic_Array<Univ_String>, and no return value, then that is treated as a special case by the llvm code generator, and it's name will be changed to "_parasail_main_routine". A real main function  "int main(int argc, char** argv)" function exists in ParaSail's RTS that builds up the Args array (well actually not yet, but it will!)  and calls this _parasail_main_routine. Why the preceding underscore you ask? Well, it's not legal to start an identifier with an underscore in ParaSail, so, we're insuring no name collisions this way. However, one downside of this is that it is currently not possible to call main from elsewhere in your program. However, that's not unprecedented, C++ disallows it as well. This isn't set in stone, though.

Reflection

   "Reflection" is a ParaSail module that allows ParaSail code to inspect itself and retrieve the ParaSail Virtual Machine (PSVM) instructions for any routine that is in the Interpreter's memory at the same time. This is how we compile: the user loads compiler.psl (the llvm code generator written in ParaSail itself), its dependent module(s), and the ParaSail source file to be compiled. Then, the main function in compiler.psl, called "Compile," searches through all the routines in the interpreter's memory (using the reflection module) until it finds those that come from the file whose name matches the file name given as an argument to Compile.  It then proceeds to translate each of those routines from PSVM instructions into LLVM instructions, as well as build up the various tables need for reconstructing type descriptors, strings, etc.

Static Link

   The Static Link passed implicitly in each ParaSail function call is used by the called function to retrieve data from outside the function.  If the function is nested inside another function, then the static link points to the local data area of that enclosing function.  If the function is a top-level operation of some module, then the static link points to a type descriptor for some instance of that module.  In a call on a stand-alone function, the static link is null.

We're making tremendous headway on this compiler and I'm very excited about where we could reach by the end of this summer.

Tuesday, May 27, 2014

From an Interpreter to a Compiler

My name is Justin Hendrick and I'm an intern working on ParaSail for the summer. My first task is code generation.

Up until now, ParaSail has been interpreted. We’re proud to announce that we’ve begun building the code generation phase, written in ParaSail, that generates LLVM Intermediate Representation code. For information about LLVM see llvm.org and the reference manual.

So far, we can successfully compile simple functions that branch and do arithmetic on Univ_Integers and Univ_Reals.

For example, this ParaSail max function:

func Max(A : Univ_Integer; B : Univ_Integer) -> Univ_Integer is
   if A > B then
      return A;
   else
      return B;
   end if
end func Max;

Is converted to ParaSail Virtual Machine Intermediate Representation. See this post for more details on the PSVM and the calling convention in use.

 ----- ParaSail Virtual Machine Instructions ---- 
Max: Routine # 720
(A : Univ_Integer<>; B : Univ_Integer<>) -> Max : Univ_Integer<>
Uses_Queuing = FALSE
Local_Area_Length = 17
Start_Callee_Locals = 7
Code_Length = 13
(COPY_WORD_OP, Destination => (Local_Area, 5), Source => (Param_Area, 1))
(COPY_WORD_OP, Destination => (Local_Area, 6), Source => (Param_Area, 2))
(CALL_OP, Params => (Local_Area, 4), Static_Link => (Zero_Base, 5) = Univ_Integer<>, Call_Target => (Zero_Base, 59) = "=?")
(STORE_INT_LIT_OP, Destination => (Local_Area, 5), Int_Value => 4)
(CALL_OP, Params => (Local_Area, 3), Static_Link => (Zero_Base, 30) = Ordering<>, Call_Target => (Zero_Base, 19) = "to_bool")
(IF_OP, If_Source => (Local_Area, 3), If_Condition => 2, Skip_If_False => 4)
(COPY_WORD_OP, Destination => (Local_Area, 3), Source => (Param_Area, 1))
(COPY_WORD_OP, Destination => (Param_Area, 0), Source => (Local_Area, 3))
(RETURN_OP)
(SKIP_OP, Skip_Count => 3)
(COPY_WORD_OP, Destination => (Local_Area, 3), Source => (Param_Area, 2))
(COPY_WORD_OP, Destination => (Param_Area, 0), Source => (Local_Area, 3))
(RETURN_OP)

After code generation and a pass through LLVM’s optimizer this becomes

define void @Max(i64* %_Static_Link, i64* %_Param_Area) {
  %_source1 = getelementptr inbounds i64* %_Param_Area, i64 1
  %_source_val1 = load i64* %_source1, align 8
  %_source2 = getelementptr inbounds i64* %_Param_Area, i64 2
  %_source_val2 = load i64* %_source2, align 8
  %_result3 = icmp sgt i64 %_source_val1, %_source_val2
  br i1 %_result3, label %_lbl7, label %_lbl11

_lbl7:                                            ; preds = %0
  store i64 %_source_val1, i64* %_Param_Area, align 8
  ret void

_lbl11:                                           ; preds = %0
  store i64 %_source_val2, i64* %_Param_Area, align 8
  ret void
}

And from there, LLVM can generate machine code for many architectures.

The next steps are to enable calls to built-in functions, handle types other than Univ_Integer and Univ_Real, and implement Parallel Calls.

We plan to make more frequent releases this summer than in the past.

Stay tuned for more updates on the state of the compiler.

Saturday, April 5, 2014

A simple and useful iterator pattern in ParaSail

ParaSail has three basic iterator formats, with some sub-formats (unbolded "[ ... ]" indicate optional parts, unbolded "|" indicate alternatives, bold indicates reserved words or symbols that are part of the syntax itself):

    1. Set iterator:
    for I in Set [forward | reverse | concurrent] loop ...

    2. Element iterator:
    for each E of Container [forward | reverse | concurrent] loop ...
         or
    for each [K => V] of Container 
                         [forward | reverse | concurrent] loop ...

    3. Value iterator:
    for V := Initial [then Next_Value(V)] [while Predicate(V)] loop ...
         or
    for T => Root [then T.Left | T.Right] [while T not null] loop ...

These iterators can be combined into a single loop by parenthesizing them;  the loop quits as soon as any of the iterators completes:

   for (each E of Container; I in 1..100) forward loop ...
   //  Display up to 100 elements of Container

As we have written more ParaSail, one pattern that has come up multiple times (particularly when doing output) is the case where you have a special value to be used for the first iteration, and then the same value thereafter.  For example, presume we want to display each of the values of a Vector, separated by commas, enclosed in brackets.  A common way of doing this is to have an "Is_First" boolean value which is tested, and then set to #false, to decide whether to display the opening "[" or the separating ", ".  By using a combination of two iterators, this becomes simpler, with no extra conditionals:

    for (each V of Vec; Sep := "[" then ", ") forward loop
       Print(Sep | V);
    end loop
    Print("]")

Note that the value iterator has no "while" part, so it will continue forever to have the loop variable "Sep" bound to the value ", "; when combined with another iterator as it is in this case, the loop will end once the container iterator ends.

Nothing too magical here, but it is sometimes interesting to see a pattern like this that keeps cropping up.  The ParaSail reference manual has more details on these iterators.  The most recent manual may be found at http://parasail-lang.org .

Thursday, February 20, 2014

ParaSail Work Stealing speeds up

We are now releasing revision 5.2 of the ParaSail compiler and interpreter, which incorporates a new implementation of work stealing.  Links to both binaries and sources can be found at:

    http://parasail-lang.org

in the Documentation and Download section of the web page.

Here is the new entry in the release notes for revision 5.2:

* [rev 5.2] Re-implementation of work stealing to reduce contention between processors.  Each server now has a private double-ended queue (deque) of pico-threads, along with the shared triplet of deques which has existed before.  This produced another two times speedup (in addition to the rev 4.9 improvements), thereby speeding up execution by four times or more since rev 4.8. The main procedures for each language (ParaSail, Sparkel, etc.) have been refactored to share more common code. Allow a command-line flag "-servers nnn" to specify the number of heavy-weight server threads to use.  Default is 6. Command can also be given interactively, as "servers nnn". Interactively it only works to increase the number; there is no way to shut down servers that already have been activated.

Wednesday, February 19, 2014

Worklist Algorithms and Parallel Iterators

ParaSail has some very general parallel iterator constructs.  One of the more unusual involves the use of a continue statement to initiate a new iteration of an outer loop statement, as illustrated in this breadth-first search of a graph:

        var Seen : Array<Atomic<Boolean>, Indexed_By => Node_Id> :=
          [for I in 1 .. |DGraph.G| => Create(#false)];

       *Outer*
        for Next_Set => Root_Set loop  // Start with the initial set of roots
            for N in Next_Set concurrent loop  // Look at each node in set
                if not Value(Seen[N]) then
                    Set_Value(Seen[N], #true);
                    if Is_Desired_Node(DGraph.G[N]) then // Found a node
                        return N;
                    end if;
                    // Continue outer loop with successors of this node
                    continue loop Outer with Next_Set => DGraph.G[N].Succs;
                end if;
            end loop;
        end loop Outer;
        // No node found
        return null;

It has been a bit hard to explain what it means for an inner concurrent loop to continue an outer loop.   More recently, we have developed an alternative explanation.  The basic idea is to appeal to the notion of a worklist algorithm.  Various (sequential) algorithms are organized around the use of a list of work items to be processed, with the overall algorithm continuing until the worklist is empty, or, for a search, until an item of interest is found.  For example, the work-list approach to data flow analysis algorithms is described here:
Here is another worklist-based algorithm, for solving a constraint satisfaction problem using the Arc Consistency approach:
Finally, here is a breadth-first graph search algorithm, which uses a work queue instead of a work list:
So one way of explaining this kind of ParaSail iterator is as a built-in work-list-based iterator, where the loop keeps going until there are no more work items to process, and continue adds a new work-item to the worklist.  This approach makes it clear that such iterators can work even if there is only one physical processor available, and also suggests that the work items are not full threads of control, but rather light-weight representations of some work to be done.  This terminology of work items and work list or work queue also matches well with the terminology of the underlying work stealing scheduling mechanism in use.

We used to refer to such ParaSail's parallel iterators as a bag of threads, but now it seems easier to describe each iterator as having its own list of work items (i.e. a worklist) and the loop keeps going until the worklist is empty, or until we encounter a return or exit (aka break) statement.

One interesting detail at this point is that for a breadth-first search, you want the work list/queue to be processed in a FIFO manner, while for a depth-first search, you want to process the work list/queue as a stack, in a LIFO manner.  The work-stealing scheduler we use does some of both, using FIFO when stealing, and LIFO when the work-items are served by the same core that generated them.  This argues for perhaps giving more control to the programmer when using an explicit continue statement, to be able to specify FIFO or LIFO.  However, it is not entirely clear if that added complexity is worthwhile.

Monday, December 23, 2013

Concurrency vs. Parallelism

Over the past few years there seems to have been an increasing number of discussions of the difference between concurrency and parallelism.  These discussions didn't seem very convincing at first, but over time a useful distinction did begin to emerge.  So here is another attempt at trying to distinguish these two:
  • Concurrent programming constructs allow a programmer to simplify their program by using multiple (heavy-weight) threads to reflect the natural concurrency in the problem domain.  Because restructuring the program in this way can provide significant benefits in understandability, relatively heavy weight constructs are acceptable.  Performance is also not necessarily as critical, so long as it is predictable.
  • Parallel programming constructs allow a programmer to divide and conquer a problem, using multiple (light-weight) threads to work in parallel on independent parts of the problem.  Such restructuring does not necessarily simplify the program, and as such parallel programming constructs need to be relatively light weight, both syntactically and at run-time.  If the constructs introduce too much overhead, they defeat the purpose.
Given sufficiently flexible parallel programming constructs, it is not clear that you also need traditional concurrent programming constructs.  But it may be useful to provide some higher-level notion of process, which forgoes the single-threaded presumption of a thread, but brings some of the benefits from explicitly representing the natural concurrency of the problem domain within the programming language.

ParaSail is focused on providing parallel programming constructs, but concurrent objects are useful for more traditional concurrent programming approaches, particularly when coupled with explicit use of the "||" operator.  It is an interesting question whether some higher-level process-like construct might be useful.