Thursday, June 12, 2014

Linkers and Types and Built-ins! Oh, my!

Hello, It's Justin again. See the previous blog post for a bit of context. (Hint: we're building the compiler)

Linking to the Built-ins

The interpreter has a library of functions it uses to evaluate ParaSail code. We don't want to and can't write every ParaSail operation directly in llvm. So, it was necessary to link the generated llvm code with the interpreter's built-in functions. At first we thought the built-ins needed to be translated to llvm code to successfully link with our generated llvm. To accomplish this, we used a tool called AdaMagic to convert the Ada source code (in which the built-ins are currently written) and Ada's run time system (RTS) to C source code then used the llvm C front-end "clang" to compile the rest of the way. Clang complained with hundreds of warnings, but, it worked. We were able to print integers, floats, and characters! We couldn't do much more with the built-ins at the time because we didn't yet have type descriptors working (see below).

After all the effort and dealing with the messy generated C code, we found out that the system linker could link object files compiled with the llvm backend called "llc" together with object files produced by the GNAT Ada compiler with no problem. That was a relief, as we really didn't want to go mucking around in the generated C code more than we had to.

Types

ParaSail types are represented in the interpreter with Type_Descriptors. Basically,a type descriptor is a large record with everything you need to know about a type. Most importantly, it holds a list of operations, a "null" value to be used for the type, and descriptors for any other type it depends on (such as the component type for an array type). When we want to execute compiled ParaSail code that has types in it, we need much of this information at run time, particularly when executing code that is "generic" in some way (such as a Set where we need to get the Hash operation for Some_Hashable_Type, and perhaps its representation for "null").

Here's an example:

func Count_Nulls(Numbers : Basic_Array<optional Univ_Integer>) -> Univ_Integer is
   var Count := 0;
   var I := 1;
   while I <= Length(Numbers) loop
      if Numbers[I] is null then
         Count += 1;
      end if;
      I += 1;
   end loop;
   return Count;
end func Count_Nulls;

func main(Args : Basic_Array<Univ_String>) is
   var A : Basic_Array<optional Univ_Integer> := Create(10, null);
   A[1] := 42;
   A[5] := 0;
   Print("There are ");
   Print(Count_Nulls(A));
   Println(" nulls");
end func main;

There are quite a few places that need type information, but the easiest to show is "Numbers[I] is null". There is a specific 64 bit value that represents null for the Univ_Integer type that needs to be checked against Numbers[I].

To accomplish this, we encode the type descriptors as "linkonce" global byte arrays in llvm. Before the ParaSail main routine is invoked, these byte streams are reconstructed into Type_Descriptors for use at run time. We reconstruct these by appending our own parameterless function to the @llvm.global_ctors global variable. But, we need to call into the Ada RTS while reconstructing type descriptors, and, much to our dismay, the functions in @llvm.global_ctors are called before the Ada RTS has a chance to initialize. To solve this problem, the parameterless function in global_ctors adds a node to a todo (linked) list. Then, after the Ada RTS has been initialized, we walk the "todo" list and reconstruct the type descriptors from the streams.

A very similar method is used to reconstruct the appropriate run-time representations for Univ_Strings and Univ_Enumerations from their stream representations.

Back to the example

First of all: It compiles and runs! Here's what it prints out:

There are 8 nulls

Woohoo!
(that part was me)

You might wonder why we used a 'while' loop instead of a 'for each' loop. Well, that's because for loops use parallel stuff we haven't implemented yet. Even a forward/reverse for loop uses parallel stuff because it allows you to do things like this

for I := 1 while I < N forward loop
   block
      continue loop with I => I * 2;
   ||
      continue loop with I => I + 1;
   end block;
end loop;

While loops can be implemented with simple llvm 'br' instructions.

We can't compile aaa.psi (the ParaSail standard library) yet, so we don't have access to all the modules defined there, but Basic_Array::Create, Basic_Array::"var_indexing", and Basic_Array::"indexing" are all built-ins, so we're able to use that container.

Other Updates

Main

If a ParaSail function has the name "main," with one argument of type Basic_Array<Univ_String>, and no return value, then that is treated as a special case by the llvm code generator, and it's name will be changed to "_parasail_main_routine". A real main function  "int main(int argc, char** argv)" function exists in ParaSail's RTS that builds up the Args array (well actually not yet, but it will!)  and calls this _parasail_main_routine. Why the preceding underscore you ask? Well, it's not legal to start an identifier with an underscore in ParaSail, so, we're insuring no name collisions this way. However, one downside of this is that it is currently not possible to call main from elsewhere in your program. However, that's not unprecedented, C++ disallows it as well. This isn't set in stone, though.

Reflection

   "Reflection" is a ParaSail module that allows ParaSail code to inspect itself and retrieve the ParaSail Virtual Machine (PSVM) instructions for any routine that is in the Interpreter's memory at the same time. This is how we compile: the user loads compiler.psl (the llvm code generator written in ParaSail itself), its dependent module(s), and the ParaSail source file to be compiled. Then, the main function in compiler.psl, called "Compile," searches through all the routines in the interpreter's memory (using the reflection module) until it finds those that come from the file whose name matches the file name given as an argument to Compile.  It then proceeds to translate each of those routines from PSVM instructions into LLVM instructions, as well as build up the various tables need for reconstructing type descriptors, strings, etc.

Static Link

   The Static Link passed implicitly in each ParaSail function call is used by the called function to retrieve data from outside the function.  If the function is nested inside another function, then the static link points to the local data area of that enclosing function.  If the function is a top-level operation of some module, then the static link points to a type descriptor for some instance of that module.  In a call on a stand-alone function, the static link is null.

We're making tremendous headway on this compiler and I'm very excited about where we could reach by the end of this summer.

Tuesday, May 27, 2014

From an Interpreter to a Compiler

My name is Justin Hendrick and I'm an intern working on ParaSail for the summer. My first task is code generation.

Up until now, ParaSail has been interpreted. We’re proud to announce that we’ve begun building the code generation phase, written in ParaSail, that generates LLVM Intermediate Representation code. For information about LLVM see llvm.org and the reference manual.

So far, we can successfully compile simple functions that branch and do arithmetic on Univ_Integers and Univ_Reals.

For example, this ParaSail max function:

func Max(A : Univ_Integer; B : Univ_Integer) -> Univ_Integer is
   if A > B then
      return A;
   else
      return B;
   end if
end func Max;

Is converted to ParaSail Virtual Machine Intermediate Representation. See this post for more details on the PSVM and the calling convention in use.

 ----- ParaSail Virtual Machine Instructions ---- 
Max: Routine # 720
(A : Univ_Integer<>; B : Univ_Integer<>) -> Max : Univ_Integer<>
Uses_Queuing = FALSE
Local_Area_Length = 17
Start_Callee_Locals = 7
Code_Length = 13
(COPY_WORD_OP, Destination => (Local_Area, 5), Source => (Param_Area, 1))
(COPY_WORD_OP, Destination => (Local_Area, 6), Source => (Param_Area, 2))
(CALL_OP, Params => (Local_Area, 4), Static_Link => (Zero_Base, 5) = Univ_Integer<>, Call_Target => (Zero_Base, 59) = "=?")
(STORE_INT_LIT_OP, Destination => (Local_Area, 5), Int_Value => 4)
(CALL_OP, Params => (Local_Area, 3), Static_Link => (Zero_Base, 30) = Ordering<>, Call_Target => (Zero_Base, 19) = "to_bool")
(IF_OP, If_Source => (Local_Area, 3), If_Condition => 2, Skip_If_False => 4)
(COPY_WORD_OP, Destination => (Local_Area, 3), Source => (Param_Area, 1))
(COPY_WORD_OP, Destination => (Param_Area, 0), Source => (Local_Area, 3))
(RETURN_OP)
(SKIP_OP, Skip_Count => 3)
(COPY_WORD_OP, Destination => (Local_Area, 3), Source => (Param_Area, 2))
(COPY_WORD_OP, Destination => (Param_Area, 0), Source => (Local_Area, 3))
(RETURN_OP)

After code generation and a pass through LLVM’s optimizer this becomes

define void @Max(i64* %_Static_Link, i64* %_Param_Area) {
  %_source1 = getelementptr inbounds i64* %_Param_Area, i64 1
  %_source_val1 = load i64* %_source1, align 8
  %_source2 = getelementptr inbounds i64* %_Param_Area, i64 2
  %_source_val2 = load i64* %_source2, align 8
  %_result3 = icmp sgt i64 %_source_val1, %_source_val2
  br i1 %_result3, label %_lbl7, label %_lbl11

_lbl7:                                            ; preds = %0
  store i64 %_source_val1, i64* %_Param_Area, align 8
  ret void

_lbl11:                                           ; preds = %0
  store i64 %_source_val2, i64* %_Param_Area, align 8
  ret void
}

And from there, LLVM can generate machine code for many architectures.

The next steps are to enable calls to built-in functions, handle types other than Univ_Integer and Univ_Real, and implement Parallel Calls.

We plan to make more frequent releases this summer than in the past.

Stay tuned for more updates on the state of the compiler.

Saturday, April 5, 2014

A simple and useful iterator pattern in ParaSail

ParaSail has three basic iterator formats, with some sub-formats (unbolded "[ ... ]" indicate optional parts, unbolded "|" indicate alternatives, bold indicates reserved words or symbols that are part of the syntax itself):

    1. Set iterator:
    for I in Set [forward | reverse | concurrent] loop ...

    2. Element iterator:
    for each E of Container [forward | reverse | concurrent] loop ...
         or
    for each [K => V] of Container 
                         [forward | reverse | concurrent] loop ...

    3. Value iterator:
    for V := Initial [then Next_Value(V)] [while Predicate(V)] loop ...
         or
    for T => Root [then T.Left | T.Right] [while T not null] loop ...

These iterators can be combined into a single loop by parenthesizing them;  the loop quits as soon as any of the iterators completes:

   for (each E of Container; I in 1..100) forward loop ...
   //  Display up to 100 elements of Container

As we have written more ParaSail, one pattern that has come up multiple times (particularly when doing output) is the case where you have a special value to be used for the first iteration, and then the same value thereafter.  For example, presume we want to display each of the values of a Vector, separated by commas, enclosed in brackets.  A common way of doing this is to have an "Is_First" boolean value which is tested, and then set to #false, to decide whether to display the opening "[" or the separating ", ".  By using a combination of two iterators, this becomes simpler, with no extra conditionals:

    for (each V of Vec; Sep := "[" then ", ") forward loop
       Print(Sep | V);
    end loop
    Print("]")

Note that the value iterator has no "while" part, so it will continue forever to have the loop variable "Sep" bound to the value ", "; when combined with another iterator as it is in this case, the loop will end once the container iterator ends.

Nothing too magical here, but it is sometimes interesting to see a pattern like this that keeps cropping up.  The ParaSail reference manual has more details on these iterators.  The most recent manual may be found at http://parasail-lang.org .

Thursday, February 20, 2014

ParaSail Work Stealing speeds up

We are now releasing revision 5.2 of the ParaSail compiler and interpreter, which incorporates a new implementation of work stealing.  Links to both binaries and sources can be found at:

    http://parasail-lang.org

in the Documentation and Download section of the web page.

Here is the new entry in the release notes for revision 5.2:

* [rev 5.2] Re-implementation of work stealing to reduce contention between processors.  Each server now has a private double-ended queue (deque) of pico-threads, along with the shared triplet of deques which has existed before.  This produced another two times speedup (in addition to the rev 4.9 improvements), thereby speeding up execution by four times or more since rev 4.8. The main procedures for each language (ParaSail, Sparkel, etc.) have been refactored to share more common code. Allow a command-line flag "-servers nnn" to specify the number of heavy-weight server threads to use.  Default is 6. Command can also be given interactively, as "servers nnn". Interactively it only works to increase the number; there is no way to shut down servers that already have been activated.

Wednesday, February 19, 2014

Worklist Algorithms and Parallel Iterators

ParaSail has some very general parallel iterator constructs.  One of the more unusual involves the use of a continue statement to initiate a new iteration of an outer loop statement, as illustrated in this breadth-first search of a graph:

        var Seen : Array<Atomic<Boolean>, Indexed_By => Node_Id> :=
          [for I in 1 .. |DGraph.G| => Create(#false)];

       *Outer*
        for Next_Set => Root_Set loop  // Start with the initial set of roots
            for N in Next_Set concurrent loop  // Look at each node in set
                if not Value(Seen[N]) then
                    Set_Value(Seen[N], #true);
                    if Is_Desired_Node(DGraph.G[N]) then // Found a node
                        return N;
                    end if;
                    // Continue outer loop with successors of this node
                    continue loop Outer with Next_Set => DGraph.G[N].Succs;
                end if;
            end loop;
        end loop Outer;
        // No node found
        return null;

It has been a bit hard to explain what it means for an inner concurrent loop to continue an outer loop.   More recently, we have developed an alternative explanation.  The basic idea is to appeal to the notion of a worklist algorithm.  Various (sequential) algorithms are organized around the use of a list of work items to be processed, with the overall algorithm continuing until the worklist is empty, or, for a search, until an item of interest is found.  For example, the work-list approach to data flow analysis algorithms is described here:
Here is another worklist-based algorithm, for solving a constraint satisfaction problem using the Arc Consistency approach:
Finally, here is a breadth-first graph search algorithm, which uses a work queue instead of a work list:
So one way of explaining this kind of ParaSail iterator is as a built-in work-list-based iterator, where the loop keeps going until there are no more work items to process, and continue adds a new work-item to the worklist.  This approach makes it clear that such iterators can work even if there is only one physical processor available, and also suggests that the work items are not full threads of control, but rather light-weight representations of some work to be done.  This terminology of work items and work list or work queue also matches well with the terminology of the underlying work stealing scheduling mechanism in use.

We used to refer to such ParaSail's parallel iterators as a bag of threads, but now it seems easier to describe each iterator as having its own list of work items (i.e. a worklist) and the loop keeps going until the worklist is empty, or until we encounter a return or exit (aka break) statement.

One interesting detail at this point is that for a breadth-first search, you want the work list/queue to be processed in a FIFO manner, while for a depth-first search, you want to process the work list/queue as a stack, in a LIFO manner.  The work-stealing scheduler we use does some of both, using FIFO when stealing, and LIFO when the work-items are served by the same core that generated them.  This argues for perhaps giving more control to the programmer when using an explicit continue statement, to be able to specify FIFO or LIFO.  However, it is not entirely clear if that added complexity is worthwhile.

Monday, December 23, 2013

Concurrency vs. Parallelism

Over the past few years there seems to have been an increasing number of discussions of the difference between concurrency and parallelism.  These discussions didn't seem very convincing at first, but over time a useful distinction did begin to emerge.  So here is another attempt at trying to distinguish these two:
  • Concurrent programming constructs allow a programmer to simplify their program by using multiple (heavy-weight) threads to reflect the natural concurrency in the problem domain.  Because restructuring the program in this way can provide significant benefits in understandability, relatively heavy weight constructs are acceptable.  Performance is also not necessarily as critical, so long as it is predictable.
  • Parallel programming constructs allow a programmer to divide and conquer a problem, using multiple (light-weight) threads to work in parallel on independent parts of the problem.  Such restructuring does not necessarily simplify the program, and as such parallel programming constructs need to be relatively light weight, both syntactically and at run-time.  If the constructs introduce too much overhead, they defeat the purpose.
Given sufficiently flexible parallel programming constructs, it is not clear that you also need traditional concurrent programming constructs.  But it may be useful to provide some higher-level notion of process, which forgoes the single-threaded presumption of a thread, but brings some of the benefits from explicitly representing the natural concurrency of the problem domain within the programming language.

ParaSail is focused on providing parallel programming constructs, but concurrent objects are useful for more traditional concurrent programming approaches, particularly when coupled with explicit use of the "||" operator.  It is an interesting question whether some higher-level process-like construct might be useful.

Thursday, November 21, 2013

First release of Javallel, Parython; new release 5.1 of ParaSail and Sparkel

The ParaSail family of languages is growing, with two more additions now available for experimentation.  We have made a new release 5.1 which includes all four members of the family -- ParaSail itself, Sparkel based on the SPARK subset of Ada, Javallel based on Java, and Parython based on Python.  Binaries plus examples for these are all available in a single (large) download:

   http://bit.ly/ps51bin

As before, if you are interested in sources, visit:

   http://sparkel.org

The biggest change in ParaSail was a rewrite of the region-based storage manager (actually, this same storage manager is used for all four languages), to dramatically reduce the contention between cores/processors related to storage management.  The old implementation was slower, and nevertheless still had a number of race conditions.  This one is faster and (knock on wood) free of (at least those ;-) race conditions.

As far as how Javallel relates to Java, here are some of the key differences:
  1. Classes require a "class interface" to declare their visible operations and fields
  2. There is no notion of a "this" parameter -- all parameters must be declared
  3. There is no notion of "static" -- a method is effectively static if it doesn't have any parameters whose type matches the enclosing class; no variables are static
  4. You only say "public" once, in the class, separating the private stuff (before the word "public") from the implementation of the visible methods.
  5. Semi-colons are optional at the end of a line
  6. Parentheses are optional around the condition of an "if"
  7. "for" statements use a different syntax; e.g:
    •  for I in 1..10 [forward | reverse | parallel] { ... }
  8. "{" and "}" are mandatory for all control structures
  9. You can give a name to the result of a method via:  
    • Vector createVec(...) as Result { ... Result = blah; ... } 
    and then use that name (Result) inside as a variable whose final value is returned
  10. You have to say "T+" rather than simply "T" if you want to accept actual parameter that are of any subclass of T (aka polymorphic).  "T" by itself only allows actuals of exactly class T.
  11. Object declarations must start with "var," "final," or "ref" corresponding to variable objects, final objects, or ref objects (short-lived renames).
  12. There are no special constructors; any method that returns a value of the enclosing is effectively a constructor;  objects may be created inside a method using a tuple-like syntax "(a => 2, b => 3)" whose type is determined from context
  13. X.Foo(Y) is equivalent to Foo(X, Y)
  14. Top-level methods are permitted, to simplify creating a "main" method
  15. uses "and then" and "or else" instead of "&&" and "||"; uses "||" to introduce explicit parallelism.
  16. "synchronized" applies to classes, not to methods or blocks
  17. enumeration literals start with "#"
There are examples in javallel_examples/*.jl?, which should give a better idea of what javallel is really like.  Parython examples are in parython_examples/*.pr?