Tuesday, June 8, 2010

Generalized For loops in ParaSail

As we start to dive into the lower-level control structures for ParaSail, one clearly very important structure is the for loop.  We have already suggested the basic structure:
    for I in 1..10 [forward | reverse | concurrent] loop
However, it would be nice if the for loop could also be used for more general looping structures. A number of languages have adopted a for loop where there is an initialization, a "next step", and a termination/continuation condition.  The best known is probably that of C, inherited by C++ and Java:
    for (init; test; next)...
The init and next parts are arbitrary statements (to be more precise, expressions evaluated for their side-effects), while test is treated as a condition whose truth determines whether iteration continues.

A number of LISP variants, and languages based to some degree on LISP such as Dylan, also have a similar generalized looping capability:
Scheme:
    (do ((I initial next) (J initial next) ... )
      (test) 
        loop body
    )
Dylan:
    for (I = initial then next while: test)
        loop body
    end for
In the case of the C for loop, next is an arbitrary statement, whereas for the LISP-based languages, next is the next value for the associated loop variable. This is perhaps related to the fact that in C there is special syntax for incrementing and decrementing a variable, meaning that a statement that increments the loop variable (e.g. I++) can be as concise as simply specifying the next value (e.g. I+1) in languages without this special syntax.

The LISP-based languages generally define these sorts of for/do loops in terms of tail recursion, where the next value is what would be passed as the parameter in the tail-recursive call.  It is interesting to consider whether more general kinds of recursive algorithms could be mapped to an iterative control structure.  For example, could a recursive walk of a binary tree be represented as some kind of for loop?  This is particularly interesting for a language with implicit parallelism, where it might be desirable to have multiple threads involved in walking the various branches of the tree.

The above considerations lead us to the following possible approach to a generalized for loop in ParaSail (borrowing heavily from the Dylan syntax):
    for T := Root then T.Left || T.Right 
      while T != null loop
        loop body
    end loop
This would represent a "loop" which on each iteration splits into two threads, one processing T.Left and the other T.Right. The iteration stops when all of the threads have hit a test for T != null which returns false.  The general syntax would be:
    for var := initial then next {|| next} 
      [while | until] test [concurrent] loop
        loop body
    end loop
If we were to specify concurrent loop then it would presumably imply that the daughter threads are to be created immediately once the test was found to be true, not waiting for the loop body of the parent thread to finish. This would effectively create a thread for every node in the tree, with the loop body executing concurrently for each node.

8 comments:

  1. This may be a silly question, but why do you need sequential for loops? It seems like you could cut back on a lot of syntax if you just replace all of the nonconcurrent for loop variants with sequence data structures. These would express the idea of iterating over some set of indices in a specific order, without needing separate syntax for the user to learn

    ReplyDelete
  2. You should probably think of "1..10" as a stand-in for any sort of well-defined sequence. I agree that the fundamental idea is to iterate through a sequence, where a contiguous integer sequence is probably the simplest example.

    ReplyDelete
  3. Right, but then why do you need the extra "forward" and "reverse" syntax, if the sequence itself can express the direction? I'm thinking e.g., of Python's "xrange":

    http://docs.python.org/library/functions.html#xrange

    (e.g., xrange(10,0,-1) vs. xrange(1,11)), or Clojure's "range":

    http://richhickey.github.com/clojure/clojure.core-api.html#clojure.core/range

    How does the extra syntax help you? I don't see that it even saves you effort in compiling for loops efficiently, since you still have to deduce type information from the sequence expression (e.g., "1..10") and transform the "for I in 1..10 forward loop" into the usual low-level C-like loop.

    Furthermore, what happens if the user puts "reverse" there instead of forward? Now you have to assign semantics to this degenerate and ambiguous case, and communicate this to users. There's no ambiguity when the sequence expression itself communicates the direction of iteration: "xrange(1,11)" always means 1, 2, ..., 10, and "xrange(10,0,-1)" always means 10, 9, ..., 1.

    ReplyDelete
  4. All good points.

    Readability is a very important goal for this language design, so the explicit "forward" and "reverse" seemed helpful. There was also a desire to make the explicit distinction between the ordered iteration from the unordered one, which is intended to be the default.

    Finally, if we presume we have a sequence, it seems relatively natural to either go through it in the forward direction, in the reverse direction, in a random order, or fully in parallel. All four of these approaches should be available for any given sequence, so having explicit forward/reverse/concurrent/[unordered] keywords seemed to allow a separation of concerns between the creation of the sequence, and the decision on how to iterate through the sequence.

    ReplyDelete
  5. I definitely understand the importance of distinguishing in the syntax between unordered, possibly concurrent iteration, and ordered, nonconcurrent iteration. The thing i don't understand is why you would want keywords for different nonconcurrent iteration cases -- especially since you cite examples from other languages that forgo a specialized "forward | reverse" syntax and simply indicate the "next" value. If you use "next" then you've eliminated two keywords from the language, without hindering understanding. You'll need "next" anyway for the "generalized" for loop (e.g., iterating over lists), and "next" is all you need.

    The thing is, sequences already encode part of the iteration policy (the loop bounds). Why don't you let them encode _all_ the iteration policy? Let the data control the iteration, rather than the syntax. This can even replace the "concurrent" syntax: certain sequences (e.g., integer ranges, arrays) are concurrent by default, unless you explicitly ask for a sequential range. Other sequences (e.g., lists, file streams) are forward sequential sequences by default. This means you only need one kind of "for" loop:

    for thing in sequence:
    ...
    end for

    btw not all Lisp variants use tail recursion to implement loops, because not all Lisp variants have tail recursion. (ANSI Common Lisp is the best-known example. Clojure does not deduce tail recursion in function calls, but does have a special loop syntax that uses tail recursion.)

    ANSI Common Lisp has perhaps the most rich syntax for loops of all the languages i've seen (the LOOP macro, which is a kind of domain-specific language for loops). Take a look at the LOOP macro sometime:

    http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node235.html#SECTION003000000000000000000

    This is the end point of the design you are proposing. If you allow syntax to control iteration policy, then the logical end point is to provide a complete iteration policy control syntax. There's nothing terribly wrong about that -- i've used LOOP (and its younger cousin ITERATE, which offers a more Lisp-like syntax) a lot and found it handy for common iteration patterns. However, i've found that languages that push iteration policy into the sequence datatype are just as useful, and also take me less time to learn the subtle details of syntax-based iteration policy (e.g., LOOP has a special syntax for initializing loop variables, but you have to learn the order in which it initializes them, in case there are side effects or dependencies between the loop variables).

    ReplyDelete
  6. Makes sense. It means the details of the for loop should probably wait until the details of sequence (or perhaps more generally, stream) creation are worked out. There is definitely a delicate dance here between iteration and sequences/streams. Our desire to make concurrency ubiquitous may change some of the considerations, but not necessarily.

    Luckily, none of this is set in stone at this point... ;-)

    ReplyDelete
  7. I hope i'm not being too aggressive in my comments -- language design is a bikeshed kind of activity, whereas language implementation is not ;-)

    ReplyDelete
  8. Not at all. Your comments are very much appreciated. If I really want to do this design in a nice comfortable vacuum, then I shouldn't write about it in a blog... ;-)

    ReplyDelete