Sunday, November 4, 2012

Special syntax for Map/Reduce in ParaSail?

The Map/Reduce operation has become widely known recently.  A traditional way of implementing a Map/Reduce operation is to pass to the map_reduce operation, a container such as a set, vector, or list, a mapping function which is applied to each element, and a reducing function which is used to combine two mapped elements into one result.  The mapping step is relatively straightforward, but the reducing step can come in various forms, as it depends on whether the reducing function is associative, commutative, symmetric, etc.  Most functional languages have at least two standard reducing mechanisms, often called foldl and foldr.  These differ in the way the elements of the sequence are combined, either starting at the left or starting at the right.  Typically there is also an initial value which most often corresponds to the identity element for the reducing function, and this becomes the result if the sequence is empty.  If there is no convenient identity element (such as for maximum), versions of foldl and foldr are provided that use the use the first or last element as the initial value (called foldl1 and foldr1 in Haskell), but then only work on non-empty sequences.  Here is the wikipedia entry on the various forms of fold/reduce:

    http://en.wikipedia.org/wiki/Fold_%28higher-order_function%29

Because ParaSail supports passing functions as parameters to other functions, Map/Reduce in all its forms can be supported in the "traditional" way.  The question is whether some alternative syntax might be useful as well, analogous to the special syntax provided for quantified expressions (which essentially use and or or as the reducing function over a set of boolean values), and for container comprehensions (which essentially use "|" as a reducing operation to combine values into a new container).  As examples, here is a ParaSail quantified expression that asserts an array is sorted:

  (for all I in 1 ..< Length(Vec) => Vec[I] <= Vec[I+1])

and here is a container comprehension that creates a vector of squares:

   const Squares : Vector<Integer> := [for I in 1..10 => I**2]

Quantified expressions and container comprehensions could also be implemented by passing functions as parameters, but clearly the special syntax makes it somewhat more convenient, and arguably easier to understand.  So the question is: is there an analogous situation for the general map/reduce operation?  Would a special syntax make it more convenient and/or easier to understand?

Here is an example using possible syntax for a general map/reduce:

   (for I in 1 .. Length(Vec) => <0> + Vec[I]**2)

This is to be interpreted as a map/reduce operation which produces the sum of the squares of the values of the Vec.  The initial result (in the case of an empty vector) is given inside the <...>, and then after each element is combined with the ongoing result, the new result replaces the <...> for the next element.  Here would be a dot product of two vectors (first we assert they are of the same length):

  {Length(Vec1) == Length(Vec2)}
 (for I in 1 .. Length(Vec1) => <0> + (Vec1[I] * Vec2[I]))

Here is a computation of factorial(N):

   (for I in 1 .. N => <1> * I)

If we wanted to compute the maximum of the elements of some function evaluated over a range of inputs, and we wanted to use the first value as the initial result (analogous to foldl1 in Haskell) we could write (after first asserting we have a non-empty set of inputs):

    {N > 0}
   (for I in 1 <.. N => Max(<F(1)>, F(I)))

This general syntax could be used with any ParaSail iterator, so if we wanted to count the number of symbols in a symbol table whose names are shorter than 3 letters, we could write:

   (for each S of Sym_Table => <0> + (Length(S.Name) < 3? 1: 0))

Here we might better use a filter annotation on the iterator (unfortunately, iterator filters are not yet implemented in the ParaSail compiler):

   (for each S of Sym_Table {Length(S.Name) < 3} => <0> + 1)
   
An explicit reverse or forward could be used with the iterator if we want to distinguish between foldr and foldl for reducing ordered sequences of values.  Otherwise, the default is an unordered parallel reduction.

Note that a quantified expression can be re-expressed in this more general map/reduce syntax.  For example, the above assertion of Vec being sorted could be re-expressed as:

   (for I in 1 ..< Length(Vec) => <#true> and then Vec[I] <= Vec[I+1])

Similarly, a container comprehension could be re-expressed using this general syntax.  For example, the table of squares could be re-expressed as:

   (for I in 1 .. 10 forward => <[]> | [I**2])

Overall this special syntax seems to provide a nice alternative for map/reduce.  In the traditional approach, we would have to "bundle" up the mapping and reducing operations into functions, whereas with this special syntax, we can write the operations directly in a relatively "normal" looking expression syntax.

We would appreciate comments on this proposed special map/reduce syntax.

3 comments:

  1. Hi, Great work man. I am interestes in learning programming language. What should i do?
    www.leonardo-hlc.it/en

    ReplyDelete
  2. In the foldl1-like case, the map expression needs to be repeated for the initial element (F(1) in the example) and the remaining elements (F(I)). In the case where that expression is very short, as in the example, that's fine. But that's the case where a specialized map/reduce syntax is least beneficial, isn't it (because you already have the function, F, that could be passed to a mapreduce defined in functional style)?

    I like this for the foldl case, though.

    -- David-Sarah Hopwood

    ReplyDelete
    Replies
    1. Another option for the "foldl1" case was to use a special notation such as "<>" to indicate where the first element should go, with subsequent elements going the normal place. Hence:

         (for I in 1 .. N => Max(<>, F(I)))

      This would pick one element X of the range of values of I and use "F(X)" as the initial value for "<>", and omit using X in the "F(I)" location of the formula. This clearly only works if the operation is symmetric (that is, takes the same type for both parameters), which is not required for the more general syntax.

      This "foldl1" syntax seems a bit obscure, but I suppose this whole idea of a special map/reduce syntax is a bit obscure when you first encounter it!

      Delete