If you are in the Bay Area, come on down. The talk is ESC-218, on Tuesday April 23 from 2:00-3:00PM in Salon 3: http://www.ubmdesign.com/sanjose/schedule-builder/session-id/98
NOTE: Some browsers are having trouble with this version of the paper. It was cut and pasted from a Word document, which is always dicey. A PDF version of this paper is available on the ParaSail Google Group:
http://groups.google.com/group/parasail-programming-language
Systems
Programming in the Distributed, Multicore World with Go, Rust, and ParaSail 
| 
S. Tucker Taft 
AdaCore 24 Muzzey Street, 3rd Floor Lexington, MA 02421 
taft@adacore.com | 
Abstract              
 
The distributed, multicore train
is stopping for no programmer, and especially the systems programmer will need
to be ready to hop on to the distributed parallel programming paradigm to keep
their systems running as efficiently as possible on the latest hardware
environments.  There are three new
systems programming languages that have appeared in the last few years which
are attempting to provide a safe, productive, and efficient parallel programming
capability.  Go is a new language from Google, Rust is a new language from Mozilla Research, and ParaSail is a new language from
AdaCore.  This talk will describe the
challenges these languages are trying to address, and the various similar and
differing choices that have been made to solve these challenges.
Keywords multicore, distributed, parallel
programming, systems programming language, Go language, Rust language, ParaSail
language
1. Introduction
The distributed, multicore train is stopping for no
programmer, and especially the systems programmer will need to be ready to hop
on to the distributed parallel programming paradigm to keep their systems
running as efficiently as possible on the latest hardware environments.  There are three new systems programming
languages that have appeared in the last few years which are attempting to
provide a safe, productive, and efficient parallel programming capability.  Go
is a new language from Google [1], Rust
is a new language from Mozilla [2], and ParaSail
is a new language from AdaCore [3][4].
The designers of Go,
Rust, and ParaSail all are facing a common challenge -- how to help
programmers address the new distributed and multicore architectures, without
having the complexity of programming going past that which is manageable by the
professional, yet still human, programmer. 
All programming languages evolve, and as a rule, they tend to get more
complex, not less so.  If every time a
new hardware architecture becomes important, the programming language is
enhanced to provide better support for that architecture, the language can
become totally unwieldy, even for the best programmers.  When the architectures changes radically, as
with the new massively distributed and/or multicore/manycore architectures, this
may mean that the language no longer hangs together at all, and instead has
become a federation of sublanguages, much like a house that has been added onto
repeatedly with a different style for each addition.
Because of the complexity curse associated with language
evolution, when there is a significant shift in the hardware landscape, there
is a strong temptation to start over in programming language design.  After many years of a relatively stable
programming language world, we now see a new burst of activity on the language
design front, inspired in large part by the sense that our existing mainstream
languages are either not going to be supportive enough, or that they are
becoming simply too complex in trying to support both the old and new hardware
paradigms through a series of language extensions.
2. Go from Google
Go, Rust, and ParaSail all emerged over the past few years, each with its own
approach to managing complexity while supporting parallelism.  Go
from Google is the brain child of Rob Pike and his colleagues at Google.  Rob was at Bell Labs in the early Unix and C
days, and in many ways Go inherits the C tradition of simplicity and power.
Unlike C, storage management has been taken over by the Go run-time through a
general purpose garbage collection approach, but like C, care is still needed
in other areas to ensure overall program safety.  
From the multicore perspective, Go uses goroutines for
structuring large computations into a set of smaller potentially parallel
computations.  Goroutines are easy to
create – essentially any stand-alone call on a function or a method can be
turned into a goroutine by simply prefixing it with the word “go.”  Once a
goroutine is spawned, it executes independently of the rest of the code.  A goroutine is allowed to outlive the
function that spawns it, thanks to the support of the garbage collector; local
variables of the spawning function will live as long as necessary if they are
visible to the spawned goroutine.
For goroutines to be useful, they need to
communicate their results back to the spawning routine.  This is generally done using strongly-typed channels in Go.  A channel can be passed as a parameter as
part of spawning a goroutine, and then as the goroutine performs its
computation it can send one or more
results into the channel.  Meanwhile, at
some appropriate point after spawning the goroutine, the spawner can attempt to
receive one or more values from the
channel.  A channel can be unbuffered,
providing a synchronous communication between sender and receiver, or it can
provide a buffer of a specified size,
effectively creating a message queue.  
Communication between goroutines can also go
directly through shared global variables. 
However, some sort of synchronization through channels or explicit locks
is required to ensure that the shared variables are updated and read in an
appropriate sequence. 
Here is an example Go program that counts the number
of words in a string, presuming they are separated by one or more separator characters, dividing
multi-character strings in half and passing them off to goroutines for recursive word counts:
 func Word_Count
   (s string; separators string) int = {
     const slen = len(s)
     switch slen {
       case 0: return 0 // Empty string
       case 1: 
        // single-char string
        
if strings.ContainsRune
               (separators, S[0]) {
             return 0  // A single separator
         } else {
             return 1  // A single non-separator
         }
       default:   // divide string and recurse
        
const half_len = slen/2
        
// Create two chans and two goroutines
         var left_sum = make(chan int)    
         var right_sum = make(chan int)
         go func(){left_sum <- span="" word_count="">->
              (s[0..half_len], separators)}()
         go func() {right_sum <- span="" word_count="">->
              (s[half_len..slen],
separators)}()
        
// Read partial sums
         // and adjust total if word was
divided
         if strings.ContainsRune
              (separators, s[half_len-1]) ||
           strings.ContainsRune
              (separators, s[half_len]) {
            
// No adjustment needed
             return <-left_sum right_sum="" span="">-left_sum>
         } else {// Minus 1 because word
divided
             return <-left_sum -="" 1="" right_sum="" span="">-left_sum>
         }
     }
  }
2.1 Unique Features of Go
Go has some unusual features.  Whether a declaration is exported is determined by whether or not its name begins with an
upper-case letter (as defined by Unicode); if the declaration is a package-level
declaration or is the declaration of a field or a method, then if the name
starts with an upper-case letter, the declaration is visible outside the
current package.
Every Go source file begins
with a specification of the package
it is defining (possibly only in part).  One
source file may import declarations
from another by specifying the path to the file that contains the declarations,
but within the importing code the exported declarations of the imported code
are referenced using the imported file’s package
name, which need not match that of the imported file’s filename.  Of course projects would typically establish
standard naming conventions which would align source file names and package
names somehow.
Go provides a reflection
capability, which is used, for example, to convert an object of an arbitrary
type into a human-readable representation. 
The “%v” format in Go’s version
of printf does this, allowing
arbitrarily complex structs to be written out with something as simple as:
fmt.Printf(“%v”,
my_complex_object)
Printf is implemented in Go
itself, using the “reflect” package.
There are no uninitialized
variables in Go.  If a variable is not
explicitly initialized when declared, it is initialized by default to the zero of its type, where each type has an
appropriately-defined zero, typically
either the zero numeric value or the nil
(aka “null”) pointer value, or some composite object with all components having
their appropriate zero value.
2.2 What Go Leaves Out
Because complexity was a
major concern during all three of these language designs, some of the most
important design decisions were about what to leave out of the language.  Here we
mention some of the features that Go does not
have.
Go does not permit direct cyclic
dependencies between packages.  However,
the Go interface capability permits
the construction of  recursive control or
data structures that cross packages, because an interface declared in one
package can be implemented by a type declared in another package without either
package being directly dependent on the other.
Like C, Go has no generic
template facility.  There are some builtin
type constructors, such as array, map, and chan,
which are effectively parameterized type constructors, but there is no way for
the user to create their own such type constructor.  Unlike C, there is no macro facility which
might be used to create something like a parameterized type.  Nevertheless, Go’s flexible interface and reflection capabilities allow the creation of complex data
structures that depend only on the presence of a user-provided method such as Hash and the DeepEqual
function of the reflect package.
Go does not allow
user-defined operators.  Various
operators are built in for the built-in types, such as int and float32.  Interestingly enough, Go does include
built-in complex types (complex64 and complex128) with appropriate operators.
Go does not have
exceptions.  However, functions and
methods  may return multiple results, and
often errors are represented by having a second return value called error that is non-nil on error.  Unlike in C, you cannot ignore such an extra
parameter; unless you explicitly assign it to an object of name “_”.  When things go really wrong in Go, a run-time
panic ensues, and presumably during
development, you are tossed into a debugger.
3. Rust from Mozilla Research
The modern web browser represents one of the most
complex and critical pieces of software of the internet era.  The browser is also a place where performance
is critical, and there are many opportunities for using parallelism as a web
page is “rendered.”  The Rust language arose originally as a
personal project by one of the engineers at Mozilla Research (Graydon Hoare), and
has grown now into a Mozilla-sponsored research effort.  Rust has been designed to help address the
complexity of building components of a modern browser-centric software
infrastructure, in the context of the new distributed multicore hardware
environment.
Like Go, Rust has chosen to
simplify storage management by building garbage collection into the
language.  Unlike Go, Rust has chosen to
restrict garbage collection to per-task heaps, and adopt a unique ownership policy
for data that can be exchanged between tasks. 
What this means is that data that can be shared between tasks is visible
to only one of them at a time, and only through a single pointer at a time
(hence an owning pointer).  This eliminates the possibility of data races
between tasks, and eliminates the need for a garbage collector for this global
data exchange heap.  When an owning
pointer is discarded, the storage designated by the pointer may be immediately
reclaimed – so no garbage accumulates in the global exchange heap.
Here is a Rust version of the
Word Count program, recursing on multi-character strings with subtasks
encapsulated as futures computing the
subtotals of each string slice:
fn Word_Count
   (S :
&str; Separators : &str) -> uint {
     let Len =
S.len();
     match Len
{
       0 => return 0; // Empty string
       1 =>
return    // one-char string
         if
Separators.contains(S[0]) { 0 } 
           else
{ 1 };  // 0 or 1 words
       _
=>           // Divide and recurse
         let
Half_Len = Len/2;
         let
Left_Sum = future::spawn {
           ||
Word_Count(S.slice
                
(0, Half_Len-1), Separators)};
         let
Right_Sum = future::spawn {
           ||
Word_Count(S.slice
                
(Half_Len, Len-1), Separators)};
        
// Adjust sum if a word is divided
         if
Separators.contains(S[Half_Len]) ||
          
Separators.contains(S[Half_Len+1]) { 
             //
No adjustment needed
            
return 
              Left_Sum.get()
+ Right_Sum.get();  
         } else { 
             //
Subtract one because word divided
            
return 
             
Left_Sum.get()+Right_Sum.get() – 1; 
         }
     }
  }
Rust does not have special syntax
for spawning a “task” (Rust’s equivalent of a “goroutine”) nor declaring the
equivalent of a “channel,” but instead relies on its generic template facility and a run-time library of  threading and synchronization
capabilities.  In the above example, we
illustrate the use of futures which
are essentially a combination of a task and an unbuffered channel used to
capture the result of a computation. 
There are several other mechanisms for spawning and coordinating tasks,
but they all depend on the basic tasking model, as mentioned above, where each
task has its own garbage-collected heap for task-local computation (manipulated
by what Rust calls managed pointers),
plus access via owning pointers to
data that can be shared between tasks (by sending an owning pointer in a
message).
3.1 Rust Memory Management Performance
One of the major advantages
of the Rust approach to memory
management is that garbage collection is local to a single task.  By contrast, in Go each garbage collector thread has to operate on data that is potentially
visible to all goroutines, requiring
a garbage collection algorithm that synchronizes properly with all of the
active goroutines, as well as with any other concurrent garbage collector
threads (presuming garbage collection itself needs to take advantage of
parallel processing to keep up with multithreaded garbage generation).  
In Rust, a conventional single-threaded garbage collector algorithm
is adequate, because any given garbage collector is working on a single
per-task heap.  Furthermore, storage
visible via owning pointers needs no garbage collection at all, as releasing an
owning pointer means that the associated storage can also be released
immediately.
3.2 The Costs of Safety and
Performance
One of the downsides of added safety and performance
can be added complexity.  As we see, Rust
has added safety by allowing access to sharable data only via pointers that
give exclusive access to one task at a time, and added performance because
garbage collection is single-threaded. 
However, as a result, Rust needs several kinds of pointers.  In fact, there are four kinds of pointers in
Rust, managed pointers (identified by
‘@’as a prefix on a type) for per-task garbage-collected storage, owning pointers (identified by ‘~’) for
data that is sharable between tasks, borrowed
pointers (identified by ‘&’) that can temporarily refer to either per-task
or sharable data, and raw pointers
(identified by ‘*’), analogous to typical C pointers, with no guarantees of
safety.
4. ParaSail
from AdaCore
The ParaSail language from AdaCore takes support for parallelism one
step further than Go or Rust, by treating all
expression evaluation in the language as implicitly
parallel, while also embracing full type and data-race safety.  Rather than adding complexity to accomplish
this, the explicit goal for ParaSail was to achieve safety and pervasive
parallelism by simplifying the language,
eliminating impediments to parallelism by eliminating many of the features that
make safe, implicit parallelism harder.
4.1 What ParaSail Leaves Out
Some of the features left out of ParaSail include the following:
·      
No pointers
·      
No global variables
·      
No run-time exception handling
·      
No explicit threads, no explicit locking nor
signaling
·      
No explicit heap, no garbage collection needed
·      
No parameter aliasing
So what is left?  ParaSail provides a familiar
class-and-interface-based object-oriented programming model, with mutable
objects and assignment statements.  But
ParaSail also supports a highly functional style of programming, aided by the
lack of global variables, where the only variable data visible to a function is
via parameters explicitly specified as var
parameters.   This means that the
side-effects of a function are fully captured by its parameter profile, which
together with the lack of parameter aliasing allows the compiler to verify
easily whether two computations can safely be performed in parallel.
By design, every expression
in ParaSail can be evaluated in parallel. 
If two parts of the same expression might conflict, the expression is
not a legal ParaSail expression.  The net
effect is that the compiler can choose where and when to insert parallelism
strictly based on what makes the most sense from a performance point of view.  Here, for example, is the Word Count example
in ParaSail, where parallelism is implicit in the recursive calls on Word Count,
without any explicit action by the programmer:
func
Word_Count
  (S : Univ_String; 
   Separators : 
        Countable_Set :=
[' ']) 
   -> Univ_Integer is
    case |S| of
      [0] => return 0  // Empty string
      [1] =>
        if S[1] in Separators then
            return 0   // No words
        else
            return 1   // One word
        end if
      [..] =>          // Divide and recurse
        const Half_Len := |S|/2
        const Sum := Word_Count
            (S[1 .. Half_Len], Separators) +
          Word_Count
           
(S[Half_Len <.. |S|], Separators)
        if S[Half_Len] in Separators or else 
          S[Half_Len+1] in Separators then
            return Sum  // No adjustment needed
        else
            return Sum-1   // Adjust sum
        end if
    end case
end func Word_Count 
Although there is no explicit
use of a parallel construct, the sum of the two recursive calls on Word_Count
can be evaluated in parallel, with the compiler automatically creating a picothread for each recursive call,
waiting for their completion, and then summing the results, without the
programmer having to add explicit directives.
4.2 Implicit and Explicit Parallelism
in ParaSail
Explicit parallelism may be
specified if desired in ParaSail, or the programmer can simply rely on the
compiler to insert it where it makes the most sense.  The general philosophy is that the semantics
are parallel by default, and the programmer needs to work a bit harder if they
want to force sequential semantics.  For
example, statements in ParaSail can be separated as usual with “;” (which is implicit at the end of the line
when appropriate), or by “||” if the
programmer wants to request explicit parallelism, or by “then” if the programmer wants to disallow
implicit parallelism.  By default the
compiler will evaluate two statements in parallel if there are no data
dependences between them.  
As another example of
ParaSail’s implicit and explicit parallelism, the iterations of “for I in 1..10 loop” are by default executed
in any order, including parallel if there are no data dependences between the
loop iterations, while “for I in 1..10
forward loop” or “for I in 1..10
reverse loop” may be specified to prevent parallel evaluation, and “for I in 1..10 concurrent loop” may be used
to specify that parallel evaluation is desired, and it is an error if there are
any data dependences between the iterations. 
In all these cases, the compiler will ensure that any parallel
evaluation is safe and data-race free, and will complain if there are potential
race conditions when parallel evaluation semantics are specified.
4.3 Simplicity breeds Simplicity in
ParaSail
There is somewhat of a virtuous cycle that occurs when a programming
language is simplified, in that one simplification can lead to another.  By eliminating pointers and a global heap
from ParaSail, the language can provide fully automatic storage management
without the need for a garbage collector. 
Objects in ParaSail have value
semantics meaning that assignment copies the value of the right-hand side
into the left-hand side, with no sharing of data.  A built-in move operation is provided for moving the value of the right-hand
side into the left-hand side, along with a swap
for swapping values, thereby reducing the cost of copying while still
preserving value semantics.
Every type in ParaSail has an
extra value, called null, which is
used to represent an empty or otherwise uninitialized value.  An object or component may have a null value of its type T only if it is
declared to be “optional T”.  Optional components may be used to implement
trees and linked lists, without the use of explicit pointers, and without the
potential sharing issues associated with pointers, even if behind the scenes
the compiler uses pointers to implement optional objects or components.  The availability of optional components
effectively allows an object to grow
and shrink, meaning that dynamic
structures like hash tables, and higher level notions such as maps and sets,
can be implemented in ParaSail without any explicit pointers, with the
advantages of purely value semantics.
The lack of explicit pointers
means that all objects in ParaSail effectively live on the stack, even though they may still grow and shrink.  Each scope has its own region, effectively a local heap that expands and contracts to hold
the objects associated with the scope. 
No garbage collector is needed because when an object goes out of scope,
or an object or one of its component is set back to null, the associated storage may be immediately reclaimed, much
like storage designated by owning
pointers in Rust.  By simplifying the
type model, the storage management in ParaSail is dramatically simpler and of
higher performance, with no complex parallel garbage collection algorithm
required.
5. Implementation
Status and Conclusions
The programming language
design world has been rejuvenated by the new challenges of distributed and
multicore architectures.  Three new
programming languages designed for building industrial-strength systems have
emerged, Go, Rust, and ParaSail.  Each of these languages tries to make
parallel programming simpler and safer, while still providing the level of
power and performance needed for the most critical systems development
tasks.  
From an implementation status
point of view, Go is the most stable of these three new languages, with two compilers
available, and a number of systems built with Go now being deployed.  Rust and ParaSail are still under
development, but both are available for trial use, with Rust having early on
achieved the compiler boot strap milestone,
where the Rust compiler is itself implemented in Rust.  All three languages provide opportunities for
the professional programmer to expand their horizons, and see what sort of new
paradigms and idioms will become more important as we leave behind the era of
simple sequential processing, and enter the era of massively parallel, widely
distributed computing.
References        
[2]    The
Rust Programming Language, Mozilla Research, http://www.rust-lang.org
[3]    ParaSail:
Less is More with Multicore, S. Tucker Taft, http://www.embedded.com/design/other/4375616/ParaSail--Less-is-more-with-multicore
(retrieved 4/1/2013).
[4]    Designing
ParaSail: A New Programming Language, S. Tucker Taft, http://parasail-programming-language.blogspot.com

