Monday, September 16, 2013

Parallelizing Python and Java

Designing and implementing ParaSail has been a fascinating process.  Achieving parallelism and safety at the same time by eliminating rather than adding features has worked out better than we originally expected.  One of the lessons of the process has been that just a small number of key ideas are sufficient to achieve safe and easy parallelism.  Probably the biggest is the elimination of pointers, with the substitution of expandable objects.  The other big one is the elimination of direct access to global variables, instead requiring that a variable be passed as a (var or in out) parameter if a function is going to update it.  Other important ones include the elimination of parameter aliasing, and the replacement of exception handling with a combination of more pre-condition checking at compile time and a more task-friendly event handling mechanism at run time. 
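
As a rough illustration of two of these ideas in plain Python (this is not Parython syntax, which this post does not show), the sketch below updates a running total by passing it in and returning the new value rather than writing to a global, and it avoids aliasing by giving each parallel task its own disjoint slice of the data. The function names and the choice of `concurrent.futures` are assumptions made for the example only.

```python
from concurrent.futures import ThreadPoolExecutor


def add_to_total(total, values):
    """Return the updated total instead of writing to a global variable."""
    return total + sum(values)


def parallel_sum(data, num_chunks=4):
    """Sum `data` by handing each task its own copy of a disjoint slice."""
    size = max(1, (len(data) + num_chunks - 1) // num_chunks)
    # Slicing a list copies it, so no two tasks share (alias) a chunk.
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(sum, chunks))
    # Combine the partial results by threading the total through explicitly.
    total = 0
    for part in partials:
        total = add_to_total(total, [part])
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))
```

Because no task touches shared mutable state, the chunks can be processed in any order, or concurrently, without locks. In CPython the threads mainly illustrate the structure; whether they actually run faster depends on the interpreter and the workload.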

So the question now is whether some of these same key ideas can be applied to existing languages, to produce something with much the same look and feel of the original, while moving toward a much more parallel-, multicore-, human-friendly semantics.

Over the past year we have been working on a language inspired by the verifiable SPARK subset of Ada, now tentatively dubbed Sparkel (for ELegant, parallEL, SPARK).  For those interested, this experimental language now has its own website: 

    http://www.sparkel.org

Sparkel has essentially all of the power of ParaSail, but with a somewhat more SPARK-like/Ada-like look and feel.  We will be giving a talk on Sparkel at the upcoming HILT 2013 conference on High Integrity Language Technology in Pittsburgh in November:

   http://www.sigada.org/conf/hilt2013

HILT 2013 is open for registration, and it promises to be a great conference, with great talks, tutorials, and panels about model checking, model-based engineering, formal methods, SMT solvers, safe parallelism, etc. (disclaimer -- please note the name of the program chair).

At the upcoming OOPSLA/SPLASH conference, we are giving a talk about applying these same principles to Python and Java.  Super-secret code names for the results of this effort are Parython and Javallel.  The announcement of this talk is on the splashcon.org web site:

    http://splashcon.org/2013/program/tutorials-tech-talks/853-living-without-pointers-bringing-value-semantics-to-object-oriented-parallel-programming

If you are coming to OOPSLA/SPLASH, please stop by to hear the results.  We will also be adding entries over the next couple of months about some of the lessons learned in working to create a safe parallel language when starting quite explicitly from an existing language.

19 comments:

  1. There was a very simple attempt at bringing referential transparency to Java: the Tako programming language, which offered either a compiler to Java or a set of APIs and guidelines to follow in Java. It retained pointers but gave them some limited attention. The docs for Tako have all but disappeared from the Internet in the past year. Here are some links for posterity.

    http://scholar.lib.vt.edu/theses/available/etd-05252006-171343/unrestricted/Thesis_Jyotindra_Vasudeo_2006_v1.2.pdf
    http://sourceforge.net/projects/takocompiler/
    http://www.eecs.ucf.edu/~leavens/SAVCBS/2008/papers/Sudhir-Kulczycki-Vasudeo.pdf

  2. Ah, the Tako docs seem to be mirrored here: http://tako.wikidot.com/.

    1. Thanks for the pointers to Tako. It is interesting that the connection between reduced aliasing and parallel programming was not highlighted. The SPARK subset of Ada, which was designed 20 years ago or so, also disallows aliasing to simplify formal reasoning, but also did not make the leap to seeing lack of aliasing as a gateway to implicitly parallel semantics.

    2. That is very interesting. It bears repeating whenever a language feature is being evaluated. Who knows, a few years with this line of reasoning in PLT and it may change the way chips are designed.

      Javallel is very exciting. Will memory be handled by the JVM and heap or will there be region management too?

    3. Javallel will initially be built on the ParaSail VM, so it will use region-based storage management. We will ultimately have a code generator from PSVM to Java, and the mapping for memory is still TBD for that. Chances are we will support an option to choose whether to use the garbage-collected JVM heap directly, or to use region-based storage management based on "region chunks" allocated from the JVM heap.

    4. You might find this interesting for what you can expect for the performance of running your own memory manager on the JVM. Multiples less memory and multiples faster. http://mechanical-sympathy.blogspot.com/2012/10/compact-off-heap-structurestuples-in.html

    5. Thanks for the pointer. Avoiding garbage collection while also getting better locality of reference would presumably be a nice benefit of building a region-based storage manager outside (or on top of) the Java heap.

  3. What Ada compiler is needed for building the ParaSail/Sparkel parsers and VM from source? GCC or GNAT?

    1. The sources are regularly compiled with GNATPro, which is an Ada 2012 compiler, and with AdaMagic, which is an Ada 95 compiler. The GPL version of GNAT should also work. Both GNATPro and GNAT GPL are invokable using "gcc," presuming the Ada front end has been installed with your "gcc" installation.

      If you find a problem compiling it, please report it on the Sparkel forum pointed to from the sparkel.org home page.

  4. You will probably need the following information if you are going to do your own memory management and therefore proxy objects so that Java can see them. http://devblog.guidewire.com/2012/05/09/gosus-inconceivable-non-classloader-take-1/

    1. Thanks for the pointer to Gosu. Clearly a lot of interesting JVM lessons there.

  5. OK, sparkel_sources_4_8.zip builds fine on Ubuntu. Quick instructions for others:

    apt-get install build-essential pkg-config gnat libgtk2.0-dev
    unzip sparkel_sources_4_8.zip
    cd parasail_sources_4_8
    make

  6. It seems ParaSail can be built with LLVM DragonEgg [http://dragonegg.llvm.org/].

    The standard packages on Ubuntu are not ideal. The default packages are for DragonEgg 3.2 which misses recent optimizations for Ada. The Ubuntu staging packages for DragonEgg 3.3 require gcc-4.7. GCC 4.6 is said to be best for DragonEgg. DragonEgg 3.3 will have to be built from scratch against GCC 4.6. However, we can try it out now with DragonEgg 3.2.

    In addition to the steps in my post above a few more things need to happen.

    # Don't apt-get install gnat as in the steps above; install the gnat-4.6 package instead.
    apt-get install build-essential pkg-config libgtk2.0-dev gnat-4.6
    unzip sparkel_sources_4_8.zip
    cd parasail_sources_4_8
    vim Makefile
    # On the line that reads: GNATMAKE=gnatmake
    # change it to: GNATMAKE=gnatmake --GCC="gcc-de"
    # gnatmake doesn't really seem to handle GCC switches in the --GCC switch.
    # It causes an exception in make.adb line 1720.
    # So, we are going to wrap the GCC switch in a script named gcc-de.
    vim /usr/local/bin/gcc-de
    # Type the following into your text editor.
    # Adjust path to dragonegg.so as needed

    #!/bin/sh
    gcc-4.6 -fplugin=/usr/lib/gcc/x86_64-linux-gnu/4.6/plugin/dragonegg.so "$@"

    chmod 755 /usr/local/bin/gcc-de
    # Note the environment setting in front of make below.
    # If you see some files compiled with gcc-4.6 and not gcc-de then you may have skipped the following step

    GCC=gcc-4.6 make

    OK, so that seems to build just fine. I tried to verify that LLVM was in fact being used by passing the -fplugin-arg-dragonegg-emit-ir switch to gcc-4.6. That made the system assembler unhappy, which I take to mean that gcc was emitting LLVM IR, and so is some indication that LLVM was actually doing the compilation.

    1. Thanks for all this info related to LLVM and Ubuntu.
      -Tuck

  7. My goal is to get ParaSail to build via LLVM and Emscripten. This will allow the compiler and runtime to run in the browser. People will have the opportunity to develop in the ParaSail family of languages without installing anything. It will also demonstrate the efficacy of the ParaSail VM. Can region-based memory management excel in an environment like Emscripten? A Boehm-style garbage collector is available for comparison in Emscripten. ParaSail will hopefully do much better than a GC in allocation and destruction speed and memory footprint. It will also be good to see if ParaSail can scale down into hostile environments like the browser. It may shed light on how ParaSail will perform in embedded platforms.

    It occurs to me that in this type of environment, ParaSail might benefit from the option of using only one OS thread. Browsers only allow a single thread for JavaScript. Everything above that could be handled in ParaSail pico threads. Is this type of arrangement possible?

    All in all, I think ParaSail as a programming paradigm strives for simplicity in a very special way that drives simplicity into its implementation details. I am hoping that this simplicity of implementation and paradigm can provide a real portable replacement for the whole stacks of cruft that most people have to deal with. This whole talk is good, but the salient point starts here: http://vimeo.com/43380467#t=26m3s.

    1. Sounds great. I sympathize with the speaker on the video -- there is a huge amount of cruft out there, and the fact that we can build anything at all using it is impressive. I am hoping that the simplicity, efficiency, and inherent safety of ParaSail and friends can do their part to reduce the "cruft" coefficient a bit over the long haul.

  8. I forgot to mention above that DragonEgg itself needs to be installed.

    apt-get install dragonegg-4.6

  9. The first releases of Javallel and Parython are now available. See the new blog entry for details.
