Thursday, August 30, 2012

Allowing indenting to be significant a la Python?

We are toying with the idea of having a "mode" in the ParaSail compiler in which indenting is significant, as in Python.  This would allow the programmer to omit the "end XYZ" part of a construct, provided that they start the next statement at the appropriate level of indentation.  This would also allow the programmer to omit the ";" terminator of a statement or declaration so long as continuation lines are indented and new statements/declarations are not indented relative to the prior statement/declaration.
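
As a rough illustration of the idea, here is a minimal sketch in Python of how a lexer might turn indentation into explicit block markers, so that a dedent plays the role of "end XYZ". This is not the ParaSail compiler's lexer: the function name `block_tokens` and the INDENT/DEDENT markers are assumptions made for illustration, and the sketch assumes spaces-only indentation.

```python
def block_tokens(source: str) -> list[str]:
    """Turn indentation changes into explicit INDENT/DEDENT markers."""
    tokens: list[str] = []
    stack = [0]  # indentation widths of the currently open blocks
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines carry no structure
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)  # deeper than before: a new block opens
            tokens.append("INDENT")
        while width < stack[-1]:
            stack.pop()
            tokens.append("DEDENT")  # plays the role of "end XYZ"
        if width != stack[-1]:
            raise SyntaxError(f"inconsistent dedent: {raw!r}")
        tokens.append(raw.strip())
    tokens.extend("DEDENT" for _ in stack[1:])  # close blocks still open at EOF
    return tokens
```

A parser consuming these tokens could treat each DEDENT exactly as it treats an explicit "end", which is one way both styles might share a single grammar.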

Because we would like to support both styles, we are considering having the "indentation is significant" flag be a function of the filename extension on the file.  If the filename ends ".psi" or ".psl" then indentation is not significant, and ";" and "end XYZ" are needed as usual.  If the filename ends ".psn" (for ParaSail, iNdented), then indentation is significant, and ";" and "end XYZ" are optional provided appropriate indenting is performed.
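
The extension rule could be sketched as follows in Python. The three extensions come from the paragraph above; the function name, the case-insensitive match, and the error on unknown extensions are assumptions, not a description of what the compiler does.

```python
from pathlib import Path

# Extensions named in the post; the lookup table itself is only a sketch.
INDENT_SIGNIFICANT = {".psi": False, ".psl": False, ".psn": True}


def indentation_is_significant(filename: str) -> bool:
    """Decide the lexing mode from the filename extension."""
    suffix = Path(filename).suffix.lower()  # assumption: case-insensitive match
    if suffix not in INDENT_SIGNIFICANT:
        raise ValueError(f"unrecognized extension: {suffix!r}")
    return INDENT_SIGNIFICANT[suffix]
```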

This is just an experiment so far, but any comments would be appreciated.  This is almost certainly well into the religious zone of language design, so an extra effort toward courtesy in any comments would be very much appreciated...

19 comments:

  1. Seems like a good idea, to attract the Python crowd and the Ada crowd both.

    I'd want a pretty printer that put in all the missing syntax, to convert a .psn to a .psl. And I guess a Python fan would want the opposite conversion.

    It would be interesting to see that written in Parasail (as an example of a real text processing task), although I don't see much opportunity for parallelism.

    Replies
    1. Rather than a stand-alone translator, it would probably be simpler for the compiler to output the desired format, since it has all the semantic info required.

      Which (sort of) brings up the topic of introspection; I don't recall if Parasail supports that, either in the Java style, or the Ada ASIS style.

    2. We are working on an "introspection"/"reflection" API for ParaSail. Our first use of it will be to build a translator from ParaSail Virtual Machine Instructions to a compilable language like C or Ada.

  2. Please ignore what follows if it's nonsense, as it may well be. It's certainly underinformed.

    In its current form, Parasail seems to be aiming for a fairly limited niche. To a greater extent than any popular language except Ada, it expects programmers to understand their problem and solution pretty well before they begin coding, and to write quite a bit that the compiler could infer or that syntactic sugar could abbreviate. Those who, unused to Ada, might prefer to hack up something half-baked to see what it does, or who believe that terseness in the right places is an aid to readability, aren't going to find it as inviting as Ada programmers undoubtedly will. Parasail's major attraction for Ada programmers is the promise of better performance on parallel hardware, but on heterogeneous cores, or complex locality hierarchies, Parasail lacks the tuneability of X10 or Chapel (though its memory management should be an important advantage over them in some applications). I wonder how many Ada users are worried enough about performance on single-socket homogeneous manycore machines to switch languages, but confident that their programs will never have to perform competitively on other kinds of machine?

    All that said, providing a second skin with significant indentation seems an excellent step towards expanding Parasail's reach beyond Ada programmers; perhaps there are other concessions to the sketcher that could be made in it too?

  3. Thanks for your comments. It sounds like having an indentation-significant version of ParaSail is not totally crazy, and might be an interesting experiment.

    Generally I am in favor of spending more time writing to make the source code more readable. But there is some argument that the Python style can be pretty readable, since you can rely on indenting to be "correct," whereas in other languages indenting can be misleading if not done carefully, and the compiler won't complain. (Of course some compilers/GUIs will automatically check that your indenting is "reasonable" even in a language where it isn't significant.)

    One could also argue that the Python approach can be readable even when reading back to front, because there is nothing "in the way" when quickly glancing up to find where the current construct starts.

    Perhaps these are just rationalizations, but somehow the Python approach seems cleaner than using something as non-specific as "}" or "end;" to end every construct, no matter what it is.

  4. Thanks very much for your courteous responses.

    Where any kind of non-whitespace terminators are used, I do think it's worth requiring that indentation and linebreaking match up with those terminators (except of course in lambdas, where Python's whitespace-only policy incidentally causes some pain).

  5. Recently I went through the dreaded "Python white space experience" with a student of mine (engineering, not CS).
    I passed her some small Python tools, which travelled from XP+Notepad++ to Linux+Emacs to Linux+Gedit, and of course during the transit the white space problem (tabs vs. spaces, tab as 4 or 8 spaces) occurred. She had trouble figuring out what was going wrong, because in gedit things looked OK and only in certain sub-cases would the error caused by wrong indentation show up. As her text view looked OK, it was hard for her to find the hidden logic problem caused by the "wrong" tab-induced indentation.

    Therefore I would recommend steering clear of a text representation that assumes that the code is white space formatted properly across different editors, platforms and "Python Modes". To non-experts it is very confusing during debugging to have to look for white space errors too. I don't know if you have a good idea how to avoid this in your lexer/parser?

    Personally I see two approaches worth investigating.

    1) Standard text/editor based style like Ada, to be compatible with current tools, editors, and IDEs (VS, Eclipse) and to allow easy reuse.
    2) Instead of trying "colored ASCII art with or without white space", why not go much further, to a graphical view?
    To me it is a great pity that the main idea current IDEs have for using the additional screen space is to display more linear lists of "stuff". Back in 1995 I found the DDD debugger really interesting because it allowed exploring data structures graphically in 2D. Why not break the "good old" monospaced text file into smaller pieces and use graphics to create a denser and more meaningful view of the code? If the graphical view is done well, most users may not even want to see the traditional text view -- and no, Simulink and LabVIEW do not serve as good examples to copy.

    Ralph

    Replies
    1. Thanks for your comment. It is interesting to learn of the "white space" problem in Python. The shift from tab=8 in the Unix world to tab=4 in the PC world certainly created a lot of heartache. I suppose it goes with the shift from "/" to "\" in filenames. Fun all around.

      I will have to think how to avoid this being a problem. It does seem like some cleverness in the lexer could reduce the problem. Of course the simplest solution is to make tabs illegal in the source file whenever indentation is significant.
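
The "make tabs illegal" option is simple enough to sketch in Python. The function name and the choice to report line numbers rather than raise an error are assumptions for illustration only.

```python
def lines_with_tabs(source: str) -> list[int]:
    """Return the 1-based numbers of all lines that contain a tab character."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if "\t" in line
    ]
```

A compiler front end could run such a check only when indentation is significant and reject the file if the list is non-empty.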

      I agree that IDEs can help here, but it is hard to require that everyone use the same IDE. The language Fortress pursued an elegant approach here, but did face some resistance because of the variety of text editors that programmers continue to prefer.

    2. Well, for me, having bad indentation in the text view is one thing (ugly looks can be solved by the editor's pretty printing/indentation mode), but having the meaning/structure of code changed due to the "white space issues" of a specific editor feels really wrong, esp. when one has some experience in Ada, which allows one to be very exact -- e.g. representation clauses, pragmas, …

      I agree that it is important not to chain yourself to a specific IDE, or to any IDE. If you want people to start using a new language, they shouldn't be required to install a 100-1000 MB monster just to type in "print 'Hello World'". (That's why I still enjoy Rebol or looking back over Oberon stuff.)

      However, I still ask the question: "is the white space thing really worth the/your effort?" It doesn't enhance the language itself, just the visuals in text view, and it makes your lexer/parser more complicated. Given the overall aim of Parasail, your (first) users are experts anyway, who are not deterred by "{}" or "if"/"endif". Since you are doing a radical language experiment with Parasail, why divert some effort to solving "text view" problems through clever lexer/parser tricks? Forcing everybody to a 'GNAT style option' to allow white space mode use may also be counterproductive -- "YES Sir, we took away those ugly delimiters but now YOU better watch your white space column STEPS" hmmm (;-).

      I never looked in detail at Fortress because it appeared to be too little progress compared to the status quo, like Java to C++ in the 90's. There are a couple of interesting things floating around the web trying to get away from the VS / Eclipse IDEs, like "Light Table" and Bret Victor's and Alan Kay's stuff, which aim to simplify/unify instead of adding more, although clearly in different areas. Maybe the graphical view is better deferred to later, or to a cooperation with somebody with clear visual thinking (beyond text views) like E. Tufte.

      Best Regards

      Ralph

    3. It would certainly be interesting to see an IDE designed by E. Tufte!

  6. Tab characters in source code are definitely evil. You could make forbidding them a style option (as in the GNAT Ada compiler).

    Replies
    1. Tabs weren't evil when everyone agreed on how they were expanded. Once that changed, they became worse than useless -- they moved into the "evil" category. I'd like to peer into the brain of the "clever" person who initially decided tabs should expand into 4 spaces instead of 8.

    2. Originally, tabs were little bits of metal that you inserted onto a bar in a mechanical typewriter. In OpenOffice, you can set tabs at any position horizontally; emulating that typewriter.

      So it's the people that decided tab should be 8 that got it wrong. I think that first happened with line printers (Wikipedia is no help here)? Of course, 4 is no better.

      Emacs (and I assume vim) provides a way for text files to define the tab stops it assumes; that should be the standard. Parasail could adopt that, if it wants to support tabs. Lacking that, we should simply not use tabs.

  7. Interesting ...

    By the way, F# is the only language I know that also has both an indentation-aware and an indentation-ignoring mode.

    In indentation-ignoring mode, you basically have OCaml syntax, and you write

    let x = 1 in
    let y = 2 in
    f x y

    In indentation-aware mode, you can drop the "in":

    let x = 1
    let y = 2
    f x y

    See also:

    http://msdn.microsoft.com/en-us/library/dd233191.aspx
    http://www.tryfsharp.org/Tutorials/QuickLanguageOverview/Section3.html

  8. Here is a detailed question: In Python, continuation lines are identified by using a '\' as the last character of the line. That seems pretty unfriendly. I see a couple of alternatives:

    1) Indenting is used to indicate the first continuation line; a statement continues until you return to the initial indent level of the beginning of the statement;
    2) Indenting is required as above, but in addition, a line that is to be continued must end with a symbol or reserved word that requires a following token, such as "(", ",", "+", "and", "mod", etc.

    I somewhat prefer the latter as it makes it very obvious that the change in indent level is not just a typographical error.

    Replies
    1. Changed my mind -- after more thought it seems that indenting by itself should be enough to indicate a continuation line. That means that any reasonably-indented piece of "normal" ParaSail should still be legal if you delete ';'s that appear at the end of each line.
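
A minimal Python sketch of the "indenting alone marks a continuation" rule: a statement keeps absorbing lines until the indentation returns to the level where the statement began. The function name is hypothetical, and the sketch deliberately ignores nested blocks, which a real parser would have to tell apart from continuation lines.

```python
def join_continuations(lines: list[str]) -> list[str]:
    """Merge each more-indented line into the statement that precedes it."""
    statements: list[str] = []
    base = 0  # indentation of the statement currently being built
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        width = len(raw) - len(raw.lstrip(" "))
        if statements and width > base:
            statements[-1] += " " + raw.strip()  # continuation line
        else:
            statements.append(raw.strip())  # back at the starting level
            base = width
    return statements
```

Under this rule no trailing '\' or ';' is needed, because the return to the starting indentation is what ends the statement.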

  9. . I liked the idea of Python's indentation,
    but I thought trying to avoid semi-colons
    was a bit too "(clean) .
    . as for dealing with modes,
    I thought xml already solved that;
    of course, you already mentioned
    not wanting to depend on an IDE;
    but, isn't there an existing, open, usable,
    MathML editor ?
    http://en.wikipedia.org/wiki/MathML
    . I just now searched, but haven't tried these yet ...
    http://www.w3.org/Math/Software/mathml_software_cat_editors.html
    http://code.google.com/p/mathmleditor/

    Replies
    1. Thanks for the links. Having to use a particular editor like the MathML editor to get WYSIWYG was apparently an impediment to the uptake of the Fortress language. My hope is that we can make ParaSail code easy and pleasant to read without making it overly hard to write.

    2. . sorry if it seemed I wasn't listening;
      I was hoping that those wanting their own editor
      would stay with the ascii mode
      rather than the html mode;
      however, one advantage your system has
      is the same as python provides:
      you're getting both the cleaner syntax
      plus you're working happily in your favorite editor
      (assuming the favorite editor is python-friendly).
      . how did you deal with editors that use tabs?
      here: Tuck September 6, 2012 5:36 PM
      you suggested the compiler should make tabs illegal;
      it sounds like those situations would appreciate
      something the python community already has:
      a module that automatically converts tabs
      in a hopefully intelligent way .
      (heikkitoivonen.net blogged a way
      ... unfortunately, that way assumes
      you know how many spaces a tab is representing;
      nevertheless, if the user could answer the tab size,
      the given method can adapt to that input .
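
A sketch in Python of converting tabs once the user has supplied the tab size. This is not the method from the blog mentioned above; it is only an illustration built on Python's standard `str.expandtabs`, and the function name is an assumption. Only leading whitespace is converted, so tabs inside string literals are left alone.

```python
def expand_leading_tabs(source: str, tab_size: int) -> str:
    """Replace tabs in each line's leading whitespace with spaces."""
    out = []
    for line in source.splitlines(keepends=True):
        body = line.lstrip(" \t")
        leading = line[: len(line) - len(body)]
        # expandtabs pads each tab up to the next multiple of tab_size
        out.append(leading.expandtabs(tab_size) + body)
    return "".join(out)
```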
