Saturday, May 15, 2010

ParaSail enumeration types

We haven't talked about enumeration types in ParaSail yet.  One challenge is how to define an enumeration type by using the normal syntax for instantiating a module, which is the normal way a type is defined in ParaSail.  Generally languages resort to having special syntax for defining an enumeration type (Haskell being an interesting exception).  But that seems somewhat unsatisfying, given that we can fit essentially all other kinds of type definitions into the model of instantiating a module.  Another challenge with enumeration types is the representation of their literal values.  In many languages, enumeration literals look like other names, and are considered equivalent to a named constant, or in some cases a parameterless function or constructor.  Overloading of enumeration literals (that is using the same name for a literal in two different enumeration types declared in the same scope) may or may not be supported.

In ParaSail we propose the following model: We define a special syntax for enumerals (enumeration literals), of the form #name (e.g. #true, #false, #red, #green). We define a universal type for enumerals, Univ_Enumeration.  We allow a type to provide conversion routines from/to Univ_Enumeration, identified by operator "from_univ" and operator "to_univ".  If a type has a from_univ operator that converts from Univ_Enumeration, then that type is effectively an enumeration type, analogous to the notion that a type that has a from_univ that converts from Univ_Integer is essentially an integer type.  As with other types, double-bracket notation applied to a value of an enumeration type, e.g. [[enumval]], is equivalent to a call on to_univ(enumval), and produces a result of Univ_Enumeration type.  An additional special Boolean "in" operator which takes one Univ_Enumeration parameter determines which enumerals are values of the type.  The notation "X in T" is equivalent to a call on the operator T::"in"(X).  The precondition on from_univ would generally be {univ in Enum_Type}.

Here is an example:
interface Enum<Enumerals : Vector<Univ_Enumeration>> is
  operator "in"(Univ_Enumeration) -> Boolean<>; 
  operator "from_univ"
    (Univ : Univ_Enumeration {Univ in Enum}) -> Enum;
  operator "to_univ"(Val : Enum) 
    -> Result: Univ_Enumeration { Result in Enum };
end interface Enum;

type Color is Enum<[#red,#green,#blue]>;
var X : Color := #red;  // implicit call on from_univ

Here we presume the class associated with the Enum interface implements "in", "from_univ", and "to_univ" by using the Enumerals vector to create a mapping from Univ_Enumeration to the appropriate value of the Enum type, presuming the enumerals map to sequential integers starting at zero. A more complex interface for creating enumeration types might take such a mapping directly, allowing the associated values to be other than the sequential integers starting at zero. The built-in type Boolean is presumed to be essentially Enum<[#false,#true]>.

So why this model?  One is that it fits nicely with the way that other types with literals are defined in ParaSail, by using from_univ/to_univ.  Secondly, it naturally allows overloading of enumeration literals, in the same way that the types of numeric literals are determined by context.  Finally, it makes a strong lexical distinction between enumeration literals and other names.  This could be considered a bug, but on balance we believe it is useful for literals to stand out from, and not conflict with, normal names of objects, types, operations, and modules.  Note that this model is reminiscent of what is done in Lisp with the quote operator applied to names, e.g. 'foo.  It also borrows from the notation in Scheme of #f and #t for the literals representing false and true.


  1. I really can't stress enough how valuable this exposition of the language design process is to someone who is only beginning to investigate the abstractions surrounding "programming."

    I do wonder why enumerations have to be integers. Would there be, say, from_univ(Univ_String s) and even for user types? I would be grateful if you could point me to a text or other relevant information (or perhaps it is just not a very important element of a language).

  2. In short: why eNUMerate and not "e-type-erate?" :)

  3. Although I illustrated the notion of a user-defined operator "from_univ"(Univ_Enumeration) by a type that used integers internally, in fact there is nothing inherent in this proposal that requires an "enumeration type" to use integers internally. You might have noticed that I made no mention of how Univ_Enumeration itself is represented at run-time. I would suspect it might best be represented as a pointer to an entry in a hash table.

    It would have been helpful if I had suggested what operations were provided on the Univ_Enumeration type. At a minimum we would want functions that would convert to/from Univ_String. In addition, we would want some kind of "Hash" function, and an "==" operator (or more precisely, a "=?" operator) to allow the easy creation of sets and mappings of Univ_Enumeration values.

    As far as pointing you to texts that discuss the design of enumeration types, nothing comes to mind immediately. The designers of Pascal (Jensen and Wirth) invented enumeration types, I believe, and most language designers since then have included them. Interestingly, however, they were left out of Wirth's newer langauge Oberon in part because Wirth did not consider them "extensible" enough, and the Java designers also omitted them for apparently the same reason. Ultimately they were added back to later versions of Java (though not to more recent versions of Oberon).

    Below is Wirth's explanation for why he omitted enumeration types. Based on my personal experience using Java before enumeration types were added back, this was a very short-sighted decision, because there is a fundamental and important distinction between an enumeration type and an integer type, which is lost in Oberon and initial Java.

    The enumeration type approach suggested for ParaSail might be considered "extensible," which perhaps means Wirth would find it more appealing.

    In any case, here is Wirth's explanation:

    "From Modula to Oberon" (Wirth 1988):

    Enumeration types appear to be a simple enough feature to be uncontroversial. However, they defy extensibility over module boundaries. Either a facility to extend given enumeration types has to be introduced, or they have to be dropped. A reason in favour of the latter, radical solution was the observation that in a growing number of programs the indiscriminate use of enumerations (and subranges) had led to a type explosion that contributed not to program clarity but rather to verbosity. In connection with import and export, enumerations give rise to the exceptional rule that the import of a type identifier also causes the (automatic) import of all associated constant identifiers. This exceptional rule defies conceptual simplicity and causes unpleasant problems for the implementor.

  4. The Dylan language also uses the # syntax for not only #f and #t, but also #key, #rest, and #next, symbols related to parameter passing.

    The enumeration literal syntax might also be useful in ParaSail for specifying attributes associated with a declaration, such as #optional, #mutable, #volatile, etc., avoiding the need to make the names for these kinds of attributes into reserved words.

  5. Thanks very much for taking the time to answer my question.

  6. One further comment on Wirth's discussion. He expresses a concern that on import of an enumeration type, one implicitly imports all of the enumeration literals ("constant identifiers"). In the ParaSail model that is not necessary, since the enumeration literals are syntactically distinct from other names, and are treated more like numeric and string literals. They don't appear in symbol tables, and don't need to be imported or exported. The "in" and "from_univ" operators determine which enumerals are usable with a given (enumeration) type.

  7. very interesting idea about enumeration types, indeed. i decided to try out how this stuff works and wrote the following test:

    func main () is
    type Color is Enum <[#red, #green, #blue]>;
    type rColor is Enum <[#black, #red]>;

    var x: Color;

    x := #black;

    end func main;

    pslc compiles it just fine. so when i run main... i get a run-time error. what's with that? i thought the compiler was supposed to detect all exceptions at compile-time and refuse to compile a program which would crash. is there a way to solve this?

    next thing. the compiler allows me to define:

    func mix (c: Color; rc: rColor) -> Color is
    return c;
    end func mix;

    func mix (c: Color; cc: Color) -> Color is
    return cc;
    end func mix;

    but if i try to compile a call to this function i get a compile error "construct is ambigious", even in clearly unambigious cases, such as mix (green, green). any suggestions?

    1. The prototype ParaSail compiler doesn't yet perform all of the compile-time checks required by the language definition, so that is why you didn't find out until run-time that #black was not a legal value of Color.

      As far as your question about calling "mix(#green, #green)" it is correct that this is considered ambiguous in ParaSail. You can write "mix(#green, Color::#green)" if you want to disambiguate,
      or "mix(#green, cc => #green)".

      It might be possible to allow the compile-time check against a precondition (like that on "from_univ" for rColor failing for #green) to factor into overload resolution, but as defined at the moment, overload resolution is complete before any preconditions checks would be performed.

      So the bottom line is that the particular enumeration literal used cannot help disambiguate which function is being called. This is somewhat analogous to saying that the particular value of an integer literal cannot be used to disambiguate which function is being called.

      Clearly there is a tradeoff here, and conceivably we could change the definition of ParaSail to start taking preconditions into account during overload resolution, though there are tricky chicken-and-egg issues involved (since the preconditions are user definable, and can be arbitrarily complex, and could conceivably call the function whose code is undergoing overload resolution at that moment).