Friday, January 25, 2013

The ParaSail "import" clause -- library level visibility

We postponed implementing the "import" clause for ParaSail until just recently, which may seem somewhat surprising.  Up until now, any ParaSail module could refer to any other ParaSail module, provided it used its full expanded name (e.g. a name of the form Acme::Killer_App::Driver).  This works fine for small programs, but as programs grow larger, there is often a desire both to restrict interdependencies between modules and to have a shorthand for long module names.

So now in ParaSail, you can both restrict visibility, and create shorthands, using a (Java-like) import clause:

   import Acme::Killer_App::Driver

will both give visibility on this Driver module, and provide a shorthand of simply "Driver" to refer to it.  Alternatively, you can gain visibility on all of the modules in a hierarchy, along with shorthands for the top-level items in the hierarchy:

   import Acme::Killer_App::*

gives visibility on Acme::Killer_App::Driver, Acme::Killer_App::Utilities, etc. as well as allowing the use of the shorthand names "Driver" and "Utilities" directly in the source code that follows the import clause.
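To make the shorthand rule concrete, here is a small Python model of it.  It is only a sketch, not ParaSail and not how the compiler works.  The function name shorthands, the list of known modules, and the nested module Acme::Killer_App::Utilities::Strings are all made up for illustration.  The sketch also assumes that "top-level items in the hierarchy" means the direct children of the imported prefix.

```python
def shorthands(import_clause, known_modules):
    """Map each shorthand introduced by one import clause to a full module name.

    Hypothetical model: module names are plain strings separated by "::".
    """
    parts = import_clause.split("::")
    if parts[-1] != "*":
        # Single-module import: the last component becomes the shorthand.
        if import_clause in known_modules:
            return {parts[-1]: import_clause}
        return {}
    prefix = parts[:-1]
    result = {}
    for name in known_modules:
        name_parts = name.split("::")
        # Assumption: only direct children of the prefix get a shorthand.
        if name_parts[:-1] == prefix:
            result[name_parts[-1]] = name
    return result


known = [
    "Acme::Killer_App::Driver",
    "Acme::Killer_App::Utilities",
    "Acme::Killer_App::Utilities::Strings",  # hypothetical nested module
    "Acme::Other",                           # hypothetical unrelated module
]
print(shorthands("Acme::Killer_App::*", known))
```

Under these assumptions the wildcard import yields shorthands for "Driver" and "Utilities" only; the nested Strings module would still be visible, but only through a longer name.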

In the absence of any applicable import clauses, a default is provided based on the name of the module being compiled.  If the module being compiled has the name "A::B::C" then the default import clause, when none is supplied, is "import A::B::*".  That is, the module has visibility on the entire hierarchy of modules rooted at its parent module A::B, and can use the names of its sibling modules directly as a shorthand (e.g. can refer to A::B::D as simply "D").
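The default rule is simple enough to write down as a one-line computation.  The Python sketch below is only a model of the rule as described above; the function name default_import is hypothetical.  Returning None for a module with no parent is an assumption, since that case is not covered here.

```python
def default_import(module_name):
    """Return the default import clause for a module, per the rule above.

    Hypothetical helper: "A::B::C" yields "A::B::*".
    """
    parent, sep, _ = module_name.rpartition("::")
    if not sep:
        # Assumption: a module with no parent gets no default import.
        return None
    return parent + "::*"


print(default_import("A::B::C"))
```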

Whether or not explicit import clauses apply to a piece of ParaSail source code, the following two implicit import clauses always apply:

    import PSL::Core::*
    import PSL::Containers::*

This ensures visibility on, and shorthands for, the standard ParaSail types such as Boolean and Univ_Integer, and the standard ParaSail containers such as Vector and Map.

Within a single source file containing multiple modules, a given set of import clauses applies to the modules following the import clause(s) up until the end of the source file, or to the next set of import clauses.  Whenever a new set of import clauses is given, the new set completely replaces any earlier imports.  So there is no inheritance or accumulation of import clauses.  Only one set applies at a time (after always adding in the implicit ones for PSL::Core::* and PSL::Containers::*).
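The "replace, never accumulate" rule can be modeled by walking a file from top to bottom.  The Python sketch below is an illustration under stated assumptions, not the compiler's algorithm.  A source file is represented as a made-up list of ("import", name) and ("module", name) items.  Consecutive import clauses are assumed to form one set, and a module with no explicit set in force is assumed to get the default import based on its own name.  The module names other than Driver are hypothetical.

```python
IMPLICIT = ["PSL::Core::*", "PSL::Containers::*"]


def default_import(name):
    """Default import set for a module: its parent hierarchy, if it has a parent."""
    parent, sep, _ = name.rpartition("::")
    return [parent + "::*"] if sep else []


def applicable_imports(items):
    """Return {module name: import clauses in force for that module}."""
    result = {}
    current = []              # the explicit import set currently in force
    last_was_import = False
    for kind, name in items:
        if kind == "import":
            if not last_was_import:
                current = []  # a new set replaces the old one entirely
            current.append(name)
            last_was_import = True
        else:
            explicit = list(current) if current else default_import(name)
            result[name] = IMPLICIT + explicit
            last_was_import = False
    return result


source_file = [
    ("module", "Acme::Killer_App::Driver"),   # no explicit imports yet
    ("import", "Acme::Utilities::*"),
    ("import", "Acme::Logging::Logger"),
    ("module", "Acme::Killer_App::Main"),
    ("import", "Acme::Killer_App::Driver"),   # new set: earlier two are dropped
    ("module", "Acme::Killer_App::Tests"),
]
for module, imports in applicable_imports(source_file).items():
    print(module, imports)
```

In this model the Tests module sees only the implicit imports plus Acme::Killer_App::Driver; the two imports given before Main no longer apply to it.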

So why did we choose this particular approach for the import clause, and how does it compare to that of other languages?

Programming languages take many different approaches to controlling visibility between standalone program units defined at the top level of some source file.  C and C++ use a strictly textual #include approach, where whatever is declared in the included text is visible, and whatever is not declared in any included text file is invisible.  This approach has many well-known problems, but it has the advantage of simplicity.  However, some of that simplicity is lost when you add in the notion of "extern" declarations, single-definition rules and such.  Much of the nastiness comes from nested includes, and from multiple includes with redefinitions of macros, which allow the various includes of the same file to have different effects.

Languages like Modula, Ada, Java, etc., adopted a module-based approach to standalone unit visibility, where an import clause or a with clause or something similar is used to indicate what other modules are available for use within a given module.  The import clause specifies the name of the module or modules to be imported, using their programming language name rather than their file name.  Some implementations of such languages require that modules be stored in files whose filenames are derivable from the module name, but that is not always required, and in Java is only required for a public class or interface, but not for package-visible classes or interfaces that come along for the ride in the same file.  In module-based approaches, there is typically no transitive importing, in that each module must specify the modules on which it wants visibility -- it doesn't get visibility on a module B just because it imports a module A that imports B.  It must import B directly.

Note that Ada does allow the body of a library unit to inherit the with clauses from its spec, as well as from its parent units in the library unit hierarchy.  Hence a with clause on the spec for a package Acme would be inherited by the Acme package body, as well as the Acme.Killer_App spec and body, and so on.  For most other languages, there is no inheriting of import clauses between source files, even when one is defining the implementation for a module and the other is defining the interface for the module.

The new language Go has an interesting approach, where quoted path strings are used in the import clauses, but they are treated more like modules, in that there is no automatic transitivity, and the imported packages never need to be read more than once, even if they are indirectly imported several times.  Go was designed in part in response to the unpleasant effects of the C++ #include rules, which in large systems can dramatically slow down compilation.

Other solutions to the C/C++ #include unpleasantries exist, some based on a "#pragma once" approach which says that a given include file should only be read once, and others based on an idiomatic use of #ifndef/#endif bracketing the text of each include file.

Another dimension over which importing differs is the provision of shorthands for imported declarations.  In Java, for example, no import clause is required if the programmer is willing to refer to an externally-defined class by using its full name (such as java.io.InputStream).  A Java import clause is primarily designed to provide a shorthand for a class/interface name (by writing "import java.io.InputStream"), or for all of the classes/interfaces in a given package (by writing "import java.io.*").  Java also provides an implicit import of java.lang.*, providing shorthands for all of the most basic Java classes and interfaces without need for an explicit import clause.  Modula and Python provide a shorthand when using an import clause of the form "from M import A, B", which makes the names A and B directly visible.
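For readers who have not seen the "from M import A, B" form, here is what it looks like in Python, using the standard os.path module.  The file path in the example is made up.

```python
import os.path                           # plain import: names need the full prefix
from os.path import basename, splitext   # makes these two names directly visible

path = "reports/summary.txt"             # hypothetical path, for illustration only

full = os.path.basename(path)            # full-name reference
short = basename(path)                   # same function via the shorthand
stem, ext = splitext(short)

print(full, short, stem, ext)
```

Both spellings refer to the same function; the "from" form only adds the shorter names.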

Ada has a different approach to shorthands.  The contents of any package may be made directly nameable using a "use" clause (e.g. "use Pkg;" makes Pkg.A nameable as simply "A").

So now back to the approach adopted for ParaSail.  ParaSail follows the Java model for shorthands, in that an "import Acme::Killer_App::Driver" has the effect of making "Driver" usable directly within the code that follows.  But unlike in Java, if the programmer gives explicit import clauses, then they restrict what modules the source code may use.  If the name of a module is not covered by one of the explicit import clauses, and is not within PSL::Core::* or PSL::Containers::*, then no reference may be made to it, even with its full name.  What this means is that if an import clause is present, it ensures that no unexpected dependencies on outside modules can occur due to some buried full-name reference.
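The restriction can be stated as a simple check on full names.  The Python sketch below models only the case where explicit import clauses are present; it is an illustration, not the compiler's implementation, and the function names covers and may_reference are hypothetical.  It assumes that a wildcard clause covers every module whose full name starts with the given prefix.

```python
IMPLICIT = ("PSL::Core::*", "PSL::Containers::*")


def covers(clause, full_name):
    """True if one import clause covers the given full module name."""
    if clause.endswith("::*"):
        # "Acme::Killer_App::*" covers names starting with "Acme::Killer_App::".
        return full_name.startswith(clause[:-1])
    return full_name == clause


def may_reference(full_name, explicit_imports):
    """Model of the rule above, for source code that has explicit import clauses."""
    clauses = list(explicit_imports) + list(IMPLICIT)
    return any(covers(clause, full_name) for clause in clauses)


imports = ["Acme::Killer_App::Driver"]
print(may_reference("Acme::Killer_App::Driver", imports))     # covered explicitly
print(may_reference("Acme::Killer_App::Utilities", imports))  # not covered
print(may_reference("PSL::Containers::Vector", imports))      # covered implicitly
```

With only Driver imported, a buried reference to Acme::Killer_App::Utilities is rejected in this model even though it is spelled out in full, which is exactly the difference from Java.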

ParaSail also eschews any inheritance of import clauses from other modules, such as the parent module or the interface of the module.  Import clauses apply to a piece of source code, namely the source code that follows the import clauses, up until the next set of import clauses, or the end of the file.  This means that identifying what other modules are relevant to understanding a given piece of source code is fully circumscribed by the applicable import clauses.
