Monday, February 8, 2010

Physical Units in ParaSail

One of the tests of a type system is whether it can support the notion of physical units, such as meters, kilograms, seconds, etc. being associated with particular numeric values, and having static checking for unit consistency, such as checking that meters + meters = meters, and meters / seconds = meters-per-second.

ParaSail can support the association of physical units with numeric types by adding interface parameters which represent the name of the physical unit (e.g. "centimeters"), its shorthand (e.g. "cm"), and the relevant power of the various possible base units (SI, the International System of Units, identifies 7 different base units), and then a scale factor relative to some base unit (if meters are the base unit, then centimeters would have a scale of 0.01).

For a numeric type with an associated unit, the "simple" predefined operators would be "+", "-", and "=?".  These are the ones that expect the same unit for both operands.  The "simple" multiplication and division operators would require one of the operands to be unitless (the divisor for division must be unitless, either multiplicand can be unitless), producing a result of the same unit as the non-unitless operand.  All of the simple operations can be declared as part of the original interface.

The more "complex" multiplication and division operators would support operands with different units, producing a result with yet a third unit.  So how do these operators come into being?  Where are they declared?  One possibility is to have an interface which is designed to take two existing types each with some set of units and produce a new type that corresponds to the product or the quotient of the constituent types.  However, you could end up producing multiple types that are effectively equivalent, depending on the order of the operands in the product, or on the particular sequence or products and quotients used to produce the final type.  What you really want to declare is a generic operator, which produces a particular result type, and which has a precondition that checks that the units of the operands when combined produce the units of the result type.

It seems relatively straightforward to define a nested interface which takes two other types as interface parameters, and declares a "*" operator that has the appropriate precondition.  In fact the precondition could really be part of the nested interface, rather than the operator it defines.  The slightly odd thing is that the nested interface, once instantiated, is merely used for the operator it defines.  There is never an object of the interface type created.  Furthermore, we would like it to be instantiated implicitly (a la C++ generic functions), with the operator being directly "visible"/usable without having to write something like:
Product::"*"(X, Y)
We simply want to write "X * Y" and end up with the right thing.   The simplest approach seems to simply allow functions and procedures to have some of their parameters be of "generic" types:
interface Float_With_Units
    <Base is Float<>;
     Name : Univ_String;
     Short_Hand : Univ_String;
     Unit_Dimensions : Array<
       Element_Type => Univ_Real,
       Index_Type => Dimension_Enum> 
       := (others => 0.0);
     Scale : Univ_Real>
    operator "from_univ"(Value : Univ_Real)
      {Value in Base::First*Scale .. Base::Last*Scale} 
      -> Float_With_Units;
    operator "to_univ"(Value : Float_With_Units)
      -> Result : Univ_Real
      {Result in Base::First*Scale .. Base::Last*Scale};
    operator "+"(Left, Right : Float_With_Units) 
      -> Result : Float_With_Units
      {[[Result]] == [[Left]] + [[Right]]};
    operator "-"(Left, Right : Float_With_Units) 
      -> Float_With_Units;
    operator "=?"(Left, Right : Float_With_Units) 
      -> Ordering;
    operator "*"(Left : Float_With_Units; 
      Right : Right_Type is Float_With_Units<>)
      -> Result : Result_Type is Float_With_Units
        <Unit_Dimensions => 
          Unit_Dimensions + Right_Type.Unit_Dimensions>
      {[[Result]] == [[Left]] * [[Right]]};
    operator "/"(Left : Left_Type is ...

  end Float_With_Units;
This presumes that "+" is defined for arrays. We might need to have a distinct array interface that defines +/-/=? component-wise.

This approach of having a parameter type of a function/operator be "generic" is similar to the way generic functions work in C++.  It would seem to work pretty well, so long as at least one of the parameters is not of a generic type, so the name resolution can take place based on the type of that parameter. For example, if we write:
    Z : T := X * Y;
we could use the type of X to "find" the "*" operator, and then see whether the types of Y and Z satisfy the other requirements.  We couldn't easily "find" the operator if both operands and the result type were all "generic."


  1. How about inheritance for units of measurement? Could a function take a distance argument and be able to compute a value for anything that inheirts from a distance unit (m, ft., rods, etc.) Or are you purposefully sticking to SI units? To me it seems that would cut out certain applications in domains with specialized units.

    Could you elaborate on why the units resulting from 8m * 1sec = 8 m*sec and 1sec * 8m = 8 sec*m wouldn't be trivial to equate? Maybe I am misunderstanding the part about worrying about extra, equivalent compound units, though...

  2. The model here is that one unit for a given kind of physical quantity (e.g. "m") is the primary unit, and the others are defined in terms of it using a Scale factor (e.g. "cm" has 0.01 Scale). Which kind of physical quantity a type represents is determined by the values in the Unit_Dimensions array. The primary units should be set up so they are related to one another using Scale factors of 1.0, so no scaling is needed as part of multiplication for the primary units (e.g. "m", "kg", "s", "N", "J", etc.).

    The SI system need not be used. Any system, such as CGS, which has the same scale of 1.0 connecting the different units, would work with the given scheme (e.g. "cm", "g", "s", "dyn", "erg", etc.).

    As far as equivalence, for instances of a generic interface, it is based on having the same actual interface parameters. If we supported multiplication by having a separate interface which provided the "*" operator and took two type parameters, we would need a special case to say that the interface defines only "commutative" operators, and so Multiplying<A,B> is considered equivalent to Multiplying<B,A>.

    We avoid this equivalence issue in the above approach because we don't have a separate interface to instantiate to get multiplying operators. Instead, the multiplying operators themselves are "generic" and instantiated implicitly as needed to match the context where the operator is used.

  3. The difficulty with this design of quantities is that you can add up things that don't work. For example, length + width + height is type-correct but not meaningful, except that the limit on the size of FedEx packages is set that way. A stronger example is adding energy and torque, which are both newton * meters.

  4. Good point. In some cases, units are not enough. You would need to add some additional semantics. Be that as it may, units by themselves can provide a useful "consistency check" on a computation, even if they don't embody all of the semantics that are possible.

    Most programming languages don't support units, much less the additional semantics you are suggesting. The late John Knight, UVA professor, talked about "real-world" types, which I think begins to include some of the kinds of semantics you mention. Here is a paper of his from 2015: