Is "type" a fundamental intrinsic property of values?

I've recently been doing some experiments with Tcl (which is untyped, or mono-typed: everything is a string), in particular working on a package (called TOOT) which allows types to be created (in a vaguely OO fashion) and then associated with values. The method for associating a type with a value is to create a new value consisting of a (type, value) tuple, represented in Tcl as a two-element list: the first element is a command name representing the type, the second is the original value.

It struck me that the key concept here is that all values stored in a computer have no intrinsic type. That is, you can never actually store a number in a computer, only a representation of a number, for example. When you associate a type with a representation, you create an interpretation. Performing some operation on a value implies interpreting that value in some way, but different operations can apply different interpretations (the "type" is an extrinsic property of the value).

The package I am developing allows creating functions which encapsulate some interpretation of a value, much like classes in an OO language. This type-function can then be curried with a particular value to create a default interpretation of that value, which can then be passed around. Other operations can then either use this default interpretation, or ignore it and apply some other interpretation to the underlying representation.
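
Very roughly, the same idea can be sketched in Haskell rather than Tcl (a sketch only; the names here are illustrative and not part of TOOT): every value is just a representation, an interpretation is a function from representations to meanings, and a "typed" value merely bundles a representation with a default interpretation, which other code is free to ignore.

type Rep = String                  -- all values are just string representations
type Interp a = Rep -> a           -- an interpretation of a representation

data Typed a = Typed { defaultInterp :: Interp a, rep :: Rep }

-- "Currying the type-function with a value": attach a default interpretation.
asInt :: Rep -> Typed Int
asInt = Typed read

-- Use the default interpretation...
force :: Typed a -> a
force (Typed f r) = f r

-- ...or ignore it and apply some other interpretation to the raw representation.
reinterpret :: Interp b -> Typed a -> b
reinterpret g (Typed _ r) = g r

demo :: (Int, Int)
demo = (force v, reinterpret length v)   -- "42" as a number vs. its length as text
  where v = asInt "42"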

This seems to be quite a different view from the one embodied in most mainstream languages I've come across, where values have a fundamental, intrinsic type. This isn't my area at all, so I thought I'd ask here for expert opinion. Is this a valid viewpoint? Is it an important distinction (between intrinsic and extrinsic typing)? Pointers to relevant theory, etc would also be greatly appreciated, and I hope this is on-topic.

Yes, it's valid

Very loosely speaking, what you're discovering is the difference between the Church and Curry interpretations of types.

In the Curry interpretation, you assume that programs evaluate to values, and that a "type" is a subset of the universe of values. So values are fundamental, and a type system is a way of determining which subset of the universe of all values a program can compute. In the Church interpretation, you assume that types are fundamental. There is no universal space of values; values are terms evaluated at a particular type. Think of the difference in C between pointers and integers on x86 -- they are both represented with a 32-bit word, and what a given word really means can only be determined if you also know what type to interpret it at.

What you are doing is essentially a very fancy cast: your values are a package containing a representation of a type along with a representation term, and you allow the programmer to do a cast by making a new package with a different type and the same representation field.

Simpler example

Suppose you have heights and weights in your program, both represented as floating point numbers. You want to rule out expressions like H1=H1+W1, so you use two types, one for heights and the other for weights. Thus 3.0::Height is not the same as 3.0::Weight.
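
A minimal sketch of this in Haskell (illustrative names, not from the post): both types wrap a Double underneath, and the type checker rejects adding a Weight to a Height even though the representations are identical.

newtype Height = Height Double deriving Show
newtype Weight = Weight Double deriving Show

addHeights :: Height -> Height -> Height
addHeights (Height a) (Height b) = Height (a + b)

h1 = Height 3.0    -- 3.0::Height
w1 = Weight 3.0    -- 3.0::Weight

-- addHeights h1 w1   -- rejected by the type checker: a Weight is not a Height,
--                    -- even though both are represented by the same Double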

ADTs

Which interpretation do ADTs fit into? There, instead of a value, you have a (hidden) "representation" of a value, and a set of possible operations on the value that the representation represents. The only access to the value is through its type; so do ADTs belong to a "type-first" universe?

ADTs

Well, the raw representation for an ADT in my scheme is just the name of the command which represents that type. There is also a "type" type, so you could pass around an ADT as [list type: $typename] (Tcl syntax) if you wanted to. There's no way to create an instance of an ADT, so values aren't a problem (instead, you'd use a concrete sub-type).

In my setup types don't lay down the only possible operations you can perform on a value, just a default set. There is no data hiding, although you could accomplish this too (and I have some types which do just that - "type" being one of them), but I think it's a separate issue.

Thanks

Thanks for the reply, and the information. I'll have to look this stuff up, but it sounds like the relevant theory I was after.

Well, you may not want to add them

But it makes perfect sense to multiply them: 9.0::Height*Weight. In physics, this would give the amount of work (foot-pounds). But the type of the multiplied unit is different from either Height or Weight.
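
Continuing the Haskell sketch above (reusing the Height and Weight newtypes; Work is just an illustrative name for the third type): multiplication is allowed, but its result has neither of the argument types.

newtype Work = Work Double deriving Show   -- e.g. foot-pounds

workDone :: Height -> Weight -> Work
workDone (Height h) (Weight w) = Work (h * w)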

Units vs. Types

We discussed this a couple of times, IIRC. Getting units right in a language is, alas, usually a bit harder than just having types.

Well, since we're talking about the combination of types...

...it's only natural that the complexity is introduced. You can either have the Height type know how to combine with the Weight type (or vice versa), or you can have a third type that is the result of combining the first two.

Not sure, though, whether I understand the distinction between Units & Types. I suppose it's the difference between how types interact and how they are composed?

Types vs. Dimensional Units

Chris, here's a relevant discussion from the archives.

Units are simple, types are not

Units are pretty simple when treated in the usual Dimensional Analysis sense, which was developed into its modern form before 1930.

Units are a finite vector space basis; instead of 1, i, j, k, for instance, you might have meter, kilogram, second, and then proceed with vector arithmetic; usually only vectors with the same non-zero components can be added (e.g. illegal to add meters to kilograms).
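
A rough runtime sketch of this view in Haskell (names are illustrative): a unit is a vector of exponents over the basis (meter, kilogram, second); multiplying quantities adds the vectors componentwise, and addition is only permitted when the vectors are identical.

type UnitVec = (Int, Int, Int)          -- exponents of (meter, kilogram, second)

data Qty = Qty Double UnitVec deriving Show

-- Multiplication adds the exponent vectors componentwise.
mulQty :: Qty -> Qty -> Qty
mulQty (Qty x (a, b, c)) (Qty y (d, e, f)) = Qty (x * y) (a + d, b + e, c + f)

-- Addition is only allowed when the exponent vectors are identical
-- (so meters + kilograms is rejected).
addQty :: Qty -> Qty -> Maybe Qty
addQty (Qty x u) (Qty y v)
  | u == v    = Just (Qty (x + y) u)
  | otherwise = Nothing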

Types can be arbitrarily more complicated than that.

Wrong formalization

Units are not a finite vector space. It makes sense to multiply and divide units, hence velocity, work. I have never encountered an instance where addition of different units made sense, so you can't use polynomials.

As complex as general types can be, units are a special case and can easily be expressed within a rich type system.

Incomplete, not `wrong'

If you regard a vector space as a (domain for a) model for quantities with units, then as far as I can see it is sound, hence not `wrong' in the usual sense. However, if you insist that adding quantities with different units is not sensible, then you may say it isn't complete, because values which don't lie on an axis don't represent a quantity or (depending on your model) represent many quantities.

If you argue that this formalization is `wrong' because there is junk and/or confusion in the domain, then you also have to argue that it is `wrong', for example, to implement a set ADT using a sequence ADT, because there are many ways to order a set.

A vector space is a kind of ring because of addition; if you want to reject addition of quantities, I guess a complete model involves some kind of group which is not a ring.

Units: even simpler than that

Mostly, but it's even simpler than that.

(Sorry the following isn't shorter and clearer, I'm getting sleepy.)

From a purely pragmatic perspective, you can allow addition; it's just that after a series of calculations the final result will be non-physical (logically defined but physically non-existant) if it still includes addition after simplification (there is plenty of precedent for this in physics, e.g. probability amplitudes). E.g. (meters + seconds)(meters - seconds) + seconds^2 isn't illegal; it simplifies to meters^2, and there is no harm at all in going through the intermediate additions to reach that final result.

Or from another perspective, note that I said vector space *basis*; the latter word is very important. I said "finite" before for pragmatic reasons, but what you really want to do is to form an infinite basis of powers of meters (...m^-2 m^-1 m^0 m^1 m^2...) and of seconds (...t^-2 t^-1 t^0 t^1 t^2...) and of kilograms, and take the cross product of those three basis spaces. The resulting orthonormal space holds all possible units (in this case; other choices of units are possible, e.g. for the full SI). Select exactly one for the physical quantity of interest, such as u = meter*sec^-1.

This is the dimensional analysis stage. If a calculation seems to involve more than one unit, such as u1=meter*sec^-1 but u2=meter, then this stage was performed incorrectly.

Then pair it with any real x or formula f over the reals, and you're done. If it is a formula, each term must have the same unit.

The confusion comes from mixing up the complete field of reals used for calculation, which has no limitations on addition etc, with the formation of the infinite orthonormal space of possible units, which in effect allows only multiplication and division in its formation.

Actual calculations are done only in one dimension, that of the selected basis vector, so you could say they aren't really vector arithmetic from this point of view, they're just reals tagged/labeled with some selected unit vector.

This all makes it sound complicated, I suppose, since we're now talking about some of the pure math issues, but it's quite straightforward to implement such things -- well, except that some problem domains require non-integral unit powers (at least 1/2 powers) and sometimes other operations (log, exp), but although this would complicate the theoretical discussion, it doesn't make the pragmatic implementation much harder.

As for whether you can implement units with a type system -- well, of course you can. You can also just directly implement them with a Turing machine. Stating either possibility is, however, unenlightening. There are Turing-equivalent type systems, after all, so you can implement anything at all in them.

Or you can consider units completely separately from other type issues, as here, which is a very good idea, because units *should* be supported in all type systems (and in all programming languages). They're not tied to any particular type system, and they have their own issues, primarily that they have inherent physical meaning, which may not be true of any other issue in a type system.

Also they tend to have practical benefits, like helping to avoid destroying expensive rockets/space probes, killing patients with medical equipment, etc. This, also, is pretty rare with the other issues that arise with type systems.

Vectors?

As for whether you can implement units with a type system -- well, of course you can. You can also just directly implement them with a Turing machine. Stating either possibility is, however, unenlightening.

I could say the same thing about modelling them with a vector space.

I think however that this is a key question for language design: what features of the language help the programmer to represent the conceptual model of the problem domain in the program.

There are lots of equivalent encodings for the concept we are trying to work with. I can always say that register A1, memory location C13G, or the variable named Fred represents height, and all of these are computationally adequate, but none help me to work with height.

However, if I create a type Height I have encoded into the vocabulary of my programming space a concept that can be refined and constrained in exactly the way that I want.

I think this is what makes a type solution much more interesting from a PLT point of view: it is a solution that depends on a non-trivial property of the language.

problem versus solution domain

Units are part of the problem domain, types are part of the solution domain.

If you resist looking into units and dimensional analysis and a formal mathematical analysis of same, because you are sure that type systems are the way to go, then you're missing the point.

First one should understand the problem domain. Units arise in most physical problem domains.

Only then should one explore the solution. You're acting like I'm saying that the mathematical analysis I tried to explain is competing with implementation via types. They're not competing; it's just that understanding units must come prior to implementing them, that's all.

They are orthogonal issues; units can be implemented many ways. Type systems are a good way, but that's unhelpful if you want to implement units in a language with a weak type system.

We're mainly interested in the solution domain around here...

If you resist looking into units and dimensional analysis and a formal mathematical analysis of same, because you are sure that type systems are the way to go, then you're missing the point.

At this point, I'm not too clear what claim you are making about units of measurement as they relate to PLT.

My point is that, as a PLT issue, many of the constraints you want to express on measurements with units can be handled by types in a language endowed with a modestly rich type system.

Sure, there are lots of ways to analyze the domain, and lots of implementations as data structures, but they don't necessarily have a lot to do with the expressiveness of the PL.

If you are interested in domain analysis in general, I would have to know what problem you were trying to model in particular before deciding whether or not a full study of dimensional analysis would be necessary.

In general, I would think the simplest way to handle quantities in (for example) C would be something like:

struct quantityWithUnits {
  double quantity;   /* numeric value */
  char  *unit;       /* unit name, e.g. "kg" */
};

From a purely pragmatic perspective

From a purely pragmatic perspective, you can allow addition, it's just that after a series of calculations the final result will be non-physical (logically defined but physically non-existant)

This is what incompleteness means. The point of a complete semantics is to eliminate what you call `logically defined but physically non-exist[e]nt' values from the model. (I call these values `junk'.)

if it still includes addition after simplification (there is plenty of precedence for this in physics, e.g. probability amplitudes)

Physicists are notoriously sloppy mathematicians. :)

E.g. (meter + seconds)(meters - seconds) + seconds^2 isn't illegal; it simplifies to meters^2, and there is no harm at all in going through the intermediate additions to reach that final result.

Minor point, but: this is irrelevant. There is no notion of `simplification' in a vector space. Two vectors are either equal or unequal. If you are talking about expressions in the programming language, then there is a reduction relation, but if you allow addition of units in the PL you are really just changing the problem to suit your solution.

Or from another perspective, note that I said vector space *basis*; the latter word is very important.

Yeah, it is, but conversely the first two words "vector space" are misleading. You want a basis (a set of generators), but not for the purpose of generating a vector space.

what you really want to do is to form an infinite basis of powers of meters (...m^-2 m^-1 m^0 m^1 m^2...) and of seconds (...t^-2 t^-1 t^0 t^1 t^2...) and of kilograms, and take the cross product of those three basis spaces. The resulting orthonormal space holds all possible units

If you insist that exponentiation of units be compatible with the multiplication in your `orthonormal space', then this will clearly cease to be orthonormal and the generated space collapses to a 3D space, and you get junk again. If you don't insist on that, you get even more junk.

I do not think you will find a complete semantics in the category of vector spaces; they have too much structure. As you say, units are simpler than that.

As for whether you can implement units with a type system -- well, of course you can. You can also just directly implement them with a Turing machine. Stating either possibility is, however, unenlightening.

No kidding! :)

Or you can consider units completely separately from other type issues, as here, which is a very good idea, because units *should* be supported in all type systems (and in all programming languages). They're not tied to any particular type system, and they have their own issues, primarily that they have inherent physical meaning, which may not be true of any other issue in a type system.

I disagree. If you can express units using your type system, you should do so rather than maintain two separate mechanisms with overlapping functionality. In order to make units interact properly with types, you will probably need to introduce some coupling anyway.

Also they tend to have practical benefits, like helping to avoid destroying expensive rockets/space probes, killing patients with medical equipment, etc. This, also, is pretty rare with the other issues that arise with type systems.

Are you saying that units are more likely to avert disasters than types? I don't believe that.

Units vs. quantities

Having said all that, and at the risk of confusing everyone with even more abstract nonsense, I just realized that even though units clearly generate a free group, that still doesn't address the problem of how to treat quantities, that is, reals with units, supporting arithmetic. Clearly one still wants something like a field, but rather than having only one type (`number') it ought to have multiple (at least three) types. So the situation is a bit like the analogy between monoids and categories: a monoid is a category with one object, and a category is a `monoid with many types'.

So, units are simpler than vectors, but quantities are probably more complex or just not comparable.

Re: Units vs. Types

If we have type expressions, what more do we need? AFAIK we can define a statically-typed, safe unit along the lines of:

type Unit a (c, g, s)

where a is the type of the representation value (e.g. double, rational) and c, g, s are the integer values representing the powers of the three basic dimensions (i.e. length, mass, time). We can always do tricks to create these in the type-system if we have type-expressions.

With the proper definitions we can always write a type-checked product:

* :: Unit a (c1,g1,s1) -> Unit a (c2,g2,s2) -> Unit a (c1+c2,g1+g2,s1+s2)
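
One possible rendering of this in Haskell with GHC's DataKinds and type-level naturals (a sketch only: natural-number exponents here, whereas a usable system needs signed, ideally rational, exponents; the names are illustrative):

{-# LANGUAGE DataKinds, KindSignatures, TypeOperators #-}

import GHC.TypeLits

-- c, g, s are type-level exponents of the three base dimensions.
newtype Unit a (c :: Nat) (g :: Nat) (s :: Nat) = U a deriving Show

-- The type-checked product: exponents are added at the type level.
mulU :: Num a
     => Unit a c1 g1 s1 -> Unit a c2 g2 s2 -> Unit a (c1 + c2) (g1 + g2) (s1 + s2)
mulU (U x) (U y) = U (x * y)

-- Addition only type-checks when all exponents agree.
addU :: Num a => Unit a c g s -> Unit a c g s -> Unit a c g s
addU (U x) (U y) = U (x + y)

type Centimetres = Unit Double 1 0 0
type Grams       = Unit Double 0 1 0

-- mulU (U 2.0 :: Centimetres) (U 3.0 :: Grams) :: Unit Double 1 1 0
-- addU (U 2.0 :: Centimetres) (U 3.0 :: Grams)  -- type error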

Dimensional (units) type in Haskell

Perhaps you will find the following links relevant; what you have outlined is indeed possible:

http://www.haskell.org/pipermail/haskell-cafe/2003-May/004308.html
http://www.haskell.org/pipermail/haskell-cafe/2003-May/004310.html

Dimensions and units

I'm developing an interest in the problem of static dimensional analysis, and so I bothered to take a look at the literature. There is a nice type system for dimensions developed by Andrew Kennedy; he has a page about dimension types here; see particularly the introduction and HTML slides.

After our discussion here, Doug and I discussed this issue in private communications and one of the things he said was that it is `unforgivable' that modern programming languages don't provide support for dimensions. I agreed. But now that I've read Kennedy's paper on the matter (for the second time in my life, actually, but I had mostly forgotten about it) I need to retract that. In fact, static dimensional analysis is quite a non-trivial thing, and once you start to write even simple programs with dimensions you quickly run into the need for rather sophisticated static typing features such as parametric polymorphism, polymorphic recursion and even dependent types. I can easily imagine how users would get frustrated with a system which is not careful to implement dimensions in as general and clean a way as possible.

For example:

  • Multiplication, more generally arithmetical operators, and constants should be polymorphic in dimension.
  • To define the n-th root function in terms of the square root function, you need polymorphic recursion. Similarly for a function which evaluates polynomials.
  • The return value of the power (exponent) function has a dimension which depends on the exponent argument, so to define it in general you need dependent types.

Incidentally, the encoding Daniel proposes resembles a system of Wand and O'Keefe. It has some disadvantages that Kennedy treats.

There is also a paper by Rittri, which addresses polymorphic recursion and rational exponents.

Interesting reading. I would like to treat this in my coercive subtyping calculus; the coercions between commensurate units would be the obvious conversions.

Re: Dimensions and units

Daniel's system served me very well before.

Units are still a hole even in scientific software. The $80,000 ANSYS multiphysics package has no units. One uses a paper napkin and a hand calculator. This situation is typical, with some exceptions. LabVIEW offers good unit support. LabVIEW display units are heavenly. The options of scientific and engineering notation are critical. From a type standpoint, LabVIEW code allows arbitrary units and checks them for correctness (how, I do not know), but a nice web page covers polymorphic units.

My C library's unit entry boxes allowed any compatible unit, complete with SI prefix. This arrangement gives maximum user convenience and minimum human error. Incompatible entries were disallowed; incompatibility is easy to check. Values were converted down to SI base units as a canonical internal format. From a model/view standpoint, unit conversions happened only at the boundary. There was no type checking of code, or for that matter, tracking of significant digits.

In the SI system you have 7 base units. Tack on 2 additional units, plane angle and solid angle. So there are 9 slots to fill with a power exponent. Technically the angle slots are derived units, but obviously common enough to merit privileged positions.

Units have subtleties all their own, whatever the relation to CS type theory.

Integral powers are not sufficient. You need at least signed rationals. In electronics work we speak of "root-Hertz," a bizarre unit relating to noise figures and bandwidth. So that would be (sec)^(-1/2). The question of encoding the exponents arises. Using reals makes sense, because they have direct CPU and language support. One could argue that rounding errors mandate true rationals, so 1/3 instead of 0.3333...
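
A small illustrative sketch (not from the post) of carrying exact rational exponents, so that root-Hertz can be written as sec^(-1/2) and cancels exactly under multiplication:

import qualified Data.Map.Strict as Map
import Data.Ratio ((%))

-- Base-unit name mapped to its exact rational exponent.
type UnitExp = Map.Map String Rational

-- "root-Hertz" as s^(-1/2)
rootHertz :: UnitExp
rootHertz = Map.fromList [("s", (-1) % 2)]

-- Multiplying units adds exponents; exact rational arithmetic means
-- s^(-1/2) * s^(-1/2) cancels to exactly s^(-1), with no rounding drift.
mulUnits :: UnitExp -> UnitExp -> UnitExp
mulUnits u v = Map.filter (/= 0) (Map.unionWith (+) u v)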

There are issues of dimensionless units. I handled angles as stated above, directly. Sometimes units technically cancel, but still carry meaning. Coefficient of thermal expansion is physically (degree)^(-1), but really (volume change per unit volume) per (degree). Consequently one might wish to display this parameter as (cc/cc)/degree to the user. My overall feeling about these ghost units is that they probably deserve parallel treatment or some kind of retention flag so they don't automatically cancel.

Some quantities relate by scale factors but share identical base units. Frequency is one: is it simple or angular frequency? A volt-amp is often confused with a Watt. They are absolutely not the same thing. A volt-amp (apparent power) relates to a watt (actual power) by a scale factor which depends on the circuit.

We also have the prefix problem. SI prefixes conflict with computer prefixes, as in kilobyte (1024 not 1000).

There exists no good XML schema that encodes units. Some 2-3 years ago, I was talking to unit gurus about that; maybe something new has come out since.

My summary thought is that any unit support is better than none. So it might be possible to define a limited type theory which just skips some cases. Frank's "n-th root function" strikes me as a piece of exotica, interesting from a theoretical standpoint, but never arising in practice. Maybe the programmer could be required to annotate what the type system can't decide.