Type system design choices

Hi LtU,

I'm working on the design of a programming language (as I'm sure many of you are). And sometimes I feel the need to brainstorm about a new idea. LtU looks exactly like the kind of community I want to reach. My experience with different programming languages is limited (I'm basically a C++ guy) and I want to learn.

The LtU forum is not meant for design discussions, but I've been allowed to post a link to another discussion group. If you would like to respond, please do it there, not here. Any thoughts would be most appreciated!

Type system design choices

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Generic advice

Michiel,

I looked at the points you raise, and you definitely have good insight. But, just as you say, many of these issues have been explored by others. I think you will benefit tremendously from reading Essentials of Programming Languages (particularly about the distinction between denoted and expressed values), and Type and Programming Languages. The two books are mentioned here often. They are usually referred to by the acronyms EOPL and TAPL.

Thanks!

That's the sort of advice I need. I should have mentioned books in my post. They are usually much easier to digest than papers. I will look into EOPL and TAPL.

Is there much overlap between the two? Which would you suggest I start with?

quick imho reply

There's very little if any significant overlap between the two books (they're both on my table in front of me at this time).

EOPL is an introductory

EOPL is an introductory level textbook about programming lannguage theory. TAPL is aimed at more advanced students, and deals exclusively with types.

Types are more than sets of values.

Types are usually distinguished by operations as well as values.

When a programmer has a value that could be a member of several different types, and selects a type, he is selecting a set of preferences about how that value should be treated by overloaded functions capable of taking members of more than one of those different types. To make a trivial example, the value 5 could
be an integer, a floating-point value, or a member of a subrange
from 0 to 9. If it's an integer we want 5 + 6 = 11. If it's a
subrange member we want either 5 + 6 = 1 (modular addition) or 5 + 6 = {+inf} (saturating addition). In fact we might make different types of subranges depending on whether we want modular or saturating mathematical behavior. If it's a floating-point value we want 5 / 2 = 2.5. If it's an integer we want either 5 / 2 = 2&1/2 (exact division) or 5/2 = 2 (modular division with underflow, as in C). In fact we might make different integer types depending on whether we want exact semantics (leading to bignums and big rationals) or "hardware" semantics with well-defined points of modular overflow and "closed" semantics under division etc.

The point is that types distinguished by their behavior, not just their value sets, are useful in different contexts.

If you want overloaded functions to be required to have the same semantics regardless of the type of the operand, when an operand can have any of several different types, you are making it impossible to express the semantics of different types in the most natural way by overloading functions.

It's a design choice.

(1) One choice is, as you say, to allow a value to exhibit different types, as selected by the programmer. The same operator could be overloaded to do different things for these different types. This way, a type is defined not only by its values, but also by its operations.

(2) The other choice is to define types by a set of values. Same set of values = same type. Subset of values = subtype. Performing the same operation on the same value in the same environment would have the same result every time. Integer addition, modular addition and saturating addition would just be different operations (different notation / different name). In Mist, 5 / 2 = 2.5 (real division) and 5 // 2 = 2 (floored division). I borrowed that one from Python.

Neither choice is necessarily better than the other. In some way, the two choices aren't that different.

(1) looks like (2), only with several distinct values representing the concept of '5'. The type int and the type (0 to 9) would have an empty intersection. But likely there would be a set of implicit/explicit conversion rules to change from one interpretation to the other.

(2) looks like (1), only instead of casting the operands to the right type and using the sole addition operation, we do not cast, but use one of several addition operations.

Types are not sets

Types-as-sets has been discussed on LtU many times before. See e.g. my comment here for a brief rebuttal.