"null" is to tagged unions as ???? is to records

Many programmers are not aware of tagged unions and I love introducing them. I talk about how records and tagged unions are two complementary and fundamental constructs, and that null is a trick that lets you still hackily get stuff done without tagged unions.

But this got me thinking, if null is the crappy version of tagged unions, what's the crappy version of records? My initial thought was "pairs", but that's not quite right because pairs are more like Either.

Any ideas? Or am I thinking about this the wrong way entirely?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Records are formally dual to

Records are formally dual to tagged unions. Null is an implicit extra case with no payload data added to every data type. The formal dual of this would be a record field with no payload data (i.e. of type unit) added to every data type.

I don't think that analogy

I don't think that analogy is precise enough. Null in most languages definitely adds the clumsiness of an extra case, but it's even worse because "null" doesn't nest. For example there's no way to have an Option[Option[T]] using nulls.

Some languages support distinguishing between nullable and non-nullable types, which fixes the "extra implicit case" problem but doesn't fix the nesting problem.

Built-in dirty flags?

How about, instead of taking the sum with 1 or the product with 0, we take the product with 2? So now you're allowed to ask for every value, "is this value dirty?" That seems to me to have the same smell as NULL, but in product form.

This extra field for every

This extra field for every value doesn't nest either. There is no way to have two such fields on the same value. As I understand it, one can't have an opinion on what is dual or not. It's a mathematical concept. The dual of a type F(A,B) is K(F(K(A),K(B))), where K(T) is the type of continuations of type T. For example the dual of pairs (A,B) is K((K(A),K(B))), which is isomorphic to the tagged union A+B.

Instant failure as a better dual?

If you translate a NULL-having "source" language into a sane "target" language, you turn any source type T into the target type 1+T (disjoint union with the unit type with a single element). This allows you to provide a "dummy" value at any type.

You're right that dualizing "+1" should give you "*0" (the product not with the unit type of a single element, but with the *empty* type that has no inhabitant). But I think one should not consider this change on values, but on *continuations*, taken as ways to terminate the computation (whereas values initiate it by producing data).

So I would imagine a language where any continuation for the type T (an exit consuming a T) is in fact a pair of a target continuation on T, and the trivial continuation (in 0, or dual(1)) that immediately terminates computation with no argument.

So instead of an ambient "null" value, you have an ambient "exit" control-flow construction that just stops everything with no argument. The dual of an void-of-information value would be a void-of-information exit.

Yes, certainly should be

Yes, certainly should be empty instead of unit. Better just stick with '0' and '1' notation :)

This duality business is still a bit vague to me. It seems that this kind of duality only works "properly" in linear languages and without effects. For instance if you look at consumers of (a+b). Clearly this consumer needs to be able to handle both the 'a' case and the 'b' case, so it must be a pair of consumers one for 'a' and one for 'b'. However this reasoning only works if you assume that we must actually use the value. In a non linear language a perfectly valid consumer of (a+b) is one that doesn't even use the value at all. How to make sense of this? Or am I just being silly?

Pointer arithmetic

The null concept has two (structural) problems: first, it adds a value to every type (type `a` becomes `a + 1`); second, it isn't parametric (type `a+b` implicitly has type `(a+b)+1` instead of `(a+1)+(b+1)`). These properties make it difficult to reason with null, especially in context of generic programming.

For records, an implicit `*1` product (adding unit) would not be a problem; it does not add any information to the type. At most, it adds structure and room for extensions. However, the actual dual to the implicit null sum would be an implicit product `a*0` - a pair where one value is divergent or logically invalid. (A type that should never be realized at runtime.)

The closest in-practice dual to 'crappy tagged unions' is probably pointer arithmetic, i.e. unsafe memory paired with every object reference, which does (roughly) seem to correspond to a `*0` product.

Oh, interesting. Can you

Oh, interesting. Can you think of anything that's a more precise dual, even if it isn't used in practice anywhere? I'm looking for some construct that would let you limp along in a language that didn't support records.

Null not built in

In most of the newer languages with strong type systems, NULL is not built in. For a variety of reasons, it impedes program reasoning if pointers in general are nullable. In languages that care about this, it's fairly common to see something like:

datatype nullable 'a
  null
  some 'a

you'll more commonly see that "null" written as "none" for stylistic reasons. From a programming perspective, the difference is that NULL is not considered a value of type 'a - this is a tagged union, and the tag must be unpacked to access the pointer.

incomplete specifiers

well, "most of the newer /ivory tower/ languages" maybe? :-) has anybody already released a language that isn't in its infancy that is used by lots of people to do real commercial work? (i'm just whining since i'm using scala at the moment, which imho failed to do the right thing, here.)