Kinds of Null

We all know that, in many languages, null is very semantically overloaded. Has anyone attempted to separate and enumerate its various meanings? A few spring immediately to mind:

  • Empty list, set, dict., etc. (one or many?)
  • No object
  • No value
  • Uninitialized

What others are there?

(I am a long-time, on-and-off lurker, first-time poster. Etiquette corrections are welcome.)

A favourite of mine

Ward's Wiki has long hosted a page on this topic.

And of course, LtU has lots of discussion; most recently this thread. [edited to fix URL]

Lots of criticisms over the years, from Chris Date and Hugh Darwen (see The Third Manifesto, at a bookseller near you), to the Tony Hoare talk referenced in this LtU thread.

It's a fun and interesting topic, null.

I like to think of it

I like to think of null as the bottom type. Even if languages tend to give meaning to that type, it is what it is and tends to be consistent amongst languages.

Speaking of which

Two more recent LTU threads which may be of interest are here and here.

In some statically-typed languages such as SQL, Eiffel and Java (excluding intrinsics like int, bool and double), there is an implicit minimal type with one element (null in Java, VOID in Eiffel, NULL in SQL) which is a valid subtype of pretty much everything. Strictly speaking, it isn't a true "bottom" (the bottom type is empty, after all, and the set containing the null reference is not empty), but it is a minimal type.

Many of the objections to nulls arise because of this property, making null values essentially unavoidable.
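
As a concrete aside (my illustration, not the poster's): Scala actually gives this implicit minimal type a name, Null, the type of the null reference, which sits below every reference type but not below value types.

// Sketch in Scala: Null is a named minimal type whose single value is null.
// It is a subtype of every reference type, so these compile:
val s: String = null
val xs: List[Int] = null
// ...but not of value types, so this would not compile:
// val n: Int = null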

Null Type: bottom type or minimal type

The bottom type is an artifact of the Curry-Howard isomorphism between proofs and types, and is by no means a universal feature of type theories: it is used as the type of programs that other theories would reject as not being well-formed. It isn't of any practical use in programs, because a type with no values and no methods isn't a valid subtype of anything.

A minimal type with a single value and no methods is of some use in a real programming language: instances of the type require zero storage, so you can do things like create a graph with data on the edges but not in the nodes while, in theory, incurring no overhead.
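
To make the graph example concrete, here is a small sketch in Scala (my own, using Unit as the one-value, no-data type the post describes):

// A graph generic in its node and edge payloads. Using Unit for the node
// payload means nodes carry no data; in principle such values need no storage.
case class Graph[N, E](nodes: Vector[N], edges: Vector[(Int, Int, E)])

val g: Graph[Unit, Double] =
  Graph(nodes = Vector((), (), ()),                // three nodes, no data
        edges = Vector((0, 1, 2.5), (1, 2, 0.7)))  // edges carry weights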

"It isn't of practical use"

The bottom type does have some practical use. It's the type of constructs like Java's "throw" that diverge by throwing an exception. Java treats "throw" specially: any code after a throw is flagged as unreachable. If Java had an explicit bottom type, then it could treat user-written code the same way, so that code after "logMessageAndThrow(blah)" would also be recognized as unreachable.
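
Scala already has such an explicit bottom type, Nothing, so it can do exactly what the post asks of Java. A minimal sketch, reusing the post's hypothetical logMessageAndThrow:

// Declaring the return type as Nothing tells the compiler this never returns.
def logMessageAndThrow(msg: String): Nothing = {
  println(s"error: $msg")          // log the message
  throw new RuntimeException(msg)  // then diverge; no value is produced
}

def parsePositive(s: String): Int = {
  val n = s.toInt
  if (n <= 0) logMessageAndThrow(s"$n is not positive")
  else n  // the whole if-expression still types as Int, because Nothing <: Int
}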

A bottom type can also be used with covariant type parameters. In Scala, there's a single constant, Nil, that marks the end of any list. Nil's type is List[Nothing]*, where "Nothing" is Scala's way of saying bottom. Concretely, "head" of a List[T] returns a T, so head of a List[Nothing] must return Nothing; in fact it throws an exception.
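
To see the covariance point concretely (a sketch of mine, not the original poster's):

// List is covariant and Nothing is a subtype of every type, so the single
// constant Nil serves as the empty list for any element type.
val noInts: List[Int] = Nil        // List[Nothing] <: List[Int]
val noStrings: List[String] = Nil  // the same constant works here too
// Its head must have type Nothing, which it can only "satisfy" by diverging:
// Nil.head  // throws NoSuchElementException at run time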

All that said, languages like ML and Haskell just encode the bottom type as "forall a.a" and that works just fine, too. That doesn't mean the bottom type is useless, it just means that a language may have a reasonable way of indicating it without having an explicit type.
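
For what it's worth, the same encoding can be written down in Scala too (my sketch): a polymorphic value of type "forall a. a" can only be produced by diverging, which is exactly what lets it play the role of bottom.

// A value of every type at once; the only way to implement it is to never return.
def undefined[A]: A = throw new NotImplementedError("undefined")
// val x: Int = undefined[Int]  // type-checks, but throws if evaluated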

* Technically Nil's type is a singleton type that subtypes List[Nothing]

Should we encode nontermination with bottoms?

The use of ⊥ to represent things like nontermination or failure has, for some reason, always kind of bothered me. This is a wholly philosophical observation, but if ⊥ is the zero of your type system, using it (as the return type of a function) to signify failure of various sorts is like dividing by that zero: all bets are off.

In purely theoretical contexts, where nontermination, "getting stuck", or other causes of failure are essentially elided from consideration, this is a reasonable position to take, I suppose--it is a convenient means to abstract away details that aren't important to the problem at hand.

For production programming languages, error handling does become an important concern. It's frequently useful to distinguish functions which may throw exceptions (or otherwise fail) from those which do not, and use of ⊥ to indicate failure undermines this: the algebraic sum of some T and ⊥ is just T.
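
One way to see the "T + ⊥ is T" point, sketched in Scala (my illustration): a sum whose failure side is the bottom type can only ever be the success case, so it tells the caller nothing about possible failure.

// Either[Nothing, Int] can only ever be a Right, so it carries no more
// information than a plain Int.
def get(e: Either[Nothing, Int]): Int = e match {
  case Right(n) => n
  case Left(x)  => x  // unreachable: x would have type Nothing, and no such value exists
}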

At any rate, if a language includes a type (whether named or not) containing more than zero values, it's probably inappropriate to refer to it as a bottom type, as ⊥ is empty. By maintaining the empty invariant, the satisfaction of otherwise impossible promises (such as the notion that the universal subtype must somehow implement every possible method) becomes vacuously possible. :)

Conway

Nil is a generic inductive base case.

Also, you left out that one common meaning of nil is 0.

-t

Unknown value

In SQL below 6NF, or relational programming with outer joins, some sort of representation is required for an unknown domain value in a relation with join dependencies. NULL is the default choice.

This is unfortunate because it loses information... i.e. if some sort of 'unique unknown variable' were introduced instead, one could determine when two NULLs happen to be equal, and it'd be far better suited to updateable views.
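
A rough sketch of the 'unique unknown variable' idea in Scala (names invented for illustration): give each unknown its own identity, and two occurrences of the same unknown can be recognized as equal, which a bare NULL cannot support.

// A domain value is either known, or an unknown carrying its own identity.
sealed trait Cell[+A]
final case class Known[A](value: A) extends Cell[A]
final case class Unknown(id: Long)  extends Cell[Nothing]

val x = Unknown(1)
val y = Unknown(2)

assert(Known(42) == Known(42))  // known values compare by value
assert(x == x)                  // the same unknown equals itself
assert(x != y)                  // distinct unknowns stay distinct, unlike SQL's
                                // NULL, where NULL = NULL is merely unknown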

Ooo. My hobby horse! Let me ride it!

How about "Not Applicable"?

And "nothing".

So if you tell nothing to do something, what should happen?

Nothing.

Interestingly enough, XDR, a serialization protocol, has a pointer type.

What they call it is instructive.

They call it "optional".

myType_t * p;

is equivalent to
myType_t p<1>;

which is a variable-length array of maximum length 1, serialized as the current length followed by that many elements of myType_t.
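
A rough sketch of that encoding in Scala (my own illustration, not XDR's actual code): the optional value is written exactly like a variable-length array of at most one element, i.e. a four-byte length followed by that many elements.

// Encode an optional value as XDR-style "length, then elements".
// encodeElem stands in for whatever serializes a single myType_t.
def encodeOptional[A](value: Option[A])(encodeElem: A => Array[Byte]): Array[Byte] =
  value match {
    case None    => Array[Byte](0, 0, 0, 0)                  // length 0, no elements
    case Some(a) => Array[Byte](0, 0, 0, 1) ++ encodeElem(a) // length 1, then the element
  }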

When is it OK to operate on Uninitialized?

Valgrind is a superb piece of infrastructure that pretends it's a machine-code-level CPU, and then invokes a plugin to contemplate each instruction executed.

The most famous of its plugins is Memcheck, which, amongst many other useful things, checks whether you are operating on uninitialized memory.

Curiously enough, to prevent a flood of false positives from standard valid C code (e.g. alignment padding in structs), it permits you to copy uninitialized memory, but not branch on it.

So there you have an interesting shading of grey on the "Uninitialized" interpretation of Null.

And FALSE in some languages.

Also FALSE in some languages.

Ruby for example...

...although Ruby is a little weird there....

In Ruby, nil and false evaluated in a boolean context (an if expression, a while loop, ...) are false.

Everything else is true, including true and 0.

Which gives rise to an interesting question....

What is !NULL?

Is NOT NULL true? 1? ANY? ALL? Applicable? Initialized? Present?

What is !!NULL? Is it NULL?

Interestingly, !ANY is NOTHING.

!ALL is NOTHING or ANY.
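
For the NULL part of the question, SQL's answer comes from three-valued (Kleene) logic: negating "unknown" gives "unknown", so NOT NULL is NULL and so is NOT NOT NULL. A small Scala sketch of that negation table (my own illustration):

// Kleene three-valued logic, as SQL applies to NULL.
sealed trait TriBool { def unary_! : TriBool }
case object True    extends TriBool { def unary_! : TriBool = False   }
case object False   extends TriBool { def unary_! : TriBool = True    }
case object Unknown extends TriBool { def unary_! : TriBool = Unknown }

assert(!Unknown == Unknown)    // NOT NULL is NULL
assert(!(!Unknown) == Unknown) // and NOT NOT NULL is still NULL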

Everything else is true,

Everything else is true, including true [...].

That was unexpected.

True may be true, but

True may be true, but is false false? JavaScript has something to say about this:

f = new Boolean(false);          // a Boolean *object* that wraps the value false
print(f);                        // prints "false", the object's string form
if (f) print("false is true!");  // but any object is truthy, so this prints too

Transcript:

false
false is true!