Non-null references?

Hi,

Is it possible/plausible to have a Java/C++ like language in which you can write code like:

void aFunction(nonnull Object o) ....

Something s = new Something();
aFunction(s);

and then have the type system prove that o can never be null? This feels simple but I wonder if it ends up becoming dependent types, arbitrary theorem proving or some other suitably scary thing. Being able to add simple, small increases in type safety to a program written in a non-functional language would be quite nice to have!

I also wonder if this sort of thing could be implemented using C++ smart pointers.

Nice

You will want to have a look at Nice option types.

Sure, it's pretty simple.

Sure, it's pretty simple in C++. Here's a trivial, untested example (it's been a while since I coded in C++, so this may not work as-is):

#include <stdexcept>

template<typename T>
class Nonnull
{
  T* ptr;
public:
  // Reject null once, at construction time.
  Nonnull(T* ptr)
  {
    if(ptr) {
      this->ptr = ptr;
    } else {
      throw std::invalid_argument("null pointer");
    }
  }
};

template<typename T>
void foo(Nonnull<T> x) {...}

Nonnull<int> x(new int);
foo(x);

Nonnull<int> y(0); // throws at run time

Of course, the Nonnull will need to be extended to have a real interface. Now, because C++ is weakly typed, this can easily be violated by anyone sufficiently determined, but it will at least force a null pointer check in most situations. Stronger type systems have these handy things called variants that eliminate many of the situations a raw pointer is used for, namely using pointers as optional values. For instance, in OCaml, you can do this (another untested example):

type optional =
| Some of int
| None;;

let foo y = match y with
| None -> print_endline "nothing"
| Some x -> print_int x
;;

foo None;;
foo (Some 5);;

This forces you to destructure this value every time you want to access it, so you never end up trying to dereference a null pointer.

References

You could also use a reference type, which in C++ cannot be null.

references can be dangerous

That statement is only mostly true. You can still get a null reference. Consider this valid code:

int* x = 0;
int& y = *x;

don't you just love c++?

There is no such thing as a

There is no such thing as a "null reference" in C++. Dereferencing the null pointer leads to undefined behaviour (this code is invalid).

"Valid"

Begging the question as to the value of "invalid" code that the compiler gleefully goes right ahead and compiles. Believe me, I know that this is in the finest C and C++ tradition. It's also why many of us feel it's well past time for the C/C++ tradition to die off, already.

In fairness this particular

In fairness this particular example's applicable in any language that supports pointers with nulls and an 'unsafe' dereference. I can talk GHC into doing that, too.

Invalid to the max

Some C++ compilers go a step further -- this actually compiles (without warnings) on some versions of Visual C++:

int &x = x;

I don't know what comes out of the compiler, but it's not anything good.

oh i've done that before.

oh i've done that before. there was definitely non-goodness coming out of my compiler when that happened.

This is invalid — undefined behavior

And some compilers (e.g. GCC) make use of the assumption that references are never null for optimization. For example, when converting to a superclass with multiple inheritance, ordinary pointers shift by some delta, but the null pointer stays null, so there is a conditional in the generated code in the case of pointer types. Converting references skips the conditional.

Null references

Its behaviour may be undefined, but it is certainly compilable C++.

I actually found code like this, in a shipping product:

class Class1 {
public:
  Class2 & getMember1() { return *member1; }
  Class2 *member1;
};
......
void foo(Class1 &obj)
{
  if( &(obj.getMember1()) != NULL) { ... }
}

I.e. someone was returning null references, and the caller was expected to check whether the reference was null before using it!

buckle up for safety

Mike Hearn: Is it possible/plausible to have a Java/C++ like language in which you can write [...] and then have the type system prove that [the var] can never be null?

A Java-like language could do this when C style memory effects are not permitted. It's entirely too easy in C++ to cause memory anywhere to contain bit patterns of any kind, and well beyond a C++ type system's ability to help you.

But even though you could have non-nullness built into a type system, you would still find a need for the same sort of role in code as a "null reference" in order to make optional inter-object links. So the problem of coping with missing objects would tend to persist in a new system anyway, just with a new look.

Mike Hearn: I also wonder if this sort of thing could be implemented using C++ smart pointers.

You couldn't prove safety statically with smart pointers, since you can (and will) corrupt memory with C++, but you could use smart pointers to audit the problem for you at runtime in a clean manner.

Template-based smart pointers in C++ appear to be the best and least confusing way in C++ to ensure a ref to another object transitions cleanly back and forth between null and not null, so you can have reasonably secure auto pointers and refcounting. This is currently best practice in C++ refcounting that I know about. So I'd use them in the bottom runtime for a new language implemented in C++ even if templates appear rarely elsewhere.

In every day job using C++, I must write code that checks at runtime that conditions presumably guaranteed by C++ actually hold, such as that any object you use has been constructed but not yet destroyed. (Everybody ends up using objects after they are destroyed, and sometimes before they get created, which is harder to do. You also assume no one will shift your object's location in memory, and that will happen too, but only very rarely.) With this code checking basic invariants in place, folks discover rapidly when they have violated the C++ runtime assumptions with the code they write, and we find (potentially really bad) bugs much faster than otherwise.

For example, I have pthread mutex style wrappers for locks attempt to keep track of whether they think they are locked, just to ensure a complaint will issue when you destroy a mutex when still locked, or signal a condition when not locked, etc. When such error messages appear, it's almost always a bug in someone's code -- it seems to come up every few weeks. Similarly, refcounting operations must check whether objects still look valid, because often code adds or releases a ref to an object already destroyed. This would happen less often than it does if smart pointers were used exclusively for refcounts.

It's common for C++ references (e.g. using syntax TObject& o) to be null pointers since nothing stops you from writing the legal code fragment that erikt cited. When I was at Akamai, they were in an anti-reference phase in C++ code, because they'd been burned by null references, and believed outlawing that syntax was the best solution. I understand the guy who'd written the code in question believed null refs were impossible.

A C++ reference is just another type of pointer that uses different syntax, with typically better type safety protecting against unintended type conversions. Some folks (I've worked with some, often former Java coders) have no idea that passing by value and passing by reference have different costs in C++, since passing by value invokes more constructors and destructors for the object copies. If you, for example, cause bus locking to occur much too often from, say, atomic refcounting operations when passing by value, suddenly your code is flushing the chip cache constantly and performance is very different.

Telling folks about this when they don't want to hear it turns you into an unpopular nag.

Funny...

Rys: Telling folks about this when they don't want to hear it turns you into an unpopular nag.

Funny how informing people that they don't actually understand the language they're using can do that. I even suspect that there's a correlation between the accuracy of the observation and the level of resulting unpopularity. :-)

I've made myself unpopular in such environments by pointing out that C++ programmers who don't know that "pass by value" means "copy" and that "pass by reference" means the same thing as "pass a pointer," including the fact that references are polymorphic, shouldn't be hired, and should be fired when found. I mean, really: can you claim to be competent if you don't even understand whether the code you're calling is going to be working on the same object that you (think you) passed it or not?

Nice


void main(String[] args){
   aFunction(new Something());
   aFunction(null);
}

class Something {}
void aFunction(Something o){ println("non-null Something"); }

nicec --sourcepath .. -a t.jar test
nice.lang: parsing
test: parsing
test: typechecking

test.nice: line 3, column 4:
Arguments ([any] ?any) do not fit:
nice.lang.void aFunction(test.Something o)
compilation failed with 1 error

See Option types in Nice

You can do this in Java

You can already do this in Java if you use ESC/Java2. If it cannot prove that something you have declared as non-null is never null, it will emit a warning:

class NonNullTest {
                                                                                
    private String t;
                                                                                
    public void aFunction(/*@ non_null @*/ Object o) {
        o.hashCode();
    }
                                                                                
    public void main(String[] args) {
        String s = new String();
        aFunction(s);
        aFunction(t);
    }
}

./escjava2 NonNullTest.java
ESC/Java version ESCJava-2.0a9
    [0.066 s 4317784 bytes]
 
NonNullTest ...
  Prover started:0.023 s 8643488 bytes
    [1.493 s 9022872 bytes]
 
NonNullTest: aFunction(java.lang.Object) ...
    [0.155 s 8703768 bytes]  passed
 
NonNullTest: main(java.lang.String[]) ...
------------------------------------------------------------------------
NonNullTest.java:12: Warning: Possible assignment of null to variable declared non_null (NonNull)
        aFunction(t);
                 ^
Associated declaration is "NonNullTest.java", line 5, col 30:
    public void aFunction(/*@ non_null @*/ Object o) {
                              ^
------------------------------------------------------------------------
    [0.327 s 8864680 bytes]  failed
 
NonNullTest: NonNullTest() ...
    [0.155 s 8797624 bytes]  passed
  [2.133 s 8798504 bytes total]
1 warning

See this presentation for a sketch. Pages 14-16 are probably the ones most relevant to what you're asking.

for some JVM versions

afaict from the ESC/Java2 FAQ
"ESC/Java2 is written with Java 1.4 and only runs in a Java 1.4 virtual machine ... ESC/Java2 does not parse Java 1.5 code, thus cannot reason about Java 1.5 programs"

Spec#

And for C#, the comparable language with non-null types is Spec# (Spec-sharp).

If you don't mind paying a few dollars...

this functionality is available for Java via IntelliJ IDEA. Parameters, variables and methods can be annotated as @NotNull or @Nullable, indicating that they should never be null or that they may be null, respectively. Static analysis is then used to determine places where null may be bound to a @NotNull variable, and these are reported either in-line or in batch. Alternatively, a compiler post-processing pass will instrument generated bytecode to add run-time checks for nullity violations.

The same annotations are usable by the open source FindBugs static analysis tool as well, if you're embarrassed for funds. There has been talk of submitting a JSR codifying these annotations, but I don't believe anything has actually been done along those lines.

Not quite as elegant as Nice, but very handy.

What do you mean by Null?

A lot of the confusion goes away if you realise the concept of null is heavily overloaded. And once you realise that, implementations handling it can be made a lot simpler...

Do you mean by Null...

  • The variable is uninitialised.
  • The Answer to the Question you just asked is "Not Applicable"?
  • The item is "Not Available"
  • There is No Thing here. (Use Null Object pattern)

So many of our sorrows are dumped in the great big "It's a Null" bucket. Partly the language designers are to blame, so by the time the users get to it, it is a bit hard to fix...

Ja, worth repeating

As noted in Lisp.

Thanks!

Really interesting stuff here, it seems the Nice option types are what I was thinking of, as well as the Java comment annotation thing mentioned above.

sound non-null types are tricky

It's actually fairly subtle to plug all the soundness holes for systems like Java + non-null types, particularly for object fields. The problem is that during the invocation of the constructor of an object, a non-null field has not yet been assigned a value. You could throw an error if the program reads from the field before it's been initialized, but this is really just a glorified NullPointerException, which was what non-null fields were supposed to prevent in the first place.

You might think you can get around it with a definite assignment analysis on the constructor, but this turns out to be impossible. Since a constructor calls its superclass constructor (either explicitly or implicitly), that superclass constructor might evaluate arbitrary code before the body of the subclass constructor. In particular, it could invoke a dynamically dispatched method that references the new field from some new subclass. For example:

class A {
    A() { m(); }
    void m() { ... }
}

class B extends A {
    protected nonnull Thing x;
    B() {
        this.x = new Thing(...); // looks safe, right?
    }
}

class C extends B {
    void m() { ... this.x ... } // ouch!
}

There are several ways to get around this; a fairly elaborate solution is described in Declaring and Checking Non-null Types in an Object-Oriented Language, but there are other approaches that I'm too jet lagged to go into right now.

I thought I'd heard that it was a big no-no

to call overridable methods in constructors, which would maybe avoid what you mention?

Good point.

Not sure about Java, but I do know that C++ avoids this particular issue by never doing dynamic dispatch for methods called in constructors. Thus, constructing a C would cause A::m() to be called and not C::m().

3 stage construction

I solved this problem in Virgil by having a 3-stage constructor model, where fields declared in the current class (either the immutable or mutable variety) that have the same name as a parameter to the constructor are implicitly assigned the value of the constructor parameter of the same name before the super constructor is called. The super constructor then does the same: initializing any of its fields that are declared in this way with values passed from the subclass's super() constructor call. When the root class is reached, it can then execute its constructor body, which might perform virtual dispatches and/or leak the "this" object, which is considered to be OK because the implicitly initialized fields are already initialized. Then the super class constructor returns to the subclass, which returns to its subclass, etc.

An example:

class A {
    new() { }
}

class B extends A {
    field f: int; // implicitly initialized by constructor
    new(f) { }
}

class C extends B {
    field g: bool; // implicitly initialized by constructor
    new(g) : super(0) { }
}

There is a slight restriction on the expressions appearing in a super() constructor call: they cannot reference the "this" parameter, so code here cannot access any instance members. But code here can be constants, expressions, uses of the incoming parameters, calls to external globals, etc. Disallowing the use of "this" in the super code makes sure that it cannot leak until the first line of code in the root class's constructor begins executing, at which point all implicitly initialized fields are guaranteed to be initialized.

Thus even if the root class calls a virtual method on "this", the receiver method will at least be guaranteed that any implicitly initialized fields it uses will have been initialized.

Virgil doesn't currently have non-null types, but if they were to be added, the simple restriction that any field of non-null type be implicitly initialized would solve the problem.

BTW, this also has the nice side effect of reducing the amount of code you have to write to initialize fields, since they can simply be declared as parameters to the constructor (and the type of that constructor parameter is inferred to be the type of the field of the matching name). So completely immutable objects have trivial constructors that simply declare their fields to be parameters to the constructor (and then BOOM, as Steve Jobs would say).

class List<T> {
    value item: T;        // a value is an immutable field
    value tail: List<T>;
    new(item, tail) { }   // trivial constructor
}

Cyclone

Cyclone offers non-null pointers. It is an attempt at a safe dialect of C.