Null, bottom, etc.?

One of the annoying/frustrating things about Java's type system is that all reference types are inhabited by null, so there's no way to specify, in the type system, that a reference actually refers to an instance and not null. It seems like Haskell's bottom is similar to this.

What are the differences? And where are the type systems that avoid having all types also inhabited by some null-like value?

Modern Eiffel's type system has no void references

I am currently designing a variant of Eiffel which avoids void references completely. You can have a look at my blog, where the basics of the language are described.

Classic Eiffel introduced a feature called "void safety", but it does not avoid void references completely: there are still detachable types. Modern Eiffel avoids them completely and furthermore unifies the advantages of functional and imperative languages.

On this sort of topic, I

On this sort of topic, I find your Modern Eiffel project pretty ambitious and of the noblest kind, btw.

I am watching it with great interest every time I get a chance, and I know it's not a recent effort of yours.

Do you have any implementation bits to look at alongside the ongoing design, by any chance?

I, for one, can't wait to see a nice OO language that could get rid of nulls!

My best encouragement to you here, anyway.

Thanks for the comment. No,

Thanks for the comment.

No, there is not yet any implementation. I am still experimenting with the concepts. The sequential part of the language is now pretty stable. The concurrency part is not yet in the state I would like it to be in.

I will probably implement the compiler in its own language. This is also a good proof of concept. But the downside is that you don't see anything implemented until I have the bootstrap compiler.

Makes sense. This dimension

Makes sense.

This dimension of Modern Eiffel is very relevant to this thread, and of course the language's design has to stabilize first.

Just curious, though: do you intend to have this bootstrap compiler implemented for a managed runtime environment, e.g. Java or .NET, or would you go for plain vanilla portable C to target the native platforms?

My guess is that even if you provide Modern Eiffel with its own managed runtime (written in itself or not), you might save some time and effort with the former over the latter, if only for the bootstrap itself. But I might be wrong in my assumptions, sure.

It is not yet decided. But

It is not yet decided. But the first bootstrap compiler will probably not compile to C, because the bootstrap compiler will probably not contain a verifier (in order not to write the verifier twice, in Modern Eiffel and the bootstrap language). Therefore it seems appropriate that the bootstrap compiler compiles to a fairly type-safe language (e.g. OCaml).

Nice.

Nice.

They are different

You can't actually produce a bottom value in Haskell the way you can produce null in Java. The bottom type represents functions that don't return: they loop forever, throw an exception, exit the program, or what have you. And you can eliminate the bottom type from the type system, but this means you can't have general recursion (and thus looping or similar constructs), as that introduces the ability to loop forever. And what return type does a function have that loops forever?
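
To make the Haskell side concrete, here is a minimal sketch (the name 'bottom' is mine, for illustration):

  -- A diverging definition type-checks at every type.
  bottom :: a
  bottom = bottom   -- evaluating this never finishes

  -- The Prelude's 'undefined' plays the same role, except that GHC's
  -- runtime raises an error instead of looping when it is forced.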

Strictness

I wouldn't view non-termination in Haskell as a null value, although I agree that it has some of the same characteristics. But in any case, this does not arise in a strict language. Take any of the MLs (SML, OCaml, etc) for example. Their option types are similar to Haskell's Maybe type, but not lifted, and hence not inhabited by bottom.
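
For instance, Haskell's Maybe has "extra" inhabitants compared to ML's option type; a small sketch (the function name is illustrative):

  -- Inhabitants of Maybe a in Haskell: Nothing and Just v, but also
  -- bottom at type Maybe a, and Just bottom.
  f :: Maybe () -> Int
  f Nothing  = 0
  f (Just _) = 1   -- matches 'Just undefined' without forcing the payload
  -- 'f undefined' diverges; in SML or OCaml, f could never receive such
  -- an argument in the first place.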

Not lifted?

Not lifted?

Just a pretentious

Just a pretentious denotational semantics way of saying that a type is not inhabited by bottom. :)

Edit: Point is, in a strict (aka call-by-value) language types are inhabited by values only, and bottom isn't one. So when you get a value from somewhere (e.g. as a function argument or result) you know that it cannot be bottom.

Don't get it...

I would assume bottom to emerge in the denotational semantics of a language, but not as part of the language values proper. Is bottom a Haskell value or not?

This might help

This might help:

http://en.wikibooks.org/wiki/Haskell/Denotational_semantics

Quoting "Bottom and Partial Functions":

"Every basic data type like Integer or () contains one ⊥ besides their usual elements"

I'm unsure whether it's "authoritative", as they say, and it still seems to be under ongoing edits (e.g. the remark "This section is wrong" in "Not all Functions in Strict Languages are Strict").

Didn't help

I glanced through it, and honestly, I grok why you would include 'bottom' as an element in the denotational semantics of a language.

It doesn't explain why 'bottom' would show up in the language proper.

To elaborate on that: It would make sense to have a 'bottom' in the denotational semantics of Java, but there it doesn't show up in the language proper.

(And I am lazy, I want someone to explain that to me. No idea where to find a rationale in the wiki page you referenced.)

On second thought

Forget about it. Just one of those Haskell quirks I guess.

I think folks should explain more often...

marco: And I am lazy, I want someone to explain [bottom] to me.

I can totally relate. Also, I use simplicity and brevity of basic explanation as a proxy for whether an idea is meaningful in many contexts. When it isn't clear/short/direct, I temporarily assign "drivel" as a mental tag, until I find reason to think otherwise.

I've been operating without a formal definition either, so I can tell you what I thought bottom meant before I did a search just now. I had thought it meant, "A sticky non-value value representing failure to progress when a calculation passed out of bounds with respect to representation or resources." For example, I had the idea NaN in floating point numerics was a bottom value, since once you start working with NaN, your results stick to NaN.

Note my coworkers never use that word (bottom), but oob for out-of-band is often used in a similar role to express handling things that failed to fit in their originally planned channel. For example, non-local exits are a form of oob control flow with respect to normal code flow. If your focus is "stuff happening" then out-of-band is an easy handle; but if your focus is "everything is a value" then you might want to explicitly represent "but it's not a value" as a value too. (People cope with apparent contradiction by picking the right context each time they think about parts of it.)

Just like zero means nothing with respect to counting things, bottom might mean nothing with respect to completing an algorithm. Wikipedia's http://en.wikipedia.org/wiki/Bottom_type looks approachable, and doesn't seem to contradict naive things I thought about bottom before.

Edit: I omitted a spatial metaphor I had in mind. If you think of values as having meaning in a specific context (a Venn diagram circle or a sphere if you feel 3D), then outside that boundary amounts to out-of-band where those values have no meaning. You might think of bottom as everything outside the circle: it's all the same value, the same "not-a-value."

I understand Bottom from a Denotational Perspective

But I don't understand why it would show up in the language proper. And it seems, from the answers I got so far, it doesn't. I.e., it isn't a value one can pattern match against for example. Or is it?

In a lazy language, all

In a lazy language, all values are represented by thunks (a closure that doesn't take any arguments and caches its value after computation). Bottom is just a thunk that doesn't terminate. So contrary to some other posts in this thread, it is in fact a real value with a real representation in your computer, but you can't do anything with it because as soon as you touch it your program doesn't terminate (if your program is single threaded at least). It is not a consequence of the type system, it's a consequence of being a Turing complete lazy language.
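
A small sketch of that operational picture:

  loop :: Int
  loop = loop               -- a thunk that never finishes evaluating

  pair :: (Int, Int)
  pair = (1, loop)          -- fine: the thunk is stored, not forced

  main :: IO ()
  main = print (fst pair)   -- prints 1; 'print (snd pair)' would hang instead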

Or is it? Doesn't seem so.

Or is it?

Doesn't seem so. As David said, undefined is an artifact, not an actual value as useful as the proper values you can compute at the other types.

As soon as undefined gets evaluated somewhere, the entire program is supposed to evaluate to the same "pseudo-value": undefined.

You cannot test against it here and there, evaluate some condition, and then branch somewhere else to continue your computation: if you reach undefined, your whole program is undefined.

And that's its only purpose in Haskell, it seems: a catch-all for undefined computation.

Now, I agree it's rather nitpicking, and you could always argue that it's barely prettier than having your program crash, or hang, or coredump in other languages ... but the conclusion for the programmer is essentially the same: something needs to be fixed somewhere.

In Haskell's case, the interactive interpreter just doesn't hang, which is friendlier.

This is to be contrasted with undefined in JavaScript, for example, which is deceptively named the same but has non-bottom semantics: if undefined is encountered, your entire script doesn't have to halt shouting "undefined!", since you can actually test against it: typeof (something) == "undefined".

EDIT
Here is a link relevant to Andreas' cautionary distinction between strict and non-strict semantics and how that relates to undefined, though:

http://www.haskell.org/haskellwiki/Non-strict_semantics

The Smoke Screen Evaporated

So, okay, I see it now: there is a value 'undefined' with bottom semantics, inhabiting every type, that you can't pattern match against, but it _is_ part of the language proper.

'undefined' is nice: it makes an argument about non-terminating functions and lazy evaluation explicit, so I see it as a selling gimmick, or an educational feature. You can do without it.

This has nothing to do with a 'null' type, but I see that has been stated in other posts.

Since you ask for it, here's

Since you ask for it, here's my understanding, FWIW. I have to give credit to David's and Brian's (and Andreas') comments, though, which helped.

As David pointed out, Haskell's "Bottom" is likely more of theoretical (i.e., language-semantics) interest than of practical interest. Bottom is meant to keep the type system consistent in expressing its incapacity to solve the unsolvable (that'd be predicting whether an arbitrary recursion via partial functions terminates or not).

Thus, it's a special "type" (in Haskell's semantics) to denote non-termination or "undefinability". But of course, it's one that you cannot manipulate or discriminate, let alone observe, in any useful way other than as just that: undefined.

But what fails us is comparing Haskell's "Bottom", which is all about computation, with the "Bottom" found in other languages, and pushing the comparison too far:

if one considers "void" and all reference types to be inhabited by "null" in, say, Java or C#, the pitfall is that it's actually another kind of "Bottom", and you usually don't want to compare apples and oranges:

"null" in those languages is about the "Bottom" of something that is NOT computation (as opposed to Haskell's):

it is about the bottom of the transient state (e.g., of object references or pointers) that happens to live *within* an ongoing computation.

Really not the same animal when you look closer. Still an animal, though.

David said it well: _|_ is more of theoretical interest in the case of Haskell, while null is (regrettably) more of practical interest in, say, OO languages, which have chosen NOT to model COMPUTATION (as functional languages do) because they have chosen to model, somehow, and more or less adequately, STATE instead.

Makes sense?

My understanding anyway.

This Discussion Hit Bottom

Given Andreas' post, I think we can conclude nobody has a clue what they are talking about. The problem: Bottom is a popular symbol which occurs in many contexts to denote a bottom element of some domain of discourse.

You could probably add another five different 'bottom's to Andreas' post; and bottom has at least three different badly-understood meanings in Haskell.

I think we can conclude

I think we can conclude nobody has a clue what they are talking about.

lol, you're too harsh. :)

I'd only put it as:

"... few people have clues on what they are willing to agree on"

;)

Well. Actually I was

Well. Actually I was laughing my head off, so I guess that came out wrong (again). ;-)

Obviously :)

Obviously :)

(Man, you were more right

(Man, you were more right on, in your remark, than I thought, actually... Eerie.)

No sense

Some part of your post makes no sense to me (transient state, etc.); the rest I disagree with.

1. What Haskellers call "bottom" is not a type, but something (element, value, what have you) inhabiting each Haskell type

2. This "bottom", and more generally thinking about what is forced by a given program, is not "mostly of theoretical interest" but of very practical interest, and programmers think about it in various circumstances. See Jules' example (length li > 5).

I agree that Haskell's bottom and "null" as a value of reference types (not SQL NULL, hell no) are completely different.

As marco notes, "bottom" is a generic term for the smallest element of a partially ordered set. Haskell's "bottom" is, in some sense, the "least defined" value: we are talking about the information-richness order on partially defined values, as in domain theory. The theory of subtyping often uses the notion of a "bottom type", mentioned by Andreas, that is a type smaller than (a subtype of) all others; it generally coincides with the empty type, with no values. Those are the only two relevant meanings of "bottom" I'm familiar with, though marco seems to say there are many more.

(I believe dmbarbour is not wrong, but has a use of programming languages that is, currently, quite different from mostly anybody else's. He therefore looks at things from a different perspective and uses different vocabulary, which makes for interesting misunderstandings.)

I am more familiar with the

I am more familiar with OO type-system jargon than with functional languages'. Hence my attempt at pointing out that "null" and "void" in OOPLs are likely not at all comparable with Haskell's bottom *value* (since you say the bottom type doesn't exist there). Sorry if I added to the confusion instead, though.

Bottom types in OO languages do "exist": at least when the language specs bother to define them (e.g., Eiffel). Also, when reference semantics is at play for working with object instances (usually the case), the "bottom type" is usually inhabited by exactly one conventional value, "null", which also inhabits the non-bottom types whose actual object instances are capable of holding non-trivial state (well, the instances of the concrete types, that is).

In fact, the main point I was trying to make for the OP was:

be careful when making analogies between PL paradigms and the semantic notions they provide to their respective languages, especially when it's quite easy to see (on my end, anyway) that:

1. to solve problems, OO tends to focus on modeling *mainly* the state of the world (object graphs) throughout the ongoing computation, which is ambient and barely spoken of (by the programs), and sometimes saying in fact too little about the latter (inducing side effects, and all those messy poorly controlled state-related bugs)

conversely:

2. to solve problems, FP tends to focus on modeling *mainly* computation itself (i.e., pure functions, and that begins with distinguishing which are partial and which are total), and saying as little as possible about state, since that would precisely defeat the purpose of functional purity and referential transparency.

Hoping I'm not oversimplifying:

OO programmers reason about approximate models of the world made by their programs. They focus either on the faithfulness or the usability of their representations, often losing sight of the predictability of the mutation. Bonus point for FP, definitely.

FP seems to reason first about the predictability and accuracy of the computation: what is the domain, what is the co-domain, when are we defining a partial function vs. a total one? The challenge for FP is that it's not just about knowledgeable mathematician programmers. There are also end users around, and those rarely have an even remotely accurate notion of what a function *really* is, and why, when, and how determining program termination *is* an issue, while they conceptualize objects more easily. Bonus point for OO, sort of.

I am a practitioner, not a theoretician, so I probably have a somewhat biased point of view, of course.

Otherwise, I am not interested in the ongoing debate of types vs. values ("which ones are of interest first?", etc). My formal knowledge has been left behind for a while already now; I don't see anymore what you guys are really talking about. I just found the early thread comments of David for the OP were accurate enough as far as I could tell, that's all.

It's a mess...

Well, for one, 'bottom' the undefined value is different from 'bottom' the 'value' used to denote a non-terminating process. And I think 'bottom' the type which holds the 'bottom' undefined value (from OO subtyping) is different from 'bottom' the type which is the smallest subtype (from type theory), which usually denotes an empty type.

Why is it important to clarify the different usages of 'bottom': I read a Haskell wiki page which claims bottom is a necessity since otherwise Haskell's type checker would need to be able to solve the halting problem. I think there's some serious confusion of theory there.

I read a Haskell wiki page

I read a Haskell wiki page which claims bottom is a necessity since otherwise Haskell's type checker would need to be able to solve the halting problem. I think there's some serious confusion of theory there.

I suppose I've read the same. But even if that page is not too far from what the most purist theoretician would be willing to accept as accurate enough re: Haskell, remember we need to be forgiving there:

my guess is such statements are meant to convey the idea that the designers have thought the language's type system through as carefully as they could, to keep it, say, "not too inconsistent" globally and for common practice in the language.

Because, hey, all Turing-complete languages are, strictly speaking, inconsistent (as logics) by definition. And Haskell happens to be one of them.

Personal experience: I've noticed people always, eventually, end up debating about the best way to give a clear and useful (i.e., readable) definition of Turing-complete languages' semantics that everybody can agree upon without aspirin.

Could that be precisely because they're inconsistent to begin with, and because people have many personal ways to "interpret" the nature of non-termination of some programs in their heads? You've got to think about that too, maybe, after all.

Anyway, you much less often have this sort of brain-energy problem with non-Turing-complete languages. That's another nice (social) thing about them, btw. :)

No mess

Well, for one, 'bottom' the undefined value is different from 'bottom' the 'value' used to denote a non-terminating process.

No, it's the same. Haskell's 'undefined' for example denotes nothing but a non-terminating computation. It's defined by undefined=undefined. Using it causes your program to loop indefinitely (although the runtime system sometimes discovers such infinite loops and terminates with an error).

And I think, 'bottom' the type which holds the 'bottom' undefined value (from OO subtyping), is different from 'bottom' the type which is the smallest subtype (from type theory), which usually denotes an empty type.

No, again the same thing, but in different languages. Only in a non-strict language is the bottom type inhabited by the bottom computation, because there, every type is inhabited by the bottom computation. In a strict language it is truly uninhabited.

Actually...

A language with subtyping could have a type called Null that is the subtype of all other types and is inhabited by the null value (which could behave like a Java null or an Objective-C null or a SQL null or what have you). Then Null would be bottom in the mathematical lattice[1] sense even though it wouldn't fulfill the same role as a type that is uninhabited (or inhabited by diverging computations depending on your POV).

[1] well, full lattice if we also have a type that is the supertype to all types.

If you have a type called

If you have a type called Null with a null value, you can have a type called Bottom with no value at all, and, semantically, it is sound to claim that Bottom is a strict subtype of Null, and that therefore Null is not the subtype of all other types.

Now there is a difference between what is *sound* to claim semantically (by observing inclusions of set of values) and what is actually provable/true in the language, as defined by its syntactical inference rule for subtyping. You may very well have, for example, several empty types that are not subtypes of each other (therefore none of them is a "bottom" type in the usual sense), or a situation like you describe where no empty type is accepted and all types, including a least type, have a `null` value.

Yes, but

| If you have a type called Null with a null value, you can have a type called Bottom with no value at all

Yes, but my proposed type system had no way to explicitly express an uninhabited bottom type. My point was that in that type system it's reasonable to call Null "bottom" because in mathematical terms Null is the bottom of the bounded lower semilattice that such a type system creates, even if it would be sound to add an uninhabited type "below" it[1]. Hence it's reasonable to have a bit of confusion on what bottom type "really" means.

[1] Scala does just that. Null (inhabited by null) subtypes all the reference types and Nothing (inhabited by nothing) subtypes even Null.

True

You are right, what type is bottom of course depends on what your subtyping relation is. It can still be inhabited, although there are very few possibilities for that in typical PLs. Regarding null specifically, I would argue what is "messed up" is not the notion of a bottom type per se, but those languages that put null in everything. :)

It's defined by undefined=undefined.

Well, it's not, is it. It's defined in the Prelude as "undefined = error "Prelude.undefined"". But this is real semantic nitpicking. I would say there's a good case for _not_ viewing an 'error' value as the same as whatever a non-terminating function yields. For the simple reason that one will halt the entire program, and the other will not terminate.
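
For what it's worth, in GHC the difference is even observable from IO, since errors raise catchable exceptions while genuine loops simply never return; a sketch:

  import Control.Exception (SomeException, evaluate, try)

  errVal :: Int
  errVal = error "boom"   -- an 'error' value: raises an exception when forced

  main :: IO ()
  main = do
    r <- try (evaluate errVal) :: IO (Either SomeException Int)
    putStrLn (either (const "caught an error") show r)
  -- prints "caught an error"; forcing a genuinely looping value here would
  -- never return (unless the runtime happens to detect it as <<loop>>).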

But, well, whatever.

Generalisation

Well, in denotational semantics, ⊥ is often generalised to include any kind of computational behaviour that is observationally equivalent (i.e. from inside the program) to non-termination. Halting the program interestingly is.

Bottom is not a value.

Bottom is not a value. You can't observe it, or distinguish it from other values. Its use in Haskell is most often an artifact of Haskell's operational semantics, not inherent to the problem-specific type. We should ignore artifacts of embedding when reasoning about embedded languages. I.e. when reasoning about monad laws, arrow laws, etc. it is often morally correct to ignore ⊥.

it is often morally correct

it is often morally correct to ignore ⊥

Even so, if it makes sense to morally ignore _|_ in Haskell, then, back to the OP's post, I really don't see Haskell's bottom as being at all similar to, say, Java's null.

Clarifications

There seems to be some confusion in this thread. Let me try to clear some of it without sounding too much like a theorist.

1. Values vs computations. The former are the results of the latter. In a strict language, these are strictly separate concepts, and types classify only values. Computations are not usually described by types, only their results are. (Computations are encapsulated in functions, though. And monads are a way to make computations into first-class values themselves.) In a non-strict (e.g. lazy) language on the other hand, the difference is muddled a bit. Types actually do describe (pure) computations, because everything can be a thunk. (Other languages that are even purer than Haskell regard non-termination as a computational effect and require all pure computations to be terminating. Then the difference goes away.)

2. Bottom. Formally describes a non-terminating (or erroneous) computation. It is not a proper value, and in a strict language, does not show up as one. In a non-strict language a la Haskell it inhabits every type, though. The crucial difference is that when you e.g. have a function f : A -> B, then in a strict language f can safely access its argument without risking non-termination. Likewise the caller can safely access the result. In a non-strict language like Haskell the type contract is weaker in a sense, because both could diverge.

3. The bottom "value" vs Bottom type. These have nothing to do with each other. The former is part of the formal semantics of computations. It exists independent of types. The latter is the least type in the subtyping (semi)lattice, which is not inhabited by any values. (Although they relate by bottom being the only inhabitant of a computation with result type Bottom.)

4. Bottom vs null. The former is a computation, the latter just a random constant value with some special meaning. The difference is that you can access null like any other value (e.g. compare it) and continue after that, probably based on the outcome. In contrast, accessing bottom always results in bottom; there is no way to "observe" that something is bottom from within the language (see the sketch after this list).

5. 'undefined'. In Haskell, 'undefined' is predefined as a diverging "value" (i.e. bottom), which is sometimes used in a way null is in other languages, because it inhabits every type. This has nothing to do with 'undefined' in JavaScript, though, which is just a random constant that has some specific use in other language constructs (it is indeed just another form of null -- JavaScript has both, which is weird, but arguably a result of its lack of a proper option type, and the desire to distinguish Nothing from Just Nothing in some circumstances).
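
To illustrate point 4 with a sketch (using Maybe to play the role of a nullable reference; the function name is mine):

  -- null is observable: we can branch on it and continue.
  isNull :: Maybe a -> Bool
  isNull Nothing  = True
  isNull (Just _) = False

  -- There is no analogous 'isBottom :: a -> Bool': any function that
  -- inspects its argument diverges exactly when the argument does, so
  -- bottom can never be "observed" from within the language.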

Value vs. Suspended Computations

Suspended computations, thunks, are also not values of the Haskell types they compute.

When we want types that describe pure computations, we have functions. Even for infinite values, Haskellers have learned the hard way that (for performance and clarity) it is best to model these in type, too - i.e. explicit types for streams, enumerators, signals, pipes.

The crucial difference is that when you e.g. have a function f : A -> B, then in a strict language f can safely access its argument without risking non-termination.

In a strict language, `∀ f . f ⊥ = ⊥`. It seems moot to say such a function "can access its argument without risking non-termination". Where divergence happens is not observable unless we also introduce side-effects (and, consequently, partial failure). But I do not believe you meant `f` to be an impure computation.

I agree with your last three points.

Not sure what you are getting at

Not sure what you are getting at exactly, but the difference (between strict and non-strict) is quite real when you e.g. look at the semantic interpretations of types in strict vs non-strict languages. It results in different program equivalences, and as a result, different abstraction properties, so it has practical relevance as well. You cannot argue that away by saying that programmers should ignore it.

We don't need "types" to explain "values"

First, we don't need "types" to explain "values". (Rather the converse, we need values to explain types.) A semantic interpretation of types is extraneous and irrelevant to an understanding of values. Values are domain abstractions; we can represent them in many different languages. There is much programmers should ignore about how we embed values and other abstractions in any specific language. For example: if we know a domain value is always represented by a finite tree, we should ignore that our Haskell type happens to allow for infinite trees (though perhaps document this issue).

Your first point, earlier, was not about "program equivalencies". It was about "values vs. computations".

Regarding your "risk of divergence" point, strict languages always have greater risk of divergence than lazy languages. This is easy to demonstrate, since we can model any strict function as a lazy function that implicitly accesses all its arguments. The argument about where we observe divergence isn't relevant unless you introduce side-effects.

Same in untyped

In an untyped language, the semantic interpretation of its one "unitype" will differ in the same way: it's either the domain of values when you're strict, or the one of computations when you're not (unless you are total anyway, as I mentioned).

My point about program equivalences was about the implicit "contracts" of functions. It makes a difference whether your arguments are values or computations. The difference may not be observable from within the language, but externally, e.g. when you (formally or informally) reason about abstraction, via program equivalences. I don't understand why you are trying to deny the difference.

Arguing Semantics

Values are not "the semantic interpretation of types". You're acknowledging this ("it's either the domain of values when you're strict, or the one of computations when you're not"), but failing to apply it to your earlier argument about values vs. computations.

about the implicit "contracts" of functions. It makes a difference whether your arguments are values or computations.

Expressions in lazy languages necessarily meet the contracts for the same expressions in strict languages. If you want to find non-equivalent programs, you'll need to look for programs that depend on lazy evaluation (e.g. related to fixpoint or recursive definition).

Thus, there is a difference in expressiveness, but no difference in reasoning for the subset of abstractions that also work in strict languages.

No

Values are not "the semantic interpretation of types".

No, but the other way round: the semantic interpretation of types is their values -- but only in a strict language. I don't know where I "failed to apply" anything.

Expressions in lazy languages necessarily meet the contracts for the same expressions in strict languages.

And therein lies the rub: this statement is patently false for functions or non-closed expressions.

Edit: Take g = λf.[f(λx.x*0), f(λx.0)]. In a (pure) strict language, the caller can be assured that, whatever function g is applied to, g will either diverge or return a list with two equivalent elements. Consequently, the caller can later use the two result elements interchangeably. Not so in a non-strict language, where g may return a list with different elements (one being equivalent to bottom) -- for example, if you apply g to λh.h(⊥).
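
Rendered in Haskell, as a sketch of the same example:

  g :: ((Int -> Int) -> Int) -> [Int]
  g f = [f (\x -> x * 0), f (\x -> 0)]

  forcing :: (Int -> Int) -> Int
  forcing h = h undefined   -- applies its argument to bottom

  -- g forcing = [undefined * 0, 0]: the first element diverges when
  -- forced (since * is strict in both arguments), the second is 0, so
  -- the two elements are not interchangeable. In a strict language,
  -- both calls would diverge.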

Divergence meets no contracts

Divergence necessarily fails to meet any contracts, since (semantically) it never actually happens - i.e. it takes forever to diverge; divergence means failure to return. In examples where the strict expression diverges, it meets no contracts. Therefore, the lazy language has no contracts to meet in order to meet all the contracts met by the same expression in strict languages. QED.

Your proposed counter-example is one where the strict expression would always diverge.

semantic interpretation of types are their values -- but only in a strict language. I don't know where I "failed to apply" anything.

Your entire section on "values vs. computations" was actually on "types vs. computations". You didn't have even one sentence relating values and computations. The implication is pretty clear, even if you did not intend it.

"Contracts"

Divergence meets no contracts

OK, I suppose that depends on your exact notion of contract. You are right as long as you only consider dynamic contracts, i.e. runtime assertions. I used it in the broader sense of properties that you can specify about a piece of code. Not all interesting properties can be stated or checked dynamically, so if you only focus on assertions then you are missing out on important aspects.

Your entire section on "values vs. computations" was actually on "types vs. computations".

I would argue that it was on (types being inhabited by) values vs (types being inhabited by) computations.

Not all interesting

Not all interesting properties can be stated or checked dynamically, so if you only focus on assertions then you are missing out on important aspects.

With staged programming I can simply include "runtime assertions" at each layer. Runtime assertions subsume everything else if you're willing to properly model compilation as staged programming.

The difference is that I'd acknowledge I'm checking structural properties of the language expression, guided by but ultimately independent of observable semantics for the language expression.

I would argue it was on (types being inhabited by) values vs (types being inhabited by) computations.

That's fair. But why bring types in at all? (We don't need types to explain values or computations.)

Justification?

Runtime assertions subsume everything else if you're willing to properly model compilation as staged programming.

That seems clearly false, but maybe I don't understand what you mean. Not all interesting properties are even computable (e.g. forall x. exists y. F(x,y)).

Anything we can prove at

Anything we can prove at compile-time, we can prove at a compile-time modeled during a runtime. We can understand any compile-time assertions/checks/tests/etc. to be runtime assertions/checks/tests/etc. in a staged computation.

I'm not sure why you find non-computable properties interesting. But, supposing you do, they'll be no less interesting at runtime.

This is a terrible conversation :)

I think this particular subthread went off the rails earlier when Andreas made the comment:

Not all interesting properties can be stated or checked dynamically

I'm pretty sure he intended that to mean that not every property can be encoded with dynamic tests (think dynamic typing) -- not that you couldn't encode the proposition in some logic at run-time.

Anyway, I agree with this position, even if I don't really understand how we find ourselves here. I also agree with your larger point insofar as I think it's reasonable to treat bottom as not being a value (but think it's just as reasonable to treat it as a value).

Not treating bottom as a

Not treating bottom as a value is the same as not treating a diverging computation as a computation in a strict language. A valid formal model, although it is a bit strange from an operational perspective (since for example we can clearly get a pointer to bottom, and a non-diverging computation can still pass these pointers around).

Saying that a closed expression written in a lazy language satisfies the same contracts as the expression in a strict language is not the interesting property. People *will* take advantage of the fact that they are in a lazy language, for example they will use lazy lists. So functions must be written in such a way as to not force too much. For example, a function "isLengthLessThan5 xs = length xs < 5" shouldn't be written that way, depending on how it will be used, since this version does not work on infinite lists and it's possible to write a version that does (see the sketch below).
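
A version that does work on infinite lists, as a sketch:

  isLengthLessThan5 :: [a] -> Bool
  isLengthLessThan5 xs = null (drop 4 xs)
  -- 'drop 4' forces at most five cons cells, so an infinite list
  -- promptly answers False instead of looping inside 'length'.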

Not treating bottom as a

Not treating bottom as a value is the same as not treating a diverging computation as a computation in a strict language.

The analogy doesn't work. It's the same as not treating computations as values, in a consistent manner, across both strict and non-strict languages.

it is a bit strange from an operational perspective (since for example we can clearly get a pointer to bottom, and a non-diverging computation can still pass these pointers around)

This "pointer to bottom" notion is only observable if you also have side-effects. If we do interleave effects with lazy computation, the bottom becomes observable as a partial failure (similar to hitting resource limits).

People *will* take advantage of the fact that they are in a lazy language, for example they will use lazy lists.

Sure, but I'd note that serious, performance-sensitive projects in Haskell eventually (almost invariably) reject infinite lazy lists in favor of properly modeling streams. They're too difficult to reason about. Your `isLengthLessThan5` example isn't especially realistic (I can't think of any real programs that would make that distinction) but a realistic issue is filtering duplicates from a list, or filtering `Nothing` values from a list, or splitting [Either a b] into ([a],[b]). In all these cases one might end up performing an unbounded (even infinite) amount of computation searching for the next non-filtered value in a list. Another issue is managing state that accumulates while processing a list.
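
For instance, a sketch of the filtering problem (the list is an illustrative extreme):

  import Data.Maybe (catMaybes)

  mostlyNothing :: [Maybe Int]
  mostlyNothing = Just 1 : replicate 1000000 Nothing ++ [Just 2]

  firstTwo :: [Int]
  firstTwo = take 2 (catMaybes mostlyNothing)
  -- demanding the second element forces an unbounded walk past all the
  -- Nothings before the next Just is found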

While I agree that people *will* take advantage of a lazy language (any theory of affordances will tell you that much), it isn't actually a good idea, and people eventually learn better (usually the hard way, in Haskell) and begin using explicit stream models.

It would be better for lazy languages to push non-termination all the way to the edges of the model, thus eliminating the semantic distinction between lazy and strict evaluation, while enabling us to leverage laziness where it is most advantageous - decoupling performance from abstraction.

I agree with some of the

I agree with some of the sentiment here. Lists don't work well as streams, because you can't make a pass through the stream without memory explosion. I'm not sure what the point of the filter example was. If you want to be able to abort or do bounded searches for the next element, then Stream isn't the right abstraction to use (whether modeled with lazy lists or otherwise).

It would be better for lazy languages to push non-termination all the way to the edges of the model, thus eliminating the semantic distinction between lazy and strict evaluation

Can you explain what you mean by that? One common use of laziness is in resolving dependency order and allowing recursive dependencies. How does that fit into your account? Here's the usual example of laziness without infinite structures:

foo p = (1, 1 + fst p)   -- only the first component of p is ever demanded
bar = foo bar            -- ties the knot: bar evaluates to (1, 2) under laziness

It is true that we can't

It is true that we can't simply search for "the next element". But if we step away from lazy lists, we can also provide richer queries than for "the next element". The queries themselves can control computation.

One common use of laziness is in resolving dependency order and allowing recursive dependencies.

True. Laziness is convenient in that role, and expressive enough if you need a closed set of dependencies (without hope for later extension, configuration, composition, override). OTOH, I'm usually concerned about extensibility, so I've never been too happy with this application of laziness.

In the absence of laziness, I would like a convenient way to express a small set of mutually recursive functions. But for dependencies, I favor explicit models that don't cut me off from later extension and override.

I see that I posted my

I see that I posted my comment in the wrong subthread, but anyway. The "bottom is a value" vs "bottom is not a value" question is a definition issue...

I agree that it's better not to use laziness and to be explicit about streams and other infinite data structures, which is why I like strict languages. You still see an advantage to laziness even if you don't make use of it for correctness. What do you mean by decoupling performance from abstraction, and why is it good? It seems like something that could lead to inefficient programs. A program that takes a year is not any better than a program that doesn't terminate. To reason about performance you'd still need to reason about laziness.

What do you mean by

What do you mean by decoupling performance from abstraction and why is it good?

Decoupling of abstraction and performance means that we can introduce new layers of abstraction, and perform ad-hoc refactoring, without much concern for performance. Abstractions in a functional language correspond to sets of functions and structures.

This is a good thing because, as I've explained before, "performance-aware developers must feel free to use the abstraction and composition features of the language without justified paranoia that performance will suffer. Otherwise, performance-aware developers feel systematic pressure to create monolithic programs with a lot of hand-specialized code. And we really want to avoid systematic pressure to do the wrong thing."

With laziness, we can easily move computations around, even store them in structures, without performing them. So it is easy to introduce new layers of abstraction without paying for each layer. In some cases, we can even reliably reason about incremental processing and stream fusion of what would otherwise be a series of batch processes.

With strict languages, it is not so. It is difficult to refactor conditional code out of a strict method without changing performance characteristics. Structures are fully evaluated prior to entering an expression, so we must be vigilant and precise about our arguments. Often this means providing several specialized implementations of a function based on which arguments we'll need, and selecting between them based on conditions observed in a caller function. If we want incremental processing, we'll need to be explicit about it, which may similarly require specializing algorithms.
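
A small sketch of that refactoring point ('expensiveAnalysis' is a hypothetical stand-in for costly work):

  report :: Bool -> String
  report verbose = if verbose then long else short
    where
      long  = expensiveAnalysis   -- a lazy binding: built only if demanded
      short = "ok"

  expensiveAnalysis :: String
  expensiveAnalysis = concat (replicate 1000000 ".")

  -- Under lazy evaluation we can bind both candidates up front and let
  -- the condition choose; a strict language would pay for the expensive
  -- branch on every call, pushing us to restructure the code instead.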

Though, it is not unusual to augment a strict language with macros or limited laziness annotations to help with abstraction.

It seems like something that could lead to inefficient programs.

I would not suggest laziness is a replacement for abstractions that help us reason about performance. Rather, it augments them - allows us to easily layer more abstractions above them.

The moderate overhead of lazy implementations can be countered, to a large degree, by a good optimizer that recognizes when you'll certainly use an argument, or when the computation will be cheaper than the thunk. The optimizer has much freedom if laziness is just a performance decision (e.g. in a total language).

Because

Runtime assertions subsume everything else if you're willing to properly model compilation as staged programming.

What Matt says. That is just false.

But why bring types in at all?

Er, because this was in reply to the original discussion of this thread, which was about type systems that rule out null errors and friends?

Disagree

The first sentence is trivially true. Anything you can prove at compile-time, you can prove at a compile-time modeled in a runtime.

this was in reply to the original discussion of this thread

It was placed as a reply to one of my comments, and your context was "confusion in this thread". If you meant it as a reply to the OP, I do not appreciate the placement.

Disagree indeed

The first sentence is trivially true. Anything you can prove at compile-time, you can prove at a compile-time modeled in a runtime.

It is not, I'm sorry. First of all, the sentence said "everything", not "anything that can be proved at compile time", which is a whole different class. Termination properties are usually in the difference, and they are the most fundamental correctness properties that exist.

Second, doing respective proofs (or even stating the properties) often requires richer semantic models than are expressible within the language itself, staged or not. Likewise for other interesting properties, like abstraction or equivalence.

Third, AFAICS, even for simpler static properties, checking them via staging only works when you have the whole program. Most properties you cannot prove when the code in question isn't a closed program but just a module, and you have no way of imposing restrictions on its possible contexts. And I don't see how staging could enable you to do that (unless you go well beyond what I would still call staging).

Or, in other words, how does staging help you to state and check the termination behaviour of, say, a higher-order function provided by some module?

You may not care about it, fine. I usually do. Either way, it's not "subsumed".

Everything

First of all, the sentence said "everything", not "anything that can be proved at compile time", which is a whole different class. Termination properties are usually in the difference, and they are the most fundamental correctness properties that exist.

By "Everything" I don't mean creating rainbows and curing cancer and proving things that cannot be proven, Andreas. I mean everything that can be tested or proven. This includes any termination properties that can be proven.

While termination properties are "generally" in the difference, they are not "usually" in the difference (for structured programs written by humans or written by programs written by humans), and especially not in the cases where we care about proving termination for correctness (since we'll use a constrained language or structural discipline to support the proof).

doing respective proofs (or even stating the properties) often requires richer semantic models than are expressible within the language itself, staged or not

Untrue. Your brain is not more expressive than a Turing machine. Any proofs you can achieve on your program, and any properties you can express, can be achieved and expressed by an analysis program you add to an earlier stage of computation.

Third, AFAICS, even for simpler static properties, checking them via staging only works when you have the whole program. Most properties you cannot prove when the code in question isn't a closed program but just a module

Modules are glued together by a linker. In addition to compilation, linking is a stage of computation traditionally modeled outside the language proper. Many languages now support dynamic loading of modules, but without the ability to analyze or recompose what has been loaded.

With staged programming, we can model linking and post-link analysis and optimization within the language itself. Indeed, ability to explicitly model linking - and a flexible interpretation of `module` (potentially including configuration files, with just a little preprocessing) - is often mentioned as a point in favor of staged programming. For object capability languages, staged linking is an effective way to distribute heterogeneous authorities to modules.

In any case, you'll be able to link modules and perform cross-module proofs before compilation. Where you must, you can even annotate some modules to support the proofs (via flexible interpretation of module).

unless you go well beyond what I would still call staging

What I described are design patterns, idioms, and potential libraries or services for effective use of staging. I suppose you could say this "goes well beyond staging".

how does staging help you to state and check the termination behaviour of, say, a higher-order function provided by some module?

It can help in a couple ways.

  1. It can provide a richer explicit context. I.e. for functions you cannot prove to terminate "in general", you may be able to prove termination after a partial application or based on contextual restrictions on input.
  2. It allows you to direct linking to support the proof. I.e. if you have multiple implementations for a given module interface, you can pursue proofs on each of them and select the module that leads to an easier proof. (With multiple modules for multiple interfaces, this generalizes to a constraint satisfaction model.)

You may not care about it, fine. I usually do. Either way, it's not "subsumed".

I've never said I don't care about it. I do, and I've considered many of these issues many years ago. But these issues are subsumed by staged programming. Really, we can understand everything we do today (especially type-driven programming) under the lens of staged programming, but with inflexible, inadequate languages.

Too speculative

Provable is not the same as automatable. Ignoring that, I'd say that what you sketch there is way more than just staging, and I'm afraid it also is way too speculative to comment on. But if it doesn't allow modular reasoning (without linking first) then I won't buy it.

I haven't heard of any sound evidence that the human mind is subsumed by Turing machines, btw (nor the other way round, for that matter).

Provable is not the same as

Provable is not the same as automatable.

Even if you can't automate the proof, you can automate a test that a known proof still applies to a given expression. Between extremes, you can create specialized proofs that can adapt to small variations in structure.

Ignoring that, I'd say that what you sketch there is way more than just staging

That's up to you, of course. It's a way to use staging. You could similarly say that monads and arrows are way more than functional programming. Or that OO design patterns are way more than OO.

I'm afraid it also is way too speculative to comment on.

Well, we all have our fears of the unfamiliar. I would have said the same about program proofs via Coq, etc. several years back.

Fact is, we already do staged programming. And proofs between stages. (Including typechecking.) In many program architectures. Without much language support.

if it doesn't allow modular reasoning (without linking first) then I won't buy it

Of course you can still do modular reasoning. Staging doesn't much help in that role, but it also doesn't get in the way. Perform any per-module analysis you want. Compose normally (e.g. for type-safety based on module interfaces).

I haven't heard of any sound evidence that the Human mind is subsumed by Turing machines btw (nor the other way round, for that matter).

I don't know about the human "mind", but I did use the word "brain". And the physical processes of computation in our brains are well understood, even if the algorithms we use are not.

We can even communicate between brain and machine - not much difficulty, beyond the normal risks of opening a skull and performing brain surgery. Working in robotics, I regularly hear of experiments involving wiring animal brain matter to a robot to control it (often with sensory feedback).

Bottom as a value

I disagree with you and agree with Andreas.

There is a difficulty in what we call a "value". In the usual theory of strict languages, a value is either (depending on how the semantics of the language is defined):
- one of the semantics elements that correct programs "denote" in a denotational semantics; in a typed language, there is usually a set of such values for each (closed) type in the language
- a syntactic entity corresponding to the *normal form* of a correct program, whose semantics is understood as a reduction by an operational semantics; this is therefore a subset of programs, without redexes of any kind, and it maps closely to the semantic set of denotations of the vision above

If you're interested in lazy languages, either call-by-name or call-by-need, you have to look at how their semantics is defined to get the corresponding concept. In all such semantics I remember (I just checked "Small-step and big-step semantics for call-by-need" by Nakata and Hasegawa, JFP 09), bottom is included in the "values" considered for a denotational semantics, and, syntactically, `let bot = bot in bot` is a normal form, because variables do not reduce by themselves (they need to be forced). This supports Andreas' claim that "bottom is a value"; but of course you're free to define the semantics of your lazy language in a different way that doesn't handle bottom like this, as long as you can show it is equivalently useful for reasoning about programs.

Regarding the idea that a function "can access its argument without risking non-termination", I think you have missed Andreas' point; I have discussed this before somewhere (here or reddit) but didn't find the link. The point is that, in a strict language, when you're writing a function, you *know* that your arguments won't be bottom, they will be values (in the sense above); a function is never run with bottom arguments. In a lazy language, you are forced to take into account, when defining the function, the possibility that an argument is bottom, because whether you force it or not is observable (this is the difference between `id _ = ()` and `id () = ()`); this has very practical effects on the way the programmer reasons when programming.
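
Concretely, the two functions mentioned differ like this (a small sketch, with illustrative names):

  idLazy :: () -> ()
  idLazy _ = ()     -- never forces its argument: idLazy undefined = ()

  idStrict :: () -> ()
  idStrict () = ()  -- the pattern forces the argument: idStrict undefined = undefined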

PS: regarding your point that values belong to your problem domain: I think this is a different concern. Your programming language has a notion of values as I described above, and you have a problem-specific notion of value that you try to map into your programs. Both need not coincide: some states you care about may not be representable (probably a bug), and some values of the language may not correspond to any meaningful data in your problem domain. It's better if you can refine your types as much as possible, so that the "language values" are close to the "problem values", but you generally only get an approximation (unless you're using the über-dependent language of the future).

If values are defined by language

If values are defined by language, then it is inconsistent for you to not also acknowledge values are defined by embedded languages - i.e. for a particular domain problem. This is the point many people miss, and thus they confuse abstraction layers, confuse the properties of domain values with the artifacts of the embedding. Haskell's bottom is not a value of embedded languages; each language (e.g. each monad or arrow) with bottom should model it explicitly.

My own preference is for terminating semantics and a proper model for codata, in which case strict vs. lazy vs. parallel vs. full beta reduction has no semantic impact (only a performance impact).

The point is that, in a strict language, when you're writing a function, you *know* that your arguments won't be bottom, they will be values (in the sense above), a function is never run with bottom arguments.

I understood this point. But my point is that this knowledge is useless, semantically, unless we are concerned about side-effects. What does it mean for a pure function to "run"?

What we do with pure functions is compose them into larger pure functions. And, in this sense, developers must concern themselves with potential divergence of every intermediate argument. Compositions of functions in a strict language diverge for a strictly greater set of arguments than do identical compositions of functions in lazy languages.

In a lazy language, you are forced to take into account, when defining the function, the possibility that an argument is bottom, because whether you force it or not is observable

The issue of "forcing" values is caused only by the introduction of strictness into our lazy semantics, is it not? Pattern matches can be lazy, and we could do without `seq`.

It is strict evaluation that forces us to take everything into account. Laziness relaxes this requirement. But in strict languages, the need to take everything into account shows up in obvious phenomena:

  1. callers must often test conditions and switch between functions based on which arguments might be needed, rather than leaving the problem to the callee
  2. macros get introduced to abstract the boilerplate resulting from such condition-and-function-selection patterns
  3. unit functions, thunks, and explicit laziness come into use

These are predictable, emergent consequences - hence evidence - of the fact that strict evaluation forces us to think more carefully about divergence (and performance) than does lazy evaluation.

What are your values?

If values are defined by language, then it is inconsistent for you to not also acknowledge values are defined by embedded languages - i.e. for a particular domain problem. This is the point many people miss, and thus they confuse abstraction layers, confuse the properties of domain values with the artifacts of the embedding. Haskell's bottom is not a value of embedded languages; each language (e.g. each monad or arrow) with bottom should model it explicitly.

I have given you two different but equivalent definition of "value", which I'm confident are shared by a large amount of people. Apparently you have a different notion of value; could you explain more precisely what it is? I would also be interested in concrete example of what my definition(s) make inconsistent or confusing.

If I understand what you are saying, you are embedding a language inside Haskell. Let's suppose you have a type `Expr` for the programs in your language and a type `Val` for the results of evaluating such a program, with a function `eval : Expr -> Val`. You seem to consider that the "values" at type Val are exactly the images under evaluation of well-formed (non-bottom) expressions - a *strict* subset of all the results of expressions that would be well-typed at type Val. For example, you would argue that:

let v :: Val = v in v

Is not a "value" of the type Val. Is that correct ?

In that case, I would suggest relativizing the notion of value to the language under study: Haskell is a language, it has types and a notion of values (let's call them Haskell-values), your language (let's call it Expr), as an abstract entity, has a notion of evaluation and Expr-values, and the embedding of Expr in Haskell has exotic terms: there are Haskell-values at the (Haskell-)type Val that do not correspond to Expr-values; you could say that your language constructs have undefined behaviors on such exotic terms, and not consider them any longer.

But I think the point still stands: the Haskell-type Val has bottom as a Haskell-value, even though it's not part of the Expr-values of the corresponding type in your abstract language semantics. You could implement your language in a host language with no such thing (eg. Coq or Agda), and you would have a closer match between host values and embedded values. But having exotic terms is not necessarily a big problem.
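To make the setup concrete, here is a minimal version of such an embedding (the constructors and definitions are mine):

data Expr = Lit Int | Add Expr Expr

newtype Val = Val Int

eval :: Expr -> Val
eval (Lit n)   = Val n
eval (Add a b) = case (eval a, eval b) of (Val x, Val y) -> Val (x + y)

exotic :: Val    -- an exotic term: a Haskell-value at type Val (bottom)
exotic = exotic  -- that is not the image of any Expr under eval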

Values relative to Language

It's okay to study values as relative to a language - i.e. to distinguish `Val` as generated by `Expr` from `Val` as generated by Haskell in general (and similarly to distinguish `Expr` as generated by a prescribed set of functions from `Expr` as generated by Haskell's `undefined`). I agree with much of what you're saying here.

the Haskell-type Val has bottom as a Haskell-value

Bottom represents a suspended computation that will never reduce to a value. Suspended computations are not values; we cannot semantically distinguish suspended computations from the values they may eventually compute. Bottom is not a value in Haskell, even though it is admitted by Haskell types.

Isn't this just a trivial argument over what "value" means?

Bottom represents a suspended computation that will never reduce to a value.

As far as I know, Haskell doesn't have an official formal semantics, so any reasonable discussion of what its values are must begin with a definition of "value."

I'm curious, what do you consider the values to be in lambda calculus? (Both with strict and non-strict evaluation order.)

Within a lambda calculus,

Within a lambda calculus, the values are opaque lambdas, but not the applications. And possibly other values the lambda calculus was extended with (such as small integers).

There is a lower stage, of course, at which we could treat the whole lambda calculus expression as a value. In that stage, we deal with structural semantics of the expression or AST, and we can thus distinguish an application from a lambda, or structurally distinguish two lambdas.

The broad, language-independent notion of `value` is a domain descriptor that can be distinguished from other values.

So consider the functions

So consider the functions \x. x and \x. _|_. If I understand your first sentence in the preceding, they are both values. How do you distinguish them?

Distinguish them as you

Distinguish them as you distinguish any other two lambdas - by providing arguments then comparing result values. In this case, you'll never finish the comparison because one of the two does not have a result value. (There are more general reasons you might never finish a comparison, of course, such as having an infinite domain of possible arguments that might distinguish values.)

Real world domains involve practical values: input from sensors, commands to actuators, representations in databases, processing by intermediate services. The mathematical puzzles you like to present aren't very relevant except insofar as they demonstrate the excessive expressiveness of lambda calculus for real world programming.

Similarly revealing: how do you distinguish λx.x from λx.(if x == Ackermann(6,6) then 1 else x)? Should this be something we ever need to distinguish?

Did you see my point?

You can't distinguish \x. x and \x. _|_ in the same way that you can't distinguish \x. x from _|_, and yet you seem fine with letting \x. _|_ count as a value but not _|_.

I'm going to opt out of the discussion of a taxonomy of practical and impractical values. I will, however, nominate 2^64 as the largest practical number. Make that 2^64+1.

You can't distinguish \x. x

You can't distinguish \x. x and \x. _|_ in the same way that you can't distinguish \x. x from _|_

It is true to say I can't distinguish. It is not true to say "in the same way". This would be clear if I added, for example, integers and a `function?` test to the language, in which case I could distinguish λx.⊥ from 0, but not ⊥ from 0.

OK

So in Haskell, is [_|_] a value? Your position makes much more sense to me in the context of a strict language.

With respect to the subset

With respect to the subset of Haskell for generic list processing (length, null, append, zip), [⊥] is a value. For most other operations - e.g. taking the sum of a list or matching a substring - it is not a value.
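Concretely, in GHC (with undefined standing in for ⊥):

xs :: [Int]
xs = [undefined]

-- length xs  ⇒  1       generic list processing never forces the element
-- null xs    ⇒  False   likewise
-- sum xs     ⇒  ⊥       summing must force the element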

Values are domain descriptors. Which domain are you assuming? ("All possible domains" is not a valid answer. Values always have context.)

Odd terminology

How does this jibe with your telling Andreas that values come before types? Domains sound a lot like types to me (and in fact I have a set up very similar to what you're describing here in the logic I use, but use types to map expressions to values). Anyway, I'll stop now.

Domains before values

Apologies for the confusion; the word 'domain' is overloaded. I'm referring here to problem domains, the sort of 'domain' we mean by 'domain-specific language' - like engineering bridges, weather prediction, tax reporting, digital cameras, printing, GUIs. Values are descriptors for real, meaningful domains.

To me, asking whether [⊥] is a value is not a very practical question. What does it describe? It's just semantic garbage allowed by imprecision in a general purpose programming language. So is [Ackermann(6,6)]. If we want to shed light on a subject, looking at pointlessly dark corner cases isn't helping. (Indeed, I might look to finitism to eliminate such corner cases.)

Lambda calculus, understood as a domain, does have `values`, but its primary use is as a general-purpose representation for the values and languages of other domains. Similarly, focusing on whether λx.⊥ is a value is quite irrelevant to anything in the real world (except for buggy representation in the lambda calculus).

How does this jibe with your telling Andreas that values come before types?

Try to describe or explain to me a type without using any values (e.g. expressions in a DSL for describing such types), and it will soon become clear.

Forcing is internal to lazy languages

The issue of "forcing" values is caused only by the introduction of strictness into our lazy semantics, is it not? Pattern matches can be lazy, and we could do without `seq`.

Users want to get a result at the end; just re-printing the program source with "voila, I'm lazy" is not correct. Assuming you have such a "forcing" at the limit of your program (like Main in Haskell "forces" the IO monad), it will propagate in your lazy language even if you don't use `seq` internally: if you force `f x`, it will force `f`, and if you force `fst x` or `snd x` it will force `x`. Andreas has given above another example (`x*0`) where forcing is directed by a primitive.
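A small GHC illustration of that propagation (my example):

p :: (Int, Int)
p = (1 + 1, undefined)

main :: IO ()
main = print (fst p)   -- forcing fst p forces p to its pair constructor and then
                       -- its first component: prints 2; the undefined second
                       -- component is never demanded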

Programmers in a lazy language do take this forcing into account when writing code (they think: if I force the result of my function, which parts of my arguments will be forced?). This is exactly the dual of your argument that strict programmers must think about non-termination on the caller side.

(I could agree that strict languages force us to think more carefully about divergence; I wouldn't agree, though, with the statement that they force us to think more carefully about *performance*. In my experience performance is often directed by memory consumption, which is much harder to reason about in a lazy-by-default language than in a strict-by-default language; you may not think about it as often, but you will have to think much, much more carefully.)

Lazy performance is harder

As Neel has pointed out, reasoning about runtime performance is harder in lazy languages, because it is not compositional.

Interestingly...

...there appears to be a paper about static analyses for space usage in call-by-need at ICFP this year. On a quick skim, they seem to use techniques originally introduced in implicit complexity research to do so, and those semantic models are actually very closely related to separation logic and other logics for aliasing.

Lazy performance

Neel's argument isn't valid unless we are interleaving side-effects - i.e. if we cannot observe this proposed non-compositional performance (based on timing intermediate side-effects), then it does not exist. The ceiling for CPU costs for a lazy expression is the CPU cost of the same strict expression - and just as easy to reason about.

As gasche points out, reasoning about memory costs can be more difficult. This is only a problem for certain programming models that accumulate thunks indefinitely over temporal abstractions (e.g. fold over a stream as a lazy state model). Unfortunately, Haskell happens to make such programs easy to express, and makes it difficult to distinguish spatial abstractions from temporal ones.
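The textbook instance of that accumulation - not from this thread, just the standard example - is the lazy left fold, along with its idiomatic fix:

import Data.List (foldl')

lazySum :: [Int] -> Int
lazySum = foldl (+) 0     -- builds the thunk ((0+1)+2)+... : linear space

strictSum :: [Int] -> Int
strictSum = foldl' (+) 0  -- forces the accumulator at each step: constant space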

Except that

Except that the whole point of laziness is that you can use programming patterns that depend on it. And then the "strict ceiling" is simply infinite runtime, so useless for any reasoning.

Whole point of laziness

The "point" of laziness (or of anything else) is whatever we choose to value in it. I value laziness because it decouples performance from abstraction, and it enables me to abstract control-flow patterns quite easily and formally.

I do not depend on lazy semantics for much at all. And I think that - if you actually look at the directions Haskellers are taking (enumerators, iteratees, pipes, conduits, etc.) - they do not much value it in that role, either.

That is, laziness offers a significant portion of its value even if we program as though with strict semantics.

Don't follow

I don't follow. Most control flow abstractions that I'm aware of do depend on laziness. Likewise, the omnipresent use of lists as streams.

Most control flow

Most control flow abstractions that I'm aware of do depend on laziness.

Sure, but only in a very shallow manner (the sort achievable with explicit blocks or unit lambdas or first-class hygienic macros in a strict language).
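For instance (my sketch), an ordinary function already abstracts short-circuiting control flow under laziness:

anyTrue :: [Bool] -> Bool
anyTrue = foldr (||) False

-- anyTrue (True : repeat False)  ⇒  True, despite the infinite tail.
-- A strict language recovers the same abstraction with explicit thunks,
-- roughly: anyTrue :: [() -> Bool] -> Bool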

Likewise, the omnipresent use of lists as streams

Are not so omnipresent. There is a reason people quickly switch to iteratees, enumerators, pipes, conduits, and other explicit stream models. Laziness has proven to serve poorly in this and similar roles.

Weird conclusion

if we cannot observe this proposed non-compositional performance (based on timing intermediate side-effects), then it does not exist.

You keep concluding that things you cannot observe from within a language are things that don't matter. If you ask me, that is a category mistake.

You keep concluding that

You keep concluding that things you cannot observe from within a language are things that don't matter.

I didn't read it that way. That probably matters to him too, I suppose. I believe he just meant those are the sort of things you will necessarily overlook if, for some reason, your point of view is constrained to be from within the language.

EDIT
That said, though, to counter David's own remark and to join Neel's point and yours: I'd tend to think that what Neel pointed out is valid the majority of the time. I assume that when Haskellers profit from laziness and call-by-need semantics across however many lines of code they write, they don't start by thinking about how the runtime's thunk management is going to play well or badly with the visible or potential interleaved effects (the condition, according to David, for Neel's point to be valid).

Or, if they did reason a priori at such a (rather low) level, they might as well use plain vanilla portable C and force themselves into a pure functional writing style full of clever preprocessor macros to manage those call-by-need thunks by hand.

Speculation on Implementation

Even from outside the language, we cannot `observe` the proposed non-compositional performance associated with lazy, pure computation. That is, we cannot look at the expression and say - based on semantics - when specific sub-expressions are computed. At least, we cannot without mentally modeling the implementation and optimizer. But in a fairly composable manner, we can say how much might be computed - both total and relative to consumption of the output.

You speculate on a relationship between language and implementation. Implementation does matter, of course, and it's best to design languages for simple implementation. But the "category mistake" you are making is confusing properties of the implementation with properties of the language. To note that a given 'thunk' is evaluated before or within or after a given function 'call' is observable only within the implementation, and is even steeped in implementation language.

Operational semantics?

Say what? A precise specification of computation is exactly what any run-of-the-mill operational semantics gives you (additional optimisations notwithstanding). Granted, the Haskell definition does not include an operational semantics itself. But it's widely understood to be a lazy language, and you can find formalisations of both Haskell and standard lazy reduction in plenty of literature.

Operational semantics

Even operational semantics for laziness deny you the ability to observe when a computation takes place in a pure function. Denotational semantics won't even make a suggestion. Semantics aren't implementation; they are specification.

Re: "additional optimisations notwithstanding" - I think the contrary; valid optimizations withstand your claims quite effectively.

Observing vs knowing

Well, exactly! An operational semantics does not let the program "observe" when reductions take place, but it allows the programmer to know it nevertheless. Which is all you need for (formal or informal) reasoning, and the difference you seem to keep overlooking when you say this or that "doesn't exist" or that checks "subsume" everything else.

Optimisations should not change termination or big-O (or at least not worsen it), so they aren't particularly relevant for reasoning about correctness. Lazy semantics is.

Knowing the Unobservable

Knowing the unobservable is more religion than science.

I do favor denotational semantics.

Silly

Even ignoring the silliness of your comment, you are still confused. Non-termination and runtime are of course observable, just not reflectively from within a program (likewise many other effects). In fact, non-termination happens to be the canonical observation often used in semantics to distinguish program behaviours (both operationally and denotationally).

Non-termination and runtime

Non-termination and runtime are of course observable, just not reflectively from within a program

We can understand that what you call "programs" are actually "subprograms", always in a larger program (which we might call the "superprogram") until you hit what we call the "real world". (Whether the real world is a subprogram is not observable.)

I have already acknowledged we can observe non-termination if we interleave side-effects (e.g. if the superprogram is effectful). Similarly, the runtime cost of a subprogram can be computed. But so long as we compose a larger pure computation, it isn't observable.

Relevantly to my earlier point, where computation occurs in a pure subprogram isn't observable even to the superprogram. That is, even acknowledging that non-termination and runtime are observable in a superprogram is insufficient to defend Neelk's argument about the non-compositional nature of lazy performance in a pure subprogram.

I don't believe I am the confused one here. Your claim is that we can know (and even that it is meaningful to know) properties that aren't even observable from a superprogram.

Giving up

OK, I give up, this doesn't lead anywhere. I'm sorry, but at this point, I don't feel much desire to go to length and explain the difference between computation, reasoning, and meta theory, or the difference between modular and non-modular reasoning — all of which, I have to say, your post does seem pretty confused about.

I expect we accept different

I expect we accept different axioms. I don't feel as though I've misunderstood anything you've said, but I keep tracing your arguments back to fundamental assumptions that I reject. Such as: "unobservable things exist". To me, that's an absurdity. Existence is justified and defined by what is empirically observable, and any contrary conclusion is sufficient to reject an argument.

Regarding "modular reasoning" - that's a rather overloaded term, might mean different things to you and me. If you mean compositional, inductive reasoning of properties across modules (∃f.∀A,+,B.P(A+B) = f(P(A),+,P(B)), for some non-trivial P), I call that "compositional reasoning" (or "inductive") and consider it very important.

Anyhow, I agree with the decision to end the conversation here.

Your tracer is broken

I keep tracing your arguments back to fundamental assumptions that I reject. Such as: "unobservable things exist".

See, such an assumption was never part of my argument. What may have been part of my argument is (the fact) that non-computable things exist, which is why your assumption that you can limit observations to what (a tower of) programs can detect is a fallacy. Non-termination is the obvious counterexample.

What may have been part of

What may have been part of my argument is (the fact) that non-computable things exist

Your argument does rely on non-observable properties existing, Andreas. And your distinction between non-observable and non-computable is quite shallow, considering that observation physically depends on computation.

But I think we are at least in agreement that we accept different axioms.

Non-termination is the obvious counterexample.

It really isn't. Your choice to distinguish "non-termination" from "takes longer than the heat death of the universe to terminate" is just as disconnected from reality as your other axioms.

Straw man

Your choice to distinguish "non-termination" from "takes longer than the heat death of the universe to terminate" is just as disconnected from reality as your other axioms.

While that distinction may not matter in practice, the distinction between either of those, and "always terminates in at most N steps" for a predictable N, does very much. Yet no program will generally be able to tell.

the distinction between

the distinction between either of those, and "always terminates in at most N steps" for a predictable N, does very much.

Yes, that property does matter.

Yet no program will generally be able to tell

Unlike the general halting problem, the property you described just now is actually decidable (assuming N is not a function of the input). It's also easily audited and observed via runtime assertions, e.g. in a tower of programs. It is also easy to structurally enforce.
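A sketch of that decision procedure (the small-step machine representation is mine):

type Step s = s -> Maybe s   -- one small step; Nothing means the machine has halted

haltsWithin :: Int -> Step s -> s -> Bool
haltsWithin n step s = case step s of
  Nothing -> True                                   -- already halted
  Just s' -> n > 0 && haltsWithin (n - 1) step s'   -- spend one unit of fuel

-- The check itself always terminates: it runs at most n steps.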

Of course it is

Of course it is dependent on the input. Also, you can decide it, but not detect or check it from the outset (by a program context).

Of course it is dependent on

Of course it is dependent on the input.

That wasn't clear to me. I read it first as relative to program size or structure. Though, there is a relationship between the two (via staged programming).

you can decide it, but not detect or check it from the outset

The ability to predict a property is valuable, but is distinct from the issue of whether it is an observable property (i.e. that can be said to exist).

A point relating to your argument is that making a decision is also a program. I would ask not whether a property is decidable, but rather whether we can decide, in a predictable N steps, whether some other program will terminate in a predictable M steps. (Which relates back to the confusion above.)

It seems you'd be okay with a proof that takes until the heat death of the universe, because you've made some arbitrary and dubious distinction (other than staging) between proofs and programs.

Humans should understand and be able to explain the code they write. Therefore, proofs on code should be shallow things - like human reasoning - based on constrained reasoning resources. It's a bit ridiculous for a machine to make deep connections that a human cannot follow, deep conclusions a human cannot understand.

[edited heavily]

The ability to predict

The ability to predict a property is valuable, but is distinct from the issue of whether it is an observable property (i.e. that can be said to exist).

So you are saying one could predict something that doesn't exist??

I would ask not whether a property is decidable, but rather whether we decide in a predictable N steps whether some other program will terminate in a predictable M steps?

Which is an even harder problem, right?

Bottom line is, there are important properties ("contracts") that you won't be able to check or prove (in reasonable time or at all) about a piece of code passed to you. The only reasonable approach then is to put the burden of proof on the provider of the code, and leave the consumer (or rather, his runtime system) with the simple task of just checking the proof. That inevitably brings you to more powerful type systems, proof-carrying code, or anything in that equivalence class.

Reflection, staging, or any other dynamic dream feature doesn't solve this problem. Let alone the underlying assumption of being able to inspect any piece of code programmatically, which would outright break all abstraction and encapsulation.

Now, how we got here from the OP I have long lost track of...

So you are saying one could

So you are saying one could predict something that doesn't exist??

You can. For example, the color of a pink invisible unicorn. The behavior of Optimus Prime in the next world emergency. Or, as you've been arguing, performance internal to a purely functional lazy expression.

Just because you can predict a property doesn't mean it exists.

but rather whether we decide in a predictable N steps whether some other program will terminate in a predictable M steps?

Which is an even harder problem, right?

Depends on how you interpret it. Can you generate a proof that, in N steps, will prove some property about a program? Relatively easy (compared to the general problem of proving or disproving the property), since you can determine in a finite and predictable number of steps (albeit, exponential in N) whether such a proof exists. OTOH, realistically we're also constrained to discovering the proof in a relatively constrained number of steps. Reality is resource bounded computations all the way down.

The only reasonable approach then is to put the burden of proof on the provider of the code, and leave the consumer (or rather, his runtime system) with the simple task of just checking the proof.

Agreed. Or said another way: providing code for predictable processing by the next stage is always a burden on the prior stage. The same principle works all the way up to the human, who will enforce it via natural constraints on time and attention.

the underlying assumption of being able to inspect any piece of code programmatically

Staging does not assume ability to inspect any piece of code programmatically, only code passed as a value into a particular staged evaluator. Of course, if you want to achieve certain cross-cutting proofs, you'll simply design a more monolithic stage in the program to ensure more code is accessible to a prover. (I.e. as opposed to using external services for encapsulation and abstraction.)

Reflection in many languages does violate encapsulation, but that doesn't need to be the case. A secure, modular variation on reflection is possible. See mirrors in Newspeak.

Forcing is not necessarily internal to lazy languages

I grant that a programming language will need to consist of more than its lazy subset, but there is no need for the lazy subset to include any forcing.

In my experience performance is often directed by memory consumption, which is much harder to reason about in a lazy-by-default language than in a strict-by-default language

Reasoning about memory consumption can also be difficult in strict languages if you use similar programming idioms as you would in lazy languages. The issue doesn't seem "language related" so much as "idiom related". In lazy languages today you're much more likely to use first-class functions and similar structures.

I find it easy to reason about memory performance in lazy languages, but I use idioms and design patterns (abstracted into embedded languages) that make it easy.

PS: regarding your point

PS: regarding your point that values belong to your problem domain: I think this is a different concern.

It is a relevant concern for consistency, to avoid equivocation. You and Andreas have tried to explain "values" in spite of moving goalposts called "language" and "type system". For consistency, you can do one of:

  1. acknowledge how problem specific languages, values, and type systems may be represented (embedded) in other languages and thus the nature of "value" must be reconsidered at each layer
  2. step back and take an information-theoretic language-independent notion of "values"
  3. select exactly one meta-language in which to represent all other languages and type systems

These solutions are equivalent with regards to my own arguments about values vs. computations. At the moment, I see your arguments as inconsistent in a deep, subtle way.

It's better if you can refine your types as much as possible, so that the "language values" are close to the "problem values"

Better by some metrics, certainly. But the state of the art currently lacks type systems where precision is free. Instead, precision is traded against complexity, extensibility, serializability, productivity, etc. Dynamic typing enthusiasts are often quick to point this out.

Er...

I'm sorry, but I really have no clue what the heck you are talking about. Can you rephrase without resorting to all kinds of vague meta?

You have conflated "value"

You have conflated "value" with "semantic interpretation of type in a specific language" for some of your assertions. Result is invalid and inconsistent arguments. You have argued that concepts are 'muddled a bit' in lazy languages. It is not the concepts that are muddled.

What's the misunderstanding?

There must be some misunderstanding about what I have said. Unfortunately, I'm a bit puzzled about the exact nature of this misunderstanding.

This makes perfect sense to

This makes perfect sense to me, Andreas. I suppose what confuses people (or used to) is the loose or ambivalent meaning given to "bottom", bottom types, and special tokens such as null, depending on the paradigm one is the most familiar with.

In OO-language parlance, for instance, the bottom type (aka "void", "None", etc.) is often seen as the type inhabited by a unique value (null), a type to which no other conforms - but which "magically"/conventionally conforms (assignment compatibility) to all other types simultaneously, if they're also inhabited by null.

Also, regarding your point 1, which I find valid: some can always argue that it is nevertheless largely shaped by our own natural-language semantics, as it seems pretty difficult (if not impossible) not to define computation and PL strictness vs. non-strictness in an interdependent way, and this owes a lot to the prior choice between constructivism and non-constructivism, or intuitionistic vs. classical logic.

Let's not forget that PLs and machines are, before anything else, nothing but artifacts of something else themselves, after all: our own observations of the world and our activity!

Void isn't void

One annoying thing about the Algol & C family of languages is that what they actually call "void" is not the void type at all, but the unit type, i.e. the exact opposite.

On the other hand, I'm not aware of any language where "void" is inhabited by null.

Btw, Void and Bottom are not exactly the same, nor are Unit and Top. Usually, you talk about Top and Bottom types when you have a subtyping relation, and Unit and Void in the context of algebraic types. I've seen very few languages that have both.

Edit: Use of the term "bottom" isn't loose, it just can occur in different semantic categories.

On the other hand, I'm not

On the other hand, I'm not aware of any language where "void" is inhabited by null.

Now you know one, as I was alluding to this sort of thing (and I'm probably adding to the confusion, though I believe Eiffel is maybe one of the few OOPLs that got it right, at least regarding the consistency of its design with its design objectives and rationale):

In Eiffel, "the type of Void, as declared in ANY, is NONE, the "bottom" type."

thus, 1) the (case-insensitive) "Void" or "void" of Eiffel stands for the traditional token "null" as found in Java, C#, ..., and 2) "NONE" stands, in spirit, for "void" in those languages.

Note the syntactic frugality, though: instead of making them language keywords, Eiffel introduces them as, respectively:

1) void: a frozen feature of ANY, available from anywhere/any scope, and

2) NONE: the so-called bottom type.

Indeed, here's an extract of ANY (itself meant to play the role of "Object" in C#, Java, etc.):

feature -- Basic operations

   frozen void: NONE
      -- Void reference

And of course :

"Unlike ANY , class NONE doesn't have an actual class text; instead, it's a convenient fiction."

Okay! :)

Btw, Void and Bottom are not

Btw, Void and Bottom are not exactly the same, nor are Unit and Top.

And I agree with you here again. My personal regret is that "void" is rarely (if at all) seriously defined, or even relevant to the discussions. My own understanding is that, in the context of this thread, speaking of types, and whether the paradigm is OO or FP, there should "only" be three terms to look at, not four, really:

Top, Bottom, and Unit. And that's it.

As you've said, Bottom and Top can be given a clear meaning in regard to subtyping, even in OO. And Unit is definitely an algebraic notion.

The big confusion I think occurs when trying to compare "bottom" in an OO mindset and "bottom" in FP-oriented computation theory (and practice).

I see "void" (usually a keyword, btw) often as an artificial, parasitic or redundant syntactic construct in (OO) languages. I like Eiffel's way better in looking at both these ends, Bottom and Top, of its subtyping relation.

Fair enough? :)

Well, the "void" keyword did

Well, the "void" keyword did not originate in OO but in Algol and later C.

As I said, it is not the type-theoretic Void type, though. That one is quite standard, and important when you are interested in exploiting the Curry-Howard isomorphism, for example (where it corresponds to the False proposition, while Unit is True). It also naturally arises as the empty sum, just like Unit is the empty product. It's really no less fundamental than Unit.
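In Haskell terms (base provides the first as Data.Void; sketched here for clarity):

data Void                -- the empty sum: no constructors; False under Curry-Howard

data Unit = Unit         -- the empty product: one constructor, no fields; True

absurd :: Void -> a      -- ex falso quodlibet: from False, anything follows
absurd v = case v of {}  -- needs GHC's EmptyCase extension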

Ah, thanks for the

Ah, thanks for the clarification, now I see where I misinterpreted what you commented first re: void.

True, I tend to disregard the heritage of languages a la Algol and C where dangling pointers end up with coredumps when dereferenced.

Of course I don't claim OO is the only way not to be purely functional, but I am so used to helper OO and VM based runtimes and other managed environments, that I overlook more and more their complete legacy.

Meh, I just wish we'd less often confuse the intrinsic difficulties of partial computations in strict languages with others of different natures (e.g., under-controlled state) found in the same ... or even in different languages, with different views about how computation is to be delimited and carried out in the first place.

Those topics are complex enough taken one at a time!

Java

On the other hand, I'm not aware of any language where "void" is inhabited by null.

Java has something like it thanks to erased generics:

class Box<T> {
  T value;
}

class Demo { // hypothetical enclosing class so the snippet compiles
  public static void main(String[] args) {
    Box<Void> box = new Box<Void>();
    box.value = null;
    System.out.println(box.value); // prints "null"
  }
}

There is a Void (note the capital "V") reference type that has one value, null.

Also, if you invoke a void-returning function in Dart in an expression context, it will implicitly return null.

Doesn't count

java.lang.Void is just an ordinary final class with a private constructor.

You can create your own Void if you want to

    public final class MyVoid {
       private MyVoid() {}
    }

Nothing to do with generics, erased or otherwise. The following never fails its assertions but has no generics.

   import java.lang.Void;

   class VoidTest {
       public static Void test(Void arg) {
          assert arg == null;
          return null;
       }
 
       public static void main(String[] args) { // run with -ea to enable assertions
           Void x = null;
           assert x == null;
           assert test(null) == null;
       }
    }

java.lang.Void is a kind of strange way to do something like "unit" in functional languages. "void" (lower case) behaves like "unit" right up until you need to use an expression of type void in a larger expression. Void (upper case) behaves in a more "unit" like fashion. Like "unit" it has exactly one useful value[1]. That value just happens to be null.

Look, I'm not defending this design, just explaining :-).

[1] Actually, without a security manager you can use Java's reflection API to instantiate a class that only has private constructors so java.lang.Void and MyVoid have infinitely many values differentiable by pointer equality. sigh

Total functions

In Haskell, bottom is a representation of a non-terminating computation. If there were no special error function, we could still model it as something like:

import Data.Function (fix)

bottom :: a
bottom = fix id

In which case we'd have an actual non-terminating computation (as opposed to an IO exception) when we try to compute bottom.

If you really wish to ensure that everything occupying a type is a genuine value, you'll need to control recursion - i.e. limit non-terminating representations to codata (where each *step* terminates, but you don't know how many steps are needed; hook steps to an external clock).
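A sketch of the codata style (Haskell doesn't enforce productivity, but the shape is):

data Stream a = Cons a (Stream a)   -- codata: no nil case, necessarily infinite

ones :: Stream Int
ones = Cons 1 ones                  -- productive: every step yields a Cons

takeS :: Int -> Stream a -> [a]     -- each observation terminates,
takeS n _ | n <= 0 = []             -- though the stream never "finishes"
takeS n (Cons x xs) = x : takeS (n - 1) xs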

Coq, Agda, and many other proof languages can guarantee that every well-typed term evaluates to a genuine value of its type - no type is silently inhabited by bottom. Of course, it is still possible to represent computations, like the Ackermann function, that will take until the heat death of the universe to compute.

My hypothesis: all computations humans need in actual practice are a tiny subset of terminating computations - incremental and real-time. Most NP-complete computations (voice recognition, NLP, etc.) are better replaced by incremental hill-climbing optimizations and stable soft-constraint systems.

Could go stricter or looser

Coq, Agda, and many other proof languages can guarantee every type has a valid instance. Of course, it is still possible to represent computations like Ackermann function that will take until the heat death of the universe to compute.

We could go further toward that extreme by embracing implicit computational complexity, where the types may technically be inhabited by the same values, but reasoning about these types becomes stricter, with some classes of values seeming more real than others, as though trying to convince an ultrafinitist.

Alternately, maybe we could go a little bit backward from there to a more computationally flexible language without completely trivializing the type system's reasoning power. A paraconsistent logic might come in handy if we're not content to model nontermination as an explicit effect. To the point of the thread, bottom might be welcomed as a type inhabitant in such a system.

Ultrafinitism and Paraconsistency

Both are good ideas, IMO. Maybe both at once, in some orthogonal fashion.

Sounds like an interesting place to visit

... but I wouldn't want to live there.

Sounds like

unjustified prejudice.

but I wouldn't want to live

but I wouldn't want to live there

lol, why not? :)

Call me crazy, but if we'd go down that path, unlike David, I'd pick just one: ultrafinitism (or just finitism).

My rationale is that although paraconsistent logic looks interesting, there seem to be many more flavors of it (to choose from) than there are for the finite-world fans and their object of study.

Might take a while to attract enough in-the-mainstream interest, though...

EDIT
(@dmbarbour) plus, a little voice in me whispers that they may not be easy to regard - or to use - as orthogonal, precisely (highly speculative, of course)

Programming by Analogy with Paraconsistent Modules

"Paraconsistent" is not the same as "inconsistent". What it means is that inconsistencies don't explode - they propagate in a constrained manner.

This is a useful property in module composition - for example, it would allow composing modules that, together, imply inconsistencies. And this is okay so long as those inconsistencies aren't relevant to what we're trying to prove or describe (i.e. they are filtered away). This would allow a style close to programming by analogy. I believe it would apply effectively to mixin/traits/stone-soup linking models.

Orthogonal bounds on computation (the finitism aspect) augment paraconsistency. They allow us to quickly reject programs that we cannot prove (one way or another) in a number of reasoning steps bounded by a well-defined limit relative to program size. Limiting the depth of proofs allows the compiler to say "hey, I don't get it, please clarify the relationship between foo and bar" rather than leaving it to reach deep conclusions and connections that the programmer never intended. (Human reasoning is shallow. Therefore, compiler reasoning should be shallow.)

Finite bounds can also be applied to control where inconsistencies are observed in a system, allowing us to work within inconsistent systems but control which inconsistencies reach certain clients or how long they may last. This is useful for tradeoffs between consistency and latency (e.g. with respect to speculative evaluation).

If we want to bring programming closer to human reasoning and utility, ultrafinitism (which justifies shallow, concrete reasoning) and paraconsistency (which allows open composition of weak specifications) are promising language features.

Okay, thanks for the

Okay, thanks for the elaboration. I never intended to confuse paraconsistency with inconsistency, though (i.e., even before this thread).

But true, I haven't even scratched the surface of all there is to read about the former. I promise I'll try to recognize the orthogonality you're alluding to, some day ... if I can. :)

I loved Meyer's early works

I loved Meyer's early works on Eiffel (mid 80s) because he was one of the first to study client-supplier relations in software design with a serious, very committed approach reflected in the very design of his own language (OO-paradigmatic, as we know).

All contracts have clauses of course, be they intended for human consumption or computers. Sadly though, they are a more serious double-edged sword in the latter realm, when it's about machines running in production systems 24/7 without humans to watch them every single second.

Guaranteeing that a supplier will not return an unexpected null value to a client which had the kind courtesy to ensure its call's input remains in the domain of the contract ... is just one example of such a clause. Having the supplier merely promise (and keep that promise!) to terminate is another, even weaker one.

The very difficult problem we face in CS and software engineering (and also, to some extent, in PLT with type systems) is that even though harder contracts proposed by the supplier make for harder constraints on the clients' side (thus adding to their burden), that is not enough: suppliers themselves still have a hard time keeping the promise above to begin with.

Unlike with humans, where lawyers and judges can eventually help clients get their way when a deceptive supplier doesn't behave well, today's large-scale production software is still pretty much naked. Of course, one of the biggest hopes, and the beginning of the obvious way out, is proof-carrying code and/or proof checkers as part of the build process.

The issue is: the more hairy and complex software contracts are (and there one has to ask oneself: accidentally or essentially so?), especially in terms of their number of dependencies on components, external systems, or whatnot, the more difficult those specifications are to check (if anyone cares about completeness, that is).

That leaves us with a difficult tradeoff: should we keep contracts as simple as possible, or make provability more robust in practice (and not just in theory)?

Maybe the pitfall is precisely to think in terms of such a binary tradeoff, when there are more than just two dimensions.

My best guess is that while both simplicity and better implementations of software correctness are surely necessary, they are not sufficient: I speculate that we need better human-made annotations, for consumption by software clients and suppliers, regarding the contracts they deal with every millisecond.

Then, I speculate further that the rather involved design and implementation choices of "some" client with regard to "some" of its suppliers (re: the choice of null-capable type systems and so on) could be better informed, improving their interop quality in the long run.

null vs bottom

Here, in a strict language (Scala) that has Java-like nulls:

scala> null == null
res0: Boolean = true

scala> def bottom : Nothing = bottom
bottom: Nothing

scala> bottom == bottom

Please don't hold your breath for the answer.

Now Haskell

Prelude> let bottom = bottom
Prelude> :t bottom
bottom :: t
Prelude> bottom == bottom

Again, may take approximately forever. Hope that clarifies.

I think it's good example

I think it's a good example for the OP, and it demonstrates eloquently the two distinct natures of the things at play here, which I tried to explain as well:

null denotes the absence of transient state used in computations and can be defined as part of =='s domain (at the language designer's discretion). But it doesn't absolutely prevent computation that involves it (unless you try to dereference a value's state through it - a mistake that the runtime may catch with an exception, or not ... coredump).

At least, well, coredumps terminate...

The part of your example with "bottom" shows how different it is, there: it ALWAYS denotes an undefined computation (as opposed to some usages of null).

Yes, that does clarify,

Yes, that does clarify, thanks.

I think what I was getting confused about was that, in both cases, it's possible to "have" a value of null or of bottom [1] when what you really want is a value of the appropriate type.

[1] Where "having" a value of bottom is having a variable (unevaluated thunk) that doesn't inhabit the desired type, but evaluates to bottom. I think the laziness was/is tripping me up.

And because I'd seen examples like this, where GHCi doesn't terminate but keeps printing the list:
Prelude> let forever = 1 : forever
Prelude> :t forever
forever :: [Integer]
Prelude> forever
[1,1,1,1,1,......

Felix pointers

Well, Felix has two kinds of pointers:

  &T   // cannot be null
  @T   // can be null

The latter is defined entirely in the library, here's the code:

open class Cptr 
{
  typedef ptr[T] = &T;
  lvalue fun deref[T]: &T -> T = "*$1";

  union cptr[T] = | nullptr | Ptr of &T;

  ctor[T] cptr[T]: &T -> cptr[T] = "$1"; // safe

  ctor[T] ptr[T]( px:cptr[T]) => 
    let Ptr  ?p = px in p; // match failure if null

  fun deref[T] (px:cptr[T])=> *(px.ptr); // checked deref

  fun is_nullptr[T] (px:cptr[T])=> 
    match px with 
    | nullptr => true 
    | _ => false 
    endmatch
  ;
  
  instance[T] Eq[cptr[T]] {
    fun == : cptr[T] * cptr[T] -> bool = "$1==$2";
  }
  instance[T] Tord[cptr[T]] {
    fun < : cptr[T] * cptr[T] -> bool = "$1<$2";
  }
}

open[T] Eq[cptr[T]];
open[T] Tord[cptr[T]];

typedef fun n"@" (T:TYPE) : TYPE => cptr[T]; 

Felix uses some "tricked up" representation rules to ensure that a cptr[T] has the same representation as a C pointer; that is, the null case in Felix is NULL in C as well. This means that, provided you get the signatures of bindings to C functions right, you can only get an error when attempting to convert a possibly-null pointer to one that can't be null, and only if you use the provided conversion. That conversion is provided to avoid having to handle the null case explicitly every time (the client is going to wrap that in a function anyhow, so we might as well give them one).