Something I forgot

I think it was discussed here before but it came up in a conversation I had, and I forgot whether I ever got an answer to the following question:

Why doesn't Haskell need a value restriction à la ML to deal with unsound interactions between mutable state and polymorphism?

Did no-one ever check this or does it have to do with invariance? I forgot.

Haskell has a severe value restriction

All expressions have to be values.

The reason you don't need a value restriction with IORefs is because the type of newIORef produces a ref in the IO monad. So the pure function is generalized, but to actually get a reference you have to instantiate it at a particular type so that you can bind it to the monad. You never have a "generalized reference".
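A minimal sketch of that point, using only standard Data.IORef: the reference only comes into existence inside IO, and by then its element type has been fixed by how it is used.

```haskell
import Data.IORef

-- newIORef :: a -> IO (IORef a). The polymorphism sits outside IO;
-- to get an actual reference you must run the action, and by then
-- the type variable has been instantiated at a concrete type.
main :: IO ()
main = do
  r <- newIORef []                 -- r's element type fixed by later use
  writeIORef r ["only strings now"]
  readIORef r >>= print
```

There is never a generalized `IORef [a]` value in scope; `r` is monomorphic inside the do-block.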

Expressions != values

All expressions have to be values.

Clearly (\x -> x + 1) 1 is a Haskell expression but not a value.

What do you mean?

If you mean that expressions denote values and are not themselves values, then I agree, I was sloppy with language. Or did you mean something deeper?

If you mean that expressions

If you mean that expressions denote values and are not themselves values, then I agree, I was sloppy with language.

It's worse than that. What value does the Haskell expression let x = x in x denote? Non-termination (interpreted by bottom in a domain-theoretic model) is not a value in the standard sense of the word (see e.g. Harper's PFPL, or Pierce's TAPL).

Not well-formed?

Is that legal? Normally, 'let v = e0 in e1' is 'mostly' supposed to be equivalent to '(\v.e1)e0', so that would translate to '(\v.v)v' and the last 'v' is unbound.


Is that legal?

In what sense of the word? It is well-typed Haskell.

Normally, 'let v = e0 in e1' is 'mostly' supposed to be equivalent to '(\v.e1)e0', so that would translate to '(\v.v)v' and the last 'v' is unbound.

The expansion you give is for non-recursive let expressions, but in Haskell every let expression is recursive (unfortunately, IMO). If you had an explicit fixpoint operator fix, then you could expand let x = e1 in e2 to (\x.e2) (fix (\x.e1)).
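A hedged sketch of that expansion, using fix from Data.Function: both definitions below produce the same infinite list.

```haskell
import Data.Function (fix)

-- Recursive let, as Haskell actually has it:
ones :: [Int]
ones = let xs = 1 : xs in xs

-- The same binding expanded with an explicit fixpoint,
-- following the (\x.e2) (fix (\x.e1)) scheme above:
ones' :: [Int]
ones' = (\xs -> xs) (fix (\xs -> 1 : xs))

main :: IO ()
main = do
  print (take 3 ones)
  print (take 3 ones')
```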

What is a value?

Bottom is a value in the sense of the word I intended. Is that really non-standard? TAPL is focused on the types, so I'd guess it is interested in asserting that the bottom type has no values, but my impression is that bottom is considered a Haskell value (by Haskell people and documentation). In the context of untyped lambda calculus, bottom is a value.

Do you have any issue with the point I was making, outside of that definition, which was that Haskell can generalize types for any expression because all expressions denote values (in my sense)?

I don't think you are right

The question is why you can't do the same trick as in naive ML type inference and coerce an ''a ref' to any type. I don't know a lot of Haskell but I simply tried it in ghci. It seems you can't coerce because Haskell makes 'a' rigid, which I think corresponds to a Skolem constant.

So, ML went 'ooh, you can coerce polymorphic mutable references and that is a problem, so let's make sure they are monomorphic through a value restriction' and Haskell went 'ooh, sometimes coercing polymorphic types is bad, so let's introduce Skolem constants into the type system'.

I am not sure why coercing is bad and Haskell went another way, but I doubt it's mutable state. It's more likely they fixed something else, how to treat universal quantification, and as a bonus mutable state references can't be coerced to polymorphic types.

Well, that's what I make of it.

ref objects

The documentation I found on the ML value restriction (at mlton) uses an example:

val r: 'a option ref = ref NONE

In Haskell, you could have this, leveraging an existential type wrapper, something close to:

data SillyRef = forall a. SillyRef (IORef (Maybe a))

newSillyRef :: IO SillyRef
newSillyRef = SillyRef <$> newIORef Nothing

We can't do it directly due to some restrictions on how Haskell works with existential types, hence the SillyRef wrapper. But I think that it isn't the type rigidity here that's the issue. Rather, it's that Haskell is a lot more careful about context.

ML doesn't generally control context for expressions that construct mutable objects, e.g. `ref 0` must be distinguished from every other `ref 0` expression. The value restriction is a stopgap to deal with some semantic problems created by this. Creation of mutable objects in Haskell always occurs in a typeful context like IO or ST that provides a clean, semantic opportunity for uniqueness. (Modulo unsafePerformIO.) Similarly, Haskell's context for observing generic existential types is precisely constrained.
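The ST monad makes that "typeful context" concrete: runST has the rank-2 type (forall s. ST s a) -> a, so a reference cannot leak out of the computation that created it. A minimal sketch (the rejected definition is shown as a comment):

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, writeSTRef)

-- escape = runST (newSTRef [])
--   -- rejected: the 's' tagging the STRef would escape its scope

-- Mutation stays sealed inside runST; only the final Int gets out.
counter :: Int
counter = runST $ do
  r <- newSTRef (0 :: Int)
  writeSTRef r 25
  readSTRef r

main :: IO ()
main = print counter
```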

Testing 1, 2, 3..

By Athas

module Main (main) where

import System.IO.Unsafe
import Data.IORef

{-# NOINLINE ref #-} -- needed so both uses below share a single reference
ref :: IORef [a]
ref = unsafePerformIO $ newIORef []

writeBoolList :: IORef [Bool] -> IO ()
writeBoolList ref = writeIORef ref [True]

readStringList :: IORef [String] -> IO [String]
readStringList ref = readIORef ref

main :: IO ()
main = do
  writeBoolList ref
  l <- readStringList ref -- boom?
  print l -- boom!

By me

module Main (main) where

import Data.IORef

ref :: IO ( IORef [a] )
ref = newIORef []

r1 ::  IO ( IORef [String] ) -- can coerce but that just gives another action
r1 = ref

r2 ::  IO ( IORef [Bool] )
r2 = ref

-- this fails, Haskell has 'rigid' variables, I presume Skolem constants,
-- coerce1 ::  IO ( IORef [a] ) -> IO ( IORef [String] )
-- coerce1 r = r

putBool :: Bool -> IO ()
putBool True = putStr "true"
putBool _    = putStr "false"

main = do
    r <- newIORef ["ho!"]
    ref1 <- ref -- same action but different refs, of course
    ref2 <- ref
    writeIORef ref1 ["hi"]
    writeIORef ref2 [True]
    [k] <- readIORef ref1
    [b] <- readIORef ref2
    putStr k
    putBool b

This fails

-- this fails, Haskell has 'rigid' variables, I presume Skolem constants,
-- coerce1 ::  IO ( IORef [a] ) -> IO ( IORef [String] )
-- coerce1 r = r

That's just not well typed. You can't substitute for a like that. This is well typed:

coerce2 ::  (forall a. IO ( IORef [a] )) -> IO ( IORef [String] )
coerce2 r = r

But then you have to generate a forall a. IO ( IORef [a] ) value first (and it's going to contain bottom in that cell because Haskell's type system is sound).
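With RankNTypes that version typechecks and can be exercised; note that the only things you can store in the fully polymorphic cell up front are themselves fully polymorphic, like the empty list. A small runnable sketch:

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.IORef

-- The well-typed coercion from above: the argument must itself
-- be polymorphic in 'a', so no monomorphic ref is ever coerced.
coerce2 :: (forall a. IO (IORef [a])) -> IO (IORef [String])
coerce2 r = r

main :: IO ()
main = do
  r <- coerce2 (newIORef [])  -- newIORef [] is fully polymorphic
  writeIORef r ["ok"]
  readIORef r >>= print
```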

Of course it fails.

The question isn't that it fails but why it fails under different typing systems. Note that ghc is perfectly happy to accept the other coercion.

What's the other coercion?

The one that uses unsafePerformIO?

Here's a simpler example of Haskell complaining about rigid type variables for the same reason:

foo :: a -> b
foo x = x

    Couldn't match expected type `b' with actual type `a'
      `b' is a rigid type variable bound by
          the type signature for foo :: a -> b at test.hs:1:8
      `a' is a rigid type variable bound by
          the type signature for foo :: a -> b at test.hs:1:8
    In the expression: x
    In an equation for `foo': foo x = x

Comparing type systems

Yah, the whole point of the exercise is to write down examples and see why they fail.

foo :: a -> b
foo x = x

The above, for example, is often perfectly legal ML (depending on your variant of ML).

Values vs. Denotations

Bottom is a value in the sense of the word I intended. Is that really non-standard? TAPL is focused on the types, so I'd guess it is interested in asserting that the bottom type has no values, but my impression is that bottom is considered a Haskell value (by Haskell people and documentation).

The standard meaning of "value" in the literature is "result of a successful terminated computation", with "successful" depending on the language at hand. In practice, the set of values is often (but not always) characterized as a subset of the set of programs. This is the notion used in Pierce, Harper, Felleisen-Hieb, and every progress-and-preservation type safety proof that I know of.

Now, it seems you are saying that some parts of the popular Haskell community use "value" to mean "denotation" (interpretation in a model), having in mind some hypothetical domain-theoretic model of Haskell. But this is very murky, as it means that your original statement boils down to "every expression denotes a denotation", i.e., we have equipped the language with a model.

In the context of untyped lambda calculus, bottom is a value.

No, in the (weak) untyped lambda-calculus, values are simply terms in (weak) head normal form. Note that this is also the definition used in the standard paper on operational semantics for Haskell, Launchbury's A Natural Semantics for Lazy Evaluation (p. 3).

Do you have any issue with the point I was making, outside of that definition, which was that Haskell can generalize types for any expression because all expressions denote values (in my sense)?

What is a value "in your sense", precisely? This is not a rhetorical question: if "value" simply means "denotation in some model" as discussed above, your explanation for not having a value restriction would apply to any imperative language which has a model.

My opinion is as follows: if we view non-termination as an effect, then Haskell is actually an impure language. This is somewhat controversial but also consistent with the modern (post-Moggi) view of effects in programming language theory, I think. But then, there is clearly a difference between the "non-termination" effect and the "heap" effect (as found in ML), and it is this difference that allows Haskell to avoid the value restriction. What is this difference? Something related to evaluation order.


I believe you that I'm using the term incorrectly and thank you for correcting me. If you search for "non bottom value" (in quotes), you'll see that it is common, though.

I just mean that Haskell expressions denote things. A reference assignment in ML is an action or side effect. In Haskell we put that action into a monad so that it becomes a first class value. I would say values don't have side effects. (I don't consider non-termination to be an effect, even after Moggi. I acknowledge the viewpoint, but don't find it useful.) I'm curious why it is useful to consider a list containing bottom to be a value, but not bottom itself? This seems like a fairly useless distinction to me.

Edit: I understand the terminology. The emphasis in these semantics is on evaluation / reduction rather than on functional denotation, which is what I prefer. So you can "evaluate" an expression and either get back a value or diverge. I prefer to think of values as abstract entities defined by their functional properties.

Viewing expressions as denoting things (with types simply denoting the structure of those things), it's more or less obvious that generalization is sound. If expressions have meaning that depend on the context then you have to be much more careful and in naive ML it's unsound (edit: or if expressions denote values but there is a more complicated relationship to types, as you would have in the model of imperative language).


I agree with your broad argument, and indeed I share your intuitions: Haskell programs compute things that are "less dependent on the context" than imperative programs, in a certain sense. But I'd like to make this idea more formal. Both Haskell programs and imperative programs (say, Idealized Algol) denote things, yet the denotations of Haskell programs are "more like values" than those of Algol. Can we characterize the difference? I don't know.


Yes, I obviously was struggling with that, too. The thing I was trying to suggest at the end was that maybe it has to do with the relationship between the expression and its type expression. In Haskell, you have an expression e, which denotes a value [e], and a type expression t, which denotes a type [t], and we write "e: t" when [e]:[t]. It's simple and easy to see that this supports generalization:

 forall a. [e]:[t a]   =>   [e]:[forall a. t a]

With an imperative language, the relationship is more complicated. We have to lift into some monad m, perhaps (hand wave). So when you write "e: t", that rather means [e]:m [t]. Then the generalization step would be something like this:

 forall a. [e]:m [t a]   =>   [e]:m [forall a. t a]

That no longer looks true for most m.


An imperative language distinguishes between 'code' and the result of executing the code. So for example a zero argument function is indistinguishable from its result in Haskell, but in 'C' we can reference the function 'f' and its result 'f()'. So in an imperative language all functions are in the IO monad: 'int f(int)' in C would be 'Int -> IO Int' in Haskell. However see my post below, it is not the 'monadic' nature of IO that is special, IO could be any type constructor; it is the fact that type constructors have extensional equality, where functions (due to beta reduction) have intensional equality.

I'll be honest

Both of your posts completely lose me. What does intensional vs. extensional equality have to do with modeling top level programs? We could use continuation passing (with functions) instead of thinking of IO as data and everything would still work. Also, Haskell distinguishes between () -> a and a.


I think we need to agree a few things before I can explain the point about equalities.

How can I get a reference to a zero argument function in Haskell? "() -> a" is not a zero argument function; it's a one argument function whose argument type contains a single value. Here is a zero argument function:

f = 20 + 5

In 'C' I can write:

int f() {return 20 + 5;}

And I can refer to the function as 'f', I can pass the unexecuted function as an argument to other functions, and I can execute it as 'f()'.

Continuation passing doesn't help in a pure program, because it's all still beta reduction.

Eh, still lost

I think I could argue the analogy C:Haskell is better like this:

int x = 20 + 5;                 x = 20 + 5           
int f()                         f :: () -> Int 
  { return 20+5; }              f () = 20 + 5
int g(char b)                   g :: Char -> Int
  { return 20+5; }              g b = 20 + 5
int h(int x, int y)             h :: (Int, Int) -> Int
  { return x + y; }             h (x, y) = x + y

But I won't belabor it because I still don't really know what we're talking about.

Continuation passing doesn't help in a pure program, because it's all still beta reduction.

Doesn't help what? I'm saying you can write programs that do IO in continuation passing style using functions. A program returns an OS command like "print" or "get-keyboard-input" with a continuation function that the OS calls with the result.


But I won't belabor it because I still don't really know what we're talking about.

You could make that comparison (and it's what ML does, as ML does not allow zero argument functions at all, but from this point of view ML is imperative not functional), but it misses the point. Consider:

f = fst (20 + 5, unsafePerformIO (putStrLn "."))

It is clear "f" is a function because we can get it to cause a side effect when it evaluates, yet:

x = (f, f, f)

Does not cause the side effect multiple times. This is because Haskell does not distinguish between a function and its result, it is free to memoise the function result.

Continuation passing won't work safely for this reason.

The point is not about which is the better "model" of 'C' in a functional setting, it's about the fundamental lack of a way to represent a computation in a functional language.

Even if we add a single () argument like this:

f x = ...
(f (), f (), f ())

It does not help because 'f' still only gets executed once, because in functional programming there is the assumption that calling the same function with the same argument always produces the same result, so again we memoise the result. This is a fundamental property of beta reduction systems.
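The sharing claim can be observed with Debug.Trace (a debugging hook, not part of the pure semantics): a top-level nullary binding is a constant applicative form, which GHC evaluates at most once however often it is demanded.

```haskell
import Debug.Trace (trace)

-- A top-level nullary binding is a CAF: it is evaluated at most once,
-- so the trace message appears only once no matter how often f is used.
f :: Int
f = trace "evaluating f" (20 + 5)

main :: IO ()
main = print (f + f + f)  -- "evaluating f" appears once, on stderr
```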

To fix this we need uniqueness types, (see the 'Clean' language), we can then have a type that the 'single value' is guaranteed to be unique every time it is used.

x world = let world1 = f world in (let world2 = f world1 in (let world3 = f world2 in world3))

Where 'world' has a uniqueness type, and hence every application is seen to be a new unique value, forcing the memoised result of 'f' to be invalid, and causing 'f' to be evaluated each time.

However it should now be obvious that Clean does not fix the zero argument case, as you cannot pass a uniqueness type to a zero argument function. Clean solves this by forcing all IO primitives to take a uniqueness type, requiring you to pass a uniqueness type to any function using these IO primitives. However we can see that a foreign function interface would be unsound, because you could import a function with side-effects and not make it take a uniqueness type.

Are you with me so far?

Yah, that's not how all that works, I think

I disagree with your analysis of nullary functions, and you need to thread unique values.

Strange Response

I think I fixed the uniqueness types example.

I don't really see what there is to disagree with when I am sticking to the facts. Let's start with beta-reduction. Do you agree that when you call a function with identical arguments the compiler is free to memoise the result? If so a nullary function has no arguments so it could only be evaluated once.

In Haskell there is no way to distinguish

f = 25

and

f = 20 + 5

Any time I refer to 'f' I get 25 in both those cases. There is no before and after (hence why unsafePerformIO is unsafe).

I think "Frank" has the same analysis of nullary functions and introduces the postfix "!" to execute a suspended computation. We can now differentiate given "f = 20 + 5" between the reference to the function definition "f" and executing the function "f!".

I disagree on your reading

Do you agree that when you call a function with identical arguments the compiler is free to memoise the result?

Well, if you've got a referentially transparent language sure, otherwise, obviously not?

In Haskell there is no way to distinguish

f = 25

and

f = 20 + 5

Yah, sure? Now what? There is no before and after in math; the only way to observe that there is a difference between evaluating a constant and evaluating a complex expression is by measuring time. "f = 20 + neverendingcomputation" sure is observable.

We Agree

Great, so we mostly agree on that. I disagree that the neverending computation is "observable". Nothing "in the system" can observe the divergence, you require a privileged observer to see it. As you say there is no before or after in mathematics, so divergence is "impossible". A series is either finite or infinite, and if it's infinite it is either countable or uncountable. None of these things is a "neverending computation" and no mathematician can have worked forever (and if we believe the universe will end at some point, no mathematician can work forever, hence it is impossible). Computation is something outside of "axiomatic" mathematics (because there is a before and after in computation). I guess you could say the study of computers is a branch of mathematics, but that's just playing word games, I am sure you understand what I am getting at.

Now what? Imperative languages have a model of computation, and have a before and after, so:

int f() {return 20 + 5;}

I can reference the "before" execution with the function pointer "f", and I can reference the after computation with "f()", so imperative languages model computation not mathematics.

If we agree so far, I am having difficulty seeing where we disagree? Maybe you were objecting to something incidental about my explanation rather than something fundamental about what I was trying to explain.


Not really with you. You're not addressing how continuations would work. Note: my understanding is that haskell actually used continuations before IO monad. Also your uniqueness type example looks wrong. Might not be worth trying to sort all this out...

Small Steps

The uniqueness syntax may be wrong, but that doesn't change the essence of the argument (I have tried to improve it; it might have been better to omit the example as it is distracting from the actual point).

So do you agree that if you call a function with the same arguments we expect the same results, so Haskell is free to memoise the result?

Note I am not saying any specific compiler will memoise the results, I am saying following the semantics the compiler is free to make that optimisation. So any reliance on the expectation that the function will get executed every time it is called is unsafe, hence "unsafePerformIO".

How the continuation approach works

Yes I agree Haskell would be free to memoize. As you say, there's a reason 'unsafe' is in the name unsafePerformIO. But the way the continuation approach works is not to rely on side effects of evaluation. The value returned from main is an action to take, and that action contains a continuation function of what the OS is to do next after taking the action. If the action was to get a value from the OS, then that value is passed in to the continuation.
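A toy sketch of that dialogue style (all names here are invented for illustration; early Haskell's request/response stream IO worked in a similar spirit). The program is pure data describing actions; an interpreter standing in for the OS runs it and feeds inputs to continuations:

```haskell
-- A program is a tree of requests with continuations:
data Action
  = Output String Action          -- ask the OS to print, then continue
  | Input (String -> Action)      -- ask for input; OS calls the continuation
  | Halt

-- An interpreter playing the role of the OS:
run :: Action -> IO ()
run (Output s k) = putStrLn s >> run k
run (Input k)    = getLine >>= run . k
run Halt         = return ()

main :: IO ()
main = run (Output "hello" Halt)
```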


Okay, I see you are treating it as a co-routine with the OS. I agree this can work, but this is also a distraction from the point I was originally trying to make (although it is an interesting distraction, and ties in with some work I have been doing modelling state using co-routines).

I like the terminology used in the "Frank" paper I mentioned above, so I'll use it.

Imperative languages provide a way to reference suspended computations without executing them. Simple lambda calculus does not, and neither does the "pure functional" part of Haskell. We can apply Frank's notation to multi argument functions too, so that "f x y" passes the arguments to 'f' but does not execute 'f'. We would execute 'f' with something like (f x y)!

My observation was that in Haskell, we can use type constructors to represent suspended computations, so for the above we can define:

data F a b c = F (a -> b -> c) a b

And we can then construct:

g = F f x y

Now 'g' represents the suspended computation, which we can execute with something like:

result (F f x y) = f x y

So if we agree that Haskell cannot represent a suspended computation as an ordinary function, but it can using type constructors, then I can explain what I was trying to say about equalities.
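Collecting those fragments into one runnable file (force is my name for the deconstructor):

```haskell
-- A suspended binary computation: the function plus its two arguments.
data F a b c = F (a -> b -> c) a b

-- The boilerplate that executes a suspension:
force :: F a b c -> c
force (F f x y) = f x y

g :: F Int Int Int
g = F (+) 20 5   -- the suspended computation, not yet run

main :: IO ()
main = print (force g)
```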

Church encoding

Do I have to agree to that? Can't you always just church encode to represent data with functions?

I'm not trying to quibble. I just don't see where you're going with this.

Church Encoding.

When you cannot distinguish between the function and its result, as in Haskell, then Church encoding does not help. If you provide all the arguments to the function (which you will have done in a pure program that cannot do IO), then all you have is the result.

As Marco points out above, there is no before and after in mathematics, and there is no before and after in a referentially transparent language like Haskell.

I'm not trying to quibble. I just don't see where you're going with this.

You don't need to know where I am going to come on the journey :-) If you agree all the points along the way, then we must come to the same place.

Haskell is referentially transparent

I agree that Haskell only has results, values, and doesn't expose a notion of "evaluation". Is that enough? The problem with not knowing where you're going is that I don't know which objections are just quibbles and which are important.

No Evaluation

Yes, it's enough to accept that there is no notion of evaluation for functions (note it's important that we are only talking about functions at the moment). Because of this equality is intensional as we have "25 == 20 + 5".

So do we agree that an imperative language has a notion of evaluation?


I also agree the only thing you can do with an expression is use its value (so 25 and 20 + 5 are the same). This is just referential transparency.


My understanding is that with referential transparency, the equality where "25 == 20 + 5" is intensional, it is not based on the external structure of the expressions, but the internal meaning (IE they both represent the value "25").

This seems to correspond with Coq's definition of intensional equality, which is to reduce both sides, after which the equality should be trivial.


Sure. I find it a little odd to be talking about intensional / extensional equality in this context, but go on.

Maybe it is odd?

Maybe it is odd, I don't know, but let's look at the imperative side now.

An imperative program has a before and after, and we can define a function like "int f() {return 20 + 5;}", and be able to refer to both the before as "f" and the after as "f()". So in an imperative program there is a notion of executing a computation independently of passing its arguments.

What confuses the issue is that functional languages normally support partial application of arguments. So let's imagine an imperative language with partial application of arguments (something like TypeScript):

function f(x : number) : (y : number) => () => number {
   return function(y : number) : () => number {
      return function() : number {
         return x + y
      }
   }
}
We can supply the arguments without executing the computation something like this "f(20)(5)" which would refer to the computation, and "f(20)(5)()" which would be the result. We might propose a notation where you pass arguments in a curried style with a postfix '!' indicating executing the function, so we might write:

f = add(20, 5)

Is the computation before execution, and

f = add(20, 5)!

Is the result.

So to model computation we need to have a before and after. The interesting thing is type-constructors in functional languages have a before and after.

I think that it is interesting that type constructors behave like this, so that:

data F = F (Int -> Int -> Int) Int Int

This allows us to represent a suspended computation in a functional language like this:

x = F (\x -> \y -> x + y) 20 5

It needs some boilerplate to execute the function, though.

We could then ask what kind of equality do we have on constructed types like:

data F = F (Int -> Int -> Int) Int Int
data G = G Int
if (F (\x -> \y -> x + y) 20 5) == (G 25) ...

In this case we do not expect them to be equal, and in Haskell we cannot even compare them for equality without defining the equality operator for a type constructor. We cannot even do:

if (G 25) == (G 25) ...

without adding definitions for equality on the 'G' type constructor. This sounds like extensional equality because we have to supply the proof of pointwise equality, which would look something like this:

instance Eq G where
   (==) (G x) (G y) = x == y

Interpreting data

You could just as easily use an "Add (Int 5) (Int 20) :: Expression" example, with the "!" operator representing an evaluate function. And you could just as easily Church-encode the Expression type or your F type, separating the data from your evaluation function. If your goal is to separate data from interpreter, it's quite trivial with or without "special" Haskell data types.

If you want to directly convert your imperative examples to Haskell, consider instead something like:

f :: (Monad m) => m Int
f = return (20 + 5)

This gives you your separation of describing f and running f in context of a monad. In this case you could "runIdentity f" to get the result via the identity monad. Or you could introduce ad-hoc effects via type class constraints on the monad. Note that the compiler or runtime could pre-evaluate (20+5), but you could expect the same from a good imperative compiler.

I'm thinking perhaps it is best to avoid "unsafePerformIO" in any discussion of Haskell semantics. IIRC, the Haskell language report doesn't require or even mention it. And it isn't supposed to be used anywhere its unsafe semantics would be problematic.
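A runnable version of that suggestion, instantiating the same monadic description both purely (via the Identity monad) and in IO:

```haskell
import Data.Functor.Identity (runIdentity)

-- A description of a computation, polymorphic in the monad:
f :: Monad m => m Int
f = return (20 + 5)

main :: IO ()
main = do
  print (runIdentity f)  -- "run" the description purely
  n <- f                 -- or instantiate the same f at IO
  print n
```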

Monads are implemented using type constructors

Add (Int 5) (Int 20) :: Expression

Yes I agree, I just wanted the type constructor example to look as much like the function example as possible. My 'F' datatype could encode any binary operator, allowing generic application to be defined. What you have above would be the beginning of an AST for an interpreter (hence the title of your reply).

I disagree about the Church encoding. The point is in a pure functional program we can have no IO, and due to referential transparency there is no before and after. As such the program must start with all the data, and it beta-reduces to the result. You cannot separate the code from the data. This is the whole point I am making, pure functions do not have a before and after, they are intensional, and cannot model IO. Type constructors do have a before and after, they are extensional, and Haskell does use them to model IO.

Monads do exactly what I am describing above, because they are type constructors. The fact that they have a type class with two functions (unit and bind) associated with them is largely irrelevant to the way they separate computation from data. Unit and bind just provide a convenient way of chaining these computations together.

I agree completely that the imperative examples can be written in Haskell using Monads, but to suspend the addition we need something more like:

f = do
   x <- return 20
   y <- return 5
   return (x + y)

Which prevents the addition happening until we have bound both x and y.

Regarding unsafePerformIO, I was just using it to expose the evaluation mechanism for pure functions. My argument does not rely on it, but using it makes the behaviour more easily observable.

Are we there yet?

It sounds like you're proposing that we use an abstract operation tree as semantics for some hypothetical imperative language. Sure, we can do that.

But again, pure functions model IO just fine. With continuations.

Type constructors don't have a "before" and "after" any more than functions. Nor do I know how they are "extensional".

Using Monads doesn't imply you're using data - you can have a Monad that sets up the CPS plumbing, too.

According to the monad laws, your example is equivalent to "return 25".

Type Constructors

But again, pure functions model IO just fine. With continuations.

Actually they don't model IO in a pure functional language. Calling the OS with a continuation as a parameter does not work because we cannot represent the OS function in the language. If, like in Haskell, an arrow '->' is a pure function then we cannot actually represent the OS function as an arrow. What you are suggesting would work in ML where arrows are impure (but then ML is not a 'pure functional' language anyway).

For example this:

print : string -> (succ, fail) -> ()

Still does not work because it is not pure and in Haskell an arrow '->' is a pure function. This means it is still unsafe because the compiler is not aware there are side-effects, and may choose to omit the function because its result is never used, or optimise it out in various other ways. So continuations are not a solution for a "pure functional" language.

Type constructors don't have a "before" and "after" any more than functions.

I think they do, because there is both a before and after construction. A constructed type is not referentially transparent. Consider:

data Ref a = Ref a
x = Ref 20

'x' is not '20' and it does not beta-reduce to 20. We have constructed something that as a value is opaque. We have to deconstruct it to get the contents. We can refer to the 'container' and the value.

y = x                  -- the container
z = (\(Ref v) -> v) x  -- the value

This is the same as with an imperative function where we can refer to the function and its result.

The point is that with anything composed of only functions we can keep applying reductions until the program is complete. There is referential transparency and there is no before and after. Type constructors are not functions; we don't automatically reduce them, and we can refer to the 'container' as well as the value.

Functions are intensional because the reduction rules are baked into the language: thanks to referential transparency there is no 'path' or proof of equality. There are no reduction rules for type constructors; we have to extend equality with new rules for such types. Any equality defined on type constructors has a visible 'path' or proof in the language. Consider:

data Z = Z
data S x = S x

-- 'Equal' rather than 'Eq', to avoid clashing with the Prelude class;
-- a multi-parameter class like this needs MultiParamTypeClasses in GHC
class Equal x y
instance Equal Z Z
instance Equal x y => Equal (S x) (S y)

Here we define equality on a data type as a type-class. The path taken to prove the equality is observable in the language (we could count the number of steps for example). This makes the equality extensional as far as I can work out.
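To make the 'observable path' concrete, here is a runnable sketch of the example above (the class is renamed to Equal to avoid clashing with the Prelude's Eq, and the added `steps` method, my own illustration, makes the proof path countable):

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}

data Z = Z
data S x = S x

-- Each instance selected during resolution is one visible proof step.
class Equal x y where
  steps :: x -> y -> Int

instance Equal Z Z where
  steps _ _ = 0

instance Equal x y => Equal (S x) (S y) where
  steps (S a) (S b) = 1 + steps a b

main :: IO ()
main = print (steps (S (S Z)) (S (S Z)))  -- 2: two S-layers peeled off
```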

You cannot observe the number of reduction steps or the path taken in establishing an equality on functions, as due to referential transparency both sides of the equality are reduced in the language itself. This makes the equality intensional as far as I can work out.

So 'pure functions' cannot model imperative programs because they are intensional. Pure functional programs are intensional because referential transparency means a function application and its result are the same thing. Type constructors can model imperative programs because they are extensional. Imperative programs require extensionality because we need to tell the difference between a reference and the value referenced.


Agreed that you can't write an OS in lambda calculus. I think we need to add more than data constructors, though, if we want Sound Blaster support.

You have a funny idea of referential transparency. Data is referentially transparent the way most people understand the term.

And () -> T and T are different types. Does that mean functions aren't referentially transparent? Is \x. t not a "container" of t? Again, Church encoding shows how to use functions to model data.

Beta Reduction

Data is referentially transparent the way most people understand the term.

If I have a data type like:

data F = F Int
x = F 25

Now I export 'x' but not the constructor 'F'. What can you do with the value? Can you add it? Can you do anything with it? Can you explain to me how this fits with the definition of referentially transparent? I have a reference to the value that I can pass around, but only code in my module can operate on the value passed. This reference seems decidedly opaque to me.

And () -> T and T are different types.

I see your point here: '->' is really just a binary infix type constructor, and we could write this as "Arrow () T". But then functions are something special, they are subject to reduction rules, and I agree that the first parameter suspends the reduction. This technique works fine in ML, where arrows are impure, but not in Haskell, where we expect functions to be pure.

Maybe you are right though, could we define a monad using only functions?

unit : a -> (() -> a)
bind : (() -> a) -> (a -> (() -> b)) -> (() -> b)

You are probably right... the only objection would be '->' is 'pure' and therefore this could not model IO.
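As a sanity check, the `() -> a` encoding does have the monadic shape; a minimal runnable sketch (with `unit` taking the value first, as a monadic return should):

```haskell
-- A computation of type a, encoded purely as a thunk: () -> a.
unit :: a -> (() -> a)
unit x = \() -> x

bind :: (() -> a) -> (a -> (() -> b)) -> (() -> b)
bind m f = \() -> f (m ()) ()

run :: (() -> a) -> a
run m = m ()

example :: () -> Int
example = unit 20 `bind` (\x -> unit (x + 5))

main :: IO ()
main = print (run example)  -- 25
```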

Referential Transparency

x = F 25
y = ... x ...

is the same as

y = ... F 25 ...

That's all that referential transparency means to me (and I think this is the common definition). It means that variable references to a value are equivalent to direct uses of that value. It doesn't mean that no part of the value is hidden with some abstraction mechanism.
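To make the substitution concrete, a small runnable sketch (the `observe` function is my own illustration): replacing a variable with the value it names cannot change the result, even though the representation is hidden.

```haskell
data F = F Int

-- an illustrative observation on the wrapped value
observe :: F -> Int
observe (F n) = n + 1

y1 :: Int
y1 = let x = F 25 in observe x  -- via the variable

y2 :: Int
y2 = observe (F 25)             -- via the value directly

main :: IO ()
main = print (y1 == y2)  -- True: reference and value are interchangeable
```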

This technique works fine in ML where arrows are impure, but not in Haskell where we expect functions to be pure.

It works fine in Haskell, too, for suspending a computation. You're now also asking for the computation to be impure, and of course Haskell doesn't do that. Nor does putting the computation in a data wrapper.


It still seems to me that functions have an intensional equality (due to referential transparency) as we can write things like "25 == 20 + 5".

It also seems that type constructors have an extensional equality, because we need to manually search for a path/proof of equality by defining the 'Eq' typeclass on the type-constructor.

Finally it seems that imperative programs have extensional equality because we can compare the reference/address of an object/program.

So it seems to me algebraically type-constructors have the right kind of equality to represent imperative programs, whereas functions do not.


Functions don't come with built-in equality, either, though. (\x. 5+20) and (\x. 25) are intensionally equal in Coq, but you can't compare them in Haskell. Similarly, (F 5+20) is intensionally equal to (F 25) in Coq, but Haskell doesn't let you compare them by default (without a deriving Eq clause). You also seem to be ignoring the difference between meta-level equality, which declares which denotations of Haskell expressions are seen to be equal at the meta-level, and an in-language comparison function.


I realise you are using a 'trick' of passing an unused argument to those functions, but aren't things different if the functions have no arguments? How do you explain that?

My understanding from looking at HoTT is that intensional equality is definitional: we have no path or proof of the equality, we just get the result, like "25 == 20 + 5" is true, but we don't know how it is proved. An extensional equality is one for which the path/proof has to be found in the language (like my example defining a typeclass, where you can clearly see, step by step, how the equality "S (S Z) == S (S Z)" would be derived).

I think things change drastically if you involve metatheory, because what is an intensional equality in the language can be extensional in the metatheory.

The argument doesn't matter?

I don't think it's different if the argument is used. (\x. x+(5+20)) and (\x. x+25) are the same, intensionally, in Coq. Haskell doesn't provide a mechanism for reasoning about equality, so I don't know what we're even talking about, really.


Well at the start we were discussing the difference between values in a (pure) functional language and values in an imperative language.

I was talking about nullary functions because they demonstrate that functional languages have no model of computation. I guess the point does not apply to functions with arguments, and one solution (which ML adopts) is simply to ban zero-argument functions.

The fact that you cannot reason about equality in Haskell does not stop us reasoning informally about equality, to try and better understand the difference between a value in Haskell and a value in an imperative language.

I get the impression that I am trying to explain something that is trivial and obvious, and the problem is you are looking for something deeper that is not there; it's just something that I personally found interesting. Yes, you can just not use nullary functions, but I think that is missing the point: there is a difference between nullary functions in a functional language and in an imperative language, and I think it is a fundamental one. To say you can use unary functions with a unit argument to work around the problem is to miss the point.

a passing thought

It strikes me as somewhat surprising, from a historical perspective, that in this moderately extended discussion on values in relation to imperative programming, there's been afaict no occurrence of the terms l-value or r-value (nor their expanded forms left-value, right-value). I find myself wondering if this is (a) coincidence of the way the discussion has gone, (b) useful evolution of the way imperative programming is discussed, or (c) successful subversion, by the "pure FP" model, of our ability to effectively discuss its imperative rival?

l-value and r-value

I am very interested in l-values and r-values, and I see the lack of distinction between them in 'C' as a big failing of its type system.

In essence I think r-values are very much like values in a functional language - with the already noted exception that values are not subject to reduction in an imperative language. They do not have a 'location' and they are immutable. Because they have no location the only possible equality they can have is intensional.

l-values are therefore interesting because they are different. They have a location, and are mutable. In essence they are 'containers'. The simplest l-value would seem to be a single-element array. We could copy the ML type system and call this a 'Ref', although this is problematic as it overloads the word reference with three meanings, ownership, non-owning reference, and a relationship between objects.

I like the analogy that an r-value is like an "orange" or an "apple", and an l-value is like a "box": you can put the "orange" into the box, or swap it for an "apple", assuming the box has type "fruit". By default the box cannot be empty.

I would want this distinction to be evident in the type system, so we could type l-values as something like "Box[Fruit]". Now we can distinguish functions that return l-values from functions that return r-values:

f : Fruit -> Box[Fruit]
g : Fruit -> Fruit

In other words I don't want to talk about l-values and r-values, I want to talk about "simple values" that are intensional, immutable and have no location, and "containers" that are extensional (you can compare reference as well as contents), mutable and have a location.
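Haskell's IORef already plays this "Box" role within IO; a small sketch (the Fruit type is my own illustration):

```haskell
import Data.IORef

-- r-values: plain, immutable, location-free values
data Fruit = Orange | Apple deriving (Show, Eq)

main :: IO ()
main = do
  box <- newIORef Orange   -- an l-value: a Box[Fruit] with a stable location
  writeIORef box Apple     -- swap the contents; the box's identity is unchanged
  contents <- readIORef box
  print contents           -- Apple
```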

Relating this to (pure) functional languages, an r-value is very much like a value in a functional language, except that it is not subject to automatic reduction. This makes an imperative language more expressive, because we can observe the reduction. l-values are more problematic for pure functional languages and need tricks like the "State Monad" to encode them. Although they can be encoded in a pure functional way, monads don't compose, which points towards a monadic effect system like those of Eff and PureScript. However, I think relating the imperative language to the functional one lets us see that typing an l-value as simply 'int' is wrong.

So what is the difference between an imperative and a functional language? In the end I think very little, both should differentiate containers and simple-values in the type system. If we have a monadic effect system we should be able to compose mutable objects into the same function without any tricky boilerplate, so this also would converge. We can have everything in the "effect monad" by default and then the compiler can determine if a function is pure by the lack of any effects.

What is left would appear to be the automatic application of reduction rules to values in a functional language, hence my focus on it in the preceding discussion.

pure before and after

Aren't you too focused on IO?

The generic monad example obviously works for pure monads (like Identity or State or ST). Your `f` vs. `f!` works just as nicely if you have something like `f :: Expression` and `! :: Expression -> Expression` - you still distinguish referencing `f` from its behavior/result `f!`.

Church encoding (or Mogensen-Scott encodings, et al.) encode ad-hoc data types. Anywhere you think "type constructors" help, you can use Church encoded data structures to achieve the same. I feel this is trivial and should go without saying, and it confuses me that you seem to believe there's an important distinction to be made here.

Separation of code and data in purely functional systems is literally a spatial/syntactic separation: code in one expression, data in another; code in one module, data in another; code in one file, data in another; etc. A third expression/module/file might bring code and data together, but they can still be maintained separately. We ultimately have both space and time even within purely functional computations. Sure, you can't observe sensors or ask the network within a pure evaluation, but that isn't a necessary condition to distinguish `f` from `f!`, `before` from `after`, `data` from `code`, etc..

Mutability and Location

The generic monad example obviously works for pure monads (like Identity or State or ST). Your `f` vs. `f!` works just as nicely if you have something like `f :: Expression` and `! :: Expression -> Expression` - you still distinguish referencing `f` from its behavior/result `f!`.


Church encoding (or Mogensen-Scott encodings, et al.) encode ad-hoc data types. Anywhere you think "type constructors" help, you can use Church encoded 'data' to do the same. I feel this is trivial and should go without saying, and it confuses me that you seem to think there's an important distinction to be had here.

There is a difference between functions and data, and perhaps I am not clear what it is, or maybe I am not expressing it very well. In the OOHaskell paper we show that an object is the fixed point of a record constructor:

data Object this = Object {
   get :: this -> () -> State Int Int,
   set :: this -> Int -> State Int ()
}

x = fix Object

The fixed point is a 'new instance' of Object. Could we do this with Church encoding? Something like this:

pair = \x -> \y -> \z -> z x y
first = \p -> p (\x -> \y -> x)
second = \p -> p (\x -> \y -> y)

-- a state computation maps an initial state to a (result, state) pair
rtn x s = pair x s
get s = pair s s
put x s = pair s x

bind m f s = let p = m s in f (first p) (second p)

test = put (24 :: Int) `bind` (\_ -> get) `bind` (\x -> rtn x)

main = putStrLn (show (second (test 0)))

Of course the obvious difference is that this is structurally typed: there is no 'name' for the monad, and there is no encapsulation of the state. So the only difference here might be just the difference between nominal and structural typing.

Church encoded objects

It is not difficult to Church-encode records and union types. And, of course, the fixpoint function isn't anything special in lambda calculus.

OOHaskell seems to use some typeclass magic to distribute the generic "fix" over record elements. Without assuming typeclasses, we could get that by representing our record as a Church encoded trie that permits iteration over elements. Or we could simply use a different fix wrapper tuned for each object type.

My Awelon language has only one primitive data type - the function. All data, without exception, is Church encoded (or a similar encoding). Of course, naively this performs terribly. My idea is to accelerate common useful encodings and the functions that manipulate them, e.g. the compiler/runtime can use machine words to work efficiently with natural numbers and perform basic arithmetic, but semantically they're still Church encoded.
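For readers who haven't seen it, here is a minimal sketch of "naturals as functions"; an accelerated runtime would map these to machine words without changing the semantics:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Church natural applies a function n times to a starting value.
type Church = forall a. (a -> a) -> a -> a

zero :: Church
zero _ z = z

suc :: Church -> Church
suc n s z = s (n s z)

add :: Church -> Church -> Church
add m n s z = m s (n s z)

-- Only at observation time do we pick a concrete representation.
toInt :: Church -> Int
toInt n = n (+ 1) 0

main :: IO ()
main = print (toInt (add (suc (suc zero)) (suc zero)))  -- 3
```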

I've never had any difficulty encoding such things as streams, fibers, transducers, generators, communicating processes, etc. that tend to return an updated object state after every operation. A fixpoint function is used quite liberally in these cases.

I agree that Church encoding tends to be structural rather than nominative. Though a language could support nominative types and encapsulation via special type system support or annotations and linters.

Nominal vs Structural

The key difference between the Church encoding and datatypes seems to be that the former is structurally typed and the latter is nominally typed.

Looking at it from a type system point of view, imperative languages (at least the familiar ones) tend to be nominally typed.

So considering the difference between values in imperative and functional languages the design space we need to consider seems to be:

- nominally typed Vs structurally typed.
- implicit Vs explicit reduction.

Can we create a monadic effect system using Church encoding? To make the composition work you need row polymorphism. What is the structurally typed equivalent of row polymorphism that could be used for a Church encoding?

Re: nominative versus structural

There's no requirement for data types to be nominative. Conversely, whether Church encoded types are structural really depends on other language features, especially static analysis that may reject programs that apply certain functions outside of a controlled context.

Structural typing, IMO, is the most natural in either scenario. That's why nearly all dynamically typed languages, regardless of imperative or functional, are effectively structural. It's much simpler.

Anyhow, the easiest way to do monads in Church encoding is to use a single monadic type that is generic for effects, such as the operational monad. (Really, this is the easiest way to do monads in any language, even Haskell, as advocated in "Free and Freer monads" and elsewhere.)

And we can Church encode extensible variants and row polymorphic records. Simply use a type-level trie to model the record (e.g. use pair (x,y) to represent left and right paths in the bit trie, and use an encoding of labels into bit strings, and use unit type for empty areas of the trie). Similarly, use a left-right-left sum path to encode bit string labeled variants (and zero type for unused labels).
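A minimal, monomorphic sketch of the pair-trie idea (two levels of Church pairs give a four-slot record; field labels are 2-bit left/right paths):

```haskell
-- Church pair and its two projections
pair :: a -> b -> ((a -> b -> c) -> c)
pair x y k = k x y

left :: ((a -> b -> a) -> a) -> a
left p = p (\x _ -> x)

right :: ((a -> b -> b) -> b) -> b
right p = p (\_ y -> y)

-- a 4-field record as a depth-2 trie; field "01" = left then right, etc.
rec = pair (pair (1 :: Int) (2 :: Int)) (pair (3 :: Int) (4 :: Int))

main :: IO ()
main = print (right (left rec), left (right rec))  -- (2,3)
```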

Performance is an issue, hence we need accelerators. But semantically it's no problem to work with extensible heterogeneous types via Church encoding.

Nominal typing of data

Data types are often nominal for a good reason - structural typing of recursive data types is often undecidable. I still think names are really important and need better language support. I recall you were going in the other direction :).

re: nominal typing

I grant that nominal typing can be convenient. Mostly for type-associative structure. For example, if a Map<Key,Value> type uses a comparison function associated with the Key type, it's easy to efficiently compare or merge map structures. With structural types, OTOH, it isn't clear how we should compare keys in general. We might provide a comparison function to a Map wrapper, but we cannot observe whether two maps use the same key-comparison function.

I use a few simplistic ways around this that work nicely in practice, or at least seem promising in scope of a few limited experiments. I can use annotations to assert at a meta-level that two functions should be structurally equivalent, for example, and hence that two maps should have compatible key-order.

But regarding decidability, I suspect that's more an issue of "inferred vs. manifest" than of "structural vs. nominative". Nothing prevents us from annotating common functions to constrain structural types below what the functions naively would permit, which is something nominative types do by default.

And yeah, I'm quite skeptical regarding the semantics surrounding names. So I tend to avoid names entirely where feasible.

Regarding decidability, yes,

Regarding decidability, yes, I think the main issues are with inference and subtyping, and I concede there are probably ways of making a usable system on top of structural types.

I think I understand your skepticism of names. Names (esp. character strings) seem like a bit of hack to get a stable pointer to some feature of a codebase.

Path of Pairs

The path through pairs is obvious now you mention it; it's how the HList lists were constructed. I must admit there is a certain elegance to doing it all with Church-like encodings, but the structures do get complex, and I think you need abstraction to make all this usable. How do you abstract a structural type?

abstracting structural types

It is true that sophisticated types tend to have complex descriptions. Also, based on your several earlier comments, I suspect you're specifically interested in implementation-hiding abstractions. But I'll cover the complexity issue first.

For composing large types, that isn't much more difficult than other values. We just need some flexible means to describe, compose, parameterize types, like we do other values.

One option would be to support type-level lambdas, plus universal quantification, and perhaps a fixpoint type.

type Unit = ∀a.a->a
type Pair = Λa.Λb.∀c.(a -> b -> c) -> c 
type PairOf = Λa. Pair a a
type Either = Λa.Λb.∀c. (a -> c) -> (b -> c) -> c
type InL = Λa.Either a 0
type InR = Λb.Either 0 b
type Nat = μN.Either Unit N

Presumably, we could then Church-encode type-level constants (natural numbers, true, false, pairs) and ad-hoc computations at the type level, and provide a little syntactic sugar (for texts, numbers). A minor disadvantage is that our type descriptors are thoroughly Turing-complete, but that isn't any more of a problem than Turing-complete macro systems.

Of course, since all of this effectively replicates our original language - modulo a few special type descriptors (like -> and 0) - a reasonable question is why we're bothering with a separate language. A reasonable alternative is to just use the plain old language to build "type descriptors" according to some AST recognized by our static analysis tools, and to separate this by simple naming conventions, perhaps:

foo.type = (foo type descriptor as value)
foo = (foo body)

Types thus become plain old values - subject to the normal abstraction, refactoring, composition, etc.. Further, it opens the door to more conventions for static analysis and linting (e.g. adding a `foo.contract`).

This leaves the challenge of implementation-hiding abstractions. The best approach I've found uses a concept of value sealing/unsealing annotations, plus simple access control:

(:foo)(.foo) == id

In this case, I introduce pairs of functional annotations that when composed are equivalent to the identity function. We 'seal' a value by applying `(:foo)`, and unseal it by subsequently applying `(.foo)`, and it's easy to understand that a sealed value has a corresponding sealed value type. Any number of symbolically distinct sealers can be formed. For access control, we simply have a linter or type-checker raise an error when these are directly used outside of functions prefixed by `foo.`. This binds direct use of sealed value types to a subset of a codebase based on function names, which simplifies reasoning about who can construct or observe an abstract data type.

Hence, we effectively simulate nominative types and abstract data types. But we get it by structural seals, linting, and a few ad-hoc codebase constraints rather than by any semantic-level feature.

Sealing Values

What is the type of sealed value? How does sealing/unsealing differ from type constructor/destructor pairs?

Sealed values

A sealed value type is similar to a 'newtype' in Haskell.

It isn't a "data constructor" because the seal type carries no observable information (no "data"). It's just a symbolic type constraint for software engineering purposes. You could say it's almost equivalent to the most trivial of data constructors (e.g. `data F a = F a`).

The analogy to newtype constructors isn't perfect. In Haskell, you can still define useful associative structure (via typeclasses) on a newtype or singleton data type. But erasing all seal/unseal annotations from a type-safe program will not affect its observable behavior.

Aside: The reason I call them seal/unseal is actually a reference to the object capability pattern. Of course, with serializable values, we could also 'seal' via encryption.
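In Haskell terms the analogy might be sketched like this (all names hypothetical); the pair composes to the identity, so erasing both never changes observable behavior:

```haskell
-- Hypothetical sketch: in practice the constructor would not be exported,
-- so only trusted code could apply or remove the seal.
newtype Sealed a = Sealed a

seal :: a -> Sealed a        -- plays the role of (:foo)
seal = Sealed

unseal :: Sealed a -> a      -- plays the role of (.foo)
unseal (Sealed x) = x

main :: IO ()
main = print (unseal (seal (42 :: Int)))  -- unseal . seal == id
```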

Names and Identity and Modules.

So, as a type alias there is no data hiding. This really seems to take the idea of functional programs separating code and data to a new level, applying the same idea to the type system. As someone interested in type systems, I think this is fascinating, and very different from what I am working on now. At the moment I don't think there is any way to synthesise this with my current goals; it's more like something interesting to work on afterwards (or when I hit a roadblock). Definitely something I want to think about some more.

Having said that, with my current goals in mind, I think "naming" is a huge thing for humans. We categorise by naming, and by doing so abstract away whole levels of detail. When I talk about cars, I abstract away all sorts of detail about internal combustion engines, diesel engines, even electric motors, yet I am clearly not talking about trucks, vans, boats or aeroplanes.

One of the key design theories I am exploring is that the human brain can only deal with certain maximum levels of complexity in certain modes of thought. As such, a programming language should try to use the thinking modes where the brain can natively handle the most complexity. This will naturally relate to the tools we use most often every day in language and mathematics. On this basis my current thinking is that nominal is the way to go. You can drive a car; in other words we have verbs that apply to categorisations, so typings like "drive :: (Car, Location) -> Route -> (Car, Location)" can make sense. Although I would say that functional types like that lose the identity of the car: what seems to be missing is something that says the Car returned is the same car that is passed, which is kind of going in the opposite direction to structural typing, towards introducing the concept of object identity (in the sense of "my car") to the type system.

A lot of this depends on "phantom types", where we can introduce a unique type-level label for each Car, as in "drive :: (Car C237, Location) -> Route -> (Car C237, Location)". Of course you could do this with "newtypes" as well, but I think data hiding (and type hiding) is important in managing the complexity. We want assurance that nothing outside the "object/module" can affect that which is inside it.

Structural typing allows polyadic functions to be written, so maybe data-hiding the structure at the module level makes the most sense. I think I would still want to have some type operators like sum and product, as it would simplify the types having 'a * b' for pair, rather than 'a -> b -> c -> d' which doesn't tell us a lot.

If modules have data/type hiding though, doesn't that take us back to nominal typing, if we make modules first class, as they would still appear in type signatures?

If modules have data/type

If modules have data/type hiding though, doesn't that take us back to nominal typing, if we make modules first class, as they would still appear in type signatures?

Yes. One idea I explored concurrently with use of value sealers was use of existential types for first-class modules. This would work something like this:

foo.type = ∃a.(Record of ops on `a`)
foo = (Record of ops on concrete A)

This essentially represents the signature (foo.type) and implementation (foo) of a first-class module. Clients of `foo` cannot directly observe foo's concrete implementation type `A`, but can work indirectly with the existential `a`. This would be enforced by the type checker.
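In Haskell this pattern can be sketched with an existential (the Counter signature is my own illustration): clients get the operations but never learn the concrete state type.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- signature: a hidden state type a, with constructor, step, and observer
data Counter = forall a. Counter a (a -> a) (a -> Int)

-- one implementation; its use of Int is invisible to clients
intCounter :: Counter
intCounter = Counter (0 :: Int) (+ 1) id

-- a client can only combine the packaged operations
useCounter :: Counter -> Int
useCounter (Counter z inc obs) = obs (inc (inc z))

main :: IO ()
main = print (useCounter intCounter)  -- 2
```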

The problem here is that clients of `foo` can do stuff like construct lists of `a`, and potentially pass them to other clients of `foo`. Thus, we end up needing some way to describe the type "foo's a". And that's effectively the start of nominal types.

Value sealers, for comparison, are second class but much easier to model structurally. There is no need to speak of origin "foo's sealer", just a need for simple access control so we know the sealer could only be directly used by foo. That is trivially achieved by a linter, a simple syntactic analysis.

Aside: I grant that working with nominal types might better fit some people's informal thinking. OTOH, that informal thinking has a lot of problems even outside of computation (cf. Grandfather's Axe, Ship of Theseus, metaphysics of Identity and Change). I don't trust it. To preserve identity for a Car, I'd reach instead for substructural types. Substructural types work nicely with structural types and sealers and offer a formal framework for what it means to preserve certain features of structure within a computation. Substructural types support a clean separation of concepts between implementation hiding, process, and identity.

Substructural Types, Locations and Type-Classes

Substructural types deal with ownership, not identity. With linear types we are continually consuming and creating fresh instances of 'X'. This seems to me to be the opposite of preserving the identity of 'X'.

I think the 'Ship of Theseus' paradox does not apply, as there is a simple test of identity in a computer: if two things have the same memory address they are the same thing; if they have different memory addresses they may be equal, but not identical. Another way to consider this is with mutation: if I mutate 'X', all the aliases of 'X' change.

For me the type system has to deal with this kind of alias analysis, and adding an 'identity'-phantom type to 'Car' allows us to track which 'Cars' are aliases of the same object in memory.

I don't see how substructural types help here?

Regarding existential types as modules, does that work in a similar way to type-classes, so that if we have an existential type 'a' such that a is a 'Vehicle', we can then call any 'Vehicle' operation on the existential, but we do not know what actual vehicle it is (Car, Bus, Truck etc...)?

Identity and aliasing

Comparison of memory addresses doesn't work in general, e.g. in context of: moving garbage collectors, flyweight pattern, weak refs, memory mapped files, proxy objects and facet patterns, distributed objects and sharding, object relational mappings, etc.. Even more physical notions of address don't work with virtual memory paging, remote direct memory access, caching, etc.. I wouldn't be so eager to dismiss Ship of Theseus as irrelevant.

In any case, you seem to be conflating aliasing with identity. I don't usually do so. I don't consider the ability to "test for identity" to be essential for identity or aliasing. Rather, identity and aliasing are prerequisites to usefully "test for identity". Thus your use of a test for identity in your argument is assuming a lot more than a minimal concept of identity.

One thing substructural types can protect is the set of properties of an object that we use to identify it, such as making a license plate or VIN component linear. But more importantly, even without protecting such properties, substructural types enable developers to reason about the origin and fate of whatever they consider to be objects. I think it is this ability to reason temporally that aligns with the human notion of object identity; it has nothing to do with addresses or aliasing or reference equivalence.

Regarding aliasing, there is actually a variant of the object-oriented programming paradigm that explicitly separates object and reference, called "concept oriented programming". Aliasing is a consequence of interacting with objects in a shared context through references. This is a feature that must be modeled more explicitly in functional programming and concept oriented programming: yet another separation of concepts. There is no fundamental requirement that references used for aliasing be global or stable or comparable; every reference could be a unique nickname or a relative path or a functional lens or whatever.

Mutation and Identity

Maybe identity does not have to be tied to location. The most important property is that if I mutate an object via a reference, then any reference to the identical object will see the mutation. A reference to an equal but not identical object (a copy) would not see that mutation.

If you do not have mutation in the language then only value equality is left (identity simply becomes the same as value equality).
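One way to see that the "sees the mutation" test doesn't require addresses: model a reference as a getter/setter pair (a simple lens) into a pure state value, so aliases are just lenses with the same focus. A minimal sketch:

```haskell
-- a reference into state s is a way to read a part and to rebuild s
type Ref s a = (s -> a, a -> s -> s)

fstRef :: Ref (Int, Int) Int
fstRef = (fst, \x (_, b) -> (x, b))

alias :: Ref (Int, Int) Int
alias = fstRef  -- same focus, hence an alias of fstRef

main :: IO ()
main = do
  let s0 = (0, 0)
      s1 = snd fstRef 42 s0   -- "mutate" through one reference
  print (fst alias s1)        -- the alias sees the update: 42
```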

stateful aliasing

You're conflating too many concepts, I think.

We could talk about overlapping objects, such as array segments, that are obviously NOT identical (perhaps they have different sizes) yet allow shared observation of a state mutation. Thus we have stateful aliasing without identical objects.

We could talk about anaphoric references, such as "the previous page". Obviously the meaning of the reference itself is unstable and contextual, but may sometimes refer to the same object as another reference.

In cellular automata, we can speak of "gliders" and "guns" as human-meaningful objects with identity. But how would you reference or mutate such things? Where does stateful aliasing get involved? Perhaps the human notion of identity doesn't need those things.

Similarly, with linear logics we can model finite resources and trades without ever introducing a notion of references or mutation, yet it's clear that finite resources have some form of what humans consider identity. If I spend a linear coin, that coin has a clear origin and fate that can be tracked through the computation.

Even in languages without pervasive mutation, we can model mutation and aliasing, e.g. using a State monad. We can model references to sub-components or sub-volumes of the state value. If we optimize for fine-grained refs, we effectively get an ST monad.
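As a rough sketch of that idea (the `Ref` type and the hand-rolled `State` monad here are my own illustration, not a standard API): a "reference" into a State-monad computation can be modeled as a getter/setter pair, i.e. a functional lens into the state value.

```haskell
-- Minimal State monad (self-contained; normally from Control.Monad.State)
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f sa = State (\s ->
    let (a, s1) = runState sa s
    in (f a, s1))

instance Applicative (State s) where
  pure a = State (\s -> (a, s))
  sf <*> sa = State (\s ->
    let (f, s1) = runState sf s
        (a, s2) = runState sa s1
    in (f a, s2))

instance Monad (State s) where
  sa >>= k = State (\s ->
    let (a, s1) = runState sa s
    in runState (k a) s1)

evalState :: State s a -> s -> a
evalState m s = fst (runState m s)

-- A "reference" into state s is just a getter/setter pair (a functional lens)
data Ref s a = Ref { getR :: s -> a, setR :: a -> s -> s }

readRef :: Ref s a -> State s a
readRef r = State (\s -> (getR r s, s))

writeRef :: Ref s a -> a -> State s ()
writeRef r a = State (\s -> ((), setR r a s))

-- A ref aliasing the first component of a pair-shaped state
fstRef :: Ref (a, b) a
fstRef = Ref fst (\a (_, b) -> (a, b))

demo :: State (Int, Int) Int
demo = do
  writeRef fstRef 42   -- mutate through the modeled reference
  readRef fstRef       -- observe the mutation

main :: IO ()
main = print (evalState demo (0, 0))  -- 42
```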

The human notion of 'identity' is rather slippery and fuzzy. At best we approximate it in software and programming. And there are various ways to approximate it. You seem focused on the OOP realization of the identity concept, but if you take a hundred steps back to your original goal of aligning with human thinking, you should be open to many other possible realizations of identity (including use of substructural types). Meanwhile, stateful aliasing is a more concrete, formal, meaningful concept that is common in OOP. But it isn't quite the same as identity.

References as abstract values

Many (most?) human abstractions have the problem of being hard to pin down precisely in terms of the underlying physics. I think you're being too hard on references when you single them out with the Grandfather's hammer paradox, etc. Before you have trouble tracking your Grandfather's hammer through time, I think you'd already have trouble precisely locating it in space. Which atoms, exactly, are part of the hammer? And you'd certainly have trouble locating Grandfather precisely.

I think the simplest semantics for names is that they are an abstract set of labels that exist with a set of functions from the label set to various spaces that define properties.

Re: names

References can be modeled effectively as abstract names in a probably-monadic context (like ST or IO). OTOH, use of names encounters those spatial-temporal challenges when we try to model communication between contexts, like an interactive computation between ST processes, or an incremental computation within a parent context where intermediate results might contain references or names.

I feel that basing identity on a formal abstraction of names is awkward and perhaps even backward. It is quite different from how we identify and name objects through our perception and language.


I think "abstract entity" pretty much matches how we think about identity. We imagine each person to be an abstract entity with associated properties, the most fundamental being a "body" with a continuous path through space-time. All forms of identification that we use are just proxies for this basic idea.

I don't know which challenges you're referring to. Can you explain the incremental-computation-in-parent problem? Allocation of names is one possible sore spot, since fresh names are usually obtained temporally (alloc in a monad), but we ought to support spatially based allocation, too.

use of names

I've experimented in the past with attempts to model variations on the ST monad to support concurrency, parallel pipelines, coroutines, incremental computing, and backtracking. I eventually gave up, but those efforts and the surrounding research did lead me to explore use of first-class Kahn Process Network values as an alternative that's considerably more promising.

There are a lot of challenges surrounding use of names in context of incremental computations. As you mention, allocation of names seems a bit dubious for a lot of use cases (e.g. anything involving memoization). But a bigger issue in context of incremental computing, coroutines, etc. is that names can escape the scope of the computation. Although those names cannot be directly used by the recipient, they can generally be fed back into the computation at a future time.

Besides creating a huge headache in the form of (logically) distributed GC, this introduces other challenges. For example, names allocated in a computation's future can be fed back into a representation of that same computation's past, if we have any backtracking. And it isn't easy for a type system to keep track of abstract names originating from first-class values.

I experimented with use of modal types, which seem suitable for tracking 'places' in a computation. I also experimented with requiring clients to wrap/rewrite names that escape, e.g. using an intermediate rename table, sort of like URLs. But neither of these really solved any problems (except the potential to safely delete a global reference locally).

Names are easy to use in the simple use case of a flat, global namespace. But they grow a lot more ad-hoc and complicated the moment we have more than one concurrent naming context, especially if those contexts are first-class.

Dropping names

You're right that implementations can be tricky, but we don't have to eliminate all errors statically. Leaked references existing past their lifetimes don't need to result in a memory leak or a GC nightmare. If the program is aware of which references are active, then accessing an inactive reference can be an error, detected statically where we can but dynamically in general. This looks a bit like manual memory management, lite.

I think the most important thing is getting the semantics right. Backtracking is an issue here. If you resume a computation, and it generates a fresh name, and then you backtrack and resume it again, does it re-generate that same name? Or a distinct one? You usually want the latter. But once the fresh name is generated, you want to be able to use it at different times in the computation.

Name semantics

Getting the semantics right isn't easy. With "runST" we have pure, referentially transparent, memoizable computations. Ideally we'd have the same even for incremental computation. Hence, we should reproduce the same names after backtracking (at least when the incremental inputs are the same). And I really would prefer static type safety. (Even further, stability properties would be nice, so we can ensure some names remain stable over certain small changes in the expression or input.)

Anyhow, I have never encountered or discovered any semantics for names that I would consider simple and satisfying. So I don't favor their use as a language primitive. Maybe we can find some way to use types to track a set of active names or something, but my efforts are stuck.

Taking names

I was imagining that runST itself would be non-pure, getting its names from a (more) global context. Thus you'd have different names when you re-ran. But if you're trying to re-run a computation and want to reproduce the same result, then yes, you should be able to set up your names to work that way too. Having reproducible names is going to make allocation (at least) slower, though, because it will require a fancier implementation.

Regarding type safety, this pattern will happen: an API exposes a set of codes that evolves over time; some API calls return codes and some accept previously returned codes. Statically detecting all and only stale code uses would run afoul of Rice's theorem. You can restore "type safety" by forcing the client to check that his code is still valid, even when he "is sure" it is, but the client code will probably just "assert false" if it isn't, because there isn't anything more sensible to do. I'm not a fan of ruling out dynamic failure in a way that just shifts the problem to the client. IMO, just fail, and indicate in the IDE that the potential failure is present. Let the programmer try to prove it doesn't happen.

Static Naming

I am not sure I follow this. The way I think about naming (nominal typing), all names get assigned statically at compile time. Some names are relative to the stack position, but that does not mean the names change when the runtime computation backtracks.

Names aren't just for types,

Names aren't just for types, and my discussion with Matt isn't specifically about types. Every "new" call in OOP, or "newIORef" or "newSTRef" etc. can be understood as creating or allocating an abstract name (in context of the monad's runner).
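For instance (a minimal sketch; `fresh` is an illustrative name of my own), each newSTRef in an ST computation can be read as allocating a fresh abstract name scoped to that particular runST:

```haskell
import Control.Monad.ST
import Data.STRef

-- Each newSTRef can be understood as allocating a fresh abstract name,
-- scoped to this particular runST computation (the monad's runner).
fresh :: Int
fresh = runST (do
  r <- newSTRef (10 :: Int)  -- allocate an abstract name 'r'
  modifySTRef r (+ 5)        -- mutate through the name
  readSTRef r)

main :: IO ()
main = print fresh  -- 15
```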

Existential Abstract Names

I think you can deal with a lot statically, at least that's the way I read the TAL paper. Static values get known static abstract names, and "malloced" memory would get an existential abstract name, as we know each chunk of memory from malloc is unique. Then we track aliases by tagging these abstract names as phantom types to the objects and the references to the objects.

Runtime names are just the object's extent in memory (start address and size). I am not saying there are no problems with this approach; it just seems the best compromise and reflects the way the computer actually works.

Categories and objects

We don't just name individuals (like my dog); we also name categories. "Glider" and "gun" are names of categories, because a glider is not composed of the same cells over time and does not literally move; rather, the pattern evolves. Eventually everything is categoric, but some categories have only one object in them. My dog is unique; a clone of my dog is not the same dog.

I kind of agree about array segments, but the objects in the array would be the same. By introducing collections of objects you are making things more complex. I still don't see this as a problem for identity; it's just that instead of saying things with the same location are identical, we would say things with the same location and size have the same identity. If caching is transparent, objects on different CPUs can be considered identical; if it is not, then they must be considered different.

I am certainly not focused on OO for my definition of reference. I am considering two things: the Von Neumann architecture (computers are RAM machines, and this is efficient) and simple human understanding, the kind that develops in early childhood, not the sophistication that we pile on top of it.


I actually intended for you to name a glider, or a gun, not the whole category. But even if you name a glider, gun, or your dog, you cannot modify/mutate it through that name. In general, you cannot even identify it through that reference. How we use identity in real life, and our intuitions surrounding it, seem very different from the simplification of identity in the context of OO or the Von Neumann architecture.

Anyhow, I'm not arguing that your approach to identity is bad. But rather that you should keep your mind open to other possible approaches to identity rather than effectively insisting yours is the canonical one. You essentially rejected use of substructural types to support "identity" because it's not about memory addresses, aliasing, and imperative mutations. I would argue that the concept of identity, and our intuitions for it, were never about those things in the first place.

Substructural Types.

I think substructural types are useful for resource management, however I think 'alias' types are necessary to type native machine code (see all the work on Typed Assembly Language, where they started with linear types, and explain why they are not enough before introducing alias types).

In any case, I think "rejection" is a bit strong, it is more like substructural types are not the complete solution.

External References

I think at least some of your problem with identity comes from confusing a local reference (something in the memory of the computer) with an external reference (something outside of the computer).

In the computer's memory, two objects are identical if they have the same address and size.

However, we can have multiple records all referring to the same 'employee' or 'user' in the computer's memory.

The computer's memory is not the world; it is a model of the world, so we often have to deal with the computer's native identity and the identity in the model at the same time.

I think a type system should stick to clearly defining the native identity, but should be adaptable enough to allow us to build sound and checkable models on top of it.

In this way you can build a model of the identity of your grandfather's hammer that works and is sound and checkable against the native machine operations.

Regarding existential types

Regarding existential types as modules, does that work in a similar way to type-classes, so that if we have an existential type 'a' such that a is a 'Vehicle', we can then call any 'Vehicle' operation on the existential, but we do not know what actual vehicle it is (Car, Bus, Truck etc...)?

Yes. And importantly in context of my earlier discussion, we can also work with a `VehicleFactory` that can construct abstract vehicles of an unknown Vehicle type `a`.

Vehicle Factories

Could you elaborate a bit around Vehicle Factories? Does this mean I can create a new instance with the same 'type' as something in an existential module? Do you have a simple minimal example of this?

existential constructors

Something like:

class Vehicle a where
    -- ... vehicle operations

class (Vehicle a) => VehicleFactory a where
    newRandomVehicle :: RandomSource -> a

With existential types, in general, we can have constructors.

Here's something more useful, based on a simple graph language I vaguely remember reading about several months ago:

type Graph = ∃g.
  { empty :: g
  , singleton :: g
  , union :: g -> g -> g
  , join :: g -> g -> g

  , countVertices :: g -> Integer
  , countEdges :: g -> Integer
  }

Here 'union' would compose two graphs without adding any edges while 'join' starts with a union but simultaneously adds an edge from every vertex in the first graph to every vertex in the second graph.

Of course, an efficient implementation doesn't need to actually track the edges and vertices. It's sufficient to track a count of edges and a count of vertices, at least until we add more observer functions.
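The count-only representation just described can be sketched directly (a minimal sketch; the names `G`, `emptyG`, `unionG`, `joinG` are my own, standing in for one implementation of the existential signature above):

```haskell
-- One possible hidden representation: track only vertex and edge counts.
data G = G { verts :: Integer, edges :: Integer }

emptyG :: G
emptyG = G 0 0

singletonG :: G
singletonG = G 1 0

-- union composes two graphs without adding edges
unionG :: G -> G -> G
unionG (G v1 e1) (G v2 e2) = G (v1 + v2) (e1 + e2)

-- join is union plus one edge from every vertex of the first
-- graph to every vertex of the second: v1 * v2 new edges
joinG :: G -> G -> G
joinG (G v1 e1) (G v2 e2) = G (v1 + v2) (e1 + e2 + v1 * v2)

main :: IO ()
main = do
  let g = joinG (unionG singletonG singletonG) singletonG
  print (verts g, edges g)  -- (3,2)
```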

Anyhow, in context, the ability to both construct and observe objects is what leads to the challenges for nominal types. Basically, the issue is that we need to somehow track that our graph or whatever originates from a specific existential value.

Church Encodings and Existentials.

I'm a bit confused by this. If 'g' is some structural Church encoding of a set, then we know its type, and it's not really existential, as all sets would have the same type.

Doesn't this kind of existential encapsulation only work with nominal types where we can give the same data structure an arbitrary name from an infinite supply, therefore we can have a unique but unknowable name for each unique set?


In context of the graph type above, two different *implementations* of that type could use two different data representations for `g`. E.g. one could use a count of vertices and a list of edges, while another could use a count of vertices and a count of edges.

The restriction that the data representation is not observed (outside of the graph operators) could be enforced by a type-checker, which can reject programs that attempt to directly observe the value. (In terms of Church Encoding, "observing" a value means "applying" it.)

And I've already mentioned the challenges surrounding this that effectively link existential types to nominative types:

The problem here is that clients of `foo` can do stuff like construct lists of `a`, and potentially pass them to other clients of `foo`. Thus, we end up needing some way to describe the type "foo's a". And that's effectively the start of nominal types.

Two implementations, same structure

I agree that hiding the structure is not necessarily the problem, but without names, two different implementations that use the same structure don't seem possible? I guess you could add some kind of 'phantom value' to the structure to differentiate them?

same structure no problem

It's ultimately up to the typechecker to prevent incorrect use of a type. For existential types, this means knowing something about its origin. Within Haskell, this is achieved by severely constraining scope of an existential type. We can model something like:

data GraphClass g = GraphClass
  { empty :: g
  , singleton :: g
  , union :: g -> g -> g
  , join :: g -> g -> g
  -- and so on
  , countEdges :: g -> Integer
  , countVertices :: g -> Integer
  }
data AGraphClass = forall g . AGraphClass (GraphClass g)
type Graph g = (GraphClass g, g)
data AGraph = forall g . AGraph (Graph g)
testGraph :: AGraphClass -> AGraph
testGraph (AGraphClass c) = 
   let g = {- bunch of joins and unions, etc. -} in
   (AGraph (c,g))
edgeCount :: AGraph -> Integer
edgeCount (AGraph (c,g)) = countEdges c g

In this case we aren't using typeclasses, and it isn't a problem if we have multiple GraphClass implementations that all use the same data representation type.

What Haskell does is logically introduce phantom types when we open the AGraph type, and it forbids escape of the existential type from the scope in which we open it (though we may freely wrap it within another universally quantified data structure). OTOH, Haskell's model isn't fully expressive; e.g. we cannot open AGraph or AGraphClass globally and thus expose existential types in the global namespace. We can only expose them within the scope of something like `testGraph`. We cannot use type annotations referring to the existential type `g` except indirectly, as a generic type. And even if we open the exact same AGraphClass value twice, we'll essentially have two distinct origins, two distinct phantom types, that cannot interact. This constraint is how Haskell avoids a lot of the problems surrounding use of existential types.

With this constraint, we don't actually need explicit nominal types to work with existential types. The phantom type might be understood as an implicit nominal type internal to the type checker. We do have rather severe restrictions on use of existentials. OTOH, it isn't a problem to open an existential just once at the program root then use it as a generic `GraphClass g` and `g` elements everywhere else.
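As a minimal, self-contained illustration of the pattern (using a hypothetical `Counter` existential of my own rather than the larger graph example): the hidden representation type is opened once by pattern matching, used generically inside that scope, and cannot escape it.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- An existential package: some hidden state type 'c' with a
-- starting value, a step function, and an observer.
data Counter = forall c. Counter c (c -> c) (c -> Int)

-- One implementation; another could hide a different type for 'c'.
intCounter :: Counter
intCounter = Counter (0 :: Int) (+ 1) id

-- The existential 'c' is opened once here; inside this scope we
-- work with it generically, and it cannot escape.
useCounter :: Counter -> Int
useCounter (Counter z step obs) = obs (step (step z))

main :: IO ()
main = print (useCounter intCounter)  -- 2
```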

Genuine (stupid) question

I just dabble in Haskell, but, to me, it looks like Haskell thinks it is a value:

eperez@eperez-ws$ ghci
Prelude> :t (\x -> x + 1)
(\x -> x + 1) :: Num a => a -> a

What does this mean otherwise?

[Edited to change the title and to add: Should have rtfm before asking :) ]

Actually It Is a Value

Its value is "2", but in effect the expression is itself a value; it's just a different way of writing it, like the Roman numeral "II" is a different way of writing "2". Just as with Church encodings, we can write "3" as:
(\x . x + 1) ((\x . x + 1) 1). Because we have beta-reduction there are many ways to write the same value; they are all isomorphic.

What this means is that in Haskell there is not "normally" a way to represent a program and tell whether it has executed (because beta reductions are isomorphisms, that is, they preserve an intensional equality), so we have to introduce monads (mainly the IO monad) to allow programs to be represented. Monads need to be "run", hence we can have a program and its result as two separate things.
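A small sketch of this point (the `runCount` function is my own illustration): an IO action is an ordinary value that can be named, copied, and passed around without executing; execution happens only when the action is actually run.

```haskell
import Data.IORef

-- An IO action is a first-class value; nothing happens until it is run.
runCount :: IO Int
runCount = do
  counter <- newIORef (0 :: Int)
  let step  = modifyIORef counter (+ 1)  -- a program as a value; not yet run
      steps = replicate 3 step           -- three copies; still nothing has run
  sequence_ steps                        -- now they execute
  readIORef counter

main :: IO ()
main = runCount >>= print  -- 3
```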

This is interesting because there is nothing special about monads; it is type constructors that are "special": we have extensional equality for type constructors. In a way this is obvious, because two functions are equal if they beta-reduce to the same value, but we have to define equality for two type constructors (or in fact any non-function type).


Given the declaration:

data F = F (Int -> Int) Int

consider the difference between:

(\x . x + 1) 1

and:

F (\x . x + 1) 1