## Pure Subtype Systems

Pure Subtype Systems, by DeLesley S. Hutchins:

This paper introduces a new approach to type theory called pure subtype systems. Pure subtype systems differ from traditional approaches to type theory (such as pure type systems) because the theory is based on subtyping, rather than typing. Proper types and typing are completely absent from the theory; the subtype relation is defined directly over objects. The traditional typing relation is shown to be a special case of subtyping, so the loss of types comes without any loss of generality.

Pure subtype systems provide a uniform framework which seamlessly integrates subtyping with dependent and singleton types. The framework was designed as a theoretical foundation for several problems of practical interest, including mixin modules, virtual classes, and feature-oriented programming.

The cost of using pure subtype systems is the complexity of the meta-theory. We formulate the subtype relation as an abstract reduction system, and show that the theory is sound if the underlying reductions commute. We are able to show that the reductions commute locally, but have thus far been unable to show that they commute globally. Although the proof is incomplete, it is "close enough" to rule out obvious counter-examples. We present it as an open problem in type theory.

A thought-provoking take on type theory using subtyping as the foundation for all relations. He collapses the type hierarchy and unifies types and terms via the subtyping relation. This also has the side-effect of combining type checking and partial evaluation. Functions can accept "types" and can also return "types".

Of course, it's not all sunshine and roses. As the abstract explains, the metatheory is quite complicated and soundness is still an open question. Not too surprising considering type checking Type:Type is undecidable.

Hutchins' thesis is also available for a more thorough treatment. This work is all in pursuit of Hitchens' goal of feature-oriented programming.

## Comment viewing options

### Sounds like a similar

Sounds like a similar identification of values and types as the one found in Tim Sweeney, et al.'s lambda aleph. I wonder how the two are related.

### They likely have similar

They likely have similar expressiveness, but one interesting restriction in the subtype calculus is that function subtyping intentionally forbids contravariance on argument types (Section 3.3), ie. λx.t u ≤ λx.t' u, iff t ≡ t'. Lambda-aleph supports contravariance.

### Yes, but what about the theory?

The ideas behind lambda aleph are similar. In particular, lambda aleph does not distinguish between functions (lambda) and function types (pi). Moreover, a "type" in lambda aleph can be thought of as a term that returns a set of values, which echoes an idea that I explored in an earlier paper (OOPSLA '06).

However, lambda aleph has not yet been well-formalized. In particular, although they mention subtyping, they do not provide a formal definition of it, nor do they discuss the meta-theory. In my experience, it's not too hard to implement a type-checker for these systems. The algorithmic subtyping system that I describe in the paper is easy to implement, works well in practice, and returns the expected results. Proving that the type system is sound is a completely different matter; hopefully Sweeney et. al. will have better luck than I did.

IMHO, contravariance is not the main sticking point. If you can prove soundness for the invariant system, then going back and adding contravariance won't be a big problem.

### Proof of confluence

Interesting paper!

I have proved the conjectured confluence of rewrite system in section 7. A proof in Agda can be found on https://gist.github.com/twanvl/7453495. The trick I used is to bound the 'depth' of terms as well as the depth of steps. A step of depth 'n' can only treat terms of depth 'n', and contain only steps of lower depth, including inside a transitive-symmetric closure.

Extending this proof to the lambda calculus might be possible, but it will certainly get (even more) ugly.

### This is a great proof!

This is a great proof! Unfortunately, after reading through it, I must conclude that my conjecture in section 7 was wrong; a proof for the simplistic system does not lead to a proof for the lambda calculus. :-( Defining a depth on terms is a clever idea, but I can think of no corresponding way to define "depth" on lambda-expressions.

I introduced the simple system because it (1) has the same local confluence property as subtyping, and (2) is not strongly normalizing. But in my rush to simplify, I did not stop to think that the simple system has an additional inductive principle (depth) that was not present in the lambda-calculus.

Still, it's good to see some discussion. At this point, I would love to see a counter-example to confluence as much almost as much I'd love to see a proof, just so I know one way or the other.

P.S. The author contact information in the paper is out of date. I can be reached at gmail.com, with user name delesley.

### I believe there are proofs

I believe there are proofs of confluence of lambda calculi that rely on some kind of size or depth property.

For the untyped lambda calculus, you can do label all beta redexes in a term, with the lowest index at the end of the term. Then the reduction relation for this labelled calculus only allows beta reduction of labelled redexes. Such a reduction eliminates one redex, but can duplicate lower numbered ones. This means that it decreases the string of all redexes in lexicographic order. Hence it is strongly normalizing. So you only need to show local confluence of this relation.

For a calculus with steps based on symmetric-transitive closure you might be able to do the same, or maybe you need to do something more clever. Perhaps it can be combined with the 'depth' idea.

Another trick is to add simple types, just for the purposes of one step. This is effectively the same idea as the labelling. For all beta redexes you just invent a fresh type X for the argument, which ensures that this particular application is type correct. Then you say that only type correct applications can reduce.

By the way, I had also previously encountered this same issue. In my case I was looking at the untyped lambda calculus with beta reduction together with the eta contraction rule "(fst x, snd x) â†’ x". In an attempt to prove confluence, this rule had to be changed to "x â†” y â‡’ (fst x, snd y) â†’ x". And now you have a transitive-symmetric closure as a precondition.

Do you have a proof of confluence for the untyped lambda-calculus with pairs and the eta-contraction rule? That sounds very promising! (It's also a much better "open problem" than the one that I proposed.)

I think I'm missing something here, though. With the redex-labeling technique, we have a term where every redex is labeled with a natural number, such that the labels decrease with depth. I.e. Given a redex M@N with label x, x > max_label(M) and x > max_label(N).

In that case, even if the labels are chosen so that they initially decrease with depth, beta-reduction may substitute a redex with larger label into a redex with smaller label, so the depth property is not preserved under substitution. A sequence of reductions thus does not have a decreasing lexicographic order. What am I missing?

One further complication is that types matter. If you look at pages 51-52 of the thesis, I have a counter-example to confluence for the untyped lambda-calculus, which is adapted from Van Oostrom's counter-example for conditional rewrite systems. However, the counter-example is not well-formed in the typed system.

### At the time, I gave up on

At the time, I gave up on the proof of confluence for the untyped lambda calculus with eta. However, I have been working on the ideas from my previous reply, and I now think I have a working proof. It is for lambda calculus with a D construct, since this seemed like less work than adding pairs. I'll polish it a bit and put it online soon.

The crucial idea was to add typing of the form "depth Î“ x â‰¤ n" where the context Î“ gives the depth of each variable. Lambdas are annotated with the type/depth of the argument. An application "(Î»(x@k).a) b" can only reduce if "depth Î“ b â‰¤ k". This ensures that "depth Î“ (a[x|->b]) â‰¤ depth Î“ (Î»(x@k).a)". I believe that when converting from non-indexed steps to indexed steps, you can always pick k large enough. But I haven't finished that part yet.

As for the 'classic' counterexample (bottom of page 51), it relies on using a condition "if C(x) ->> . âˆª by the greatest fixed point instead of a least fixed point. I am not familiar enough with the subtype calculus to comment on your version of the counterexample.

### Universes by another name.

This makes sense to me; thanks for the explanation. It looks like what you have done is essentially to add a universe structure to terms. The notation (Î»(x@k).a) (b) states that x is a term in universe k, and it is only possible to substitute b for x if b is in the same or lower universe.

I played around with a similar system in which the universe of a term is statically determined:

U(x@k) = k (variables are tagged with their universe)
U(f(a)) = U(f)
U(Î»x@k:t. u) = U(u)

The universes here mostly correspond to traditional notions of objects, types, kinds etc. It's easy to prove confluence given the following restriction:

Î»x:t.u well-formed if U(t) > U(u)

Unfortunately, that restriction rules out higher-order types with bounded quantification, such Î»X<:T.U, because T and U would be in the same universe. Higher order types would have to use kinding, e.g. List = Î»X:K. U, for some kind K in a higher universe. At the time, that was not a restriction that I was willing to live with, since it seemed to defeat the whole point of using subtyping to begin with.

Your depth construct is slightly different from my universe construct, since the depth of a term can decrease as it's reduced. That may give you some extra flexibility in assigning depths to variables, so that you aren't locked into a rigid object/type/kind hierarchy. I'd be interested to see what you come up with when you finish the proof.

### On fixpoints.

In the untyped lambda-calculus, the counter-example just uses the Y-combinator to find the fixpoint. I suppose that is the greatest fixpoint, because you can use the Y-combinator to write non-terminating expressions, although I'm not used to seeing the greatest/least distinction in an untyped setting.

IMHO, switching to the least fixpoint is going to require a type system, and I imagine it's going to require one that is not only sound, but also consistent, i.e. not one which has Type:Type. Universes would also do the trick, albeit with less flexibility.

### Step-indexing for strong normalization

Off a tangent, this technique of adding indexes inside terms rather than looks (closely?) related to recent (draft) work by Julien Cretin on using step-indexing techniques in the context of strong reduction, rather than weak reduction (not going under lambdas) as is usually done (where indices appear only at the top level).

In his system, applying a k+1-indexed abstraction (Î»^(k+1) x. t) to a (j+1)-indexed argument e^(j+1) is possible even when j>k, but the argument's indices are lowered during reduction to something like (t[e/x])^min(j,k) (where this lowering application is performed in depth, reducing all subterm indices if they're higher than min(j,k)).

See this blog post for an announcment of his PhD manuscript â€“ linking to a pdf and a full Coq formalization. The indexed calculus is described in section 5.2, and used to define a reductibility-candidate style soundness model for a rich lambda-calculus with strong Î²-reduction.

I hadn't looked at step-indexed reduction before, but the technique seems promising. The reducibility-candidate style proofs are much harder to do in a dependently typed setting, but the indices by themselves may be enough to prove confluence for indexed terms.

The key would be to show that for any well-formed term, it is possible to assign valid indices.

### Julien Cretin's idea is to

Julien Cretin's idea is to add 'fuel' to terms. This makes the reduction relation trivially strongly normalizing. Then he proves that beta reduction is confluent with enough fuel. And you can just add enough fuel to make this work.

However, this idea will not work for the symmetric-transitive closure. There you sometimes have to more than one step, i.e. a -> b -> c. There is no way to add enough fuel to a to guarantee that you still have enough fuel left after the step to b to also be able to use confluence for the b->c step.

The annotation idea outlined in my previous comment also fails for a similar reason. It is possible to prove confluence of reduction this way, as well as lifting a *single* reduction step to a depth annotated one. But lifting multiple steps turns out to be impossible.

My current idea is to annotate the reduction by the number of steps taken, rather then their depth. This should be preserved by confluence, i.e. if a --[m]--> b and a --[n]--> c then there exists a d such that b --[n]--> d and c --[m]--> d.

### A different idea.

I tried something similar, but didn't have much luck. However, I came up with an alternate way to make the proof go through. See my post below. I think you might be able to use a similar technique for eta-contraction.

### I now finally have a proof

I now finally have a proof of beta reduction plus 'D contraction'. D works as in the toy problem from the paper, i.e. "D a b -> a" and "D a b -> b" if "a * b". This is a simplified form of eta contraction of pairs, and I am confident that the strategy will also work for pairs.

Another nice thing about the strategy is that it ditches all the size/depth/fuel/index nonsense. Rather, the idea is that you only ever need two levels of nesting, since any nested D-contractions can be moved outward. For example, instead of having "D (D a b) c a" because ("D a b a c" because "a b" and "a c"); you can have "D (D a b) c D a a a c". (I am not sure if this make it any clearer, if not, just ignore the example :))

So, I have split the reduction relation into two, I and S. In the I relation, the D-contraction is of the form "D a a -I-> a". This relation is not confluent, but that doesn't matter.
In the S relation D-contraction takes the form "D a b -S-> c" if "a -S-> c" and "a * b". But only on the first component of D. It turns out that S is confluent, and that you can convert between I, S and the original reduction relation.

I put up the Agda code on https://gist.github.com/twanvl/8076766. Next I will try to adapt this idea to real beta+eta, and if I find the time, I'll maybe make a more detailed writeup.

### Counter-example to confluence of pairs

Unfortunately, I ran across the following counter-example to confluence of pairs with eta-contraction. I'm not sure exactly how it relates to your proof.

http://www.seas.upenn.edu/~sweirich/types/archive/1991/msg00014.html

### Declarative subtyping

Would it be possible to improve the theory by making "declarative subtyping" a lemma rather than an axiom? Since every function is ultimately built out of smaller computational steps, intuitively it seems like one can prove the lemma over well constructed functions. Any particular language would require examination of its primitive steps, but that's reality anyway.

### I'm not sure I understand.

I think that's what the paper does, assuming that I understand your meaning. All of the meta-theory is done in the algorithmic system, where subtyping is broken into small steps. The declarative rules are not taken to be axioms; there is a set of lemmas which demonstrate that the declarative rules are admissible in the algorithmic system. The sticking point is the transitivity rule, which is not admissible unless the algorithmic system is confluent.

### Has anyone ever come up with

Has anyone ever come up with a type system based purely on mutable assignment? The statement "A := B" means that any object bound to B could be bound to A, or "objects(B) subseteq objects(A)." Likewise, "B is A," means almost the same thing "objects(B) subseteq objects(A)." It seems like it might be better to focus on := than is.

### Program invariants

I think that would be quite challenging. Typing is done at compile-time. As a result, types capture program invariants -- things that are always true.

If you allow mutable assignment between types, then basic type relationships would change over the course of program execution, and might even differ each time you run the program. An object would have type B before the A := B assignment, and have type A afterwards. You would have a hard time establishing any invariants that were guaranteed to hold, and the end result would be a dynamic language, like Javascript.

### I think perhaps Sean is

I think perhaps Sean is suggesting bidirectional type checking where assignment is taken as a type declaration that the types of the variables on both sides of := unify.

### An Actor type needs to respond to an Extends[aType] message

An Actor type needs to respond to an Extends[aType] message asking if it extends type aType. Is it clear that unification is needed in practice?

### := actually isn't

:= actually isn't bidirectional. If a := b, a := c, a.F := d, then b.F := d, c.F := d, but b.F and c.F are otherwise unrelated.

### We don't allow mutable

We don't allow mutable assignments between types, we just model subtyping as assignment. We can be conservative about considering all assignments at once in forming our inferred types.

### I don't understand.

I'm afraid I really don't understand what you are asking. When you say A := B, are A and B types or objects? I assumed they were types, other responders seem to assume that they are objects.

Moreover, subtyping and assignment look like completely different operations to me; I don't see how one could be modeled with the other. You'll need to give a larger example of what you envision the type system would look like before I can make sense of the question.

### A & B could either be types

A & B could either be types or terms in the program, just as in this paper's approach. If A is a type and B is a term, it means B is of type A in the original language, if A and B are both types, it means B is a subtype of A, if A and B are both terms, it is just typical assignment. If A is a term and B is a type...that isn't sensical from a PL standpoint but the general theory would still deal with it.

### Notation for programming languages with first-class types

Below is a proposal for notation for programming languages with first-class types:
* anExpression:aType means anExpression has type aType (The value of anExpression can be a type.)
* aTypeâ‰º:anotherType means aType is a subtype of anotherType. (Both must evaluate to types.)
* aVariable:=anExpression means aVariable is assigned the value of anExpression (This works even if the value of anExpression is a type.)

### Not mutable assignment?

Assignment usually denotes equality, so x and y are equal after the assignment x := y. Using it to denote equality for terms, and an inequality (subtyping) for types, seems odd.

Mutable assignment is also temporal in nature. x and y may be different before the assignment, but are equal afterwards, and x can be assigned to different things at different times and in different scopes. I don't think you can "collect all of the assignments" at compile-time, without actually running the program, unless the types in your system are both simple and globally scoped. It's also going to be hard to build a sound type system if a typing judgement done at one time may be invalidated by assignments at later times.

I think your idea is far too radical to describe here in three sentences; I'll wait for the paper.

### Given x := y, x and y

Given a mutable x := y, x and y may not be referring to the same object before the assignment and, sometime after assignment, they might not again be referring to the same object; e.g. an x := z or even y := z executes sometime after x := y.

If you think about object flows, we can easily reason that the set of objects that could be referred to by x must be a super set (not an equivalent set) of those referred to by y. If you think of it that way, then its very easy to see the relationship with subtyping, which has the same behavior.

Note I'm only considering languages where terms are references to objects, not actual values themselves. I haven't considered the consequences of terms as values, but I suspect they are stronger but inclusive of reference semantics.

### That's a much clearer

That's a much clearer explanation; I think I finally understand what you're doing. :-)

The set of terms that x could refer to really is a set, not a type in the usual sense. In other words, rather than saying "x is an integer", you end up saying "x could have have values {1, 3, y+1}". That's fine, but working with such extremely precise types can be challenging in practice. E.g. alias analysis is major problem, and I'm hoping that the type of y doesn't refer back to x. Moreover, the set of values that a variable can acquire depends on the exact control flow of the program. E.g. if b==true, then x:{ 1,3 }, while if b==false, then x:{1, y+1}. You'll have to choose whether to merge the sets, or track the dependency.

Type inference algorithms for dynamic languages sometimes follow this approach.

### Now go backwards

For x := y, if objects(y) subseteq objects(x), then types(x) subseteq types(y) (the object set gets smaller as the number of types increases). Now, rather than make types(x) smaller in your analysis, make types(y) bigger instead. Incredibly feasible. The assignment analysis with a basic field sub-assignment rule and field re-assignment rule give us a sound treatment aliases to make this type system sound.

### Types as first-class Actors

Sorry, I misunderstood your question. Of course, types can be first-class Actors, including being passed around in messages and held in structures, etc. As another commentator pointed, this may require more checking at run-time. Also, parametrized types can become a lot more challenging. None of this necessarily has a relationship to variable assignment.

### ":=" is not good for the subtype relationship; use "â‰º:"?

":=" is not good notation for the subtype relationship. Maybe it would be better to use "â‰º:" instead where aTypeâ‰º:anotherType means that anotherType.Extends[aType] is true.

### I agree with Hewitt here.

I agree with Hewitt here. You're just modeling a set of subtyping constraints between types here, aren't you Sean? Isn't the fact that these constraints arise out of assignments sort of incidental?

### We also want to model := as

We also want to model := as typical assignment. We are just shoehorning subtype into the := relationship rather than have two different relationships that behave exactly the same way. In other words, if we can conquer assignment, we conquer subtyping also.

I'm working on a new version of the "Going Against the Flow" paper that should be ready soon.

### Need types for both variable assignment and identifier binding

Types are needed for both variable assignment and identifier binding. The rules should basically be the same for both.

And, of course, type Any should be the default type unless the compiler can abduct a stronger type. It is often more robust (although more verbose) to declare types than to rely on type abduction.

### "Type" is both of type "Type" and a subtype of "Type"

Type is both of type Type and a subtype of Type.
Even using parametrized types, type checking for programming languages is efficiently decidable.

It is a mistake to try to turn a programming language type system into a programming language.

### I believe I have a proof...

I [update: do not believe] I have a proof of transitivity elimination, and thus type safety, for pure subtype systems. The idea was inspired by Twan van Laarhoven's proof of the rewrite system in section 7, along with the subsequent discussion here.

I introduce explicit type casts as a technical device to make the proof go through. (t as u)_k casts the term t to the type u. Casts are eliminated by reduction:

   (t as u)_k  -->^{<:}   u   // (t as u)_k  is a subtype of u (promotion)
(t as u)_k  -->^{=}_k  t   // (t as u)_k  reduces to t.  (elimination)


Beta-reduction now inserts a cast during substitution:

   (\lambda x<:t. u)(s)_k -->{=}_0  [x = (s as t)_k] u


Both casts and applications are tagged with an explicit size k. Beta-reduction has size 0, but introduces a cast of size k. Eliminating the cast is a reduction of size k. A subtype derivation t <:_{k} u also has size k, and can only contain reductions of size k or smaller. Promoting a cast is is unconditional, but eliminating a cast is conditional on the premise that t is a subtype of u:

           t <:_{k} u
-------------------------------
(t as u)_{k+1} -->^{=}_{k+1} t


Notice that the derivation of t <: u is strictly smaller than the reduction. Subtype reduction is thus confluent, because the diagrams only introduce steps of smaller size.

The fundamental problem described the paper is a circularity between substitution and transitivity. The substitution lemma requires transitivity, but transitivity requires a substitution lemma. Using explicit casts breaks this circularity, by breaking substitution into two steps: (1) substitute, introducing a cast and (2) eliminate the cast. In my new proof, the substitution lemma only involves step (1), and thus no longer requires transitivity. Substitution over subtyping derivations will replace a single promotion x -->^{<:} t (of size 0) with a single promotion (u as t)_k -->_{<} t (also of size 0). Most importantly, the size of reductions and subtype derivations are preserved under substitution.

The final part of the proof is to show that, given a program in which all terms are untagged, it is possible to infer/assign a tag k to every term that requires one. That proof is by induction on the size of the well-formedness judgement. The size of an application or cast is k+1, where k is the size of the subtype derivations needed to prove that the application or cast is well-formed.

I'm working to formalize the proof in agda now, but it's slow going; this is my first time using agda.

### Still no proof.

The new proof doesn't work -- I should learn to keep my optimism under control. :-( Explicit casts solve all of the previous problems, including the circularity between substitution and transitivity, but, sadly, they introduce a different set of non-decreasing diagrams.