A New Kind of Type System

I've been working on this note to describe the type system I've been working on that collects many of the ideas I've been working on (see here, here and here, and here). Now that I actually have something fairly complete, feedback would be appreciated. I'm currently implementing this in my live programming work and it will hopefully be in my next prototype, though the ideas are relatively unrelated so far. Abstract:

This note introduces a novel type system that is static, annotation free, and object-oriented with class-like mixin-style traits. Inference in this YinYang type system diverges significantly from Hindley-Milner by computing graphs of assignment relationships rather than principal types, hiding values and preserving connectivity in encapsulated graphs. Inferred types are then useful not only for checking purposes, but also in allowing code completion to specify program functionality in a relaxed order.

Again, the note is hosted here on GitHub.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Copied comment

Copied from another submission, by Ray Dillinger:

Interesting...

I like the way this makes a lot of type/method boilerplate relatively implicit, incremental, and emergent. It's a good approach, I think.

One thing that bothers me about it is that I'm afraid the existence of emergent conflicts might be too subtle for a lot of programmers to sort out in some cases. You can help them if you make error messages really specific and explain *why* the typechecker infers that something is of a particular type, or of each of two conflicting types.

The method implementation, if I understand it correctly, is in your system a link connecting the type and its interface for some graph theory that you use to typecheck. You ought to formalize the 'laws' of this particular graph theory, and explain it with node/edge diagrams and examples.

So you take the 'connection' and use it to infer that the other methods of that interface, even if not already visible, will be valid on that type.

When you declare that something implements a method for a given type that makes it exclusive; It means that no object can be of both that type and of another type that provides an implementation for that method. So in your example, something could be both a 'cat' and a 'dog', while the cat has an explicit method implementation, right up until the point that a different method is declared explicitly to implement that method for 'dog'. This is reasonable and logical on its face, but it will confuse some because they now get errors not local to the thing they were working on, and the evidence that something is both a 'cat' and a 'dog' in the first place can be hard to follow if it's more than a few steps.

You can help them if you

You can help them if you make error messages really specific and explain *why* the typechecker infers that something is of a particular type, or of each of two conflicting types.

I can do a bit better than that I think by inlining error messages directly into code as part of a live programming experience. They might have to navigate between multiple statements (visible in their scope of course) to debug the type error; inference and a flexible type system definitely makes things harder! I have not designed the visual experience for type error debugging yet, however.

You ought to formalize the 'laws' of this particular graph theory, and explain it with node/edge diagrams and examples.

Definitely! I have one heck of a conference paper to write about this. It doesn't help that much of the existing type theory out there is based on term structure, not graph theory. However, I've done this before for Jiazzi, so I think its doable. It also turns out that any type theory that involves nominative subtyping is also based on some amount of graph theory.

This is reasonable and logical on its face, but it will confuse some because they now get errors not local to the thing they were working on, and the evidence that something is both a 'cat' and a 'dog' in the first place can be hard to follow if it's more than a few steps.

I used to call this explicit "exclusive" inheritance, but realized that exclusion could be established implicitly when two traits implemented the same abstract method. A type error in YinYang then always consists of (a) two incompatible traits extended by the same object and (b) a set of local assignments that cause either trait to be extended. Exposing (a) and (b) to the programmer is key to making the type system usable, but as I stated already, I don't know how this UI looks like yet.

Inlining errors into the code

There was once a BSD command called, I think, errors (Google search is unavailing, as you can imagine) which did precisely that. It read standard Unix compiler error messages of the form "filename:lineno:column:text" from the standard input, opened up "filename", and inserted "text" after line "lineno". The error was turned into a one-line comment, using the extension of "filename" to guess the correct comment syntax.

I once wrote a version of it that used ed to rewrite the files. It was necessary to sort the error messages into reverse order so that the generated "a" commands would have the correct line numbers.

In the C shell, you could write make |& errors; in a Posix shell you have to say make 2>&1 | errors.

Thanks! I'll check these

Thanks! I'll check these posts out from work tomorrow (blogspot is unfortunately blocked here) and get back.

inferred row polymorphism?

This seems to be an extension of the concept of row polymorphism to type/object/state inheritance and construction (read as a complement; I like extensions of excellent systems). That is, where row polymorphism accepts use site variance, YinYang treats it as a point of assertion, constructing the structural definition from the implicit name-use constraints. YinYang does for specification-definitional simplicity what RP does for use-site specification.

Then again, I've fallen for row polymorphism recently so I'm seeing its face in every crowd. Does somebody have a distinction between the systems I'm missing?

That is a quite useful

That is a quite useful comparison. I don't think YinYang is very close to row polymorphism because it is completely nominative, but I'll try to learn more about the topic to be sure.

I don't understand that.

I'm not able to relate this system to row polymorphism in any way. Maybe we're thinking differently about what row polymorphism means, or maybe you're having an insight about this that I haven't followed.

Can you explain what you mean here?

An explanation, sadly long-winded

Here's the analogy that I saw, with apologies in advance: (1) I'm very good at mixing up (mixing in?) concepts under names that are inappropriate, (2) I think in Lisp syntax so I'm not quite following your YinYang syntactic conventions, and (3) this ended up a lot longer than I'd hoped.

From your YinYang post: "what class an object extends depends on what it does"
The analogous Row Polymorphism style: "the elements required of input types depend on names used"

I have seen the view of row polymorphism that I would characterize as "implementation-oriented", where RP is an indexing trick to expose subsets of a product type. I.e., the type (a * b * c) would carry around the map (a->1, b->2, c->3) so that a use-site requiring (a * c) could look up the appropriate indices in the type it is handed.

I'm here using RP in terms of its type-requirements inference rather than its implementation. That is, seeing a function use (a * c) tells us that it can use any type exposing "a" and "c". To repeat the example from the post:

(def foo (a b)
    (+ a b))
; |+| must be a function of a, b
; for example
(def |+| (a b)
    (_add_ a.as_number b.as_number))
; so now we know that "a" and "b" need an "as_number" named property
; thus, we can infer that "foo" is polymorphic over any types with "as_number" property  

We can apply this kind of reasoning to the ThicknessListener function:

(def ThicknessListener (w)
    (if w.Pressed
        (assign w.Thickness 10)
        nil))
; "w" must have "Pressed" and "Thickness" properties
; deriving type (w::(product Pressed Thickness))  

What I see YinYang as extending this RP-like name-polymorphism inference is the capability to infer construction, and not merely inferring possible uses. Thus, if we use the ThicknessListener as so:

I see YinYang extending this in two ways

  1. namespaced product types through the trait extension hierarchy
  2. inferred type construction

We can see the first (namespace names) in the trait definitions:

(defTrait Widget () ())
(defTrait Bordered (Widget)
    (Thickness : double))
(defTrait Button (Widget)
    (Pressed : bool)) 

These imply that the named-trait "Pressed" can only be inferred in some type already declared or inferred to be a "Widget".
In RP, I might make a type (say, "Cylinder") with "Thickness:double" and "Pressed:bool" properties, referring to an actual physical object made of pressed metal rather than a graphical widget.

We can see the second (inferred construction) in this hacky use of the above:

(let
    (w (new Widget))
    (assign w.Pressed false)
    (ThicknessListener w) )

The ThicknessListener invocation exports the need for a type that has Pressed and Thickness properties; this can be validated against the declaration of w as a Widget type. Then, this export informs the instantiating environment that it will have to construct a type with the given traits and their initialization/methods mixed in. A (merely) row-polymorphic world would need a full declaration:

 (let
    (w (Thickness Pressed))
    (assign w.Pressed false)
    (ThicknessListener w) )

And doesn't have any native way of delineating identically-named properties.

This is quite useful,

This is quite useful, thanks!

The fact that YinYang traits are nominal allows us to infer construction, while the structural approach have no hope of doing that (it doesn't even make sense).

But structural typing also has a problem with large libraries and code completion in general. I mean, when all you care about is if some string of bytes is equivalent, what does that mean for a code completion menu?

Hmm, okay.

I guess I think of row polymorphism primarily as a shorthand for composition of application. No type inference required (or allowed!) in the usual case, because typing is something of a Gordian knot under row polymorphism. Every function in the row takes a function as its argument and returns a function as its return value. Further, every function they take (and return) also takes (and returns) functions as its operands (and return values). That has already put us beyond the power of many languages' type systems to express any difference in function types.

But in a language with an absolutely complete type system, we may still have some generics with superposed distinct type signatures, either as functions in the application or as arguments/returns along the application row, or both. In that case an inference process may deduce which type signatures among them can be used.

However, you need the information from the ending type of the row, backpropagated through an arbitrary number of intermediate functions, in order to statically determine what the type-signature of the function something ought to return (and therefore the choices of the type signatures available of the function returning it and the choice of possible function types all of those can take as arguments) needs to be; the type of the argument alone is not sufficient.

The unfortunate things about backpropagating type inference on generic functions under row polymorphism are first that it is hard to guarantee that the inference process gives a unique result, second hard to guarantee that it gives a result at all without actually performing it, and third that it may require exponential compute power relative to the length of the longest application row between any two monomorphic functions.

It simply had never occurred to me to think of type inference on generic functions as part of an implementation of row polymorphism, nor even really to consider generics with multiply-typed definitions to be compatible with row-polymorphism. Now that you have drawn it out I see how the type-inference process required would be related to the type-and-trait inference here.

I'm sure that I've ever seen that type-inference process actually implemented on a row-polymorphic system anywhere, nor worked with any row-polymorphic system that allowed generic functions at all for that matter. I'm somewhat amazed that you were even able to conceive of it.

I guess I was having a blind spot due to my preconceptions about the limited expressiveness of most type systems and the incompatibility of these paradigms. Thanks for explaining.

Bear

A better name

I'm going to turn this into a FOOL submission, anyone have an idea for a good name for this? The current one is too pretentious, but I want to push that this is a new point in the type system design space (I think) with real benefits.

How about The Constraint-o-matic Typenator

?

Where are the constraints

Where are the constraints though? Back propagating requirements?

Modular object and type inference with assignment graphs

Sounds a bit boring, but at least descriptive.

If you're the sort who'll go a mile for an acronym....

If you're looking for a good acronym, you're pretty close to MOTION or MOAT something.

Modular Object and Type Inference On Naming Association - MOTION-A?

The Yin and Yang of Modular Object Type Inference

Better as a working title. Here is the working abstract to go with it:

Type inference does not deal well with languages that support nominal subtyping: only local type inference often works reliably without costly global analyses. We present the YinYang language where all program term and object types are inferred with full support for encapsulated mutable fields, nominal subtyping, genericity, easy overriding, downcasting, along with local reasoning about type errors and other forms of semantic feedback. Inference supports these good OO features by forgoing typical unification that computes types, principal or otherwise, and instead by tracking type constraints in term assignment graphs that modularly deals with fields. Required mixin-like trait extensions are then propagated along these graphs towards assignees, providing a basis to form objects and provide semantic feedback; e.g. detect incompatibilities. This paper describes our design, trade-offs made for feasibility, and implementation.

micro feedback

I offer this in the spirit of getting you the best abstract in the end. If you want someone to nitpick in minute detail, here's feedback aiming to improve clarity without much rewording. Basically, this is my reaction on first reading, with each impulse I felt to alter phrasing or punctuation. I'll explain each suggested change. So far I don't have a fix for my impression that modular is undefined where it appears first in "non-modular", but I suppose an earlier extra sentence would work. Feel free to take this any way you find useful. (Yeah, I know this is weird; just consider it my idea of Friday evening amusement.)

Phrase "with having to devise them" seems awkward, and I like a colon better in one spot:

Explicit type declarations enable semantic feedback, but burden programmers with devising them: a high cost when complex types are considered.

Formula rather than X, better Y usually implies some parallel construction, with something that stays the same, like a common subject—possibly implicit. It bothers me the subject of forsake is not repeated in the next phrase, to the extent I re-read multiple times to parse any sense I might have missed. When I look for the referent of "both worlds" I can't find two things. Use of via implies something cooperative, but I think you mean dynamic typing interferes with semantic feedback. If so, here's how I might say it:

Rather than losing semantic feedback under dynamic typing, use of type inference can deliver the best of both worlds—but it's limited by subtyping and intricate relationships in object-based programs.

I also cut definite article the before intricate relationships above because it functions as a back reference. This way proposes the idea now; the original phrasing implies everyone knows that already which gives a reader a chance to start resisting.

Next I make three changes for flow reasons only. The word only often works better in a different spot than one's first impulse. Swapping reliance for relying is probably an attempt to avoid repetition of gerund verb forms. Making analyses singular makes it more generic to me, whereas plural sounds like you have specific ones in mind.

Object-oriented (OO) languages can then be limited to inferring local types only, or reliance on ad-hoc non-modular analysis.

I already noted I don't know what modular means there, so I'd fix it but don't know how.

A hyphen in object-types strikes me as wrong, changing the sense slightly. I think you really mean two words there. When multiple words denote one concept, hyphenation helps a reader parse that correctly. Here I think you mean types of objects, and not more specific jargon.

We present a novel OO language called YinYang where object types are completely inferred.

The next sentence is about a list of four things, but I didn't get that at first. You might re-order those four things, or use semi-colons as list-item separators, or something else to presage list structure. A first complex item in a list is problematic. I assume you wanted semantic feedback to go first. Numbering also works but is heavy-weight. This is just a stab:

Our type system fully supports separate compilation with local reasoning about encapsulation; mutable object fields; subtype and parametric polymorphism; and finally semantic feedback like Intellisense and type errors.

Phrasing infers not just types is slightly awkward, and once again definite article the looks like a back reference when you're introducing class extensions. Note use of as well here only works when removed from the preceding sentence.

YinYang also infers not only types, but class extensions of both object and class definitions as well.

Something awkward happened again with rather, so this is just an attempt at smoothing. (But I still wonder what a modular graph is here.)

For such features we track type requirements rather than type provisions, in a modular graph of assignment relationships.

This arrangement gets it closer to referent type system, hyphenates trade offs which should not be two words, and loses definite article the again because it introduces existence of tradeoffs rather than refers back to them.

This paper describes our type system and its implementation, with trade-offs necessary for feasibility.

If this sort of feedback isn't useful, just ignore it, and warn me off if you hope it never happens again. :-)

Thanks for the feedback. I

Thanks for the feedback. I fixed some of the writing, but its still kind of early; I'll probably re-invent the abstract a few times before I get it right. I eventually want something edgy and confrontational; I prefer to make very strong statements in abstracts.

This is just a first step before I can write the rest of the paper.

TACE tool.

The section on determining types as a set of constraints over a graph reminds me of the work by Pirkelbauer and Stroustrup (http://www.stroustrup.com/sle2010_webversion.pdf) on automatic "concept" deduction in C++. The idea is that given a set of concepts (roughly analagous to traits) and a function is it possible to recover the set of concepts that the function uses. Their answer was that it depended on how well-written the concepts were.

Possibly similar to field inference in BitC?

I'm not sure if this will turn out to be useful, but here goes.

In BitC, as we tried to sneak up on objects, we concluded that there was a way to look at object field (slot) references (whether method or data) as a constraint typing problem. This, given something like


    def Foo(a, b)
      return a.m + b

we would infer that a.m and b must be of the same type 'b, that this must be some type 'b that admits plus (a type class constraint), and that there must exist some type 'a satisfying the constraint


    has-field 'a "m" 'b

where by "m" I mean the literal symbol m That is: the type 'a must have a field named "m" that in turn has type 'b. Note this teeters on something similar to atoms in LISP.

Because BitC was already framed heavily in terms of type class constraints, and already had a notion of literals as constraints, this was a natural extension of the type system that was already in hand. For a while, it was tempting to think that this would be a sufficient alternative to inferring sub/superclass relations. But it has some of the flavor of gathering a ring of dependencies in a way that works better for interactive prediction use, and I don't think it introduces anything that conflicts with sub/supertype inference.

Anyway, thought I'd throw it out there and see if it provoked any useful thoughts on your part for YinYang.

This is quite useful, I'll

This is quite useful, I'll check it out. BTW, bitc-lang.org seems to be down.

Thanks - fixed

Thanks for the heads up. I rebooted that machine last night on the way out of the house, and apparently it didn't come back properly. Should be resolved now. Sorry for the inconvenience.