Why type systems are interesting - part III: latent types

The "part II" thread was getting too long, so here beginneth part III. For ease of indexing, the prior two threads are:

This post is a response to Frank's recent post, "I say we shall have no mo' "latent types"..." (here's the same post in context). The first couple of points below are tangential, but after that comes my defense of the syntactic aspect of latent types that I've been discussing.

Frank Atanassow wrote:
You are conflating dynamic loading of untyped code with dynamic loading of typed code. They are not the same thing, and untyped languages only support the first, and typed languages can easily accommodate dynamic loading of Univ code.
In practice, typed languages are not usually used like this, probably because of convenience and practicality. See next point.
I find it very irritating that people continually bias their arguments for untyped languages in this fashion, as if typed languages need to "catch up" or something.
Are there any typed languages for which highly dynamic IDEs exist, along the lines of those for Smalltalk or Self? If so, I want to play with them! If not, then what I was saying was not biased, since these are examples of "what current realistic implementations are capable of in terms of dynamic loading and so on." (Lauri Alanko's thesis seems relevant here.)
But this relates to the previous point: as long as DT languages ignore their type systems
Untyped languages don't have type systems.

I wrote "DT", you wrote "untyped". That's part of the stacked terminology deck here: I'm quite happy to agree that an untyped lambda calculus has no type system. But DT languages aren't untyped lambda calculi. A simple demonstration of this is the existence of soft type inferencers for languages like Scheme and Erlang.

Does the existence of these soft typers mean that Scheme and Erlang should be considered typed languages? Or are they untyped languages when you don't run the soft typer on their programs, and typed languages when you do? It makes much more sense to say that these languages have latent types, i.e. their programs have types which can be inferred by a soft typer. More importantly, and much more commonly, those types are also inferred by the programmer.

Further, the situation is made even less black and white by the fact that an inferencer can't, without any hints, discover all the types - and I mean real, syntactic types - present in these programs. I'm not saying that all of the "program properties" which you referred to are types, but some of them are. An inferencer produces an approximation which can typically be improved by a programmer.
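To make the idea concrete, here's a minimal sketch in Python (purely illustrative; the `norm` function is invented, and the thread's actual examples are Scheme and Erlang) of a function whose type is written nowhere, yet is partially recoverable from the operations it performs:

```python
def norm(v):
    # Nothing declares a type for v, but the operations constrain it:
    # v must be iterable, and its elements must support * and + --
    # roughly "sequence of numbers". That constraint is the latent type.
    total = 0
    for x in v:
        total += x * x
    return total ** 0.5

# A soft typer (or the programmer) can infer an approximate latent type:
#   norm : Sequence[Number] -> float
print(norm([3, 4]))  # 5.0
```

The inferred type is an approximation, exactly as described above: hints from the programmer (e.g. "the elements are floats") can sharpen it.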

This is something else I find irritating, Anton. I spent many, many hours explaining exactly why and in what sense this is true,
If you have explained in what sense "Untyped languages don't have type systems" is true, then either my points above must fit into your explanation somehow, or one of our positions is erroneous.
and I thought at last that people had started to grasp what a type is, what it is not,

If people have really grasped what a (syntactic) type is, then what I am saying should make sense to them -- that is, even DT programs contain latent type information, as evidenced by the fact that at least some of this information can be mechanically extracted. Further, this information is depended on by programmers when reasoning about their code.

My opening forum post in the prior thread was intended to make the case that type systems were interesting because of their utility when it comes to reasoning about code, and that this kind of reasoning takes place in DT languages as well as in static languages, which is why type systems are relevant even to DT programmers. I still consider that to be true, and I don't think any good evidence has been given to contradict it.

As a nice example of this in action, this LL1 post quotes a claim by Joe Armstrong, talking about a type-checking phase being applied to previously written Erlang programs: "The type system has uncovered no errors. The kernel libraries were written by Erlang 'experts' -- it seems that good programmers don't make type errors."

Joe overlooked the fact that even good programmers do make type errors; it's just that these errors are ultimately caught, even if only at runtime, and corrected. The result of this process is that even DT programs can, and do, end up with consistent type schemes. Of course, to statically prove that, we have to do something like successfully run a type inferencer on the code. However, even if you fall back on some kind of Heisenberg-like effect, e.g. "type systems/schemes don't exist until you formally specify them", the fact remains that human programmers - aided by a debugging process - "somehow" end up creating valid type schemes. I quoted "somehow" because I don't think it's any mystery - they're reasoning about the program's types, albeit along with other program properties, to come up with working programs, which tend to have (or even must have) valid type schemes.

and what the relationship between typed and untyped languages is.

I'm actually not sure what your explanation is in this respect. We've seen that Univ isn't a complete explanation, and I don't think I've seen an alternative explanation of the relationship from you. In your defense, the usual type theory sources don't have much to say about this either. It's fine if you want to say that you prefer to stick to pure syntactic types, where you have formally specified type systems and languages and there's absolutely no ambiguity. It's also fine if you want to say that you're happy to restrict yourself to being able to embed DT programs in statically typed languages using a unitype. But as soon as you extend your claims beyond these very precisely-defined ones, and want to say anything about the nature of DT languages and their relationship to types, then what I've been talking about enters the picture.

Now you are diluting my message with your talk of "latent type systems."

I'm trying to discuss the relationship between syntactic types and DT languages. There obviously is a relationship, and to me it seems quite a bit deeper than most type theory explanations give it credit for. Traditional type theory doesn't seem to offer much to explain the abovementioned Erlang experience, for example.

If this dilutes your message, I suspect it's only because the message needs to be more clearly delineated along the lines I mentioned above, i.e. that you're only talking about types that are fully formally specified. However, you can't claim that types don't exist outside those boundaries, only that our formal descriptions of them are often incomplete, etc.

As far as I can tell, your notion of "latent type" only amounts to program properties which might be extractable with some blue-sky theorem prover, or more realistically a proof assistant being manually operated by a type theorist.
...or with a soft typing system assisted by hints from a programmer, which is much less blue-sky. But I'm not just talking about some theoretical future possibility: I'm saying that DT programs succeed, in part, by virtue of their programmers creating valid latent type schemes for them.
In particular, as a notion of "type system", latent type systems fail the "tractable" and "syntactic" conditions which you yourself quoted from Pierce.
Soft typers seem to contradict this (although their developers might have something to say about tractability). In any case, Pierce characterizes his definition as "one plausible definition". Let's look at what else Pierce says:
Pierce: "this definition identifies type systems as tools for reasoning about programs"
This applies to latent type systems: programmers use them for reasoning about their programs.
Pierce: "Another important element in the above definition is its emphasis on classification of terms -- syntactic phrases -- according to the properties of the values that they will compute when executed."
As I've pointed out, the need to statically classify terms applies to DT programs as well as ST programs. It is impossible, in general, to reliably invoke operations without such classification.
Pierce: "A type system can be regarded as calculating a kind of static approximation to the run-time behaviors of the terms in a program."

The word "approximation" reminds us that not everything needs to be known statically. DT type schemes are certainly likely to be more approximate than ST ones - there are likely to be many more cases in which the exact operation being invoked is only known at runtime. But a programmer can only reliably perform operations with some knowledge of the properties of the values being operated on, even if it's "approximate" (e.g. expecting that an object accepts the "print" message). Do these properties always qualify as types? Presumably not - in some cases, they go well beyond what traditional type systems are capable of - but some of these properties do qualify as types.

Another way in which latent type systems mirror more traditional static type systems is that they're used to reject programs. However, the programmer has more discretion in this case than when rejection is performed mechanically. Rejection nevertheless does occur - there are many ways in which DT programs are designed to make sense from a type perspective, thus effectively rejecting alternative programs which make less sense. As I've previously pointed out, in some circles it's common practice to annotate functions with their type as a comment. Programmers constantly make design decisions which center around consistent usage of types.
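As a hypothetical Python rendering of that practice (a Scheme programmer would write the same annotation as a `;; average : (listof number) -> number` comment):

```python
# average : list[float] -> float   -- the latent type, recorded as a comment
def average(xs):
    return sum(xs) / len(xs)

# Nothing mechanically enforces the comment, but a programmer who respects
# it never writes a call like average("oops"); that "rejection" happens
# at design time, not compile time.
print(average([1.0, 2.0, 3.0]))  # 2.0
```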

I like Lauri Alanko's take on type systems, from his thesis referenced above, section 2.2:

Alanko: "A type system is a method for analyzing a program syntactically to prove the impossibility of certain erroneous conditions. It is based on types, syntactic structures that usually represent the sets of values which expressions can evaluate to. The type system works by assigning types to the expressions and variables of a program, and by checking that expressions are combined properly so that their types are compatible".

That entire description maps very well onto DT languages, with one change: "prove" usually becomes a weaker assurance, more like a theory of the program's types, which represents the best available knowledge at any given time, but may yet be falsified in future, requiring improvements to the theory. These activities are performed by the programmer, instead of mechanically, with assistance from runtime errors based on tagged values. These runtime errors are falsifications of the current theory, and their absence in the face of testing provides reasonable assurances as to the validity of the current theory of the program's types, among other things.

Physics vs. Mathematics

Based on the perspective in the previous paragraph, one can see DT programming as being more like physics, to ST programming's mathematics. Physicists are used to having theories that could be proved wrong at any time, by some previously untested special case. They even use theories that they know are wrong in some respect, that don't perfectly match observations. They put up with this because they have no alternative, because they are modelling "real" phenomena which have characteristics which can't simply be formalized away.

The areas which currently rely heavily on DT languages are rife with such characteristics, often having to do with complex interoperation between disparate systems, such as networks, databases, web servers, operating systems, and other programs. In these environments, the evolving-theory approach to development is often essential, and it's fairly natural that the type system aspect is simply fitted into this approach, even if theoretically better solutions exist.

Responding to one final comment of Frank's:

Instead of jumping on the type bandwagon, why not call your "latent types" "program properties"? Or, if you think they are different, please demonstrate how.
This isn't about jumping on any bandwagon. If the above doesn't explain my position sufficiently clearly, or credibly, tell me what kind of demonstration is wanted, and I'll see what I can do, within reason.


Difference Between a Testis and an Ovary

Anton, I'm a bit pressed for time now, but rest assured I will write a reply in the near future.

Diffinitions

In the meantime, I have a clarification which may save some time in the response (or not ;)

I wrote:

However, you can't claim that types don't exist outside [fully formally specified] boundaries, only that our formal descriptions of them are often incomplete, etc.

Strictly speaking, this is wrong, since of course you can claim that the only things that qualify as types are those that fit the appropriate formal definitions - indeed, that's more or less the default type theory position. I did address this point above, but I'll restate it a little differently: this position simply arises from the fact that you can't say anything about how a formal system applies to something outside itself, from within the system.

This leaves the onus on the latent type user to demonstrate that a program does indeed have types, and to define them sufficiently rigorously. Something as simple as including comments about the types of functions is a sufficient starting point, though.

But the question of whether latent types are actually "types" is a quibble - to have this discussion, about types in DT languages, we need to talk about something more specific than just "program properties". We'd have to talk about e.g. "type-like program properties", which, with appropriate definitions, can become types. One might call them proto-types, if that weren't an excruciatingly painful pun.

The point is that there are meaningful relationships between latent types, formal syntactic types, and runtime tagged values -- but one can't examine those relationships only from within the strict boundaries of fully-specified syntactic type systems. It's an inherently multi-system discussion. For the purposes of that discussion, it makes no sense to sever some of the relationships in question by choice of terminology, since that simply impairs our ability to usefully discuss the relationships that do exist.

Perhaps it would help to have some clearer distinctions between terms as used within the different systems -- I've been doing that by using the terms "latent types" vs. "syntactic types", and sometimes (perhaps confusingly) treating the term "type" alone as relating to the meta-concept which finds partial expression in all three arenas (latent, formal, runtime). Of course, that meta-concept is precisely what type theorists don't usually want to talk about, since it's outside their formal purview. However, while it's certainly important not to become confused about terminological boundaries, it's also important to be able to discuss how formal systems fit into a broader context, and to that end, the idea that a system boundary should cut off all valid usage of a term can be counterproductive.

Regarding dilution of messages etc., I'm willing to try to agree on some terminology that we can safely use without causing undue irritation on either side. I started using the term "latent type" because it has plenty of precedent, and also conveys the intended meaning very well, i.e. that it is intended to refer to "real" types which happen to be latent in a given program.

"What is a type system"

Anton writes:

But the question of whether latent types are actually "types" is a quibble - to have this discussion, about types in DT languages, we need to talk about something more specific than just "program properties". We'd have to talk about e.g. "type-like program properties", which, with appropriate definitions, can become types.

A similar view was championed by Patrick Cousot in his 1997 POPL invited paper, Types as Abstract Interpretations. He starts with a standard semantics of the untyped lambda-calculus, and then designs Curry-style type systems (including familiar rule-based presentations) using abstract interpretation.

The teaser is from Section 14:

The question "What is a type system" has hardly received a formal answer. Type systems can be viewed as abstract semantics in the lattice of abstract interpretations which are more abstract than a type collecting semantics which is the most general type system in that it is more precise than the reduced product of all existing type systems.

The paper is quite dense, and the reader is expected to have some prior knowledge of abstract interpretation.

Thanks

Nice reference, thanks, although I'm going to need to take a few months off to get through it. IIRC, Paul Snively made some reference to this sort of work -- the idea of a DT to ST slider based on partial evaluation -- in one of the earlier incarnations of this thread. Perhaps Frank would be happier if I expressed my point of view in terms of abstract interpretation. In fact, I did that to some extent in this LL1 post last year.

Continuous sliders

This post looks relevant (look for "continuum").

Nice catch!

I think it's worth "reading into the record" in the current discussion, so: in this post, Frank Atanassow wrote:
Also, I think that in twenty years there will no longer be any debate about static vs. dynamic typing, and that instead most languages will provide a smooth continuum between the two, as people realize that there is no fundamental dichotomy there. Also languages will have more than one level of types: the types themselves will have types, and so on. But we will be able to treat types in a similar way to the way in which we treat values: we will be doing computations with them.

I wonder if one of the "levels of types" in this picture might be latent types? ;o)

Another comment of Frank's which is right in line with what I think is "I have a feeling that DT languages are not at the 'end of the line' and could go further by adopting online partial evaluation techniques".

Just to Expand a Bit

I completely agree with both Frank's quote and Anton's response to it. Let me see if I can also find the old thread about how such a "tower of types" might look, with Tim Sweeney and Frank ping-ponging. Ah, here it is. I guess it's no secret that I'm basically following Tim's lead here, having read the Ontic paper and found it just as compelling as Tim did. That's not to say that Frank's line of reasoning lacks merit, but rather that I'm not competent enough in the requisite theory to see the value in a non-set-based approach to the subject. But it's worth noting that both approaches seem to result in a kind of "tower of types" that one can construct terms around, and if these types are first-class and can even be defined/checked at runtime and we wish to call these "latent types," I find nothing wrong with that.

Heck, I don't even have anything against latent types as they're defined today. I only have a problem with claims that static types can't do things that they have been demonstrated to do, or that latent type systems somehow represent a proper superset of static type systems. That's all. Maybe if I posted more about some of Oleg's Scheme stuff people would believe me. :-)

I agree with Frank...

...because interpreting types as sets of values breaks down badly in the presence of polymorphism. Informally, the reason is that if you interpret a type as a set, quantifying over a polymorphic type means that you are quantifying over the class of all sets, and that is "too big" -- you can do Russell's paradox-style tricks.

Sets of What?

Tim Sweeney posted a while back on that very subject. In the context of Ontic, there's apparently an interplay of the notion of types as sets of potential values, along with some other constraints on its semantics, that prevent the collision with Russell's paradox and its ilk.

Yeah, and you can also...

...do things like interpret types as domains. But whatever you do, it's no longer the case that an element of a type is simply a value, and you can't use ordinary set-theoretic reasoning about types. It just seems cleaner to me to bite the bullet and adopt a categorical semantics, and treat any pointwise reasoning you have to do as operations in the internal language of whatever categorical thing you're using.

Latent types aren't first class

and if these types are first-class and can even be defined/checked at runtime and we wish to call these "latent types," I find nothing wrong with that.

I'm not sure if we're using the term "latent types" in the same sense. I've been using it in the context of "syntactic" type theory, as defined by e.g. Pierce in TAPL. In that context, types are syntactic, and apply to program terms. The things you can test at runtime are tags, related to but not to be confused with types (because they're not completely isomorphic to each other). Latent types aren't first class, even if the language has some first-class representation of its tags (and note that e.g. standard Scheme has neither first-class types nor tags, although it supports runtime checking of tags).

The difference between a latently- and a statically-typed program from this perspective is that the statically-typed program has a fully formally-defined type scheme. The latently typed program does not have a formally defined type scheme. It may have a partially-specified type scheme, perhaps specified in comments, as is common in Scheme and Erlang. But even if its types are never written down, programmers still rely on implicit, static, type-like information in order to be able to write code - it's how they know that an object referred to by a particular variable will accept a particular message, for example.

Programmers in dynamically-typed languages tend to think about types in first-class terms, as runtime properties of values, but in Piercian terms, those runtime properties are tags, and those programmers are still in fact reasoning about types when they write e.g. "x.foo()", knowing statically that the values that flow through x at runtime will accept the 'foo' message.
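A small Python illustration of that tag/type distinction (the `Greeter` class is invented for the example):

```python
class Greeter:
    def foo(self):
        return "hi"

x = Greeter()

# Runtime tag check: inspects what x happens to be right now.
assert isinstance(x, Greeter)

# Static latent-type reasoning: the programmer writes x.foo() in the
# first place only because they know, before running anything, that the
# values flowing through x will accept the 'foo' message.
print(x.foo())  # hi
```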

Part of the reason I was focusing on this back when we were having this discussion is that many people on both the static and dynamic sides seem not to appreciate that both sides deal with the same kind of static type-like information. I started these threads by responding to Luke Gorrie's question about why type systems are interesting, and my point was that even the most dynamically-typed programs contain this latent, static type-like information, and that we depend on it for reasoning about code.

In fact, even statically-typed languages may contain latent type information. In OCaml, if you make use of structural subtyping by using a set of common methods across multiple unrelated classes, you've effectively created a latent interface or type, just as you might do in Smalltalk or some other dynamically-typed language. In a Haskell program, you would be forced to encode this kind of latent information in the type system, e.g. in the form of a typeclass.

Similarly, if you use a data type for tagged values, like Univ, to encode a latently-typed program in a statically-typed language, the nature of the latent type information doesn't change - it remains latent. Again, we have a situation in which an apparently statically-typed program has latent type information - potentially lots of it. This implies that programmers in statically-typed languages should be aware of latent types, although in their case, their response to it may often be to attempt to encode it fully in the type system, so that it is no longer latent.
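Python offers a convenient runnable analogue of that OCaml situation, since `typing.Protocol` is structural (the class names below are invented for illustration): a latent interface shared by unrelated classes can be named after the fact, without touching the classes themselves.

```python
from typing import Protocol

class Drawable(Protocol):
    def draw(self) -> str: ...

# Two unrelated classes that merely happen to share a method --
# that shared method set is the latent interface.
class Circle:
    def draw(self) -> str:
        return "circle"

class Square:
    def draw(self) -> str:
        return "square"

def render(shape: Drawable) -> str:
    return shape.draw()

# Neither class mentions Drawable, yet both satisfy it structurally:
# the interface was latent in the code before Drawable gave it a name.
print(render(Circle()), render(Square()))  # circle square
```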

I only have a problem with claims that static types can't do things that they have been demonstrated to do, or that latent type systems somehow represent a proper superset of static type systems.

In the sense I'm using the term, I certainly claim that latent type systems are an informal superset of static type systems. Static type systems can be seen as "merely" particular formalizations of a much larger class of less well-defined latent type systems. Being informal, this relationship may not qualify as a "proper" superset, though.

Typed Reflection

I just wanted to follow up extremely briefly to let you all know that I still need to read this paper. I skimmed it very briefly the day it was posted, and noted the bullet points, which basically say that typed reflection is both necessary and possible. I found that very heartening and look forward to reading the full paper.

I haven't read of course all

Of course, I haven't read all the posts on the topic of types, so excuse me if I say something that has already been said.

From my perspective (the perspective of an average programmer), a type system is just a safety net for correctly managing values.

Any property of a value (whether data or code - it does not matter) can be expressed as a type constraint. Therefore, type systems are interesting because they allow us to statically prove dynamic properties of a program.

For example, C's 'const' attribute is a type constraint: it makes sure a variable is not modified. Java's 'final' attribute is also a type constraint: it makes sure a method cannot be overridden.

I think that types can also be used to prove whether a program terminates. In another post on LtU, I have suggested a system of annotating variables with constraints on their possible values that would make C-like code safe, similar to Cyclone, as someone else has pointed out.

In a way similar to 'const', one could say that a variable can only be, for example, positive, or can take only the values [1, 2, 3], or that a pointer is NULL after a piece of code executes. Just like 'constness' ripples throughout the program, so can the rest of the constraints.

I don't think a complicated theorem prover is needed for such a type system. Just like 'const', a simple matching algorithm is enough: two pieces of code cannot cooperate if their properties do not match.

Simple matching will fail to...

Simple matching will fail to prove compatibility of types if it is too simple. For example imagine that I am trying to do this:

void foo(int : [1,...] y);
...
int : [5,6,7] x = 5;
foo(x);


Textual matching would fail to detect that {5,6,7} is a subset of the positive integers, and hence that x can be passed as the argument of foo. This is why type systems are not trivial.
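Detecting that containment requires interpreting the constraints rather than comparing their text. A toy sketch in Python (the representation is invented: a range constraint is a `(lo, hi)` pair, with `None` standing for the unbounded end of `[1,...]`):

```python
def is_subrange(sub, sup):
    """True if every value satisfying sub's constraint satisfies sup's."""
    sub_lo, sub_hi = sub
    sup_lo, sup_hi = sup
    if sub_lo < sup_lo:          # sub admits values below sup's minimum
        return False
    if sup_hi is None:           # sup is unbounded above: nothing to check
        return True
    return sub_hi is not None and sub_hi <= sup_hi

# x : [5,6,7] may be passed where [1,...] is expected...
print(is_subrange((5, 7), (1, None)))   # True
# ...but not where [10,...] is expected.
print(is_subrange((5, 7), (10, None)))  # False
```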

I think textual matching is...

I think textual matching is enough, if the following substitutions exist:

5 => positive 5
6 => positive 6
7 => positive 7


Suppose that our function is:

void foo(positive n);


Then the following is valid:

foo(5);
foo(6);
foo(7);


because '5', '6', '7' can be substituted by 'positive 5', 'positive 6', 'positive 7' respectively.

The system above could be generalized with any symbol. Let's suppose we have variables, instead of numbers:

let unsigned n1 = 5;
let unsigned n2 = 6;
let unsigned n3 = 7;


with the following substitutions:

unsigned N => positive N


by simple textual matching, we have the following substitutions:

unsigned n1 => positive n1
unsigned n2 => positive n2
unsigned n3 => positive n3


Therefore the variables n1, n2 and n3 can be used in any context that a positive number is required.
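That substitution scheme can be sketched as a tiny rewrite-based checker in Python (the rule format is invented). Note that the derivation loop terminates only because the rules here are acyclic, which anticipates the termination concern raised in the next reply:

```python
# Each rule says: a value with the left-hand property also has the
# right-hand one (e.g. every 'unsigned' value is 'positive').
RULES = {"unsigned": "positive"}

def properties(p):
    """All properties derivable from p by chaining substitutions."""
    props = {p}
    while p in RULES:            # diverges if RULES ever contains a cycle
        p = RULES[p]
        props.add(p)
    return props

def accepts(required, declared):
    # A variable declared with one property fits a context requiring
    # another iff the requirement is derivable by substitution.
    return required in properties(declared)

print(accepts("positive", "unsigned"))  # True
print(accepts("unsigned", "positive"))  # False
```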

OK, but all that shows is that...

OK, but all that shows is that for a single type constraint you can add some substitutions that perform a match. This does not show that richer type constraints can be accommodated in this way.

To have an extensible type system based on such substitutions, you would have to allow users to add their own substitutions. Now you have to make decisions about what substitution rules you allow, and whether or not you want to constrain them so that type checking can be proven to terminate.

Doesn't Ada already allow constraints to be applied via subtyping? I'd imagine that provides a fairly good existence proof that "a complicated theorem prover" is not required.

Run-time checks

I read on the same page that

conversion is checked at runtime

so it's not purely static.

Type vs Subtype

To be honest, I know little or nothing about Ada (Ehud? can you help here?). However, my impression is that the runtime checks are carried out only in the case of converting between different Types, not in the case of different Subtypes of the same Type (i.e. a constrained version of some larger Type).