Why type systems are interesting - part II

The type systems thread is getting ridiculously long. Please continue the discussion here...

Boy that worked well....

I'm responding just to bump this back to the top.
Maybe posters will notice on the second try.
;-)

Recidivism

Anton van Straaten: I disputed this point, and Frank agreed with me: "It's true that Scheme, Perl, Python, etc. have a more concise syntax for untyped programs". That translates to their being more expressive for those programs. That's the state of the art right now.

I have to imagine that you realize that when folks like Frank and I say "expressive," we are using the term formally, whereas your use of "more expressive" here is as a synonym for "more concise." A major chunk of my point was that Frank and I had indeed already stipulated that untyped languages have a syntactic convenience benefit, with respect to untyped code, over languages like Haskell. Indeed, this argument seems tautological to me.

Anton: I'm saying that until these languages have been demonstrated being used successfully in the scenarios where untyped languages excel (e.g. here), whether untyped languages are a proper subset of typed languages is not very relevant to actual practice.

OK, I followed the link, read the three pages, and can't see anything there that untyped languages do any better than a good statically-typed language. I happen to prefer O'Caml, personally, so when they talk about Twisted and Zope and so forth, I can talk about Ensemble and WDialog and so forth. I'm sorry, but it just seems like every time the dynamic language apologists put forth their position, it reveals that they haven't actually done much research into what's available, today, on the static typing side, never mind what's on the cutting edge.

Anton: You quoted someone at Cambridge (maybe Peter Sewell, but the link was messed up):

I'm afraid that my conclusion remains what it has been for the past 3-5 years: the real problem is that once those committed to rigor deliver results, they discover that industry still can't be bothered to be rigorous.

Good heavens, no, that wasn't a quote! I said that, and regrettably, it remains my overwhelming impression. Basically I keep seeing a chicken-and-egg problem: folks in industry keep saying that they want static typing proven in the real world; static type theorists keep delivering more and more practical results such as the work on C and UDP or phantom types providing type-safe wrappers around type-unsafe C APIs; we keep seeing adoption refusal on the industrial side, typically with a lot of verbiage about how the issues are technical when in fact they're sociological; and the project failure rate in software development doesn't budge an iota. At some point the type theorist begins to wonder how serious the untyped language apologists are about wanting improvement. At least I do.

Anton: This is more of the attitude that I'm arguing against: because they don't get the answer that makes them happy, some of these proponents of rigor assume the "other side" must be at fault.

Mea culpa, mea culpa, mea maxima culpa. People like Frank and Brynn and I provide dozens and dozens of links to real tools, real systems, things you can use today, even such mundane stuff as streaming MP3 servers written in statically-typed languages. Some of the other links are to collections like the Caml Humps, which themselves contain hundreds or thousands of further links to real stuff (applications, frameworks, libraries, you name it), and we even manage to point to formal work on such prosaica as C and TCP/IP sockets. When all of that is still dismissed with a handwave, or more likely still, not even referred to in the reply, I am left with no choice but to question not merely my interlocutor's familiarity with the material, but also their commitment to becoming familiar with the material. Certainly there hasn't been anything resembling an effective rebuttal of any of the points we've made; there's just been a lot of repetition of the claim that static type theorists haven't proven the value of their work. Well, if you won't download it and try it, there's really not a lot more to say.

Anton: It never seems to occur to wonder whether there's something that's been overlooked in terms of real-world requirements.

Nothing would please me more than to see a set of real-world requirements from the untyped language side. What we get instead are "How does your static language handle a list that contains objects and a dictionary?" That's not a requirement; that's a tautology.

Anton: It's easy from the academic side to get infatuated with the beauty of a narrowly-focused theoretical system, forget about the hairy issues of integration with the rest of the world, and then blame everyone else when the results aren't universally acclaimed as being useful.

Yeah, that's why we get stuff like Why Functional Programming Matters, Tackling the awkward squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell, and the recently referred-to Rigour is good for you and feasible: reflections on formal treatments of C and UDP sockets. Clearly, these are just academics unconcerned with the requirements of the real world.

Anton: Muttering about "industry still can't be bothered to be rigorous" is just the easy way out.

But I see no evidence that industry has actually made a good-faith effort to determine whether some of the practical work that's been done in static typing is useful to them or not. Since that's a generalization that can't be proven one way or the other, and at the considerable risk of personal insult, I see no evidence that you have made such a good-faith effort. What I see is the same decade-old handwaving about how the Univ-based code that you write in an untyped language is awkward to express in a statically-typed language. My question is: are you willing to learn how not to write Univ-based code and see if it's advantageous to you? The problem with that is that it might be a non-trivial time commitment to do so. But since we're on the subject, keep in mind that I'm not a CS or PLT grad student; I'm a guy who writes C++ and Java for a living and wrote all of his recreational stuff in Common Lisp or Scheme for well over a decade; I'm very familiar with the benefits of untyped languages. My perspective is very far from academic. I went through the very process that I'm asking if you're willing to.

Anton: Saying that the answer to the need for dynamic behaviors today is to press on with the development of more rigorous typesystems may be a theoretically valid answer - time will tell how complete an answer can be reached this way - but it's of no use to someone evaluating languages today.

Time will not tell how complete an answer can be reached this way; those results are already in. Here's the summary:

  1. Untyped languages are a proper subset of typed languages.
  2. Type inference for generalized algebraic datatypes is undecidable, so in a sufficiently expressive type system, some annotation will always be necessary.
  3. For statically-typed languages to model popular use cases of untyped languages requires a "sufficiently expressive type system" in the sense used previously, so some level of type annotation will be necessary in such languages.
  4. Statically-typed language design has, to date, not prioritized syntax design in such a way as to address the previous point.
  5. By using tools such as hygienic macros, they can.
  6. The converse is not true (untyped languages can't mysteriously gain the benefits of statically-typed languages).
    1. "You cannot procede from the informal to the formal by formal means." — Alan Perlis
As for being useful in evaluating languages today, it clearly is valuable: some of us have made the switch from untyped languages to typed ones and have direct experience of the benefits. I keep waiting to hear from an untyped language apologist who's tried, e.g. O'Caml and ended up at non-sociological reasons for not switching. Now at this point let me hasten to remind the reader that elsewhere I've indicated that sociological issues matter, and that when the technical differences between a good untyped language and the currently-in-production statically-typed languages are epsilon, as I believe them to be, sociological issues dominate. My only complaint is when people refuse to acknowledge that that's what they're basing their decision on, because the outcome then is never to revisit the technology as it improves, because the fantasy is that we're already evaluating the technology as opposed to the cultural issues.
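
To make point 1 concrete, here is a minimal sketch in Haskell of the kind of universal-type embedding Frank has described elsewhere in the thread; the datatype and names are purely illustrative, not his exact construction. Any "untyped" program can be transcribed against a single Univ type, with all the checks pushed to runtime:

data Univ = UNum Double
          | UStr String
          | UList [Univ]
          | UFun (Univ -> Univ)

-- untyped-style application: the "type errors" resurface as runtime errors
app :: Univ -> Univ -> Univ
app (UFun f) x = f x
app _        _ = error "not a function"

-- (lambda (x) (+ x 1)) applied to 41, Scheme-style
example :: Univ
example = app (UFun (\(UNum n) -> UNum (n + 1))) (UNum 41)

The constructor noise is exactly the syntactic-convenience gap stipulated earlier, which is what point 4 is about.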

Anton: Something similar goes for systems like Acute, as well as many of the other references you've given. Acute is described as an "experimental" language.

Acute is a better example of your argument than most, because they apparently don't yet have any code to download! So I wouldn't even be comfortable yet calling it an "experimental language."

Anton: So great, in future all these hard problems are going to be solved by advanced type systems.

When I can download them and use them, I stop saying "are going to be." The question then changes to technology uptake rates, and I've also already written elsewhere that I'm aware that the average new-technology uptake period is a decade. I still question whether that doesn't mean that the best strategy for an individual to adopt is to pursue those technologies now so as to be fluent in them when they're mainstream. But that's clearly a different observation than to ask "Why isn't your employer using this today?"

Anton: But that doesn't make, say, OCaml any more attractive, today.

More attractive than what?

Anton: I can't count on Acute's functionality being integrated into OCaml, ever.

Unless, of course, you were to do the integration yourself: O'Caml is open source and the Acute papers are there for the taking. :-)

Anton: Plus there are a dozen other capabilities that have been raised - which language is going to be the winner that ends up having the most useful combination of features?

I've stipulated multiple times that the language that combines all of these features doesn't exist. You seem to believe that this means that no current statically-typed language is worth pursuing, even as an individual vs. a commercial software endeavor. As far as I can tell, this is the only point we actually disagree on.

Anton: One of the most rational things a business can do in this scenario is wait, and to some extent, the same applies to the developers they employ. Unfortunately, that's going to be one helluva long wait, as long as there's so little focus on bridging the gap between beautiful formal systems that solve tough theoretical problems, and making them do something useful as part of other real systems.

I'm not going to refer to the bodies of work in which integration with real systems is the focus, or in which the tough problems aren't just theoretical, again; it's getting wearisome. I will only note that I find this characterization at best grossly unfair; on average, obtuse; and at worst, dishonest.

Anton: But I'm saying there are lots of good uses for type systems, and there are lots of benefits from checking of types, and it is important to educate oneself about type theory. But you have to distinguish between the benefits that can be obtained now, and the benefits we expect these systems to provide at some indeterminate point in the future. Confusing those two is only going to lead to disillusionment - I don't think we want to be tricking people into using languages with advanced typesystems. You earlier said something about honesty in the debate with DT folk - but be careful that in your enthusiasm, they don't end up feeling the same way about you.

OK, well, I'm done. I've stated explicitly when I was being speculative, I've offered concrete examples of validating C and socket APIs, I've pointed to the use of phantom types to make unsafe C APIs safe, I've pointed to region-based memory management to eliminate the need for garbage collection in embedded/real-time systems, I've pointed to streaming MP3 servers just to offer something really mind-numbingly prosaic and non-theoretical, and I've reiterated multiple times that I like untyped languages (I still intend to become fluent in Oz) and that sociological issues not only matter in language selection, they dominate given the current state of implementation. I know neither how to be more honest nor more clear.

Namaste

I have to imagine that you realize that when folks like Frank and I say "expressive," we are using the term formally whereas your use of "more expressive" here is as a synonym for "more concise,"

Which formal sense of expressivity are you referring to? Perhaps the Felleisen sense, described in "On the Expressive Power of Programming Languages". If so, since you say you're using the term formally, you presumably have some kind of reasoning for why you think my use of the term doesn't qualify? Could you outline that?

For my part, I can show the sense in which dynamically-checked languages are more expressive. At a minimum, this will show why Frank's Univ argument is overly narrow, doesn't model the entire picture, and thus ends up being fallacious. Whether that explanation will rise specifically to the level of greater Felleisen-expressivity, I'll have to examine - however, it will at least exhibit some of the symptoms thereof, such as requiring "repeated occurrences of programming patterns" in the statically-checked language.

However, I don't have time to lay out this argument now. I'll try to do so on Wednesday. As a clue, though, my position is inherent in the first paragraph of my original post in this thread: "...even in the most dynamic language, you always know something about the types of the values you're operating on." IOW, "untyped" is not as untyped as you've been led to believe, and you have to take that into account when comparing languages. The Univ example doesn't.

As for the rest of your post, I find it disappointing that we've gone from your earlier post "100 Proof vs. 100% Proven :-)", where we were in substantial agreement - I hadn't gotten around to responding to that yet, since it was so long - to your latest post, which is rather less constructive.

Part of the problem seems to have been that your original response to my response to Frank is based on your misinterpretation of the applicability of the Univ example. Until we've cleared that up, we can't really get into the advantages of dynamic languages. Because of that issue, the six-point summary in your latest post goes wrong starting with point 1 (to be demonstrated), and is also wrong in points 4 (because syntax is not the point), 5 (ditto), and 6 (are you familiar with soft typing systems at all?). But I'll deal with that later.

I'm very curious about one point:
Basically I keep seeing a chicken-and-egg problem: folks in industry keep saying that they want static typing proven in the real world; static type theorists keep delivering more and more practical results such as the work on C and UDP or phantom types providing type-safe wrappers around type-unsafe C APIs; we keep seeing adoption refusal on the industrial side, typically with a lot of verbiage about how the issues are technical when in fact they're sociological; and the project failure rate in software development doesn't budge an iota. At some point the type theorist begins to wonder how serious the untyped language apologists are about wanting improvement. At least I do.

OK, so can you explain why you wrote earlier, "To most of the team, the idea of changing languages would be laughable, but quite honestly I can't tell you how much better off we'd be developing in Erlang or Oz or Alice. And yes, I'm mindful of the fact that Erlang and Oz are dynamically checked."

Why are you suggesting dynamically checked languages in this context, given all the benefits of static proofs that you've been enumerating, and your stated preference for OCaml?

Finally, I guess I have to deal with this:

My question is: are you willing to learn how not to write Univ-based code and see if it's advantageous to you? The problem with that is that it might be a non-trivial time commitment to do so. But since we're on the subject, keep in mind that I'm not a CS or PLT grad student; I'm a guy who writes C++ and Java for a living and wrote all of his recreational stuff in Common Lisp or Scheme for well over a decade; I'm very familiar with the benefits of untyped languages. My perspective is very far from academic. I went through the very process that I'm asking if you're willing to.

I have gone through that process. I don't like to characterize myself based on what languages I develop in, since I've used quite a few throughout my career, both statically- and dynamically-checked. I'm not a recent convert to either typing discipline. C++ and Java have been prominent in my professional life in recent years - I contributed early on to the transaction API of Hibernate, which I noticed you mentioned a while ago.

Recreationally, I prefer Scheme, mainly because no-one's invented my ideal language yet, and in Scheme it's easy to create languages. I've also developed code professionally in both ML and Scheme. I've implemented various interpreters in ML, including a fairly complete Scheme interpreter, based loosely on the Scheme denotational semantics (which I've also implemented in Scheme). I learned enough OCaml to be able to work with the examples in Pierce's "Types and Programming Languages".

I don't divide the languages up the way you seem to, though: to me, the similarities between Scheme, ML, and all the applied lambda calculi seem much greater than their differences, and I find the type checking issue to be superficial, in a sense. That's related to the point I've been arguing - more on that later.

Metacircularity

Kevin Millikin: See? Statically typed code in Scheme is no more inconvenient than untyped code in ML. So obviously (with tongue firmly in cheek), statically typed languages offer no benefits except a convenient syntax for restricted code generation and some builtin checking algorithms.

Even with tongue in cheek, this is an important reminder: metacircularity works! I see an analogue here, which I cannot at the moment make precise in any meaningful way, to the materials I've seen on multi-stage programming. But you're obviously correct to point out that delaying evaluation with lambda is a direct analogue to the difference between "compile-time" and "runtime," and in fact in the presence of, e.g. serializable closures, this notion could be made to map onto user-time in exactly the same fashion that a compiler does.

Following this through to its more formal conclusion, this is really a recapitulation of the study of partial evaluation and specifically Futamura's second projection. But as a Scheme-based language-implementation fan, it's overwhelmingly likely that you know this. :-)
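
For readers who want something concrete, here is a small, purely illustrative Haskell sketch of the generator/residual split that partial evaluation formalizes (the names are made up for the example): the recursion over the exponent belongs to the earlier stage, and only the residual closure chain is left to run at "runtime."

-- specialize the exponent once, returning a residual function
powerGen :: Int -> (Integer -> Integer)
powerGen 0 = \_ -> 1
powerGen n = let rest = powerGen (n - 1)
             in  \x -> x * rest x

cube :: Integer -> Integer
cube = powerGen 3          -- the exponent is fixed here, in the earlier stage

-- cube 5 evaluates to 125; later calls reuse the residual closures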

Offline partial evaluation by hand

I see an analogue here, which I cannot at the moment make precise in any meaningful way, to the materials I've seen on multi-stage programming.

I was going to title that post "Lambda: the Ultimate Staging Annotation?", but I usually forget to title my posts.

Re: Point of static proofs?

Daniel Yokomizo wrote:
IME one of the most important sources of problems in large projects (more than 4 people for more than 3 months) is communicating the contracts of programs.
...
This kind of proof is, IMO, quite underrated: we usually forget how many hours we spent getting burned by this kind of problem and the effort used to build the discipline necessary to avoid them.
I agree. This was one of the two benefits "for most systems" of static proofs that I mentioned in my original "Point of static proofs" post: "Better, machine-checkable documentation and enforcement for interfaces between modules." I completely agree that "we usually forget how many hours we spent getting burned by this kind of problem". The same goes for how many hours are spent finding bugs by running tests instead of just running the compiler. So that's a solid, widely applicable benefit of static proofs.

There are cases where the effort required to correctly prevent the [security] violation at runtime is much harder than the statically inferred properties. For example assume program_b has a tighter policy than program_a:

That's an interesting point, thanks.

Hey, good idea! Except if you move type checking from compile time to runtime, you lose your static proof capability.

That's not true. If we use a staging system we can ensure that every program constructed in stage N is type-safe before executing it in stage N + K.

Another good point. I was just pointing out (a little indirectly) that Paul might be implicitly acknowledging that e.g. dynamic behavior can be traded off against static proof.

...The answer, I think, is not enough to rise to the level of "must have". ...

Well the proposition here is that some kind of statically verifiable behavior is better than none. In DTLs we have X possible classes of errors and in STLs we have Y possible classes of errors where 0 < Y < X. So STLs provide value even if some of the errors in DTLs are still possible.

But "some kind of statically verifiable behavior" may not compensate for what the dynamic language user perceives as being lost. When a major change is being contemplated, or even just when a tradeoff is involved, it's easier to pick a winner if the scale tips heavily in one direction or the other. It certainly seems as though statically-checked languages may fit that criteria in future; but I'm saying that right now, that choice isn't as clear to everyone.

In addition, I've mentioned my concern about the narrow focus that tends to be introduced with 0 < Y < X, where it's tempting to think that the goal should be to minimize Y. For some kinds of system, that makes sense. I don't think it does for every system, though, and I believe that in some future well-designed type systems, Y might be a lot bigger than many current static-checking advocates would be comfortable with.

Power Cycle; Start Over

Anton van Straaten: I was just pointing out (a little indirectly) that Paul might be implicitly acknowledging that e.g. dynamic behavior can be traded off against static proof.

That's absolutely correct. I think the way that I tried to frame this earlier, albeit again in an unnecessarily impatient and inflammatory way, was to observe that if there were no necessary dynamic behavior of a program, you'd simply get the final answer from your compiler, like in those factorial-as-C++-metatemplate programs that are floating around out there.

Anton: But "some kind of statically verifiable behavior" may not compensate for what the dynamic language user perceives as being lost.

This is another great point that I wish to ensure readers know that I recognize as valid. In rereading much of what I've written, it seems to me that I've allowed myself to overlook at least one crucial point on the dynamic language side, which is: the programmer knows what he means. With his dynamic language, he's been able to express what he means sufficiently well that his program works to the extent he's able to observe it in a sufficient number of cases. None of the dynamic language guys have ever claimed to be able to observe all cases. They've all agreed on the need for extensive testing and agility in revision. Conversely, sometimes when we try to express what we mean using a statically-typed language, we either get sometimes-arcane error messages and/or we find that we have to annotate the code with a sometimes-arcane type name, and in those cases the type and its name don't necessarily obviously map to the problem domain. These are true things that I had not intended to imply were not the case.

Booting up

...the programmer knows what he means. With his dynamic language, he's been able to express what he means sufficiently well that his program works to the extent he's able to observe it in a sufficient number of cases.

Absolutely. Dynamic languages give you the freedom to write code in which the type system tail doesn't have to wag the semantic dog.

In the statically-checked functional languages, the base model for writing code is a sort of "type-oriented programming (TOP)" style, which is analogous to OOP. If you want to write all your code in a single strictly-imposed style, both OOP and TOP are reasonable choices (other things being equal, which of course they never are). BTW, in this context, OCaml's appeal is obvious, since it does a reasonable job of combining the two models, so would seem to be a best of both worlds.

But both OOP and TOP are fundamentally constraining, although you get benefits in exchange for the constraints. Earlier, Frank entitled a post "no constraints", but he doesn't really want that for TOP, and nor do I. I want TOP, more or less as currently formulated (but better :), available at the end of the typing spectrum. But it's not necessary, and for me not desirable, to operate at one end of that spectrum all the time.

The point about being "able to express what he means sufficiently well that his program works" touches on the human factor that type theory can't possibly account for. What it comes down to is that mental type inference is actually a pretty easy process for humans, in those "most cases" I've been talking about. Dynamically-checked programs are syntactically typed, they're just not statically checked by machine. They have to be syntactically typed, to be able to write useful code. We have to know something about the types of the values we're operating on, in almost every situation.

The "Univ" trick can't account for our mental model of these programs, because it misses the fact that for all terms in an allegedly "untyped" program, the programmer has some assumptions about types, and those assumptions are borne out by the fact that the program works. That's strong evidence for the correctness of the program's assumed type scheme - not as strong as static proof, but strong enough for most practical purposes.

I've started writing a more thorough debunking of the Univ type meme (which is quite prevalent in the syntactic typing community) - I'll probably post that tomorrow.

Can't we have fully static proofs even with dynamic behavior?

[Anton:] ...dynamic behavior can be traded off against static proof.
[Paul:] ...observe that if there were no necessary dynamic behavior of a program, you'd simply get the final answer from your compiler, like in those factorial-as-C++-metatemplate programs that are floating around out there.

Can't you have a function whose properties are all statically provable but still have its dynamic behaviour be useful?

For example, even if you can prove that a sorting function sorts correctly, it is still not feasible for the compiler to convert it into a lookup table with all input/output pairs.

Proofs at Different Times

Kannan Goundan: Can't you have a function whose properties are all statically provable but still have its dynamic behaviour be useful?

Absolutely, with VLISP: A Verified Implementation of Scheme as an existence proof. So it's interesting—and I think this is a big part of what Anton has been trying to convey, and I've been slow to understand—to contemplate the difference between the kinds of proofs that can be "encapsulated," if you will, in types and compile-time type checking vs. the kinds of proofs employed in something like VLISP, where you've proven that your language's semantics have the properties it claims to have, albeit the majority of those semantics in Scheme's case are, in fact, dynamic.

It's interesting to reflect on this; just earlier today I called out Scheme as an obvious counterexample to my general claim that dynamic languages lack rigor. Maybe what I really mean is that the popular scripting languages (Python, Perl, TCL...) lack rigor. On the other hand, as Abadi and Cardelli demonstrated so well with their Object Calculus, sometimes the practice comes first and the rigor later.

Kannan: For example, even if you can prove that a sorting function sorts correctly, it is still not feasible for the compiler to convert it into a lookup table with all input/output pairs.

Unless, of course, all of the parameters to the sorting function are known at compile time. But your example is a popular one in dependent-typing circles, where a "sorted" type that expresses the order invariant of the values is feasible. Of course, one can make the observation that dependent type systems blur the distinction between compile-time and runtime, and that's true. On the other hand, dependently-typed expressions either type-check or they don't, and that still happens at compile-time, AFAIK (little help here, guys: is this true for Cayenne? Epigram? DML? Xanadu?). This is what leads some people to feel that these type systems are too conservative: they're expressive, but some number of perfectly legitimate programs won't type-check in them. The type theorists' counter to this is that rejecting some legitimate programs is preferable to accepting an (apparently) unbounded number of illegitimate ones.
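
As a rough illustration of where today's non-dependent systems land on that spectrum, here is a hedged Haskell sketch (my own names, not from any of the systems above): an abstract Sorted wrapper whose only exported operations establish and preserve the ordering invariant. It isn't dependent typing, since nothing about the order is checked at compile time, but it shows how far ordinary abstract types get you before the heavier machinery is needed.

import Data.List (sort)

-- the constructor is kept abstract: clients see only fromList, toList, merge
newtype Sorted a = Sorted [a] deriving Show

fromList :: Ord a => [a] -> Sorted a
fromList = Sorted . sort

toList :: Sorted a -> [a]
toList (Sorted xs) = xs

-- merging two sorted lists preserves the invariant without re-sorting
merge :: Ord a => Sorted a -> Sorted a -> Sorted a
merge (Sorted xs) (Sorted ys) = Sorted (go xs ys)
  where
    go [] bs = bs
    go as [] = as
    go (a:as) (b:bs)
      | a <= b    = a : go as (b:bs)
      | otherwise = b : go (a:as) bs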

Incompleteness of typing (plus Zelazny, Egyptian gods)

Paul Snively: This is what leads some people to feel that these type systems are too conservative: they're expressive, but some number of perfectly legitimate programs won't type-check in them. The type theorists' counter to this is that rejecting some legitimate programs is preferable to accepting an (apparently) unbounded number of illegitimate ones.

[bold emphasis mine - F]

There is a theorem to the effect that any sound, static type system is necessarily incomplete (proved roughly along the lines of the usual incompleteness results, I suppose). In other words, any type system (which actually works) must reject some programs at compile time which would nevertheless "run OK" (i.e., not "go wrong") at run time.

This seems to mean that all static type systems are too conservative in your sense.
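
A one-line witness (my own toy example, not drawn from the theorem's proof) gives the flavor:

ok :: Int
ok = if True then 1 else 2                -- accepted, runs fine

-- This variant would also "run OK", since the else-branch is dead code, yet
-- it is untypable because both branches of an if must share one type:
--   bad = if True then (1 :: Int) else "never reached"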

Personally, I've always taken this in a positive light, referring to it as the Type System Designers' Job Security Theorem*: it seems to imply that there could always be a better static type system than any one you have now. (Steve Ganz from IU tried to dispute this implication with me, but I think it's true in at least this trivial sense: for any witness program P you exhibit that fails to pass type-checking but still runs OK, I can just add an ad-hoc axiom to my system that says P checks OK. Of course it would be nice to think that I could find a more general principle that covers P to add to the system, but I think that's where the human creativity comes in, and I doubt there's any way to prove a general result.)

[* I suppose the original theorem should have a different name, and the more positively-phrased consequence should actually be the TSDJS Corollary.]

For me, this leads to an image of an infinitely-detailed ragged (fractal?) fringe at the boundary of typability: you can extend it indefinitely, but you can never get the whole enchilada (semantic completeness here being metaphorically Mexican food :) ).

Going further out on a limb (but bear with me, there's a point), it also reminds me of the never-ending dance between the gods Set and Thoth in Roger Zelazny's "Creatures of Light and Darkness". Set and Thoth are, as in the original mythology, mutually-recursively father and son to each other (it makes a cute little classroom demonstration to express this in Prolog and show that, for many purposes, it works just fine). In any case, Zelazny has Thoth sowing monsters before Set (the Destroyer) throughout eternity, as distractions from his otherwise imminent self-destruction. (There are other cute metaphoric possibilities: a bound god known as the Nameless One, presumably a de Bruijn variable :) , a personified black hole/Abyss reminiscent of the semantic bottom, etc.)

Anyway, the promised point here is this eternal back-and-forth dance also rather smacks of game theory, which undoubtedly provides other useful perspectives on the whole typing issue. (Well, OK, I hope also to tempt you to read the book, but only because you might enjoy it. Plus, I have the impression that some people around here could use a couple of hours away from the LtU discussion boards :) .)

Rejecting programs is important too

There is a theorem to the effect that any sound, static type system is necessarily incomplete (proved roughly along the lines of the usual incompleteness results, I suppose). In other words, any type system (which actually works) must reject some programs at compile time which would nevertheless "run OK" (i.e., not "go wrong") at run time.

That's true, but it's a desirable feature. There are programs that won't technically "go wrong" but which I don't want to type-check. That's because the more flexible the type system the more programs can be given types -- even erroneous programs. A type system rejecting a program can be a feature -- rejecting a class of programs whose members are erroneous 95% of the time and only correct 5% of the time will actually speed up development in the typical case.

This isn't idle theory, either. As a concrete example, I don't want a language to accept programs which require arbitrary equi-recursive types to type-check. They almost always indicate an erroneous program which "accidentally" has a valid type assignment. Adding support for this makes the type system more expressive in a theoretical sense, but as a practical matter it makes the language less usable because errors get detected later.

An equi-recursive type is a type like this:

type int_stream = int * int_stream

That is, one in which the type itself is part of the type expression that describes it.

ML and Haskell have "iso-recursive" types, in which a recursive instance of a type can occur only behind a data constructor. So you would have to write

type int_stream = Stream of int * int_stream
to get a recursive stream.
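
To illustrate in Haskell (a sketch, with names chosen just for this example): the iso-recursive version is perfectly usable, while the occurs check still rejects the accidental infinite types that equi-recursive inference would quietly accept.

-- iso-recursive: the recursion has to pass through the Stream constructor
data IntStream = Stream Int IntStream

ones :: IntStream
ones = Stream 1 ones

-- the kind of accident equi-recursive types let through: self-application,
-- which needs the infinite type t = t -> b. Haskell rejects it with an
-- occurs-check error, which in practice almost always marks a real mistake:
--   selfApply f = f f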

Dynamic type systems

Dynamic type systems are more powerful than static type systems, because static type systems reduce your ability to write code for ill-defined problems.

The more powerful the feature, the more room for hanging yourself, because the more power a feature gives you, the more room there is to abuse that power!
Therefore do not be afraid to add features that are dangerous, because they make the language more powerful. Power = Rope

Power = Rope = Ability to

Power = Rope = Ability to tie yourself up.

Dynamic type systems are so vanilla!

[agreement snipped] But "som

[agreement snipped]

But "some kind of statically verifiable behavior" may not compensate for what the dynamic language user perceives as being lost. When a major change is being contemplated, or even just when a tradeoff is involved, it's easier to pick a winner if the scale tips heavily in one direction or the other. It certainly seems as though statically-checked languages may fit that criteria in future; but I'm saying that right now, that choice isn't as clear to everyone.

Well, I don't know the tradeoff involved in this choice; I clearly don't grok what's so great about DT. I tried very hard to understand it, but every time I ask somebody to show me what is impossible to statically type, and why it would be useful in a language, DT enthusiasts start to ignore me.

For me the tradeoff is pretty simple: either I can have some form of static behavior or not. One of the ways static behavior is essential (IMHO) is when we are writing higher-order functions. When I'm on the bus I like to hack using LispMe on my Palm. Most of the time I get unexpected answers or infinite loops in pieces of code that would fail to type-check in Haskell, but using LispMe I don't have a clue why the code fails to meet my expectations (other than that my expectations are wrong ;-)), whereas in Haskell the error message (even if it is enigmatic most of the time) points me to a particular piece of code. Sometimes I get frustrated with the errors and go play Solitaire, waiting to get home and see if Hugs gives me a clueful type error. The value added by this kind of static check is much more interesting than the unit tests that I would have to write to verify the code (I'm a big TDD fan, but it annoys me to have to write code to verify something that I know would fail to type-check in Haskell if it were wrong). Static behavior checking with inference is something that reduces my effort to write and understand code and, IME, DT fails to give me an adequate substitute for that.
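
To make that concrete, here is a hypothetical example of the kind of slip I mean (the function name is made up): the fold's arguments are swapped, which LispMe would only discover at runtime, while GHC or Hugs rejects the definition and points at the offending expression.

total :: [Int] -> Int
total xs = foldr (+) 0 xs          -- the correct version

-- the easy mistake is caught before the program ever runs:
--   bad xs = foldr 0 (+) xs
-- GHC/Hugs reports a type error at this expression, because 0 is not a
-- function of type a -> b -> b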

In addition, I've mentioned my concern about the narrow focus that tends to be introduced with 0 < Y < X, where it's tempting to think that the goal should be to minimize Y.

The problem is what kind of expectations people put in ST systems. There's a common fallacy where if we propose to replace A with B, users of A criticize B for having any failure (especially those that A has too), even if the number and magnitude of problems in B are smaller than in A. It's a fact that we have to write fewer unit tests in ST languages with type inference than we would in DT languages, because the type system rules out many classes of errors, but some people believe that if in an ST language they'll still have to write tests, then it isn't worth the effort, even if the amount of testing decreases by an order of magnitude.

We won't be able to discuss the real benefits of ST vs. DT if we don't start to communicate in a common language without resorting to statements like "The type system gets in the way", "Write code to satisfy the compiler", "The industry isn't interested in rigor", etc..


[rant]
P.S.: This debate isn't impartial because most of the DT proponents never used a modern ST language and use the shortcomings of Java/C++ as inherent problems of ST. Some of the most vocal proponents of DT today (e.g. Bruce Eckel, David Thomas, Robert C. Martin) don't. OTOH most proponents of ST have some knowledge of DT languages (e.g. Scheme, Smalltalk, Python) and don't fail to acknowledge those languages when they compare the two.
[/rant]

It's all Larry & Guido's fault :-)

Well, I don't know the tradeoff involved in this choice, I clearly don't grok what's so great about DT. I tried very hard to understand it but every time I ask somebody to show me what is impossible to statically type and why would it be useful in a language DT enthusiasts start to ignore me.

I think that's a valid point, but part of the problem is that it's not a question of what's "impossible" to statically type, but what's convenient, and there's obviously an element of subjectivity there. Also, as I mentioned in one of my posts, the type-theoretic statically checked languages tend to impose a kind of type-oriented programming style, which may not always be desirable. This is also subjective, of course, but there's nevertheless a tradeoff and a choice there.

Also, some of the current benefits of dynamic languages are not in the languages themselves, but in what current realistic implementations are capable of in terms of dynamic loading and so on. As a silly example, how come you don't run Haskell on your Palm? :)

I think it'd be a good idea, as I think Dominic Fox suggested in another thread, if a good list of DT benefits were put together, perhaps alongside a list of static alternatives, and an analysis of tradeoffs. I think that would require some careful work, because many of the DT benefits are admittedly "fuzzy" ones - but that doesn't necessarily mean they're not real, it may just mean they're more difficult to formalize. The latent type system discussion is an example of this kind of thing, where halfhearted attempts at formalization are misleading.

One of the ways static behavior is essential (IMHO) is when we are writing higher-order functions.

That's a really good point. Of course, most dynamic languages (with a few exceptions like Scheme) don't make heavy use of HOFs. I don't say that as a defense, but rather as one reason they might "get away" without static typing.

When I'm on the bus I like to hack using LispMe on my Palm.

Heh, me too - in fact it's now the only reason I still keep my Palm around (it's an old Palm III).

Static behavior checking with inference is something that reduces my effort to write and understand code and, IME, DT fails to give me an adequate substitute for that.

I agree! Part of my interest in this discussion comes from a belief (supported by some evidence in the Scheme world) that a middle ground is possible.

The problem is what kind of expectations people put in ST systems.

I agree that's a potential perception problem. However, I bet if someone produced a language that allowed DT programmers to write programs the way they always have, but gave them useful feedback about possible type errors, they wouldn't complain. They might also begin adjusting their programs to take advantage of the type system.

[rant] P.S.: This debate isn't impartial because most of the DT proponents never used a modern ST language and use the shortcomings of Java/C++ as inherent problems of ST. Some of the most vocal proponents of DT today (e.g. Bruce Eckel, David Thomas, Robert C. Martin) don't. OTOH most proponents of ST have some knowledge of DT languages (e.g. Scheme, Smalltalk, Python) and don't fail to acknowledge those languages when they compare the two. [/rant]

I think that's valid too. My original post, which started this thread, was aimed at arguing the benefits of having some knowledge of type theory and type systems, independent of statically vs. dynamically checked. But this relates to the previous point: as long as DT languages ignore their type systems, programmers who only have experience in DT are going to be at a disadvantage. In a sense, that's the "fault" of DT language designers, for paying so little attention to the type system aspect of their languages.

I agree that's a potential

I agree that's a potential perception problem. However, I bet if someone produced a language that allowed DT programmers to write programs the way they always have, but gave them useful feedback about possible type errors, they wouldn't complain. They might also begin adjusting their programs to take advantage of the type system.

I have a self-quotation which supports this, from my gung-ho Python days, when I was beginning to get interested in functional programming:

Type inference (to my mind at least) fits the Python mindset very
well. I think most Python programmers would be glad to have strong typing,
so long as they don't have to press more keys to get it. If you have to
declare all your types up front, it just means more time spent changing type
declarations as the design evolves, but if the compiler can just ensure your
usage is consistent, that's hard to argue with.
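
A tiny, hypothetical Haskell fragment of what I had in mind (the names here are invented for illustration): nothing is declared, yet inconsistent usage is still rejected.

double x = x + x                  -- inferred: Num a => a -> a
greet name = "Hello, " ++ name    -- inferred: String -> String

ok :: String
ok = greet "world"

-- mixing them up is rejected even though no type was ever written down:
--   bad = greet (double 3)       -- type error: a String can't be numeric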

I say we shall have no mo' "latent types"...

Those that are "latent" already—all but one—shall abide,
The rest shall keep as they are. To /dev/null, go!

Also, some of the current benefits of dynamic languages are not in the languages themselves, but in what current realistic implementations are capable of in terms of dynamic loading and so on.

You are conflating dynamic loading of untyped code with dynamic loading of typed code. They are not the same thing: untyped languages only support the first, while typed languages can easily accommodate dynamic loading of Univ code.

I find it very irritating that people continually bias their arguments for untyped languages in this fashion, as if typed languages need to "catch up" or something.
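
For instance, here is a small sketch using GHC's Data.Dynamic of what "dynamic loading of Univ code" looks like from the typed side; the plugin value is imagined as having arrived at runtime, and everything around it stays statically typed.

import Data.Dynamic

plugin :: Dynamic                        -- pretend this was loaded at runtime
plugin = toDyn ((+ 1) :: Int -> Int)

usePlugin :: Dynamic -> Maybe Int
usePlugin d = case fromDynamic d :: Maybe (Int -> Int) of
                Just f  -> Just (f 41)   -- evaluates to Just 42
                Nothing -> Nothing       -- a "type error" surfaces as Nothing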

But this relates to the previous point: as long as DT languages ignore their type systems

Untyped languages don't have type systems.

This is something else I find irritating, Anton. I spent many, many hours explaining exactly why and in what sense this is true, and I thought at last that people had started to grasp what a type is, what it is not, and what the relationship between typed and untyped languages is.

Now you are diluting my message with your talk of "latent type systems." As far as I can tell, your notion of "latent type" only amounts to program properties which might be extractable with some blue-sky theorem prover, or more realistically a proof assistant being manually operated by a type theorist. In particular, as a notion of "type system", latent type systems fail the "tractable" and "syntactic" conditions which you yourself quoted from Pierce.

Instead of jumping on the type bandwagon, why not call your "latent types" "program properties"? Or, if you think they are different, please demonstrate how.

Response in part III

Since this thread is getting long, and since this question re latent types is a topic in its own right, I've replied here.

Dynamic type systems

Dynamic type systems allow you to have more freedom, but the price of freedom is eternal vigilance. I use Haskell, and while I find the type system nice, it is harder to debug programs with type errors because I can't figure out why there is a type error in the first place.

End of nonsense?

OK, here goes. I needed to step back and think things over before I made my final reply.


(recap) A couple of days ago, I made a rather misguided attempt at setting Frank Atanassow straight. That posting didn't go down well with the resident debaters (which in retrospect I fully understand). Some people took me to task on issues like the origins and contents of type theory and the relative merits of dynamic versus static typing, which confused me since I was quite unprepared for having those issues brought up. (I am attempting to abbreviate here, not to misrepresent. The original postings are still available for the reader who wishes to study the complete argument.)


It all began (for me) when I saw mr Atanassow make this statement:

I've said it before and I will say it again, until people stop repeating the party line and start using their brains.


This statement has a context (see the entry "no constraints" at <URL: http://lambda-the-ultimate.org/node/view/100>) but let me for now skip that, because it was the actual wording that dismayed me and made me reply. When someone says something like this, I read it approximately as follows:

No one takes [what I say] seriously, but I'm sure this is just because people just accept disinformation instead of thinking for themselves.


Again, this isn't what he wrote, just what I read. I've come to associate statements like this with creationists, orgonomists, etc, and sometimes I feel compelled to respond, with the intention of making the other person question his/her own line of thinking ever so little. Off-hand, I can't think of any occasion outside face-to-face discussion where this has actually worked, but still I'd say this is a fairly well-intentioned thing to do. It does, however, help very much if one doesn't forget one crucial element: when you set out to enlighten someone, you really need to be focussed enough to actually be at least bright yourself. Put another way, any sufficiently unfocussed line of questioning is indistinguishable from idle bitching (and will be treated like such).


(My fondness for overloading seriously-intentioned arguments with jokes and attempts at profoundness using polysyllables and awkward phrase structure in a non-first language doesn't help much either, as I think this paragraph proves.)


So, I sincerely apologize for the unnecessary fuss I've caused. I meant well, but I did badly. I will continue to read LtU with interest, but I will do my best to avoid stepping into discussions and being a bumbling old fool in this manner in the future.


Brief clarifications and retractions


(I wish to thank Paul Snively and Anton van Straaten for patient and informational replies. The following is not as much an attempt to be argumentative as to avoid leaving some of their most poignant questions to me unanswered.)


(From "Russell supports the modern type theorist's case") The whole point of Russell's work was to be able to reject formulae which were considered meaningless, based on the types of their terms. This is exactly analogous to rejecting erroneous programs based on the types of their terms, which is what syntactic type theory does.


Yes. I meant to say that Russell's work does not depend on the specifics of modern syntactic type theory or the discipline of static type checking of program text (for chronological reasons, if nothing else). As far as I can tell, modern syntactic type theory is indeed fully supported by Russell.


(Ibid.) Now, you might say that this is all very well, but there still should be some way to apply Russell-like type analysis to runtime values.


(Squirms embarrassedly) Yes, I might actually say that, but the only support I can cite for this notion is my own squishy brain.


(Ibid.) Even if Russell's work didn't inherently fail to support your case, it would still fail on the basis that it doesn't deal with programming languages


The fact that it doesn't deal with programming languages was rather my point in that case, I'm afraid.


In the entry "Striving for Clarity", Paul Snively asked which paper by Benjamin Pierce it was that "had me laughing out loud". The paper was "Dynamic Typing in a Statically Typed Language" (<URL http://www.cis.upenn.edu/~bcpierce/papers/dynamic.ps>). I have tried, and failed, to put down an explanation for my laughter that does not become argumentative, and probably only will render me a new helping of well-deserved crow. Just one more careless and unnecessarily inflammatory comment, I fear.


("Striving for Clarity") I have to confess that I find this vexing; it isn't a "branch" of type theory; it's type theory.


I'm sorry, I did not mean to be rude. Throughout this sorry debate, you can see me wildly flailing for the right way to express my meaning. I suppose I was trying to catch the sense of something that is now (a "branch") coming from something that was then (a "root"), and that the branch isn't strictly equal to the root.


(Ibid.) Types are all about knowing at compile time that certain classes of things are not possible at runtime. The assertion that you've been making without any support is that this is of "limited value." "Limited relative to what?" is the question that I should have asked from the outset, because I can keep enumerating values until the cows come home.


Limited mostly in the sense that those classes of things that are made impossible do not include all possible problems in non-trivial programs (I know it's a dodgy term, I just want to make the admission that programs exist where all possible problems are eliminated by static type checking).


(Ibid.) Oh, nice Dowdism, Peter.


Actually no, just a bad case of underquoting combined with a confused attempt to get back to what I saw as the main point. Apologies again.


(Ibid.) your assertion is isomorphic to claiming at least one of the following:

  1. Standard ML, Haskell, or O'Caml's type systems make no more guarantees than Pascal's.
  2. The guarantees that SML, Haskell, or O'Caml's type systems make are unenforceable.
I'm curious as to which one you meant, or both.


Mostly (1). In my understanding, the guarantee that static type checking makes is that it will not allow type errors to pass. There are still "bugs" that aren't type errors, or that are undetectable as type errors by the type checker in question. Modern ST languages do have more sophisticated type checkers that catch errors much more effectively and conveniently than older ones such as Pascal's -- I think it's even fair to say that they are fulfilling the guarantees that Pascal made. Still, as I see it, essentially they make no more guarantees.


(Ibid.) How can it possibly be detrimental to say "I want to prove before runtime that certain bad things can't happen?"


That's not detrimental, that's prudent. Saying "If the compiler accepts it, no bad things can happen at runtime" is detrimental. I know you aren't making this kind of statement, but some people do.


I hope I haven't omitted any points I ought to have addressed here. I'll sit back and stay quiet now.

The LtU ethos again...

I meant well, but I did badly. I will continue to read LtU with interest, but I will do my best to avoid stepping into discussions and being a bumbling old fool in this manner in the future.

Discussions about typing are usually more aggressive than the norm around here. As you might have noticed, I tried to stay out of the discussion myself...

I hope you will continue to post. I think your contribution was interesting and useful.

As a general rule, we strive for meaningful and rigorous discussions. I think that as far as ST vs. DT discussions go, we are still much better than the competition. But since these discussions are never ending (as the thread demonstrated so well) people tend to be more aggressive than perhaps they should.

Whew.

I'm not going to say anything here other than to attempt to humbly apologize to Peter and Anton. I'm afraid that Anton's right—I'd lapsed from finding some very constructive common ground and room for discussion to mere pedantry and perhaps even error. At the very least I had begun to ascribe bad faith to my interlocutors, and in retrospect I find that unconscionable.

So I'm very sorry, very glad to see Peter post again, and beg Peter and Anton's forgiveness and forbearance.

Re: whew

Apology accepted, thanks. Really, we all know how these things can go, and for my part I apologize for anything I said that was inflammatory, and for not being as clear about what I was getting at, or where I was coming from, as I could have been.

Haskell and typing

I've found the discussion in this thread very interesting and it's made me a lot more curious about (static) typing in general. But I'm a typing newbie and I couldn't quite figure out what was possible now, and what was going to be available in the future. So here's a question I'm hoping someone could help me with. Is there a way in Haskell to create a type which represents the even integers? In the code below, is there something I can replace the ??? with to cause a compile-time type error (i.e. the 3 isn't even)?

data Even = ???
f :: Integer -> Even
f x = 3

re: Haskell and typing

In Haskell the LHS of a data definition defines a new type, while the RHS declares the possible constructors of that type. The constructors aren't subtypes of the declared type. So when you declare an Even data type you should provide constructors that create such values. Let's create a constructor that wraps an integer:


data Even = MkEven Integer
f :: Integer -> Even
f x = MkEven 3

That doesn't create a static error as we would like, because the constructor accepts any kind of Integer. We can't write down an invariant describing which values may be used in MkEven, because Haskell's type system isn't dependently typed (i.e. we can't use expressions over terms (e.g. x > 3) and treat them as types). So the correct way of doing it is to hide the constructor and provide a way to transform an Integer into an Even value and a way to get an Integer out of an Even:


-- MkEven shouldn't be exported out of this module
-- Also we could use newtype instead of data
data Even = MkEven Integer deriving (Show, Eq)

makeEven :: Integer -> Maybe Even
makeEven x | x `rem` 2 == 0 = Just (MkEven x)
           | otherwise      = Nothing
unEven :: Even -> Integer
unEven (MkEven x) = x

-- Useful primitives
times :: Even -> Integer -> Even
times (MkEven x) n = MkEven (x * n)

zero :: Even
zero = MkEven 0

-- in other module

f :: Integer -> Even
f x = case makeEven x of
          Just n -> n `times` x
          Nothing -> zero

The idea is to create a type that encodes the invariants and some primitives to use it, and then use only those primitives. This particular example doesn't use any advanced feature of the Haskell type system (e.g. polymorphic types), but that's the general idea.

Not that same thing, is it?

The idea is to create a type that encodes the invariants and some primitives to use it, and then use only those primitives. This particular example doesn't use any advanced feature of the Haskell type system (e.g. polymorphic types), but that's the general idea.

I agree that this is the idiomatic way to write it in Haskell, but let's be clear that it also doesn't provide the static check that he asked for. The short answer AFAIK is that there is no easy way to accomplish this task statically in Haskell. I'm confident that there is a complicated, awkward way which is probably also a lot of fun (Oleg?), but as someone (Paul?) pointed out, it ends up being a lot like template metaprogramming tricks in C++: it's cool that you can do it, but you shouldn't have to... ;)

Haskell typing

After some more searching, I found two good papers discussing the possibilities for advanced (dependent-like?) typing in Haskell.
Faking It and
Fun with Functional Dependencies. Here's the money quote from _Faking It_...

"...we are forced to choose whether data belongs at compile-time in types, or at run-time in terms. The barrier represented by :: has not been broken, nor is it likely to be in the near future."
So it would seem like I couldn't get the desired type error in the code from above.

Double-colon barrier

Greg Buchholz quoting Conor McBride from `Faking It'

"...we are forced to choose whether data belongs at compile-time in
types, or at run-time in terms. The barrier represented by :: has
not been broken, nor is it likely to be in the near future."

One should mention a matching quote from `Implicit Configurations',
joint work with Chung-chieh Shan:

McBride mentions that, with all the tricks, the programmer still
must decide if data belong in compile-time types or
run-time terms. ``The barrier represented by :: has not been
broken, nor is it likely to be in the near future.''
If our |reflect| and especially |reify| functions have not
broken the barrier, they at least dug a tunnel underneath.

Implicit Configurations

And here's the link to Lambda's discussion of that paper.

Even Number Type

After two years, here's something close to the answer I was originally looking for...
{-# OPTIONS -fglasgow-exts #-}
{-# OPTIONS -fallow-undecidable-instances #-}
{- 
    Here's how to create an "even number" type and get GHC to complain
    about mismatches.  The key is that we need to let the compiler know
    about the values we want to operate on.  This requires us to
    encode a value as a type.
-}

module Even where

type Three = Succ (Succ (Succ Zero))
three = type2val (undefined::Three)

-- f :: Integer -> Even
-- f x = three

{- for the function "f" above, GHCi reports a type error...

    Couldn't match `Odd' against `Even'
      Expected type: Odd
      Inferred type: Even
-}

-- Some further examples...

zero = type2val (undefined::Zero)
one  = type2val (undefined::Succ Zero)
two  = type2val (undefined::Succ (Succ Zero))

main = do putStr $ "1 + 2 = "
          putStrLn.show $ one +. two  
          putStr $ "2 * 3 = "
          putStrLn.show $ two *. three

-- and the implementation...
newtype Even = E Integer deriving (Eq,Show)
newtype Odd  = O Integer deriving (Eq,Show)

infixl 5 +.
infixl 6 *.

class Add a b c | a b -> c where (+.) :: a -> b -> c
                                 (-.) :: a -> b -> c
class Mul a b c | a b -> c where (*.) :: a -> b -> c

instance Add Even Even Even where (E a) +. (E b) = E (a+b)
                                  (E a) -. (E b) = E (a-b)
instance Mul Even Even Even where (E a) *. (E b) = E (a*b)

instance Add Even Odd  Odd  where (E a) +. (O b) = O (a+b)
                                  (E a) -. (O b) = O (a-b)
instance Mul Even Odd  Even where (E a) *. (O b) = E (a*b)

instance Add Odd  Even Odd  where (O a) +. (E b) = O (a+b)
                                  (O a) -. (E b) = O (a-b)
instance Mul Odd  Even Even where (O a) *. (E b) = E (a*b)

instance Add Odd  Odd  Even where (O a) +. (O b) = E (a+b)
                                  (O a) -. (O b) = E (a-b)
instance Mul Odd  Odd  Odd  where (O a) *. (O b) = O (a*b)

-- dynamically convert to an Even/Odd from an integer
class From a where from :: Integer -> a

instance From Even where 
    from x | even x = E x
           | otherwise = error $ (show x) ++ " isn't even"

instance From Odd where 
    from x | odd x = O x
           | otherwise = error $ (show x) ++ " isn't odd"

-- Type numerals...
data Zero
data Succ a

-- statically convert to an Even/Odd from a type
class Type2Val a b | a -> b where
    type2val :: a -> b
instance Type2Val Zero Even where
    type2val _ = E 0
instance (Type2Val a c, Add Odd c b) => Type2Val (Succ a) b where
    type2val _ = (O 1) +. type2val (undefined::a)

Re: Even number type

Greg Buchholz wrote

newtype Even = E Integer deriving (Eq,Show)
newtype Odd  = O Integer deriving (Eq,Show)

Yes, this is the idea of using abstract data types to enforce
invariants. It dates back to Robin Milner and his LCF theorem
prover. It's the idea that begat ML (after all, you need a type system
to statically enforce that, for example, a function accepting Even
won't accept anything else).

Incidentally, you may want to make Even an instance of Num, so you can
take advantage of `fromInteger' conversions (of course, (fromInteger
1)::Even should throw an exception). Also, for projection (aka dynamic
conversion), you might want to define
toEO :: Integer -> Either Even Odd
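
A sketch of what that suggestion might look like (my code, not from the post; it reuses the E and O constructors from the module above):

instance Num Even where
    (E a) + (E b) = E (a + b)
    (E a) * (E b) = E (a * b)
    negate (E a)  = E (negate a)
    abs (E a)     = E (abs a)
    -- signum of a non-zero even number is 1 or -1, neither of which is even,
    -- so this goes through fromInteger and throws, as suggested above
    signum (E a)  = fromInteger (signum a)
    fromInteger n
        | even n    = E n
        | otherwise = error $ show n ++ " isn't even"

-- projection (aka dynamic conversion) into Either
toEO :: Integer -> Either Even Odd
toEO n | even n    = Left (E n)
       | otherwise = Right (O n)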

Integer \cong Even

There is no need to hide the constructor if you use a different model.

It is obvious how to represent the even integers as integers if you've ever seen two's complement, which represents an integer as a natural; just take Even = Integer, using the (inverse of the) iso which assigns integer n to the even integer 2*n.

... -3 -2 -1  0  1  2  3 ...
... -6 -4 -2  0  2  4  6 ...

Clearly this is a one-to-one mapping. The idea is that, instead of MkEven n denoting n, it denotes n * 2.

So then it is perfectly sensible to export the data constructor MkEven, since it just denotes the function (* 2) :: Integer -> Even. By hiding it you are only forcing the user to redefine it if they need it. If you think about this you will see why hiding constructors is generally a poor way to do data abstraction.

Of course, you still need makeEven and unEven, but you could call them half and twice or something.
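
A minimal sketch of this model (mine, using the half/twice names suggested above):

-- MkEven n now denotes the even integer 2*n, so exporting the constructor
-- is harmless: it is just the embedding (* 2) described above
newtype Even = MkEven Integer deriving (Show, Eq)

twice :: Integer -> Even
twice = MkEven               -- the even integer 2*n, for a given n

half :: Even -> Integer
half (MkEven n) = n          -- recover n from the even integer 2*n

-- e.g. twice 3 denotes 6, and half (twice 3) is 3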

Any enumerable then?

Ah, so if we want to represent, say, prime numbers, we just use (MkPrime n), meaning the nth prime number? By the same token, we can represent any enumerable set... Hmm...

Pretty powerful, but wouldn't this affect the complexity of algorithms without the programmer even noticing? "Casting" a natural to a prime and back will have non-constant complexity...

I agree. I thought about doin

I agree. I thought about doing this but I opted to show a solution that could be used in problems other than Even/Odd.

Could static algebraic expressions be part of types?

A fictional programming language that based its typing system on static algebraic expressions would allow the following declaration concerning even numbers:

type even n : integer where n mod 2 == 0

The above would be read by the compiler as 'any integer that, when divided by 2, leaves a remainder of 0'.

The above could be used for algorithms that expect even numbers. A compiler could easily infer when such a constraint (i.e. even number) is violated and report an error. For example, let's see the following program:

#1 let foo(n : even) = ...
#2 let n = input_integer
#3 foo(n)

In line #1, a function 'foo' is defined which accepts only even numbers. In line #2, the user enters a number. The type of the function 'input_integer' is 'integer'. In line #3, the function foo is called with the input number from the previous step as its parameter. A compiler could easily infer that line #3 contains an error, because an arbitrary integer can leave a remainder of either 0 or 1 when divided by 2. Then it could report to the programmer:

line 3 error: integer n does not honour even constraint 'n mod 2 == 0'
So the programmer can then change line #3 to:
if n mod 2 == 0 then foo(n) else ...

The compiler then checks the condition that leads to the invocation of 'foo' and finds that it matches the constraints imposed by the 'even' type. In case the if condition is not that trivial, the expression in the condition should carry the same static algebraic constraint.

I made this neat thing, I'll call it a wheel

Congratulations, you just reinvented dependent types :-) You might want to take a look at Epigram.

The Next Thing You Know...

... someone will decide that it'd be useful sometimes to accept a type based solely upon whether it supports the requested operation or not, apart from whether it has a declared relationship to the entity defining the operation. Let's call that "Duck Typing."

Not a bad idea

But I'd call it a type class.
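
A tiny sketch (mine, not from the post) of the type-class reading of "accept a type based solely on whether it supports the requested operation":

class Quacks a where
    quack :: a -> String

data Duck  = Duck
data Robot = Robot

instance Quacks Duck  where quack _ = "quack"
instance Quacks Robot where quack _ = "beep, imitating a duck"

-- greet accepts any type with a Quacks instance; Duck and Robot need no
-- declared relationship to one another
greet :: Quacks a => a -> String
greet x = "hello, " ++ quack x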

More Than What is Required

It's actually just plain ol' structural typing, with or without type classes. But it's another feature that's been around since the 1970s that certain folks in the scripting/latent typing community seem to think sprang fully-formed from Guido or Matz's brow. :-)

perhaps you mean structural S

perhaps you mean structural SUBtyping?

Nonsense!

Guido and/or Matz didn't invent structural subtyping.

Alan Kay did; Python and Ruby just inherited it from Smalltalk. :)

And like the poster above points out, you mean structural subtyping.

I Was Speaking With Tongue Firmly in Cheek...

...as I'd hoped to convey with the smiley.

With that said, however, I'm afraid that you and the previous poster are both incorrect: I meant "structural typing," vs. "nominal typing," and in the context of static typing (vs. the latent typing of any member of the Smalltalk family), "structural" vs. "nominal" typing has a well-defined meaning that dates at least to ML's definition in the mid-1970s.

However, even if it were true that "structural typing" derived first from the Smalltalk tradition, it wouldn't affect the point, which is that to come up with new terminology to describe it in the Python or Ruby context exhibits an unfortunate level of historical ignorance. The point of making that observation isn't to assert my superiority or ML's, but rather just to reinforce that language design doesn't happen in a vacuum, and that even if, as a designer, you make many different choices than some historical precedent did, e.g. Python and Ruby being latently-typed and ML statically typed, it's still important to realize when you are, in fact, reproducing an extant language feature, the pros and cons of that feature as it is implemented in a historical context, and how those pros or cons might change given your other design decisions.

Finally, of course, people have to learn your language, and creating new terminology for what is an extant feature can only serve to hinder that process. No one is helped by artificial divergence in language features or documentation.

FWIW

I retracted (or weakened, as you care to see it) my claim "untyped languages are inferior to typed languages in every way." See here.

Modulo anything, untyped languages are not unityped.

Responding to the above-linked post here because of thread size/depth:
The unityped (note the "i") languages are a proper subset of the typed languages, but making an untyped language into a unityped language involves several global choices: a choice for the type which plays the role of Univ, and a choice for how to translate values into type Univ; modulo those choices, an untyped language is unityped.

I disagree that "modulo those choices, an untyped language is unityped". At the end of the same post, is the statement "an untyped language is not unityped". That's much better. ;)

What is done when an untyped language is transformed into a unityped language is trivial, and it has nothing to do with the actual latent types present in programs in the language.

Untyped languages are latently typed. As I've been pointing out, a program in an untyped language does in fact contain implicit, or latent, syntactic type information -- since one can't reliably perform operations on values without knowing something about the types of the values being operated on.

Since it's usually difficult to statically identify these latent types via mechanical means, one can make a program safe for a static type-processing stage by wrapping all values in a Univ type. Other than its utility in the internals of an implementation, this doesn't achieve very much. In particular, it has no effect on the latent types that remain present in the program, other than obscuring them for the human reader of the program. What has been achieved is that a unityped language is used as a transport to carry a latently-typed program through a static stage to a dynamic stage. To extrapolate claims about this intermediate unityped language, and apply them to the entire language, is misleading at best.

The truth of the matter is that untyped programs are in fact richly typed, in the syntactic sense. Other adjectives than "rich" might include complex, sophisticated, but also often messy, or chaotic. A consequence of these rich but potentially messy latent type schemes is that they're difficult to deal with statically.

Indeed, when humans work with dynamically-checked languages, they do more than just mental static type inference in order to get the types right - they also perform dynamic simulations, whether mentally, or by running tests on a computer. This allows humans to develop complex but mostly correct latent type schemes, that do not lend themselves to the relatively simplistic static analyses that a compiler can perform.

Many translations

I disagree that "modulo those choices, an untyped language is unityped".

Strange that you disagree with this.

What is done when an untyped language is transformed into a unityped language is trivial, and it has nothing to do with the actual latent types present in programs in the language.

The point of my post was that there are many ways to do the embedding. I only showed one, and I only showed it by example. You appear to assume that it is the only way to do the embedding.

Instead of translating to Univ, for example, you could translate to a type identical to it modulo renaming, Univ'. Or you could translate to Nat (or Integer) by a Gödel encoding. Or to String; this last is particularly easy to see if you code up pretty-printing and parsing functions:

pretty :: Univ -> String
parse :: String -> Univ

Furthermore, even if you fix the target type, there are many ways to do the translation. For example, if the language does not require "type disjointness", instead of encoding an untyped integer using the Int tag, you could encode it as a Church numeral using the Fun tag. Or, you could interpose a CPS translation. Or you could do various (semantics-preserving, of course!) optimizations on the untyped program before spitting out the corresponding Univ value.

Basically the translation can be viewed as a compilation method, where you are compiling an untyped language to a typed target language. So you can do arbitrarily complex things, if you like.
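
For concreteness, here is a sketch (mine, assuming a typical two-tag definition of Univ rather than whatever was given earlier in the thread) of two of the many possible encodings of the untyped integer 3:

data Univ = Int Integer
          | Fun (Univ -> Univ)

-- encoding 1: the Int tag, directly
three1 :: Univ
three1 = Int 3

-- encoding 2: the Fun tag, via the Church numeral \f x -> f (f (f x))
three2 :: Univ
three2 = Fun (\f -> Fun (\x -> app f (app f (app f x))))
  where app (Fun g) v = g v
        app _       _ = error "applied a non-function"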

Untyped languages are latently typed. As I've been pointing out, a program in an untyped language does in fact contain implicit, or latent, syntactic type information -- since one can't reliably perform operations on values without knowing something about the types of the values being operated on.

Yes, I noticed your posts had started adopting this line of reasoning, and let it pass without comment. But I don't believe it, for the reasons I just gave above: there are many ways to assign types to untyped terms, according to many distinct type systems and translations. I don't think you can say an untyped term has a type, latent or otherwise, unless you fix this "free-floating" type system, which in particular demands fixing the target type and the translation itself.

The truth of the matter is that untyped programs are in fact richly typed, in the syntactic sense.

Given my points above, I think you would have to be very generous (perhaps "biased"? :) to adopt this view. It is thoroughly non-constructive (in the technical sense), and moreover in multiple ways.

For example, from the proof-theoretic point of view, it amounts to insisting that every well-formed term (in normal form, i.e., cut-free) is a proof of some theorem, but not only do we not know the theorem, we don't even know what logic it belongs to! Maybe that's not nonsensical, but to me it has the ring of a new postulate for mathematical foundations, and I am not much interested in such drastic measures.

The unbearable usefulness of multiple unknown logics

I disagree that "modulo those choices, an untyped language is unityped".
Strange that you disagree with this.
I think my understanding of the term "modulo" may be different from yours. But no matter.
The point of my post was that there are many ways to do the embedding. I only showed one, and I only showed it by example. You appear to assume that it is the only way to do the embedding.
The possibility of multiple non-Univ embeddings was the point of my post, since you didn't seem to be saying anything about those possibilities.
Untyped languages are latently typed. As I've been pointing out, a program in an untyped language does in fact contain implicit, or latent, syntactic type information -- since one can't reliably perform operations on values without knowing something about the types of the values being operated on.
Yes, I noticed your posts had started adopting this line of reasoning, and let it pass without comment. But I don't believe it, for the reasons I just gave above: there are many ways to assign types to untyped terms, according to many distinct type systems and translations. I don't think you can say an untyped term has a type, latent or otherwise, unless you fix this "free-floating" type system, which in particular demands fixing the target type and the translation itself.
That's a valid concern. I'm really trying to characterize the relationship of untyped languages to their types, certainly not formalize it. How would you characterize the statically-available type-like knowledge that a programmer has about values being operated on? This knowledge can range from knowing that a value has a single type, like int or string, or a single class or interface; or a set of types/classes/interfaces; or in dynamic OO-style, you may know that the object being dealt with supports, say, a particular message or set of messages (an ad-hoc interface); plus, presumably, a number of other possibilities. I chose to refer to this as being knowledge about latent types. Certainly, the Univ type doesn't capture such knowledge at all.
The truth of the matter is that untyped programs are in fact richly typed, in the syntactic sense.

Given my points above, I think you would have to be very generous (perhaps "biased"? :) to adopt this view.

It is not my intent to be either generous or biased, but rather to try to examine some aspects of dynamic languages which, it seems to me, are usually overlooked when describing those languages from a type theory perspective.
It is thoroughly non-constructive (in the technical sense), and moreover in multiple ways.

For example, from the proof-theoretic point of view, it amounts to insisting that every well-formed term (in normal form, i.e., cut-free) is a proof of some theorem, but not only do we not know the theorem, we don't even know what logic it belongs to!

Yet amazingly, people are able to exploit these multiple unknown logics to produce useful, working systems. Such is the power of dynamic languages! ;oP
Maybe that's not nonsensical, but to me it has the ring of a new postulate for mathematical foundations, and I am not much interested in such drastic measures.

I appreciate your generosity in acknowledging the possibility of non-nonsensicality. :) Your position is quite understandable, since an adequate formalization of the issues in question has all the indications of being a quagmire with questionable returns. It is far simpler to restrict oneself to languages in which there's a single, well-defined type system, and useful results seem far more certain.

Nevertheless, I see something of interest in this static knowledge that millions of ordinary programmers use virtually unconsciously, and I think it could stand some analysis.

How to know what you know

How would you characterize the statically-available type-like knowledge that a programmer has about values being operated on?

You mean, how would I formalize it? I would formalize it as properties expressed in a first-order logic over the domain of programs.

Now you will say, ''But that's exactly a type system!" But it's not, not according to Curry-Howard at least—a type system is a proof theory, not merely a logic, and the programs are the proofs, not the semantic domain.

Furthermore, as I've said before, I don't subscribe to the view that types are merely program properties, or even provable program properties, since it assumes you can talk about typed programs without talking about types.

Yet amazingly, people are able to exploit these multiple unknown logics to produce useful, working systems. Such is the power of dynamic languages! ;oP

When people "produce useful, working systems" with untyped languages, they are not exploiting "multiple unknown logics" any more than I am exploiting Pauli's Exclusion Principle when I rest my chin on my knuckles. Otherwise, I would never have had to take that quantum physics class in college.

There is a big difference between something being true, knowing that it is true, and knowing that you know it is true. We might say something is true if it is a theorem; we know that it's true if we know it is a theorem and there exists a proof; and we know that we know it's true if we can exhibit a proof that we understand.

If the first were the same as the last, then there would be no need for science or mathematics or objectively verifiable methods of argument. Furthermore, we would not be having this discussion in the first place, since truth would be self-evident, and so we could not possibly disagree.

Nevertheless, I see something of interest in this static knowledge that millions of ordinary programmers use virtually unconsciously, and I think it could stand some analysis.

Odd how they all know it "unconsciously" yet have such difficulty grasping it when told it explicitly. :)

Unconscious type systems

Odd how they all know it "unconsciously" yet have such difficulty grasping it when told it explicitly.

This made me think about what type system I use unconsciously (in javascript). There's certainly a lot of flow analysis going on. When calling a method on an object, I'm thinking "Do all the possible objects that this variable might get, support this method?"

I'm certainly not thinking "what type does this variable have".

I certainly also do a lot of dynamic checking. Not because I'm not certain if everything is correct, but because it's web programming, and a lot of data properties are optional.

The various type systems I've seen on LtU don't seem to match this programming style even remotely.

Interesting

sjoerd visscher: This made me think what type system I use unconsciously (in javascript). There's certainly a lot of flow analisys going on. When calling a method on an object, I'm thinking "Do all the possible objects that this variable might get, support this method?"

I'm certainly not thinking "what type does this variable have".

In JavaScript, it doesn't make sense to talk about "the" type of a value anyway, although it makes sense to talk about the type of a variable, so one might wonder why you don't think "what type does this variable have?" because it seems as though the variable type could answer the question "does this object support this method?" for you. :-)

I haven't done enough JavaScript programming to be sure, but this makes it sound like subtyping in JavaScript is structural rather than nominal. Is that true? In any case, if so, it's in good company: O'Caml is also structurally, rather than nominally, subtyped: calling a method on an object in O'Caml typechecks iff the object implements the method signature, regardless of who extends what.

sjoerd: I certainly also do a lot of dynamic checking. Not because I'm not certain if everything is correct, but because it's web programming, and a lot of data properties are optional.

The various type systems I've seen on LtU don't seem to match this programming style even remotely.

That's interesting. Haskell has the "Maybe" monad; O'Caml has the "option" type; Nice has ? to signify optional values; Scala has "Some" and "None" for option types. If there's one thing that modern typed languages seem to do better than currently-popular typed languages and untyped languages, it's express the notion of having/not-having a value!
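
For example, here is a small sketch (mine, with made-up field names) of the "optional data property" case using Haskell's Maybe:

data Customer = Customer { name :: String, email :: Maybe String }

-- the type forces us to handle the missing-email case explicitly
greeting :: Customer -> String
greeting c = case email c of
    Just addr -> name c ++ " <" ++ addr ++ ">"
    Nothing   -> name c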

Re: Interesting

calling a method on an object in O'Caml typechecks iff the object implements the method signature, regardless of who extends what

Yes, Javascript works in that way (except that there's no static type checking). It's what Python calls duck typing. Btw, what does O'Caml infer when the argument is also an object which has to support a method and so on (possibly ad infinitum)? (E.g. a tail method)

If there's one thing that modern typed languages seem to do better than currently-popular typed languages and untyped languages, it's express the notion of having/not-having a value!

Absolutely. My language Loell does this too, by default, everywhere. What's interesting is that all predicate functions of type a -> Boolean become filters of type a -> a?. I'm now working on Moiell which lets variables also have multiple values, like the list monad in Haskell, or like XSLT.

Known unknowns

How would you characterize the statically-available type-like knowledge that a programmer has about values being operated on?
You mean, how would I formalize it?
Even before formalizing it, it would be useful to be able to discuss it, which requires some sort of informal agreement on the nature and scope of the knowledge in question, and some appropriate terminology.

I would formalize [statically-available knowledge about DT values] as properties expressed in a first-order logic over the domain of programs.

Now you will say, ''But that's exactly a type system!" But it's not, not according to Curry-Howard at least—a type system is a proof theory, not merely a logic, and the programs are the proofs, not the semantic domain.

Furthermore, as I've said before, I don't subscribe to the view that types are merely program properties, or even provable program properties, since it assumes you can talk about typed programs without talking about types.

First, even if you don't call it a type system, this all still seems to suggest that there's some formalization of such logic(s) which could be useful, but which hasn't actually been done (or has it?)

Second, I don't really want to talk about typed programs without talking about types: I want to see better attempts to identify the types (or the types of types) that lurk in DT programs. Put another way, is "Univ" the best that can be done? Of course, it isn't.

Note that in some cases, the program properties we're talking about are demonstrably types. Extreme examples include the cases where types can be mechanically inferred, at which point DT programs are presumably indistinguishable from corresponding ST programs. But there are also many cases in which the programmer could clearly identify unambiguous and obvious types in a DT program, even if they can't be mechanically inferred.

Presumably there's an opposite extreme on this spectrum of "type-like knowledge", where the knowledge in question is so chaotic as to not be usefully organizable into something resembling types. However, I doubt that most programs fall into this category.

When people "produce useful, working systems" with untyped languages, they are not exploiting "multiple unknown logics" any more than I am exploiting Pauli's Exclusion Principle when I rest my chin on my knuckles. Otherwise, I would never have had to take that quantum physics class in college.

Darn, there goes my excuse for not taking that quantum physics class in college!

But actually, the example doesn't apply here. Programmers produce these systems by performing various kinds of modeling and analyses, in their heads if nowhere else. The "unconscious" aspect I referred to is that they aren't usually working within a formalized framework, but they're nevertheless going through a thought process in order to determine whether particular operations can validly be performed on the values in play at any given point in a program. Precisely because they don't have the constraint of having to formalize this knowledge, they may in the process create "latent type-like schemes" which might be a challenge to formalize coherently. But given that the end result is a working program, I don't think the static framework within which those programs operate should be dismissed quite as cavalierly as many type theorists seem to want to do.

There is a big difference between something being true, knowing that it is true, and knowing that you know it is true.
Hmm, perhaps the U.S. Secretary of Defense should consider working this into his famous poem, "The Unknown". But Rumsfeld also waxed poetically about known unknowns, and I think that's really what we're talking about here: things that we know we don't know (enough about).
We might say something is true if it is a theorem; we know that it's true if we know it is a theorem and there exists a proof; and we know that we know it's true if we can exhibit a proof that we understand.

Right: we're talking about useful, working things, and their relationship to our formal theories about those things.

Imagine if an exercise physiologist were to analyze runners and running based on models constructed of inflexible rods, joined by metal hinges. That physiologist might discover and be able to prove all sorts of interesting things about the nature of running (if running is defined as what the models do), but someone planning to compete in the Olympics would be well-advised to take the physiologist's advice with a grain of salt.

Type theory is not in this position, because in fact it has "runners" (languages) that embody its models perfectly. However, when type theory is casually applied to DT languages - which don't embody its models as neatly - then it starts seeming a lot more like my naive physiologist.

Nevertheless, I see something of interest in this static knowledge that millions of ordinary programmers use virtually unconsciously, and I think it could stand some analysis.
Odd how they all know it "unconsciously" yet have such difficulty grasping it when told it explicitly. :)
Ah, but as I have just pointed out, they have not yet been told "it" explicitly. What they have been told instead is that if they're willing to restrict themselves in certain ways and apply more discipline of a certain kind when designing programs, that there are benefits to be had. It's not so much a question of not grasping it, it's more a question of not buying into the tradeoffs, and having good enough instincts to disbelieve any exaggerated claims of subsumption.

Disknowns

Frank Atanassow: We might say something is true if it is a theorem; we know that it's true if we know it is a theorem and there exists a proof; and we know that we know it's true if we can exhibit a proof that we understand.

Anton van Straaten: Right: we're talking about useful, working things, and their relationship to our formal theories about those things.

I think this is where I get stuck (or "go wrong" :-) ): how do we know that the true thing is a theorem without there existing a proof? Anton talks about "useful, working things," but the evidence of 20 years in the software business tells me that "working" is far too generous a term to describe what the overwhelming majority of software does (cf. my conversation with sean, though, about whether that says more about software development or about economic models of it).

Anton: Ah, but as I have just pointed out, they have not yet been told "it" explicitly. What they have been told instead is that if they're willing to restrict themselves in certain ways and apply more discipline of a certain kind when designing programs, that there are benefits to be had. It's not so much a question of not grasping it, it's more a question of not buying into the tradeoffs, and having good enough instincts to disbelieve any exaggerated claims of subsumption.

Eh, this also contradicts my experience, both in the sense of banging my own head against type theory and in the sense of trying to get my fellow Lispers to even try something like O'Caml, which very much strikes me as "Lisp with Algol-inspired syntax." And while we've concluded that untyped != unityped, the subsumption argument still stands, if for no other reason than that we know that both typed and untyped languages boil down to the Lambda Calculus (or the SK Combinator Calculus, or the Universal Turing Machine; pick your poison). It is precisely this disavowal of subsumption that I characterize as mysticism. Perhaps you think I mean that in a pejorative sense. Let me assure you that I do not; as a practicing Christian I participate in what some would charitably call "mysticism" and a vastly greater number would call "crackpot nonsense" daily. What I don't engage in is what would in theological terms be called animism: the notion that, because God is a "higher power," He actively infuses literally everything with some kind of life force. Similarly, while I can agree that type theory doesn't yet tell us everything we'd like to be able to (formally) say about programming and programming languages, I don't take that to mean that untyped languages and the programs written in them therefore are a "higher power" than typed ones; that's contradicted by the evidence in the form of the aforementioned Lambda/Combinatoric/Turing results.

What I'm driving at is that typed languages are good for saying that certain things are always true, and untyped languages are good for saying that a larger number of things are true often enough to be useful. I think our industry could benefit from striving to move the dial a bit closer to "always" than it currently is, but at heart that's an economic claim (cf. sean again), and in the end the market gets to decide whether I'm right or wrong.

Type systems and God: Man's search for certainty

Paul Snively wrote:
I think this is where I get stuck (or "go wrong" :-) ): how do we know that the true thing is a theorem without there existing a proof?
I'm not claiming any theorem, I'm talking about an area that looks as though it needs theorems, but doesn't seem to have any, other than essentially trivial descriptions such as Univ.
Anton talks about "useful, working things," but the evidence of 20 years in the software business tells me that "working" is far too generous a term to describe what the overwhelming majority of software does

Again, that's some other battle. You and I, and many other people, are capable of distinguishing between software that works, and software that doesn't. I'm talking about software that works, and there's plenty of it around. I've already addressed the point about how you know whether software works, and the answer is essentially the same for both DT and ST languages: you test it, and you use it, and you fix it when it doesn't work.

Certainly, the set of working software suffers from selection bias by definition, but I'm not making any claims about which software methodology has the greatest productivity, etc.: I'm simply saying that there are DT programs which work, and perform useful tasks, which indicates that DT languages have some utility, and are therefore interesting on some level; given that, I'm saying that type theory, despite its close connection to these issues, hasn't provided good theories of what DT languages do with their latent, static type-like information, which I'll call "latent types" (despite Frank's perfectly valid formal objection to the use of the term "type" outside the context of a defined type system).

One type-theoretic description of DT languages that has been provided is the Univ one, but it doesn't capture the relationship of DT programs to their latent types. The fact that this description has been so easily accepted by many people (it's actually taught at some universities, as is the ST-subsumes-DT idea) indicates to me that there's a widespread lack of recognition of the nature of latent types (or alternatively, a determined desire not to deal with them).

One big reason for that lack of recognition is precisely a lack of theories, the "known unknown" I referred to (although to some people, it's apparently an "unknown unknown"). Even if the lack of treatment has perfectly reasonable justifications - e.g. it's uninteresting from the context of pure type theory, or it's being dealt with bit by bit in other ways (dependent types, first class labels, whatever), it's still unsatisfactory not even to be able to discuss the subject with commonly accepted terminology.

the subsumption argument still stands, if for no other reason than that we know that both typed and untyped languages boil down to the Lambda Calculus (or the SK Combinator Calculus, or the Universal Turing Machine; pick your poison). It is precisely this disavowal of subsumption that I characterize as mysticism.

The Turing equivalence argument works in both directions, so there's no meaningful sense in which the "argument still stands". Few arguments emerge from the Turing tarpit. I'm not making any mystical claims, in any sense of the word.

the notion that, because God is a "higher power," He actively infuses literally everything with some kind of life force

Now I know you're reading in things that I'm most definitely not saying. :)

I don't think it's at all controversial, or even remotely mystical, to say that type theory hasn't provided any good theories for DT languages.

I'm not disputing that any DT program can be translated into an ST form, whether via a full translation or a more superficial embedding - and of course, the reverse is true too. (Note, btw, that an embedding of a DT program via Univ leaves its latent type information intact, and un-captured by the host language's static type system.)

I'm not even trying to make value judgements about which approach is "better". However, in the excitement of the perfect match between type theories and the real languages which implement those theories, it sometimes seems to be forgotten that the theories as currently formulated do not in fact capture everything that's going on in DT languages -- and as we've seen, it's even been explicitly, but incorrectly, claimed otherwise.

I'm interested in the technical issues of type systems in DT languages, and as I see it this has nothing to do with how many broken DT programs there have been throughout history, or how many Lisp programmers refuse to try OCaml, etc. Of course, I invited the latter remark by responding to Frank's jibe. However, there's a very real sense in which Frank's comment was unfair, which is that the story from the type theory side, as it applies to DT languages, is incomplete, but isn't always acknowledged as such; that was the point I was making in that case. Certainly, with misleading arguments about things like subsumption floating around, little progress is going to be made on finding common ground.

A big reason I'm interested in type systems in DT languages is that I'd like to see more of an ability to migrate DT programs to ST, in the same language (or in a series of closely related languages). Given a capability like that (and no, I'm not talking about what some CL implementations do), much of this argument would evaporate. But barring an unexpected breakthrough, we're not going to achieve that while DT is dismissed as uninteresting, or trivially embeddable, or easily convertible to some ST alternative.


[postscript:]
What I'm driving at is that typed languages are good for saying that certain things are always true, and untyped languages are good for saying that a larger number of things are true often enough to be useful. I think our industry could benefit from striving to move the dial a bit closer to "always" than it currently is, but at heart that's an economic claim (cf. sean again), and in the end the market gets to decide whether I'm right or wrong.

I agree with all of that, particularly the point that one of your concerns is an essentially economic claim, which isn't what I'm really interested in since I take sean's perspective as a given. (I had previously alluded to some similar economic arguments, but sean put it much better than I did.)

But as I mentioned above, I'm interested in having the dial you mention be more of a dial, rather than a discontinuity.

Re: Type systems and atheism: man's search for logic ;-)

I'm not disputing that any DT program can be translated into an ST form, whether via a full translation or a more superficial embedding - and of course, the reverse is true too.

The reverse is possible but, AFAICT, it requires much more programmer effort. The DT->ST translation requires only annotations for literals if the host language doesn't support some form of implicit embedding (e.g. subtyping, Haskell's treatment of numeric literals). OTOH the ST->DT translation requires a compile-time language (e.g. macros), and the programmer has to recreate the ST system in it. With features like parametric polymorphism, type inference, type-classes, etc. the effort is going to be orders of magnitude higher than the DT->ST translation.

A big reason I'm interested in type systems in DT languages is that I'd like to see more of an ability to migrate DT programs to ST, in the same language (or in a series of closely related languages). Given a capability like that (and no, I'm not talking about what some CL implementations do), much of this argument would evaporate. But barring an unexpected breakthrough, we're not going to achieve that while DT is dismissed as uninteresting, or trivially embeddable, or easily convertible to some ST alternative.

DT languages' type systems are uninteresting, trivially embeddable and easily convertible to some ST alternative, using the standard definitions of type system and DT. OTOH if we want to change the definition of type to include "latent types" or tags then the problem is different, because now we want to transform something that has no static behavior into something that has static behavior. The main problem with this is that this can lead very quickly to undecidability (e.g. defining integer division to return a ratio or an integer depending on the parameter's values) and there's no clear line to what should be static behavior and what should be dynamic behavior.


IME, most correct programs (i.e. those that can be executed without raising doesNotUnderstand or similar errors) written in DT languages can have their types trivially inferred in a type system with parametric types and coalgebraic data-types (modulo subtyping issues). The only tricky parts to infer are the ones that depend on dynamic information (e.g. if null xs then [] else tail xs, which relies on the dynamic check to keep the partial function tail away from the empty list) or clever tricks/hacks (e.g. using method names as roman numerals, Smalltalk become:, treating a dictionary as an object in Groovy, (call/cc call/cc)). Some of these cases could be typed (but you would need annotations or extra code) or a workaround could be used; the rest should be discarded. Now the question is: are the discarded cases really necessary (i.e. are they much more convenient than the other solutions)?


Again IME there are three kinds of answer I give to a DT enthusiast when they show me code that is much better written in DT than in ST:

1. The Haskell equivalent (i.e. with just syntax changes) doesn't require a single type annotation.

2. This piece of code could fail for such and such reasons at runtime. If you add the runtime checks to make it work correctly, you'll notice that it's again equivalent to a similar piece of Haskell code.

3. This property can't be enforced as a type without a dependently typed system (without changing the code).


I never saw a case that these answers didn't cover, except for manufactured counter-examples that never would be used in The Real World ®.

Re: Type systems and atheism: man's search for logic ;-)

I like your subject line much better. ;)
DT languages' type systems are uninteresting, trivially embeddable and easily convertible to some ST alternative, using the standard definitions of type system and DT.
Note that I wasn't saying that DT programs aren't easily convertible to some ST alternative - rather that one's preference might be to not do that. Prototyping is one classic case where this may be desirable; deferring decisions to runtime is another. As for embedding, it doesn't really achieve anything beyond mixing DT & ST - you're just writing DT code in an ST language, you're not really applying ST to the DT code, so this is an uninteresting case all around.
OTOH if we want to change the definition of type to include "latent types" or tags then the problem is different, because now we want to transform something that has no static behavior into something that has static behavior.

Note that what I mean by "latent types", as I've defined the term in this thread, does actually involve static information, and is not the same thing as runtime tags. Because of this, I'm also not necessarily talking about "changing the definition of type", but rather talking about type systems which can handle this information. Maybe there are existing type systems that would; I don't think any of the usual functional language suspects really come close, though.

I'll see if I can come up with an example, although it may take a while: I have a number of real programs implemented to some degree or another in both Scheme and ML, and while ML's type system isn't everything it would need to be, I think some of the issues that come up when converting would apply across type systems.

Again IME there are three kinds of answer I give to a DT enthusiast when they show me a code that is much better written in DT than in ST:
For me, if I write a piece of code that works well enough for the problem at hand, the only kind of answer that works for me, in general, is your (1): "The Haskell equivalent (i.e. with just syntax changes) doesn't require a single type annotation." (2) may be important in some cases, but not others; and I'm not clear on the implications you mean for (3) - are you suggesting that the features in question could be implemented easily if not enforced as a type?

Implications for (3)

and I'm not clear on the implications you mean for (3) - are you suggesting that the features in question could be implemented easily if not enforced as a type?

(3) is the case where the types depend on the terms:

foo bar n = if even n then n:bar else bar + 3

That is, bar is either a list of (even) numbers or an (odd) number (but not both), depending on the value of n. This was mentioned by some people in an OO vs. FP flamewar/thread on comp.lang.functional last year, where they said a variable foo can point to an object that responds to either bar or baz messages depending on some conditional, but no object can respond to both messages.
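
One workaround sketch (mine, not from the thread) is to change the code so the alternative is reflected in an ordinary sum type, at the price of run-time checks where the original relied on knowing n's parity:

data ListOrNum = L [Integer] | N Integer

foo :: ListOrNum -> Integer -> ListOrNum
foo (L xs) n | even n = L (n : xs)
foo (N m)  n | odd  n = N (m + 3)
foo _      _          = error "value does not match the parity of n"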

For example, from the proof

For example, from the proof-theoretic point of view, it amounts to insisting that every well-formed term (in normal form, i.e., cut-free) is a proof of some theorem, but not only do we not know the theorem, we don't even know what logic it belongs to! Maybe that's not nonsensical, but to me it has the ring of a new postulate for mathematical foundations, and I am not much interested in such drastic measures.

Sounds like Borges' "Library of Babel"... Great story...

Really a problem?

I retracted (or weakened, as you care to see it) my claim "untyped languages are inferior to typed languages in every way."

Since this would be the biggest event in the history of LtU ;-), I've been thinking about this problem and I'm not sure if I think it is really there.

If I have understood correctly, the problem is that there is not a unique, obvious isomorphism between untyped languages and unityped ones.

But if that is all, who cares? Just as one's choice of representation of cartesian products doesn't change its properties, why would this invalidate your previous assertion? (Setting aside that one has to equate "is a proper subset of" with "is inferior to in every way" ;-))

Or can you think of a case where an untyped language cannot be mapped to a unityped one in a consistent way (or in any number of arbitrary ways)?

Not universal

But if that is all, who cares? Just as one's choice of representation of cartesian products doesn't change its properties, why would this invalidate your previous assertion?

Good question. Suppose two typed programmers want to write part of their programs in an untyped way, using an embedding. The fact that there are many embeddings means that, in order for their programs to communicate, they have to agree to use the same embedding.

Similarly, if two mathematicians write two proofs using cartesian products, and they want to merge their proofs, they have to either 1) adopt an abstract definition of cartesian products (like the categorical one), or 2) agree to use the same implementation, like for example the von Neumann definition of ordered pairs. The embedding is like 2) rather than 1) because it doesn't characterize an untyped language abstractly enough; it only gives an implementation.

A way to address this is to impose a convention (protocol, standard, ...), say by adding Univ and the translation itself (say, by adding a quotation syntax like I suggested) to the definition of the untyped language, and hoping that everyone will use that instead of a custom-made embedding.

However, my original claim was that any ST language embeds an untyped language, and not every ST language has such a convention, and no ST language I know of can disallow programmers from cooking up their own embedding. In a language like Arrow that can treat isomorphic types as identical, you can effectively treat any home-made definition of cartesian products as if it were the standard one, since any two definitions will be canonically isomorphic, but, for technical reasons, I don't think this will ever be possible for the Univ embedding. (Categorically, I would say cartesian products can be characterized by a universal property, whereas I don't know of such a characterization for untyped languages, and even if it exists, I don't think the associated canonical equivalence would be decidable.)

In my post to Anton above, I compared the embedding to a compilation method, or a compiler for an untyped language. An ST language with standard type Univ and a quotation syntax amounts to an "ST language + a compiler for an untyped language". This is less general than an "ST language + untyped language", and less general still than simply "an ST language".

"Mostly... Simplistic"

Anton van Straaten: Indeed, when humans work with dynamically-checked languages, they do more than just mental static type inference in order to get the types right - they also perform dynamic simulations, whether mentally, or by running tests on a computer. This allows humans to develop complex but mostly correct latent type schemes, that do not lend themselves to the relatively simplistic static analyses that a compiler can perform.

See, this is exactly what I've been afraid of: a DL escape hatch here formulated as "mostly correct latent type schemes," followed by the countervailing and, I have to say, dismissive "relatively simplistic static analyses that a compiler can perform." As far as I can tell, "mostly correct" falls into the category of content-free grammar :-) and "relatively simplistic static analyses that a compiler can perform" only serves to convince me that Anton needs to read more of the literature and, yes, try out some of these compilers with their "simplistic static analyses." In particular, these analyses do things that the DL aficionados also tend to dismiss: prove properties, not of the software developed with the compiler, but of the compiler itself: for example, that the compiler will terminate, that it won't generate code that attempts to add an integer and a string, and so on. I find this very worrisome: I have followed Frank in an error that's now been corrected, given what I think is a very sound rearticulation of my understanding of the real benefits of dynamic languages, and here we are not even a full 24 hours later and I see the same old cant that has got me all but believing that there's no hope for the DL community. DLs don't have rich type systems relative to SLs' simplistic ones; they have ill-defined ones that appeal to intuition relative to the SLs' ones based on the entirety of the history of mathematical logic.

Having said that, let me point out that I'm now convinced that one can craft a rigorous defense of latent typing, e.g. as I've alluded to elsewhere, perhaps based on the Futamura projections, since partial evaluation seems to be the dimension along which the slider from "compile time" to "runtime" moves. But I've never seen one. All I see from my brothers and sisters on the DL side is this: the strong suggestion that type theory lacks nuance or subtlety or richness or some other double-plus good gerund. Most often, it comes with a "static type systems can't do X" assertion, with the most recent example being "dynamic in-place upgrade." When an existence proof of the inaccuracy of the assertion is proffered, the goalposts move to "That's not in a popular language, not even as popular as, e.g. O'Caml." Perhaps not. That doesn't contradict the existence proof.

These kinds of exchanges are what prompted me to assert before that the big cultural gap between DL and SL fans is that the DL fans are not rigorous. I'm sorry; I know that's somehow an impolitic thing to say, but I'm sitting here reading yet another example of it. Even when ST fans say, yes, programming is a human endeavor; yes, STs have a lot of work to do to bridge the syntactic convenience gap; yes, really expressive type systems haven't yet hit the mainstream; yes; yes; yes; uncle; I give! this is what we get.

I'm going to have to start holding DL aficionados to a higher standard. I want citations, references, formal semantics and soundness proofs for your favorite DLs. I don't care if they're in denotational semantics, structured operational semantics, axiomatic semantics, action semantics, or modular monadic action semantics. I don't care if they're proven with Maude, NuPrl, MetaPRL, OBJ3, CafeObj, HOL, HOL-Light, Coq, Isabelle, or pencil and paper. But until then, there are no grounds for even suggesting, let alone claiming, that DLs are richer than SLs.

Lest anyone think the bar is too high or assert that there is no precedent for this on the DL side, I refer the reader to The VLISP Verified Scheme System.

Short uptime

[...] and "relatively simplistic static analyses that a compiler can perform" only serves to convince me that Anton needs to read more of the literature and, yes, try out some of these compilers with their "simplistic static analyses."

I'll take that as a quaint way of saying that my point wasn't understood. (For the record, I gave some indication of my experience with applied typed lambda calculi, and their compilers, in an earlier post above; perhaps the power cycle referred to in an earlier subject line wiped out some memory.)

Please note that I said "relatively simplistic". You omitted the qualifier twice when quoting me, and that changes the sense of what I wrote.

So, relative to what? My original statement made that clear: humans. What compilers do can be characterized as "relatively simplistic" compared to what humans do, pretty much by definition: compilers are applying strict, well-defined rules, based on statically available information. What human programmers do is much more complex, and the evidence of this is that humans are able to write working, dynamically-typed programs that have type schemes which work well in practice, thus demonstrating their integrity, but which nevertheless resist the formal static analysis of the sort that compilers are currently capable of doing.

I'm going to have to start holding DL aficionados to a higher standard. I want citations, references, formal semantics and soundness proofs for your favorite DLs.

Well, as I mentioned, my own implementation of the formal semantics for my favorite DL is here. Soft type systems, which expose some of the latent syntactic types in this language, can be found in SoftScheme, MrSpidey, and MrFlow (the latter is available in the PLT CVS).

But particularly in these contexts where we're talking about that which hasn't yet been well formalized, we have to be quite careful of formal proofs: the Univ argument is a case in point. There's an understandable tendency to focus only on what's formalizable, and to some extent, only on what's easily formalizable, where "easily" means not too large a step from what the current state of the art is capable of. This approach is perfectly sensible, but it goes wrong when it's used to attempt to dismiss that which hasn't yet fallen to formal analysis as uninteresting, unimportant, etc. The Univ argument did exactly that, and that argument has been used quite widely, not just here.

BTW, I don't consider myself a "DL aficionado", any more than I'm a type theory aficionado or a static checking aficionado - I'd say I'm more like a centrist in these areas. I'm certainly a type theory fan, but type theory does not equal static checking! We've just seen how the intense focus on static checking is leading to errors in thinking - that kind of error arises for the classic reason of wanting a particular result, finding an argument to support it, and being satisfied by the fact that it supports the preconception.

But until then, there are no grounds for even suggesting, let alone claiming, that DLs are richer than SLs.
You're incorrect on that point, assuming you're referring to the more specific point that I made, about the latent type schemes present in dynamically-checked programs. All the grounds for my claim have been given already, and the conclusion should be quite obvious once you've rid yourself of the notion that dynamically-checked programs are unityped. I wouldn't even say the claim is controversial. If you think it is, perhaps you could give me some idea why you think so.

Head Scratching?

Anton van Straaten: I'll take that as a quaint way of saying that my point wasn't understood.

I think that must be true, because...

Anton: Please note that I said "relatively simplistic"... So, relative to what? My original statement made that clear: humans.

OK. I had understood you to mean "relative to dynamic languages," and that was my point of objection. On the other hand, perhaps you believe there to be a one-to-one correspondence between a human programmer's intuition about the semantics of his DL and the semantics that are actually expressed by the DL. On the basis of fairly extensive study into cognitive science, I would have to disagree with such a mapping were you to express one (and I am not claiming that you have). But hopefully this helps explain my objection, albeit an objection to a statement that it turns out that you didn't make.

Anton: What human programmers do is much more complex, and the evidence of this is that humans are able to write working, dynamically-typed programs that have type schemes which work well in practice, thus demonstrating their integrity, but which nevertheless resist the formal static analysis of the sort that compilers are currently capable of doing.

Here is where I think we're going to have to agree to disagree: barring some probabilistic model of system correctness (and personally I'd love such a model, because it would enable me to account for real-world phenomena such as power outages, end-of-life hardware failures, etc.), I can't accept that "which work well in practice, thus demonstrating their integrity" has any meaning. Basically, this seems to me like a classic confusion of correlation and causation: you observe that your understanding of the semantics of your latent types coupled with using your program in environment A (by which I mean the inclusion of dynamic values at runtime) worked yesterday, running in environment B (different dynamic values) worked today, and you predict that running in environment C will work tomorrow. Let's say that happens to be true. That's a nice correlation. Is it causation? No. So specifically the "integrity" claim seems not to have any grounding.

Anton: Well, as I mentioned, my own implementation of the formal semantics for my favorite DL is here.

Hmmm. I have to apologize; I don't recall having seen this reference before.

Anton: Soft type systems, which expose some of the latent syntactic types in this language, can be found in SoftScheme, MrSpidey, and MrFlow (the latter is available in the PLT CVS).

Exactly! You might have seen that I refer explicitly to MrSpidey and MrFlow on the MzTake thread. In any case, from the abstract for SoftScheme, we find:

Its underlying type system generalizes conventional Hindley-Milner type inference by incorporating recursive types and a limited form of union type. Soft Scheme accommodates all of R4RS Scheme including uncurried procedures of fixed and variable arity, assignment, and continuations.
This sounds like any number of the type inferencers for any number of the richer type systems that are found in the modern type theory literature, and in fact some of the very recent work, e.g. the GADT paper or the generalization of HM(X), goes quite a lot farther. Now, of course, the appropriate response to this is "yes, but Scheme lets me go on whether the type inferencer comes up with a principal type or not, and never requires annotations." This is the point at which the observation comes in that we can encode a dynamic type system in our statically-typed language if we want to (modulo the observation that the popular ST languages make it syntactically very awkward), and that we don't have to annotate either, if we're willing to accept that type inferencing might not terminate (Cayenne) or to limit our dependent types to expressing total functions (Epigram, at least for now).
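
To make that encoding concrete, here is a minimal sketch in Haskell (the type and the helper names are mine, purely illustrative) of the sort of thing I mean: all the tagging and untagging that a dynamically-checked language performs implicitly has to be spelled out by hand, which is exactly the syntactic awkwardness being stipulated.

    -- A minimal, illustrative "Univ" encoding: dynamically-typed values
    -- embedded in a statically-typed language as a single sum type.
    data Univ
      = UNum  Double
      | UStr  String
      | UBool Bool

    -- Injection and (partial) projection play the role of the implicit
    -- tag/untag operations a dynamically-checked language performs at run time.
    num :: Double -> Univ
    num = UNum

    asNum :: Univ -> Double
    asNum (UNum d) = d
    asNum _        = error "run-time type error: expected a number"

    -- "Dynamically-typed" addition: the check happens when the program runs,
    -- just as it would in Scheme.
    plus :: Univ -> Univ -> Univ
    plus a b = UNum (asNum a + asNum b)

    main :: IO ()
    main = print (asNum (plus (num 1) (num 2)))   -- prints 3.0

Every additional primitive the dynamic fragment needs means another constructor, another projection, and another place for a run-time type error to surface.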

In any case, I appreciate the references, as they both clarify where you're coming from and support, I think, the point where I expected us to meet in the middle: I've got some post-Hindley-Milner inferencer in my compiler that will stop the compile; you've got some post-Hindley-Milner inferencer that you only use if you want to, and even then it won't stop the compilation. As I suspected, in a very strong sense, we are both doing partial evaluation, and the only difference is the level of assurance that we want out of it.

Anton: But particularly in these contexts where we're talking about that which hasn't yet been well formalized, we have to be quite careful of formal proofs: the Univ argument is a case in point. There's an understandable tendency to focus only on what's formalizable, and to some extent, only on what's easily formalizable, where "easily" means not too large a step from what the current state of the art is capable of. This approach is perfectly sensible, but it goes wrong when it's used to attempt to dismiss that which hasn't yet fallen to formal analysis as uninteresting, unimportant, etc. The Univ argument did exactly that, and that argument has been used quite widely, not just here.

I see a lot of truth here. Regrettably, my experience continues to be—as does, I'm sure, yours in the reverse—that members of the different communities define "easily formalizable" differently, based largely on their own preconceptions as to what a desirable level of formalism is! So while I see the truth of your observation, I am also mindful of the ongoing claims of "what static types can't do" that keep falling to new results. I confess: I'd rather make errors of the untyped/unityped kind than of the "static types can't do X" (and then discover that they can) kind, because the latter is the difference between being able to prove something about my system and not being able to. While it's true that that's not always important, it's important in a lot of domains, and I continue to believe on the basis of personal experience that it's perceived not to be important in a lot of other domains, not because it isn't, but because it's believed to be impossible in a lot of cases in which it's actually quite possible!

Anton: We've just seen how the intense focus on static checking is leading to errors in thinking - that kind of error arises for the classic reason of wanting a particular result, finding an argument to support it, and being satisfied by the fact that it supports the preconception.

Me: But until then, there are no grounds for even suggesting, let alone claiming, that DLs are richer than SLs.

Anton: You're incorrect on that point, assuming you're referring to the more specific point that I made, about the latent type schemes present in dynamically-checked programs. All the grounds for my claim have been given already, and the conclusion should be quite obvious once you've rid yourself of the notion that dynamically-checked programs are unityped. I wouldn't even say the claim is controversial. If you think it is, perhaps you could give me some idea why you think so.

That's fair as far as it goes, but it doesn't address the concern that led to my previous post, which is basically that the tendency on the part of those who observe that DLs are not, in fact, a proper subset of SLs is to claim that DLs are more expressive on the basis of what can charitably be characterized as metaphysics, and less charitably as mysticism. The references to soft typing not only don't contradict this observation; they reinforce it: as I just described above, the difference between soft type systems and hard type systems is that the latter won't let through anything that they can't prove certain things about, like confluence, termination, etc. whereas the former will. That's the difference between the scientific method and the lack of the scientific method in a nutshell.

Perhaps the model for this duality of outlook is Mulder and Scully: it's not that Scully didn't believe her own experience; it's that she firmly believed that the phenomena she experienced fit within the scientific framework even if the mapping wasn't currently available. Mulder, at the very least, wasn't sure that such a mapping existed; I'm not sure whether or not he believed such a mapping was important. So perhaps what we're learning is that I'm Scully to your Mulder: that there's stuff that you can say in a DL that we don't (yet!) know how to statically type that you can correlate with "working" well enough to satisfy both you and the customer doesn't say to me "DLs are richer than SLs;" it says to me that there's more work to do in type theory still.

Towards a broader conception of the scientific method

On the other hand, perhaps you believe there to be a one-to-one correspondence between a human programmer's intuition about the semantics of his DL and the semantics that are actually expressed by the DL.
I think the correspondence can be a lot stronger than you give it credit for. For example, in serious programs in DT languages (I would point to Lisp and Scheme as one possible source of examples), it's not unusual to use assertions to verify that the types are as expected. Programs are interrelated wholes; a wrong assumption in one place can have serious consequences. When a program developed following good practices passes all its tests, and none of its assertions about types fail, that provides quite a strong assurance of the program's correctness, and therefore the correctness of its type scheme. It doesn't rise to the level of static proof, but in many contexts, that may not be significant.
Basically, this seems to me like a classic confusion of correlation and causation: you observe that your understanding of the semantics of your latent types coupled with using your program in environment A (by which I mean the inclusion of dynamic values at runtime) worked yesterday, running in environment B (different dynamic values) worked today, and you predict that running in environment C will work tomorrow. Let's say that happens to be true. That's a nice correlation. Is it causation? No.
Certainly, if one doesn't test in a particular environment, one's assumptions about a program's behavior in that environment may turn out to be incorrect. But that's where test suites, assertions, and other testing comes in. It's not safe to run statically-checked programs in environments in which they haven't been tested, either. The Ariane 5 accident is an example of that. In fact, on this point I start to think Peter Lewerin may have had a point, in being concerned that static type checking could provide a false sense of security [edit: if one over-emphasizes it at the expense of other considerations].
So specifically the "integrity" claim seems not to have any grounding.

If you're saying that, because the standard of evidence for correctness of a system's type scheme is lower than it is in the static proof case, claiming its type scheme has integrity therefore has "no grounding", then that's clearly false. It also underscores my point about being dismissive of anything that's not statically checked. You, and others, in your enthusiasm for static checking, seem to have a bit of a blind spot in this area.

Note that in the real world, there can also be other priorities than theoretical perfection.

Anton: "Well, as I mentioned, my own implementation of the formal semantics for my favorite DL is here."

Hmmm. I have to apologize; I don't recall having seen this reference before.

Second last paragraph of my post entitled "Namaste" above.
[SoftScheme] sounds like any number of the type inferencers for any number of the richer type systems that are found in the modern type theory literature
Certainly. That's why I wrote that these systems "expose some of the latent syntactic types". If you have any experience with them, you'll know that they aren't capable of identifying the types as narrowly as a program's author is capable of doing. [Edit: you can think of inferencing in these cases as providing a first approximation to the syntactic types in a program. In statically-checked languages, that's all you're allowed.]
As I suspected, in a very strong sense, we are both doing partial evaluation, and the only difference is the level of assurance that we want out of it.
I'd say the only difference is in our understanding of the level of assurance we actually achieve. :)
I confess: I'd rather make errors of the untyped/unityped kind than of the "static types can't do X" (and then discover that they can) kind, because the latter is the difference between being able to prove something about my system and not being able to. While it's true that that's not always important, it's important in a lot of domains, and I continue to believe on the basis of personal experience that it's perceived not to be important in a lot of other domains, not because it isn't, but because it's believed to be impossible in a lot of cases in which it's actually quite possible!
Well, that may be the case in "a lot of other domains", but that doesn't have to be true of you, if you apply your own thought processes as opposed to simply following a crowd.
That's fair as far as it goes, but it doesn't address the concern that led to my previous post, which is basically that the tendency on the part of those who observe that DLs are not, in fact, a proper subset of SLs is to claim that DLs are more expressive on the basis of what can charitably be characterized as metaphysics, and less charitably as mysticism.
I've given some hints as to why DL's are, in fact, more expressive in a particular dimension, namely that of their latent type systems. Should I expand on that?
The references to soft typing not only don't contradict this observation; they reinforce it: as I just described above, the difference between soft type systems and hard type systems is that the latter won't let through anything that they can't prove certain things about, like confluence, termination, etc. whereas the former will.

As I've now pointed out, soft typing is only part of the picture when it comes to latent type schemes. I'm claiming that (competent) humans are capable of developing type schemes that are useful, but that can't currently be inferred mechanically, without major hints to the inferencer; and even then, these schemes aren't necessarily fully statically checkable - for example, they may require operations like downcasting. However, it's important to recognize that not being "fully statically checkable" doesn't automatically mean unreliable, or even unprovable.

One of the mostly implicit theses in my posts has been that if more type theorists could bring themselves to pay some attention to type systems that aren't fully statically checkable, they may find that there are many useful possibilities there. As I've already observed, though, maximum static checkability is an attractive goal, and it's difficult to distract most type theorists from that goal.

That's the difference between the scientific method and the lack of the scientific method in a nutshell.
No, that's wrong. There are other means of proving things, or verifying them, than the static type checks that current compilers perform. It's possible to produce proofs based on dynamic tests, for example (take a look at the proof for Fermat's Last Theorem :) You're focusing very narrowly on a single class of techniques! Granted, those techniques are very powerful, with a huge range of applications. But that power appears to be blinding you to the possibility that there's any value in alternative approaches, as evidenced by your making what appear to be blatantly false statements, such as "the 'integrity' claim seems not to have any grounding".
Perhaps the model for this duality of outlook is Mulder and Scully: it's not that Scully didn't believe her own experience; it's that she firmly believed that the phenomena she experienced fit within the scientific framework even if the mapping wasn't currently available. Mulder, at the very least, wasn't sure that such a mapping existed; I'm not sure whether or not he believed such a mapping was important. So perhaps what we're learning is that I'm Scully to your Mulder: that there's stuff that you can say in a DL that we don't (yet!) know how to statically type that you can correlate with "working" well enough to satisfy both you and the customer doesn't say to me "DLs are richer than SLs;" it says to me that there's more work to do in type theory still.

Once again, you appear to be conflating type theory with static checking; that's an error. In any case, I'm not saying anything like what you suggest. I would absolutely agree with the Scully position: "the phenomena she experienced fit within the scientific framework even if the mapping wasn't currently available". And that's the whole point: the current mappings, where they exist, aren't always satisfactory, or the combinations of required features aren't always available together, or in a usable form. You seemed to acknowledge this in your response to James Hague.

[P.S.: In addition, we know that not all phenomena in Turing-complete languages are statically checkable, so we have to look for solutions elsewhere as well.]

Striving for Positivism

Anton van Straaten: I think the correspondence can be a lot stronger than you give it credit for. For example, in serious programs in DT languages (I would point to Lisp and Scheme as one possible source of examples), it's not unusual to use assertions to verify that the types are as expected.

Not only do I know this to be true, I'll go one step farther and observe that CMU CL, its direct descendant SBCL, and the Stalin Scheme compiler all make this explicit: "declarations are assertions." At a certain optimization level, the compiler actually checks those declarations rather than simply trusting them, and these compilers all do type inferencing.

Anton: Programs are interrelated wholes; a wrong assumption in one place can have serious consequences. When a program developed following good practices passes all its tests, and none of its assertions about types fail, that provides quite a strong assurance of the program's correctness, and therefore the correctness of its type scheme. It doesn't rise to the level of static proof, but in many contexts, that may not be significant.

Once again, I don't believe we disagree, and in fact, everything you say here echoes what I've written before about comprehensive testing. Unfortunately, "comprehensive testing" is very much like modern type inferencing with generalized algebraic datatypes: great in theory; virtually non-existent in practice.

Anton: Certainly, if one doesn't test in a particular environment, one's assumptions about a program's behavior in that environment may turn out to be incorrect. But that's where test suites, assertions, and other testing comes in.

The question becomes one of how you know you've tested all possible execution paths with all possible dynamic values. To the extent that such knowledge can be automated, the problem is isomorphic to static type-checking coupled with whole-program global flow analysis. Once again, I think we agree that this is a good goal to strive for, but realistically, I've never seen it in practice, and I don't even know any automated test-generation systems bold enough to claim that they exercise all code paths with all possible dynamic values.

Anton: It's not safe to run statically-checked programs in environments in which they haven't been tested, either. The Ariane 5 accident is an example of that.

Yes, Ariane 5 is an interesting example for a number of reasons: one, as an example of relying on an inadequate type system, in this case specifically one that allowed type conversion even when it meant loss of numeric precision. Two, because it showed clearly the dangers inherent in not enforcing type safety across module boundaries coupled with testing of modules independently but not in conjunction. Finally, it dramatically underscored the importance of disallowing uncaught exceptions and prompted the research that led to Type-based analysis of uncaught exceptions.

Anton: In fact, on this point I start to think Peter Lewerin may have had a point, in being concerned that static type checking could provide a false sense of security [edit: if one over-emphasizes it at the expense of other considerations].

Oh, of course. Again, no one is suggesting that unit and integration testing are no longer necessary, or anything of the kind. What I find astonishing is the notion that fewer properties proven at compile time is somehow an improvement over more properties proven at compile time.

Anton: If you're saying that because the standard of evidence for correctness of a system's type scheme is lower than it is for the static proof case, that claiming its type scheme has integrity therefore has "no grounding", that's clearly false.

My observation was that the specific claim of being able to write code in a DL that "mostly works" somehow shows that the type scheme "has integrity" has no grounding, not that a specific DL's type scheme necessarily has no grounding, and I've even posted a link to VLISP to offer a specific example of a verified DL. More recently, of course, I've been greatly enjoying the reduction-semantics library of post-208 DrScheme CVS pulls; one of its examples is a reduction semantics for their "beginner" language, which would, of course, be isomorphic to the executable denotational semantics that you did for Scheme. That is, I understand that things can be proven about DLs and their latent type schemes. Successfully running a program written in one N times doesn't rise to that level.

Anton: It also underscores my point about being dismissive of anything that's not statically checked. You, and others, in your enthusiasm for static checking, seem to have a bit of a blind spot in this area.

I'll admit that conversations like this have a radicalizing effect. I like DrScheme, and thanks to the reduction semantics library and MzTake I'm liking it even more. EMACS + SLIME + SBCL rocks. I haven't quit my C++ day job despite C++ fitting neither the best of the SLs' characteristics nor the best of the DLs'. Just today I reinstalled Oberon, Self, and Squeak. I should write a nice, long essay on the lessons regarding the importance of direct manipulation in object-oriented languages that Self and Squeak have to offer. So you're seeing a limited range of my thinking on these subjects. Unfortunately, I don't feel I can move much for the reason I outlined in my last post: whenever I do, whatever solid ground that I feel like we've finally gotten under us seems to vanish into dust. I'll deal with that more specifically in a moment.

Anton: Note that in the real world, there can also be other priorities than theoretical perfection.

Of course there can be. However, again, what I find in the real world is a tremendous number of people—far and away the overwhelming majority—who think that "static typing" means "mandatory type annotations that relate entities in my code to machine types, maybe, and sometimes even that doesn't work." So one of my points is that it isn't the DL perspective that needs promotion; the DL perspective is the de facto winner in the marketplace. As a free marketeer, it galls me to have to say that that's not because of inherent benefits of DLs, but due to market ignorance. I would just like to see the market better informed so that more informed choices can be made. I'll know we've arrived when I hear "we chose not to use Epigram on this project because..." and the reason isn't "because no one knows it." Besides, no one's arguing theoretical perfection. All that's being argued for is a larger set of correctness guarantees before the software ever leaves the shop.

Anton: That's why I wrote that these systems "expose some of the latent syntactic types". If you have any experience with them, you'll know that they aren't capable of identifying the types as narrowly as a program's author is capable of doing. [Edit: you can think of inferencing in these cases as providing a first approximation to the syntactic types in a program. In statically-checked languages, that's all you're allowed.]

Except, of course, that in statically-checked languages, they aren't first approximations: they're proven principal types, which means that you know, at compile time, exactly what operations can and cannot be performed on them. In my previous post, I make quite clear that I understand that the difference between soft typing systems and static typing systems is that the former lets you proceed even when it can't find a principal type and the latter doesn't. Which you prefer is a question of personal values, and given that we've come to an agreement that ST and DT are dual up to the state of the art in type inference, it's not clear to me at this point what's left to say.
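
As a tiny illustration of what a proven principal type buys (the function name here is made up, purely for illustration), consider this Haskell sketch:

    -- With no annotation at all, the compiler infers the principal type
    -- (up to renaming of type variables):
    --   compose :: (b -> c) -> (a -> b) -> a -> c
    compose f g x = f (g x)

    main :: IO ()
    main = print (compose (+ 1) (* 2) 3)   -- prints 7

Every call site is then checked against exactly that type at compile time; a soft typer would report its best approximation and let the program run regardless.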

Anton: I'd say the only difference is in our understanding of the level of assurance we actually achieve. :)

OK, then we really are back where we started: the level of assurance that you achieve in DLs is strictly less than the level of assurance that you achieve in SLs modulo your use of a soft typing system. If you use a soft typing system, you're doing type inference just like I am. We both have to do testing; you just have to test things that I don't. In the end, the real point of static type checking is that the proof that it works only has to be done once: by the compiler developer. In DLs, the proof burden is on the programmer on a program-by-program basis. Again, this is modulo soft typing/partial evaluation. If you do that, by definition you end up in what we've been calling the "static typing" realm.

Anton: Well, that may be the case in "a lot of other domains", but that doesn't have to be true of you, if you apply your own thought processes as opposed to simply following a crowd.

I think you've misunderstood me. My comment was that there are a lot of domains that have given in to sour grapes with respect to expecting to be able to prove, or to have a proof implicit in the fact that their program compiles successfully, certain properties of their program, i.e. they say "It's not important for us to prove that" when what they really mean is "We don't believe it's possible/feasible to prove that." My feeling as a former DL aficionado is that the DL community exacerbates this problem. That is, the DL community continues to insist that it confers unique benefits on its members when those benefits are in fact not unique or are not even addressed by DLs. The biggie, of course, is "incremental interactive development," but I think TyPiCal is another great example. What on earth could a DL possibly do that would be isomorphic to statically proving lack of deadlock/race conditions?

Anton: I've given some hints as to why DL's are, in fact, more expressive in a particular dimension, namely that of their latent type systems. Should I expand on that?

Please! The references to the executable denotational semantics for Scheme and the soft typing systems were an excellent start.

Anton: I'm claiming that (competent) humans are capable of developing type schemes that are useful, but that can't currently be inferred mechanically, without major hints to the inferencer; and even then, these schemes aren't necessarily fully statically checkable - for example, they may require operations like downcasting.

Good conclusion; bad example. There are plenty of decidably-inferred type systems with subtyping. A better example would be, e.g. a dependent type system with partial functions such as Cayenne's. But I must be misunderstanding you; if you've gone through "Types and Programming Languages," you know all of this and more besides, so now I fear that I'm once again missing something in what you're claiming.

Anton: However, it's important to recognize that not being "fully statically checkable" doesn't automatically mean unreliable, or even unprovable.

Right, again, it just shifts the burden, and perhaps not even all that much: a good partial evaluator for Scheme, like Similix or PGG, makes a world of difference in practice. All of that I'm perfectly happy to stipulate. What I can't imagine stipulating—what doesn't even seem like it can be made coherent—is the notion that arbitrary code written in an arbitrary DL can somehow be said to be on the same-or-better reliability/safety footing as arbitrary code written in an arbitrary SL.

Anton: One of the mostly implicit theses in my posts has been that if more type theorists could bring themselves to pay some attention to type systems that aren't fully statically checkable, they may find that there are many useful possibilities there. As I've already observed, though, maximum static checkability is an attractive goal, and it's difficult to distract most type theorists from that goal.

Type theorists are the ones doing the work on "existential types," "phantom types," and "dependent types:" more generally, the rubric of "generalized algebraic datatypes," which, as you know, combine types and terms in interesting ways. Cayenne isn't fully statically checkable; heck, the compiler itself may diverge! Epigram, as we've all recently learned, allows dependent types but only in the context of total, vs. partial, functions, and makes a major point of assisting the programmer in understanding type inference and providing annotation as necessary. So once again, you're asserting an absolute position on the part of type theorists that isn't warranted. Conversely, apart from your executable Scheme denotational semantics, PLT's reduction semantics for their "beginner" Scheme, VLISP, SoftScheme and MrSpidey/MrFlow, I don't see much coming from the DL community on what claims they can actually make. I'd be a whole lot less cranky if I saw more MrFlows and heard several orders of magnitude less about how testing accomplishes the same thing—unless, of course, that assertion comes coupled with a whole-program control-and-data-flow analyzer that automagically generates those tests for me.
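
To make the "phantom types" mention slightly more concrete, here is a minimal Haskell sketch of the technique (all of the names are hypothetical): the extra type parameter never appears in the representation, so it costs nothing at run time, yet it lets the compiler reject misuse before the program ever leaves the shop.

    -- A phantom type: 'state' occurs only on the left-hand side, so it is
    -- erased at run time but tracked at compile time.
    newtype Handle state = Handle FilePath

    data Open
    data Closed

    openH :: FilePath -> Handle Open
    openH = Handle

    closeH :: Handle Open -> Handle Closed
    closeH (Handle p) = Handle p

    readLineH :: Handle Open -> IO String
    readLineH (Handle p) = return ("<a line from " ++ p ++ ">")  -- stand-in body

    main :: IO ()
    main = readLineH (openH "config.txt") >>= putStrLn
    -- readLineH (closeH (openH "config.txt"))   -- rejected at compile time

Nothing here exists at run time except the underlying FilePath; the whole discipline is compile-time only.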

Anton: No, that's wrong. There are other means of proving things, or verifying them, than the static type checks that current compilers perform. It's possible to produce proofs based on dynamic tests, for example (take a look at the proof for Fermat's Last Theorem :)

Funny; in private correspondence I recently asked an acquaintance who's working on a dependently-typed language design, and who had used Fermat's Last Theorem as an example of something that wouldn't type-check, why it wouldn't! He did offer a definition of Pythagorean triples that would type-check, though. :-)

In any case, it's not that I don't know that dynamic proofs are possible and even necessary in some domains; it's that dynamic languages generally don't do them, and even when they do, the result is an error in the process that the customer is running vs. the process that I was running when I wrote the program.

Anton: You're focusing very narrowly on a single class of techniques! Granted, those techniques are very powerful, with a huge range of applications.

Right—an enormous aspect of my thesis being that the DL side has a habit of saying "static type checking/inference can't do X," where "X" is an element of a set whose membership keeps shrinking on them. That's admittedly a disjoint observation from "DLs allow us to express things that we don't yet know how to formalize."

Anton: But that power appears to be blinding you to the possibility that there's any value in alternative approaches, as evidenced by your making what appear to be blatantly false statements, such as "the 'integrity' claim seems not to have any grounding".

I think, in context, it's far from blatantly false; I even offered the notion of a probabilistic model of correctness that I think would serve DLs well in the absence of even soft typing, let alone something stronger such as model checking or partial evaluation. But the conjunction of "mostly correct" and "shows that it has integrity" has no grounding in the absence of some model against which to interpret "mostly correct." Does it have intuitive appeal? Of course it does! It's the basis of every piece of C++ I write for my employer. Does the foundation have to be probabilistic? Of course not, nor does it have to be full-on self-applicative partial evaluation. But it does have to be stronger than intuition.

Anton: Once again, you appear to be conflating type theory with static checking; that's an error.

I can see why you'd think so, but no, and hopefully continuing to loop back to partial evaluation drives the point home: the issue isn't really what you can check and what you can't; we both agree, the more the better and it can't all be checked statically. The real difference is who gets the error: me or my customer? The more I get, the better, and from that POV, it matters not in the slightest whether I get them from a type checker that's built into my compiler and mandatory or one that's optional, external, and that I run voluntarily. Similarly, yes, I need to run tests, but those are going to be really limited unless I'm also doing flow analysis and type inferencing there, and at that point why wouldn't I be using either a hard or soft typer before testing?

So that's it, I think: the technical observation is that SLs give me the error instead of my customers; the social observation is that there aren't enough DL designers who give us soft typing and formal semantics while there aren't enough SL designers who give us nice syntax for dynamics.

Anton: And that's the whole point: the current mappings, where they exist, aren't always satisfactory, or the combinations of required features aren't always available together, or in a usable form. You seemed to acknowledge this in your response to James Hague.

Absolutely. That doesn't mean DLs are somehow better in any sense other than being available, and I think that's the thing that irks me most: what I perceive as a tendency on the DL side to translate "compromise" into "advantage." Again, I'd also care a lot—a lot—less if I didn't keep encountering the "static typing can't do X" myths.

Anton: In addition, we know that not all phenomena in Turing-complete languages are statically checkable, so we have to look for solutions elsewhere as well.

Indeed. As I wrote elsewhere, I recall refusing to learn Cayenne upon reading that type inference was undecidable in it, all while having used C++ with its Turing-complete template system and "template instantiation depth" limiting flags for years! Most embarrassing.

Are GADT more general than dependent types?

Type theorists are the ones doing the work on "existential types," "phantom types," and "dependent types:" more generally, the rubric of "generalized algebraic datatypes," which, as you know, combine types and terms in interesting ways.
I am quite new to the area (less than a month), but are not GADTs just a part of dependent types research? Could we try and establish some kind of (at least crude and naive) taxonomy? There are probably already some papers doing this; we just need to find references AND perform translation between terms used in different papers.

For example, wobbly types paper states:

Generalised algebraic data types (GADTs) [...] have appeared in the literature with a variety of names (guarded recursive data types, first-class phantom types, equality-qualified types, and so on), although they have a much longer history in the dependent types community
and
In the dependent types community, GADTs have played a central role for over a decade, under the name inductive families of data types
PS: I hope there is no need to distinguish between generalised data types and generalized datatypes? :)
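
For concreteness, here is the usual small example, written in the GHC-style syntax the wobbly types paper describes (as I understand it): each constructor's result type carries term-level information into the type, which is what all those different names are getting at.

    {-# LANGUAGE GADTs #-}

    -- Each constructor fixes the result type of the expression it builds,
    -- so the evaluator needs no run-time tags and no partial projections.
    data Expr a where
      IntLit  :: Int  -> Expr Int
      BoolLit :: Bool -> Expr Bool
      Add     :: Expr Int  -> Expr Int -> Expr Int
      If      :: Expr Bool -> Expr a   -> Expr a -> Expr a

    eval :: Expr a -> a
    eval (IntLit n)  = n
    eval (BoolLit b) = b
    eval (Add x y)   = eval x + eval y
    eval (If c t e)  = if eval c then eval t else eval e

    main :: IO ()
    main = print (eval (If (BoolLit True) (Add (IntLit 1) (IntLit 2)) (IntLit 0)))   -- prints 3

Whether that is "just" a restricted form of inductive families is exactly what I am asking.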

[OT] Politics are irrelevant

Paul Snively wrote:
I'll admit that conversations like this have a radicalizing effect.
I'd like to address that. You seem to be arguing with me as though I were a representative of, say, the Python or Perl community. You've responded to things I haven't said, and are reacting based on what you perceive a majority of people as thinking, e.g. in this latest post:

However, again, what I find in the real world is a tremendous number of people—far and away the overwhelming majority—who think that "static typing" means "mandatory type annotations that relate entities in my code to machine types, maybe, and sometimes even that doesn't work."

...

That doesn't mean DLs are somehow better in any sense other than being available, and I think that's the thing that irks me most: what I perceive as a tendency on the DL side to translate "compromise" into "advantage." Again, I'd also care a lot—a lot—less if I didn't keep encountering the "static typing can't do X" myths.

I'm raising technical points which I believe to be applicable to dynamic languages; you're objecting because many dynamic language users you've apparently encountered make use of such points in ways you don't like. You admit as much: "So you're seeing a limited range of my thinking on these subjects."

A discussion along these lines is of little interest to me. That limited range of thinking doesn't do you credit, because it leads you to (apparently, at least) overlook points and to make errors and inconsistent claims, and it has even led you in an ad-hominem direction when you don't understand or agree with something I've written.

Unfortunately, I don't feel I can move much for the reason I outlined in my last post: whenever I do, whatever solid ground that I feel like we've finally gotten under us seems to vanish into dust.

Afaict, the last time you felt that, it was an overreaction to something I said which you had misunderstood - my comment about "relatively simplistic" static analyses, which I wrote in a response to a post of Frank's. Unfortunately, up until that point, any solid ground under us was suspect, since it was contaminated by a misunderstanding of the implications of the Univ technique. (The clarification of which represents at least one solid piece of technical progress in this discussion.)

I'm short on time right now, so I'm not going to respond on any technical points at the moment. In my future responses on this, I'll keep your political filter in mind, and plan to ignore or dismiss anything that I don't consider to be relevant.

_|_

Rather than continue to respond point by point, let me just say that I think I've been very clear when I've believed you to be in error rather than the DL community at large to be in error. I also think I've been sufficiently quick to acknowledge when I've been in error.

And finally, in my last post, I got absolutely clear about where we are: SLs result in me getting errors; DLs result in customers getting errors. The slider on that observation is some form of partial evaluation. Everything else has been a distraction, and I'm sorry for that.

While generally sharing the

While generally sharing the ideas expressed by Paul, I agree with your point that he is sometimes outflanking you instead of directly addressing the issues you raise.

I still hope you both will continue the discussion, which already is quite fruitful.

We should probably amend Godwin's law to include "ad hominem"... And excuse me for interrupting.

On Topic

Andris Birkmanis: While generally sharing the ideas expressed by Paul, I agree with your point that he is sometimes outflanking you instead of directly addressing the issues you raise.

I apologize; that's not my intention, as I hope my last post made clear.

I remain frustrated, however, because I feel many of my points have been evaded as well. To name one recent example, I still don't know what the DL response to TyPiCal is, or even what it could be. The thread is supposed to be about "Why type systems are interesting," right? If I had to offer a capsule summary answer that has evolved from this discussion, attempting to leave aside the heat and keep the light, it would be that "Type systems are interesting because as they evolve we learn that we can characterize things statically that we believed could only be characterized dynamically, which leads to increasingly-good safety characteristics over time." There. No slam at DLs or anyone in that community, and I challenge anyone to contradict the observation.

I remain frustrated, howeve

I remain frustrated, however, because I feel many of my points have been evaded as well. To name one recent example, I still don't know what the DL response to TyPiCal is, or even what it could be.

This seems like an example of my objection about who you're arguing with. I have no idea what the "DL response to TyPiCal" is, nor do I particularly care. I also haven't really seen much input from any DL community here. Or are you saying you're interested in my response?

But, after your latest long post, I believe I have a better understanding of some of your positions, so am in a much better position to respond substantively on the issues that I am interested in, and will do so tonight.

Hopefully on topic, too

I apologize; that's not my intention, as I hope my last post made clear.
I happened to post concurrently with you, so I composed that before reading your last post. I hope my post was taken with a grain of salt - I tend to post stupid things, just cannot control myself :)
"Type systems are interesting because as they evolve we learn that we can characterize things statically that we believed could only be characterized dynamically, which leads to increasingly-good safety characteristics over time."

I think something is lost. One thing I liked in the thread was "partial evaluation slider". Relationship between multi-staged programming and (dependent) type systems always made me curious.

Sliders Are Fascinating!

Andris Birkmanis: I happened to post concurrently with you, so I composed that before reading your last post. I hope my post was taken with a grain of salt - I tend to post stupid things, just cannot control myself :)

Heavens, you're not the one with anything to apologize for; I am.

Andris: I think something is lost. One thing I liked in the thread was "partial evaluation slider". Relationship between multi-staged programming and (dependent) type systems always made me curious.

Quite right! I was attempting to answer what seems to me like a rather narrow question: why are type systems interesting? Here I am indeed taking "type systems" to mean "the subject of type theory," and I think we've well established that that means "static type checking."

However, I think we've also well established, largely thanks to Anton's patience, that that's not all that there is to say about how languages in general work, or even about how all languages use types. I'm very firmly with you; I'm intrigued by Epigram and MetaOCaml, and I want to see how far we can go with this. Since I used to be in the game industry, I find myself wondering if some kind of multi-staged programming environment couldn't be used to, e.g. craft a game engine that does a lot of PVS construction, pixel and vertex shader specialization, etc. at scene load time—that is, literally generating customized CPU and GPU code at runtime based on the loaded scene. It would be especially good if this could be done linearly, taking advantage of asynchronous I/O so as to minimize the impact on perceived load times.

I still see at least one serious disjunction, though, which I've been using TyPiCal as the "poster-boy" for: a type system that can prove the absence of deadlock/race conditions in my code at compile time is huge, and as far as I can see has no conceptual analog if that checking is done at runtime. Similarly, information-flow type systems like Flow Caml or the information-flow aspect of TyPiCal are huge advances in security, although I feel strongly that there's a necessary dynamic component of them that dependent types are almost certainly required in order to address (basically, it makes no sense to talk about a "Principal" in these systems without being able to talk about its dynamic behavior, unless the only thing you want to be able to say about principals is that they are "trusted" or "untrusted" in a global, static sense).

So I maintain that type systems are interesting because you can do seemingly-impossible things with them. However, Anton's point that I have been arriving at that conclusion through a process that has ignored real non-static value, e.g. his executable denotational semantics for Scheme, SoftScheme, MrFlow, partial evaluation, multi-stage programming, etc. is fair, and if we wanted to start a new thread to talk about multi-stage programming/partial evaluation/dependent typing, I think that would be a great thing.

let a thousand flowers bloom

I still don't know what the DL response to TyPiCal is, or even what it could be.

I wouldn't dare speak for the "DL community" either, but I would just point out that types are one kind of static analysis, and that there are in fact many kinds of analysis. The people I know who use, um, dynamic-- er, untyped-- er, unityped-- oh, what the hell, Scheme, don't dislike types, they just see them as one form of analysis among many. Anyway, my personal response to TyPiCal was, "neat! Can't wait to read about it."

TyPiCal makes an interesting example. I just had the good fortune of meeting and briefly chatting with Arnaud Venet at NASA, much of whose research concerns abstract interpretation of pi-calculus. I asked him about type systems for pi-calculus, and he mentioned that one of the problems of type systems was that they had trouble distinguishing replicated copies of the same process description, whereas some of his and Jérôme Feret's work apparently deals better with the interactions between replicated copies of processes. Arnaud claimed that in real programs [1] this comes up a lot, and since the whole gimmick of pi-calculus is the idea of dynamic process creation (otherwise they would have stuck with older fixed-process models like in CSP), this sounds plausible to me.

All analyses have to make trade-offs. Type systems must make these decisions like any other system. A typical theme is balancing precision with efficiency. I think types are cool, but they're an abstraction of a program, and therefore necessarily leave out some information. Other analyses--some static, some dynamic--look at a program from different levels of abstraction.


[1] For an appropriate definition of a "real program" in an abstract calculus. :)

Converging on our divergence

The following (long) post is a nearly point-by-point response to the parent post, which wasn't what I originally intended to write. However, the result does seem to touch on all our important points of difference, so here goes:
Unfortunately, "comprehensive testing" is very much like modern type inferencing with generalized algebraic datatypes: great in theory; virtually non-existent in practice.

I disagree - I'm not interested in whether or not the majority do it - good testing can be done, and is done. I have clients with software QA departments, or with employees who perform software testing, who work with developers to ensure that appropriate tests exist and are performed. I think you've indicated that you're not suggesting that static checks are an appropriate way to make up for a lack of proper testing. So, the way I see it, testing is a given.

On this point, I'd have to agree with the TDD camp that given assertions and reasonable test practices, type errors tend to be caught long before production. That's why I earlier proposed that one of the concrete benefits of ordinary static checking (as available in non-research languages) is simply the time saving, of finding type errors during development without having to run tests.

The question becomes one of how you know you've tested all possible execution paths with all possible dynamic values. To the extent that such knowledge can be automated, the problem is isomorphic to static type-checking coupled with whole-program global flow analysis. Once again, I think we agree that this is a good goal to strive for,
This seems to be a basic point of disagreement, actually. I don't think testing "all possible execution paths with all possible dynamic values" is nearly as important a goal as you seem to think, in most real systems. You may be concerned with measuring how good coverage is, but is there a business problem you're trying to solve here? I don't see customers concerned about getting too many type errors after a system has gone into production. Again, I've more commonly seen that concern expressed during development.
but realistically, I've never seen it in practice, and I don't even know any automated test-generation systems bold enough to claim that they exercise all code paths with all possible dynamic values.
If a project needs that level of assurance, a statically checked language would certainly make sense. It wouldn't seem to make a lot of sense to expend that kind of effort on tools for DT systems.
Yes, Ariane 5 is an interesting example for a number of reasons: one, as an example of relying on an inadequate type system, in this case specifically one that allowed type conversion even when it meant loss of numeric precision. Two, because it showed clearly the dangers inherent in not enforcing type safety across module boundaries coupled with testing of modules independently but not in conjunction. Finally, it dramatically underscored the importance of disallowing uncaught exceptions and prompted the research that led to Type-based analysis of uncaught exceptions.
This could be an entirely different discussion. Regarding disallowing uncaught exceptions, you may have seen discussion in the Java world about how the exception enforcement tends to lead people to effectively stub out exceptions, sometimes rendering them worse than useless. Many developers will subvert restrictions which they perceive as in their way. Of course, if you're developing critical code, you'd better have practices in place to ensure that you catch such bad behavior; but I think it's far from clear whether "disallowing uncaught exceptions" is the best way to do that. At the very least, that's situation-specific. I mention this because it's related to the issue of perceived restrictions in statically-checked languages - if developers feel they have to program around things, that can be a concern.
What I find astonishing is the notion that fewer properties proven at compile time is somehow an improvement over more properties proven at compile time.
I don't know anyone who says that. What you can reasonably say is that if proving properties at compile time has costs, you have to balance those costs against other factors.
My observation was that the specific claim of being able to write code in a DL that "mostly works" somehow shows that the type scheme "has integrity" has no grounding, not that a specific DL's type scheme necessarily has no grounding
Programs in any language "mostly work" by definition, unless they've been proved correct, beyond just their handling of types (and of course, the proof has to be proved correct, etc.) Programs in DLs "mostly work" with respect to their type scheme because, using a latent type system, they could have latent type errors. But a working program with latent type errors still has a type scheme with integrity, in my judgement. There are systems that run for years without exhibiting bugs and then suddenly exhibit a bug due to an unexpected set of inputs - and such bugs aren't necessarily type-system related, at least with today's type systems.
So one of my points is that it isn't the DL perspective that needs promotion; the DL perspective is the de facto winner in the marketplace.
I see that mainly in certain segments of the market, such as open source web application development (e.g. the "LAMP" platform). I don't see e.g. aerospace or embedded applications moving to DLs, and I don't even see corporate application development moving in that direction - instead, Java and presumably soon C# are big players. In those contexts, the benefits of static checking are largely recognized and believed in, even given the limitations of the type systems of those languages. The static checking issue certainly isn't the point that's stopping widespread adoption of more formally-based languages here.
As a free marketeer, it galls me to have to say that that's not because of inherent benefits of DLs, but due to market ignorance. I would just like to see the market better informed so that more informed choices can be made.
I think you might find that "more informed choices" doesn't translate to "agreeing with your position", simply because there are so many other dimensions to consider.
I'll know we've arrived when I hear "we chose not to use Epigram on this project because..." and the reason isn't "because no one knows it."
The reason right now is, no-one's commercialized it. As a free marketeer, you ought to be familiar with the long and uncertain road from invention to successful product.
Besides, no one's arguing theoretical perfection. All that's being argued for is a larger set of correctness guarantees before the software ever leaves the shop.
To support that argument in a free market, you need to show the connection between static checking and economically valuable results. I don't think that case is as obvious as you seem to think, and it's a notoriously difficult kind of case to make. If it's not made very solidly, then it's easily countered from the other side by claims about e.g. productivity, flexibility, time to market, etc. Who's right? Does it matter? Incumbent languages will have an edge in this sort of thing. Understand I'm not taking any side here, but I'm saying I haven't seen strong arguments along these lines from either side.
Except, of course, that in statically-checked languages, [the types] aren't first approximations: they're proven principal types, which means that you know, at compile time, exactly what operations can and cannot be performed on them.
Which is precisely where restrictions and trade-offs arise, of course.
given that we've come to an agreement that ST and DT are dual up to the state of the art in type inference, it's not clear to me at this point what's left to say.
I don't agree that they're dual in that sense. I previously commented about the duality of static proof against dynamic behavior. What I'm saying about DT is that it allows the expression of complicated, messy, difficult-to-statically-check, but nevertheless useful type schemes. Of course, you can transform such programs into statically typed programs, but if they're not written with static typechecking discipline in mind, they require some global reorganization, and may require more effort and discipline on the part of the programmer. Even if that effort has a payback, it doesn't always matter enough.
OK, then we really are back where we started: the level of assurance that you achieve in DLs is strictly less than the level of assurance that you achieve in SLs modulo your use of a soft typing system.
More generally, replace the latter phrase with "modulo your use of other testing and assurance techniques". The question then is how important any resulting difference is, in practice, and in a particular context.
In DLs, the proof burden is on the programmer on a program-by-program basis. Again, this is modulo soft typing/partial evaluation. If you do that, by definition you end up in what we've been calling the "static typing" realm.
Soft typing is just an aid - the "actual" type schemes of the program in question are typically more complex than a soft typer can infer. If one simply uses soft typing as a replacement for static typing, little if any advantage is gained, other than the ability to run programs with known flaws in their type scheme.
I think you've misunderstood me. My comment was that there are a lot of domains that have given in to sour grapes with respect to expecting to be able to prove, or to have a proof implicit in the fact that their program compiles successfully, certain properties of their program, i.e. they say "It's not important for us to prove that" when what they really mean is "We don't believe it's possible/feasible to prove that."
What I'm saying is that in many contexts, the importance of static proof is not paramount.
I think TyPiCal is another great example. What on earth could a DL possibly do that would be isomorphic to statically proving lack of deadlock/race conditions?

At present, I think this question is entirely orthogonal to the question of the utility of static proof of ordinary type usage in existing, non-research languages. It's evidence that static proof is an important and interesting area, but it's difficult to factor into practical decisions today.

When I originally asked why static proof was important, I was thinking in terms of benefits that you could point to right now, that a developer could reasonably take advantage of by changing languages, and without having to use experimental research languages with no community. (I perhaps might have saved some trouble if I'd made that much clearer up front, and for that omission I apologize.) My conjecture was that the practical, pointable-to benefits were mundane ones, and I mentioned two: convenience/time-saving and something which could be described as better support for modularization.

In a way, and I mean this very loosely so don't take it too seriously, you're trying to sell a Ford Taurus by pointing to the jet-powered flying concept car that's demoed at the annual auto show. That might work on some people, but on its own it's not so convincing - you at least have to show how the Taurus is better than whatever the customer's already driving.

Anton: "I've given some hints as to why DL's are, in fact, more expressive in a particular dimension, namely that of their latent type systems. Should I expand on that?"

Please! The references to the executable denotational semantics for Scheme and the soft typing systems were an excellent start.

I don't have anything else as formal - what I was going to say in this area has to do with the flexibility of latent type systems. I was originally going to write this up along with a refutation of the Univ argument, but Kevin and then Frank beat me to that. I'll write something up over the weekend.

BTW, I didn't particularly intend my Scheme semantics example as part of any argument, other than that you specifically asked for formal semantics of DLs.

Anton: "I'm claiming that (competent) humans are capable of developing type schemes that are useful, but that can't currently be inferred mechanically, without major hints to the inferencer; and even then, these schemes aren't necessarily fully statically checkable - for example, they may require operations like downcasting."

Good conclusion; bad example. There are plenty of decidably-inferred type systems with subtyping. A better example would be, e.g. a dependent type system with partial functions such as Cayenne's. But I must be misunderstanding you; if you've gone through "Types and Programming Languages," you know all of this and more besides, so now I fear that I'm once again missing something in what you're claiming.

I simply used downcasting as an example of a case where, regardless of how decidably one can deal with it, there are situations in which you don't want it to be decidable - where you want to downcast something without any static check being able to tell whether what you're doing is valid. This goes back to the original relaxing vs. constraining example that Frank questioned. I'm saying that relaxing can be good.

What I can't imagine stipulating—what doesn't even seem like it can be made coherent—is the notion that arbitrary code written in an arbitrary DL can somehow be said to be on the same-or-better reliability/safety footing as arbitrary code written in an arbitrary SL.
I didn't suggest that. But I am saying that given code developed with good practices by good people, any difference in reliability/safety is not necessarily important, depending on the context. (Although "good people" might ask for some type checking for the kinds of reasons I've mentioned.)
Type theorists are the ones doing the work on "existential types," "phantom types," and "dependent types:" more generally, the rubric of "generalized algebraic datatypes," which, as you know, combine types and terms in interesting ways. Cayenne isn't fully statically checkable; heck, the compiler itself may diverge! Epigram, as we've all recently learned, allows dependent types but only in the context of total, vs. partial, functions, and makes a major point of assisting the programmer in understanding type inference and providing annotation as necessary. So once again, you're asserting an absolute position on the part of type theorists that isn't warranted.

To illustrate my point, let me say that these are all wonderful and exciting things, but they don't move us any closer to a better type system in, say, Python. As I've been saying, DLs do have (latent) type systems, not just unitypes. The static-checking position starts based on the assumption that anything that can conceivably be statically checked, should be, and only after statically checking everything in sight, does the focus shift elsewhere.

This focus isn't compatible with work on the type systems of DLs - rather, the general idea is that ST should simply replace DT, and everyone should just get on board.

My opinion: it's never gonna happen.

Of course, STLs will be successful - heck, a bunch of weak ones already are. But until the sorts of fuzzy factors we've danced around, and which drive [some] formalists crazy, are acknowledged and addressed, people will continue to implement and use DLs. You can look at that and see an uninformed market, or you can look at that and see something that you haven't fully understood yet. I place myself in the latter category, with a strong sense that there are interesting things there that won't necessarily be "solved" by static checking, even if they can be expressed in that way.

Now, I can fully understand why producing, say, a version of Python with a more formally specified type system, and with some kind of soft or semi-soft checking, would be utter anathema to any self-respecting type theorist. But that doesn't mean it wouldn't be useful. In fact, I think we're ultimately likely to see things along these lines happening the way continuations have spread into languages like Parrot, (Stackless) Python, and Javascript. Then at some point, the research community may be forced by simple ubiquity to pay attention, the way they have been with e.g. Java and the JVM.

Conversely, apart from your executable Scheme denotational semantics, PLT's reduction semantics for their "beginner" Scheme, VLISP, SoftScheme and MrSpidey/MrFlow, I don't see much coming from the DL community on what claims they can actually make.

That's a direct consequence of my point - everyone's off working on the most statically-checkable systems they can, or on what's left over when there's nothing left to static check (minor hyperbole). Seriously, point me to the academic DL community - as you suggest, aside from PLT, it doesn't seem very big. You might believe that's the rightful state of affairs based on the technical merits; I don't believe that.

Funny; in private correspondence I recently asked an acquaintance who's working on a dependently-typed language design, and who had used Fermat's Last Theorem as an example of something that wouldn't type-check, why it wouldn't! He did offer a definition of Pythagorean triples that would type-check, though. :-)
OT, but I'm curious if there's something special about the latter? It's a fairly classic tutorial example, e.g. in Haskell:
pythagTriples = [ (x,y,z) | z <- [2..], y <- [2..z-1],
                            x <- [2..y-1], x*x + y*y == z*z ]
In any case, it's not that I don't know that dynamic proofs are possible and even necessary in some domains; it's that dynamic languages generally don't do them, and even when they do, the result is an error in the process that the customer is running vs. the process that I was running when I wrote the program.
You earlier talked about holding people to a higher standard, wanting hard evidence. Do you have any in this area? I'm not convinced that type errors are such a problem in production systems (ignoring the "true mission critical"). OTOH, I'm quite convinced that type errors can waste a lot of time during development.
Right—an enormous aspect of my thesis being that the DL side has a habit of saying "static type checking/inference can't do X," where "X" is an element of a set whose membership keeps shrinking on them.
Not my battle.
That's admittedly a disjoint observation from "DLs allow us to express things that we don't yet know how to formalize."

Much more interesting. Or, "can't yet do 'conveniently' in an off-the-shelf language".

I think, in context, it's far from blatantly false; I even offered the notion of a probabilistic model of correctness that I think would serve DLs well in the absence of even soft typing, let alone something stronger such as model checking or partial evaluation. But the conjunction of "mostly correct" and "shows that it has integrity" has no grounding in the absence of some model against which to interpret "mostly correct." Does it have intuitive appeal? Of course it does! It's the basis of every piece of C++ I write for my employer. Does the foundation have to be probabilistic? Of course not, nor does it have to be full-on self-applicative partial evaluation. But it does have to be stronger than intuition.

So do you consider your shipping applications to be correct based purely on intuition, or what? Again, I see a mistake in an underlying idea here, which goes something like "if it's not formalized, we know nothing worthwhile about it" (feel free to recharacterize).

You want to measure reliability or correctness? Well, you still have to do that with statically checked programs. So if we don't have good enough means for making such measurements, that's a software engineering flaw which affects both STLs and DTLs.

Absolutely. That doesn't mean DLs are somehow better in any sense other than being available, and I think that's the thing that irks me most: what I perceive as a tendency on the DL side to translate "compromise" into "advantage."
I understand what you're getting at, but "available" is an absolutely unassailable benefit in any practical context! And the "compromise" is in a different dimension from the benefit, a dimension whose importance isn't always considered paramount, for reasons which I consider valid.
the issue isn't really what you can check and what you can't; we both agree, the more the better and it can't all be checked statically.
I'm not completely sure it's what you meant, but just in case, I wouldn't agree "the more [static checking] the better" - that's only unequivocally true when everything else is equal, which it usually isn't. In which case, other factors come into play.
As I wrote elsewhere, I recall refusing to learn Cayenne upon reading that type inference was undecidable in it, all while having used C++ with its Turing-complete template system and "template instantiation depth" limiting flags for years! Most embarrassing.

Preconceptions can be such a barrier to learning. That's why I'm a little suspicious of a position in which the statically checkable choice, if available, is always better.

People break rules that no

People break rules that no one follows; therefore you should not force anything that is too restrictive, like static typing.

Also, C# is not the next big thing in enterprise software. It is probably going to be Ruby.

sorted-list type

Any opinions on which language would be the easiest/best to use to implement the sorted-list type? (mergesort :: list -> sorted-list)
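For instance, in Haskell alone you can get part of the way there with nothing more than a module boundary (a minimal sketch with names I've made up; it gives an abstraction guarantee, not a static proof of sortedness - for the latter you'd presumably want something dependently typed like Epigram):

module SortedList (SortedList, getSorted, mergeSort) where

-- The constructor is not exported, so the only way for a client to
-- obtain a SortedList is via mergeSort, which maintains the invariant
-- by construction.
newtype SortedList a = SortedList { getSorted :: [a] }

mergeSort :: Ord a => [a] -> SortedList a
mergeSort = SortedList . go
  where
    go []  = []
    go [x] = [x]
    go xs  = merge (go front) (go back)
      where (front, back) = splitAt (length xs `div` 2) xs
    merge [] bs = bs
    merge as [] = as
    merge (a:as) (b:bs)
      | a <= b    = a : merge as (b:bs)
      | otherwise = b : merge (a:as) bs

The interesting question is which language makes the full static guarantee - where the type checker itself verifies sortedness - least painful.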

Automatic Algorithm Inference

  • Premise A: Premature optimization is the root of all evil.
  • Premise B: Specifying an algorithm is an optimization (compared to the ephemeral thoughts of the Mind).
  • Conclusion: We should avoid writing algorithms.

Ok, I've been reading up on Epigram a little bit, and it dawned on me that maybe it hasn't gone far enough. Supposing we had a sufficiently advanced type system, could we instead create programs by writing down type signatures, and have the compiler automatically infer algorithms for us? Of course the compiler might not always have enough information to craft an appropriate algorithm, so we would have to provide some code annotations. And if efficiency is a concern, we could override the compiler's code with something better. Can anyone point me in the direction of papers/documentation which discusses this? Is this a completely crazy idea, which only leads to a meta-circular nightmare where you'd want a meta-type system to check your type system? Would dependent types break the hierarchy and allow the type system to analyze itself? Would the type system then just be a higher level language?
Thoughts?

Theorems for free!

Theorems for free! are somewhat related to what you described. Just be sure to read Tim Sweeney's comment, but especially Frank's answer. I am still unsure what was agreed upon :( (except the need for referential transparency for the theorems to work)

The LtU post has a broken link; use this, or just google for the title if you prefer PDF or another format.
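For anyone following along at home, the textbook illustration of the idea (not Tim's or Frank's point, just the standard example) is that for any f :: [a] -> [a], parametricity alone gives you the free theorem map g . f = f . map g, so the type already tells you f can only rearrange, drop, or duplicate elements - it can never invent new ones:

f :: [a] -> [a]
f = reverse . take 3          -- any parametric f will do

-- The free theorem, spot-checked on one arbitrary input:
freeTheoremHolds :: Bool
freeTheoremHolds = (map (*2) . f) [1..10 :: Int] == (f . map (*2)) [1..10]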

A meta-type system to check your type system

Is this a completely crazy idea, which only leads to a meta-circular nightmare where you'd want a meta-type system to check your type system? Would dependent types break the hierarchy and allow the type system to analyze itself? Would the type system then just be a higher level language? Thoughts?

I've just been learning about the area myself, but there's been quite a bit of work in logical frameworks, which are described as formal meta-language[s] for deductive systems. You can prove all kinds of properties about programming languages and logics with Twelf, for example.

My Favorite So Far Is...

MetaPRL

See especially Compiler Implementation in a Formal Logical Framework. You can see the specific results of that research at Mojave.

The use of logical frameworks for programming language design really consists mostly of taking the Lambda Calculus seriously. This is often cast in terms of Barendregt's Lambda Cube, where the simply-typed Lambda Calculus is at the origin of the cube and the axes are assigned various popular extensions, so the vertices represent the simply-typed Lambda Calculus extended with whatever features lie along the axes that connect the vertex to the origin. The combination of all of the popular features is called "The Calculus of Constructions." It turns out that folks have integrated the Calculus of Constructions and Martin-Löf type theory, calling that the "Extended Calculus of Constructions." Martin-Löf is called an "intuitionistic type theory" and has close ties to intuitionistic logics, as you would expect.

MetaPRL, the logical programming and verification environment that I'm trying to learn :-) supports multiple theories. One of the most useful for specifying programming language semantics is the theory named "itt," which stands for "intuitionistic type theory," which is an extension of Martin-Löf type theory. There's a nice, albeit brief and very dense, overview of it at <http://cvs.metaprl.org:12000/metaprl/system/mp-itt.html>.

One of the things that makes A Visual Environment for Developing Context-Sensitive Term Rewriting Systems exciting is precisely that it could serve as a visual prototyping tool for the development of the inference rules that would allow a tool like MetaPRL to do its thing, where "its thing" might even consist of producing an honest-to-God compiler for the language being designed. Exciting stuff!

Is Epigram too short?

could we instead create programs by writing down type signatures, and have the compiler automatically infer algorithms for us? Of course the compiler might not always have enough information to craft an appropriate algorithm, so we would have to provide some code annotations

I think programming in Epigram is quite close to that, except it's interactive: you write down a type signature, the computer infers what it can, leaving some choices for you to make. You make a choice, it elaborates further. When all choices are done, you have a program written interactively by you and Epigram - your contributions interleave.
IMO, this differs from traditional type annotations as much as driving a car differs from programming a bot to drive a car. Subconsciously I feel this is related to multi-stage programming, but I can't put my finger on it.

PS: the process of programming in Epigram also resembles (to me) processes in collaborative environments (including LtU).

Your subconscious feeling migh

Your subconscious feeling might be explained in Edwin Brady and James McKinna's:
Phase Distinctions In The Compilation Of Epigram
a new paper submitted to POPL 2006. It states in its introduction:

"The blurring of the distinction between types and values means that the typechecker must do some evaluation at compile-time; correspondingly, it means that there are values which exist only to ensure type correctness. By identifying such values, we can recover an erasure semantics."

Looks like a few letters dropped out of there ...

Try http://www-fp.dcs.st-andrews.ac.uk/~eb/writings/epiphase.pdf instead. I trust y'all to be able to add the "me" that should start the tenth quoted word.

Prolog

Supposing we had a sufficiently advanced type system, could we instead create programs by writing down type signatures, and have the complier automatically infer algorithms for us?

Prolog is something like this; if you think of a goal as a type, then you can think of the answer to a goal as coming from a (particular instantiation of) a program with that type. However, these programs are cut-free proofs, which means, type-theoretically, that they correspond to normal forms (values); that is, there is no point in running them.

Arrow also does something similar: given two types which are isomorphic in precisely one way, it will deduce the unique conversion function between them.
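To make the "isomorphic in precisely one way" case concrete, here's a hand-worked Haskell sketch (my own illustration, not output from any tool): each of the two signatures below has exactly one total, parametric inhabitant, so the types alone determine the conversion functions.

-- (a, b) -> c and a -> b -> c are isomorphic in exactly one way;
-- parametricity forces these to be the only possible definitions.
curry'   :: ((a, b) -> c) -> (a -> b -> c)
curry' f = \a b -> f (a, b)

uncurry' :: (a -> b -> c) -> ((a, b) -> c)
uncurry' g = \(a, b) -> g a b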

P.S. Hint for Safari users: I have noticed, when quoting messages here on LtU in Safari, that often the Copy (Command-C) command doesn't work, especially for multiline text, which drives me crazy. I just figured out a workaround, which seems to work in at least some cases: instead of drag-highlighting, releasing and then hitting Copy, hit Copy while dragging, that is, don't release until after Copying. If that doesn't work, try backing up until the highlighted section is small enough that it does work, and drag the cursor forward, hitting Copy every so often.

I think that's a valid point,

I think that's a valid point, but part of the problem is that it's not a question of what's "impossible" to statically type, but what's convenient, and there's obviously an element of subjectivity there.

Hmmm, I shouldn't have said "impossible". So let me rephrase my question: What is inconvenient to statically type, and why couldn't we build an ST language that handles this case well? My primary hobby is PL design, and I want to see what the minimal set of characteristics is that's necessary to create an expressive (in Felleisen's sense) and convenient (in your sense) ST system. For example, I think I know how to type the Groovy example (i.e. lists with objects and maps) recently discussed, and arbitrary message delegation, but I have no idea how to type Smalltalk's Object>>become:.
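For the mixed-list part, one possible Haskell encoding - just a sketch, and not necessarily what I'd settle on - is an existential wrapper that exposes only the interface you need:

{-# LANGUAGE ExistentialQuantification #-}

-- A list whose elements may have different concrete types, as long as
-- each supports Show; the concrete types are hidden by the wrapper.
data Showable = forall a. Show a => MkShowable a

mixed :: [Showable]
mixed = [MkShowable (1 :: Int), MkShowable "two", MkShowable [3.0 :: Double]]

render :: [Showable] -> [String]
render = map (\(MkShowable x) -> show x)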

Also, as I mentioned in one of my posts, the type-theoretic statically checked languages tend to impose a kind of type-oriented programming style, which may not always be desirable. This is also subjective, of course, but there's nevertheless a tradeoff and a choice there.

What is a type-oriented style?

Also, some of the current benefits of dynamic languages are not in the languages themselves, but in what current realistic implementations are capable of in terms of dynamic loading and so on. As a silly example, how come you don't run Haskell on your Palm? :)

I was trying to Teach Myself Scheme in Fixnum Days. OTOH I agree that Scheme is much simpler to implement than Haskell 98 + usual extensions (e.g. MPTC, FunDeps), but that's because Scheme is a minimalistic language, not particularly related to ST issues. For example how come we can't run Perl (AFAIK) on the Palm OS?

I think that would require some careful work, because many of the DT benefits are admittedly "fuzzy" ones - but that doesn't necessarily mean they're not real, it may just mean they're more difficult to formalize.

Let's do it! As long as there isn't honest communication between the two camps we are going to have half-baked languages and libraries. IMHO every non-esoteric language has some benefits; otherwise it wouldn't be used. The trick is trying to reduce the benefits to a minimal set of orthogonal features that can explain them.

Type Systems for Language Designers

I have a different theory about how type systems help a language. I was looking at the slides for Simon Peyton Jones' Haskell Retrospective, where he talks about how laziness isn't used that often in programs, but it's important because it prevents people from adding evil side-effecting features into the language. Laziness is important for language designers, not for programmers.

In the same way, types make language designers work a lot harder to get the same features as an untyped language, but they help them avoid a lot of mistakes. There are good untyped languages, such as Scheme, but Scheme came after 20 years of Lisp mistakes. The Scheme designers realized that dynamic scope, using nil as a boolean, and having undefined identifiers return the symbol with the same name were bad because people kept writing difficult-to-understand Lisp programs. ML chose the right designs for all these problems immediately, without needing to go through revisions, because none of those features type well. I don't think this is a coincidence.
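As a small illustration of the "doesn't type well" point (my own example, not from the slides): in an ML- or Haskell-style system, if demands an honest Bool, so the old Lisp habit of treating the empty list as false is rejected at compile time rather than discovered in a running program.

-- Rejected by the type checker: a list is not a Bool.
--   describe xs = if xs then "non-empty" else "empty"

-- The typed version has to make the test explicit:
describe :: [a] -> String
describe xs = if null xs then "empty" else "non-empty"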

Perhaps the same can be said for Haskell's type-class style of OO in dealing with the problems of inheritance. Haskell's system may be more restrictive, but it offers solutions that other languages will probably gravitate towards in the future.

OK, I can't resist...

From <http://www.nedbatchelder.com/blog/200408.html#e20040810T090045>:
After a small project of mine was converted from Python to C#, I have a new-found admiration for Python's dynamic typing, and other cool features like named arguments in function calls. I have come to believe this about static typing:

Static typing prevents certain kinds of failures. Unfortunately, it also prevents certain kinds of successes.

Let the battle begin!

King's Indian Defense

Let the battle begin!

I think it's your move... :)

I believe the usual opening gambit in this situation is the "C# doesn't have a modern type system" defense, but that leads to a draw in 70 messages.

I'll supplement Ned's opening move with a frustratingly hard-to-pin-down "the benefits are fuzzy but undeniable": I do think that the experience of converting dynamic programs to static ought to give one a certain amount of respect for dynamic checking, if one is completely honest about it, thinking about everything that has to be changed, why, and whether it's important. I also think it's difficult to boil this experience down into separate, individual features, since that loses the overall gestalt, the way the features interact (a little like that other discussion where mere vocabulary is focused on at the expense of language).

King's Indian Reversed

Believe it or not, I find nothing to disagree with, and in fact I only posted the link for its amusement value: it is indeed just another example of the danger of identifying "static typing" with "explicit annotations in a relatively weak type system" and adds nothing new to the conversation.

With respect to the second paragraph, it's an excellent description at least of my own lurching about in type theory! I'm satisfied that at some point the research into multi-stage programming will indeed give us the "slider" that we've alluded to in earlier conversations. I remember that slider being one of the original selling points behind Dylan, and I wish more had come out of that. And again, having spent a little quality time with some really nice dynamic tools, like Self 4.2 and its direct manipulation approach to constructing objects, has done a very effective job of reinforcing the value of such systems for me.

So I'm really not trying to reopen the debate; I was really only going for laughs.

As a matter of fact, I recently saw a reference (but have since regrettably misplaced it) to some research that John Reynolds was doing on applying type theory to dynamic languages. If I succeed in finding it again I'll be sure to post it. In the meantime, if anyone else is interested in pursuing it, at least we have a name and a concept to Google for.

[Edit: I had originally titled this "King's Indiana Reversed," in a bizarre Freudian slip. You can take the boy out of Indiana, but you can't take Indiana out of the boy, apparently.]