Why type systems are interesting - part II

The type systems thread is getting ridiculously long. Please continue the discussion here...

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Boy that worked well....

I'm responding just to bump this back to the top.
Maybe posters will notice on the second try.
;-)

Recidivism

Anton van Straaten: I disputed this point, and Frank agreed with me: "It's true that Scheme, Perl, Python, etc. have a more concise syntax for untyped programs". That translates to their being more expressive for those programs. That's the state of the art right now.

I have to imagine that you realize that when folks like Frank and I say "expressive," we are using the term formally, whereas your use of "more expressive" here is as a synonym for "more concise," and a major chunk of my point was that Frank and I had indeed already stipulated that untyped languages have a syntactic convenience benefit with respect to untyped code vs. languages like Haskell. Indeed, this argument seems tautological to me.

Anton: I'm saying that until these languages have been demonstrated being used successfully in the scenarios where untyped languages excel (e.g. here), whether untyped languages are a proper subset of typed languages is not very relevant to actual practice.

OK, I followed the link, read the three pages, and can't see anything there that untyped languages do any better than a good statically-typed language. I happen to prefer O'Caml, personally, so when they talk about Twisted and Zope and so forth, I can talk about Ensemble and WDialog and so forth. I'm sorry, but it just seems like every time the dynamic language apologists put forth their position, it reveals that they haven't actually done much research into what's available, today, on the static typing side, nevermind what's on the cutting edge.

Anton: You quoted someone at Cambridge (maybe Peter Sewell, but the link was messed up):

I'm afraid that my conclusion remains what it has been for the past 3-5 years: the real problem is that once those committed to rigor deliver results, they discover that industry still can't be bothered to be rigorous.

Good heavens, no, that wasn't a quote! I said that, and regrettably, it remains my overwhelming impression. Basically I keep seeing a chicken-and-egg problem: folks in industry keep saying that they want static typing proven in the real world; static type theorists keep delivering more and more practical results such as the work on C and UDP or phantom types providing type-safe wrappers around type-unsafe C APIs; we keep seeing adoption refusal on the industrial side, typically with a lot of verbiage about how the issues are technical when in fact they're sociological; and the project failure rate in software development doesn't budge an iota. At some point the type theorist begins to wonder how serious the untyped language apologists are about wanting improvement. At least I do.

Anton: This is more of the attitude that I'm arguing against: because they don't get the answer that makes them happy, some of these proponents of rigor assume the "other side" must be at fault.

Mea culpa, mea culpa, mea maxima culpa. When people like Frank and Brynn and I provide dozens and dozens of links to real tools, real systems, things you can use today, even such mundane stuff as streaming MP3 servers written in statically-typed languages, and some of the other links are to collections like the Caml Humps, which themselves contain hundreds or thousands of other links to real stuff (applications, frameworks, libraries, you name it), and even manage to point to formal work on such prosaica as C and TCP/IP sockets, and it's still dismissed with a handwave, or more likely still, not even referred to in the reply, I am left with no choice but to not merely question my interlocutor's familiarity with the material, but also their commitment to becoming familiar with the material. Certainly there hasn't been anything resembling an effective rebuttal of any of the points we've made; there's just been a lot of repetition of the claim that static type theorists haven't proven the value of their work. Well, if you won't download it and try it, there's really not a lot more to say.

Anton: It never seems to occur to wonder whether there's something that's been overlooked in terms of real-world requirements.

Nothing would please me more than to see a set of real-world requirements from the untyped language side. What we get instead are "How does your static language handle a list that contains objects and a dictionary?" That's not a requirement; that's a tautology.

Anton: It's easy from the academic side to get infatuated with the beauty of a narrowly-focused theoretical system, forget about the hairy issues of integration with the rest of the world, and then blame everyone else when the results aren't universally acclaimed as being useful.

Yeah, that's why we get stuff like Why Functional Programming Matters, Tackling the awkward squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell, and the recently referred-to Rigour is good for you and feasible: reflections on formal treatments of C and UDP sockets. Clearly, these are just academics unconcerned with the requirements of the real world.

Anton: Muttering about "industry still can't be bothered to be rigorous" is just the easy way out.

But I see no evidence that industry has actually made a good-faith effort to determine whether some of the practical work that's been done in static typing is useful to them or not. Since that's a generalization that can't be proven one way or the other, and at the considerable risk of personal insult, I see no evidence that you have made such a good-faith effort. What I see is the same decade-old handwaving about how the Univ-based code that you write in an untyped language is awkward to express in a statically-typed language. My question is: are you willing to learn how not to write Univ-based code and see if it's advantageous to you? The problem with that is that it might be a non-trivial time commitment to do so. But since we're on the subject, keep in mind that I'm not a CS or PLT grad student; I'm a guy who writes C++ and Java for a living and wrote all of his recreational stuff in Common Lisp or Scheme for well over a decade; I'm very familiar with the benefits of untyped languages. My perspective is very far from academic. I went through the very process that I'm asking if you're willing to.

Anton: Saying that the answer to the need for dynamic behaviors today is to press on with the development of more rigorous typesystems may be a theoretically valid answer - time will tell how complete an answer can be reached this way - but it's of no use to someone evaluating languages today.

Time will not tell how complete an answer can be reached this way; those results are already in. Here's the summary:

  1. Untyped languages are a proper subset of typed languages.
  2. Type inference for generalized algebraic datatypes is undecidable, so in a sufficiently expressive type system, some annotation will always be necessary.
  3. For statically-typed languages to model popular use cases of untyped languages requires a "sufficiently expressive type system" in the sense used previously, so some level of type annotation will be necessary in such languages.
  4. Statically-typed language design has, to date, not prioritized syntax design in such a way as to address the previous point.
  5. By using tools such as hygenic macros, they can.
  6. The converse is not true (untyped languages can't mysteriously gain the benefits of statically-typed languages).
    1. "You cannot procede from the informal to the formal by formal means." — Alan Perlis
As for being useful in evaluating languages today, it clearly is valuable: some of us have made the switch from untyped languages to typed ones and have direct experience of the benefits. I keep waiting to hear from an untyped language apologist who's tried, e.g. O'Caml and ended up at non-sociological reasons for not switching. Now at this point let me hasten to remind the reader that elsewhere I've indicated that sociological issues matter, and that when the technical differences between a good untyped language and the currently-in-production statically-typed languages are epsilon, as I believe them to be, sociological issues dominate. My only complaint is when people refuse to acknowledge that that's what they're basing their decision on, because the outcome then is never to revisit the technology as it improves, because the fantasy is that we're already evaluating the technology as opposed to the cultural issues.

Anton: Something similar goes for systems like Acute, as well as many of the other references you've given. Acute is described as an "experimental" language.

Acute is a better example of your argument than most, because they apparently don't yet have any code to download! So I wouldn't even be comfortable yet calling it an "experimental language."

Anton: So great, in future all these hard problems are going to be solved by advanced type systems.

When I can download them and use them, I stop saying "are going to be." The question then changes to technology uptake rates, and I've also already written elsewhere that I'm aware that the average new-technology uptake period is a decade. I still question whether that doesn't mean that the best strategy for an individual to adopt is to pursue those technologies now so as to be fluent in them when they're mainstream. But that's clearly a different observation than to ask "Why isn't your employer using this today?"

Anton: But that doesn't make, say, OCaml any more attractive, today.

More attractive than what?

Anton: I can't count on Acute's functionality being integrated into OCaml, ever.

Unless, of course, you were to do the integration yourself: O'Caml is open source and the Acute papers are there for the taking. :-)

Anton: Plus there are a dozen other capabilities that have been raised - which language is going to be the winner that ends up having the most useful combination of features?

I've stipulated multiple times that the language that combines all of these features doesn't exist. You seem to believe that this means that no current statically-typed language is worth pursuing, even as an individual vs. a commercial software endeavor. As far as I can tell, this is the only point we actually disagree on.

Anton: One of the most rational things a business can do in this scenario is wait, and to some extent, the same applies to the developers they employ. Unfortunately, that's going to be one helluva long wait, as long as there's so little focus on bridging the gap between beautiful formal systems that solve tough theoretical problems, and making them do something useful as part of other real systems.

I'm not going to refer to the bodies of work in which integration with real systems is the focus, or in which the tough problems aren't just theoretical, again; it's getting wearisome. I will only note that I find this characterization at best grossly unfair; on average, obtuse; and at worst, dishonest.

Anton: But I'm saying there are lots of good uses for type systems, and there are lots of benefits from checking of types, and it is important to educate oneself about type theory. But you have to distinguish between the benefits that can be obtained now, and the benefits we expect these systems to provide at some indeterminate point in the future. Confusing those two is only going to lead to disillusionment - I don't think we want to be tricking people into using languages with advanced typesystems. You earlier said something about honesty in the debate with DT folk - but be careful that in your enthusiasm, they don't end up feeling the same way about you.

OK, well, I'm done. I've stated explicitly when I was being speculative, I've offered concrete examples of validating C and socket APIs, I've pointed to the use of phantom types to make unsafe C APIs safe, I've pointed to region-based memory management to eliminate the need for garbage collection in embedded/real-time systems, I've pointed to streaming MP3 servers just to offer something really mind-numbingly prosaic and non-theoretical, and I've reiterated multiple times that I like untyped languages (I still intend to become fluent in Oz) and that sociological issues not only matter in language selection, they dominate given the current state of implementation. I know neither how to be more honest nor more clear.

Namaste

I have to imagine that you realize that when folks like Frank and I say "expressive," we are using the term formally whereas your use of "more expressive" here is as a synonym for "more concise,"

Which formal sense of expressivity are you referring to? Perhaps the Felleisen sense, described in "On the Expressive Power of Programming Languages". If so, since you say you're using the term formally, you presumably have some kind of reasoning for why you think my use of the term doesn't qualify? Could you outline that?

For my part, I can show the sense in which dynamically-checked languages are more expressive. At a minimum, this will show why Frank's Univ argument is overly narrow, doesn't model the entire picture, and thus ends up being fallacious. Whether that explanation will rise specifically to the level of greater Felleisen-expressivity, I'll have to examine - however, it will at least exhibit some of the symptoms thereof, such as requiring "repeated occurrences of programming patterns" in the statically-checked language.

However, I don't have time to lay out this argument now. I'll try to do so on Wednesday. As a clue, though, my position is inherent in the first paragraph of my original post in this thread: "...even in the most dynamic language, you always know something about the types of the values you're operating on." IOW, "untyped" is not as untyped as you've been led to believe, and you have to take that into account when comparing languages. The Univ example doesn't.

As for the rest of your post, I find it disappointing that we've gone from your earlier post "100 Proof vs. 100% Proven :-)", where we were in substantial agreement - I hadn't gotten around to responding to that yet, since it was so long - to your latest post, which is rather less constructive.

Part of the problem seems to have been that your original response to my response to Frank is based on your misinterpretation of the applicability of the Univ example. Until we've cleared that up, we can't really get into the advantages of dynamic languages. Because of that issue, the six-point summary in your latest post goes wrong starting with point 1 (to be demonstrated), and is also wrong in points 4 (because syntax is not the point), 5 (ditto), and 6 (are you familiar with soft typing systems at all?) But I'll deal with that later.

I'm very curious about one point:
Basically I keep seeing a chicken-and-egg problem: folks in industry keep saying that they want static typing proven in the real world; static type theorists keep delivering more and more practical results such as the work on C and UDP or phantom types providing type-safe wrappers around type-unsafe C APIs; we keep seeing adoption refusal on the industrial side, typically with a lot of verbiage about how the issues are technical when in fact they're sociological; and the project failure rate in software development doesn't budge an iota. At some point the type theorist begins to wonder how serious the untyped language apologists are about wanting improvement. At least I do.

OK, so can you explain why you wrote earlier, "To most of the team, the idea of changing languages would be laughable, but quite honestly I can't tell you how much better off we'd be developing in Erlang or Oz or Alice. And yes, I'm mindful of the fact that Erlang and Oz are dynamically checked."

Why are you suggesting dynamically checked languages in this context, given all the benefits of static proofs that you've been enumerating, and your stated preference for OCaml?

Finally, I guess I have to deal with this:

My question is: are you willing to learn how not to write Univ-based code and see if it's advantageous to you? The problem with that is that it might be a non-trivial time commitment to do so. But since we're on the subject, keep in mind that I'm not a CS or PLT grad student; I'm a guy who writes C++ and Java for a living and wrote all of his recreational stuff in Common Lisp or Scheme for well over a decade; I'm very familiar with the benefits of untyped languages. My perspective is very far from academic. I went through the very process that I'm asking if you're willing to.

I have gone through that process. I don't like to characterize myself based on what languages I develop in, since I've used quite a few throughout my career, both statically- and dynamically-checked. I'm not a recent convert to either typing discipline. C++ and Java have been prominent in my professional life in recent years - I contributed early on to the transaction API of Hibernate, which I noticed you mentioned a while ago.

Recreationally, I prefer Scheme, mainly because no-one's invented my ideal language yet, and in Scheme it's easy to create languages. I've also developed code professionally in both ML and Scheme. I've implemented various interpreters in ML, including a fairly complete Scheme interpreter, based loosely on the Scheme denotational semantics (which I've also implemented in Scheme). I learned enough OCaml to be able to work with the examples in Pierce's "Types and Programming Languages".

I don't divide the languages up the way you seem to, though: to me, the similarities between Scheme, ML, and all the applied lambda calculi seem much greater than their differences, and I find the type checking issue to be superficial, in a sense. That's related to the point I've been arguing - more on that later.

Metacircularity

Kevin Millikin: See? Staticly typed code in Scheme is no more inconvenient than untyped code in ML. So obviously (with tongue firmly in cheek), staticly typed languages offer no benefits except a convenient syntax for restricted code generation and some builtin checking algorithms.

Even with tongue in cheek, this is an important reminder: metacircularity works! I see an analogue here, which I cannot at the moment make precise in any meaningful way, to the materials I've seen on multi-stage programming. But you're obviously correct to point out that delaying evaluation with lambda is a direct analogue to the difference between "compile-time" and "runtime," and in fact in the presence of, e.g. serializable closures, this notion could be made to map onto user-time in exactly the same fashion that a compiler does.

Following this through to its more formal conclusion, this is really a recapitulation of the study of partial evaluation and specifically Futamura's second projection. But as a Scheme-based language-implementation fan, it's overwhelmingly likely that you know this. :-)

Offline partial evaluation by hand

I see an analogue here, which I cannot at the moment make precise in any meaningful way, to the materials I've seen on multi-stage programming.

I was going to title that post "Lambda: the Ultimate Staging Annotation?", but i usually forget to title my posts.

Re: Point of static proofs?

Daniel Yokomizo wrote:
IME one of the most important sources of problems in large projects (more than 4 people for more than 3 months) is communicating the contracts of programs.
...
This kind of proof is, IMO, quite underrated IMO: we usually forget how many hours we spent getting burned by this kind of problem and the effort used to build the discipline necessary to avoid them.
I agree. This was one of the two benefits "for most systems" of static proofs that I mentioned in my original "Point of static proofs" post: "Better, machine-checkable documentation and enforcement for interfaces between modules." I completely agree that "we usually forget how many hours we spent getting burned by this kind of problem". The same goes for how many hours are spent finding bugs by running tests instead of just running the compiler. So that's a solid, widely applicable benefit of static proofs.
There are cases where the effort required to correctly prevent the [security] violation at runtime is much harder than the statically inferred properties. For example assume program_b has a tighter policy than program_a:
That's an interesting point, thanks.
Hey, good idea! Except if you move type checking from compile time to runtime, you lose your static proof capability.
That's not true. If we use a staging system we can ensure that every program constructed in stage N is type-safe before executing it in stage N + K.
Another good point. I was just pointing out (a little indirectly) that Paul might be implicitly acknowledging that e.g. dynamic behavior can be traded off against static proof.
...The answer, I think, is not enough to rise to the level of "must have". ...
Well the proposition here is that some kind of statically verifiable behavior is better than none. In DTLs we have X possible classes of errors and in STLs we have Y possible classes of errors where 0 < Y < X. So STLs provide value even if some of the errors in DTLs are still possible.

But "some kind of statically verifiable behavior" may not compensate for what the dynamic language user perceives as being lost. When a major change is being contemplated, or even just when a tradeoff is involved, it's easier to pick a winner if the scale tips heavily in one direction or the other. It certainly seems as though statically-checked languages may fit that criteria in future; but I'm saying that right now, that choice isn't as clear to everyone.

In addition, I've mentioned my concern about the narrow focus that tends to be introduced with 0 < Y < X, where it's tempting to think that the goal should be to minimize Y. For some kinds of system, that makes sense. I don't think it does for every system, though, and I believe that in some future well-designed type systems, Y might be a lot bigger than many current static-checking advocates would be comfortable with.

Power Cycle; Start Over

Anton van Straaten: I was just pointing out (a little indirectly) that Paul might be implicitly acknowledging that e.g. dynamic behavior can be traded off against static proof.

That's absolutely correct. I think the way that I tried to frame this earlier, albeit again in an unnecessarily impatient and inflammatory way, was to observe that if there were no necessary dynamic behavior of a program, you'd simply get the final answer from your compiler, like in those factorial-as-C++-metatemplate programs that are floating around out there.

Anton: But "some kind of statically verifiable behavior" may not compensate for what the dynamic language user perceives as being lost.

This is another great point that I wish to ensure readers know that I recognize as valid. In rereading much of what I've written, it seems to me that I've allowed myself to overlook at least one crucial point on the dynamic language side, which is: the programmer knows what he means. With his dynamic language, he's been able to express what he means sufficiently well that his program works to the extent he's able to observe it in a sufficient number of cases. None of the dynamic language guys have ever claimed to be able to observe all cases. They've all agreed on the need for extensive testing and agility in revision. Conversely, sometimes when we try to express what we mean using a statically-typed language, we either get sometimes-arcane error messages and/or we find that we have to annotate the code with a sometimes-arcane type name, and in those cases the type and its name don't necessarily obviously map to the problem domain. These are true things that I had not intended to imply were not the case.

Booting up

...the programmer knows what he means. With his dynamic language, he's been able to express what he means sufficiently well that his program works to the extent he's able to observe it in a sufficient number of cases.

Absolutely. Dynamic languages give you the freedom to write code in which the type system tail doesn't have to wag the semantic dog.

In the statically-checked functional languages, the base model for writing code is a sort of "type-oriented programming (TOP)" style, which is analogous to OOP. If you want to write all your code in a single strictly-imposed style, both OOP and TOP are reasonable choices (other things being equal, which of course they never are). BTW, in this context, OCaml's appeal is obvious, since it does a reasonable job of combining the two models, so would seem to be a best of both worlds.

But both OOP and TOP are fundamentally constraining, although you get benefits in exchange for the constraints. Earlier, Frank entitled a post "no constraints", but he doesn't really want that for TOP, and nor do I. I want TOP, more or less as currently formulated (but better :), available at the end of the typing spectrum. But it's not necessary, and for me not desirable, to operate at one end of that spectrum all the time.

The point about being "able to express what he means sufficiently well that his program works" touches on the human factor that type theory can't possibly account for. What it comes down to is that mental type inference is actually a pretty easy process for humans, in those "most cases" I've been talking about. Dynamically-checked programs are syntactically typed, they're just not statically checked by machine. They have to be syntactically typed, to be able to write useful code. We have to know something about the types of the values we're operating on, in almost every situation.

The "Univ" trick can't account for our mental model of these programs, because it misses the fact that for all terms in an allegedly "untyped" program, the programmer has some assumptions about types, and those assumptions are borne out by the fact that the program works. That's strong evidence for the correctness of the program's assumed type scheme - not as strong as static proof, but strong enough for most practical purposes.

I've started writing a more thorough debunking of the Univ type meme (which is quite prevalent in the syntactic typing community) - I'll probably post that tomorrow.

Can't we have fully static proofs even with dynamic behavior?

[Anton:] ...dynamic behavior can be traded off against static proof.
[Paul:] ...observe that if there were no necessary dynamic behavior of a program, you'd simply get the final answer from your compiler, like in those factorial-as-C++-metatemplate programs that are floating around out there.

Can't you have a function whose properties are all statically provable but still have its dynamic behaviour be useful?

For example, even if you can prove that a sorting function sorts correctly, it is still not feasible for the compiler to convert it into a lookup table with all input/output pairs.

Proofs at Different Times

Kannan Goundan: Can't you have a function whose properties are all statically provable but still have its dynamic behaviour be useful?

Absolutely, with VLISP: A Verified Implementation of Scheme as an existence proof. So it's interesting—and I think this is a big part of what Anton has been trying to convey, and I've been slow to understand—to contemplate the difference between the kinds of proofs that can be "encapsulated," if you will, in types and compile-time type checking vs. the kinds of proofs employed in something like VLISP, where you've proven that your language's semantics have the properties it claims to have, albeit the majority of those semantics in Scheme's case are, in fact, dynamic.

It's interesting to reflect on this; just earlier today I called out Scheme as an obvious counterexample to my general claim that dynamic language lack rigor. Maybe what I really mean is that the popular scripting languages (Python, Perl, TCL...) lack rigor. On the other hand, as Abadi and Cardelli demonstrated so well with their Object Calculus, sometimes the practice comes first and the rigor later.

Kannan: For example, even if you can prove that a sorting function sorts correctly, it is still not feasible for the compiler to convert it into a lookup table with all input/output pairs.

Unless, of course, all of the parameters to the sorting function are known at compile time. But your example is a popular one in dependent-typing circles, where the discussion of a "sorted" type that expresses the order invariant of the values is feasible. Of course, one can make the observation that dependent type systems blur the distinction between compile-time and runtime, and that's true. On the other hand, dependently-typed expressions either type-check or they don't, and that still happens at compile-time, AFAIK (little help here, guys: is this true for Cayenne? Epigram? DML? Xanadu)? This is what leads some people to feel that these type systems are too conservative: they're expressive, but some number of perfectly legitimate programs won't type-check in them. The type theorists' counter to this is that rejecting some legitimate programs is preferable to accepting an (apparently) unbounded number of illegitimate ones.

Incompleteness of typing (plus Zelazny, Egyptian gods)

Paul Snively: This is what leads some people to feel that these type systems are too conservative: they're expressive, but some number of perfectly legitimate programs won't type-check in them. The type theorists' counter to this is that rejecting some legitimate programs is preferable to accepting an (apparently) unbounded number of illegitimate ones.

[bold emphasis mine - F]

There is a theorem to the effect that any sound, static type system is necessarily incomplete (proved roughly along the lines of the usual incompleteness results, I suppose). In other words, any type system (which actually works) must reject some programs at compile time which would nevertheless "run OK" (i.e., not "go wrong") at run time.

This seems to means that all static type systems are too conservative in your sense.

Personally, I've always taken this in a positive light, referring to it as the Type System Designers' Job Security Theorem*: it seems to imply that there could always be a better static type system than any one you have now. (Steve Ganz from IU tried to dispute this implication with me, but I think it's true in at least this trivial sense: for any witness program P you exhibit that fails to pass type-checking but still runs OK, I can just add an ad-hoc axiom to my system that says P checks OK. Of course it would be nice to think that I could find a more general principle that covers P to add to the system, but I think that's where the human creativity comes in, and I doubt there's any way to prove a general result.)

[* I suppose the original theorem should have a different name, and the more positively-phrased consequence should actually be the TSDJS Corollary.]

For me, this leads to an image of an infinitely-detailed ragged (fractal?) fringe at the boundary of typability: you can extend it indefinitely, but you can never get the whole enchilada (semantic completeness here being metaphorically Mexican food :) ).

Going further out on a limb (but bear with me, there's a point), it also reminds me of the never-ending dance between the gods Set and Thoth in Roger Zelazny's "Creatures of Light and Darkness". Set and Thoth are, as in the original mythology, mutually-recursively father and son to each other (it makes a cute little classroom demonstration to express this in Prolog and show that, for many purposes, it works just fine). In any case, Zelazny has Thoth sowing monsters before Set (the Destroyer) throughout eternity, as distractions from his otherwise imminent self-destruction. (There are other cute metaphoric possibilities: a bound god known as the Nameless One, presumably a de Bruijn variable :) , a personified black hole/Abyss reminiscent of the semantic bottom, etc.)

Anyway, the promised point here is this eternal back-and-forth dance also rather smacks of game theory, which undoubtedly provides other useful perspectives on the whole typing issue. (Well, OK, I hope also to tempt you to read the book, but only because you might enjoy it. Plus, I have the impression that some people around here could use a couple of hours away from the LtU discussion boards :) .)

Rejecting programs is important too

There is a theorem to the effect that any sound, static type system is necessarily incomplete (proved roughly along the lines of the usual incompleteness results, I suppose). In other words, any type system (which actually works) must reject some programs at compile time which would nevertheless "run OK" (i.e., not "go wrong") at run time.

That's true, but it's a desirable feature. There are programs that won't technically "go wrong" but which I don't want to type-check. That's because the more flexible the type system the more programs can be given types -- even erroneous programs. A type system rejecting a program can be a feature -- rejecting a class of programs whose members are erroneous 95% of the time and only correct 5% of the time will actually speed up development in the typical case.

This isn't idle theory, either. As a concrete example, I don't want a language to accept programs which require arbitrary equi-recursive types to type-check. They almost always indicates an erroneous program which "accidentally" has a valid type assignment. Adding support for this makes the type system more expressive in a theoretical sense, but as a practical matter it makes the language less usable because errors get detected later.

An equi-recursive type is a type like this:

type int_stream = int * int_stream

That is, one in which the type itself is part of the type expression that describes it.

ML and Haskell have "iso-recursive" types, in which a recursive instance of a type can occur only behind a data constructor. So you would have to write

type int_stream = Stream of int * int_stream
to get a recursive stream.

Dynamic type systems

Dynamic type Systems are more powerful than static type systems; because static type systems reduce your ability to write code for ill-defined problems.

The more powerful the feature; the more room for hanging your self; because the more power they give you the more room abuse that power!
Therefor do not be affraid to add features that are dangerous; because they make the languige more powerful. Power = Rope

Power = Rope = Ability to

Power = Rope = Ability to tie yourself up.

Dynamic type systems are so vanilla!

[agreement snipped] But "som

[agreement snipped]

But "some kind of statically verifiable behavior" may not compensate for what the dynamic language user perceives as being lost. When a major change is being contemplated, or even just when a tradeoff is involved, it's easier to pick a winner if the scale tips heavily in one direction or the other. It certainly seems as though statically-checked languages may fit that criteria in future; but I'm saying that right now, that choice isn't as clear to everyone.

Well, I don't know the tradeoff involved in this choice, I clearly don't grok what's so great about DT. I tried very hard to understand it but every time I ask somebody to show me what is impossible to statically type and why would it be useful in a language DT enthusiasts start to ignore me.

For me the tradeoff is pretty simple: either I can have some form of static behavior or not. One of the ways static behavior is essential (IMHO) is when we are writing higher-order functions. When I'm on the bus I like to hack using LispMe on my Palm. Most of the time I get unexpected answers or infinite loops in pieces of code that would fail to type-checked in Haskell, but using LispMe I don't have a clue of why the code fails to me my expectations (other than my expectations are wrong ;-)) where in Haskell the error message (even if it is enigmatic most of the time) points me to a particular piece of code. Sometimes I get frustrated with the errors and go play Solitaire, waiting to get home and see if Hugs give me a clueful type error. The value added by this kind of static checks is much more interesting that the unit-tests that I would have to write to verify the code (I'm a big TDD fan but it annoys me to have to write code to verify something that I know would fail to type-check in Haskell if it was wrong). Static behavior checking with inference is something that reduces my effort to write and understand code and, IME, DT fails to give me an adequate substitute for that.

In addition, I've mentioned my concern about the narrow focus that tends to be introduced with 0

The problem is what kind of expectations people put in ST systems. There's a common fallacy where if we propose to replace A with B, users of A criticize B for having any failure (specially those that A has too), even if the number and magnitude of problems in B is smaller than in A. It's a fact that we have to write less unit-tests in ST languages with type-inference than we would in DT languages, because the type system rules out many classes of errors, but some people believe that if in a ST language they'll still have to write tests than it isn't worth the effort, even if the amount of testing decreases by an order of magnitude.

We won't be able to discuss the real benefits of ST vs. DT if we don't start to communicate in a common language without resorting to statements like "The type system gets in the way", "Write code to satisfy the compiler", "The industry isn't interested in rigor", etc..


[rant]
P.S.: This debate isn't impartial because most of the DT proponents never used a modern ST language and use the shortcomings of Java/C++ as inherent problems of ST. Some of the most vocal proponents of DT today (e.g. Bruce Eckel, David Thomas, Robert C. Martin) don't. OTOH most proponents of ST have some knowledge of DT languages (e.g. Scheme, Smalltalk, Python) and don't fail to acknowledge those languages when they compare the two.
[/rant]

It's all Larry & Guido's fault :-)

Well, I don't know the tradeoff involved in this choice, I clearly don't grok what's so great about DT. I tried very hard to understand it but every time I ask somebody to show me what is impossible to statically type and why would it be useful in a language DT enthusiasts start to ignore me.

I think that's a valid point, but part of the problem is that it's not a question of what's "impossible" to statically type, but what's convenient, and there's obviously an element of subjectivity there. Also, as I mentioned in one of my posts, the type-theoretic statically checked languages tend to impose a kind of type-oriented programming style, which may not always be desirable. This is also subjective, of course, but there's nevertheless a tradeoff and a choice there.

Also, some of the current benefits of dynamic languages are not in the languages themselves, but in what current realistic implementations are capable of in terms of dynamic loading and so on. As a silly example, how come you don't run Haskell on your Palm? :)

I think it'd be a good idea, as I think Dominic Fox suggested in another thread, if a good list of DT benefits were put together, perhaps alongside a list of static alternatives, and an analysis of tradeoffs. I think that would require some careful work, because many of the DT benefits are admittedly "fuzzy" ones - but that doesn't necessarily mean they're not real, it may just mean they're more difficult to formalize. The latent type system discussion is an example of this kind of thing, where halfhearted attempts at formalization are misleading.

One of the ways static behavior is essential (IMHO) is when we are writing higher-order functions.
That's a really good point. Of course, most dynamic languages (with a few exceptions like Scheme) don't make heavy use of HOFs. I don't say that as a defense, but rather as one reason they might "get away" without static typing.
When I'm on the bus I like to hack using LispMe on my Palm.
Heh, me too - in fact it's now the only reason I still keep my Palm around (it's an old Palm III).
Static behavior checking with inference is something that reduces my effort to write and understand code and, IME, DT fails to give me an adequate substitute for that.
I agree! Part of my interest in this discussion comes from a belief (supported by some evidence in the Scheme world) that a middle ground is possible.
The problem is what kind of expectations people put in ST systems.
I agree that's a potential perception problem. However, I bet if someone produced a language that allowed DT programmers to write programs the way they always have, but gave them useful feedback about possible type errors, they wouldn't complain. They might also begin adjusting their programs to take advantage of the type system.
[rant] P.S.: This debate isn't impartial because most of the DT proponents never used a modern ST language and use the shortcomings of Java/C++ as inherent problems of ST. Some of the most vocal proponents of DT today (e.g. Bruce Eckel, David Thomas, Robert C. Martin) don't. OTOH most proponents of ST have some knowledge of DT languages (e.g. Scheme, Smalltalk, Python) and don't fail to acknowledge those languages when they compare the two. [/rant]
I think that's valid too. My original post, which started this thread, was aimed at arguing the benefits of having some knowledge of type theory and type systems, independent of statically vs. dynamically checked. But this relates to the previous point: as long as DT languages ignore their type systems, programmers who only have experience in DT are going to be at a disadvantage. In a sense, that's the "fault" of DT language designers, for paying so little attention to the type system aspect of their languages.

I agree that's a potential

I agree that's a potential perception problem. However, I bet if someone produced a language that allowed DT programmers to write programs the way they always have, but gave them useful feedback about possible type errors, they wouldn't complain. They might also begin adjusting their programs to take advantage of the type system.

I have a self-quotation which supports this, from my gung-ho Python days, when I was beginning to get interested in functional programming:

Type inference (to my mind at least) fits the Python mindset very
well. I think most Python programmers would be glad to have strong typing,
so long as they don't have to press more keys to get it. If you have to
declare all your types up front, it just means more time spent changing type
declarations as the design evolves, but if the compiler can just ensure your
usage is consistent, that's hard to argue with.

I say we shall have no mo' "latent types"...

Those that are "latent" already—all but one—shall abide,
The rest shall keep as they are. To /dev/null, go!
Also, some of the current benefits of dynamic languages are not in the languages themselves, but in what current realistic implementations are capable of in terms of dynamic loading and so on.

You are conflating dynamic loading of untyped code with dynamic loading of typed code. They are not the same thing, and untyped languages only support the first, and typed languages can easily accomodate dynamic loading of Univ code.

I find it very irritating that people continually bias their arguments for untyped languages in this fashion, as if typed languages need to "catch up" or something.

But this relates to the previous point: as long as DT languages ignore their type systems

Untyped languages don't have type systems.

This is something else I find irritating, Anton. I spent many, many hours explaining exactly why and in what sense this is true, and I thought at last that people had started to grasp what a type is, what it is not, and what the relationship between typed and untyped languages is.

Now you are diluting my message with your talk of "latent type systems." As far as I can tell, your notion of "latent type" only amounts to program properties which might be extractable with some blue-sky theorem prover, or more realistically a proof assistant being manually operated by a type theorist. In particular, as a notion of "type system", latent type systems fail the "tractable" and "syntactic" conditions which you yourself quoted from Pierce.

Instead of jumping on the type bandwagon, why not call your "latent types" "program properties"? Or, if you think they are different, please demonstrate how.

Response in part III

Since this thread is getting long, and since this question re latent types is a topic in its own right, I've replied here.

Dynamic type systems

Dynamic type systems allow you to have more fredom but the price of fredom is eternal vidulance; i use haskell i find that type system to be while nice; it is harder to debug programs with type errors because i cant figure out why there is a type error in the first place.

End of nonsense?

OK, here goes. I needed to step back and think things over before I made my final reply.


(recap) A couple of days ago, I made a rather misguided attempt at setting Frank Atanassow straight. That posting didn't go down well with the resident debaters (which in retrospect I fully understand). Some people took me to task on issues like the origins and contents of type theory and the relative merits of dynamic versus static typing, which confused me since I was quite unprepared for having those issues brought up. (I am attempting to abbreviate here, not to misrepresent. The original postings are still available for the reader who wishes to study the complete argument.)


It all began (for me) when I saw mr Atanassow make this statement:

I've said it before and I will say it again, until people stop repeating the party line and start using their brains.


This statement has a context (see the entry "No contraints" at <URL: http://lambda-the-ultimate.org/node/view/100>) but let me for now skip that, because it was the actual wording that dismayed me and made me reply. When someone says something like this, I read it approximately as follows:

No one takes [what I say] seriously, but I'm sure this is just because people just accept disinformation instead of thinking for themselves.


Again, this isn't what he wrote, just what I read. I've come to associate statements like this with creationists, orgonomists, etc, and sometimes I feel compelled to respond, with the intention of making the other person question his/her own line of thinking ever so little. Off-hand, I can't think of any occasion outside face-to-face discussion where this has actually worked, but still I'd say this is a fairly well-intentioned thing to do. It does, however, help very much if one doesn't forget one crucial element: when you set out to enlighten someone, you really need to be focussed enough to actually be at least bright yourself. Put another way, any sufficiently unfocussed line of questioning is indistinguishable from idle bitching (and will be treated like such).


(My fondness for overloading seriously-intentioned arguments with jokes and attempts at profoundness using polysyllables and awkward phrase structure in a non-first language doesn't help much either, as I think this paragraph proves.)


So, I sincerely apologize for the unnecessary fuss I've caused. I meant well, but I did badly. I will continue to read LtU with interest, but I will do my best to avoid stepping into discussions and being a bumbling old fool in this manner in the future.


Brief clarifications and retractions


(I wish to thank Paul Snively and Anton van Straaten for patient and informational replies. The following is not as much an attempt to be argumentative as to avoid leaving some of their most poignant questions to me unanswered.)


(From "Russell supports the modern type theorist's case") The whole point of Russell's work was to be able to reject formulae which were considered meaningless, based on the types of their terms. This is exactly analogous to rejecting erroneous programs based on the types of their terms, which is what syntactic type theory does.


Yes. I meant to say that Russell's work does not depend on the specifics of modern syntactic type theory or the discipline of static type checking of program text (for chronological reasons, if nothing else). As far as I can tell, modern syntactic type theory is indeed fully supported by Russell.


(Ibid.) Now, you might say that this is all very well, but there still should be some way to apply Russell-like type analysis to runtime values.


(Squirms embarrassedly) Yes, I might actually say that, but the only support I can cite for this notion is my own squishy brain.


(Ibid.) Even if Russell's work didn't inherently fail to support your case, it would still fail on the basis that it doesn't deal with programming languages


The fact that it doesn't deal with programming languages was rather my point in that case, I'm afraid.


In the entry "Striving for Clarity", Paul Snively asked which paper by Benjamin Pierce it was that "had me laughing out loud". The paper was "Dynamic Typing in a Statically Typed Language" (<URL http://www.cis.upenn.edu/~bcpierce/papers/dynamic.ps>). I have tried, and failed, to put down an explanation for my laughter that does not become argumentative, and probably only will render me a new helping of well-deserved crow. Just one more careless and unnecessarily inflammatory comment, I fear.


("Striving for Clarity") I have to confess that I find this vexing; it isn't a "branch" of type theory; it's type theory.


I'm sorry, I did not mean to be rude. Throughout this sorry debate, you can see me wildly flailing for the right way to express my meaning. I suppose I was trying to catch the sense of something that is now (a "branch") coming from something that was then (a "root"), and that the branch isn't strictly equal to the root.


(Ibid.) Types are all about knowing at compile time that certain classes of things are not possible at runtime. The assertion that you've been making without any support is that this is of "limited value." "Limited relative to what?" is the question that I should have asked from the outset, because I can keep enumerating values until the cows come home.


Limited mostly in the sense that those classes of things that are made impossible do not include all possible problems in non-trivial programs (I know it's a dodgy term, I just want to make the admission that programs exist where all possible problems are eliminated by static type checking).


(Ibid.) Oh, nice Dowdism, Peter.


Actually no, just a bad case of underquoting combined with a confused attempt to get back to what I saw as the main point. Apologies again.


(Ibid.) your assertion is isomorphic to claiming at least one of the following:

  1. Standard ML, Haskell, or O'Caml's type systems make no more guarantees than Pascal's.
  2. The guarantees that SML, Haskell, or O'Caml's type systems make are unenforceable.
I'm curious as to which one you meant, or both.


Mostly (1). In my understanding, the guarantee that static type checking makes is that it will not allow type errors to pass. There are still "bugs" that aren't type errors, or that are undetectable as type errors by the type checker in question. Modern ST languages do have more sophisticated type checkers that catch errors much more effectively and conveniently than older ones such as Pascal's -- I think it's even fair to say that they are fullfilling the guarantees that Pascal made. Still, as I see it, essentially they make no more guarantees.


(Ibid.) How can it possibly be detrimental to say "I want to prove before runtime that certain bad things can't happen?"


That's not detrimental, that's prudent. Saying "If the compiler accepts it, no bad things can happen at runtime" is detrimental. I know you aren't making this kind of statement, but some people do.


I hope I haven't omitted any points I ought to have addressed here. I'll sit back and stay quiet now.

The LtU ethos again...

I meant well, but I did badly. I will continue to read LtU with interest, but I will do my best to avoid stepping into discussions and being a bumbling old fool in this manner in the future.

Discussions about typing are usually more aggressive than the norm around here. As you might have noticed, I tried to stay out of the discussion myself...

I hope you will continue to post. I think your contribution was interesting and useful.

As a general rule, we strive for meaningful and rigorous discussions. I think that as far as ST vs. DT discussions go, we are still much better than the competition. But since these discussions are never ending (as the thread demonstrated so well) people tend to be more aggressive than perhaps they should.

Whew.

I'm not going to say anything here other than to attempt to humbly apologize to Peter and Anton. I'm afraid that Anton's right—I'd lapsed from finding some very constructive common ground and room for discussion to mere pedantry and perhaps even error. At the very least I had begun to ascribe bad faith to my interlocutors, and in retrospect I find that unconscionable.

So I'm very sorry, very glad to see Peter post again, and beg Peter and Anton's forgiveness and forebearance.

Re: whew

Apology accepted, thanks. Really, we all know how these things can go, and for my part I apologize for anything I said that was inflammatory, and for not being as clear about what I was getting at, or where I was coming from, as I could have been.

Haskell and typing

I've found the discussion in this thread very interesting and its made me a lot more curious about (static) typing in general. But I'm a typing newbie and I couldn't quite figure out what was possible now, and what was going to be available in the future. So here's a question I'm hoping someone could help me with. Is there a way in Haskell to create a type which represents the even integers? In the code below, is there something I can replace the ??? with to cause a compile-time type error (i.e. the 3 isn't even)?

data Even = ???
f :: Integer -> Even
f x = 3

re: Haskell and typing

In Haskell the LHS of a data definition defines a new type, while the RHS declares which are the possible constructors of that type. The constructors aren't subtypes of the type declared. So when you declare a Even data you should provide constructors that create such values. Let's create a constructor that wraps an integer:


data Even = MkEven Integer
f :: Integer -> Even
f x = MkEven 3

That doesn't create a static error as we would liked, because the constructor accepts any kind of Integer. We can't write down an invariant of what possible values should be used in MkEven because in Haskell type system ins't dependently typed (i.e. we can't use expressions with terms (e.g. x > 3) and treat them as types). So the correct way of doing so is hiding the constructor and provide a way to transform an Integer into a Even value and a way to get an Integer out of a Even:


-- MkEven shouldn't be exported out of this module
-- Also we could use newtype instead of data
data Even = MkEven Integer deriving (Show, Eq)

makeEven :: Integer -> Maybe Even
makeEven x | x `rem` 2 == 0 = Just (MkEven x)
           | otherwise      = Nothing
unEven :: Even -> Integer
unEven (MkEven x) = x

-- Useful primitives
times :: Even -> Integer -> Even
times (MkEven x) n = MkEven (x * n)

zero :: Even
zero = MkEven 0

-- in other module

f :: Integer -> Even
f x = case makeEven x of
          Just n -> n `times` x
          Nothing -> zero

The idea is to create a type that encode the invariants, some primitives to use it and them use the primitives. This particular example doesn't use any advanced feature in the Haskell type-system (e.g. polymorphic types) but the general idea is that.

Not that same thing, is it?

The idea is to create a type that encode the invariants, some primitives to use it and them use the primitives. This particular example doesn't use any advanced feature in the Haskell type-system (e.g. polymorphic types) but the general idea is that.

I agree that this is the idiomatic way to write it in Haskell, but let's be clear that it also doesn't provide the static check that he asked for. The short answer AFAIK is that there is no easy way to accomplish this task statically in Haskell. I'm confident that there is a complicated, awkward way which is probably also a lot of fun (Oleg?), but as someone (Paul?) pointed out, it ends up being a lot like template metaprogramming tricks in C++: it's cool that you can do it, but you shouldn't have to... ;)

Haskell typing

After some more searching, I found two good papers discussing the possibilities for advanced (dependent-like?) typing in Haskell.
Faking It and
Fun with Functional Dependencies. Here's the money quote from _Faking It_...

"...we are forced to choose whether data belongs at compile-time in types, or at run-time in terms. The barrier represented by :: has not been broken, nor is it likely to be in the near future."
So it would seem like I couldn't get the desired type error in the code from above.

Double-colon barrier

Greg Buchholz quoting Conor McBride from `Faking It'

"...we are forced to choose whether data belongs at compile-time in
types, or at run-time in terms. The barrier represented by :: has
not been broken, nor is it likely to be in the near future."

One should mention a matching quote from `Implicit Configurations',
joint work with Chung-chieh Shan:

McBride mentions that, with all the tricks, the programmer still
must decide if data belong in compile-time types or
run-time terms. ``The barrier represented by :: has not been
broken, nor is it likely to be in the near future.''
If our |reflect| and especially |reify| functions have not
broken the barrier, they at least dug a tunnel underneath.

Implicit Configurations

And here's the link to Lambda's discussion of that paper.

Even Number Type

After two years, here's something close to the answer I was originally looking for...
{-# OPTIONS -fglasgow-exts #-}
{-# OPTIONS -fallow-undecidable-instances #-}
{- 
    Here's how to create an "even number" type and get GHC to complain
    about mismatches.  The key is that we need to let know the compiler
    know about the values we want to operate on.  This require us to 
    encode a value as a type.
-}

module Even where

type Three = Succ (Succ (Succ Zero))
three = type2val (undefined::Three)

-- f :: Integer -> Even
-- f x = three

{- for the function "f" above, GHCi reports a type error...

    Couldn't match `Odd' against `Even'
      Expected type: Odd
      Inferred type: Even
-}

-- Some further examples...

zero = type2val (undefined::Zero)
one  = type2val (undefined::Succ Zero)
two  = type2val (undefined::Succ (Succ Zero))

main = do putStr $ "1 + 2 = "
          putStrLn.show $ one +. two  
          putStr $ "2 * 3 = "
          putStrLn.show $ two *. three

-- and the implementation...
newtype Even = E Integer deriving (Eq,Show)
newtype Odd  = O Integer deriving (Eq,Show)

infixl 5 +.
infixl 6 *.

class Add a b c | a b -> c where (+.) :: a -> b -> c
                                 (-.) :: a -> b -> c
class Mul a b c | a b -> c where (*.) :: a -> b -> c

instance Add Even Even Even where (E a) +. (E b) = E (a+b)
                                  (E a) -. (E b) = E (a-b)
instance Mul Even Even Even where (E a) *. (E b) = E (a*b)

instance Add Even Odd  Odd  where (E a) +. (O b) = O (a+b)
                                  (E a) -. (O b) = O (a-b)
instance Mul Even Odd  Even where (E a) *. (O b) = E (a*b)

instance Add Odd  Even Odd  where (O a) +. (E b) = O (a+b)
                                  (O a) -. (E b) = O (a-b)
instance Mul Odd  Even Even where (O a) *. (E b) = E (a*b)

instance Add Odd  Odd  Even where (O a) +. (O b) = E (a+b)
                                  (O a) -. (O b) = E (a-b)
instance Mul Odd  Odd  Odd  where (O a) *. (O b) = O (a*b)

-- dynamically convert to an Even/Odd from an integer
class From a where from :: Integer -> a

instance From Even where 
    from x | even x = E x
           | otherwise = error $ (show x) ++ " isn't even"

instance From Odd where 
    from x | odd x = O x
           | otherwise = error $ (show x) ++ " isn't odd"

-- Type numerals...
data Zero
data Succ a

-- statically convert to an Even/Odd from a type
class Type2Val a b | a -> b where
    type2val :: a -> b
instance Type2Val Zero Even where
    type2val _ = E 0
instance (Type2Val a c, Add Odd c b) => Type2Val (Succ a) b where
    type2val _ = (O 1) +. type2val (undefined::a)

Re: Even number type

Greg Buchholz wrote

newtype Even = E Integer deriving (Eq,Show)
newtype Odd  = O Integer deriving (Eq,Show)

Yes, this is the idea of using abstract data types to enforce
invariants. It dates back to Robin Milner and his LCF theorem
prover. It's the idea that begat ML (after all, you need a type system
to statically enforce that, for example, a function accepting Even
won't accept anything else).

Incidentally, you may want to make Even an instance of Num, and so can
take advantage of `fromInteger' conversions (of course, (fromInteger
1)::Even) should throw an exception. Also, for projection (aka dynamic
conversion), you might want to define
toEO:: Integer -> Either Even Odd

Integer \cong Even

There is no need to hide the constructor if you use a different model.

It is obvious how to represent the even integers as integers if you've ever seen two's complement, which represents an integer as a natural; just take Even = Integer, using the (inverse of the) iso which assigns integer n to the even integer 2*n.

... -3 -2 -1  0  1  2  3 ...
... -6 -4 -2  0  2  4  6 ...

Clearly this is a one-to-one mapping. The idea is that, instead of MkEven n denoting n, it denotes n * 2.

So then it is perfectly sensible to export the data constructor MkEven, since it just denotes the function (* 2) :: Int -> Even. By hiding it you are only forcing the user to redefine it if they need it. If you think about this you will see why hiding constructors is generally a poor way to do data abstraction.

Of course, you still need makeEven and unEven, but you could call them half and twice or something.

Any enumerable then?

Ah, so if we want to represent, say, prime numbers, we just use (MkPrime n), which means nth prime number? By the same token, we can represent any enumerable set... Hmm...

Pretty powerful, but wouldn't this affect complexity of algorithms without programmer even noticing? "Casting" natural to prime to natural will have non-constant complexity...

I agree. I thought about doin

I agree. I thought about doing this but I opted to show a solution that could be used in problems other than Even/Odd.

Could static algebraic expressions be part of types?

A fictional programming language that based its typing system on static algebraic expressions would allow the following declaration concerning even numbers:

type even n : integer where n mod 2 == 0

The above would be translated by the compiler as 'any number that, when divided by 2, gives 0 modulo'.

The above could be used for algorithms that expect even numbers. A compiler could easily infer when such a constraint (i.e. even number) is violated and report an error. For example, let's see the following program:

#1 let foo(n : even) = ...
#2 let n = input_integer
#3 foo(n)

In line #1, a function 'foo' is defined which accepts only even numbers. In line #2, the user enters a number. The type of the function 'input_integer' is 'integer'. In line #3, the function foo is called with parameter the input number from the previous step. A compiler could easily infer that line #3 contains an error, because integer numbers can have modulo 0 or 2, when divided by 2. Then it could report to the programmer:

line 3 error: integer n does not honour even constraint 'n mod 2 == 0'
So the programmer can then change line #3 to:
if n mod 2 == 0 then foo(n) else ...

The compiler then checks the condition that leads to invocation of 'foo' and finds out that it matches the constraints imposed by by the 'even' type. In case the if condition is not that trivial, then the expression in the condition should curry the same static algebraic constraint.

I made this neat thing, I'll call it a wheel

Congratulations, you just reinvented dependant types :-) You might want to take a look at Epigram.

The Next Thing You Know...

... someone will decide that it'd be useful sometimes to accept a type based solely upon whether it supports the requested operation or not, apart from whether it has a declared relationship to the entity defining the operation. Let's call that "Duck Typing."

Not a bad idea

But I'd call it a type class.

More Than What is Required

It's actually just plain ol' structural typing, with or without type classes. But it's another feature that's been around since the 1970s that certain folks in the scripting/latent typing community seem to think sprung fully-formed from Guido or Matz's brow. :-)

perhaps you mean structural S

perhaps you mean structural SUBtyping?

Nonsense!

Guido and/or Matz didn't invent structural subtyping.

Alan Kay did; Python and Ruby just inherited it from Smalltalk. :)

And like the poster above points out, you mean structural subtyping.

I Was Speaking With Tongue Firmly in Cheek...

...as I'd hoped to convey with the smiley.

With that said, however, I'm afraid that you and the previous poster are both incorrect: I meant "structural typing," vs. "nominal typing," and in the context of static typing (vs. the latent typing of any member of the Smalltalk family), "structural" vs. "nominal" typing has a well-defined meaning that dates at least to ML's definition in the mid-1970s.

However, even if it were true that "structural typing" derived first from the Smalltalk tradition, it wouldn't affect the point, which is that to come up with new terminology to describe it in the Python or Ruby context exhibits an unfortunate level of historical ignorance. The point of making that observation isn't to assert my superiority or ML's, but rather just to reinforce that language design doesn't happen in a vacuum, and that even if, as a designer, you make many different choices than some historical precedent did, e.g. Python and Ruby being latently-typed and ML statically typed, it's still important to realize when you are, in fact, reproducing an extant language feature, the pros and cons of that feature as it is implemented in a historical context, and how those pros or cons might change given your other design decisions.

Finally, of course, people have to learn your language, and creating new terminology for what is an extant feature can only serve to hinder that process. No one is helped by artificial divergence in language features or documentation.

FWIW

I retracted (or weakened, as you care to see it) my claim "untyped languages are inferior to typed languages in every way." See here.

Modulo anything, untyped languages are not unityped.

Responding to the above-linked post here because of thread size/depth:
The unityped (note the "i") languages are a proper subset of the typed languages, but making an untyped language into a unityped language involves several global choices: a choice for the type which plays the role of Univ, and a choice for how to translate values into type Univ; modulo those choices, an untyped language is unityped.

I disagree that "modulo those choices, an untyped language is unityped". At the end of the same post, is the statement "an untyped language is not unityped". That's much better. ;)

What is done when an untyped language is transformed into a unityped language is trivial, and it has nothing to do with the actual latent types present in programs in the language.

Untyped languages are latently typed. As I've been pointing out, a program in an untyped language does in fact contain implicit, or latent, syntactic type information -- since one can't reliably perform operations on values without knowing something about the types of the values being operated on.

Since it's usually difficult to statically identify these latent types via mechanical means, one can make a program safe for a static type-processing stage by wrapping all values in a Univ type. Other than its utility in the internals of an implementation, this doesn't achieve very much. In particular, it has no effect on the latent types that remain present in the program, other than obscuring them for the human reader of the program. What has been achieved is that a unityped language is used as a transport to carry a latently-typed program through a static stage to a dynamic stage. To extrapolate claims about this intermediate unityped language, and apply them to the entire language, is misleading at best.

The truth of the matter is that untyped programs are in fact richly typed, in the syntactic sense. Other adjectives than "rich" might include complex, sophisticated, but also often messy, or chaotic. A consequence of these rich but potentially messy latent type schemes is that they're difficult to deal with statically.

Indeed, when humans work with dynamically-checked languages, they do more than just mental static type inference in order to get the types right - they also perform dynamic simulations, whether mentally, or by running tests on a computer. This allows humans to develop complex but mostly correct latent type schemes, that do not lend themselves to the relatively simplistic static analyses that a compiler can perform.

Many translations

I disagree that "modulo those choices, an untyped language is unityped".

Strange that you disagree with this.

What is done when an untyped language is transformed into a unityped language is trivial, and it has nothing to do with the actual latent types present in programs in the language.

The point of my post was that there are many ways to do the embedding. I only showed one, and I only showed it by example. You appear to assume that it is the only way to do the embedding.

Instead of translating to Univ, for example, you could translate to an type identical to it modulo renaming, Univ'. Or you could translate to Nat (or Integer) by a Gödel encoding. Or to String; this last is particularly easy to see if you code up pretty-printing and parsing functions:

pretty :: Univ -> String
parse :: String -> Univ

Furthermore, even if you fix the target type, there are many ways to do the translation. For example, if the language does not require "type disjointness", instead of encoding an untyped integer using the Int tag, you could encode it as a Church numeral using the Fun tag. Or, you could interpose a CPS translation. Or you could do various (semantics-preserving, of course!) optimizations on the untyped program before spitting out the corresponding Univ value.

Basically the translation can be viewed as a compilation method, where you are compiling an untyped language to a typed target language. So you can do arbitrarily complex things, if you like.

Untyped languages are latently typed. As I've been pointing out, a program in an untyped language does in fact contain implicit, or latent, syntactic type information -- since one can't reliably perform operations on values without knowing something about the types of the values being operated on.

Yes, I noticed your posts had started adopting this line of reasoning, and let it pass without comment. But I don't believe it, for the reasons I just gave above: there are many ways to assign types to untyped terms, according to many distinct type systems and translations. I don't think you can say an untyped term has a type, latent or otherwise, unless you fix this "free-floating" type system, which in particular demands fixing the target type and the translation itself.

The truth of the matter is that untyped programs are in fact richly typed, in the syntactic sense.

Given my points above, I think you would have to be very generous (perhaps "biased"? :) to adopt this view. It is thoroughly non-constructive (in the technical sense), and moreover in multiple ways.

For example, from the proof-theoretic point of view, it amounts to insisting that every well-formed term (in normal form, i.e., cut-free) is a proof of some theorem, but not only do we not know the theorem, we don't even know what logic it belongs to! Maybe that's not nonsensical, but to me it has the ring of a new postulate for mathematical foundations, and I am not much interested in such drastic measures.

The unbearable usefulness of multiple unknown logics

I disagree that "modulo those choices, an untyped language is unityped".
Strange that you disagree with this.
I think my understanding of the term "modulo" may be different from yours. But no matter.
The point of my post was that there are many ways to do the embedding. I only showed one, and I only showed it by example. You appear to assume that it is the only way to do the embedding.
The possibility of multiple non-Univ embeddings was the point of my post, since you didn't seem to be saying anything about those possibilities.
Untyped languages are latently typed. As I've been pointing out, a program in an untyped language does in fact contain implicit, or latent, syntactic type information -- since one can't reliably perform operations on values without knowing something about the types of the values being operated on.
Yes, I noticed your posts had started adopting this line of reasoning, and let it pass without comment. But I don't believe it, for the reasons I just gave above: there are many ways to assign types to untyped terms, according to many distinct type systems and translations. I don't think you can say an untyped term has a type, latent or otherwise, unless you fix this "free-floating" type system, which in particular demands fixing the target type and the translation itself.
That's a valid concern. I'm really trying to characterize the relationship of untyped languages to their types, certainly not formalize it. How would you characterize the statically-available type-like knowledge that a programmer has about values being operated on? This knowledge can range from knowing that a value has a single type, like int or string, or a single class or interface; or a set of types/classes/interfaces; or in dynamic OO-style, you may know that the object being dealt with supports, say, a particular message or set of messages (an ad-hoc interface); plus, presumably, a number of other possibilities. I chose to refer to this as being knowledge about latent types. Certainly, the Univ type doesn't capture such knowledge at all.
The truth of the matter is that untyped programs are in fact richly typed, in the syntactic sense.

Given my points above, I think you would have to be very generous (perhaps "biased"? :) to adopt this view.

It is not my intent to be either generous or biased, but rather to try to examine some aspects of dynamic languages which, it seems to me, are usually overlooked when describing those languages from a type theory perspective.
It is thoroughly non-constructive (in the technical sense), and moreover in multiple ways.

For example, from the proof-theoretic point of view, it amounts to insisting that every well-formed term (in normal form, i.e., cut-free) is a proof of some theorem, but not only do we not know the theorem, we don't even know what logic it belongs to!

Yet amazingly, people are able to exploit these multiple unknown logics to produce useful, working systems. Such is the power of dynamic languages! ;oP
Maybe that's not nonsensical, but to me it has the ring of a new postulate for mathematical foundations, and I am not much interested in such drastic measures.

I appreciate your generosity in acknowledging the possibility of non-nonsensicality. :) Your position is quite understandable, since an adequate formalization of the issues in question has all the indications of being a quagmire with questionable returns. It is far simpler to restrict oneself to languages in which there's a single, well-defined type system, and useful results seem far more certain.

Nevertheless, I see something of interest in this static knowledge that millions of ordinary programmers use virtually unconsciously, and I think it could stand some analysis.

How to know what you know

How would you characterize the statically-available type-like knowledge that a programmer has about values being operated on?

You mean, how would I formalize it? I would formalize it as properties expressed in a first-order logic over the domain of programs.

Now you will say, ''But that's exactly a type system!" But it's not, not according to Curry-Howard at least—a type system is a proof theory, not merely a logic, and the programs are the proofs, not the semantic domain.

Furthermore, as I've said before, I don't subscribe to the view that types are merely program properties, or even provable program properties, since it assumes you can talk about typed programs without talking about types.

Yet amazingly, people are able to exploit these multiple unknown logics to produce useful, working systems. Such is the power of dynamic languages! ;oP

When people "produce useful, working systems" with untyped languages, they are not exploiting "multiple unknown logics" any more than I am exploiting Pauli's Exclusion Principle when I rest my chin on my knuckles. Otherwise, I would never have had to take that quantum physics class in college.

There is a big difference between something being true, knowing that it is true, and knowing that you know it is true. We might say something is true if it is a theorem; we know that it's true if we know it is a theorem and there exists a proof; and we know that we know it's true if we can exhibit a proof that we understand.

If the first were the same as the last, then there would be no need for science or mathematics or objectively verifiable methods of argument. Furthermore, we would not be having this discussion in the first place, since truth would be self-evident, and so we could not possibly disagree.

Nevertheless, I see something of interest in this static knowledge that millions of ordinary programmers use virtually unconsciously, and I think it could stand some analysis.

Odd how they all know it "unconsciously" yet have such difficulty grasping it when told it explicitly. :)

Unconsious type systems

Odd how they all know it "unconsciously" yet have such difficulty grasping it when told it explicitly.

This made me think what type system I use unconsciously (in javascript). There's certainly a lot of flow analisys going on. When calling a method on an object, I'm thinking "Do all the possible objects that this variable might get, support this method?"

I'm certainly not thinking "what type does this variable have".

I certainly also do a lot of dynamic checking. Not because I'm not certain if everything is correct, but because it's web programming, and a lot of data properties are optional.

The various type systems I've seen on LtU don't seem to match this programming style even remotely.

Interesting

sjoerd visscher: This made me think what type system I use unconsciously (in javascript). There's certainly a lot of flow analisys going on. When calling a method on an object, I'm thinking "Do all the possible objects that this variable might get, support this method?"

I'm certainly not thinking "what type does this variable have".

In JavaScript, it doesn't make sense to talk about "the" type of a value anyway, although it makes sense to talk about the type of a variable, so one might wonder why you don't think "what type does this variable have?" because it seems as though the variable type could answer the question "does this object support this method?" for you. :-)

I haven't done enough JavaScript programming to be sure, but this makes it sound like subtyping in JavaScript is structural rather than nominal. Is that true? In any case, if so, it's in good company: O'Caml is also structurally, rather than nominally, subtyped: calling a method on an object in O'Caml typechecks iff the object implements the method signature, regardless of who extends what.

sjoerd: I certainly also do a lot of dynamic checking. Not because I'm not certain if everything is correct, but because it's web programming, and a lot of data properties are optional.

The various type systems I've seen on LtU don't seem to match this programming style even remotely.

That's interesting. Haskell has the "Maybe" monad; O'Caml has the "option" type; Nice has ? to signify optional values; Scala has "Some" and "None" for option types. If there's one thing that modern typed languages seem to do better than currently-popular typed languages and untyped languages, it's express the notion of having/not-having a value!

Re: Interesting

calling a method on an object in O'Caml typechecks iff the object implements the method signature, regardless of who extends what

Yes, Javascript works in that way (except then that there's no static type checking). It's what Python calls duck typing. Btw, what does O'Caml infer when the argument is also an object which has to support a method and so on (possibly ad infinitum)? (F.e. a tail method)

If there's one thing that modern typed languages seem to do better than currently-popular typed languages and untyped languages, it's express the notion of having/not-having a value!

Absolutely. My language Loell does this too, by default, everywhere. What's interesting is that all predicate functions of type a -> Boolean become filters of type a -> a?. I'm now working on Moiell which lets variables also have multiple values, like the list monad in Haskell, or like XSLT.

Known unknowns

How would you characterize the statically-available type-like knowledge that a programmer has about values being operated on?
You mean, how would I formalize it?
Even before formalizing it, it would be useful to be able to discuss it, which requires some sort of informal agreement on the nature and scope of the knowledge in question, and some appropriate terminology.

I would formalize [statically-available knowledge about DT values] as properties expressed in a first-order logic over the domain of programs.

Now you will say, ''But that's exactly a type system!" But it's not, not according to Curry-Howard at least—a type system is a proof theory, not merely a logic, and the programs are the proofs, not the semantic domain.

Furthermore, as I've said before, I don't subscribe to the view that types are merely program properties, or even provable program properties, since it assumes you can talk about typed programs without talking about types.

First, even if you don't call it a type system, this all still seems to suggest that there's some formalization of such logic(s) which could be useful, but which hasn't actually been done (or has it?)

Second, I don't really want to talk about typed programs without talking about types: I want to see better attempts to identify the types (or the types of types) that lurk in DT programs. Put another way, is "Univ" the best that can be done? Of course, it isn't.

Note that in some cases, the program properties we're talking about are demonstrably types. Extreme examples include the cases where types can be mechanically inferred, at which point DT programs are presumably indistinguishable from corresponding ST programs. But there are also many cases in which the programmer could clearly identify unambiguous and obvious types in a DT program, even if they can't be mechanically inferred.

Presumably there's an opposite extreme on this spectrum of "type-like knowledge", where the knowledge in question is so chaotic as to not be usefully organizable into something resembling types. However, I doubt that most programs fall into this category.

When people "produce useful, working systems" with untyped languages, they are not exploiting "multiple unknown logics" any more than I am exploiting Pauli's Exclusion Principle when I rest my chin on my knuckles. Otherwise, I would never have had to take that quantum physics class in college.

Darn, there goes my excuse for not taking that quantum physics class in college!

But actually, the example doesn't apply here. Programmers produce these systems by performing various kinds of modeling and analyses, in their heads if nowhere else. The "unconscious" aspect I referred to is that they aren't usually working within a formalized framework, but they're nevertheless going through a thought process in order to determine whether particular operations can validly be performed on the values in play at any given point in a program. Precisely because they don't have the constraint of having to formalizing this knowledge, they may in the process create "latent type-like schemes" which might be a challenge to formalize coherently. But given that the end result is a working program, I don't think the static framework within which those programs operate should be dismissed quite as cavalierly as many type theorists seem to want to do.

There is a big difference between something being true, knowing that it is true, and knowing that you know it is true.
Hmm, perhaps the U.S. Secretary of Defense should consider working this into his famous poem, "The Unknown". But Rumsfeld also waxed poetically about known unknowns, and I think that's really what we're talking about here: things that we know we don't know (enough about).
We might say something is true if it is a theorem; we know that it's true if we know it is a theorem and there exists a proof; and we know that we know it's true if we can exhibit a proof that we understand.

Right: we're talking about useful, working things, and their relationship to our formal theories about those things.

Imagine if an exercise physiologist were to analyze runners and running based on models constructed of inflexible rods, joined by metal hinges. That physiologist might discover and be able to prove all sorts of interesting things about the nature of running (if running is defined as what the models do), but someone planning to compete in the Olympics would be well-advised to take the physiologist's advice with a grain of salt.

Type theory is not in this position, because in fact it has "runners" (languages) that embody its models perfectly. However, when type theory is casually applied to DT languages - which don't embody its models as neatly - then it starts seeming a lot more like my naive physiologist.

Nevertheless, I see something of interest in this static knowledge that millions of ordinary programmers use virtually unconsciously, and I think it could stand some analysis.
Odd how they all know it "unconsciously" yet have such difficulty grasping it when told it explicitly. :)
Ah, but as I have just pointed out, they have not yet been told "it" explicitly. What they have been told instead is that if they're willing to restrict themselves in certain ways and apply more discipline of a certain kind when designing programs, that there are benefits to be had. It's not so much a question of not grasping it, it's more a question of not buying into the tradeoffs, and having good enough instincts to disbelieve any exaggerated claims of subsumption.

Disknowns

Frank Atanassow: We might say something is true if it is a theorem; we know that it's true if we know it is a theorem and there exists a proof; and we know that we know it's true if we can exhibit a proof that we understand.

Anton van Straaten: Right: we're talking about useful, working things, and their relationship to our formal theories about those things.

I think this is where I get stuck (or "go wrong" :-) ): how do we know that the true thing is a theorem without there existing a proof? Anton talks about "useful, working things," but the evidence of 20 years in the software business tells me that "working" is far too generous a term to describe what the overwhelming majority of software does (cf. my conversation with sean, though, about whether that says more about software development or about economic models of it).

Anton: Ah, but as I have just pointed out, they have not yet been told "it" explicitly. What they have been told instead is that if they're willing to restrict themselves in certain ways and apply more discipline of a certain kind when designing programs, that there are benefits to be had. It's not so much a question of not grasping it, it's more a question of not buying into the tradeoffs, and having good enough instincts to disbelieve any exaggerated claims of subsumption.

Eh, this also contradicts my experience, both in the sense of banging my own head against type theory and in the sense of trying to get my fellow Lispers to even try something like O'Caml, which very much strikes me as "Lisp with Algol-inspired syntax." And while we've concluded that untyped != unityped, the subsumption argument still stands, if for no other reason than that we know that both typed and untyped languages boil down to the Lambda Calculus (or the SK Combinator Calculus, or the Universal Turing Machine; pick your poison). It is precisely this disavowal of subsumption that I characterize as mysticism. Perhaps you think I mean that in a perjorative sense. Let me assure you that I do not; as a practicing Christian I participate in what some would charitably call "mysticism" and a vastly greater number would call "crackpot nonsense" daily. What I don't engage in is what would in theological terms be called animism: the notion that, because God is a "higher power," He actively infuses literally everything with some kind of life force. Similarly, while I can agree that type theory doesn't yet tell us everything we'd like to be able to (formally) say about programming and programming languages, I don't take that to mean that untyped languages and the programs written in them therefore are a "higher power" than typed ones; that's contradicted by the evidence in the form of the aforementioned Lambda/Combinatoric/Turing results.

What I'm driving at is that typed languages are good for saying that certain things are always true, and untyped languages are good for saying that a larger number of things are true often enough to be useful. I think our industry could benefit from striving to move the dial a bit closer to "always" than it currently is, but at heart that's an economic claim (cf. sean again), and in the end the market gets to decide whether I'm right or wrong.

Type systems and God: Man's search for certainty

Paul Snively wrote:
I think this is where I get stuck (or "go wrong" :-) ): how do we know that the true thing is a theorem without there existing a proof?
I'm not claiming any theorem, I'm talking about an area that looks as though it needs theorems, but doesn't seem to have any, other than essentially trivial descriptions such as Univ.
Anton talks about "useful, working things," but the evidence of 20 years in the software business tells me that "working" is far too generous a term to describe what the overwhelming majority of software does

Again, that's some other battle. You and I, and many other people, are capable of distinguishing between software that works, and software that doesn't. I'm talking about software that works, and there's plenty of it around. I've already addressed the point about how you know whether software works, and the answer is essentially the same for both DT and ST languages: you test it, and you use it, and you fix it when it doesn't work.

Certainly, the set of working software suffers from selection bias by definition, but I'm not making any claims about which software methodology has the greatest productivity, etc.: I'm simply saying that there are DT programs which work, and perform useful tasks, which indicates that DT languages have some utility, and are therefore interesting on some level; given that, I'm saying that type theory, despite its close connection to these issues, hasn't provided good theories of what DT languages do with their latent, static type-like information, which I'll call "latent types" (despite Frank's perfectly valid formal objection to the use of the term "type" outside the context of a defined type system).

One type-theoretic description of DT languages that has been provided is the Univ one, but it doesn't capture the relationship of DT programs to their latent types. The fact that this description has been so easily accepted by many people (it's actually taught at some universities, as is the ST-subsumes-DT idea) indicates to me that there's a widespread lack of recognition of the nature of latent types (or alternatively, a determined desire not to deal with them).

One big reason for that lack of recognition is precisely a lack of theories, the "known unknown" I referred to (although to some people, it's apparently an "unknown unknown"). Even if the lack of treatment has perfectly reasonable justifications - e.g. it's uninteresting from the context of pure type theory, or it's being dealt with bit by bit in other ways (dependent types, first class labels, whatever), it's still unsatisfactory not even to be able to discuss the subject with commonly accepted terminology.

the subsumption argument still stands, if for no other reason than that we know that both typed and untyped languages boil down to the Lambda Calculus (or the SK Combinator Calculus, or the Universal Turing Machine; pick your poison). It is precisely this disavowal of subsumption that I characterize as mysticism.

The Turing equivalence argument works in both directions, so there's no meaningful sense in which the "argument still stands". Few arguments emerge from the Turing tarpit. I'm not making any mystical claims, in any sense of the word.

the notion that, because God is a "higher power," He actively infuses literally everything with some kind of life force

Now I know you're reading in things that I'm most definitely not saying. :)

I don't think it's at all controversial, or even remotely mystical, to say that type theory hasn't provided any good theories for DT languages.

I'm not disputing that any DT program can be translated into an ST form, whether via a full translation or a more superficial embedding - and of course, the reverse is true too. (Note, btw, that an embedding of a DT program via Univ leaves its latent type information intact, and un-captured by the host language's static type system.)

I'm not even trying to make value judgements about which approach is "better". However, in the excitement of the perfect match between type theories and the real languages which implement those theories, it sometimes seems to be forgotten that the theories as currently formulated do not in fact capture everything that's going on in DT languages -- and as we've seen, it's even been explicitly, but incorrectly, claimed otherwise.

I'm interested in the technical issues of type systems in DT languages, and as I see it this has nothing to do with how many broken DT programs there have been throughout history, or how many Lisp programmers refuse to try OCaml, etc. Of course, I invited the latter remark by responding to Frank's jibe. However, there's a very real sense in which Frank's comment was unfair, which is that the story from the type theory side, as it applies to DT languages, is incomplete, but isn't always acknowledged as such; that was the point I was making in that case. Certainly, with misleading arguments about things like subsumption floating around, little progress is going to be made on finding common ground.

A big reason I'm interested in type systems in DT languages is that I'd like to see more of an ability to migrate DT programs to ST, in the same language (or in a series of closely related languages). Given a capability like that (and no, I'm not talking about what some CL implementations do), much of this argument would evaporate. But barring an unexpected breakthrough, we're not going to achieve that while DT is dismissed as uninteresting, or trivially embeddable, or easily convertible to some ST alternative.


[postscript:]
What I'm driving at is that typed languages are good for saying that certain things are always true, and untyped languages are good for saying that a larger number of things are true often enough to be useful. I think our industry could benefit from striving to move the dial a bit closer to "always" than it currently is, but at heart that's an economic claim (cf. sean again), and in the end the market gets to decide whether I'm right or wrong.

I agree with all of that, particularly the point that one of your concerns is an essentially economic claim, which isn't what I'm really interested in since I take sean's perspective as a given. (I had previously alluded to some similar economic arguments, but sean put it much better than I did.)

But as I mentioned above, I'm interested in having the dial you mention be more of a dial, rather than a discontinuity.