"We Really Don't Know How to Compute!" by Gerald Sussman

In this talk from Strange Loop 2011, Gerald Sussman talks about how far along we are in the project of "computing" (hint: not very far) and what kinds of systems we should be building.

"We Really Don't Know How to Compute!"

I would love to hear the LtU take on his talk...

apology for abstraction

It is an interesting apology for abstraction and flexibility in a language: a good "Lisp is cool because it allows you to handle problems of type X easily" kind of talk.

His broader theme is that software systems are too fragile, they are designed to handle very small ranges of problem types and our systems need to be much more generic. Here I think he is just being impatient. People are writing more generically today than they were a generation ago. It takes time to notice a design pattern and then more time to figure out what a good generic solution for it is. It is not so much a failure of computer languages as it is a failure of collective human knowledge. We just haven't thought deeply enough about these problems yet.

I'm only pushing this idea,

(at 40:30) I'm only pushing this idea, not because I think it's the right answer. I'm trying to twist us, so we say, "This is a different way to think." We have to think fifty-two different ways to fix this problem. I don't know how to make a machine that builds a person out of a cell. But I think the problem is that we've been stuck for too long diddling with our details. We've been sitting here worrying about our type system, when we should be worrying about how to get flexible machines and flexible programming.

That's a worrying argument...

That's a worrying argument...

Where...

...worrying == incoherent.

Not at all. He's just saying

Not at all. He's just saying that we should look for more expressive languages, instead of looking to improve correctness checking for existing languages or explore small variations of existing languages. I'm sad, but not surprised, that this is controversial.

Here is a statement that is undoubtedly even more controversial: type systems are not just a time wasting diversion, they are actively harmful. Type systems fundamentally put a box around us to limit the kinds of programs we can express to a class that is known to be "good" in some sense. That means that when we find a new idea or idiom, we might not be able to express it. Even worse, the box creeps into our minds and limits even our thoughts about programs to stay inside the box, so we won't even think of the idea. In this way type systems may (or may not) lead to a short term increase in productivity, but long term they harm expressiveness and thus further increases in productivity. Occasionally a type system researcher finds an idea, an idea that is almost always taken for granted in dynamically typed languages. He then either writes a paper about how to convince an existing type system to accommodate the idea, or writes a PhD about a new type system whose box fits around the idea (or around part of it -- and the type system will be incompatible with other efforts to expand the box in different directions).

</tongue-in-cheek>

It's controversial...

...because it presumes that increased expressiveness comes from something other than mathematical logic. In taking biology as inspiration, it's also regressive: modern biology is increasingly drawing from... mathematical logic.

Some will argue that things that are "statistical" or "probabilistic," as computational biology certainly is, are divorced from mathematical logic. It's precisely this perspective that is in error, given a proper, modern understanding of probability. No one is saying that stochastic processes have nothing to tell us about how to make systems flexible and robust in the face of change. On the contrary; the point is precisely that our modern understanding of stochastic processes is well-grounded in mathematical logic. This is the foundation of modern machine learning, for example.

Sorry, I added the second

Sorry, I added the second paragraph to my post while you were writing yours. Perhaps you want to edit your reply according to this. I don't want to make your reply appear in a different light than it was intended. In transactional terms: write skew happened.

Hmm...

Has the foundation of modern machine learning benefited (at all) from type systems? For example, for a reinforcement learning problem, is it useful to encode the problem in a type system that disallows describing state via continuous parameters, to guarantee that computation won't diverge and is deterministic? I don't know enough about machine learning to know how encoding ML problems in a type system would help, or what those examples would be, so my example question might be lame. But how would you use a type system to (a) check that there is a unique global minimum and (b) prove that it can automatically generate efficient computation to solve for that global minimum? And what do you do for non-convex problems, like guiding an airplane along an optimal route to avoid radar detection?

Related:
Has computational biology benefited (at all) from type systems?

I'm not interested in bashing Sussman's argument as incoherent. That's ridiculous and rude, and let's not go there any further. Edit: Please see LtU Rules of Discussion in the Policies page.

More later...

...but, as a matter of principle and personal policy: just because someone has a doctorate, is a professor at MIT, and co-discovered the Scheme programming language doesn't give them license to be polemical and disallow polemical responses. So:

Z-Bo: "II'm not interested in bashing Sussman's argument as incoherent."

Tough.

Z-Bo: "That's ridiculous and rude, and let's not go there any further."

I'll go where I damn well please, thank you very much.

Please do go there, but add

Please do go there, but add some reasoning and address the actual point. Posts that consist entirely of things like:

Where...worrying == incoherent.

don't add anything to the discussion.

I watched the first 15 mins

I watched the first 15 mins or so of his talk. I have to admit that I too found it rambling and incoherent, and since he didn't really seem to make a point in that time, I shut it off.

It's hard to argue against such confused thinking, in my opinion.

The sides of the box are

The sides of the box are sturdier than I thought ;)

I interpreted his talk as follows:

(1) We are not there yet, not even close. We know this because there are two biological systems that we haven't the foggiest idea how to build: the brain and the genome. With a computational depth on the order of 10 steps, the brain does amazing things, like recognizing an object and taking the appropriate action. With about 1GB of information, the genome encodes a human. About 1GB of information encodes Microsoft Windows. Furthermore, with small mutations to the genome you get vastly different but functional organisms, like a dog instead of a human. Currently we are exploring the programming language (and hardware) landscape by taking small steps around a local maximum, and we are trying to devise type systems for the wrong local maximum. Go do something radically different.

(2) Propagators are an interesting different way of doing computation.

What did you find incoherent about his talk? Again, just saying that it is doesn't add anything to the discussion.

Yes

I did promise "more later," and I intend to hold to that. I'm just back from business travel at the moment, so today is probably not the day. But I won't forget.

Not by biology but its subject matter

Clearly, math is an increasingly greater aid to biology, the study of life. However, aside from Sussman's use of that term, I think it is quite clear that he (like Alan Kay) is inspired by the subject matter of biology -- the solutions to large scale problems that have evolved in nature -- and not biology itself. A biologist might need math to understand these adaptations, but they evolved without a designer reading from a math text.

[Edit] Of course, Leslie Lamport had something to say about that.

Static type systems enforce semantics

You're right that, most often, it takes more time to get a new language feature into a statically typed language than a dynamically typed one. The reason is quite simple: a type system being a static analysis that, in some sense, proves the program correct, to get a satisfying type system around a feature you need to understand why it is correct and how to prove it. In fact, it's even harder: you don't only need to understand, for a given use case, how to prove it correct; you need to formulate a proof of correctness that will compose with all uses of the feature and of the rest of the language.

I don't think, however, that the process of forcing oneself to think about why a feature is correct, and how to uniformly prove it, has no value. It is perfectly fine if it comes as a second step, but your insinuation that doing a PhD on how to statically prove the absence of errors for new kinds of programs that are useful but can fail in subtle ways is a waste of time is debatable.

I also don't think the "creeps into our minds and limits our thoughts" part is true. It's true that a lot of researchers from the static typing community will not be comfortable thinking about a feature they don't know how to type, because they'll feel they don't understand it well enough to encourage its use; but I don't think this is a "thought-limiting" aspect, rather a kind of self-imposed quality control. If you have specific examples in mind, please share them. I would consider delimited continuations to be a counter-example: while notoriously difficult to type precisely, they have in large part been developed by people from the static typing community.

Overall, well-behaved static type systems often play a positive role in language design, by adding a pressure to make the semantics well-defined and robust. I think of it as a co-design between static and dynamic aspects of the language.

I don't think, however, that

I don't think, however, that the process of forcing oneself to think about why a feature is correct, and how to uniformly prove it, has no value. It is perfectly fine if it comes as a second step, but your insinuation that doing a PhD on how to statically prove the absence of errors for new kinds of programs that are useful but can fail in subtle ways is a waste of time is debatable.

Type systems don't prove a new feature correct, they prove a limited class of uses of a feature correct in a rather weak sense of correct. For example simply typed lambda calculus lets you write a limited class of programs. If you add polymorphism you expand your box, but no new feature was added. The same goes for subtyping and other extensions. I don't really believe that this is a waste of time, but I do believe type systems are neither sufficient nor required to understand language features, and that we should re-evaluate the usefulness of both "here's how to convince existing type system X to do Y" and "here's a type system that lets you do Y" where Y is taken for granted in dynamically typed languages. Perhaps we should do a bit more of expressing new things instead and a bit less of expressing old things in either old or new boxes.

I also don't think the "creeps into our minds and limits our thoughts" part is true. It's true that a lot of researchers from the static typing community will not be comfortable thinking about a feature they don't know how to type, because they'll feel they don't understand it well enough to encourage its use; but I don't think this is a "thought-limiting" aspect, rather a kind of self-imposed quality control. If you have specific examples in mind, please share them. I would consider delimited continuations to be a counter-example: while notoriously difficult to type precisely, they have in large part been developed by people from the static typing community.

Delimited continuations are just one language feature. Almost all other language features come from dynamically typed languages, and even delimited continuations originated in dynamically typed languages. Type classes are a notable counter-example. [Do type classes feel a bit wrong to other people? On the one hand they let you express certain things nicely by inferring function dictionaries from types, but the concept of run-time behavior being affected by types (i.e. by what the particular compiler is able to deduce statically about your code) feels off.]

PLT researchers' thoughts are probably not easily limited. Here's a bunch of examples that still hold today: multiple dispatch, first class module linking, open recursion across modules, run-time code loading and macros & meta programming in general. Although there are type systems that let you do limited forms of each, to my knowledge there are no type systems that let you do all of them together (i.e. choose your poison). For example function and data type definitions in Haskell cannot span multiple modules, even though that is exactly what you want in many cases. Now I'm sure that type system researchers have thought of this, but as far as I know there hasn't been a satisfactory solution, and neither has it been recognized as a major issue. People stay in the box and invent workarounds. Instead of waiting for the type systems to catch up, new and exciting things could be done with these features.

Suppose an alternative world where the Lisp family of languages didn't exist. We'd all be doing C. What would type system researchers be doing in that world? They would be trying to give types to various ways of using pointers and memory management. Very advanced type systems would even allow you to type return stack manipulation. Once the boxes are set up firmly, closures would probably never be invented, since the existing type systems wouldn't be able to type their encoding in C. Wouldn't it be better to invest into increasing the expressiveness beyond C, rather than trying to type check classes of C programs? This is the world we're in, except instead of C we have Haskell.

That's why work on optional and gradual types is exciting. Instead of forcing you to stay in a box, you can control the expressiveness/guarantees slider. When you program in a class with limited expressiveness you get more guarantees, but you can always trade some guarantees for extra expressiveness. Instead of brick walls, we have lines on the ground. Ironically this allows more advanced type systems to be practical. If your box is not made of brick walls then you don't have to make your box large and convenient enough so that useful programs can be expressed in it in full. You can take advantage of the expressiveness/guarantees slider. Strong guarantees can be useful in parts of a program, even though working in a subset with limited expressiveness and/or a high type annotation burden on the programmer wouldn't be feasible for the whole program. Static+dynamic contract checking is especially interesting. You can write programs with contracts as strong or as weak as you want. The compiler tries to check as much as it can statically and defers the rest until run time. The compiler tells the programmer which contracts were proven to succeed, which were proven to fail, and which were deferred until run time.
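
To make the slider concrete, here is a minimal sketch in Python with mypy-style optional annotations (my own toy example, not something from the talk): the annotated function can be checked statically, the unannotated input stays dynamic, and the contract the checker cannot prove is deferred to a run-time check.

    from typing import Any

    def parse_age(raw: Any) -> int:
        # Dynamic side of the slider: 'raw' may be anything, no static guarantees.
        value = int(raw)
        # Deferred contract: a checker can't prove this statically,
        # so it is checked at run time instead of the program being rejected.
        assert value >= 0, "age must be non-negative"
        return value

    def birth_year(age: int, current_year: int = 2011) -> int:
        # Static side: a tool like mypy can verify these annotations
        # without running the program.
        return current_year - age

    print(birth_year(parse_age("25")))  # prints 1986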

Fabric

You see boxes where I see fabric. I don't know why you are reiterating the bogus old implication that types merely take away expressiveness. More importantly, they add expressiveness elsewhere. It's a trade-off. (And the bigger your team and your project, the larger the yield will be on the typed side of the deal.)

Also, I don't know what C has got to do with anything. Types were invented by the same logicians and in the same context as lambda calculus. The same source where Lisp got its inspiration from.

Way to ignore what was said

Whether type systems improve productivity is a whole other discussion.

Do you disagree that type systems take away important expressiveness? I didn't say that they merely take away expressiveness, but take it away they certainly do.

Do you disagree that the vast majority of language features were invented in dynamically typed languages? Higher-order functions, pattern matching, conditionals, objects, multiple dispatch, generic arithmetic, memoization, marshaling, meta programming. Monads were invented in the context of static typing, but most if not all of their uses were invented before -- in dynamically typed languages (nondeterminism, probabilistic programming, exceptions, continuations, etc.). Even recently there were papers on how to do X within the constraints of Haskell, like memoization, open recursion, marshaling and run-time code loading.

Do you disagree that many of those are difficult to do in many statically typed languages because of the type system? If you can publish a paper on it, then it's probably difficult.

Do you disagree that if something is made difficult by the language you're working in then you probably won't think of it?

If you believe, as Sussman argues, that we have only started to explore the language design landscape, isn't it better to work with languages that don't impede this exploration? I think he's asking: shouldn't we set the bar higher than variations of "how to do X in statically typed language Y" (where Y=Haskell most often)? Instead of taking the functional/imperative basis as a given and designing ever more type systems that let you express yet another subset of its programs? That is what I was trying to get at with the comparison to C. I think we can agree that that basis is certainly obsolete, and that if we were in a world where we only had C we should focus on improving the basis first and design type systems for that later.

Note the similarity between what Sussman is saying and what Alan Kay is saying: expressiveness of the methods and languages today is far too low. Something with the same usefulness as Windows shouldn't take multiple millions of lines of code, but 20k lines.

(And the bigger your team and your project, the larger the yield will be on the typed side of the deal.)

Do you have any evidence for that?

Type systems will win

Do you disagree that type systems take away important expressiveness?

I disagree with that on a technicality. Sound type systems that lack an unsafe coercion take away important expressiveness. As soon as you add the ability to do unproven type refinement, there's really no downside to static typing. Static typing is pretty much exactly what's needed to keep intellisense-type features well behaved. There are other features you need to support meta-programming, but I'll go out on a limb and claim that languages of the future feel more like static languages than dynamic ones.
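
As a small illustration of what an unproven refinement can look like in a gradually typed setting, here is a Python/mypy sketch (my example, not anything from the post) using typing.cast, which refines a type for the checker without any proof or run-time check:

    from typing import Any, cast

    def load_config() -> Any:
        # Comes from the dynamic world: the checker knows nothing about the shape.
        return {"port": 8080}

    config = load_config()
    # Unproven refinement: we tell the checker this is an int, without proof.
    # cast() does nothing at run time; if we're wrong, the error shows up later.
    port = cast(int, config["port"])
    print(port + 1)  # 8081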

I do agree with your general position on the relative importance of finding the right abstractions over proving the correctness of those abstractions. But I also think that searching for abstractions that are amenable to correctness proof is a good way of finding the simplest and best version of a given abstraction. Sure, there are lots of worthless papers diddling with worthless new type systems, but I think that's just an instance of Sturgeon's Law.

I'll add that Sussman's argument that human genetics is proof that we need radically different programming languages is seriously flawed, though I'm having trouble articulating why. In that regard, it reminds me of the old "riddle":

Three guests check into a hotel room. The clerk says the bill is $30, so each guest pays $10. Later the clerk realizes the bill should only be $25. To rectify this, he gives the bellhop $5 to return to the guests. On the way to the room, the bellhop realizes that he cannot divide the money equally. As the guests didn't know the total of the revised bill, the bellhop decides to just give each guest $1 and keep $2 for himself.

Now that each of the guests has been given $1 back, each has paid $9, bringing the total paid to $27. The bellhop has $2. If the guests originally handed over $30, what happened to the remaining $1?

I suppose the easiest way to state the problem with Sussman's argument is that it relies on the premise that there even exists some language which admits a comprehensible encoding of the blueprint for a human being. How do we know that the shortest path to creating such a system isn't through hundreds of thousands of specially designed systems developed in languages similar to those we have today? The only evidence he gives for the existence of better languages is specific examples of things that clearly seem like they could be integrated into languages (and many/most of his examples already have been).

As soon as you add the

As soon as you add the ability to do unproven type refinement, there's really no downside to static typing.

Technically, I agree. Practically, I don't fully agree for the following reasons:

Statically typed languages sometimes lack the required dynamic semantics simply because their type system disallows that part. For example you aren't going to be able to do if p then 2 else "foo" in some languages, simply because the compiler or interpreter lacks the required support for this. Humans being humans, technically being able to do something doesn't mean that it is practical or easy or obvious enough that they will think of it.

Gradual typing promises to solve these problems. You start with a dynamic language, and then you add optional type annotations where you want more guarantees. The expressiveness remains the same as the dynamic language, and with the same convenience (you don't need to insert explicit casts). Another nice aspect of this is that there is no reason why the annotations have to be limited to things we currently know how to prove statically. You need a contract system anyway for safe interoperation of dynamically and statically typed code (for example if you pass a dynamically typed function to a statically typed function that expects an int -> int).
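
A rough sketch of what such a boundary contract could do (the names here are mine, not from any particular system): when a dynamically typed function crosses into typed code that expects int -> int, it is wrapped so that the argument and result are checked at each call, since membership in the function type can't be decided up front.

    from typing import Any, Callable

    def as_int_to_int(f: Callable[..., Any], label: str) -> Callable[[int], int]:
        # Hypothetical boundary contract for the type int -> int.
        def wrapped(x: int) -> int:
            if not isinstance(x, int):
                raise TypeError(f"blame the typed caller of {label}: got {x!r}")
            result = f(x)
            if not isinstance(result, int):
                raise TypeError(f"blame the dynamic function {label}: returned {result!r}")
            return result
        return wrapped

    def untyped_double(x):            # dynamically typed code
        return x * 2

    typed_double = as_int_to_int(untyped_double, "untyped_double")
    print(typed_double(21))           # 42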

Do you think that gradual type systems are the future, or type systems with explicit type refinement, or full static type systems? Why?

It remains to be seen if static typing is required for very usable intellisense. Intellisense for dynamically typed languages is already pretty good. For it to be useful it doesn't need to work 100% of the time. So for example in a language with refinement types (predicate types), it would be fine to just pay attention to the simple types when doing intellisense. Intellisense is currently also answering the wrong question, in my opinion. The question should not be "which operations could the programmer possibly write here" but rather "what operations will the programmer probably write here, ranked by probability". The former leads to enormous lists of code completion in e.g. Visual Studio. And sometimes the list doesn't contain the completion I want because when typing the method name I realize that I gave the variable the wrong type. Now it is partly a statistical problem.

That leads to another subject that is not addressed enough in my opinion: programming languages are user interfaces for programmers. What can we do to improve them and the tools to use them in that light?

There are other features you need to support meta-programming, but I'll go out on a limb and claim that languages of the future feel more like static languages than dynamic ones.

I agree with this in the sense that languages like Ruby and Javascript are too dynamic. It makes code hard to understand for no real gain. On the other hand I don't think languages will follow the model of current statically typed languages, where the compiler will flat out refuse to run your program just because it contains a type error, or worse, because it wasn't convinced that your program doesn't contain a type error. Feedback in the IDE is valuable, but refusing to run your program is not. Even disregarding the case where the compiler isn't convinced that your program is type safe, even if a program does provably contain a type error it is sometimes useful to run it to get an actual example of why it does. This becomes more important as type systems get more advanced and the proof burden becomes higher.

What do you think about compile time (static) meta-programming?

I do agree with your general position on the relative importance of finding the right abstractions over proving the correctness of those abstractions. But I also think that searching for abstractions that are amenable to correctness proof is a good way of finding the simplest and best version of a given abstraction.

This is sometimes the case, but sometimes the reverse is the case. For example take the different ways of building data structures in Haskell. You have sum/product types, GADTs, records. For other ways of representing data (like objects and data types that span multiple modules) Haskell doesn't have explicit language support, but there are papers describing ways to get around this by encoding them in a certain way. All in all the situation is pretty complicated compared to what it would have been in a dynamically typed version of Haskell. We have a similar situation in languages like Java, where there is a distinction between objects and classes. But it is certainly true that types sometimes guide you to the right path. In general explicitly stating your pre and post conditions (i.e. assumptions and guarantees) helps design, even if they are not checked statically by the compiler (or at all).

How do we know that the shortest path to creating such a system isn't through hundreds of thousands of specially designed systems developed in languages similar to those we have today?

Indeed we don't. His other example is the brain. There is evidence that the brain is homogeneous: if you disconnect an animal's eyes from its visual cortex and attach the optic nerves to its auditory cortex, it will learn to see again. That indicates that to build brain level intelligence we shouldn't create better languages to describe each function separately, but rather we should find a universal statistical machine learning algorithm that the brain seems to be using.

Even though his argument is flawed, I think there is some truth in his conclusion. Alan Kay has said more or less the same thing: millions of lines of code in MS Windows is not right. And he and his research group actually provide existence proofs by creating languages that allow you to describe subsystems of modern computing with orders of magnitude less code. The gain comes at least as much from simplifying the requirements as from more expressive languages though. You're not going to recreate the exact behavior of Windows with significantly less code.

BTW, that riddle is nice. First I was confused, and on a second reading I was confused about why I was confused.

Mostly agreement

For example you aren't going to be able to do if p then 2 else "foo" in some languages

I don't know that it's a big deal that you can't do that in most static languages, but I agree that you should be able to do that. My thinking is that there should be a customizable type system with user defined semantics atop a logic that allows you to reason about constructions like the above.

One problem I have with gradual/optional types is that compiler/IDE won't always know what's going on. If the extent of interactivity that we want with our code is limited to name completion, then I might agree that this isn't a big problem, but I think the trend should be towards additional interactivity and as we move in that direction it will be increasingly important for the static system to know what's going on.

Another problem with the compile-time/run-time type interface hand-offs is that with a decent type system, you can't check a value for membership in a general type at run-time. Any type that quantifies over an infinite set of possibilities, for example, cannot be checked for membership in the naive way. So you either have a weak type system or you need a more general explanation of what happens when you move from untyped to typed code.

On the other hand I don't think languages will follow the model of current statically typed languages, where the compiler will flat out refuse to run your program just because it contains a type error, or worse, because it wasn't convinced that your program doesn't contain a type error.

Agreed. For example, the IDE should, when possible, strive to show you the result of your program and sub-programs at edit time, regardless of type or syntax errors elsewhere.

What do you think about compile time (static) meta-programming?

I think it should be much easier than it is in C++ and Haskell :). But I think staged programming is the goal and meta-programming in the form of code manipulating code is mostly a bad idea (except where the code is the primary subject of consideration, such as with symbolic manipulation / refactoring algorithms).

Alan Kay has said more or less the same thing: millions of lines of code in MS Windows is not right.

Absolutely agree. I find Sussman's human argument less than compelling, but do also believe that we need better languages and simpler abstractions. Linux isn't significantly better than Windows in this regard.

One problem I have with

One problem I have with gradual/optional types is that compiler/IDE won't always know what's going on. If the extent of interactivity that we want with our code is limited to name completion, then I might agree that this isn't a big problem, but I think the trend should be towards additional interactivity and as we move in that direction it will be increasingly important for the static system to know what's going on.

What kind of IDE features do you have in mind that depend on type information?

Another problem with the compile-time/run-time type interface hand-offs is that with a decent type system, you can't check a value for membership in a general type at run-time. Any type that quantifies over an infinite set of possibilities, for example, cannot be checked for membership in the naive way. So you either have a weak type system or you need a more general explanation of what happens when you move from untyped to typed code.

That's what contract checking helps with. You need contracts to check function types as I explained above (because it is undecidable if you try to do it the naive way). They can also check universally and existentially quantified types [1] and dependent types [2] and, of course, general refinement types. Is that enough for the type system you have in mind?

You need the same anyway if you include unproven type refinement. For example when you cast a function from Any -> Any to int -> int you are going to need contracts to check whether that was valid (otherwise weird things can happen inside typed code when a string can dynamically flow into a variable of type int -- and then the compiler can't assume that typed code is type safe so it wouldn't be able to do any optimization based on types). The only difference with gradual/optional typing as far as I know is in whether you need to write the casts explicitly, or program in a "pay as you go" model where you add types as you need them and the IDE shows where unproven casts happen.

I think it should be much easier than it is in C++ and Haskell :). But I think staged programming is the goal and meta-programming in the form of code manipulating code is mostly a bad idea (except where the code is the primary subject of consideration, such as with symbolic manipulation / refactoring algorithms).

With staged programming you can make code run faster but you can't add a new construct to your language, right? Or are you thinking about a different kind of staged programming than I am? For example if your language didn't have monadic do-notation you wouldn't be able to add it. I think that being able to add new constructs is valuable, even though it should be heavily discouraged for most cases.

[1] http://homepages.inf.ed.ac.uk/wadler/papers/blame-for-all/blame-for-all.pdf

[2] http://users.soe.ucsc.edu/~cormac/papers/popl11.pdf

What kind of IDE features do

What kind of IDE features do you have in mind that depend on type information?

For example, ambiguity resolution, case enumeration, presenting relevant documentation, refactoring. Pretty much anything that requires static understanding.

You need the same [contract checking] anyway if you include unproven type refinement.

You need something along those lines, but really you just need the ability to keep the implementation memory safe and performant.

And anyway, my main point was the first one -- in an environment with gradual typing, you really want to have static types almost everywhere anyway. This approach that's frequently touted of starting with purely dynamic code and then slowly adding types is not the right one. Simple static types give too much benefit for almost free. Instead, we want to start with simple types that can be almost completely inferred and then slowly add more precise types as needed to establish extra guarantees. The cost of a spattering of casts that we might need is far outweighed by the benefit of having the compiler/IDE know what's going on.

With staged programming you can make code run faster but you can't add a new construct to your language, right? Or are you thinking about a different kind of staged programming than I am?

You're right, it's not just staged programming. We need a mechanism for syntax extension. I was just making the point that the better way to e.g. create a function binding for every value in a database column is to have an interaction with running code at compile-time, and not to generate code for the functions.

Instead, we want to start

Instead, we want to start with simple types that can be almost completely inferred and then slowly add more precise types as needed to establish extra guarantees. [...] The cost of a spattering of casts that we might need is far outweighed by the benefit of having the compiler/IDE know what's going on.

I agree completely: almost all code should at least have simple static types. But once you go to more advanced types it is helpful to have a smooth path from less precisely typed code to more precisely typed code. That's where gradual typing can help. For example currently division by zero is a run time error, because it's just too mundane to have a special type of numbers that can't be zero and to have to prove that the divisor can't be zero or insert explicit casts. With gradual typing the path to stricter types is smoother: the compiler automatically inserts the run-time check for you if it couldn't prove the value to be nonzero statically, and the IDE shows you an indication that a precondition wasn't statically verified. The same goes for many other things, like zipping two lists of potentially different sizes, or multiplying two matrices with potentially mismatching sizes, or maintaining an invariant in a data structure.

Putting those strong preconditions as metadata on the code, instead of as "throw DivisionByZeroException" inside the code, has many advantages, even if they can't be statically checked (yet). Contracts allow more precise blame assignment that tells you where something went wrong, instead of a call stack dump in a potentially unrelated part of your application that blew up due to a type error that happened much earlier (you see this a lot when programming in Java and C# with null pointer exceptions). Randomized testing tools can test the pre- and post-conditions statistically, and white box fault finding tools that use theorem provers to find counterexamples can also make use of them. But for it to be practical to use more precise types, the compiler shouldn't require the programmer to insert casts every time it couldn't prove the validity.
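
As a toy illustration (the decorator and attribute names are invented for this sketch), a precondition can be attached as metadata that both the run-time check and external tools can read, with the failure reported at the call boundary:

    import functools

    def requires(description, predicate):
        # Attach a precondition as metadata and check it at the call boundary.
        def decorate(func):
            @functools.wraps(func)
            def checked(*args, **kwargs):
                if not predicate(*args, **kwargs):
                    # Blame the call site, not some unrelated code further down.
                    raise ValueError(
                        f"precondition violated calling {func.__name__}: {description}")
                return func(*args, **kwargs)
            checked.preconditions = [(description, predicate)]
            return checked
        return decorate

    @requires("both lists must have the same length",
              lambda xs, ys: len(xs) == len(ys))
    def zip_exact(xs, ys):
        return list(zip(xs, ys))

    print(zip_exact([1, 2], ["a", "b"]))      # [(1, 'a'), (2, 'b')]
    print(zip_exact.preconditions[0][0])      # tools can read the metadata
    # zip_exact([1], []) raises immediately, blaming this call site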

To make progress towards better correctness checking we need to get rid of the false dichotomy between "proven correct" and "incorrect". And we need to investigate the interaction between humans and IDEs/compilers beyond "I rejected your program because it may contain a type error". For example, hovering over a function call in your editor and getting "here's a list of preconditions that will provably fail when you run this call, here's a list of preconditions for which I have examples that make them fail, and here's a list of preconditions that I couldn't prove never fail but have no failing example for". For example:

foo(int a, int b, int c) = a / (b + c)

Now hovering over the / operator gives "May divide by zero: b=0, c=0". It is quite feasible to get this kind of information today with randomized testing and white box testing (e.g. Pex). Change it to this and you don't get a warning:

foo(int a, positiveInt b, positiveInt c) = a / (b + c)
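
A toy version of the randomized-testing idea (nothing like Pex, just a sketch of the principle): throw random integers at the original, unannotated foo and report inputs that trip the precondition, which is roughly the information the hover text would surface.

    import random

    def foo(a, b, c):
        return a / (b + c)

    def find_failing_inputs(func, trials=10000, seed=0):
        # Random search for inputs that raise, as a hover tool might run it.
        rng = random.Random(seed)
        failures = []
        for _ in range(trials):
            args = tuple(rng.randint(-5, 5) for _ in range(3))
            try:
                func(*args)
            except ZeroDivisionError:
                failures.append(args)
        return failures

    examples = find_failing_inputs(foo)
    if examples:
        a, b, c = examples[0]
        print(f"May divide by zero: b={b}, c={c}")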

I don't quite see how this should interact with type inference though. For example if you had this:

foo(a, b, c) = a / (b + c)

Should it now be assumed that a,b,c can be anything? So you'd even get messages saying that + could raise an error because b could be a string and c a number? On the other hand you obviously can't infer and check arbitrary preconditions statically. This part of the interaction with the programmer will need to be figured out. Do you have ideas?

You're right, it's not just staged programming. We need a mechanism for syntax extension. I was just making the point that the better way to e.g. create a function binding for every value in a database column is to have an interaction with running code at compile-time, and not to generate code for the functions.

Agreed. Type providers in F# are an interesting solution for that example, though they don't allow for general syntactic extensions.

In general, with the type

In general, with the type system I have in mind, contracts will not be able to automatically assign blame. For example, consider the following:

1. I construct a value x of type DirectedGraph
2. I cast the value to type DAG (asserting without proof that there are no cycles in it)
3. I write a function f that maps DirectedGraphs to pairs of nodes
4. I cast the function to a type encoding that acyclic graphs map to a pair of nodes minimizing some distance function.
5. I set v1, v2 = f x
6. I observe that some other v3, v4 have a lower distance than v1, v2, and conclude that something has gone wrong.

How is the system going to automatically tell me which of the two casts (step 2 or 4) is to blame? It seems clear to me that it's not possible in the general case.

There's probably some recognizable fragment for which blame can be automatically assigned with contracts, but that's just a debugging tool anyway, once you've ensured that examples like the above are always implemented in a memory safe way.

I don't quite see how this should interact with type inference though. [...] Should it now be assumed that a,b,c can be anything?

I think the answer to your example depends on the details of your numeric facilities. Assuming that + and / are monomorphic functions dealing with types Float, I think you would want at least an inferred type of a:Float, b:Float, c:Float -> Float. You could also infer the requirement b+c!=0. Apologies if you were really trying to ask about overloading.

Another case is the following:

bar x = [1, x]

What type does this infer for bar? Any -> [Any]? I don't think that's particularly useful. The nice thing about inference is that we're inferring an assumption, not a conclusion. Thus, there's really no essential soundness condition that we need to respect. We can infer the type Int -> [Int] for the above function and only risk mildly annoying a programmer who intended the function to be more general and now must add an annotation.

At point 2 the contract DAG

At point 2 the contract DAG would check that the graph contains no cycles. At point 4, the MinimizesTheCostFunction contract would check that the pair returned minimizes the distance (by brute force -- there is a paper on asynchronous contract checking for these cases where your contracts are expensive).

Sure, there are cases where even run-time checking is not possible. For example if you have a Program and you cast it to a HaltingProgram. In those cases you'd just have the run-time checking of the cast do nothing, or just some approximation that catches easy cases. This kind of thing is rare though, and type systems don't check these kinds of properties either, so I don't consider it a major problem with contracts. In practice I've found that run-time assertions cover a much larger space of the preconditions you want to specify than the types found in most languages.
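
To make the earlier graph example concrete, here is a brute-force sketch (entirely my own, not from the paper mentioned) of the two run-time checks: a cycle check for the cast to DAG and an exhaustive minimality check for the cast on the result.

    from itertools import combinations

    def is_dag(graph):
        # graph: dict node -> list of successor nodes. Detect cycles by DFS.
        state = {}  # node -> "visiting" | "done"
        def visit(node):
            if state.get(node) == "visiting":
                return False            # back edge: cycle found
            if state.get(node) == "done":
                return True
            state[node] = "visiting"
            ok = all(visit(succ) for succ in graph.get(node, []))
            state[node] = "done"
            return ok
        return all(visit(node) for node in graph)

    def minimizes_distance(graph, pair, distance):
        # Brute force: no other node pair has a strictly smaller distance.
        best = min(distance(u, v) for u, v in combinations(graph, 2))
        return distance(*pair) == best

    # Hypothetical usage at the two cast points:
    g = {"a": ["b"], "b": ["c"], "c": []}
    assert is_dag(g)                                  # step 2: DAG cast
    dist = lambda u, v: abs(ord(u) - ord(v))
    assert minimizes_distance(g, ("a", "b"), dist)    # step 4: result cast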

What type does this infer for bar?

I'd say you'd infer forall T. T -> [T union int], or even forall T. T -> [T union 1], where 1 is the singleton type containing only 1.

Thus, there's really no essential soundness condition that we need to respect. We can infer the type Int -> [Int] for the above function and only risk mildly annoying a programmer who intended the function to be more general and now must add an annotation.

Yes, that is what I was trying to ask about. Should we infer a weakest precondition that's necessary not to fail, or a precondition that's sufficient not to fail, or something even stronger? To answer these questions I think you'd need to look at the user interface. How can you communicate useful information to the programmer, both at the definition where types are inferred and when calling the type-inferred function? I don't think it's a good idea to just reject bar x = [1,x]; bar 1.3, and, per the don't-reject policy, not even bar x = [1,x]; bar "foo", though perhaps you'd want to show a warning in the latter case. A working system needs a general policy for this since you can't have a human make the decision on a case by case basis.

Inference bounds

At point 2 the contract DAG would check that the graph contains no cycles. At point 4, the MinimizesTheCostFunction contract would check that the pair returned minimizes the distance (by brute force -- there is a paper on asynchronous contract checking for these cases where your contracts are expensive).

Sure, we can do this, but I don't want this mechanism associated with my type system. I think it belongs in a separate facility. For example, the type level specification of "does not contain a cycle" might result in an exponential algorithm if extracted naively.

I'd say you'd infer forall T. T -> [T union int], or even forall T. T -> [T union 1], where 1 is the singleton type containing only 1.

I don't think that either of those are going to be good answers in practice. They type more programs, but at the expense of the compiler/IDE quickly having no clue what's going on. And if your language has type directed overload resolution, then you probably don't even type more programs (at least interesting ones).

My idea is to associate to each type an inference bound for that type that's used in this situation. So even though 1 could have type (Singleton 1), the bound on that type will be something much bigger, like Int or Nat or Num (an abstract numeric type). To infer a type T, in a situation where we know type S such that S <: T, we infer that T <: bound(S).

A type of Num -> [Num] is probably the most reasonable thing to infer for the above program, IMO. Well, you could also infer "where the first element is 1".
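
A toy rendering of that rule (type names as plain strings, everything here hypothetical): each concrete type maps to the bound used during inference, so the literal 1 contributes Num rather than its singleton type.

    # bound(S): how far a known subtype S is widened when inferring an unknown
    # supertype T, i.e. from S <: T we conclude T <: bound(S).
    BOUND = {
        "Singleton 1": "Num",   # the literal 1 widens all the way to Num
        "Int": "Num",
        "Float": "Num",
        "String": "String",
    }

    def infer_upper_bound(known_subtype):
        return BOUND.get(known_subtype, known_subtype)

    # bar x = [1, x]: the element type comes from the literal's bound,
    # giving bar the friendlier type Num -> [Num] rather than T -> [T union 1].
    print(infer_upper_bound("Singleton 1"))  # Num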

I don't quite see how this should interact with type inference

I don't quite see how this should interact with type inference though. For example if you had this:

foo(a, b, c) = a / (b + c)

Should it now be assumed that a,b,c can be anything? So you'd even get messages saying that + could raise an error because b could be a string and c a number? On the other hand you obviously can't infer and check arbitrary preconditions statically. This part of the interaction with the programmer will need to be figured out. Do you have ideas?

My idea would be to have a unique typing t * t -> t for the + operation, with an "associative function" type annotation.

Similarly the / operation would have a type like t * numeric type -> t | t, where the second option is used for exceptional conditions such as a divide by zero, to return a NaN or an infinite value if these can be represented with t. I would use syntactic sugar to conflate the two options into one if the programmer does not special-case them.

Finally, I would type foo using the type information at hand to define preconditions applied to a generic type, then let either the static compiler or a hypothetical JIT compiler specialize foo for a specific set of types at the point of invocation (using memoization to reuse specializations).

Now that I wrote this, my "generic type" smells like dynamic typing, doesn't it? I thought of it more along the lines of Java's Object, but they may be one and the same.
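
Here is a rough sketch of the memoized-specialization idea (all names invented): a generic foo is specialized per tuple of concrete argument types at the call site, and the specialization is cached for reuse.

    _specializations = {}  # (name, type, ...) -> specialized callable

    def specialize(generic, types):
        # Build (or reuse) a version of `generic` fixed to concrete types.
        key = (generic.__name__,) + types
        if key not in _specializations:
            # A real compiler would generate type-specific code here;
            # this sketch just closes over the chosen types.
            def specialized(*args):
                assert all(isinstance(a, t) for a, t in zip(args, types))
                return generic(*args)
            _specializations[key] = specialized
        return _specializations[key]

    def foo(a, b, c):          # generic definition
        return a / (b + c)

    foo_int = specialize(foo, (int, int, int))
    foo_float = specialize(foo, (float, float, float))
    print(foo_int(10, 2, 3))                             # 2.0
    print(foo_float(1.0, 0.5, 0.5))                      # 1.0
    print(specialize(foo, (int, int, int)) is foo_int)   # True: memoized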

Multiple dispatch

Now that I wrote this, my "generic type" smells like dynamic typing, doesn't it? I thought of it more along the lines of Java's Object, but they may be one and the same.

In fact this is not dynamic typing, since multiple dispatch would be used in the generic type implementation to find (or not) an appropriately typed implementation.

That indicates that to build

That indicates that to build brain level intelligence we shouldn't create better languages to describe each function separately, but rather we should find a universal statistical machine learning algorithm that the brain seems to be using.

We already have it. It's called Solomonoff Induction.

Thank You

I was going to refer to Solomonoff Induction if no one else did.

I would also have a look at A Family of Gödel Machine Implementations for more on universal artificial intelligence, and The Experimental Effectiveness of Mathematical Proof for more on the relationship between mathematics and the natural sciences. Interestingly, both of these have delimited continuations as centers of their reasoning. Finally, see Foundations of Inference, which unifies probability and information theory in a really pleasing and, I expect, powerful way.

Another requirement is that

Another requirement is that it's practical ;)

Solomonoff Induction...

...is only "impractical" in the sense that it can't be completely computed in finite time, but so what? There's no evidence that human beings can do induction in the general case in finite time either!

On a slightly more serious note, SAT solving used to be "impractical" too, for being NP-complete, but that hasn't stopped us from making it a practical tool in a wide variety of domains, either.

The question should not be

The question should not be "which operations could the programmer possibly write here" but rather "what operations will the programmer probably write here, ranked by probability". The former leads to enormous lists of code completion in e.g. Visual Studio. And sometimes the list doesn't contain the completion I want because when typing the method name I realize that I gave the variable the wrong type. Now it is partly a statistical problem.

Coincidentally, I'm writing a paper on this subject right now: how to include probability in the code selection process using language design. Do you know of any sources with similar ideas?

I don't know of any sources

I don't know of any sources about sorting intellisense that way, but I suppose that your best bet is to look into machine learning. You want to know, for a particular location l in the program and for every possible completion x, the probability P(x in l) that the user will insert that x in location l. This is a machine learning question. The standard approach would be to set up a graphical model, gather a corpus of training data, and use that to predict P(x in l). Features to put in the graphical model would be x (obviously), string_typed_so_far, expected_type, x_is_in_a_library, x_is_in_local_scope, x_is_in_stdlib, last_deleted_expression, surrounding_expression, i.e. anything that probably influences x. Once you know the predicted probability distribution over x'es, you probably don't want to just order by the probability. You will probably want to incorporate a loss function. For example to make sure that if string_typed_so_far is an exact match with a particular x, then that x appears at the top regardless of the predicted probabilities.
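
For instance, before any corpus exists you could hand-build the model; the feature weights below are made up, but the shape (score candidates by features, then apply an exact-match override as a crude loss function) is the idea:

    def score(candidate, typed_so_far, expected_type, local_names):
        # Hand-tuned feature weights standing in for a learned model.
        s = 0.0
        if candidate["name"].startswith(typed_so_far):
            s += 2.0
        if candidate["type"] == expected_type:
            s += 1.5
        if candidate["name"] in local_names:
            s += 1.0
        if candidate.get("in_stdlib"):
            s += 0.2
        # Loss-function style override: an exact textual match always wins.
        if candidate["name"] == typed_so_far:
            s += 100.0
        return s

    candidates = [
        {"name": "length", "type": "int", "in_stdlib": True},
        {"name": "lemma",  "type": "str", "in_stdlib": False},
        {"name": "lender", "type": "int", "in_stdlib": False},
    ]
    ranked = sorted(candidates,
                    key=lambda c: score(c, "le", "int", {"lender"}),
                    reverse=True)
    print([c["name"] for c in ranked])  # ['lender', 'length', 'lemma']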

I wasn't thinking about

I wasn't thinking about using ML yet, but rather about doing some language design that makes it easy to construct probability models by hand for a new language without a corpus. I think it is key for static typing to be less rigid, but this doesn't necessarily mean resorting to dynamic typing; just make static typing friendlier and more forgiving?

I've been talking to a colleague, a speech recognition researcher, about the harder problem of training on real source code, and his advice is similar to yours (the key is to define the right kind of features to train on). There is also some work on API selection, but the heuristics are very basic and I don't think the results are very good yet; see, for example, "MAPO: Mining and Recommending API Usage Patterns" or the various work done by Marcel Bruch.

I know a few

This is off-topic, so e-mail me and I'll dig them up Thursday while I am baking the turkey. Alternatively, I think I saved a couple in my tumblr log.

Invention

Do you disagree that the vast majority of language features were invented in dynamically typed languages? Higher-order functions, pattern matching, conditionals, objects, multiple dispatch, generic arithmetic, memoization, marshaling, meta programming.

Higher-order functions are an integral part of lambda-calculus. I'm a bit fuzzy on the original history of lambda-calculus, but I believe that the first versions were actually typed (but inconsistent), and that untyped lambda-calculus came only later. You were probably thinking of Lisp, which is probably the first implementation of higher-order functions, but lambda-calculi were already programming languages.

I always considered pattern matching as we know it to descend from Hope, which was statically typed if I remember correctly. Of course it's hard to exactly point out a discontinuous event like "the invention of pattern matching"; previous typecase-like constructions (present in ALGOL for example) were close in spirit.

Similarly, I always believed that object orientation was considered to be born in Simula, which was a statically typed language.

(You also claim that delimited continuations "originated in dynamically typed languages"; I would have believed that as well, given the strong Lisp/Scheme school of non-delimited continuations, but Wikipedia refers to an end-of-eighties paper describing "partial continuations" in a typed setting that I think predates implementations.)

I'm not trying to launch a war/contest regarding programming language feature innovation (but I still think it has value to try to be precise on those historical points). In my mind, there is a relatively elastic but still well-defined separation between "static" and "dynamic" aspects of a programming language. Type systems are mostly in the static part (as well as, say, syntactic aspects), and operational control/flow features (conditionals, backtracking...) in the dynamic part, but overall languages have both dynamic and static features. In your list you're mostly considering dynamic features. You didn't consider parametric polymorphism, GADTs, type-erasure semantics, abstract types, type inference, dependent types, because it's very hard to think about them in a language with no established static system (and I think that's thought-limiting as well).

I find it unsurprising that dynamic features have usually been considered first in a dynamic-only language setting. Some of those features can be trivially ported to a static setting (if-then-else conditionals for example), some others require more work (general references containing values of any type, delimited continuations, datatype-generic programming...).

You're entirely right that getting a satisfying static system for some language features (generally at the edge of what people think about for language design; your linking, run-time code loading etc. are good examples) is still an open research topic. On the other hand, not trying to statically reason about those features is generally easy, even in statically typed languages -- Caml and SML don't try to statically reason about exceptions for example. Yes, static reasoning is hard, but is also valuable.

I have absolutely no problem with the idea of using dynamic languages as a vehicle for language design and implementation -- I like to say that, while I'm a strong supporter of type systems, the single language that has impressed me most lately is Oz. In fact, even researchers that are well-known for type systems work do that (see the SAFE project for example). I'm not convinced by your idea that, for that to happen easily, we must use languages that have no static system at all -- in my experience, typed languages let you think about dynamic aspects just as well, only they also make you think about other things.

Suppose an alternative world where the Lisp family of languages didn't exist. We'd all be doing C. What would type system researchers be doing in that world?

You seem to ignore the entire brand of languages that predated C (ISWIM : 1966), rooted in mathematical research from the first half of the century, and that co-evolved powerful programming languages (eg., Caml, SML, Haskell, Agda...) and proof assistants (eg., LCF, HOL, Coq...). Where would those go in your alternate history where nobody is thinking about language design but Lisp and C people?

C is to Haskell as Haskell is to what?

I didn't consider lambda calculus to be a programming language, but indeed you could depending on your definition of programming language. [Is lambda calculus even the first use of lambdas in mathematics? Surely some mathematician must have written down an anonymous function before lambda calculus.] I thought that pattern matching was invented in Lisps in the late 60's, but I could easily be wrong. You're right about Simula. Interesting point about delimited continuations. I can't find the paper from June '87 you're referring to; what makes you think that it is in a typed setting? Wikipedia also mentions and links to a paper from February '87 "Beyond continuations" that describes delimited continuations in an untyped setting.

It's a good point that I am considering mostly dynamic features. I certainly agree that the static features you mention are good work. On the other hand, dynamic languages do fine without a static type system, but statically typed languages don't do fine without a dynamic semantics. So while it is to be expected that dynamic language research doesn't contribute much to static features, it isn't immediately clear why we should expect static language research not to contribute much to dynamic features. If type systems aren't thought limiting, then we should expect language constructs to come from the dynamic/static communities in proportion to their size, even if a static type system is not immediately invented to go with them (the static language research community seems to be larger than the dynamic one, is that right?).

On the other hand, not trying to statically reason about those features is generally easy, even in statically typed languages -- Caml and SML don't try to statically reason about exceptions for example. Yes, static reasoning is hard, but is also valuable. [...] in my experience, typed languages let you think about dynamic aspects just as well, only they also make you think about other things.

I agree that static reasoning is valuable, but I don't agree that "not trying to statically reason about those features is generally easy, even in statically typed languages". For example how many statically typed languages support run-time code loading, a solution to (the dynamic aspects of) the expression problem, and marshaling without extensive research?

You seem to ignore the entire brand of languages that predated C (ISWIM : 1966), rooted in mathematical research from the first half of the century, and that co-evolved powerful programming languages (eg., Caml, SML, Haskell, Agda...) and proof assistants (eg., LCF, HOL, Coq...). Where would those go in your alternate history where nobody is thinking about language design but Lisp and C people?

I assumed that if Lisp wasn't invented, then Caml/etc wouldn't be invented either. In any case the point is not a historically correct "what if" scenario, but rather "what if we were stuck with a language with bad dynamic semantics". I used C as an example. Would it then be the case that with current research methodologies we'd be adding type systems to an inferior dynamic semantics for a long time? The second question is "are we currently stuck with a language with bad dynamic semantics?". In other words, is there a language out there such that C is to Haskell as Haskell is to that language? (where Haskell is a free variable that varies over the set of languages we currently believe are good) Sussman believes that we are (unless I misunderstood the point of this talk).

One more lengthy post

I didn't consider lambda calculus to be a programming language, but indeed you could depending on your definition of programming language. [Is lambda calculus even the first use of lambdas in mathematics? Surely some mathematician must have written down an anonymous function before lambda calculus.]

λ-calculus is much more than the simple idea of anonymous functions. Those were certainly present already, but lambda-calculus came from the idea of using syntax — in one of the original papers, Church presents λ-terms as a generic binding syntax fit to represent logical quantifiers — to study meaning, by imposing a static discipline to avoid nonsensical, paradoxical mathematical statements. There was also the idea that syntactic reduction is a form of computation; I'm not sure whether β-reduction was thought of as computation from the beginning of λ-calculus, but it was at least apparent when it was compared with the machine formalisms of Turing. Without a doubt, those ingredients define a programming language. Note that already at that time, people were aware of the static/dynamic distinction, and actually designed an untyped λ-calculus slightly after the first — typed — presentation, specifically as a device to study the purely computational aspects (confluence, etc.).

I agree that static reasoning is valuable, but I don't agree that "not trying to statically reason about those features is generally easy, even in statically typed languages". For example how many statically typed languages support run-time code loading, a solution to (the dynamic aspects of) the expression problem, and marshaling without extensive research?

OCaml for example supports run-time code loading (Dynlink module) and marshalling (Marshal module). I'm not sure how long they have been available, but I think that's mostly because the designers didn't consider those aspects a core part of the language, not because they are difficult. There is also a cultural/structural effect: researchy languages – static or not – are often led by small teams, and hence often do not venture into the language aspects at the edge of software development practice. I'm quite certain no extensive research went into either of those features; both are designed in a simple way.

Marshalling in OCaml, while simple, is relatively unsatisfying as it has no static safety (input_value has a polymorphic return type, so the type system will trust any use you make of it to be well-typed) *and* no dynamic safety, as types are erased at runtime and no dynamic checking is performed; OCaml marshalling is less safe than marshalling in typical dynamic languages, as it can result in a segfault if misused. That is however a consequence of the type-erased implementation of OCaml; you can have static languages that carry runtime type information – Andreas can confirm, but IIRC Alice ML does just that – and it has very good marshalling, code loading, code distribution... features, but then you have the usual performance issues of dynamic languages. Alternatively, you can design ways to inject runtime type information at specific points, as was done in various Caml variants, or simply use Haskell type classes (Typeable). This is fairly representative of the integration of difficult features: you have the very simple choice of keeping them out of the type system – but then you get no static information about this part – or you do more elaborate things to mix them more gracefully with your type system. Note that in the marshalling case the static behaviours are not very interesting, and marshalling itself is hard in all programming languages, even dynamically typed ones, because sharing, code pointers, encapsulation, garbage collection, etc. all bring difficulties when interfacing with the outside world. Other kinds of extensions, such as control-flow-oriented features, integrate like a charm.
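
To make the lack of safety concrete, here is a minimal OCaml sketch of the failure mode described above (the values are invented for illustration):

(* Marshal.from_string has a fully polymorphic return type: the annotation on
   [bogus] is trusted blindly, with no static or dynamic check that the bytes
   actually encode a string list. *)
let bytes = Marshal.to_string [1; 2; 3] []

let bogus : string list = Marshal.from_string bytes 0

(* Using [bogus] is undefined behaviour: typically memory corruption or a
   segfault rather than a clean runtime error. *)
let () = List.iter print_endline bogus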

(I'm not sure what you're referring to regarding the expression problem, probably multiple dispatch. I'm not familiar with it and unable to look at this offline, I'll do more research when possible.)

That's why work on optional and gradual types is exciting. Instead of forcing you to stay in a box, you can control the expressiveness/guarantees slider. When you program in a class with limited expressiveness, you get more guarantees, but you can always trade some guarantees for extra expressiveness. Instead of brick walls, we have lines on the ground. Ironically this allows more advanced type systems to be practical.

I'm not sure I share your precise vision of gradual typing, but I fully agree with the sentiment expressed here: the goal is not full static verification/proof of all the properties of a given program, but having the possibility to strike a compromise between expressive runtime verification (contracts, etc.) and static checking. Shifting any aspect of correctness from the dynamic to the static world is generally a bit of work, but rewarding; it can go from very easy (ML type inference) to very hard (full correctness proof with dependent types) depending on your safety needs and time/work budget.

What I disagreed with in your original post is the sentiment that static checking is somehow useless and impedes language progress. I think we have progress to make on all fronts, static checking/specification included, and I don't think it is pertinent, or in any way objective, to try to categorize one part of language progress as "the right thing" and another as "time waste / actively harmful".

(Obviously this side discussion is not very closely related to the question of how to design languages that would allow us to program large biological systems; unfortunately I haven't seen the original document, which is in a video format that is extremely impractical to watch with limited connectivity. I suspect however that current biological systems have very little to tell us about design, for the reasons given by Ben. Their features and structures are however interesting to study and reproduce, and there are definitely valuable new programming language ideas to be found trying to reproduce them -- after all a lot of language design was performed by the AI tribe doing just that.)

Yes

you can have static languages that carry runtime type information – Andreas can confirm, but IIRC Alice ML does just that – and it has very good marshalling, code loading, code distribution... features, but then you have the usual performance issues of dynamic languages

Yes, Alice ML is employing "packages", a variant of type Dynamic, but with first-class modules inside. But only packages carry runtime type information, other values don't. That is, you only pay the extra cost at the point where you actually use the dynamic functionality. Likewise, runtime type errors are isolated to the points where you unpack (usually when unmarshalling).
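
For readers unfamiliar with the idea, here is a rough OCaml sketch of the "package" concept as described above -- not Alice ML's actual API, and with a toy closed universe of type representations (a real system needs an open, structural representation) -- showing how runtime type errors stay isolated at the unpack point:

(* a value paired with a runtime representation of its type;
   the type is checked only when unpacking *)
type _ ty = Int : int ty | String : string ty
type package = Pack : 'a ty * 'a -> package

exception Unpack_failure

let unpack : type a. a ty -> package -> a = fun expected pkg ->
  match pkg with
  | Pack (actual, v) ->
    (match expected, actual with
     | Int, Int -> v
     | String, String -> v
     | _ -> raise Unpack_failure)

let p = Pack (Int, 42)
let () = Printf.printf "%d\n" (unpack Int p)                    (* ok: 42 *)
let _ = try Some (unpack String p) with Unpack_failure -> None  (* fails, but only here *)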

(I'm not sure what you're

(I'm not sure what you're referring to regarding the expression problem, probably multiple dispatch. I'm not familiar with it and unable to look at this offline, I'll do more research when possible.)

Wadler says: "The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts)."

By dynamic expression problem I meant the same problem without the requirement of type safety (and possibly recompiling code -- that seems like a rather arbitrary restriction).

What I disagreed with in your original post is the sentiment that static checking is somehow useless and impedes language progress. I think we have progress to make on all fronts, static checking/specification included, and I don't think it is pertinent, or in any way objective, to try to categorize one part of language progress as "the right thing" and another as "time waste / actively harmful".

Right. I don't really believe what I said there (hence the "</tongue-in-cheek>"), but I do think that laying aside the requirement of static typing in your mind can help language design. For example think about what a dynamically typed Haskell would look like. Sum types would be extensible across modules. Similarly partial function definitions would be extensible across modules. The different branches of pattern matching in function definitions can be placed in different modules. That addresses the expression problem head-on. Finding a way to type check that doesn't seem impossible, and it seems like the extra expressiveness could be very useful.
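
For concreteness, here is a rough sketch of the shape being described (in OCaml, purely illustrative, all names invented; OCaml's extensible variant types stand in for the open sums of the imagined dynamically typed Haskell): constructors and match clauses are added from separate modules, and the "function" is an explicitly extensible chain of partial matchers.

(* an open sum: any module can add constructors *)
type shape = ..

(* the "function" is a mutable chain of partial clauses, tried in order *)
let show_clauses : (shape -> string option) list ref = ref []
let add_clause c = show_clauses := !show_clauses @ [c]
let show s =
  match List.find_map (fun clause -> clause s) !show_clauses with
  | Some text -> text
  | None -> "<no matching clause>"

(* "module A" adds a constructor and its clause *)
type shape += Circle of float
let () = add_clause (function Circle r -> Some (Printf.sprintf "circle %g" r) | _ -> None)

(* "module B" does the same, independently *)
type shape += Square of float
let () = add_clause (function Square d -> Some (Printf.sprintf "square %g" d) | _ -> None)

let () = print_endline (show (Square 2.0))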

unfortunately I haven't seen the original document, which is in a video format that is extremely impractical to watch with limited connectivity

You can find the slides here, if you're interested (though obviously a large part of the information in a talk is spoken): https://github.com/strangeloop/2011-slides/raw/master/Sussman-WeDontKnowHowToCompute.pdf

Their features and structures are however interesting to study and reproduce, and there are definitely valuable new programming language ideas to be found trying to reproduce them -- after all a lot of language design was performed by the AI tribe doing just that.

Yes. Programming languages have a role to play in these efforts. For example probabilistic languages and automatic differentiation for machine learning.

For example think about

For example think about what a dynamically typed Haskell would look like. Sum types would be extensible across modules. Similarly partial function definitions would be extensible across modules. The different branches of pattern matching in function definitions can be placed in different modules. That addresses the expression problem head-on. Finding a way to type check that doesn't seem impossible, and it seems like the extra expressiveness could be very useful.

Extensibility of sums and pattern matching already has a solution. If you prefer an OO solution, IIRC multiple dispatch can be fully typed now, which also handles placing definitions in different modules, unless I'm not understanding your intention there. These solutions have been available for a number of years, although they aren't widely implemented.

There's also a downside to dispersing definitions across space: it becomes harder to understand and audit a program's behaviour. You must keep additional context in mind, or use additional tools to display all the relevant context from all places at once.

I haven't read the paper

I haven't read the paper yet, but it sounds great. Is it implemented anywhere where I can easily use it? So many cool type system papers, and it's sometimes disappointing to find out that one isn't implemented in a popular language, or isn't easily available at all (sometimes not even implemented at all).

Now that you're here, maybe you know of something else ;) In languages like Java and C# I've come across the following limitation.

Sometimes you have an interface, like IToString, and for generic types you have e.g. that List[T] : IToString if T : IToString. So much like Haskell's type classes, but dynamically dispatched instead of statically. Has that been researched and is it available somewhere?

Sometimes you have an

Sometimes you have an interface, like IToString, and for generic types you have e.g. that List[T] : IToString if T : IToString. So much like Haskell's type classes, but dynamically dispatched instead of statically. Has that been researched and is it available somewhere?

If I understand correctly, the predicate "x implements IToString?" is given by:

x = x is IToString
y<x> = x implements IToString?
y<x0, x1> = x0 implements IToString? || x1 implements IToString?
y<x0, x1, x2> =...

If the predicate is supposed to be compile-time only, we can define a set of extension method overloads following the above template, and the absence of an overload indicates "x implements IToString" is false.

If a runtime check that can be called on any type T is ok, simply check Type.IsAssignableFrom() on T and recursively on any type parameters if it's a generic, then cache the result using generics:

using System;

public static class Implements<T, TInterface>
{
  static bool? cached;
  public static bool Check()
  {
    if (!typeof(T).IsSealed) throw new ArgumentException("T is not sealed!");
    // compute once, then answer from the per-instantiation static field
    if (cached == null) cached = RuntimeCheck();
    return cached.Value;
  }
  static bool RuntimeCheck()
  {
    var type = typeof(T);
    if (typeof(TInterface).IsAssignableFrom(type)) return true;
    // check generic type params
    foreach (var tvar in type.GetGenericArguments()) { ... }
    return false;
  }
}

The fast path of specifying sealed types for T costs only a generic static field access, which is very fast. If you need to obtain the implementation of IToString to access its methods, something can probably be done for that as well.

The above is a generic framework for checking and memoizing the results of arbitrary predicates. I use a similar pattern to implement a generic fold/unfold framework used as a safe alternative to reflection in my Sasa library (a code generated structurally reflective function is memoized in that case). In the extreme, you can extract the IToString instance by writing a structurally recursive fold using my framework.

Recursive check is unnecessary?

If we are speaking in a .Net context, I believe the recursive checking of generic type parameters is unnecessary, as it's included in the check of the top-level type.

If I'm mistaken and it's not included, then the recursive check needs to consider variance. .Net generic type parameters are labelled for covariance, contravariance, or invariance.

Jules is asking about

Jules is asking about post-hoc interface extensions, with default implementations ala Haskell's deriving, so a List<T> would implement some interface IFoo if T implements IFoo.

Hopefully we're on the same page now, and given this, the type test would fail on List<T> but succeed on the generic type parameter T, and I don't think variance annotations apply. Subtyping is a problem though, because some inhabitants of T may implement IFoo, and some may not.

I am curious about what specific problems Jules is solving though, because there has to be a better solution than a big set of dynamic type tests.

The specific problem I was

The specific problem I was trying to solve was essentially the .ToString problem, but instead of conversion to strings I had conversion to GUI elements. So objects implement a .ToWidget(Widget Parent). I remember the problem coming up several times before though. If you do control the classes you can solve it in the following way:

interface Showable
{
    string Show();
}

class Bar : Showable
{
    public string Show() { return "Bar"; }
}

class Foo<T>
{
    protected T it;
    public Foo(T x)
    {
        it = x;
    }
}

class ShowableFoo<T> : Foo<T>, Showable where T : Showable
{
    public ShowableFoo(T x) : base(x) { }
    public string Show()
    {
        return "Foo(" + it.Show() + ")";
    }
}

As you see that's pretty ugly, and you have to get lucky and get a ShowableFoo when it's appropriate. You also have to duplicate a lot of code in the Foo and ShowableFoo classes, because you often want ShowableFoo methods to return new ShowableFoos instead of Foos. Existing code may give you a Foo when you really wanted a ShowableFoo.

Because of all these problems I decided to just put the Show method directly on the Foo classes and have them throw runtime exceptions if T doesn't implement .Show(). Then I created an extension method that typecases on the remaining classes that I don't control and otherwise calls the instance method.

The adapter pattern and dictionary passing don't really work as Foos contain other objects nested within that you'd need to dynamically dispatch on to get the right adapter/dictionary. So then you're back to the same problem.

Do you know of a better way?

Right, I've run into these

Right, I've run into these types of problems before, and to avoid the code duplication and sprinkling type tests everywhere I had to reorganize how I solved the problem. That's why I was wondering about the specifics. For instance, what does your .ToWidget() do when it encounters an object that does not implement the interface? It sounds like you just throw an exception, in which case why not just require the showable classes at your interface boundary? I'd like to understand how an object becomes a widget, and vice versa.

In the UI library I'm working on, objects of unknown type appear in the interface only as polymorphic type parameters, ie. the value is serialized to an HTML form field, so we don't need to inspect/show it. I'm trying to understand how your problem is different. Perhaps you're integrating into an existing UI library and not exposing your own interface?

If that's the case, I would define a visitor interface over the set of classes you're handling, and your big type case would just dispatch into this visitor and cache the dispatched method as a delegate (as I described above with the Implements class). This way your big type case is executed at most once per type, and every transform over the set of objects is handled as a visitor instance and the type case is guaranteed to be exhaustive. This is exactly the approach I used to define efficient fold/unfold over the set of primitive CLR objects; CLR primitives obviously don't support fold/unfold, so this is essentially a post-hoc interface extension. There are still some problems with subtyping, but nothing that can't be handled.

Right, I've run into these

Right, I've run into these types of problems before, and to avoid the code duplication and sprinkling type tests everywhere I had to reorganize how I solved the problem. That's why I was wondering about the specifics. For instance, what does your .ToWidget() do when it encounters an object that does not implement the interface? It sounds like you just throw an exception, in which case why not just require the showable classes at your interface boundary? I'd like to understand how an object becomes a widget, and vice versa.

ToWidget is the interface boundary, really. The point of this library was to have a quick and dirty way to set up a GUI that can be refined later. Often I'm writing code that I want to interact with in some way. Instead of dumping stuff to the console in a non-interactive way with ToString I want to show stuff in a more user friendly way in a GUI. For example if I have a list of floats somewhere I can explore that list in a GUI, and modify its contents, and plot it in a graph (and I plan to add a way to call its methods in a small REPL thanks to the Roslyn project).

When the interface is not present, I use reflection to show a generic GUI. For example for a Person object I show the class name and the attributes and call .ToWidget recursively on the values of the attributes. You can also edit these values. The workflow is that later when you decide that you want to show the Person objects in the final application, you just add the Showable interface to the Person class and implement a custom GUI. So while this provides a quick and dirty way to set something up, you can iteratively refine it into a working application.

How would you implement this?

I didn't want to pollute

I didn't want to pollute this already long thread with code, so I wrote up a detailed description on my blog. Basically, the abstraction I mentioned earlier can be used to define a visitor on an arbitrary set of classes that extends them with operations after the fact.

I use this in Sasa for type case and reflection, but you can just as easily appropriate it for building UIs from objects.

Sorry, I don't think that's

Sorry, I don't think that's what I meant. I meant IToString as a hypothetical interface containing the method MyToString (so it's similar to the Show typeclass in Haskell, except dynamically dispatched).

For example if Foo : IToString, then (new List<Foo>()).MyToString() would be valid. If Bar doesn't implement IToString then (new List<Bar>()).MyToString() would be a compile time error. This is needed because a List<T> can only be expected to implement a method ToString when its elements implement it. My current solution is to have .MyToString() as a big type-casing extension method on Object and raise a runtime error when the type parameter doesn't implement the interface.

I know ToString is a method on Object in C#, but suppose it wasn't, or imagine another extension like .ToImage().

Edit: that probably sounded very incoherent. I'll try to summarize it like this: which languages allow existing classes to implement additional interfaces when you don't control their definition? Which languages allow you to let existing generic classes selectively implement interfaces when their type parameter implements some interface?

Edit: that probably sounded

Edit: that probably sounded very incoherent. I'll try to summarize it like this: which languages allow existing classes to implement additional interfaces when you don't control their definition?

Right, we had this discussion of post-hoc interface extension on reddit awhile back. I thought this was about something different.

Type classes are still the only solution I know of. Go almost got there with its structural interface matching, but they limited it so it's not truly post-hoc.

I agree this is less than satisfactory. The idiomatic .NET solution is the adapter pattern, ie. wrap the underlying class in one that implements the interface, or an explicit dictionary passing implementation, ie. IEqualityComparer, IFormatProvider, etc. There's no way I know of to instantiate either implicitly though.

By 'big type-casing extension method', do you mean you perform type tests on all sorts of collection types to see if their type parameters implement a certain interface? Do you have a specific application in mind you can describe? Type classes address a wide range of problems. If it's a structurally recursive/polytypic problem, the fold/unfold framework I mentioned earlier can solve it, ie. generic queries ala Bondi and serialization-type problems.

If it requires class-specific overriding behaviour, then you'll probably have to use the adapter pattern or an explicit dictionary mentioned above. The only other possibility is that you're trying to structure your program using a functional model in an OO language, and your problems would disappear if you "invert" your program to play to OO's strengths, but you've probably already considered that.

Edit: it's a tough problem considering subtyping though, because T could be a non-sealed type, and some of its children could implement your interface, and some not. Does your big type check test each instance in the collection, or the type parameter only?

Pimp my library

An adapter pattern that is implicitly applied: this is the Scala Pimp My Library design pattern.

The only ingredient of type classes you use is the implicit dictionary passing. Other aspects (such as the non-locality of their declarations) are actually hurting here. Implicit parameters are a simpler feature that fully captures what's needed here.

I have trouble seeing how

I have trouble seeing how the adapter pattern solves this problem. Wouldn't you still need a mechanism to do the actual dispatch at run time?

You'd need to declare a

You'd need to declare a class with an implicit conversion to a subtype that you control for each base type. Pretty heavy and more work, but at least it's implicit. You'd also still need type tests for generic parameters. I'm not convinced it's a real win overall.

You just want structural types

Sometimes you have an interface, like IToString, and for generic types you have e.g. that List[T] : IToString if T : IToString. So much like Haskell's type classes, but dynamically dispatched instead of statically. Has that been researched and is it available somewhere?

Type classes are not statically dispatched. That is a useful optimization in many cases, but generally impossible because of features like polymorphic recursion, and particularly, existential types. The latter make type class dispatch every bit as dynamic as, say, object dispatch.

Furthermore, any type system with structural object types or something similar meets your requirement. Take OCaml's object system as an example. Or, if you want, also ML's module language, especially if it allows for first-class modules.
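
As a concrete illustration of the point about OCaml's object system (a hedged sketch; the class names are invented and loosely mirror the Foo/Bar example earlier in the thread):

(* structural typing: show_all accepts any objects that happen to have a
   `show` method, with no nominal interface declaration required *)
let show_all xs = List.map (fun o -> o#show) xs
(* inferred type: val show_all : < show : 'a; .. > list -> 'a list *)

class bar = object method show = "Bar" end

(* no ShowableFoo-style wrapper needed: foo only requires its argument to
   have a show method *)
class foo (x : < show : string >) = object
  method show = "Foo(" ^ x#show ^ ")"
end

let () = List.iter print_endline (show_all [ new bar; new foo (new bar) ])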

Right, that terminology was

Right, that terminology was wrong. Type classes dispatch on types. What we have in C# and Java is dispatch on values, but the same concepts like "T : Showable => Foo<T> : Showable" apply and would be useful.

Structural object types would indeed solve the problem very nicely. Do you have a pointer to a good discussion of structural compared to nominal types? Specifically what are the advantages of nominal types over structural types? Why did mainstream languages end up with nominal types?

Structural

Well, both dispatch on types in a sense, but typical class-based OO can dispatch only on the type of a given value, while type classes don't need an associated value. The latter approach is strictly more powerful.

Unfortunately, I don't have a pointer to a discussion on structural vs nominal, but it's a good question. The main argument you often hear against structural types in OO is accidental matching, but I think that is pretty much a non-issue. On the other hand, there is a certain, albeit moderate, cost in complexity.

Am I right that you mean

Am I right that you mean that OO class based dispatch corresponds to existential types + type classes?

How do type classes interact with subtyping? Thinking naively, it seems that it could be problematic, since a value can have multiple types. Actually the same thing happens with polymorphism. For example you can have id :: a -> a or id :: Int -> Int. Is that problematic, and does it lead to bad behavior? On the other hand, implicits in Scala seem to work well.

Accidental matching, if it really is a problem, can be solved by having method names be module scoped? So you'd have explicit method definitions and specify whether you want to implement moduleA.foo or moduleB.foo.

OO vs type classes

Am I right that you mean that OO class based dispatch corresponds to existential types + type classes?

Right, or more precisely, to the limited use of type classes where:

1. All type classes are single parameter.
2. All class methods have a type of the form a -> T, with a being the class parameter.
3. All class constraints appear only on existentially quantified variables, never with universal ones.

I don't have a good answer re subtyping. AFAICT, it shouldn't be a problem, but I suppose looking at OOHaskell or Timber should provide some insight. In any case, I suppose few people care, because subtyping has relatively limited expressiveness, and doesn't add much to a language that has type classes plus existentials already.
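
A small OCaml sketch (all names invented) of this restricted use of type classes, with a dictionary playing the role of the class and an existential package playing the role of the object:

(* 1. a single-parameter "class", 2. whose methods take the class parameter *)
type 'a show_dict = { show : 'a -> string }

(* 3. the constraint occurs only on an existentially quantified variable *)
type showable = Pack : 'a * 'a show_dict -> showable

let int_showable n = Pack (n, { show = string_of_int })
let string_showable s = Pack (s, { show = (fun str -> str) })

(* "dynamic dispatch": the dictionary travels with the packaged value *)
let show = function Pack (x, d) -> d.show x

let () =
  List.iter (fun o -> print_endline (show o))
    [ int_showable 1; string_showable "two" ]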

I don't have a good answer

I don't have a good answer re subtyping. AFAICT, it shouldn't be a problem [...]

Subtyping and type classes both fall under the domain of qualified types. I have an example demonstrating this similarity between overloading and subtyping.

Er...

Er, I'm sorry, but I don't think the example shows much of a similarity. It just shows that you can have overloaded selectors, not much more.

To emulate tuple subtyping with polymorphism you'd need to throw in row polymorphism and existential types. Qualified types are neither necessary nor sufficient (although they provide one possible way to type row polymorphism).

Overloaded selectors = structural subtyping?

[...] It just shows that you can have overloaded selectors, not much more.

To emulate tuple subtyping with polymorphism you'd need to throw in row polymorphism and existential types.

I don't really spend any time in OCaml (or any language with structural subtyping really) so perhaps there's something I've missed that you could explain. My understanding is that overloaded selectors are sufficient. They certainly take care of the consumption side (if you overload record selectors, you can handle any number of records with a "foo" field).

They're not sufficient to *extend* record types, which is why I put in the reference to qualified types (the author's language added a few qualifications to statically extend record types). It's a brief read, and if you're interested you'll find the case made clear that qualified types can subsume structural subtyping.

I assume that you want row polymorphism and existential types to abstract away "the rest" of the record? You can probably get there that way, but it seemed more painful (and more difficult on the compiler back-end) than QT to me when I last worked on this sort of thing in a toy compiler.

For example

With your code, try passing a triple to a function that expects a pair. You can't. You can only pass both to the same function if you made it properly generic beforehand.

Or try to simulate

f (g : (Int,Int)) = g (1,2); g (1,2,3)

You can't, you'd need higher-ranked polymorphism for that.

More importantly, try to build a list that contains both pairs and triples. You can't, you'd need existential types for that.

Row polymorphism is perhaps less essential, but without it you can only simulate subtyping by being fully abstract in the types everywhere, whereas both subtyping and row polymorphism provide a sort of middle ground.

Having said that, I still think subtyping isn't overly useful if you already have existential types and some form of interesting polymorphism (row or qualified types).

A minor point

With your code, try passing a triple to a function that expects a pair. You can't.

Ah, I think that I addressed that in the linked post. What I was saying was that it'd be sufficient if the compiler automatically added this qualification to every use of a pair, triple, etc (in that case, of course, you *could* always pass a triple to a function that expects a pair).

That is, in writing a compiler you're better off supporting qualified types in a middle layer than worrying about subtyping at that level (the translation is straightforward).

f (g : (Int,Int)) = g (1,2); g (1,2,3)

You can't, you'd need higher-ranked polymorphism for that.

More importantly, try to build a list that contains both pairs and triples. You can't, you'd need existential types for that.

I assume that you mean 'g' in your example to be a function on pairs? In any case, I hadn't considered those cases to be part of the definition of "subtyping" (arguably they're tangential), and there's a straightforward dictionary-passing interpretation to [Pair p => p] vs Pair p => [p] that essentially defines an existential type, but I agree that "vanilla qualified types" per se don't cover that case.

Perhaps the best ordering when putting this into a compiler is subtyping (implicit qualified types) -> explicit qualified types -> strategically-chosen existentials.

I've come to a similar conclusion for implementing function application -- where normally at the top-level we "overload" simple C-style function application along with closure-application. I use qualified types for functions as well, and ultimately represent closures by the obvious existential type (hiding away the enclosed variables) and C-style functions by a label/pointer.

Language feature or implementation technique?

What I was saying was that it'd be sufficient if the compiler automatically added this qualification to every use of a pair, triple, etc

Well, in that case it is not clear to me whether you are still talking about qualified types as a language-level substitute for subtyping, or rather as an implementation technique.

I assume that you mean 'g' in your example to be a function on pairs?

Ah yes, sorry.

In any case, I hadn't considered those cases to be part of the definition of "subtyping" (arguably they're tangential), and there's a straightforward dictionary-passing interpretation to [Pair p => p] vs Pair p => [p] that essentially defines an existential type, but I agree that "vanilla qualified types" per se don't cover that case.

Both cases are plain simple subtyping on tuples. If you call it subtyping, then it is supposed to be applicable everywhere, uniformly. Moreover, the use case of heterogeneous containers is very important in OO practice.

Of course, if you have full first-class polymorphism, you can translate subtyping to dictionary passing, but it is far more involved than you think, because subtyping becomes coercive. Consider a type [[[[(Int, Int, Int)]]]]. The very definition of "structural" subtyping is that this is a subtype of [[[[(Int, Int)]]]] (given immutable lists). When subtyping is coercive, though, this implies copying the whole nested list, coercing each existentially quantified element. Similarly, for functions, you'd have to create coercing wrappers. This is very expensive in general. And with mutable state, copying or wrapping isn't even possible.
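
A hedged, much-simplified OCaml illustration of that cost (two levels of nesting instead of four, and immutable lists): the subsumption step is a real coercion function, so it has to rebuild the whole structure.

(* per-element coercion: forget the third component of a triple *)
let forget_third (a, b, _c) = (a, b)

(* coercing the nested structure rebuilds every inner list and every element *)
let coerce (xs : (int * int * int) list list) : (int * int) list list =
  List.map (List.map forget_third) xs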

Why is wrapping not possible

Why is wrapping not possible with mutable state? You could intercept mutations and do them on the original?

Don't see how that helps

Assuming you have a semantics that already allows you to do such interception. And you'd have to intercept reads, not writes. And even then I'm not sure how that could work, since you would need to know the static type assumption of the context performing the read to do the right coercion.

The only solution for mutable state that I'm aware of is to actually implement each language-level mutable reference as a first-class object, through which you do all reads and writes. You can then wrap those with a delegate object performing the right coercions, in every place you use subsumption.

That's what wrapping is,

That's what wrapping is, right? At least that's what I thought you meant by wrapping.

You don't just have to intercept reads. If the object after the coercion is mutable, then you have to intercept writes. The references themselves don't have to be first class, you just have to be able to intercept reads/writes. For example in C# you have properties that can have custom get/set handlers. In general if you have some class S <: U, then you can define a wrapper that wraps S and only exposes its U methods/properties/whatever else the language has (if the language allows you to wrap all these constructs -- for example in Java you can't do this because it doesn't have getters/setters). This isn't completely waterproof, because this breaks reference equality. But that too can be fixed, if you can wrap the equality operation.

Wrapping state

Yes, I wasn't clear: you generally have to coerce on both reads and writes.

That's what wrapping is, right? At least that's what I thought you meant by wrapping.

If all your state is already tied to objects then it's easier, of course. But that is only the case in OO languages, and not even all of those. When you have first-class references, for example, then you don't usually implement them with a heavy-weight indirection like that.

Also, stateful entities usually have an identity that is relevant. Simply creating ordinary language-level wrapper objects will lose it. So I was assuming some magic implementation-level mechanism that is transparent to identity, e.g. by employing explicit IDs instead of pointer identities.

Modularity requirement

By dynamic expression problem I meant the same problem without the requirement of type safety (and possibly recompiling code -- that seems like a rather arbitrary restriction).

It's not arbitrary at all, because a central point of the expression problem is having a modular solution. And modularity implies the ability to compile and deploy separately. Otherwise you could as well go back to the existing sources and patch them.

I don't really believe what I said there (hence the ""), but I do think that laying aside the requirement of static typing in your mind can help language design.

Looking at the terrible design ad-hocness you typically see in untyped languages, I beg to differ. Types are a better guiding principle for clean and well-structured language design than anything else I'm aware of. To paraphrase Simon Peyton Jones, they "keep you honest" as a language designer.

For example think about what a dynamically typed Haskell would look like. Sum types would be extensible across modules. Similarly partial function definitions would be extensible across modules. The different branches of pattern matching in function definitions can be placed in different modules. That addresses the expression problem head-on.

I don't see how you come to that conclusion. Just by removing types you don't gain the ability to magically split lambdas between places.

Also, there would likely be no sum types and pattern matching to start with. Instead, you'd have gangs of nasty IsThis and IsThat predicates, like you see in so many untyped languages (Erlang being the only exception I can think of right now).

It's not arbitrary at all,

It's not arbitrary at all, because a central point of the expression problem is having a modular solution. And modularity implies the ability to compile and deploy separately. Otherwise you could as well go back to the existing sources and patch them.

The essential requirement is modular type checking. If you add compilation as a requirement, then some languages would never qualify. For example those that have JIT compilers. You could counter that compilation happens in the step from source to intermediate code, not from intermediate code to machine code. But then you could as well take the source code as intermediate code, i.e. compilation is the identity function except that it also type checks.

Looking at the terrible design ad-hocness you typically see in untyped languages, I beg to differ. Types are a better guiding principle for clean and well-structured language design than anything else I'm aware of. To paraphrase Simon Peyton Jones, they "keep you honest" as a language designer.

I agree that thinking about static types can help language design. I don't agree that not thinking about static types can never help language design.

I don't see how you come to that conclusion. Just by removing types you don't gain the ability to magically split lambdas between places.

Also, there would likely be no sum types and pattern matching to start with. Instead, you'd have gangs of nasty IsThis and IsThat predicates, like you see in so many untyped languages (Erlang being the only exception I can think of right now).

The ability to pattern match doesn't depend on static types at all.

Under the hood a language implements pattern matching like this: every time you see a new case on a function f, you add the case to the collection of cases of function f. This extends naturally to cases split between multiple places.

Under the hood

The essential requirement is modular type checking. If you add compilation as a requirement, then some languages would never qualify.

Point taken. I agree that separate "compilation" is becoming a less meaningful term these days. You have to take it as a more abstract concept implying some form of compositionality, but I have no good definition at hand.

I agree that thinking about static types can help language design. I don't agree that not thinking about static types can never help language design.

Agreed.

The ability to pattern match doesn't depend on static types at all.

True, but in practice, there seems to be a correlation, perhaps because both follow the same kind of semantic structure in a language.

Under the hood a language implements pattern matching like this: every time you see a new case on a function f, you add the case to the collection of cases of function f. This extends naturally to cases split between multiple places.

This is not how it works. In conventional functional languages, typed or not, a function is defined and compiled as a single entity. Haskell (and some others) allow a function definition to look like a collection of clauses, but that is merely syntactic sugar for a single lambda with a case inside. You cannot separate these clauses -- and the reason that you can't has nothing to do with types either. It's deeply engrained in the whole operational approach.

If you want what you suggest then you are talking about a completely different kind of beast, e.g. a language based on rewriting, not lambda calculus.

True, but in practice, there

True, but in practice, there seems to be a correlation, perhaps because both follow the same kind of semantic structure in a language.

The correlation is definitely there, but it's not that strong. Many Lisps support pattern matching, and Erlang does as well.

This is not how it works. In conventional functional languages, typed or not, a function is defined and compiled as a single entity. Haskell (and some others) allow a function definition to look like a collection of clauses, but that is merely syntactic sugar for a single lambda with a case inside. You cannot separate these clauses -- and the reason that you can't has nothing to do with types either. It's deeply engrained in the whole operational approach.

If you want what you suggest then you are talking about a completely different kind of beast, e.g. a language based on rewriting, not lambda calculus.

Right. It's not hard to separate the clauses though. You just let each branch be a partial function (returning an optional value) and have ways to compose them. Then you can easily define syntactic sugar to support Haskell style function definitions that can span multiple modules.

Right. It's not hard to

Right. It's not hard to separate the clauses though. You just let each branch be a partial function (returning an optional value) and have ways to compose them. Then you can easily define syntactic sugar to support Haskell style function definitions that can span multiple modules.

Indeed, see First-Class Patterns.

Not the same thing

That is not the same thing. You can compose case expressions from multiple patterns, but you cannot compose case expressions themselves, or take an existing case expression (inside some function) and retroactively extend it -- which is more like what Jules is talking about.

I think Jules is mistaken if he thinks automatically merging partial functions is a straightforward extension to existing paradigms. Not at all. Manually constructing functions from partial ones is a different story, but not what he describes.

I don't see why it doesn't

I don't see why it doesn't apply to both. Existing case expressions just return an optional value, and the combinators from the paper can be used to extend it to handle more cases.

You assume

First of all, you assume that you have already set up your case such that it returns an optional value. Second, you still construct a new case expression (and thus function), not extend an existing one.

Just so we're on the same

Just so we're on the same page: to my reading, Jules is talking here about some possible language that implements pattern matching by traversing a list of case expressions that return Maybes. The compiler then simply collects all case expressions of the same name into a single list, which can be extended arbitrarily.

We can extend this to dynamically loaded code as well, as Jules stated, by adding the new cases and constructors from a loaded module to the lists. The list of cases need not be immutable as you seem to be implying. The "list" need not even be a real list, but could be JITed dispatch code; the principle is the same.

Confusion

OK, this subthread is getting a bit confused. I was primarily attacking Jules' original claim that removing types from Haskell would suddenly give you the ability to spread function clauses over several modules. That is clearly not the case, since the lambda calculus semantics underlying Haskell provide no immediate way of doing that.

Of course, you can very well program up something like that explicitly using e.g. first-class cases, but that would be an entirely different story.

You now seem to be talking about using the latter as an implementation technique for providing the former. But implementation is not the main issue, the main issue is semantics. When you can define f by separate clauses in separate modules, potentially with different visibility in different parts of the program, what is f then? Clearly not just a lambda anymore. As I said above, this is more akin to a general rewriting system than to lambda calculus.

But the fact that it has

But the fact that it has different semantics, and the ease of implementation, are exactly the point. Once you remove the static typing requirement, experimenting with extensible pattern matching is a 5 minute exercise instead of a PhD thesis. So it becomes much easier to explore the landscape. That's what Sussman is saying: don't confine yourself to designing statically typed languages, but explore radically new things.

OK, this subthread is getting a bit confused. I was primarily attacking Jules' original claim that removing types from Haskell would suddenly give you the ability to spread function clauses over several modules. That is clearly not the case, since the lambda calculus semantics underlying Haskell provide no immediate way of doing that.

Sure, not directly. But if you look at how dynamic languages are implemented: defining a new function is like an imperative update to an environment structure. It is then natural to also implement defining a pattern matching case of a function as an imperative update into an environment structure, and then you automatically get the ability to spread pattern matching across several modules, just like e.g. in Ruby you can define methods on a single class across several modules because it is an imperative update of some data structure representing a class. So you shouldn't read my claim as "if you remove static types from Haskell as it is now, then patterns could be spread over modules" but rather "if you had pattern matching as in Haskell in a language that was designed from the start as a dynamic language, then you'd probably have the ability to spread pattern cases over modules".

A 5-minute PhD thesis?

I don't see the link between typing and the extensibility (or not) of pattern-matching. Your whole "5 minutes extensibility" story is a bit fuzzy, but if what you're really talking about is a global reference to a mutable list of cases, then you can probably implement that just as easily in a typed setting, by fixing a type for the whole list and checking the type of new clauses added.

(* merging clauses of type ('a -> 'b option) *)
let merge f1 f2 = fun v ->
  match f1 v with
    | Some _ as res -> res
    | None -> f2 v

(* atomic patterns for lists *)
let nilpat v = function [] -> v | _ -> None
let conspat f = function [] -> None | hd::tl -> f hd tl

(* an extensible match..with *)
let dyna_match = ref (nilpat (Some 0))
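(* the bindings of the form [let (Some ..., ...) = ...] below act as assertions:
   they raise Match_failure at load time if the results differ from the expected values *)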

let (Some 0, None) = !dyna_match [], !dyna_match [1; 2; 3]

(* dynamically extend the match with a case for non-empty lists *)
let () =
  dyna_match := merge !dyna_match
    (conspat (fun hd tl ->
      (* open recursion *)
      match !dyna_match tl with
        | None -> None
        | Some res -> Some (1 + res)))

let (Some 0, Some 3) = !dyna_match [], !dyna_match [1; 2; 3]

I'm a bit doubtful of the whole "5 minute exercise instead of PhD thesis" claim.

The solution is quite simple

The solution is quite simple once you think about the problem for a few minutes, assuming you're aware of the paper I linked to above on first-class cases. Jules' argument is that static typing would never lead you to this solution of spatially separated, extensible cases for a number of reasons.

Firstly, because type judgments naturally lead you to specifying sums and cases all together, even though this isn't necessarily the best approach for software engineering (I'm not convinced of this, but it's possible).

Secondly, typing and erasure often go hand in hand, which leads to impoverished runtimes lacking some essential features that industry programmers often need. This has resulted in a lot of great work on typing these features and providing them as libraries though, so I'm not sure that's necessarily a bad thing.

I don't agree entirely with Jules, but some of it is possible. I do agree with him that separating cases is possible, and that's all I responded to in this subthread.

When you can define f by separate clauses in separate modules, potentially with different visibility in different parts of the program, what is f then?

Global resolution like type classes, allowing only a single instance per constructor, still seems to preserve the LC semantics as far as I can tell. Other types of resolution are possible I suppose. Local overloads could just be shadowed definitions, so I'm not sure how they break LC either, but I'll take your word for it.

Extensibility

Now you can do extensible pattern matching, but, because of static typing, only on a prespecified closed data type. That's not very useful.

if you had pattern matching

Once you remove the static typing requirement, experimenting with extensible pattern matching is a 5 minute exercise instead of a PhD thesis.

if you had pattern matching as in Haskell in a language that was designed from the start as a dynamic language, then you'd probably have the ability to spread pattern cases over modules

You're right that implementing such a thing would be easy. However, making it useful is very hard. Note that I'm talking from experience, as someone who mostly codes in a "dynamic language" and has had to (re-)implement pattern matching a couple times.

The main issue with allowing cases to be added dynamically is that of ordering. Usually, not all cases are disjoint, and the order in which cases are tested can't be arbitrarily changed.

When all the cases are specified together, we can simply let the programmer order them correctly. Otherwise, I have yet to see a satisfying way to do things.

We could execute cases in the order they are added; that sucks. Changing the meaning of a program depending on the order in which modules are loaded can't be anyone's idea of a good design tool.

We could order cases by specificity: when two cases overlap, make sure the most specific one is tested first. Basically, that's how class-based method resolution works. The only issue is that patterns are more complex animals than classes-with-subtyping. Even CLOS's rules for multiple inheritance can have surprising and hard-to-control effects.

Patterns make the problem of (potentially) overlapping but non-obviously-ordered patterns even worse. Depending on the pattern language, deciding overlap can be made arbitrarily difficult, at least up to undecidability. Ordering patterns is just as hard, if not harder. So, even when determining overlap is easy, we tend to be stuck with non-ideal rules to guide ordering, like ordering patterns lexicographically according to an arbitrary traversal (e.g. preorder depth-first search). Why should (1 _) be more specific than (_ 1)? I don't know. How should x and _ be ordered?

We could insert a little B&D in our system and insist on there being an explicit pattern to disambiguate overlapping patterns. In the previous paragraph's first example, (1 1) would make the ordering of the two other patterns irrelevant. Unfortunately, that also means non-local interactions between modules: two extensions can be correct by themselves, but require disambiguations when used together.

So, no, I disagree. It's not about typing. It's about New Jersey style. If you find experimentation in dynamic languages easier, it may simply be because you're ok with pretending fundamental issues do not exist as long as they don't bother you too much. In fact, Aaron M Ucko did a bit of that for his implementation of predicate dispatching in CLOS; as far as I can tell, it was still enough work to be part of his Master's at MIT. Even if there exists a neat and simple solution to the problems raised here, we are way past the 5-minute mark.

I mostly code in a dynamic language, and I'm not convinced that making ill-thought-out designs hard to execute is a bad thing.

We could execute cases in

We could execute cases in the order they are added; that sucks.

Yes, we should execute them in the reverse order they are added (so the reverse of Haskell). That gives you the same power as OO dispatch. A method foo in a class Bar becomes:

foo x | x is Bar = ...

And since subclasses are defined after superclasses, everything works out.

Changing the meaning of a program depending on the order in which modules are loaded can't be anyone's idea of a good design tool.

I don't think it's a problem. Many dynamic languages already have this problem where different modules can define the same method on the same class. Which method wins depends on the order the modules are loaded. That sometimes gives a problem because method names accidentally collide, but you wouldn't have this problem with these function definitions because contrary to methods they are lexically scoped. In practice patterns from different modules will rarely if ever overlap, because each module is defining its own data and pattern matching on that data.

We could order cases by specificity

That is a bad idea and nearly impossible. For the cases where the default ordering doesn't suffice, you can implement a domain-specific cases container that handles ordering differently.

There is no order

Yes, we should execute them in the reverse order they are added (so the reverse of Haskell).

In a pure language like Haskell there would be no such thing as "the order in which they were added"!

In practice patterns from different modules will rarely if ever overlap, because each module is defining its own data and pattern matching on that data.

So you are arguing that in practice, your feature is not a problem, because in practice, it will rarely be used?

In a pure language like

In a pure language like Haskell there would be no such thing as "the order in which they were added"!

I was just reusing Paul Khuong's terminology in the quote right before that sentence. If you wish you can replace that statement with "in the order they are defined and in the order modules are imported".

So you are arguing that in practice, your feature is not a problem, because in practice, it will rarely be used?

Not all patterns are overlapping.

You are changing your statement

Jules: Once you remove the static typing requirement, experimenting with extensible pattern matching is a 5 minute exercise instead of a PhD thesis.

I still don't know what you're talking about. Again, this has nothing to do with types, and pattern matching and functions are orthogonal constructs.

But if you look at how dynamic languages are implemented: defining a new function is like an imperative update to an environment structure.

OK, now you are conflating untyped with dynamic. Not all untyped languages necessarily view definitions as mutable. Some do only on the top-level, e.g. Scheme.

So your statement changes completely to "when you remove immutability of definitions then experimenting with extensible pattern matching is a 5 minute exercise". Unfortunately, it's still not true, even if you assume that pattern matching is always tied directly to functions. Because functions are first-class. If some code has already accessed some definition of a function and stored it away somewhere then it won't see any updates. Unless you introduce additional mechanisms.

Let me clarify again: what you have in mind is a fundamental change in the operational semantics of functions, definitions, and pattern matching. As such, it goes much deeper than just a question of typing. Can we agree on that?

Again, this has nothing to

Again, this has nothing to do with types, and pattern matching and functions are orthogonal constructs.

It has to do with types. In a typed language you'll either have to devise a new type system for it to be useful, or you'll have to find a way to encode extension with a useful class of patterns in an existing type system. Which might not be that hard, but the human mind works toward utility and doesn't make big steps, and that extra step could well be big enough that you never think of extensible pattern matching at all.

As I said before, gradual typing could offer a solution. When necessary you can ignore types in a convenient way (just as convenient as programming in a dynamically typed language) but for the rest of the code you can make good use of types. And just as it offers a smooth path from untyped to typed, it also offers a smooth path from typed to more typed.

If some code has already accessed some definition of a function and stored it away somewhere then it won't see any updates

It doesn't. You don't mutate an existing binding; you mutate the object that such a binding refers to, as I described, like how it works with classes in e.g. Ruby.
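
Concretely, a minimal sketch of that distinction (in Python rather than Ruby; the names are made up): the binding Animal never changes, only the class object it refers to is extended, so references and instances created earlier still see the new behavior.

# Sketch: mutate the class object, not any binding.
class Animal:
    pass

rex = Animal()                  # created before the extension exists

def speak(self):                # later, another module "re-opens" the class
    return "woof"

Animal.speak = speak            # the shared class object is what changes

print(rex.speak())              # prints "woof": the old instance sees the new method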

Let me clarify again: what you have in mind is a fundamental change in the operational semantics of functions, definitions, and pattern matching. As such, it goes much deeper than just a question of typing. Can we agree on that?

You are approaching this from a typed mindset. Change with respect to what? Haskell? Yeah, then I'd agree with you. Whether it's a deep change I don't have an opinion on. The point is that dynamically typed languages allow us to explore such deep changes and their consequences easily.

Change in implementation with respect to a dynamically typed language with pattern matching like Haskell's, but where module loading and definitions already work as in most dynamically typed languages (e.g. Ruby, Python, Clojure), isn't big. The implementation could even be just the removal of a check that disallows pattern matches from spanning multiple modules.

Types are irrelevant here, Vol. XVI

It has to do with types. In a typed language you'll either have to devise a new type system for it to be useful

Why so? Languages like ML have had extensible sum types with pattern matching for 30 years. By historical accident, they happen to be called exceptions, but that's just names.

You are approaching this from a typed mindset.

No, I'm approaching it from the angle of a functional language whose operational semantics is based on lambda calculus. Because that was the starting point of this subthread, AFAICT.

Change with respect to a dynamically typed language with pattern matching like Haskell but where module loading and definitions already work as in most dynamically typed languages e.g. Ruby, Python, Clojure, etc? -- That is a very big "but" that makes for an entirely different language, with types being the least relevant difference.

Why so? Languages like ML

Why so? Languages like ML have had extensible sum types with pattern matching for 30 years. By historical accident, they happen to be called exceptions, but that's just names.

Please show me how to do the same with ML's exceptions. Is it as obvious as the method I showed? ... I didn't think so.

No, I'm approaching it from the angle of a functional language whose operational semantics is based on lambda calculus. Because that was the starting point of this subthread, AFAICT.

I didn't see any rule that we had to limit our minds to lambda calculus in this subthread. In fact I see OCaml code that uses mutable state to implement what I showed in pseudocode for a dynamically typed language. Except, of course, that the static types disallow using it in a useful way by committing to a closed type beforehand.

That is a very big "but" that makes for an entirely different language, with types being the least relevant difference.

Those languages can work that way because they are dynamically typed.

Mutable state ≠ mutable definitions

Please show me how to do the same with ML's exceptions.

I lost track of what exactly you are referring to as "the same" now. Clearly, I can match a couple of cases of type exn, and delegate somewhere else if none matches. And I can create arbitrarily many constructors as well as functions over them. But I cannot magically merge several of these functions, even though they all have the same type -- which is my point, exactly.

I didn't see any rule that we had to limit our minds to lambda calculus in this subthread. In fact I see OCaml code that uses mutable state to implement what I showed in pseudocode for a dynamically typed language.

In OCaml, mutable state is explicit and orthogonal. Definitions aren't mutable. Likewise in Haskell, or most other functional languages. That's a big difference. Anything like merging/updating functions would have to be programmed up explicitly, types or no types.

We don't have to limit our minds. But this subthread started by you claiming that merely removing types from a language like Haskell would immediately allow you to do magical things. That is what I am refuting, while you, I feel, keep shifting your claim.

Where did I say that

Where did I say that removing types magically gives you that ability? It is not I who is shifting my claim, it is you who is shifting my claim. Furthermore you repeatedly try to nitpick and find small faults in wording and interpreting things out of the context as much as you can instead of having a constructive discussion. My claim was merely that if a language (like Haskell) is designed as a dynamically typed language from the start, then it is very easy to experiment with and add in such new features (like extensible pattern matching). I even agreed explicitly that you don't get extensible pattern matching "magically" by removing the type checker from Haskell, which is completely obvious and shouldn't even require mentioning, most certainly not repeated mentioning.

The way the discussion is currently going I have no desire to continue it, as this way it doesn't lead to better understanding for me or anyone else involved.

Sorry, but

Sorry, but the discussion started with you saying that types put the designer in a box, and that all sorts of features are ignored just because of types. And then you said:

For example think about what a dynamically typed Haskell would look like. Sum types would be extensible across modules. Similarly partial function definitions would be extensible across modules. The different branches of pattern matching in function definitions can be placed in different modules.

If "a dynamically typed Haskell" just was meant to refer to some random dynamic language instead of simply an untyped variant of Haskell then your claim may be right (though also pretty vacuous). But that's not how I read it. Sorry if I have misinterpreted it.

I also (repeatedly) explained that types are not the issue with allowing extensible pattern matching, but that it's a question of operational semantics. You still seem to disagree with that (and consider it nitpicking), and I honestly have no clue why.

The thing is: yes, certain features are not put into certain languages, and no, types are not necessarily the reason. They might simply be considered bad features (e.g. mutable definitions).

Indeed. First class patterns

Although first class patterns also provide a way to extend the things you can pattern match against, rather than extending cases of a pattern match, what I meant also fits naturally in that framework.

Take:

foo pat = body

Transform to:

foo.addCase(x -> case x of pat -> Some(body) | _ -> None)

Where foo is some kind of mutable container for cases. Applying such a container to a value iterates over the cases in the obvious way. You could also pass the pattern as an AST to addCase. Then you could JIT compile it (and recompile when more cases get added).
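
For concreteness, here is roughly what such a container might look like (a sketch in Python; add_case and NoMatch are made-up names, and the pseudocode's Some/None is modelled with an exception instead of an option value):

# Sketch of the `foo.addCase(...)` encoding: each case either returns a
# result or signals no-match; the container tries cases in the order added.
class NoMatch(Exception):
    pass

class ExtensibleFunction:
    def __init__(self):
        self._cases = []

    def add_case(self, case):
        self._cases.append(case)

    def __call__(self, *args):
        for case in self._cases:
            try:
                return case(*args)
            except NoMatch:
                continue
        raise NoMatch(args)

# `foo pat = body` in one module ...
foo = ExtensibleFunction()

def _nil_case(xs):
    if xs == []:
        return 0
    raise NoMatch()

foo.add_case(_nil_case)

# ... and another branch of the "same" definition in another module.
def _cons_case(xs):
    if isinstance(xs, list) and xs:
        return 1 + foo(xs[1:])
    raise NoMatch()

foo.add_case(_cons_case)

print(foo([10, 20, 30]))        # 3: a length function assembled from two modules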

Yes, this is a bad hack, but I think you can make it nice. Something like Racket's Units allow you to link things up in a more principled manner.

I do

Do you disagree that the vast majority of language features were invented in dynamically typed languages? Higher-order functions, pattern matching, conditionals, objects, multiple dispatch, generic arithmetic, memoization, marshaling, meta programming.

I think your list is a bit biased towards dynamically typed languages. You are looking at abstractions and basically saying languages which are very flexible lend themselves to inventing abstractions. But not all language features are abstractions.

Probably the most important productivity-enhancing language/implementation feature is the rise of the IDE, and those came out of the static community. Visual development came out of the static community.

Moving down to lower-level languages, things like accelerated framebuffering via GPU assemblers came completely out of the static community. No one in the dynamic community was counting clock cycles on hardware in the mid 1980s, and that work made possible the abstract high-performance graphics libraries we have today. The mass use of paravirtualization is, for the first time, creating genuinely high-performing abstract hardware.

I agree with you regarding expressiveness. But part of what makes biological systems so powerful is incredible levels of parallel processing. It may be that to achieve the examples Sussman wanted something like VHDL or Verilog would be the right languages and both of those are static. I certainly hope that the future of complex systems is elegant mathematical abstraction but in my darker hours...

The IDE did not come out of

The IDE did not come out of the static community. It came from Smalltalk, a dynamically typed language. What is Visual development? (btw, the first IDE wasn't Visual Studio ;-)).

Similarly, assembly languages, either for CPU or GPU, are not generally considered to be statically typed (they might be considered to be dynamically typed by some people, but people from the dynamic language community would probably disagree with them). I don't get the connection of paravirtualization to static/dynamic languages.

IDE, etc...

Well, if we are going to argue about where IDEs came first, I'd agree it wasn't Visual Studio. But I think you are missing the point. It isn't about who got there first; IDEs were (are) a failure in the dynamic languages while a huge success in the static languages. The productivity gains from IDEs happened in the static world, which is why in the dynamic world you still have lots of people using simple text editors like Vim to write their code. The major dynamic languages (Perl, Python, Ruby) do not have good IDEs; that is, Komodo, RubyMine, NetBeans for Ruby, MacRuby (Xcode)... are matters of personal taste, because they are not the huge productivity gains they are in the static community.

The problem with counting the Smalltalk programming environment as an IDE is that it didn't autogenerate code. The developer laying out an initial structure and the system generating generic boilerplate code that is then replaced is pretty key to an IDE, and Smalltalks didn't have that. So I'm not going to deny there were integrated coding environments earlier than the 1990s, but they were not all that important. What comes closest prior to the 1990s is Emacs, which then points to LISP (dynamic). What comes first is probably some of the integrated COBOL tools. I know that in the 1960s there was a BASIC integrated environment based on COBOL but ... So IMHO claiming IDEs for dynamic is like saying the United States invented French because it was spoken in Louisiana. The mass use of IDEs came from the static community.

As for the rest: VHDL and Verilog are static languages. Paravirtualization is being implemented primarily in .NET and secondarily in Objective-C; that's the connection. As for assembler, I agree it's weakly typed, neither static nor dynamic. My point was that making use of accelerated framebuffering was done purely in static languages for years and then ported.

Revisionist history

As I recall, the reason IDEs became a force in the Java ecosystem was out of necessity: Java systems with millions of lines of code were in desperate need of refactoring and other IDE benefits like code navigation and integrated documentation. As well, the benefits of a standard class library are also a curse, in that there quickly become more canonical widgets for doing things than users ever have time to experiment with.

As for autogenerate code, Smalltalk and Lisp environments like D-Lisp had live objects and so didn't need code generation. But in the 1980s there were Smalltalk companies selling code generation tools under the sales phrase "rapid application prototyping". I still own books on this subject, however dusty they may be.

Modern Smalltalk IDEs like Newspeak's Hopscotch are still simpler than, and as feature-robust as, Visual Studio 2010 (and consider that all of Squeak is 600,000 SLOC vs. Visual Studio 2005's 8+ million SLOC). But this is not a technical issue. It is a byproduct of Microsoft trying to package its product for shrinkwrapped sale, targeting many different audiences. But technical issues do relate to quality: compare having a language with mix-in inheritance as a baked-in Inversion of Control mechanism to Visual Studio's Managed Extensibility Framework (MEF), which requires tons of hoops to simulate the same features and still requires implicit hardwired module dependencies!

"revisionist"

As I recall, the reason IDEs became a force in the Java ecosystem was out of necessity: Java systems with millions of lines of code were in desperate need of refactoring and other IDE benefits like code navigation and integrated documentation.

I'd argue it happened earlier, in the C++ days. But yes, the huge advances in IDEs came out of the complexity of Java development. I'm not sure how that contradicts what I said above. The original claim was that the major innovations all happened in dynamic languages, and I was disagreeing with that. Your claim is that the innovations did happen in static languages and did so because the static languages had structural problems with the types of problems they were addressing. You aren't disagreeing with me, you are disagreeing with the grandparent.

As for autogenerate code, Smalltalk and Lisp environments like D-Lisp had live objects and so didn't need code generation. But in the 1980s there were Smalltalk companies selling code generation tools under the sales phrase "rapid application prototyping". I still own books on this subject, however dusty they may be.

Exactly "dusty", as contrasted with IDE developers who it is part of their day to day workflow. Which is the point above where it meaningfully invented was in the communities who made heavy use of it.

I get your points about the advantages of Smalltalk and I don't disagree. I do think, though, that IDEs are a major feature.

Yes, I disagree

Do you disagree that type systems take away important expressiveness?

Yes, I disagree. Unless you drop the "important". In 98% of all cases (to pick a random number) it is not important (even though programmers sometimes tend to perceive it as important).

I didn't say that they merely take away expressiveness, but take it away they certainly do.

Alright, but that's what your box metaphor implies.

Do you disagree that the vast majority of language features were invented in dynamically typed languages?

Yes, I disagree. Your list is both biased and wrong, see e.g. Gabriel's reply.

Do you disagree that many of those are difficult to do in many statically typed languages because of the type system?

Replace "many" with "some" and I might agree. But so what? Some things are more difficult in untyped languages. Consider e.g. Haskell's approach to transactional memory, which benefits tremendously from its type system.

Do you disagree that if something is made difficult by the language you're working in then you probably won't think of it?

I agree, but that cuts both ways. Obviously, most proponents of untyped languages fail to think of typeful programming (i.e. enforcing abstraction through types) as a powerful tool that is missing in their toolbox.

If you believe, as Sussman argues, that we have only started to explore the language design landscape, isn't it better to work with languages that don't impede this exploration?

I disagree with Sussman, because his argument is heavily flawed. Biological processes bear no relation to the intentional construction of software for specific purposes. Just like geological processes bear no relation to constructing buildings, no matter how much more powerful they are.

(And the bigger your team and your project, the larger the yield will be on the typed side of the deal.)

Do you have any evidence for that?

Just ask some Google veteran. Lots of its engineers love Python, but too many Python projects had to be rewritten in a typed language because they became unmaintainable after they reached a certain size and age.

I disagree with Sussman,

I disagree with Sussman, because his argument is heavily flawed. Biological processes bear no relation to the intentional construction of software for specific purposes. Just like geological processes bear no relation to constructing buildings, no matter how much more powerful they are.

This is a valid argument, but I was more interested in Jules' broader question: "have [we] only started to explore the language design landscape, isn't it better to work with languages that don't impede this exploration?"

I would say yes. For example, principal typings are an attempt to have your cake and eat it too, the point being that compositional type inference leads to systems with more points of variation and stronger guarantees about the types.

Why view Sussman's argument as divorced from type system research? Why not ask 'What sort of type system would fit his vision well?' So much hot air has been expended so far on static vs. dynamic instead of the fun questions we can create from such polemical perspectives. I really want to know how machine learning could benefit from type theory, since I am interested in a related problem domain (control theory).

Sussman's question is one about designing dynamical systems. I believe type theorists like Bob Harper have designed dynamical systems (I believe he won an award for this work, but never looked at it). What sort of powerful equations exist today for designing such systems?

Machine Learning

Z-Bo: Why view Sussman's argument as divorced from type system research? Why not ask 'What sort of type system would fit his vision well?' So much hot air has been expended so far on static vs. dynamic instead of the fun questions we can create from such polemical perspectives. I really want to know how machine learning could benefit from type theory, since I am interested in a related problem domain (control theory).

I think this is exactly the right approach/question. Right now, the strongest connection between machine learning and type theory (which is, after all, higher-order logic via the Curry-Howard Isomorphism) is via inductive logic programming. By the Curry-Howard Isomorphism, "inductive logic programming" is type inference. It's interesting to compare this with Kanren's "type inference" example, which, by being a first-class relation, can be treated as a type-inferer, a term-inferer, or a proof assistant, depending only on how much background information ("how much of the type") you give it in the first place.

Of course, most of what we call machine learning is probabilistic (I refuse to use the word "statistical," since there is no such thing as statistics, only people who don't understand probability). On that, I can't do any better than quoting E.T. Jaynes (at some length; sorry about that). From "The Ap distribution and rule of succession" in Probability Theory: The Logic of Science:

So far, all the propositions we have asked the robot to think about are "Aristotelian" ones of two-valued logic: they had to be either true or false. Suppose we bring in new propositions of a different type. It doesn't make sense to say the proposition is either true or false, but still we are going to say that the robot associates a real number with it, which obeys the rules of probability theory. Now, these propositions are sometimes hard to state verbally; but we noticed before that if we give the probabilities conditional on X for all propositions that we are going to use in a given problem, we have told you everything about X which is relevant to that mathematical problem (although of course, not everything about its meaning and significance to us, that may make us interested in the problem). So we introduce a new proposition Ap, defined by P(A|Ap E) ≡ p, where E is any additional evidence. If we had to render Ap as a verbal statement, it would come out something like this: Ap ≡ regardless of anything else you may have been told, the probability of A is p.

Now, Ap is a strange proposition, but if we allow the robot to reason with propositions of this sort, Bayes' theorem guarantees that there's nothing to prevent it from getting an Ap worked over on the left side in its probabilities: P(Ap|E). What are we doing here? It seems almost as if we are talking about the "probability of a probability..."

Rather than trying to think of a "probability of a probability," we may think of two different levels of reasoning: an "outer robot" in contact with the external world and reasoning about it; and an "inner robot" who observes the activity of the outer robot and thinks about it. The conventional probability formulas that we used before this chapter represent the reasoning of the outer robot; the Ap density represents the inner robot at work...

The idea of a nested hierarchy of robots, each thinking about propositions on a different level, is in some ways similar to Bertrand Russell's "theory of types," which he introduced as a means of avoiding some paradoxes that arose in the first formulations of his Principia Mathematica. There may be a relationship between them; but these efforts at what Peano and Poincaré called "logistic" made in the early 20th century, are now seen as so flawed and confused—with an unlimited proliferation of weird and self-contradictory definitions, yet with no recognition of the concept of information—that it seems safest to scrap this old work entirely and rebuild from the start using our present understanding of the role of information and our new respect for Kronecker's warnings, so appropriate in an age of computers, that constructibility is the first criterion for judging whether a newly defined set or other mathematical object makes any sense or can serve any useful purpose.

There you have it. E.T. Jaynes, who was a physicist, not a computer scientist, understood the relationships among logic, probability, and information theory. He clearly saw a role for constructive type theory and a form of type inference in probability theory. He would have thoroughly enjoyed Foundations of Inference and no doubt would have come up with applications of it.
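
For a concrete (if toy) reading of the Ap idea: discretize p, put a uniform prior over the propositions Ap, and update on evidence about A; the posterior over p plays the role of the inner robot's density, and its mean is the outer robot's P(A|E). A sketch in Python (the evidence counts are made up):

# Toy, discretized version of the Ap density described above.
p_grid = [i / 100 for i in range(101)]                 # candidate values of p
density = [1 / len(p_grid)] * len(p_grid)              # uniform prior over the Ap propositions

successes, failures = 7, 3                             # hypothetical evidence E about A
density = [w * p**successes * (1 - p)**failures for w, p in zip(density, p_grid)]
total = sum(density)
density = [w / total for w in density]                 # P(Ap | E): the "inner robot"

prob_A = sum(p * w for p, w in zip(p_grid, density))   # the "outer robot": P(A | E)
print(round(prob_A, 3))                                # ~0.667, Laplace's rule of succession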

Expressiveness

I disagree that type systems take away expressiveness. On the surface it would seem a restrictive type-checker may disallow a proper expression, and thus take away expressiveness.

But on the other hand type annotations -- another facet of type systems -- provide expression of intent, and so add to expressiveness.

And clearly type systems can do a lot more, for example overloading relies on typing for dispatch, whether the overloading is open as in C++ or encapsulated in type classes as in Haskell: surely this is adding to expressiveness precisely by imposing constraints?

Furthermore, related to this, what would be a lot of distinct routines handling common cases in optimal ways can often be compressed into a more compact piece of more generic code, relying on the compiler to use the type system to generate optimised cases when required. Surely the more compact code is more expressive, and could not serve well were it not for type driven compiler optimisations?

Consider:

double a = 1.0;
int b = 2;
cout << a + b << endl;

Given the need to convert b to a float before performing the floating addition, surely the last line is more expressive than:

cout << a + (double)b << endl;

which can be seen by considering a change:

long b = 2l;

FWIW in my Felix compiler so much focus is on typing as a way of selecting and optimising that the checking aspect is more or less irrelevant. We don't just check which function could be applied but are required to rank them.

It would be good to consider that one of the primary extensions to C by C++ was the addition of classes, which is a major enhancement of the type system. Again, virtual function polymorphism is not primarily subject to type checking (a constraint) so much as it uses type information to actually drive the run-time dispatch. The dynamic behavior isn't constrained by type systems so much as it is enabled by them. You do not so much check that D is derived from base B as use that information to construct virtual tables.

I give this example to show that here, improved type systems are driving improvements in expressiveness.

Roughly: the more restrictions that are imposed on you, the more structure the rest has, and the freer you are to use it without worry.

I'm going to take a

I'm going to take a different tack, and suggest that type systems don't limit your thinking, but they might interfere with invention for two reasons:

  1. Instead of focusing on one problem, i.e. inventing useful abstractions, your attention is now diverted to two problems, i.e. the former and making one's type system more expressive so it can support more existing use cases. Any effort spent on the latter is clearly time away from the former. Sometimes solving typing problems leads to new invention, but sometimes not.
  2. Users of your language will invariably be people that are interested in typing, so there is a bit of a selection bias in that the problems they bring to your attention will very likely relate to typing.

type diddling vs. his examples

His vision example might make it easier to understand why diddling with type systems isn't um... necessarily a priority for the kind of problem he's talking about.

Out of something very simple -- like his constraint system or like a brain cell -- we should be able to build up components, like the functional units of a brain -- and eventually a whole computational vision system with capabilities comparable to our own. We have little idea how to build that kind of computation, today, and certainly no strong sense of what the programming language(s) will look like.

But we figure there will be components that get combined using some style of composition, that this is at least kinda recursive so that compositions are then a new kind of component for higher-level combinations... that'd be some kind of programming language.

How do we type these compositions?

Well, in the case of vision, the problem of optical illusions and other optical brain bugs suggests we might not be able to and shouldn't obviously bother trying. Can the natural vision computer diverge? yield wrongly typed results (e.g., trigger a seizure)? give nonsense results (m.c. escher)? It seems so. We don't have any firm grasp about how to type these compositions. If I show you a "circuit diagram" of these computational elements and you point out 7 different ways it can produce nonsense -- that doesn't in and of itself tell us whether or not this is a good program, at least at the current stage of understanding..

That's not a bug, that's a feature

Anaszki's triangle illusion comes from a property of our vision system that allows us to recognize the two parts of a lion lying behind a tree as belonging to the same entity. This is, by the way, a 'feature' we acquire; it has been shown that babies don't get it.

Anaszki's illusion can be viewed as a 'bug', but on the other hand this is a bug on an extremely useful feature; it seems to me that, if we can produce an artificial vision system, it being tricked by optical illusions should be understood as an achievement.

But on the other hand, I have doubts on the practical usefulness of artificial systems that reproduce the bugs and problems of the natural systems they copycat. What's the point of an autopilot that may misbehave exactly like a human pilot?

So how do you test your feedback system is robust and...

...not susceptible to Anaszki's triangle illusion? If you can't, this probably suggests something very deep about the connection between logic and computation.

In other words, most machines that use vision have some learning ability to adapt to their environment. With a dynamic environment and open feedback, how does the machine guard against Anaszki's triangle if it is a bug? How do we guarantee the machine doesn't learn this bad feature of human vision? Or can we at least give an upper bound on the error?

Put simply, calling it a bug is not enough. We need to know how to prevent the bug. Are there predators in a biological ecosystem that are very good at seeing prey regardless of cloaking effects? Or prey that are very good at evading predators? Oliver Sacks's book An Anthropologist on Mars is full of fascinating stories about the 0.000001% of humans with unique 'features', such as chapter 1, The Case of the Colorblind Painter.

that's getting at it

Thank you Z-Bo! I was having trouble articulating something and you just made it way easier.

Constraint programming?

What Sussman seems to talk about in this lecture seems, to me, to be entirely within the constraint programming school. While general purpose continuous domain constraint programming is an unsolved problem, general purpose finite domain CP seems rather useful and tractable in many domains - ex: the idea, in Oz, of "computation spaces" that run concurrent propagators that enforce constraints between finite domain variables is a pretty darn neat idea.
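
For concreteness, a finite-domain propagator is roughly this (a toy sketch in Python rather than Oz; the constraint and all names are illustrative):

# Sketch: finite-domain propagation for the constraint X + Y == 10.
def propagate_sum(xs, ys, total):
    """Prune values of X and Y that have no support under X + Y == total."""
    xs2 = {x for x in xs if any(x + y == total for y in ys)}
    ys2 = {y for y in ys if any(x + y == total for x in xs2)}
    return xs2, ys2

X = set(range(0, 10))            # initial finite domains
Y = {2, 3, 5, 7}

changed = True
while changed:                   # run the propagator to a fixpoint
    newX, newY = propagate_sum(X, Y, 10)
    changed = (newX, newY) != (X, Y)
    X, Y = newX, newY

print(sorted(X), sorted(Y))      # [3, 5, 7, 8] [2, 3, 5, 7]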

If I'm missing some key point he's making that isn't covered by current CP practice, please correct me.

Edit: I'm referring to the "little machines that read and add information to cells" section of his talk.

Constraint solving is just

Constraint solving is just one instantiation of the idea. The point is not, I think, any instantiation, but rather the general framework as a basis for programming. Not even that really, but just that work should be done on different ways of programming, instead of minor variations on functional/imperative programming.

Agreed. I think he's trying

Agreed. I think he's trying to tell us that, since we "don't know how to compute" (very complex systems), we can't tell a computer how to do things.

We should be telling a computer (or a huge set of small cell-like computers) *what* we need to compute, but not *how* to do it.

So yes, functional & imperative languages serve to program those little computers one by one. But more complex systems (built with those little computers) require new languages and paradigms (such as constraint programming).

Related: A survey of Programming Languages and Platforms for Multi-Agent Systems (PDF)

Classic

This is classic Sussman and classic Lisp for that matter. It makes more sense if one follows the slides under the video.

Sussman's argument, if one

Sussman's argument, if one accepts it as such, is basically that we don't know how to compute because--look, over there, a self-replicating biological system that we don't understand! That's misdirection. Biological systems weren't created by a conscious process. There isn't any mind that _does_ understand them.

And the whole argument about our complexity fitting into a gigabyte... so what? If anything, natural selection has been so extremely successful at optimizing the information density of our DNA that the "higher level language", or whatever it is that Sussman is seeking, is thoroughly obscured by the unconscious, unplanned, and accidental reuse of fragments of DNA and proteins; I don't even think that 1000x could reveal a "higher level structure" anymore, because there just isn't a higher-level structure. I'd also like to point out that our DNA doesn't encode the molecular structure of amino acids nor the differential equations (that we use as a model) of the laws of physics. DNA doesn't encode the binding energies of atomic bonds nor the freezing point of water. DNA operates on a completely different machine than our computers.

I find these biology/computation metaphors extremely irritating for this reason. Molecular dynamics is not a hundred times more complicated than computers. It's not even 2 or 3 or 4 orders of magnitude more complicated. It's more like 10 orders of magnitude (10 billion times) if not 15 (1 quadrillion times) more complicated. And DNA gets that complexity "for free" by existing in our universe and running on the molecular machines on which it does.

Maybe the example is important

People seem to have become quite hung up on the specifics of the video, but the general thrust is quite interesting and it does seem to produce a coherent argument. We've started out by looking at machinery for pushing symbols around, we've spent eight decades exploring what we can do, and we seem to be reaching diminishing returns without stumbling upon how those self-replicating biological systems actually do their thing.

One of the nicest sections was near the beginning: (paraphrasing) is correctness really that important? does it really matter if we get something wrong - what about a different type of criteria, what if we look at systems that don't try to get the correct answer, what if we settle for something that is reasonable?

If anything that is the part of his argument that seems most applicable to the languages crowd, and it is the one that seems most directly relevant to biology. Sure, we don't know whether or not there exist any sentiences that could understand the complexity of a biological system - but we know that we can understand the local rules. We don't know if there is any analogue to a higher level language for DNA - but if we can understand the effects of particular DNA strings on the proteins that they construct then it still raises the same question: could there be a more structured / abstract description of the process?

Biology is generally a bit irritating. Things are messy and complex, yet they arise from some simple systems. That huge leap in complexity from computers to molecular dynamics arises in the effects and visible outputs from very simple processes, not the original descriptions. What domains have we not examined yet that behave radically differently to those that we have constructed languages for? Maybe there is something interesting in domains that are messy yet do reasonable things.

cyborgs

Biological systems are emergent: they evolved through millions of years of trial and error where good results are filtered via natural selection. Such a design process is quite different from what we are capable of, which is based on concrete goals and very short time frames (relative to the time that evolution needs to work). We need control, otherwise how are we going to get our desired result? Biological construction ultimately results in stable (eco)systems, which mean very little to us individually. We shouldn't throw away our incredibly-focused/non-emergent/but-very-limited symbols just yet.

We should try relinquishing more control and just letting things happen. In AI, machine learning and local rule-based systems are seen as more effective than symbolic systems, but of course a symbolic method will always tempt you with the answer you want rather than some random answer you might not want. We could perhaps continue on a cautious path where we mix symbols with biological emergent techniques, starting with cyborgs (biological/machine hybrids) rather than going straight to engineering life (in which case we hit the singularity anyway). The analogy works quite well for software; i.e., cyborg software combines symbolic programming with some emergent techniques.

Not all biological systems are highly emergent in that sense

Sometimes a biological organism can find an optimal solution to a problem very fast. Evolution is as much about change as it is about persistence. I was reading about a micro-organism that was immune to DNA damage due to storing its DNA in concentric rings, and this design is of great interest to 'biological computing': we could potentially gain insights from this micro-organism and apply them to long-term disk storage (edit: think of solving The Write Problem with flash memory). The most astounding thing is that its ability to keep its DNA intact also likely means its DNA has not changed for millions of years.

't Hooft is on the path of

't Hooft is on the path of describing quantum mechanics with cellular automata. So I both agree and disagree with your argument. I disagree because the fundamental automata underlying QM could be quite simple, but I agree with you that on top of this simple substrate is an infrastructure that is amenable to mutation and adaptation, which computer systems currently lack.

I think adaptive/self-healing and live systems will indeed be the subject of current and future research.

My impressions of this thread...

I have read this thread, and was intrigued by the biological comparisons. How come we can't express as much in code as genomes can for animals?

A number of things strike me. There was a comment (sorry for not remembering who made it...) that a programming language is an interface between human and computer. Genomes have a quite different purpose. Something I have also asked myself before is: what can genomes really encode? What kinds of animals would be possible? Could a genome encode a spaceship? There are SF films implying this could be possible... But anything is possible in SF films.

Another thing that makes me wonder about the validity of the comparison: genomes encode physical beings. Microsoft Windows isn't physical, at least not as far as I've seen. (Luckily enough...)

Having said that, I do feel somewhat frustrated over current programming languages. Since I mostly use C, I guess you know why... But even though languages like Haskell might be better, I still feel a lot more could be done. The question is how.

So yes, sure we can get better languages. And frankly, I don't think static vs dynamic typing is what will produce the new revolutionary language. Nor any other ideas currently discussed. I think it would need a new way of thinking. Like what functional programming did to imperative programming. A few more inventions like that and we would soon have a nice toolbox from which we can build a revolutionary language...

People erroneously believe

People erroneously believe that evolution is "goal directed" or that we can design arbitrary "fitness criteria" that evolution-based "computation" must eventually meet. These are total fictions. There is no "search space."

In reality, the only "fitness criteria" is sustainable reproduction under current environmental constraints. There is simply random mutation and survival via reproduction - full stop.

There is no other "right answer." Semantically, "goal" is a complete misnomer. The "unified field theory" of evolution is only the uniform, static state of "gray goo" posited by nanotechnology theorists. That is the end-state of evolutionary "success".

Now, perhaps some very artificial "environment" and "genome" might be constructed to efficiently explore some known and specific search space for "solutions." But in the general case, it is not clear a priori, aside from a limited set of problem domains, that this will require any less effort by human engineers than the careful, painstaking and deliberate design of formal systems to solve the same problems.

People erroneously believe

People erroneously believe that evolution is "goal directed" or that we can design arbitrary "fitness criteria" that evolution-based "computation" must eventually meet. These are total fictions. There is no "search space."

Once you translate "adaption to an environment" into "value of a fitness function" you get a rather straightforward translation of Darwinian evolution. Also gathering (computing) resources under competition is a necessary condition for reproduction. Evolution is causal after all, not just a particular way to talk ( produce tautological phrases ) as social constructivists are used to think.

No, evolution is completely

No, evolution is completely reactive - there is absolutely no teleology (goal) to the process. An ever changing environment "responds" to a series of completely random mutations. What's more, there are complex interactions between genomic "products" (plants, animals, in-betweeners, bizarre outliers) and the environment, most obviously genocide of other species, but also other far more subtle interactions (say, lichens - itself a symbiotic creature - slowly producing soil from rocks).

At any given time-slice, it's still an insane, very arguably incomprehensible Rube Goldberg machine - both in the crazy multi-indirected gene expression which constitutes an individual genomic product and in the interaction between genomic products' live and inert (but ever and randomly changing) environment.

Evolution may or may not be causal depending upon the ontology of one's philosophy of science. But in either case, it is not purposeful. Over time, most species simply die out for any number of reasons, typically an environmental change, rather than evolve. If we want to use species and environment as a metaphor for answers to questions, we might consider them short-lived wrong answers to short-lived meaningless questions (aside from physical laws that don't change).

OTOH, rough metaphors have indeed inspired much innovation in the arts and sciences, so who knows.

lessons from evolution

Fwiw, the analogy with evolution suggests to me two possible techniques to improve software construction:

  • Each time a programmer writes software with a bug in it, flip a fair coin. If the coin comes up tails, shoot the programmer.
  • Spend a hundred million years debugging the code.

Speaking as a programmer, of these two I would prefer the latter.

The important take away is

The important take away is that evolution probably acts too slow to be useful for guided construction. Rather "technology" and "engineering" emulate nature in ways that we can control, even if they are not as powerful.

evolution too slow to be guided

The important take away is that evolution probably acts too slow to be useful for guided construction.

Here is how I picture you while you write that:

Snacking on an apple, a cat in front of the fire and a dog at your feet, your steak grilling in the kitchen and the water boiling in anticipation of your corn. On the table next to you a pre-print about recently bred (not GMO) algae that might be helpful making biofuel. Coursing through your veins a live culture vaccine -- which upsets your stomach a bit so you better have some yogurt. I'm not sure whether you'll enjoy cannabis, tobacco, or a nice hoppy beer but at least you have the option.

None of this is to suggest that literally breeding organisms is an effective computational method but it is to point out that the *kind* (in, sure, a vague sense) of computation that evolution is doing can very much be used for guided construction.

I agree with that. I

I agree with that. I mentioned in another post that we probably want to combine quick guided construction techniques with evolutionary fitness function techniques (presuming we can guide the fitness function!).

Modularity is the big problem in evolutionary computing. We can use machine learning for some simple behaviors, but it is difficult to combine the learned behaviors together. Once we figure that out, I think we could do some really cool things with it.

Note that genetic algorithms

Note that genetic algorithms don't really work. This has been my personal experience, but just read this paper where two big genetic algorithms researchers try to find a *problem* that genetic algorithms solve better than simple hill climbing. It was long thought that GAs are better at solving a particular artificial problem that was designed to show the advantages of GAs. Well, it turned out that even on this problem, hill climbing outperforms GAs by a factor of 10. After a couple of tries of changing the artificial problem to be more well suited for GAs, and optimizing the GA while not optimizing the hill climber (choice quote: "where the number of crossover points for each pair of parents was selected from a Poisson distribution with mean 2.816"), they barely succeed: the GA is a factor of 2 faster at most.

Our claim is that in order to understand how the GA works in general and where it will be most useful, we must first understand how it works and where it will be most useful on simple yet carefully designed landscapes such as these.

They do still try to pass this off as a success instead of concluding that GAs simply don't work; randomized hill climbing is both simpler and faster. So read the paper but draw your own conclusions. I hope I didn't just piss off a 2nd subfield of computer science in this thread...
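
For reference, the baseline in question, random-mutation hill climbing with restarts, is roughly the following (a sketch only; onemax stands in for the paper's Royal Road-style landscape, and the parameters are made up):

import random

def hill_climb(fitness, n_bits, restarts=20, steps=1000, rng=random):
    """Random-restart, random-mutation hill climbing over bit strings."""
    best, best_fit = None, float("-inf")
    for _ in range(restarts):
        x = [rng.randint(0, 1) for _ in range(n_bits)]   # random restart
        fx = fitness(x)
        for _ in range(steps):
            i = rng.randrange(n_bits)
            x[i] ^= 1                                    # propose a single bit flip
            fy = fitness(x)
            if fy >= fx:
                fx = fy                                  # accept uphill or sideways moves
            else:
                x[i] ^= 1                                # reject: undo the flip
        if fx > best_fit:
            best, best_fit = list(x), fx
    return best, best_fit

onemax = sum                                             # stand-in fitness: count the 1 bits
print(hill_climb(onemax, n_bits=64)[1])                  # typically prints 64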

Machine learning works

Machine learning works really well for some fields, and evolutionary-based training can also work well. Here is a Flash demo of using genetic algorithms to generate a working car; it seems very reasonable. Walking, speech recognition, etc. are other areas where genetic algorithms can work. They aren't general by any means, but to say that they don't really work at all is too pessimistic.

They don't work in the sense

They don't work in the sense that they're a complicated modification of hill climbing (relative to hill climbing), but the modification just hurts. I definitely agree that machine learning works; GAs just don't. Speech recognition is done with HMMs, not GAs. There are some papers that do feature selection with GAs, but to my knowledge these papers don't check whether hill climbing works better.

It's from 1993

There have been some new directions in evolutionary algorithms. Ken Stanley's novelty search is one: rewarding different behaviors rather than a target goal seems more likely to perform well for non-convex problems. My reasoning is that for convex problems the necessary conditions for optimality are also sufficient. The same conditions are necessary in non-convex scenarios, but they are not sufficient. The demands are far greater, and there is no explicit requirement for a unique answer like the tangent to a quadratic curve.
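
Roughly, the scoring rule amounts to this (a sketch in Python; the 1-D behavior descriptor and all the names are made up, whereas real novelty search uses a domain-specific behavior space and k-nearest-neighbour distances to an archive):

import random

def novelty(behavior, archive, k=3):
    """Score a candidate by its mean distance to the k nearest archived behaviors."""
    if not archive:
        return float("inf")
    dists = sorted(abs(behavior - b) for b in archive)
    return sum(dists[:k]) / min(k, len(dists))

archive = []                                              # behaviors seen so far
population = [random.uniform(0, 100) for _ in range(10)]  # 1-D "behavior descriptors"

for _ in range(50):
    ranked = sorted(population, key=lambda b: novelty(b, archive), reverse=True)
    archive.extend(ranked[:2])                            # remember the most novel
    # breed the next population from the most novel candidates, with no target goal
    population = [b + random.gauss(0, 5) for b in ranked[:5] for _ in range(2)]

print(len(archive), round(min(archive)), round(max(archive)))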

It would probably be cool if you could back up your opinion by taking the author's most recent work towards Automatic Programming and doing it better using better-understood optimization techniques.

Finding the maximum distance

Finding the maximum distance of N points on an M-dimensional manifold seems to characterize novelty search quite well, so why is it not an optimization problem? Isn't that the "random restart" part of "hill climbing with random restarts", and isn't solving it efficiently exactly the reason why heuristic search techniques such as GA are used?

I thought the major benefit of GAs and related techniques is that they don't waste many resources on a detailed examination of unpromising regions, something that random restart + hill climbing does not address at all, because hill climbing is exactly that detailed examination. I wonder why this should be wrong, and I'm hard to convince with hand-waving arguments.

Can you rephrase?

Sorry, I understood your general observation, but am not sure what your point is.

As I understand GA, the primary reason hill climbing has been more effective is that we apparently still don't have robust theorems for schemas, nor for how to take a schema theorem and compare its effectiveness to hill climbing. Also, the reality is that many GAs waste a lot of resources in detailed examination of unpromising regions, something novelty search tries to overcome. Within GA, as I understand it, the way to overcome repeated evaluation of unpromising regions is through a good schema theorem that can cultivate robust building blocks (as in, favorable gene expression sequences).

Even if GA and hill climbing were equal, the fact that hill climbing is an anytime algorithm makes it more robust. It can approximate an answer in the same way tunneling to find an extremum can.

Whether we have computational theories to compare a given GA to hill climbing and prove dominance is a separate matter. I have no idea there, either.

Sorry, I understood your

Sorry, I understood your general observation, but am not sure what your point is.

Hill climbing just doesn't solve the problem addressed by global search with an unknown number of optima. So why compare it with GA at all? Also, alternatives like random search don't tell me which variance to choose for my random distribution, but that's the interesting part. ES by Rechenberg addresses this problem.

I probably don't understand novelty search because I don't see why it isn't just another objective that can be implemented for arbitrary heuristic search such as GA, great deluge, ES, random search etc.

Whether we have computational theories to compare a given GA to hill climbing and prove dominance is a separate matter. I have no idea there, either.

When the problem domain is that of a "general function" one can only apply measure-theoretic arguments, and I suspect all but a measure-zero set of functions show no pattern at all: they are randomly distributed sets of spikes whose magnitudes are normally distributed. If this is true it should follow as a corollary that no algorithm could perform better than random search on most objective functions. Should we encourage using random search for that reason?

Probably not. So classically one studied the properties of the objective function to develop an adequate algorithm. Those classes were relatively small though and one began to develop heuristic search for much wider classes. But what is a typical problem of practical interest then?

It is well known that no

It is well known that no heuristic search can do better than brute force search in the general case (the no free lunch theorem). We should evaluate heuristic search procedures on the practical problems they solve. GAs fail in that regard, while random restart hill climbing works relatively well on a large class of problems.

Here's a quote from the

Here's a quote from the latest paper on automatic evolutionary program repair:

We have conducted some experiments using a brute force algorithm (which applies simple mutation operations in a predetermined order) and random search (which applies mutation operations randomly without any selection or inheritance of partial solutions). Both these simpler alternatives perform as well or better than the GP on many, but not all, of our benchmark programs.

They didn't even compare to hill climbing. They just compared with random search and brute force search.

Good point.

Thanks for sharing.

How did random search

How did random search preserve syntax? I mean, when you don't need syntax-preserving change operations such as crossover, which come for free because one doesn't have to apply an extra check that a solution is contained in the target domain, there is no point in selecting GP.

By picking mutations from a

By picking mutations from a probability distribution with a support of valid programs? Exactly like the mutations done by GP? Hill climbing is essentially GP with crossover turned off and a population size of 1.

Ah, supporting valid

Ah, supporting valid programs is trivial. Then that's of course trivial. Thanks.

No No. NoNo. NoNoNo

No

No. NoNo. NoNoNo ...

Evolution is completely reactive - there is absolutely no teleology (goal) to the process. An ever changing environment "responds" to a series of completely random mutations.

When Darwin traveled around the world he observed that the biological properties of species fit environmental constraints. So instead of a breeder who actively selected properties, the environment did the same passively. When designing algorithms one simulates simple models of both species and environment and observes an optimization process which can be causally understood, i.e. there are initial causes and a dynamics instead of final causes and a meaning.

This is not a philosophical viewpoint, it is not metaphoric and the environment is neither reactive nor event-driven. At this stage it is a simple model with explanatory power, not a world view which has to compete with religion because of an accident of history.

Sure, a Black Metal or Dark Ambient version of Gaia is somehow cool, this panoramic view of a nihilist God who exists beyond space and time and observes a sick and random world. However, a new scientifically informed priesthood is uncalled for at this point.

Sorry, I didn't mean to

Sorry, I didn't mean to anthropomorphize "the environment." I just meant to say that genetic mutations happen or not, as the case may be. The universe is indifferent. When it comes to "life," the old saying "nature abhors a vacuum" is entirely false. "No life at all" is a perfectly acceptable and, in fact, prevalent default.

Then, when I said that the environment "reacts" (yes, potentially a kind of deep grammatical mistake) to random genetic mutations, I just meant that the current, local state of the environment, in the broadest sense, determines whether or not a given mutation survives - nearly always not, in fact. Again, "species death" and even "no life at all" are commonplace, satisfactory, and actual "evolutionary phenomena."

These are such remarkably bland, tiny, and inconsequential phenomena of the natural universe that it's surprising there is life at all. My ontology is a "desert landscape" - certainly no Gaia nor unicorns sliding down rainbows. Sorry if I was insufficiently clear.

Oversimplifying

A biased die can produce accurate information. Random mutation and survival via reproduction is a mechanism which may even be essentially meaningless at an individual level; that does not mean the result of the computation is collectively meaningless.

Assume for a moment that I'm thinking of a number between 1 and 100. Assume that you can ask me "is your number X" and I'll tell the truth 60% of the time and lie 40% of the time (independent rolls for repeated questions).

With 10 asks you're not going to figure out the number.
With 100 asks you can get lucky with an initial yes and then repeat the question.
But by 10,000 asks you can get my number every time with essentially 100% accuracy.

The mutations may be random (and even that is questionable) but the survival creates intelligence in a collective sense.
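
A quick simulation of the guessing game (a sketch in Python; the naive strategy here just spreads the questions evenly over the 100 candidates and takes a majority vote, so it needs somewhat more asks than a cleverer adaptive questioner would) shows the same effect of individually unreliable answers becoming collectively reliable:

import random

def play(total_asks, truth_prob=0.6, n=100, rng=random):
    """One round: spread the asks evenly and pick the number with the most 'yes' answers."""
    secret = rng.randint(1, n)
    yes = {c: 0 for c in range(1, n + 1)}
    for candidate in range(1, n + 1):
        for _ in range(total_asks // n):
            honest = (candidate == secret)
            yes[candidate] += honest if rng.random() < truth_prob else not honest
    return max(yes, key=yes.get) == secret

for total in (1_000, 10_000, 40_000):
    wins = sum(play(total) for _ in range(100))
    print(total, "asks:", wins, "wins out of 100")   # accuracy climbs toward 100%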

typing problem stated simply

There is, empirically, a class of naturally arising computing systems whose capabilities we're far from matching with purposefully synthesized systems. Those natural systems have features like being massively parallel or being self-constructing.

We suspect there may be ways to harness that computational ability and build synthetic systems, including programming languages, to take advantage of this capability observed in nature.

We don't, a priori, know what, if anything, static analysis will be able to tell us about these kinds of programs. It is possible we will be able to build useful and reliable systems without anything at all resembling today's static analysis. It is possible that conventional static analysis will be a practical impossibility for these systems forever. Therefore, to explore the possibility of building these systems, we should not waste all our time diddling with type systems until we better understand how these systems can be built.

"Lambda glue" and dynamic DSLs are a handy way to begin to explore the possibilities.

The problem is that QA

The problem is that QA occurs via natural selection. Any synthetic system will be an approximation of nature, and should be more amenable to analysis than the real thing. Nature is amenable to analysis, but it is usually statistical, not propositional.

Flexible machines

When it comes to building "flexible computers" the whole debate about static analysis is misplaced.

We know already how it works: an unknown LISP programmer came up with a simple solution to a problem in the 1970s, back when Sussman was a young guy, and now, 40 years later, everyone in language research flocks to the same problem and proudly presents the "type safe solution". It's a game, and it bores me, but I understand the value system of those who play it and accept it. It gives researchers lots of puzzles to solve, and they are happy and write papers and the world is fine.

But it is still a different game than building "flexible machines", which requires intuitions and approaches not much trained in the past decades. I'm sure whatever we find will become "type safe" a few decades later, or even faster, but that's an afterthought. The instruments of exploration are mostly a matter of the taste and skill of individual researchers.

Handling imperfection in programming languages

When thinking about why the brain seems to be able to do so much with so little effort, while our biggest computers can't do nearly as much, I realized that the brain cheats. You could call it laziness, but it is very effective: you solve a problem without really calculating it, instead making an educated guess and living with the errors. That works as long as the errors aren't too big.

Imagine a programmer who would like to reason the same way. Sure, I want my application to work. But if it behaves a bit badly from time to time, make the best effort to patch things then and there and continue as if nothing wrong had happened.

There are several things involved here:
1) Use of knowledge. Past knowledge is used to solve future problems. How is knowledge encoded? How is expanding that knowledge encoded? Kind of like the "Animal" program I wrote in BASIC as a kid, but a lot more advanced, of course. The code describes what kind of knowledge is of interest, and how to make use of future information.

2) Fudgy thinking. Nothing is precise. Some kind of fuzzy logic on steroids would be needed.

3) Error handling that tries to patch the error so the program can continue as though no error had occurred. Kind of "things go bad sometimes and no one will ever know why; just patch it and go on with your life". If there's a lesson to be learned, it would be nice to learn it, though. The code needs to deal with categorizing this: when is the error so strange and intermittent that we just ignore it, and when is it intrusive enough to force the program to learn from it? (A toy sketch of this follows below.)

How's that as a starting point for a new language?
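For concreteness, here is a toy Python sketch of point 3 only (every name in it is made up): a wrapper that patches over a failure with a fallback guess so the program keeps running, counts how often each kind of failure recurs, and stops hiding a failure once it has become intrusive enough to learn from.

from collections import Counter
from functools import wraps

error_counts = Counter()   # how often each kind of failure has been seen

def patch_and_continue(fallback, learn_threshold=5):
    # On failure, return `fallback` and carry on as if nothing had happened.
    # Once the same failure has recurred `learn_threshold` times, re-raise it
    # so the surrounding program is forced to deal with (learn from) it.
    def decorate(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except Exception as e:
                key = (f.__name__, type(e).__name__)
                error_counts[key] += 1
                if error_counts[key] >= learn_threshold:
                    raise              # intrusive: stop patching
                return fallback        # strange and intermittent: patch it and go on
        return wrapper
    return decorate

@patch_and_continue(fallback=0.0)
def risky_ratio(a, b):
    return a / b   # sometimes b is 0 and "no one will ever know why"

print(risky_ratio(6, 3))   # 2.0
print(risky_ratio(6, 0))   # 0.0 -- patched, and the failure is counted

Points 1 and 2 would need real machinery (a knowledge store, something fuzzier than booleans); this only shows how cheaply the "patch it and go on with your life" part can be expressed.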

Sounds like neural networks.

Sounds like neural networks.

Or machine learning in

Or machine learning in general.

Also see probabilistic languages.

Neural languages...

I actually browsed around the Internet trying to find a cool new language inspired by the latest neural science. I haven't found anything so far; it seems neural researchers use common languages.

Anyone who happens to know a language specifically designed to easily implement neural reasoning?

no, but sounds interesting

Mats: Anyone who happens to know a language specifically designed to easily implement neural reasoning?

No, but I could have missed it. Why does a language informed by neural science seem interesting? (I don't ask to suppress your interest; I think there may be value in your unstated reasoning, so I'm hoping you'll share. Asking directly seemed clear.)

Do you mean high or low level reasoning? By neural I assume you mean low level, like what occurs in neural nets, in contrast to high level reasoning via, say, frame-based models. Both approaches are essentially pattern matching at different scales. Neural nets are good at saying yes or no when asked if a new data set matches a pattern learned early during training from sample data sets.

I had a neural network class in college over twenty years ago, and was disappointed. It seemed like a one trick pony and made me ask, "Is that it?" I suspected general questions about reasoning were avoided because it was easy to pursue a simple mechanical model yielding consistent results in narrow problems. Given a focus on batch-oriented numerics, Fortran would work fine for neural networks.

In contrast, frame-based reasoning aims to build models of relationships between elements of components that might appear in a situation. When a new input situation is provided, you perform some match-making to look for related schemes you know about already. This approach seems highly dependent on researcher cleverness in defining models. (In college I had an instructor whose PhD was in this area at UCLA, and I asked a lot of questions because something sounded plausible.) I suspect Lisp was common for that sort of thing.

I'd expect no value for you to leverage in neural reasoning research, if you're looking for a nice result in programming languages. But there's a chance whatever you're thinking is already at least as clever as what you'll find by looking.

I think R is popular for

I think R is popular for neural networks; it's Scheme-like and focuses on statistics.

Anyone who happens to know a

Anyone who happens to know a language specifically designed to easily implement neural reasoning?

NNs can be easily implemented in almost any language, but calling for a "neural semantics" at the foundations of a language would surely be fascinating. Whether or not it becomes "esoteric" or just accidentally unusable or academic "paper work" isn't even important at this point. I would surely encourage creative work in this direction.

I can't for the life of me

I can't for the life of me find it, but there was a paper in the past year or two suggesting a completely different approach to computer architecture. It was organized around a more cell-like model, where you have a grid of computing cells connected to neighbours via 1 MBps interconnects. The cells themselves were modest processors, with some local cache I believe, and the idea was to push computation into the network in a massively parallel way that naturally also accommodates failure and "self-healing" hardware via routing protocols.

Quite a different way of approaching the problem, focusing on stacking small computing building blocks into smart fabrics and wires. This sounds like the start of what you're looking for, if only I could find the reference...

Edit: the CAM-8 from the early 90s also sounds interesting.

Maybe something like the old

Maybe something like the old Connection Machines? IIRC, there was a Connection Machine Lisp developed for them. Huh, I still remember the big black Connection Machine with oodles of red blinking lights back in college. Never personally frob'd one, though.

No, it was something more

No, it was something more like cellular computing or the continuum computer architecture.

SpiNNaker

Perhaps it was something on SpiNNaker. I can't get through the paywall at the moment to give you a more accurate answer, but there was a paper from Manchester within the past couple of years that sounds like your description. Failing that, it isn't a bad answer to Mats' original question, although they are looking more at architectures that permit good descriptions of neural networks than at specific languages to make those descriptions in.

This is a very interesting

This is a very interesting field! I didn't realize so much work had gone into it already. It was probably a paper along the lines we've both linked to. I just specifically remember the 1 MBps interconnects between "CPUs", and how the cells can form chains and cell-like membranes from these primitive elements, e.g. for filtering and exchange.

I think the "CPUs" were actually very, very simple, akin to an automaton, so much simpler than the ARM processors most of these papers use as a substrate.

Ivan Sutherland's Fleet architecture?

Ivan Sutherland's Fleet architecture?

That sounds like the closest

Fleet sounds like the closest so far, so that's probably it. Fleet is basically a "one instruction" computer, which emphasizes communication, which is costly these days, over logic, which is cheap these days. It does this by connecting logical elements, like ALUs, FPUs, etc. over a uniform switching fabric, and exposes a move instruction for that fabric.

Not exactly, but

There have been custom languages for controlling millions of autonomous nodes in a network. Peter Lee's Meld comes to mind. It is actually based on a DSL for overlay networks.

starting point

I think it is a great idea for a new language! You were looking for one; this would be wonderful. A system where probabilistic operators and variables are first class objects would be terrific, very much like the way math languages aid design by folding in complex numbers or vectors with easy polymorphism.

If I could just do something like

a = from_normal(2,1) # a is a random variable taken from a normal distribution mu=2, sigma =1
b = 3
print a*b
print a+b
print a+a
print a*a
print a > 3

and get an output of
from_normal(6,3)
from_normal(5,1)
from_normal(4,sqrt(2))
from_product_distribution(normal(2,1),normal(2,1)) #would simplify for distributions like lognormal
from_boolean(.1587,.8413) # 15.87% chance of true, 84.13% chance of false
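Roughly the kind of thing I mean, as a toy Python sketch (the class, the independent-draws reading of a+a, and the decision to leave out the a*a product case, which has no simple closed form, are all mine):

import math

class Normal:
    # A toy first class random variable: a normal distribution that carries
    # itself symbolically through + and * with constants and with other
    # (assumed independent) normals.
    def __init__(self, mu, sigma):
        self.mu, self.sigma = mu, sigma

    def __add__(self, other):
        if isinstance(other, Normal):               # sum of independent normals
            return Normal(self.mu + other.mu,
                          math.hypot(self.sigma, other.sigma))
        return Normal(self.mu + other, self.sigma)  # shift by a constant

    def __mul__(self, k):                           # scale by a constant
        return Normal(self.mu * k, self.sigma * abs(k))

    def __gt__(self, x):
        # probability that a draw exceeds x, via the normal tail
        z = (x - self.mu) / self.sigma
        return 0.5 * math.erfc(z / math.sqrt(2))

    def __repr__(self):
        return f"from_normal({self.mu:g}, {self.sigma:g})"

a = Normal(2, 1)
b = 3
print(a * b)   # from_normal(6, 3)
print(a + b)   # from_normal(5, 1)
print(a + a)   # from_normal(4, 1.41421), treating each mention as a fresh draw
print(a > 3)   # 0.1586..., i.e. the from_boolean(.1587, .8413) above

Everything past this location-scale arithmetic (the a*a case, nonlinear functions of a) is where the symbolic approach stops being a one-liner.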

Something like that exists:

Something like that exists: probabilistic languages. They generally don't reason symbolically about normal distributions and such (but there's no reason they couldn't). Instead they have efficient inference algorithms (e.g. variable elimination, MCMC).

Effects

Your idea fits in the general framework of "usual programming with some funny implicit effects during computation" (here, probabilistic choice). It can be integrated quite nicely with rich enough existing languages, using various formalisms. See for example:
- PFP for Haskell
- HANSEI for OCaml

For such domain-specific semantics there is usually little value in building a whole language on top of it; it is more cost-efficient to embed the domain knowledge in a library for a sufficiently expressive general-purpose language. See for example the Escape from Zurg story, where it turns out that a classic search+backtracking problem is as easy (or easier) for students to solve in Haskell as in Prolog.

PS: when those effects are embedded in a general-purpose language, you usually pay a syntactic cost to use them (even if, where possible, the do-notation or let!-notation is effective at hiding it); it is often possible to make this embedding implicit/transparent, see Lightweight monadic programming in ML.
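To make the "funny implicit effect" point concrete, here is a deliberately minimal Python rendering (the names unit, flip and bind are made up; a distribution is just a dict from values to probabilities). The nested lambdas are exactly the syntactic cost that do-notation or let!-notation would hide:

def unit(x):
    # the deterministic distribution: x with probability 1
    return {x: 1.0}

def flip(p):
    # a biased coin
    return {True: p, False: 1.0 - p}

def bind(dist, f):
    # run f on every possible outcome and mix the results by their weights
    out = {}
    for x, px in dist.items():
        for y, py in f(x).items():
            out[y] = out.get(y, 0.0) + px * py
    return out

# "Did at least one of two biased flips come up heads?"
at_least_one = bind(flip(0.6), lambda a:
               bind(flip(0.6), lambda b:
               unit(a or b)))
print(at_least_one)   # {True: 0.84, False: 0.16}, up to float rounding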

partially

Good links. I guess we should find out if Mats likes this idea or not.

I agree these things are small steps in this direction. But I wanted first class objects. You can see from the HANSEI examples how complex it gets, even with a simple boolean distribution.

PFP comes closer to what I'm picturing above:

But mapD (uncurry (+)) (prod die die)
is not the same as
die + die
there is too much intellectual overhead for the programmer; they are being forced to get into the muck of the implementation details too much.

I don't see any reason why, in Haskell, you couldn't create an instance of Num for Uniform a,
where a is a Num.
(+), (-), (*), negate, and abs are all easy to define for a uniform distribution of numbers, and things like fromIntegral are easy: a particular number with 100% probability.
Then you could do what I'm picturing for uniform or boolean distributions on top of PFP. I think first class is vital.

But... uniform is a rather simple distribution; the one I gave with normal is a bit more complex. Streams and first class functions might allow you to avoid the symbolic evaluation and just return distributions (maybe a graph). Frankly, though, I think the symbolic evaluation is really useful in that it keeps the objects as simple as possible. And PFP isn't close to being able to do that example. As Jules pointed out in his comments, doing that example requires at least a special-purpose symbolic evaluator.

I would agree though that expanding on PFP is probably a good way to get your feet wet.
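As a rough illustration, in Python rather than Haskell and with made-up names, of what "die + die just working" looks like once the arithmetic is overloaded over exact finite distributions:

from fractions import Fraction

class Dist:
    # A toy finite distribution with overloaded arithmetic, so that
    # die + die works directly instead of mapD (uncurry (+)) (prod die die).
    def __init__(self, weights):
        total = sum(weights.values())
        self.probs = {v: Fraction(w) / total for v, w in weights.items()}

    def __add__(self, other):
        if not isinstance(other, Dist):
            other = Dist({other: 1})           # lift constants, like fromIntegral
        out = {}
        for x, px in self.probs.items():       # convolve the two (independent)
            for y, py in other.probs.items():  # distributions
                out[x + y] = out.get(x + y, 0) + px * py
        return Dist(out)

    __radd__ = __add__

    def __repr__(self):
        return "  ".join(f"{v}:{float(p):.3f}" for v, p in sorted(self.probs.items()))

die = Dist({i: 1 for i in range(1, 7)})
print(die + die)   # 2:0.028  3:0.056  ...  7:0.167  ...  12:0.028
print(die + 1)     # 2 through 7, each 0.167

The normal-distribution example above is the part this enumeration can't reach; that's where the symbolic evaluator (or an approximation) has to come in.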

But I wanted first class

But I wanted first class objects. You can see from the HANSEI examples how complex it gets, even with a simple boolean distribution.

I don't understand what you mean by "first class" here. I'm under the impression that in both cases, probabilistic values can be manipulated like any other values in the language (abstracted over, passed to functions, etc.). Whether there is a static distinction between probabilistic ints and non-probabilistic ints is a different story (but I think it's good to separate concerns).

I don't know which complex HANSEI example you're talking about. Could you provide more details? Here is the "grass model" example in HANSEI; I don't find it particularly complex or troublesome.

let grass_model = fun () ->
  let cloudy    = flip 0.5 in
  let rain      = flip (if cloudy then 0.8 else 0.2) in
  let sprinkler = flip (if cloudy then 0.1 else 0.5) in
  let wet_roof  = flip 0.7 && rain in
  let wet_grass = flip 0.9 && rain || flip 0.9 && sprinkler in
  if wet_grass then rain else fail ()

There is a similar example in "Lightweight Monadic Programming": the following code,

let rain = flip .5 in
let sprinkler = flip .3 in
let chance = flip .9 in
let grass_is_wet = (rain || sprinkler) && chance in
if grass_is_wet then rain else fail ()

in their system, desugars / is translated into

bindp (flip .5) (fun rain ->
  bindp (flip .3) (fun sprinkler ->
    bindp (flip .9) (fun chance ->
      bindp (unitp ((rain || sprinkler) && chance)) (fun grass_is_wet ->
        if grass_is_wet then unitp rain else fail ()))))

I suppose such a system could make PFP-style programming just as easy.

I don't wish to dismiss your concerns, but I'm wondering whether they are about something deep and fundamental, or only something rather superficial. What do you have in mind that would be fundamentally better than those examples?

example of the problem

whether they are about something deep and fundamental, or only something rather superficial.

The symbolic stuff is fundamental. The first class is both superficial and fundamental :) Sugar matters. I don't want the details of the implementation leaking out. That example is cool, but you can already see the leakage in the constant "in"s and the way the bindings are chained so tightly.


a = b * 13
b = 7
c = b+a

and the system resolves the order of computation. Even in systems that don't, I can define b, then a, then c, hundreds or thousands of lines apart. I don't have to specify their relationship with one another tightly. Making this structure go away is just standard DSL work. But I think it absolutely has to be done.

The difference between a shell scripting language and a typical programming language is that in a shell scripting language, files and directories are first class objects. It is easy to write a shell in almost any programming language, but until you've actually done it, you are still holding the filesystem at arm's length. And the result, in practice, is that outside of shells people tend to treat variables in files very differently from variables in memory.

To pick another example, what makes FP so cool is that functions are first class objects. It is easy in C++, given f(a,b,c), to create 6 functions which return function pointers f1(a)(b,c), f2(b)(a,c), f3(c)(a,b), f4(a,b)(c), f5(a,c)(b), f6(b,c)(a) and then invoke those function pointers. And you can have the preprocessor or IDE create them automagically. But in FP that all happens automatically, and as a result people actually do it naturally and frequently. People in FP languages don't think about the complexity of currying and partial application because they don't see it.

So put me down on first class as: easy to do & incredibly important (IMHO).

The in keyword is just how

Agreed that ease of expression is important, but the in keyword is just how OCaml's variables are defined. It has nothing to do with the probability DSL. You could remove the in's (like F# does); that would be an improvement to OCaml and not specifically to the probabilistic DSL.

The example would look like this transliterated to Python syntax:

rain = flip(.5)
sprinkler = flip(.3)
chance = flip(.9)
grass_is_wet = (rain or sprinkler) and chance
return rain if grass_is_wet else fail()

You mean that's not .5

You mean that's not .5 inches?

What do you mean?


I think this was a joke. I

I think this was a joke. I found it amusing.

I see now :)


Thanks. I understand what

Thanks. I understand what the "in" is doing: each statement is being evaluated "in" the context of the next statement. Let-in is a fairly common syntax. I will admit I've never seen it nested that much except in exercises; I'm more used to seeing chains of binds handled via the do construct. But then again I don't read much OCaml. I don't think you could remove the in's from the code: because of the monadic structure, you have to create that chain of binds. And that was really my only point about leakage, and the distinction between first class and second class objects. The implementation as a monad is leaking through for his logical booleans, while, for example, if cloudy then 0.1 else 0.5 doesn't have all that machinery.

I think we all agree this is easy enough to fix.

The Python syntax you gave is getting closer to what I would want. But even in your version you still have leakage with the fail...


cloudy = flip 0.5
rain = flip (if cloudy then 0.8 else 0.2)
sprinkler = flip (if cloudy then 0.1 else 0.5)
wet_roof = flip 0.7 && rain
wet_grass = flip 0.9 && rain || flip 0.9 && sprinkler
grass_model = {
  force wet_grass;
  return rain;
}

where grass_model is basically doing: if wet_grass then rain, else goto the cloudy statement.

I'm not just objecting to the functional programming; I'm objecting to implementation details leaking through. That, IMHO, is the distinction between a DSL and a library.

That last part is easy

That last part is easy enough to fix. Just define:

    let observe c = if c then () else fail ()

And then:

   let grass_is_wet = ...
   observe grass_is_wet
   return rain

Yep

Yep that works, assuming you can touch rain like that in the original.

Probabilistic language

The first time I saw this idea was in Church.

Hadn't seen PFP before. Very cool.

Alternative proposal

Type 1 rules are rigid, precise, and efficient. They act critically on data and reject most of it as junk, not properly typed, etc.

Type 2 rules are uncritical: they accept any data, store it, and mine it for patterns.

Type 3 rules build type 1 rules from the patterns mined by rules of type 2.

Type 4 rules are higher order and manage the rule system: they assign priorities to rules and replace rules of type 1 with rules created by rules of type 3.

Finally, collapse the hierarchy: all rules are essentially of type 1. They are distinguished only by their different responsibilities and their access to data (a basic Lispy intuition).

If we give up on the collapse, we likely introduce a "programmer" who is not part of the system but takes over the roles of types 2 through 4 in order to get rules of type 1. If we give up on the programmer, we need "evolution", a "market", or a "teacher", because the system is closed and not critical: it doesn't know how to behave right or wrong, good or bad, amid the myriad possibilities of its inner dynamics. There is only meaning for the system and meaning of the system for others, no meaning in itself. That it needs to have meaning for us may be a limiting factor so severe that it will never be fully realized, i.e. we prefer to recruit programmers after all, who at best introduce flexibility via parametrization or late choices.